Topic: VTune Results; cRenderWorker::RayMarching -> OpenMP or Threading Building Blocks
Description: we are spending a lot of time in spin locks with more than 100 threads
mancoast
« on: July 27, 2016, 10:05:26 PM »

Hello,

Please review these VTune results.
In my opinion, the application appears to reach 100% CPU usage, but with a few expensive mutex locks.
The ray marcher is implemented using Qt threads. There is room for optimization, but I am not yet familiar with the codebase.
Please consider starting a discussion that enumerates all the requirements for OpenMP or TBB.
I'd be more than happy to take on this work, but I need direction and specifics.

Thanks,
coast

Buddhi
« Reply #1 on: July 28, 2016, 10:51:25 PM »

Thanks for this analysis. It's very interesting. It looks like Mandelbulber uses threads in a very efficient way. It's difficult to find areas here for significant improvement (correct me if I'm wrong, because you wrote that you can see room for optimization. Where?).
However, I can't understand some of the numbers in this report. Why does the Random() function show about 6000 s of CPU time here, when at the same time there is no Compute() function in this report, which is the most CPU-consuming one?
What is the CPI rate?
By the way, I don't know this tool, so my interpretation of the data could be wrong.

About OpenMP: it's already used for the Depth of Field calculation and for updating the image preview (scaling the image with interpolation). It's difficult to find other places in the program where it could be used, because in those places I use QThreads, which are much more efficient. If you see some place to use this kind of optimization, please let me know.
In the future I'm going to implement GPU support, like in Mandelbulber 1.21.


mancoast
« Reply #2 on: July 30, 2016, 03:26:32 AM »

Hello Buddhi,

The report shows a lot of time spent in lock and unlock.
I think this probably means there is a mutex somewhere in the ray-marching algorithm.

This report is from a run on the 240 threads of a Xeon Phi coprocessor.

Thanks,
Coast
Buddhi
« Reply #3 on: July 30, 2016, 09:30:20 AM »

The only place where I used a mutex intentionally is the cScheduler class:
https://github.com/buddhi1980/mandelbulber2/blob/master/mandelbulber2/src/scheduler.cpp

In the cScheduler::NextLine() and cScheduler::UpdateDoneLines() functions I use QMutex, because there is one common scheduler shared by all rendering threads. The mutex is needed in the parts that decide which line should be rendered next.

This report shows the Random() function, which uses the rand() function from the C library. Does this mean that rand() uses mutex locks?
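To illustrate the pattern described above, here is a minimal sketch of one scheduler object shared by several render threads, with a mutex guarding the "which line is next" decision. It is not the actual cScheduler code: it uses std::mutex instead of QMutex so it builds without Qt, and the names LineScheduler, TakeNextLine, MarkLineDone and RenderAllLines are hypothetical, only loosely modeled on the roles of NextLine() and UpdateDoneLines().

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, simplified stand-in for a shared line scheduler
// (not Mandelbulber's cScheduler): one instance serves all render threads.
class LineScheduler
{
public:
	explicit LineScheduler(int lineCount) : height(lineCount) {}

	// Hands out the next unrendered line, or -1 when none are left.
	// The lock makes "read the cursor and advance it" one atomic step,
	// so two threads can never receive the same line.
	int TakeNextLine()
	{
		std::lock_guard<std::mutex> lock(lineMutex);
		if (nextLine >= height) return -1;
		return nextLine++;
	}

	// Records a finished line (e.g. for progress reporting).
	void MarkLineDone(int line)
	{
		assert(line >= 0 && line < height);
		std::lock_guard<std::mutex> lock(lineMutex);
		++doneLines;
	}

	int DoneLines()
	{
		std::lock_guard<std::mutex> lock(lineMutex);
		return doneLines;
	}

private:
	std::mutex lineMutex;
	int height;
	int nextLine = 0;
	int doneLines = 0;
};

// Hypothetical driver: numThreads workers keep pulling lines until none remain.
// Returns the number of lines reported as done.
int RenderAllLines(int height, int numThreads)
{
	LineScheduler scheduler(height);
	std::vector<std::thread> workers;
	for (int t = 0; t < numThreads; ++t)
	{
		workers.emplace_back([&scheduler] {
			for (int line = scheduler.TakeNextLine(); line >= 0; line = scheduler.TakeNextLine())
			{
				// ... ray-march every pixel of `line` here ...
				scheduler.MarkLineDone(line);
			}
		});
	}
	for (auto &worker : workers) worker.join();
	return scheduler.DoneLines();
}
```

Because the lock is taken once per line rather than once per pixel, it is held for only a few instructions. If it ever did show up as contention at very high thread counts, the cursor in this sketch could instead be a std::atomic<int> advanced with fetch_add.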


Buddhi
« Reply #4 on: July 30, 2016, 09:45:37 AM »

I have checked how much this Random() function slows down rendering. It's 10%!!! Now I see in the glibc source code that it uses the __libc_lock_lock() function. I need to use my own simple rand() function, which doesn't need to be accurate or thread-safe.
Now I see real benefits from your investigation. Thanks a lot!
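A minimal sketch of the kind of lock-free replacement described above, assuming each render thread owns its own generator so that there is no shared state and therefore nothing to lock. It swaps in Marsaglia's xorshift32 generator; the class name FastRandom and its methods are hypothetical and not taken from Mandelbulber. Its statistical quality is modest, which is usually fine for jitter or noise but not for anything security-related.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical replacement for a locking rand(): a tiny xorshift32 generator.
// Each thread owns one instance; nothing is shared, so no lock is needed.
class FastRandom
{
public:
	// A zero state would make xorshift return 0 forever, so substitute a constant.
	explicit FastRandom(std::uint32_t seed) : state(seed != 0 ? seed : 0x9E3779B9u) {}

	// One xorshift32 step (shifts 13, 17, 5). Never returns 0 for a non-zero state.
	std::uint32_t Next()
	{
		std::uint32_t x = state;
		x ^= x << 13;
		x ^= x >> 17;
		x ^= x << 5;
		state = x;
		return x;
	}

	// Integer in [0, max); max must be positive. The modulo adds a tiny bias,
	// acceptable for rendering noise.
	int NextInt(int max)
	{
		assert(max > 0);
		return static_cast<int>(Next() % static_cast<std::uint32_t>(max));
	}

	// Double in [0, 1).
	double Uniform()
	{
		return Next() * (1.0 / 4294967296.0);
	}

private:
	std::uint32_t state;
};
```

In use, each worker would construct its own instance, for example `FastRandom rng(baseSeed + threadIndex);` (both names hypothetical), and never pass it to another thread.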

mancoast
« Reply #5 on: July 31, 2016, 02:29:56 AM »

Hello again Buddhi,

I'm very happy that the report is useful.

I'm still curious whether OpenMP or TBB could be used for the ray-marching algorithm.
Perhaps, with the trend toward higher core counts, these frameworks offer additional value:
OpenMP or TBB provides a future-proof, highly parallel foundation.

Thanks,
coast
quaz0r
« Reply #6 on: July 31, 2016, 04:34:31 AM »

OpenMP is a kludgy hack. If you are interested in next-level parallelism and future-proofing, why not go all the way and take a look at coroutines, or a nice HPC framework built on coroutines like HPX?
Buddhi
« Reply #7 on: July 31, 2016, 10:02:12 AM »

@quaz0r, I agree with you. OpenMP is mostly for lazy programmers who want to get parallelism just by adding one line to the code. In many cases OpenMP is not efficient and has many limitations. One of the biggest limitations for me is that OpenMP doesn't work over the network. My scheduling algorithm allows tasks to be shared between many computers in a very efficient way.

@mancoast, I was a lazy programmer in one piece of the code: the DOF algorithm. This is the place where I use OpenMP. It doesn't utilize all CPU cores as I want; it reaches no more than 70% CPU load. This is the place where you could look and implement the parallelism better.
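One possible starting point, offered as an assumption rather than a diagnosis: an `#pragma omp parallel for` over image rows often stays below full CPU load when the work per row is uneven, because with a static schedule each thread receives a fixed block of rows up front, and threads with cheap rows then sit idle. The sketch below is a hypothetical, heavily simplified DOF-style blur (not Mandelbulber's algorithm) whose per-pixel blur radius makes some rows much more expensive than others; `schedule(dynamic, 1)` lets idle threads take the next row. The function name BlurRows and the per-pixel radius map are illustrative only, and whether this explains the ~70% figure would need profiling to confirm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical DOF-style blur: each output pixel is the average of a square
// window whose radius comes from a per-pixel radius map, so row cost varies.
// Build with -fopenmp (GCC/Clang); without it the pragma is ignored and the
// loop runs serially with the same result.
std::vector<float> BlurRows(const std::vector<float> &image,
	const std::vector<int> &radius, int width, int height)
{
	assert(image.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
	assert(radius.size() == image.size());
	std::vector<float> out(image.size());

	// dynamic, 1: rows are handed out one at a time as threads become free,
	// instead of being split into equal fixed blocks before the loop starts.
	// Each iteration writes only its own row of `out`, so there is no data race.
#pragma omp parallel for schedule(dynamic, 1)
	for (int y = 0; y < height; ++y)
	{
		for (int x = 0; x < width; ++x)
		{
			const int r = radius[y * width + x];
			float sum = 0.0f;
			int count = 0;
			for (int dy = -r; dy <= r; ++dy)
			{
				// Clamp to the image border.
				const int yy = std::min(std::max(y + dy, 0), height - 1);
				for (int dx = -r; dx <= r; ++dx)
				{
					const int xx = std::min(std::max(x + dx, 0), width - 1);
					sum += image[yy * width + xx];
					++count;
				}
			}
			out[y * width + x] = sum / count;
		}
	}
	return out;
}
```

Dynamic scheduling adds a small per-chunk overhead, so for very cheap rows a larger chunk size (for example `schedule(dynamic, 4)`) may balance better; that choice is workload-dependent.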

mancoast
« Reply #8 on: July 31, 2016, 02:57:32 PM »

Thanks for the pointers.
I will investigate the DOF algorithm and see what I can find.