Title: VTune Results; cRenderWorker::RayRecursion; VolumetricShader vs. RayMarching
Post by: mancoast on July 31, 2016, 09:43:29 PM

Greetings,
I am curious about the Volumetric Shader. It appears that cLights::GetLight is causing some sort of bottleneck in execution. Please consider the data below.

This first screenshot shows a tree-view summary of CPU instruction usage by function. 99.5% of total execution cycles are spent within the scope of cRenderWorker::doWork. For this particular hotspots view in VTune, percentages are shown in the GUI.
(http://i.imgur.com/nIgFlwj.png)

We also know that 99.4% of time is spent executing cRenderWorker::RayRecursion. One level deeper, inside RayRecursion, we see 24.4% of time allocated to cRenderWorker::RayMarching and 68.2% of time allocated to cRenderWorker::VolumetricShader.
(http://i.imgur.com/no1JRvb.png)

In the source code view, we see the 24.4% of time allocated to cRenderWorker::RayMarching.
(http://i.imgur.com/mbSLhg4.png)

In the source code view, we see the 68.2% of time allocated to cRenderWorker::VolumetricShader.
(http://i.imgur.com/xNR6WM9.png)

This leads me to believe that we are spending much of the time in the VolumetricShader. Within cRenderWorker::VolumetricShader there are two loops retiring billions of instructions to self.
(http://i.imgur.com/AmvEW6J.png)

The loop at line 349 of VolumetricShader contains a function call to cLights::GetLight.
(http://i.imgur.com/IEHzXq7.png)

The loop at line 375 of VolumetricShader also contains a function call to cLights::GetLight.
(http://i.imgur.com/d9TMGy9.png)

It appears that over 50% of all CPU instructions are retired by function calls to cLights::GetLight.
(http://i.imgur.com/SyMNJ3p.png)

The call to cLights::GetLight in the loop at line 349 retires approximately 25% of all CPU instructions.
(http://i.imgur.com/Pqpbvlc.png)

The other call to cLights::GetLight, in the loop at line 375, also retires approximately 25% of all CPU instructions.
(http://i.imgur.com/RScReG3.png)

Why are these calls to cLights::GetLight consuming so many CPU cycles? Any suggestions for optimization?

Thanks,
coast

Title: Re: VTune Results; cRenderWorker::RayRecursion; VolumetricShader vs. RayMarching
Post by: Buddhi on July 31, 2016, 10:43:00 PM

Could you attach the settings you used for testing?

Title: Re: VTune Results; cRenderWorker::RayRecursion; VolumetricShader vs. RayMarching
Post by: mancoast on July 31, 2016, 10:59:03 PM

6 MB is too big to attach, here's the link:
https://github.com/mancoast/mandelbulber2/raw/k1om/_menger-coastn_anim.fract

Title: Re: VTune Results; cRenderWorker::RayRecursion; VolumetricShader vs. RayMarching
Post by: taurus on August 01, 2016, 11:21:44 AM

Quote from: mancoast
6 MB is too big to attach, here's the link:
https://github.com/mancoast/mandelbulber2/raw/k1om/_menger-coastn_anim.fract

A little noob question in between: did you ever render this animation? Depending on the fps, it should be around 20 minutes. I wonder whether all the stuff below [frames] was necessary to make your point.

Title: Re: VTune Results; cRenderWorker::RayRecursion; VolumetricShader vs. RayMarching
Post by: Buddhi on August 01, 2016, 09:40:21 PM

Quote from: mancoast
6 MB is too big to attach, here's the link:
https://github.com/mancoast/mandelbulber2/raw/k1om/_menger-coastn_anim.fract

It's a HUGE test case! Did you render the whole animation for benchmarking, or only one frame? I'm asking because I would like to do similar profiling using valgrind and then start looking into why cLights::GetLight() was highlighted here.

Title: Re: VTune Results; cRenderWorker::RayRecursion; VolumetricShader vs. RayMarching
Post by: mancoast on August 01, 2016, 11:42:46 PM

Hello,
This render is not yet completed. For VTune I render a randomly selected frame. This keeps the results fresh with different samples. After VTune/changes, I run overnight on the servers to get frame-to-frame time differences. As of now, it's about 50%. I am excited to test your latest commit with the isAnyLight modification.

Thanks,
coast

Title: Re: VTune Results; cRenderWorker::RayRecursion; VolumetricShader vs. RayMarching
Post by: Buddhi on August 02, 2016, 11:26:17 PM

Thanks to your enormous animation I have found and fixed another bug. The program allocated too much memory when a flight animation was loaded and previews were used. Now it uses about 20% of the former memory usage. I have also improved the speed of refreshing the animation table.
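
For readers wondering what an "isAnyLight"-style change might look like: the thread does not show the actual commit, so the following is only a guess at the general idea, not Mandelbulber's code. It assumes the cost comes from per-step light lookups inside the volumetric loops, and that a flag computed once can skip those loops when no light is enabled. All names here (Light, Lights, IsAnyLightEnabled, ShadeVolume) are hypothetical, and the call counter exists only to make the saving observable.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT Mandelbulber's real types.
struct Light {
    bool enabled;
    double intensity;
};

class Lights {
public:
    explicit Lights(std::vector<Light> lights) : lights_(std::move(lights)) {
        // Computed once up front, so the per-step loop never has to
        // rediscover whether any light is switched on.
        for (const Light& l : lights_) anyEnabled_ = anyEnabled_ || l.enabled;
    }

    bool IsAnyLightEnabled() const { return anyEnabled_; }
    std::size_t Count() const { return lights_.size(); }

    // Counts calls purely for illustration.
    const Light& Get(std::size_t i) const {
        ++getCalls_;
        return lights_[i];
    }
    long GetCalls() const { return getCalls_; }

private:
    std::vector<Light> lights_;
    bool anyEnabled_ = false;
    mutable long getCalls_ = 0;
};

// Accumulates light contributions over `steps` samples along a ray.
double ShadeVolume(const Lights& lights, int steps) {
    assert(steps >= 0);
    double sum = 0.0;

    // Hoisted check: with no enabled light, skip every per-step lookup.
    if (!lights.IsAnyLightEnabled()) return sum;

    for (int s = 0; s < steps; ++s) {
        for (std::size_t i = 0; i < lights.Count(); ++i) {
            const Light& l = lights.Get(i);
            if (l.enabled) sum += l.intensity;
        }
    }
    return sum;
}
```

With every light disabled, ShadeVolume returns without a single Get call regardless of the step count. With at least one light enabled, the loops run as before, so this kind of check only helps scenes that do not use the lights at all.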