David Makin
« on: March 29, 2012, 12:17:30 PM »
Hi all - if using DE (or an "any direction" delta-DE such as Buddhi's method), I just realised that a massive speed-up should be possible by changing the render/stepping algorithms to share distance information across adjacent rays. It's not something I'd normally do, as it isn't possible using the traditional method in UF.
Basically, say we start with the centre ray and the distance calculated for the step is d. For every ray adjacent to it, we can move the start position forward to the point where a line of length d from the step's start point on the centre ray meets the adjacent ray. This applies *on every step*, provided the old position on each adjacent ray is also within the radius d for the next step - only when the old position on a ray is not within the bounds of the new step on the central ray must we stop moving forward on that adjacent ray, storing the final position found as the start point for that ray.
Has anyone else suggested this or tried it?
(Note that with a little thought it's still possible to ensure the number of steps and similar info remain (reasonably) intact for pseudo-lighting effects etc.)
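The neighbour-advance step can be viewed as a ray/ball intersection: a DE sample of d at point p guarantees empty space inside the ball of radius d around p, so an adjacent ray may jump forward to where it leaves that ball, or must stop sharing if it misses the ball entirely. A minimal sketch in Python (the function name and the tuple-vector representation are my own, not from UF):

```python
import math

def advance_neighbor(origin, direction, center, radius):
    """Advance a neighbouring ray using the centre ray's DE ball.
    Returns the parameter t of the far intersection of the ray
    origin + t*direction (direction assumed unit length) with the
    empty ball (center, radius), or None if the ray misses the ball
    - the case where sharing must stop for that ray."""
    # Solve |origin + t*direction - center|^2 = radius^2 for t.
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None                    # ray misses the DE ball
    t_far = -b + math.sqrt(disc)
    return t_far if t_far > 0.0 else None
```

For example, a ray along +z from the origin, with an empty ball of radius 1 centred at (0,0,5), can safely advance to t = 6.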
« Last Edit: March 29, 2012, 12:20:02 PM by David Makin »
hobold
Fractal Bachius
Posts: 573
« Reply #1 on: March 29, 2012, 04:02:36 PM »
Yes, this has been suggested before. It was implemented in "Gaston", a realtime renderer for quaternion Julia sets on Macs. It did speed up the scalar code path by a factor of two, but the vectorized code path (i.e. SIMD program using what Apple called "Velocity Engine") did not gain much.
It is true that every single DE computation results in information about a solid ball of space, and a bundle of nearby view rays can be intersected with that ball. This makes the most of the precious DE information that we laboriously computed, but it also introduces data dependencies between adjacent view rays. These dependencies are obstacles to the massive brute force parallelism of a GPU (or any other wide SIMD machine).
I think this optimization has much more potential than just a factor of two. But it would require a few more good ideas to spread the DE samples so that the spheres don't overlap too much, yet still cover the view nicely. When most rays have approached the surface closely, one should probably switch back to the usual brute-force stepping of each individual ray.
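The brute-force fallback mentioned above - ordinary per-ray sphere tracing once rays are near the surface - might look like this sketch (the unit-sphere DE is just a stand-in for a real fractal estimator, and all names are illustrative):

```python
import math

def unit_sphere_de(p):
    """Demo distance estimator: exact distance to a unit sphere at the origin."""
    return math.sqrt(sum(x * x for x in p)) - 1.0

def sphere_trace(origin, direction, de, eps=1e-4, t_max=100.0, max_steps=256):
    """Plain per-ray sphere tracing: at each position, step forward by
    the distance estimate until we are within eps of the surface (hit)
    or pass t_max (miss).  Sketch of the standard technique only."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = de(p)
        if dist < eps:
            return t          # close enough: report the hit depth
        t += dist
        if t > t_max:
            return None       # marched past the scene: miss
    return None
```

A ray from (0,0,-3) straight at the sphere hits at depth 2; a ray aimed sideways escapes and reports a miss.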
Syntopia
« Reply #2 on: March 29, 2012, 07:29:37 PM »
There is also this old thread: http://www.fractalforums.com/mandelbulb-implementation/major-raymarching-optimization/

I also considered this in Fragmentarium. On a GPU it would be possible to render the DE intersection distances into a lower-resolution float buffer, and use these distances as starting points for higher-resolution buffers. You could even do this at multiple scales (hierarchically). In the end I decided against it, since the secondary rays (shadows, reflections, occlusions) cannot be accelerated this way. However, since the shadows and occlusions are typically low-frequency, these might be calculated at lower resolution. But I don't know if the programming effort is worth it.
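The low-resolution seeding idea can be sketched as a buffer upsample: each high-resolution ray starts from a slightly pulled-back copy of its parent low-resolution intersection distance. A toy version, assuming square pixel blocks and a conservative `safety` pull-back factor (both my own choices, not from Fragmentarium):

```python
def upsample_start_depths(lowres, factor, safety=0.9):
    """Expand a low-resolution buffer of conservative ray depths into a
    buffer 'factor' times larger in each dimension, to seed the starting
    t of the high-resolution rays.  'safety' pulls each seed back a bit,
    since the high-resolution rays diverge slightly from the low-res ray
    whose depth they inherit.  Sketch: a real version would take the
    minimum over neighbouring low-res samples for extra safety."""
    h, w = len(lowres), len(lowres[0])
    out = [[0.0] * (w * factor) for _ in range(h * factor)]
    for y in range(h * factor):
        for x in range(w * factor):
            out[y][x] = safety * lowres[y // factor][x // factor]
    return out
```

Doubling the resolution of a 1x2 depth buffer yields a 2x4 buffer of pulled-back seeds; each high-res ray then marches only the short remaining distance.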
David Makin
« Reply #4 on: March 29, 2012, 10:26:55 PM »
I had thought of the intermediate float destination method for GPU, but I've never actually tried rendering to even a float colour buffer, so I didn't mention it.

As to the optimisation itself, a key thing to note is that as the render resolution increases, the extent of the optimisation increases in proportion to the increase in *pixel area*, not the linear increase in magnification - i.e. doubling the resolution should theoretically quadruple the speed-up (relative to rendering in the usual way).
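The pixel-area argument can be made concrete with a small-angle estimate: the number of neighbouring rays whose footprints fit inside one DE ball grows with the square of the linear resolution, so doubling the horizontal pixel count quadruples how many rays each DE sample can serve. A rough sketch (the parameter names and the small-angle, square-pixel assumptions are mine):

```python
import math

def rays_covered(de_radius, distance, fov_rad, res_x):
    """Estimate how many view rays pass through a DE ball of radius
    de_radius seen at 'distance' from the eye, for a horizontal field
    of view fov_rad sampled by res_x pixels.  Small-angle approximation:
    the ball subtends de_radius/distance radians, and adjacent rays are
    fov_rad/res_x radians apart, so the covered rays fill a disc."""
    pixel_angle = fov_rad / res_x          # angular spacing of adjacent rays
    ball_angle = de_radius / distance      # angular radius of the DE ball
    n_across = ball_angle / pixel_angle    # rays spanned by the ball's radius
    return math.pi * n_across ** 2         # rays inside the ball's disc
```

Doubling `res_x` halves `pixel_angle`, doubles `n_across`, and so quadruples the count - the quadratic scaling described above.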
David Makin
« Reply #5 on: March 29, 2012, 10:31:10 PM »
> When most rays have approached the surface closely, one should probably switch back to the usual brute force stepping of each individual ray.

That threshold will be directly related to the variation in accuracy of the DE as the surface is approached, and to the ray density (i.e. pixel resolution) - the higher the pixel resolution, the closer to the surface you can go before the optimisation is no longer optimum.
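One way to make that threshold concrete: sharing stops helping roughly when the DE falls below the spacing between adjacent rays at the current depth, since then the ball no longer spans a neighbouring ray - and that spacing shrinks as resolution grows, exactly as described. A back-of-envelope sketch (the names and the small-angle formula are illustrative assumptions, not from the thread):

```python
def sharing_threshold(depth, fov_rad, res_x):
    """Depth-dependent cutoff for shared stepping: the distance between
    adjacent rays at 'depth' is roughly depth * (fov_rad / res_x) for a
    horizontal field of view fov_rad sampled by res_x pixels.  Once the
    DE drops below this, fall back to per-ray stepping.  Higher res_x
    shrinks the spacing, so sharing pays off closer to the surface."""
    return depth * (fov_rad / res_x)
```

At ten times the depth the cutoff is ten times larger, and doubling the resolution halves it.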
« Last Edit: March 29, 2012, 11:24:49 PM by David Makin »
David Makin
« Reply #6 on: March 29, 2012, 11:33:34 PM »
Question - on some video cards is it possible to use a vec4 source texture and a vec4 destination? If so then I need a card like that and I'll get back into proper coding again sharpish. Of course it's possible mine (ATI Radeon HD 5870) does it; I still haven't really looked into shaders/GLSL/CUDA etc., at least not on PCs (that's PCs in the general sense).
Jesse
Download Section
Fractal Schemer
Posts: 1013
« Reply #7 on: March 30, 2012, 08:57:50 PM »
> In the end I decided against it, since the secondary rays (shadows, reflections, occlusions) cannot be accelerated this way. However, since the shadows and occlusions are typically low-frequency, these might be calculated at lower resolution. But I don't know if the programming effort is worth it.

I only did a few tests, but I also decided against it, because the cases where you really want to increase speed are those with bad DEs and low ray-step factors (high fudge values). Unfortunately, the benefit of this method grows as the distance estimates get better and the step values get bigger - in the parts where the DEs are low and rendering is slow, the benefit shrinks.
Syntopia
« Reply #8 on: March 31, 2012, 01:30:39 AM »
> Question - on some video cards is it possible to use a vec4 source texture and a vec4 destination? If so then I need a card like that and I'll get back into proper coding again sharpish.

GLSL is quite versatile - you can sample from different textures in your pixel shader, and render to an offscreen FrameBufferObject. You can also choose between several data types (for instance, working with 4-component 32-bit floats for colours instead of 8-bit RGB). These features should be available on most modern graphics cards.
David Makin
« Reply #9 on: April 01, 2012, 02:40:14 PM »
> GLSL is quite versatile - you can sample from different textures in your pixel shader, and render to an offscreen FrameBufferObject. You can also choose between several data types (for instance, working with 4-component 32-bit floats for colours instead of 8-bit RGB). These features should be available on most modern graphics cards.

Thanks - I just wasn't sure if cards allowed the destination to be vec4.