Title: rod marching
Post by: willvarfar on July 06, 2012, 07:10:35 PM
I don't know if this is useful or already used, but I've enjoyed inventing it this morbing

Title: Re: rod marching
Post by: willvarfar on July 06, 2012, 07:15:43 PM
*morning* (Sorry, didn't mean to submit; it's a hassle posting from a mobile phone browser.) Here's a link: http://williamedwardscoder.tumblr.com/post/26628848007/rod-marching
I call it "rod" because it's like a ray, only it's got width.

Title: Re: rod marching
Post by: Syntopia on July 06, 2012, 08:55:38 PM
Hi, and welcome to the forums!
I think we have discussed something similar before: http://www.fractalforums.com/programming/enhanced-rendering-using-de-at-least-on-cpu/?PHPSESSID=41de59eab3b21b47d42cac334c3b375a Notice in particular the last link, the cone sphere tracing. I don't know whether a hybrid CPU/GPU version has been implemented, though.

Title: Re: rod marching
Post by: Sockratease on July 06, 2012, 09:39:22 PM
Interesting.
Rod Marching, Cone Marching... I suppose Retina Marching is the next logical step :clown: Welcome!

Title: Re: rod marching
Post by: willvarfar on July 06, 2012, 11:28:24 PM
aha, it keeps getting discovered it seems :)
Very nice link. Indeed I'd have called it cone marching too as soon as I allowed for perspective. The neatest thing so far is that, in my very unrepresentative scene, this "cone marching" is doing 2 orders of magnitude fewer DEs! It'll be interesting to see how it does in the middle of a mandelbulb or mandelbox! I'll see if I can get some like-for-like speed comparison with boxplorer. Part of my motivation for CPU-side distance maps is to reduce the number of iterations needed on the GPU. There's some description of the overhead of loops in fragment shaders here: http://gamedev.stackexchange.com/a/31780/4129

Title: Re: rod marching
Post by: willvarfar on July 12, 2012, 12:45:36 PM
Here are some stats for a Core 2 Duo laptop, using one core:
It's comparing rod marching to ray-per-pixel rays, for a camera deep inside a mandelbox, with proper perspective. The distance field is 512x512 (small enough to upload to the GPU per frame without much impact). It's very dependent on the scene; move the camera a little and the scene can easily be twice as complex to compute. It's only computing the distance field. The rod marching is consistently making 10-12x fewer distance estimates and taking 10-12x less time! Here are some stats showing rods and rays; each pair is from the same eye position.
Code:
rods 235367 took 0.0611085s from 2.69962,1.35,0.0455439
I rather expect I've made fundamental bugs and inefficiencies in both implementations. I will now plug in a GPU renderer so we can see what it actually means for real with wimpy GPUs like mine...

Title: Re: rod marching
Post by: willvarfar on July 30, 2012, 06:26:22 AM
I've ported the general idea to GPU and it's running in-browser here: http://williame.github.com/Mandel_1/
The scene is drawn in multiple passes, building up an intermediate distance-map texture. There is a nasty artifact introduced where it sometimes discards whole blocks, which I have yet to track down; I am hopeful it's just a glitch and not a fatal flaw ;) The http://williame.github.com/Mandel_1/ page will try random combinations of passes and print the performance achieved in the right-side pane. Please leave it running a while and then reply with a copy-paste of the right pane and a description of your browser, OS and graphics card :) On my very wimpy old ATI mobile card (40 shaders) I go from 5 fps to 10 fps. I hope this provides a useful speed-up for everyone!

Title: Re: rod marching
Post by: A Noniem on July 30, 2012, 03:37:33 PM
Doing it in multiple passes is a very nice way to port this idea to GPUs, and the demo is amazingly smooth; it runs on my AMD 7770 at over 90 FPS @1440x1200.
PERFORMANCE:
46.333333333333336 fps at 960x899 using [{13,32x32}{27,128x128}{14,256x256}],50
119.66666666666667 fps at 960x899 using [{8,32x32}{15,64x64}{7,512x512}],50
63 fps at 1440x1200 using [],50
74.33333333333333 fps at 1440x1200 using [{22,32x32}{10,128x128}],50
94 fps at 1440x1200 using [{32,32x32}{10,256x128}],50
96.33333333333333 fps at 1440x1200 using [{33,64x64}{15,128x128}{15,256x256}{26,512x512}],50
98 fps at 1440x1200 using [{33,32x32}{18,128x128}{27,256x256}],50
98.66666666666667 fps at 1440x1200 using [{20,32x32}{14,64x64}{28,512x512}],50
64-bit Windows 7 Professional with a Sapphire 7770 OC 1.15GHz (15% overclock over a normal 7770)

Title: Re: rod marching
Post by: David Makin on July 31, 2012, 05:01:13 AM
Basically the same in Safari or Firefox
PERFORMANCE:
120 fps at 512x512 single-pass rendering at 512x512, 50
120 fps at 512x512 using 3 passes [{15,32x32}{13,256x128}{16,512x256}],50
120 fps at 512x512 using 3 passes [{9,32x32}{9,64x64}{25,128x128}],50
120 fps at 512x512 using 5 passes [{12,32x32}{33,64x64}{11,128x128}{18,256x256}{25,512x512}],50
120 fps at 512x512 single-pass rendering at 512x512, 50
120 fps at 512x512 single-pass rendering at 512x512, 50
120 fps at 512x512 using 3 passes [{20,32x32}{33,256x128}{12,512x512}],50
120 fps at 512x512 single-pass rendering at 512x512, 50
120 fps at 512x512 using 2 passes [{27,32x32}{5,512x256}],50
120 fps at 512x512 using 3 passes [{31,64x32}{30,256x128}{21,512x256}],50
120 fps at 512x512 single-pass rendering at 512x512, 50
120 fps at 512x512 using 4 passes [{20,32x32}{23,64x64}{16,128x256}{34,256x512}],50
120 fps at 512x512 using 3 passes [{10,32x32}{19,64x64}{18,512x256}],50
120 fps at 512x512 using 2 passes [{25,128x256}{12,512x512}],50
120 fps at 512x512 single-pass rendering at 512x512, 50
120 fps at 512x512 using 1 passes [{12,64x64}],50
120 fps at 512x512 using 3 passes [{21,32x32}{26,256x64}{24,512x256}],50
120 fps at 512x512 using 1 passes [{31,256x256}],50
120 fps at 512x512 using 1 passes [{26,32x32}],50
120 fps at 512x512 using 1 passes [{26,32x32}],50
120 fps at 512x512 using 1 passes [{14,128x64}],50
120 fps at 512x512 using 4 passes [{27,32x64}{12,128x128}{20,256x256}{26,512x512}],50
120 fps at 512x512 using 3 passes [{7,32x32}{23,128x256}{20,256x512}],50
120 fps at 512x512 using 4 passes [{16,64x64}{7,128x128}{6,256x256}{29,512x512}],50
120 fps at 512x512 using 1 passes [{31,64x64}],50
Snow Leopard 10.6.18, dual 6-core CPU with Radeon HD 5870, 8GB 1333MHz DDR3 RAM. Display size was 876*876.
Note that Subblue's original test Mandelbulb on WebGL was running at 200+fps on this system with full lighting etc. At least I think it was Subblue, maybe it was Syntopia, I forget !!
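
The PERFORMANCE dumps in this thread are easier to compare mechanically than by eye. Below is a minimal Python sketch that parses them and reports the best multi-pass speed-up over the single-pass baseline at one output size. The function names (`parse_report`, `speedup`) are made up for illustration, the sample is copied from A Noniem's report above, and the thread never says what the first number inside each `{n,WxH}` means, so it is kept as an opaque value.

```python
import re

# A few entries copied from A Noniem's report, flattened as in the thread.
SAMPLE = (
    "119.66666666666667 fps at 960x899 using [{8,32x32}{15,64x64}{7,512x512}],50 "
    "63 fps at 1440x1200 using [],50 "
    "74.33333333333333 fps at 1440x1200 using [{22,32x32}{10,128x128}],50 "
    "98.66666666666667 fps at 1440x1200 using [{20,32x32}{14,64x64}{28,512x512}],50"
)

# Matches both message styles seen in the thread:
#   "63 fps at 1440x1200 using [],50"
#   "120 fps at 512x512 using 3 passes [{...}],50"
#   "120 fps at 512x512 single-pass rendering at 512x512, 50"
ENTRY_RE = re.compile(
    r"([\d.]+) fps at (\d+)x(\d+) "
    r"(?:single-pass rendering|using (?:\d+ passes )?\[(.*?)\])"
)
PASS_RE = re.compile(r"\{(\d+),(\d+)x(\d+)\}")


def parse_report(text):
    """Return one dict per entry: fps, output size, list of passes."""
    runs = []
    for m in ENTRY_RE.finditer(text):
        # Each pass is (n, width, height); the meaning of n is not
        # documented in the thread, so it is only carried along.
        passes = [tuple(int(v) for v in p)
                  for p in PASS_RE.findall(m.group(4) or "")]
        runs.append({
            "fps": float(m.group(1)),
            "size": (int(m.group(2)), int(m.group(3))),
            "passes": passes,
        })
    return runs


def speedup(runs, size):
    """Best multi-pass fps divided by best single-pass fps at `size`."""
    same = [r for r in runs if r["size"] == size]
    baseline = max(r["fps"] for r in same if not r["passes"])
    best = max(r["fps"] for r in same if r["passes"])
    return best / baseline


runs = parse_report(SAMPLE)
print(f"speed-up at 1440x1200: {speedup(runs, (1440, 1200)):.2f}x")
```

Reports that sit at a frame-rate cap for every entry will of course give a ratio near 1 whatever the passes do, so the ratio is only meaningful when the baseline is below the cap.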

Title: Re: rod marching
Post by: willvarfar on July 31, 2012, 09:56:44 AM
Thank you A Noniem! To interpret these numbers, your card could do 63fps when doing classic ray-marching, and 99fps when using three intermediate textures.
Thanks David too! I more recently tweaked the script a bit, and you're running a newer version than A Noniem, but only two things have changed: the performance messages are clearer, and I clamp the final output size to a power-of-two texture. I was seeing artifacts when the final destination was not a power of two. I expect this could be adjusted for in the projection matrix, but it was easier to do power-of-two just to see that the artifacts disappeared. Anyone want to point out all the flaws in my code please? (What is interesting is that WebGL canvases are always composited, so you get a final scaling-to-screen step for 'free'. Nearly.)
Your numbers, however, are surprising. 120fps regardless of anything, even the classic ray-marching one. This suggests some kind of vsync-like limit?
Quote: Note that Subblue's original test Mandelbulb on WebGL was running at 200+fps on this system with full lighting etc. At least I think it was Subblue, maybe it was Syntopia, I forget !!
Oh I'd so love to see that demo! Is there a link anywhere?

Title: Re: rod marching
Post by: A Noniem on July 31, 2012, 12:21:56 PM
Using a power of 2 as size indeed decreases a lot of artifacts, but when you run it at a relatively low resolution you run into v-sync problems as you noticed ;D
You could make a demo which runs at 2048x2048 or 4096x4096, since this demo is peanuts for our GPUs. I did expect a bigger speedup, however. 63 without rod-marching vs 99 with rod-marching is a nice speedup, but there is also a much bigger chance of artifacts with rod marching.
PERFORMANCE:
98 fps at 1024x1024 single-pass rendering at 1024x1024, 50
112.33333333333333 fps at 1024x1024 using 2 passes [{14,64x32}{5,128x64}],50
119.66666666666667 fps at 1024x1024 using 2 passes [{6,64x32}{30,128x64}],50
120 fps at 1024x1024 using 3 passes [{19,32x32}{7,256x128}{10,512x256}],50
120 fps at 1024x1024 using 2 passes [{31,256x128}{17,512x256}],50
119.66666666666667 fps at 1024x1024 using 3 passes [{26,64x128}{33,256x256}{29,512x512}],50
119.66666666666667 fps at 1024x1024 using 4 passes [{6,64x32}{32,128x64}{9,256x128}{10,512x256}],50
120 fps at 1024x1024 using 3 passes [{23,128x128}{7,256x256}{8,512x512}],50
119.66666666666667 fps at 1024x1024 using 1 passes [{31,512x64}],50
120 fps at 1024x1024 using 3 passes [{6,32x32}{12,64x64}{10,128x128}],50
120 fps at 1024x1024 using 3 passes [{11,64x64}{32,128x128}{27,256x256}],50
119.66666666666667 fps at 1024x1024 using 3 passes [{14,32x32}{16,64x64}{11,256x256}],50
120 fps at 1024x1024 using 3 passes [{25,32x32}{21,128x64}{20,256x256}],50
119.33333333333333 fps at 1024x1024 using 2 passes [{14,32x32}{29,64x64}],50
120 fps at 1024x1024 using 4 passes [{32,64x32}{6,128x64}{21,256x128}{7,512x256}],50
119.66666666666667 fps at 1024x1024 using 3 passes [{17,32x32}{17,64x64}{27,128x128}],50
119.66666666666667 fps at 1024x1024 using 2 passes [{27,128x64}{20,256x256}],50
120 fps at 1024x1024 using 3 passes [{33,32x32}{24,64x64}{14,256x256}],50
119.66666666666667 fps at 1024x1024 using 3 passes [{10,64x64}{19,128x128}{20,256x256}],50
119 fps at 1024x1024 using 4 passes [{30,32x32}{9,64x64}{21,128x128}{32,512x512}],50
119.66666666666667 fps at 1024x1024 using 3 passes [{18,32x32}{30,64x64}{7,256x256}],50
119.66666666666667 fps at 1024x1024 using 2 passes [{34,64x64}{26,256x512}],50
119.66666666666667 fps at 1024x1024 using 4 passes [{31,32x32}{20,128x64}{31,256x128}{13,512x512}],50
119 fps at 1024x1024 using 2 passes [{12,64x32}{23,256x128}],50
119.66666666666667 fps at 1024x1024 using 2 passes [{26,64x32}{34,128x128}],50
120 fps at 1024x1024 using 5 passes [{27,32x32}{31,64x64}{32,128x128}{8,256x256}{5,512x512}],50
120 fps at 1024x1024 using 3 passes [{20,32x32}{25,64x64}{19,256x256}],50
120 fps at 1024x1024 using 4 passes [{11,32x32}{28,64x64}{14,128x128}{27,256x256}],50
120 fps at 1024x1024 using 3 passes [{20,32x64}{11,128x128}{9,256x256}],50
111 fps at 1024x1024 single-pass rendering at 1024x1024, 50
117 fps at 1024x1024 using 2 passes [{9,128x64}{19,256x128}],50

Title: Re: rod marching
Post by: willvarfar on July 31, 2012, 10:37:24 PM
Quote: there is also a much bigger chance of artifacts with rod marching.
What kind of artifacts? Is there some imprecision inherent in rod-marching? You folks with the good graphics cards seem to sit at 120 fps. My understanding is that this is the browser capping it. All the design docs (http://www.chromium.org/developers/design-documents/requestanimationframe-implementation) I've read say they will cap at 60 Hz, but you're getting 120, so I guess the implementation differs, or perhaps with double buffering they let you have one ahead, or ... I'm speculating. But I think your cards would go much faster. The big deal is on the wimpy cards. http://www.notebookcheck.net/Comparison-of-Laptop-Graphics-Cards.130.0.html is a great page.
My ATI HD 4200 with 40 shaders at 500MHz, Firefox, Linux:
PERFORMANCE:
4 fps at 1024x512 single-pass rendering at 1024x512, 50
...
8 fps at 1024x512 using 3 passes [{23,128x64}{29,256x128}{8,512x256}],50
8 fps at 1024x512 using 2 passes [{11,32x32}{27,256x128}],50
8 fps at 1024x512 using 2 passes [{16,32x32}{20,256x256}],50
8 fps at 1024x512 using 3 passes [{6,32x32}{26,256x64}{16,512x128}],50
Roy's (http://www.fractalforums.com/meet-and-greet/greetings-from-holland/) stats (don't know what card he has, but he's on Windows):
PERFORMANCE:
17.333333333333332 fps at 1440x965 using [],50
20.666666666666668 fps at 1440x965 using [{7,128x64}{13,256x256}],50
21 fps at 1440x965 using [{17,64x64}],50
24 fps at 1440x965 using [{21,64x32}{13,256x128}],50
A friend's ATI HD 5470 with 80 shaders at 750MHz, Windows, Chrome and Firefox:
PERFORMANCE:
6.666666666666667 fps at 1024x512 single-pass rendering at 1024x512, 50
10 fps at 1024x512 using 3 passes [{7,32x32}{14,128x64}{9,512x256}],50
10 fps at 1024x512 using 2 passes [{24,128x64}{29,256x128}],50
10 fps at 1024x512 using 1 passes [{26,128x512}],50
12 fps at 1024x512 using 3 passes [{19,32x32}{32,128x64}{9,512x256}],50
16 fps at 1024x512 using 4 passes [{29,32x32}{17,64x128}{31,128x256}{14,256x512}],50
44.666666666666664 fps at 1024x512 using 3 passes [{21,32x32}{17,64x64}{24,512x512}],50
That final row from the 5470 is a bit ... alarming. Screensaver bypass perhaps? I will have to make it so you can specify explicitly a mode to test, so we can verify... I have a laptop on order with an HD 7670M with 480 shaders at 600MHz; we'll see if that catapults me up to the 120 Hz barrier :)
Now, this ray marching offers interesting possibilities. It might make interactivity work sluggishly on low-end cards whereas it wouldn't work at all without it. And it might free up gazillions of cycles on the highest end cards which can be put to other purposes.
And for those still stuck at the 120 Hz sound barrier, you could now render multi-megapixel frames and have the browser's compositor do a bilinear downscale (http://code.google.com/p/chromium/issues/detail?id=131581#c13), or add yet another shader stage that does a bicubic or Lanczos or such.

Title: Re: rod marching
Post by: cKleinhuis on July 31, 2012, 11:07:29 PM
boring ;) 120fps at every test, there must be some kind of capping
Code:
PERFORMANCE:

Title: Re: rod marching
Post by: David Makin on August 01, 2012, 10:52:23 PM
Quote: Oh I'd so love to see that demo! Is there a link anywhere?
I'm pretty sure it was Subblue and it was a WIP version of the one he posted the video of in the other thread - I think he only temporarily made it public basically for beta-testing and feedback. Edit: make that alpha-testing ;) Here: http://www.fractalforums.com/programming/webgl-for-hosting-glsl/msg50176/#msg50176

Title: Re: rod marching
Post by: willvarfar on August 02, 2012, 10:42:16 AM
I have had reports of an NVIDIA Quadro FX 1800M going from 51fps to 60fps. Again, I speculate that 60 Hz is an artificial browser limit. Of course the gain in GLSL is not as big as the magnitude we saw on the CPU, because it's maximum pessimism in each warp. We can escape the 120 Hz limit in native apps, of course. A useful gain at the low end though, and it might inspire new approaches to shadows too? So what usefully is next?
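
For readers who want to experiment with the idea on the CPU, here is a rough Python sketch of one way to march a "rod with perspective" (a cone) and then start the per-pixel rays from the depth the rod reached. It is not the code behind the demo or the blog post; everything in it is an assumption made for illustration: the toy scene (a unit sphere seen from z = -3), the names `march_rod`, `rod_slope` and `render`, and the stepping rule. The rule used is a conservative triangle-inequality bound: if the estimate at axis depth t is d, a cone whose radius grows as k*t is provably empty up to depth (t + d) / (1 + k), and the rod stops once d <= k*t. The DE is deliberately scaled by 0.25 to mimic the pessimistic estimates typical of fractal DEs; with an exact sphere distance the first step is so large that the coarse pass saves nothing.

```python
import math

EYE = (0.0, 0.0, -3.0)   # camera position (assumed toy scene)
SCALE = 0.25             # deliberately pessimistic DE (an assumption)
EPS, MAX_DIST, MAX_STEPS = 1e-3, 10.0, 200
WIDTH, TILE = 64, 8      # WIDTH x WIDTH image, one rod per TILE x TILE block

calls = 0                # running count of distance estimates


def de(t, d):
    """Scaled distance from EYE + t*d to a unit sphere at the origin."""
    global calls
    calls += 1
    x, y, z = (EYE[i] + t * d[i] for i in range(3))
    return SCALE * (math.sqrt(x * x + y * y + z * z) - 1.0)


def direction(px, py):
    """Unit direction through image point (px, py), given in pixel units."""
    u, v = px / WIDTH - 0.5, py / WIDTH - 0.5
    n = math.sqrt(u * u + v * v + 1.0)
    return (u / n, v / n, 1.0 / n)


def march_ray(d, t=0.0):
    """Classic one-ray marching from depth t; hit depth or math.inf."""
    for _ in range(MAX_STEPS):
        dist = de(t, d)
        if dist < EPS:
            return t
        t += dist
        if t > MAX_DIST:
            break
    return math.inf


def rod_slope(axis, corners):
    """tan of the widest angle between the rod axis and its corner rays."""
    cos_min = min(sum(a * c for a, c in zip(axis, corner))
                  for corner in corners)
    return math.sqrt(max(0.0, 1.0 - cos_min * cos_min)) / cos_min


def march_rod(axis, k):
    """Advance a cone of radius k*t while it is provably empty."""
    t = 0.0
    for _ in range(MAX_STEPS):
        dist = de(t, axis)
        if dist <= k * t or t > MAX_DIST:
            break
        t = (t + dist) / (1.0 + k)   # triangle-inequality bound
    return t


def render(use_rods):
    """Return (depths, DE call count, per-pixel start depths)."""
    global calls
    calls = 0
    depth = [math.inf] * (WIDTH * WIDTH)
    start = [0.0] * (WIDTH * WIDTH)
    for ty in range(0, WIDTH, TILE):
        for tx in range(0, WIDTH, TILE):
            t0 = 0.0
            if use_rods:
                axis = direction(tx + TILE / 2, ty + TILE / 2)
                corners = [direction(tx + i, ty + j)
                           for i in (0, TILE) for j in (0, TILE)]
                t0 = march_rod(axis, rod_slope(axis, corners))
            for y in range(ty, ty + TILE):
                for x in range(tx, tx + TILE):
                    i = y * WIDTH + x
                    start[i] = t0
                    depth[i] = march_ray(direction(x + 0.5, y + 0.5), t0)
    return depth, calls, start


plain_depth, plain_calls, _ = render(False)
rod_depth, rod_calls, rod_start = render(True)
print(f"rays only: {plain_calls} DEs, rods then rays: {rod_calls} DEs")
```

The ratio this prints for a toy sphere says nothing about the mandelbox figures quoted earlier in the thread. The sketch uses a single coarse level; the multi-pass version described above would presumably repeat the same step at successively finer tile sizes, feeding each level's depth into the next.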