willvarfar « on: July 06, 2012, 07:10:35 PM »
I don't know if this is useful or already used, but I've enjoyed inventing it this morning.
Sockratease « Reply #3 on: July 06, 2012, 09:39:22 PM »
Interesting. Rod Marching, Cone Marching... I suppose Retina Marching is the next logical step. Welcome!
Life is complex - It has real and imaginary components. The All New Fractal Forums is now in Public Beta Testing! Visit FractalForums.org and check it out!
willvarfar « Reply #4 on: July 06, 2012, 11:28:24 PM »
Aha, it keeps getting discovered, it seems. Very nice link. Indeed, I'd have called it cone marching too as soon as I allowed for perspective. The neatest thing so far is that, in my very unrepresentative scene, this "cone marching" is doing two orders of magnitude fewer distance estimates (DEs)! It'll be interesting to see how it does in the middle of a mandelbulb or mandelbox. I'll see if I can get a like-for-like speed comparison with boxplorer. Part of my motivation for CPU-side distance maps is to reduce the number of iterations needed on the GPU. There's some description of the overhead of loops in fragment shaders here: http://gamedev.stackexchange.com/a/31780/4129
« Last Edit: July 07, 2012, 12:06:10 AM by willvarfar »
willvarfar « Reply #5 on: July 12, 2012, 12:45:36 PM »
Here are some stats for a Core 2 Duo laptop, using one core. It's comparing rod marching to one-ray-per-pixel marching, for a camera deep inside a mandelbox, with proper perspective. The distance field is 512x512 (small enough to upload to the GPU per frame without much impact). It's very dependent on the scene; move the camera a little and the scene can easily become twice as complex to compute. It's only computing the distance field. The rod marching is consistently making 10-12x fewer distance estimates and taking 10-12x less time! Here are some stats showing rods and rays; each pair is from the same eye position.
rods 235367 took 0.0611085s from 2.69962,1.35,0.0455439 rays 2493476 took 0.578302s from 2.69962,1.35,0.0455439
rods 229348 took 0.0587268s from 2.69095,1.35,0.220913 rays 2433876 took 0.568839s from 2.69095,1.35,0.220913
rods 227182 took 0.0626564s from 2.6709,1.35,0.395373 rays 2390534 took 0.554815s from 2.6709,1.35,0.395373
rods 216570 took 0.0571731s from 2.63954,1.35,0.568186 rays 2338642 took 0.554455s from 2.63954,1.35,0.568186
rods 216414 took 0.0571331s from 2.59824,1.35,0.73426 rays 2308965 took 0.537348s from 2.59824,1.35,0.73426
rods 197589 took 0.0553679s from 2.5465,1.35,0.897394 rays 2194695 took 0.506544s from 2.5465,1.35,0.897394
rods 181117 took 0.0482323s from 2.48629,1.35,1.05278 rays 2059999 took 0.473715s from 2.48629,1.35,1.05278
rods 177440 took 0.0493438s from 2.42262,1.35,1.19202 rays 2185117 took 0.488354s from 2.42262,1.35,1.19202
rods 210094 took 0.0548588s from 2.35336,1.35,1.32352 rays 2678732 took 0.59416s from 2.35336,1.35,1.32352
rods 253639 took 0.0660412s from 2.27455,1.35,1.4548 rays 3408649 took 0.758152s from 2.27455,1.35,1.4548
rods 367257 took 0.0962661s from 2.16978,1.35,1.60688 rays 4482750 took 1.01132s from 2.16978,1.35,1.60688
rods 539270 took 0.130638s from 2.02241,1.35,1.78882 rays 6587307 took 1.51029s from 2.02241,1.35,1.78882
rods 561068 took 0.131583s from 1.8036,1.35,2.00924 rays 6832742 took 1.58173s from 1.8036,1.35,2.00924
rods 252456 took 0.0613615s from 1.4487,1.35,2.27843 rays 3373335 took 0.746726s from 1.4487,1.35,2.27843
rods 178165 took 0.0463587s from 1.05441,1.35,2.4856 rays 2063349 took 0.466148s from 1.05441,1.35,2.4856
rods 209975 took 0.0549983s from 0.852219,1.35,2.56198 rays 2247102 took 0.515845s from 0.852219,1.35,2.56198
rods 218932 took 0.056313s from 0.7143,1.35,2.6038 rays 2322587 took 0.536559s from 0.7143,1.35,2.6038
rods 221153 took 0.0659014s from 0.561138,1.35,2.64105 rays 2340228 took 0.541684s from 0.561138,1.35,2.64105
rods 227491 took 0.0593241s from 0.397009,1.35,2.67065 rays 2392276 took 0.557683s from 0.397009,1.35,2.67065
rods 229484 took 0.0644665s from 0.231641,1.35,2.69005 rays 2430611 took 0.567076s from 0.231641,1.35,2.69005
rods 234015 took 0.0717583s from 0.0607593,1.35,2.69932 rays 2490614 took 0.576687s from 0.0607593,1.35,2.69932
rods 230136 took 0.0590744s from -0.119381,1.35,2.69736 rays 2474156 took 0.578239s from -0.119381,1.35,2.69736
I rather expect I've made fundamental bugs and inefficiencies in both implementations. I will now plug in a GPU renderer so we can see what it actually means in practice on wimpy GPUs like mine...
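To make the rods-versus-rays comparison concrete, here is a small sketch that parses one line of the stats above and computes both ratios. It assumes the first number after `rods`/`rays` is the count of distance estimates and the second is the time in seconds; the name `speedup` is made up for illustration.

```python
import re

# One stats line looks like:
# "rods <count> took <seconds>s from <eye> rays <count> took <seconds>s from <eye>"
LINE = re.compile(
    r"rods (\d+) took ([\d.]+)s from \S+ rays (\d+) took ([\d.]+)s from \S+"
)

def speedup(line):
    """Return (estimate_ratio, time_ratio) of rays over rods for one stats line."""
    m = LINE.search(line)
    if m is None:
        raise ValueError("not a rods/rays stats line")
    rod_count, rod_secs = int(m.group(1)), float(m.group(2))
    ray_count, ray_secs = int(m.group(3)), float(m.group(4))
    return ray_count / rod_count, ray_secs / rod_secs

# First pair from the post.
first = ("rods 235367 took 0.0611085s from 2.69962,1.35,0.0455439 "
         "rays 2493476 took 0.578302s from 2.69962,1.35,0.0455439")
estimate_ratio, time_ratio = speedup(first)
print(f"{estimate_ratio:.1f}x fewer estimates, {time_ratio:.1f}x less time")
```

For the first pair this comes out at roughly 10.6x fewer estimates and roughly 9.5x less time; the other pairs can be fed through the same function.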
willvarfar « Reply #6 on: July 30, 2012, 06:26:22 AM »
I've ported the general idea to the GPU and it's running in-browser here: http://williame.github.com/Mandel_1/
The scene is drawn in multiple passes, building up an intermediate distance-map texture. There is a nasty artifact where it sometimes discards whole blocks, which I have yet to track down; I am hopeful it's just a glitch and not a fatal flaw. The http://williame.github.com/Mandel_1/ page will try random combinations of passes and print the performance achieved in the right-side pane. Please leave it running a while and then reply with a copy-paste of the right pane and a description of your browser, OS and graphics card. On my very wimpy old ATI mobile card (40 shaders) I go from 5 fps to 10 fps. I hope this provides a useful speed-up for everyone!
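For readers who want to see the multi-pass idea in code, here is a minimal CPU-side sketch in Python. It is not the demo's shader code: the scene is a single toy sphere standing in for a fractal distance estimator, the grid sizes are arbitrary, and the step rule `(d - spread * t) / (1 + spread)` is a deliberately conservative choice made for this sketch so that the whole cone up to distance `t` is known to be empty. All names (`march`, `cone_prepass`, `cone_spread`) are hypothetical.

```python
import math

SPHERE_CENTER = (0.0, 0.0, 5.0)   # toy scene, stands in for a fractal DE
SPHERE_RADIUS = 1.0

def distance_estimate(p):
    """Distance from point p to the toy sphere."""
    return math.dist(p, SPHERE_CENTER) - SPHERE_RADIUS

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def cone_spread(axis, corners):
    """tan of the widest angle between the unit axis and any tile corner."""
    worst = 0.0
    for c in corners:
        along = sum(a * b for a, b in zip(axis, c))
        lateral = math.sqrt(max(0.0, sum(b * b for b in c) - along * along))
        worst = max(worst, lateral / along)
    return worst

def march(axis, spread, t, t_max=50.0, eps=1e-4, max_steps=200):
    """Advance t from the eye along axis while the whole cone stays empty.
    spread == 0 gives ordinary per-pixel sphere tracing.
    Returns (t, number_of_distance_estimates)."""
    calls = 0
    for _ in range(max_steps):
        d = distance_estimate(tuple(a * t for a in axis))
        calls += 1
        margin = d - spread * t          # room left beyond the cone's radius
        if margin <= eps or t > t_max:
            break
        t += margin / (1.0 + spread)     # conservative step (assumption)
    return t, calls

def cone_prepass(levels, half_width=0.5):
    """Run one cone pass per grid size in levels, coarse to fine.
    Each grid size must be a multiple of the previous one.
    Returns (start distances for the last grid, total distance estimates)."""
    prev, prev_n, total = None, 0, 0
    for n in levels:
        size = 2.0 * half_width / n
        grid = [[0.0] * n for _ in range(n)]
        for j in range(n):
            for i in range(n):
                x0 = -half_width + i * size
                y0 = -half_width + j * size
                axis = normalize((x0 + size / 2, y0 + size / 2, 1.0))
                corners = [(x0 + dx, y0 + dy, 1.0)
                           for dx in (0.0, size) for dy in (0.0, size)]
                # Start where the enclosing coarser tile stopped.
                start = prev[j * prev_n // n][i * prev_n // n] if prev else 0.0
                grid[j][i], calls = march(axis, cone_spread(axis, corners), start)
                total += calls
        prev, prev_n = grid, n
    return prev, total

starts, estimates = cone_prepass([8, 32])
print(f"{estimates} distance estimates for the 8x8 and 32x32 cone passes")
```

A final per-pixel pass would then call `march(ray, 0.0, starts[j][i])` for each pixel, which is plain sphere tracing that skips the empty space the cones already covered. With a real fractal distance estimate the guarantee only holds as far as the estimate is a true lower bound on the distance.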
A Noniem « Reply #7 on: July 30, 2012, 03:37:33 PM »
Quote from: willvarfar
"I've ported the general idea to GPU and its running in-browser here: http://williame.github.com/Mandel_1/ The scene is drawn in multiple passes building up an intermediate distance-map texture. There is a nasty artifact introduced where it sometimes discards whole blocks, which I have yet to track down; I am hopeful its just a glitch and not a fatal flaw. The http://williame.github.com/Mandel_1/ page will try random combinations of passes and print the performance achieved on the right-side pane. Please leave it running a while and then reply with a copy-paste of the right pane and a description of your browser, OS and graphics card. On my very wimpy old ATI mobile card (40 shaders) I go from 5 fps to 10 fps. I hope this provides a useful speed-up for everyone!"
Doing it in multiple passes is a very nice way to port this idea to GPUs, and the demo is amazingly smooth; it runs on my AMD 7770 at over 90 fps at 1440x1200.
PERFORMANCE:
46.333333333333336 fps at 960x899 using [{13,32x32}{27,128x128}{14,256x256}],50
119.66666666666667 fps at 960x899 using [{8,32x32}{15,64x64}{7,512x512}],50
63 fps at 1440x1200 using [],50
74.33333333333333 fps at 1440x1200 using [{22,32x32}{10,128x128}],50
94 fps at 1440x1200 using [{32,32x32}{10,256x128}],50
96.33333333333333 fps at 1440x1200 using [{33,64x64}{15,128x128}{15,256x256}{26,512x512}],50
98 fps at 1440x1200 using [{33,32x32}{18,128x128}{27,256x256}],50
98.66666666666667 fps at 1440x1200 using [{20,32x32}{14,64x64}{28,512x512}],50
64-bit Windows 7 Professional with a Sapphire 7770 OC at 1.15 GHz (15% overclock over a normal 7770)
« Last Edit: July 30, 2012, 05:45:51 PM by A Noniem »
David Makin « Reply #8 on: July 31, 2012, 05:01:13 AM »
Basically the same in Safari or Firefox
PERFORMANCE:
120 fps at 512x512 single-pass rendering at 512x512, 50
120 fps at 512x512 using 3 passes [{15,32x32}{13,256x128}{16,512x256}],50
120 fps at 512x512 using 3 passes [{9,32x32}{9,64x64}{25,128x128}],50
120 fps at 512x512 using 5 passes [{12,32x32}{33,64x64}{11,128x128}{18,256x256}{25,512x512}],50
120 fps at 512x512 single-pass rendering at 512x512, 50
120 fps at 512x512 single-pass rendering at 512x512, 50
120 fps at 512x512 using 3 passes [{20,32x32}{33,256x128}{12,512x512}],50
120 fps at 512x512 single-pass rendering at 512x512, 50
120 fps at 512x512 using 2 passes [{27,32x32}{5,512x256}],50
120 fps at 512x512 using 3 passes [{31,64x32}{30,256x128}{21,512x256}],50
120 fps at 512x512 single-pass rendering at 512x512, 50
120 fps at 512x512 using 4 passes [{20,32x32}{23,64x64}{16,128x256}{34,256x512}],50
120 fps at 512x512 using 3 passes [{10,32x32}{19,64x64}{18,512x256}],50
120 fps at 512x512 using 2 passes [{25,128x256}{12,512x512}],50
120 fps at 512x512 single-pass rendering at 512x512, 50
120 fps at 512x512 using 1 passes [{12,64x64}],50
120 fps at 512x512 using 3 passes [{21,32x32}{26,256x64}{24,512x256}],50
120 fps at 512x512 using 1 passes [{31,256x256}],50
120 fps at 512x512 using 1 passes [{26,32x32}],50
120 fps at 512x512 using 1 passes [{26,32x32}],50
120 fps at 512x512 using 1 passes [{14,128x64}],50
120 fps at 512x512 using 4 passes [{27,32x64}{12,128x128}{20,256x256}{26,512x512}],50
120 fps at 512x512 using 3 passes [{7,32x32}{23,128x256}{20,256x512}],50
120 fps at 512x512 using 4 passes [{16,64x64}{7,128x128}{6,256x256}{29,512x512}],50
120 fps at 512x512 using 1 passes [{31,64x64}],50
Snow Leopard 10.6.18, dual 6-core CPU with Radeon HD 5870, 8GB 1333MHz DDR3 RAM
Display size was 876*876
Note that Subblue's original test Mandelbulb on WebGL was running at 200+fps on this system with full lighting etc. At least I think it was Subblue, maybe it was Syntopia, I forget !!
« Last Edit: July 31, 2012, 05:09:42 AM by David Makin »
willvarfar « Reply #9 on: July 31, 2012, 09:56:44 AM »
Quote from: A Noniem
"Doing it in multiple passes is a very nice way to port this idea to GPU's and the demo is amazingly smooth, it runs on my AMD 7770 with over 90 FPS @1440x1200.
PERFORMANCE:
46.333333333333336 fps at 960x899 using [{13,32x32}{27,128x128}{14,256x256}],50
119.66666666666667 fps at 960x899 using [{8,32x32}{15,64x64}{7,512x512}],50
63 fps at 1440x1200 using [],50
74.33333333333333 fps at 1440x1200 using [{22,32x32}{10,128x128}],50
94 fps at 1440x1200 using [{32,32x32}{10,256x128}],50
96.33333333333333 fps at 1440x1200 using [{33,64x64}{15,128x128}{15,256x256}{26,512x512}],50
98 fps at 1440x1200 using [{33,32x32}{18,128x128}{27,256x256}],50
98.66666666666667 fps at 1440x1200 using [{20,32x32}{14,64x64}{28,512x512}],50
64bit Windows 7 Professional with a Sapphire 7770 OC 1.15Ghz (15% overclock over a normal 7770)"
Thank you A Noniem! To interpret these numbers: your card could do 63 fps when doing classic ray marching, and 99 fps when using three intermediate textures.
Quote from: David Makin
"Basically the same in Safari or Firefox
PERFORMANCE:
120 fps at 512x512 single-pass rendering at 512x512, 50
120 fps at 512x512 using 3 passes [{15,32x32}{13,256x128}{16,512x256}],50
...
Snow Leopard 10.6.18 dual 6-core CPU with Radeon HD 5870 8GB 1333MHz DDR3 RAM
Display size was 876*876"
Thanks David too! I more recently tweaked the script a bit, so you're running a newer version than A Noniem, but only two things have changed: the performance messages are clearer, and I clamp the final output size to a power-of-two texture. I was seeing artifacts when the final destination was not a power of two. I expect this could be adjusted for in the projection matrix, but it was easier to use a power of two just to see that the artifacts disappeared. Would anyone like to point out all the flaws in my code, please? (What is interesting is that WebGL canvases are always composited, so you get a final scaling-to-screen step for 'free'. Nearly.)
Your numbers, however, are surprising: 120 fps regardless of anything, even for classic ray marching. This suggests some kind of vsync-like limit?
Quote from: David Makin
"Note that Subblue's original test Mandelbulb on WebGL was running at 200+fps on this system with full lighting etc. At least I think it was Subblue, maybe it was Syntopia, I forget !!"
Oh I'd so love to see that demo! Is there a link anywhere?
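Since the replies paste the PERFORMANCE pane as raw text, a small parser makes the single-pass versus multi-pass comparison easier to read off. The sketch below handles both message styles seen in this thread. The thread does not say what the first number inside each `{...}` means or what the trailing `50` is, so they are kept as raw values or ignored rather than interpreted; `parse_performance` is a made-up name.

```python
import re

# Matches both "using [..],50" (older page) and
# "using N passes [..],50" / "single-pass rendering at WxH, 50" (newer page).
ENTRY = re.compile(
    r"([\d.]+) fps at (\d+)x(\d+) "
    r"(?:single-pass rendering at \d+x\d+|using (?:\d+ passes )?\[(.*?)\]),\s*\d+"
)
PASS = re.compile(r"\{(\d+),(\d+)x(\d+)\}")

def parse_performance(text):
    """Return one dict per entry: fps, output size, and the raw pass tuples."""
    runs = []
    for m in ENTRY.finditer(text):
        passes = [tuple(int(v) for v in p) for p in PASS.findall(m.group(4) or "")]
        runs.append({"fps": float(m.group(1)),
                     "size": (int(m.group(2)), int(m.group(3))),
                     "passes": passes})
    return runs

# Three of A Noniem's 1440x1200 entries, copied from the thread.
sample = ("63 fps at 1440x1200 using [],50 "
          "74.33333333333333 fps at 1440x1200 using [{22,32x32}{10,128x128}],50 "
          "98.66666666666667 fps at 1440x1200 using [{20,32x32}{14,64x64}{28,512x512}],50")
runs = parse_performance(sample)
baseline = max(r["fps"] for r in runs if not r["passes"])
best = max(r["fps"] for r in runs if r["passes"])
print(f"single pass {baseline:.1f} fps, best multi-pass {best:.1f} fps, "
      f"ratio {best / baseline:.2f}")
```

An entry with an empty pass list is treated here as the classic single-pass ray march, which matches how the numbers are interpreted in the reply above.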
A Noniem « Reply #10 on: July 31, 2012, 12:21:56 PM »
Using a power of 2 as the size indeed reduces a lot of artifacts, but when you run it at a relatively low resolution you run into v-sync problems, as you noticed. You could make a demo which runs at 2048x2048 or 4096x4096, since this demo is peanuts for our GPUs. I did expect a bigger speedup, however. 63 fps without rod marching vs 99 fps with rod marching is a nice speedup, but there is also a much bigger chance of artifacts with rod marching.
PERFORMANCE:
98 fps at 1024x1024 single-pass rendering at 1024x1024, 50
112.33333333333333 fps at 1024x1024 using 2 passes [{14,64x32}{5,128x64}],50
119.66666666666667 fps at 1024x1024 using 2 passes [{6,64x32}{30,128x64}],50
120 fps at 1024x1024 using 3 passes [{19,32x32}{7,256x128}{10,512x256}],50
120 fps at 1024x1024 using 2 passes [{31,256x128}{17,512x256}],50
119.66666666666667 fps at 1024x1024 using 3 passes [{26,64x128}{33,256x256}{29,512x512}],50
119.66666666666667 fps at 1024x1024 using 4 passes [{6,64x32}{32,128x64}{9,256x128}{10,512x256}],50
120 fps at 1024x1024 using 3 passes [{23,128x128}{7,256x256}{8,512x512}],50
119.66666666666667 fps at 1024x1024 using 1 passes [{31,512x64}],50
120 fps at 1024x1024 using 3 passes [{6,32x32}{12,64x64}{10,128x128}],50
120 fps at 1024x1024 using 3 passes [{11,64x64}{32,128x128}{27,256x256}],50
119.66666666666667 fps at 1024x1024 using 3 passes [{14,32x32}{16,64x64}{11,256x256}],50
120 fps at 1024x1024 using 3 passes [{25,32x32}{21,128x64}{20,256x256}],50
119.33333333333333 fps at 1024x1024 using 2 passes [{14,32x32}{29,64x64}],50
120 fps at 1024x1024 using 4 passes [{32,64x32}{6,128x64}{21,256x128}{7,512x256}],50
119.66666666666667 fps at 1024x1024 using 3 passes [{17,32x32}{17,64x64}{27,128x128}],50
119.66666666666667 fps at 1024x1024 using 2 passes [{27,128x64}{20,256x256}],50
120 fps at 1024x1024 using 3 passes [{33,32x32}{24,64x64}{14,256x256}],50
119.66666666666667 fps at 1024x1024 using 3 passes [{10,64x64}{19,128x128}{20,256x256}],50
119 fps at 1024x1024 using 4 passes [{30,32x32}{9,64x64}{21,128x128}{32,512x512}],50
119.66666666666667 fps at 1024x1024 using 3 passes [{18,32x32}{30,64x64}{7,256x256}],50
119.66666666666667 fps at 1024x1024 using 2 passes [{34,64x64}{26,256x512}],50
119.66666666666667 fps at 1024x1024 using 4 passes [{31,32x32}{20,128x64}{31,256x128}{13,512x512}],50
119 fps at 1024x1024 using 2 passes [{12,64x32}{23,256x128}],50
119.66666666666667 fps at 1024x1024 using 2 passes [{26,64x32}{34,128x128}],50
120 fps at 1024x1024 using 5 passes [{27,32x32}{31,64x64}{32,128x128}{8,256x256}{5,512x512}],50
120 fps at 1024x1024 using 3 passes [{20,32x32}{25,64x64}{19,256x256}],50
120 fps at 1024x1024 using 4 passes [{11,32x32}{28,64x64}{14,128x128}{27,256x256}],50
120 fps at 1024x1024 using 3 passes [{20,32x64}{11,128x128}{9,256x256}],50
111 fps at 1024x1024 single-pass rendering at 1024x1024, 50
117 fps at 1024x1024 using 2 passes [{9,128x64}{19,256x128}],50
« Last Edit: July 31, 2012, 12:26:00 PM by A Noniem »
willvarfar « Reply #11 on: July 31, 2012, 10:37:24 PM »
Quote from: A Noniem
"there is also a much bigger chance of artifacts with rod marching."
What kind of artifacts? Is there some imprecision inherent in rod marching?
You folks with the good graphics cards seem to sit at 120 fps. My understanding is that this is the browser capping it. All the design docs I've read say they will cap at 60 Hz, but you're getting 120, so I guess the implementation differs, or perhaps with double buffering they let you have one frame ahead, or ... I'm speculating. But I think your cards would go much faster.
The big deal is on the wimpy cards. http://www.notebookcheck.net/Comparison-of-Laptop-Graphics-Cards.130.0.html is a great page.
My ATI HD 4200 with 40 shaders at 500 MHz, Firefox, Linux:
PERFORMANCE:
4 fps at 1024x512 single-pass rendering at 1024x512, 50
...
8 fps at 1024x512 using 3 passes [{23,128x64}{29,256x128}{8,512x256}],50
8 fps at 1024x512 using 2 passes [{11,32x32}{27,256x128}],50
8 fps at 1024x512 using 2 passes [{16,32x32}{20,256x256}],50
8 fps at 1024x512 using 3 passes [{6,32x32}{26,256x64}{16,512x128}],50
Roy's stats (I don't know what card he has, but he's on Windows):
PERFORMANCE:
17.333333333333332 fps at 1440x965 using [],50
20.666666666666668 fps at 1440x965 using [{7,128x64}{13,256x256}],50
21 fps at 1440x965 using [{17,64x64}],50
24 fps at 1440x965 using [{21,64x32}{13,256x128}],50
A friend's ATI HD 5470 with 80 shaders at 750 MHz, Windows, Chrome and Firefox:
PERFORMANCE:
6.666666666666667 fps at 1024x512 single-pass rendering at 1024x512, 50
10 fps at 1024x512 using 3 passes [{7,32x32}{14,128x64}{9,512x256}],50
10 fps at 1024x512 using 2 passes [{24,128x64}{29,256x128}],50
10 fps at 1024x512 using 1 passes [{26,128x512}],50
12 fps at 1024x512 using 3 passes [{19,32x32}{32,128x64}{9,512x256}],50
16 fps at 1024x512 using 4 passes [{29,32x32}{17,64x128}{31,128x256}{14,256x512}],50
44.666666666666664 fps at 1024x512 using 3 passes [{21,32x32}{17,64x64}{24,512x512}],50
That final row from the 5470 is a bit ... alarming. Screensaver bypass perhaps?
I will have to make it so you can explicitly specify a mode to test, so we can verify... I have a laptop on order with an HD 7670M with 480 shaders at 600 MHz; we'll see if that catapults me up to the 120 Hz barrier.
Now, this ray marching offers interesting possibilities. It might make interactivity work, if sluggishly, on low-end cards where it wouldn't work at all without it. And it might free up gazillions of cycles on the highest-end cards, which can be put to other purposes. And for those still stuck at the 120 Hz sound barrier, you could now render multi-megapixel frames and have the browser's compositor do a bilinear downscale, or add yet another shader stage that does a bicubic or Lanczos filter or such.
« Last Edit: July 31, 2012, 10:39:12 PM by willvarfar »
cKleinhuis « Reply #12 on: July 31, 2012, 11:07:29 PM »
Boring: 120 fps in every test, there must be some kind of capping.
PERFORMANCE:
120 fps at 1024x512 single-pass rendering at 1024x512, 50
120 fps at 1024x512 using 1 passes [{17,32x32}],50
120 fps at 1024x512 using 2 passes [{11,128x256}{12,256x512}],50
120 fps at 1024x512 using 1 passes [{5,32x32}],50
120 fps at 1024x512 using 3 passes [{32,32x32}{18,64x128}{13,256x512}],50
120 fps at 1024x512 using 2 passes [{8,32x64}{10,64x128}],50
120 fps at 1024x512 using 3 passes [{18,128x64}{17,256x128}{19,512x256}],50
120 fps at 1024x512 using 2 passes [{23,64x128}{34,256x256}],50
120 fps at 1024x512 using 1 passes [{8,128x64}],50
120 fps at 1024x512 using 1 passes [{22,256x512}],50
120 fps at 1024x512 using 1 passes [{13,256x128}],50
120 fps at 1024x512 using 1 passes [{6,128x256}],50
120 fps at 1024x512 using 2 passes [{16,64x64}{17,256x128}],50
120 fps at 1024x512 using 4 passes [{6,32x32}{7,64x64}{33,128x128}{26,512x512}],50
120 fps at 1024x512 using 2 passes [{27,64x64}{10,128x128}],50
120 fps at 1024x512 using 2 passes [{24,64x256}{17,128x512}],50
120 fps at 1024x512 using 1 passes [{16,128x64}],50
---
divide and conquer - iterate and rule - chaos is No random!
David Makin « Reply #13 on: August 01, 2012, 10:52:23 PM »
Quote from: willvarfar
"Oh I'd so love to see that demo! Is there a link anywhere?"
I'm pretty sure it was Subblue, and it was a WIP version of the one he posted the video of in the other thread. I think he only made it public temporarily, basically for beta-testing and feedback. Edit: make that alpha-testing. Here: http://www.fractalforums.com/programming/webgl-for-hosting-glsl/msg50176/#msg50176
« Last Edit: August 01, 2012, 11:25:42 PM by David Makin »
willvarfar « Reply #14 on: August 02, 2012, 10:42:16 AM »
I have had reports of an NVIDIA Quadro FX 1800M going from 51 fps to 60 fps. Again, I speculate that 60 Hz is an artificial browser limit.
Of course the gain in GLSL is not as big as the order of magnitude we saw on the CPU, because it's maximum pessimism in each warp (each warp takes as long as its slowest pixel).
We can escape the 120 Hz limit in native apps, of course.
A useful gain at the low end though, and it might inspire new approaches to shadows too?
So what would be useful next?
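One way to read the "maximum pessimism in each warp" remark is with a toy lock-step model: pixels are grouped into warps, and every pixel in a warp pays for as many marching steps as the slowest pixel in that group. The sketch below is only that simplified model, not a measurement of any real GPU. The warp size of 32 is NVIDIA's figure and other hardware groups pixels differently; `lockstep_cost` is a hypothetical name.

```python
def lockstep_cost(iterations, warp_size=32):
    """Return (useful, paid) step counts for pixels grouped into warps.
    iterations[i] is how many marching steps pixel i needs on its own."""
    useful = sum(iterations)
    paid = 0
    for start in range(0, len(iterations), warp_size):
        warp = iterations[start:start + warp_size]
        paid += max(warp) * len(warp)   # every lane waits for the slowest one
    return useful, paid

# 31 pixels that need 5 steps each, plus one that needs 50.
useful, paid = lockstep_cost([5] * 31 + [50])
print(f"useful {useful}, paid {paid}, efficiency {useful / paid:.0%}")
```

In this model a pre-pass that shortens most rays helps a warp only as much as it shortens that warp's slowest ray, which would be consistent with the GPU gain being smaller than the CPU one.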