David Makin
« on: November 14, 2010, 07:01:31 PM »
Hi all,
Am fairly new to coding for shaders and am just starting a general fractal program using shader 2 (OpenGL ES2) to go on iPhone/iPad etc. and possibly WebGL too. The emphasis is going to be on speed rather than deep zooming, so I'm going to stick to float rather than attempting to extend to double.
The best way of enhancing user experience in any graphics software is to keep things as interactive as possible, and to this end I want to implement a progressive resolution-increasing algorithm so users get the full view as quickly as possible, i.e. the way UF's progressive rendering works (similar to Xaos, for those unfamiliar with UF). To do this you render at, say, 1/16 resolution, then fill in the other 3 parts of each pixel (top-right and bottom two) to reach 1/4 resolution, then fill in the other 3 parts of each of those pixels to reach full resolution. Obviously for particularly slow renders one could start at 1/32 or 1/64 etc.
My question is: what is the best way to do this using shader 2 fragment shaders? I can quickly think of 2 possible alternatives:
1. Render all the individual boxes as separate sections of a large texture buffer (in a given time), then have a separate shader that combines the boxes to the resolution achieved in that time.
2. Render the first box to one texture buffer, then use that as a source so these pixels are just fetched on the next pass rather than re-rendered, and only the other 3/4 are calculated.
Of course I'm assuming here that I'm correct that there is no method whereby the destination for shaders can be set to skip pixels in some way? Though maybe this could be done by fudging the pixel/colour format information of the destination texture?
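A minimal sketch of the pass schedule described above, written in Python purely to illustrate the indexing (it is not shader code). It assumes "1/16 resolution" means one computed pixel per 4x4 block; the function name progressive_passes and its parameters are hypothetical. Each pass lists only the pixels that are new at that level, so nothing is computed twice.

```python
def progressive_passes(width, height, coarsest_step=4):
    """Yield (step, pixels) for each pass, coarsest first.

    A pass with spacing `step` covers the grid of pixels whose x and y are
    multiples of `step`, minus those already computed on the coarser grid.
    """
    step = coarsest_step
    first = True
    while step >= 1:
        pixels = []
        for y in range(0, height, step):
            for x in range(0, width, step):
                # After the first pass, skip pixels lying on the previous
                # (twice as coarse) grid: they have already been rendered.
                if first or x % (step * 2) != 0 or y % (step * 2) != 0:
                    pixels.append((x, y))
        yield step, pixels
        first = False
        step //= 2


if __name__ == "__main__":
    for step, pixels in progressive_passes(8, 8, 4):
        print("step", step, "new pixels", len(pixels))
```

For an 8x8 image the three passes contribute 1/16, 3/16 and 12/16 of the pixels respectively, which matches the "fill in the other 3 parts" description.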
marius
Fractal Lover
Posts: 206
« Reply #1 on: November 14, 2010, 08:01:35 PM »
There are probably better ways, but have a look at the tweak I did to boxplorer's vertex.glsl to do cross-eyed or over-under 3D. In the vertex shader you can step over the 'grid', skipping rays. A hardware blit can scale it up pretty quickly, I imagine. It's not clear how you'd go from 1/4 res to full res without recomputing the 1/4 resolution rays, though. A random scatter within the 4x4 or 8x8 block would be nice; then the image will settle to full res once movement stops. That's what mandelflyer appears to do.
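One way the random-scatter idea could be sketched, in Python for illustration only. This is a guess at the scheme, not mandelflyer's or boxplorer's actual code; scatter_order and pixels_for_frame are hypothetical names. Each frame renders one offset inside every block, and the offsets are visited in a shuffled order, so after block*block still frames every pixel has been computed exactly once.

```python
import random


def scatter_order(block=4, seed=0):
    """Return all (dx, dy) offsets inside a block, in a shuffled order."""
    offsets = [(dx, dy) for dy in range(block) for dx in range(block)]
    random.Random(seed).shuffle(offsets)  # fixed seed keeps the order repeatable
    return offsets


def pixels_for_frame(width, height, frame, order, block=4):
    """Pixels to compute on a given frame: one per block, at that frame's offset."""
    dx, dy = order[frame % len(order)]
    return [(x, y)
            for y in range(dy, height, block)
            for x in range(dx, width, block)]


if __name__ == "__main__":
    order = scatter_order(4)
    print(pixels_for_frame(8, 8, 0, order))
```

When the view changes, the frame counter would simply restart at 0, so only one sample per block is needed for the first image.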
David Makin
« Reply #2 on: November 14, 2010, 09:26:50 PM »
Just found and read the info about using stencils - I guess that's the way to skip pixels.
However, after further consideration of the options I think I'm going with the method where the initial (fractal) rendering is done using small box areas (4 @ 1/32, 4 @ 1/16 etc.) written to areas of a single texture. When time has run out they are combined from that source to a destination texture by a separate shader program, which combines all the areas from 1/32 res up to the current maximum achieved res, and this destination is then displayed. On the next loop, if there are no changes, the refinement continues (doing 4 @ 1/8, 4 @ 1/4 etc.) until time runs out again, and the other shader is used to combine to the destination texture for display again (of course if there are changes then we simply restart at 1/32).
I think that method is probably less computationally expensive even than using stencils, because the fractal rendering is always a "complete" area.
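A rough sketch of the box-and-combine idea, in Python to show the indexing only. It is one possible reading of the scheme, not the actual implementation: it assumes the image is split into S*S interleaved sub-images ("boxes"), each rendered as a complete small area, and that the image size is a multiple of S. The names render_subimage, offsets_for_step and combine are hypothetical, and the dictionary stands in for regions of the single source texture.

```python
def render_subimage(sample, width, height, S, ox, oy):
    """Render the box holding pixels (ox + i*S, oy + j*S) as one complete area."""
    return [[sample(ox + i * S, oy + j * S) for i in range(width // S)]
            for j in range(height // S)]


def offsets_for_step(S, step):
    """Box offsets that must be finished before grid spacing `step` can be shown."""
    return [(ox, oy) for oy in range(0, S, step) for ox in range(0, S, step)]


def combine(atlas, width, height, S, step):
    """Build the display image from the finished boxes at grid spacing `step`."""
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            gx, gy = x - x % step, y - y % step  # computed pixel covering (x, y)
            box = atlas[(gx % S, gy % S)]        # the box that pixel was rendered into
            row.append(box[gy // S][gx // S])
        out.append(row)
    return out


if __name__ == "__main__":
    sample = lambda x, y: x + 100 * y            # stand-in for the fractal shader
    atlas = {(0, 0): render_subimage(sample, 8, 8, 4, 0, 0)}
    print(combine(atlas, 8, 8, 4, 4)[5][6])      # value taken from pixel (4, 4)
```

In a shader version the combine step would be a texture fetch with the same index arithmetic, and the expensive fractal pass never has to skip pixels.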
David Makin
« Reply #3 on: November 15, 2010, 08:29:43 PM »
Just thought I'd mention that shader 2 is noticeably faster on the shader 2 enabled iPhone/iTouch/iPad devices than it is on an older Mac mini (using Intel GMA 950).
cbuchner1
« Reply #4 on: November 16, 2010, 01:39:22 AM »
Don't interpret the lack of responses as lack of interest. It's highly intriguing what you are doing.
Isn't pixel shader 2.0 rather limited in terms of instruction count and branching? Or does "Shader 2" represent the particular GLSL dialect of OpenGL ES 2.0?
Last time I programmed the GMA 950 it was quite a pain (that involved OpenGL ARB_fragment_program and ARB_vertex_program - a dead end in shader evolution). 96 instructions per fragment program at most, of which at most 64 use the ALU. All in some obscure low-level assembly shader language. No branching permitted and a limited number of registers (16) to use. Still, I was able to calculate some physics (radio propagation and cellular radio coverage) with it at interactive frame rates.
I am quite amazed how much graphics power handheld devices have these days. I recently bought a BeagleBoard-xM ARM development board and it has PowerVR graphics that can more than compete with the graphics cards I had in my desktop PC 10 years ago, at a fraction of the cost and power consumption.
« Last Edit: November 16, 2010, 01:51:47 AM by cbuchner1 »
cKleinhuis
« Reply #5 on: November 16, 2010, 02:24:19 AM »
You can read the article I wrote for the "ShaderX3" book series; in it I describe how to overcome the limits on calculating large formulas with shader 2.0 hardware by using buffers for in-between values. In fact I think shader 2 programming is old school, but I see that current mobile hardware has a need for it...
I see you are using OpenGL ES2, which is wonderful, because it is a high-level language. For colouring your renderings you can use a simple predefined 1-dimensional texture with an arbitrary gradient.
Right now I am out of programming graphics hardware (a real pity), but I find it interesting that mobile devices feature GPU programming...
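To make the 1-dimensional gradient texture idea concrete, here is a small Python sketch of building such a lookup table and sampling it with wrap-around. It is illustrative only; make_gradient and lookup are hypothetical names, the stops are assumed to be sorted and to span 0.0 to 1.0, and a real shader would leave the fetch to a texture lookup.

```python
import math


def make_gradient(stops, size=256):
    """Build a 1-D colour table by linear interpolation between gradient stops.

    stops: list of (position, (r, g, b)), sorted, first at 0.0 and last at 1.0.
    """
    table = []
    for k in range(size):
        t = k / (size - 1)
        for (p0, c0), (p1, c1) in zip(stops, stops[1:]):
            if p0 <= t <= p1:
                f = 0.0 if p1 == p0 else (t - p0) / (p1 - p0)
                table.append(tuple(a + f * (b - a) for a, b in zip(c0, c1)))
                break
    return table


def lookup(table, value):
    """Wrap value into [0, 1), i.e. value - floor(value), then fetch the nearest entry."""
    t = value - math.floor(value)
    return table[min(int(t * len(table)), len(table) - 1)]


if __name__ == "__main__":
    palette = make_gradient([(0.0, (0, 0, 0)), (0.5, (255, 128, 0)), (1.0, (255, 255, 255))])
    print(lookup(palette, 0.5))
```

The colouring value from the fractal iteration then only has to be scaled and offset before the lookup, so changing the look of the image is just a matter of swapping the table.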
---
divide and conquer - iterate and rule - chaos is No random!
David Makin
« Reply #6 on: November 16, 2010, 04:28:13 AM »
Yes, by "shader 2" I meant OpenGL ES2. Here's my main fragment shader. It is recompiled whenever the user changes a parameter (new formula, colouring etc.): the matching "#define"s are inserted where the bare #define line is before compilation.
Initially I tried it with all the # conditionals as plain runtime conditionals based on uniforms, so that no recompilation was required, but although that worked it was very slow. It appeared to me that the runtime code was too large and iOS4 ran it in emulation mode on the CPU instead. lyc seemed to think it was just down to the number of branches rather than code size, but to me that didn't make sense: the branches concerned would not vary the instruction path from one pixel to another, and the speed change was very abrupt as I reduced the size of the code by removing some options.

varying highp vec2 pixel;
uniform sampler2D palettes;
uniform highp vec2 pos;
uniform highp vec2 trapcentre;
uniform highp float bailout;
uniform highp float llb;
uniform highp float smallbail;
uniform highp float cscale;
uniform lowp float pal;
uniform lowp float offset;
uniform int maxiter;

// Placeholder: the generated #define lines (formula, colouring, mandy etc.)
// are inserted here before compilation.
#define

void main()
{
    highp vec2 z;
    highp vec4 zold;
    highp vec2 d1;
    highp vec2 d2;
    highp vec2 z2;
    highp vec2 s;
    highp vec4 a;
    int i = 0;
    a.x = a.y = zold.x = zold.y = 0.0;
#if (mandy==1)
    { z = pos; }
#else
    { z = pixel; }
#endif
#if ((usemap==1)||((colouring<5)&&(method==0)))
    {
        a.x = bailout;
#if (usemap==1)
        { d1 = z-trapcentre; }
#endif
    }
#endif
#if (colouring<5)
    { d2.x = d2.y = 0.0; }
#elif (colouring==13)
    { d1.x = length(pixel); }
#endif
    z2 = z*z;
    a.w = bailout;

    do
    {
        // Keep the two previous iterates: (zold.x, zold.y) and (zold.z, zold.w).
        zold.w = zold.y;
        zold.z = zold.x;
        zold.y = z.y;
        zold.x = z.x;
#if (addrot>=4)
        {
            a.z = z2.x + z2.y;
            if (a.z>0.0)
            {
                a.z = 1.0/sqrt(a.z);
                z.y = 2.0*z.x*z.y*a.z;
                z.x = (z2.x - z2.y)*a.z;
                z2 = z*z;
            }
        }
#endif

#if (formula==0)
        {   // z = z^2
            z.y = 2.0*z.x*z.y;
            z.x = z2.x - z2.y;
        }
#elif (formula==1)
        {   // z = z^3
            z.y = z.y*(3.0*z2.x - z2.y);
            z.x = z.x*(z2.x - 3.0*z2.y);
        }
#elif (formula==2)
        {   // z = z^4
            z.y = 4.0*z.x*z.y*(z2.x - z2.y);
            z.x = dot(z2,z2) - 6.0*z2.x*z2.y;
        }
#elif (formula==3)
        {   // z = z^1.5 (z times its square root)
            a.z = sqrt(z2.x+z2.y);
            z2.y = sign(z.y)*sqrt(a.z - z.x);
            z2.x = sqrt(a.z + z.x);
            z.x = z.x*z2.x - z.y*z2.y;
            z.y = z.y*z2.x + zold.x*z2.y;
            z *= 0.70710678;
        }
#elif (formula==4)
        {   // z = exp(z)
            z.x = exp(z.x);
            z.y = z.x*sin(z.y);
            z.x = z.x*cos(zold.y);
        }
#endif
#if (mandy==0)
        { z += pos; }
#else
        { z += pixel; }
#endif
#if ((addrot==1)||(addrot==3)||(addrot==5)||(addrot==7))
        { z = z + vec2(zold.x,zold.y); }
#endif
#if ((addrot==2)||(addrot==3)||(addrot==6)||(addrot==7))
        { z = z + vec2(zold.z,zold.w); }
#endif
        // Step smaller than smallbail: stop and treat the point as not escaping.
        if (length(vec2(zold.x,zold.y)-z)<smallbail)
        {
            i = maxiter;
            break;
        }
        z2 = z*z;
#if (colouring<15)
        if ((a.z = z2.x + z2.y)>=bailout)
#else
        a.z = z2.x + z2.y;
        if (abs(z2.x/z.y)>=bailout)
#endif
        {
#if (usemap>0)
            {
#if (usemap==1)
                {
                    z.x = abs(d1.x) + 1.0;
                    z.y = abs(d1.y) + 1.0;
                    z = log(z);
                    z.y = -z.y;
                }
#else
                {
                    d1.y = log(a.z);
                    d1.x = 1.0 + atan(z.y,z.x)/6.2831852;
                    d1.y = 1.0 - (log(d1.y)-llb)/log(d1.y/log(dot(vec2(zold.x,zold.y),vec2(zold.x,zold.y))));
                    z = d1;
                }
#endif
                z.x += pal;
                z.y += offset;
                z *= cscale;
                z.x = z.x - floor(z.x);
                z.y = z.y - floor(z.y);
                break;
            }
#endif

#if (colouring<5)
            {
#if (colouring==0)
                { z.y = a.x; }
#elif (colouring==1)
                { z.y = log(a.y + 1.0); }
#elif (colouring==2)
                { z.y = length(d2 - trapcentre); }
#elif (colouring==3)
                { z.y = abs(d2.x-trapcentre.x); }
#else // if (colouring==4)
                {
                    d2 = d2 - trapcentre;
                    z.y = (5.0/3.1415926)*abs(atan(d2.y,d2.x));
                }
#endif
                if (z.y>threshold) { z.y = threshold; }
                z.y *= 0.15;
            }
#else
            {
                d1.x = log(a.z);
                d1.x = (log(d1.x)-llb)/log(d1.y = d1.x/log(dot(vec2(zold.x,zold.y),vec2(zold.x,zold.y))));
#if (orbitfx==1)
                {
                    s.x = z.x;
                    z.x = z.x - cos(2.0*z.y);
                    z.y = z.y - 2.0*sin(s.x);
                }
#elif (orbitfx==2)
                {
                    s.x = z.x;
                    z.x = 0.2*sqrt(a.z);
                    z.y = atan(z.y,s.x);
                    z.x = 3.1415926*(z.x-floor(z.x));
                }
#endif
#if (colouring==15)
                {
                    z.y = float(i)/float(maxiter);//sqrt(log(1.0+1.71828*(float(i) + 1.0)/100.0));
                }
#elif (colouring==5)
                { z.y = sqrt(log(1.0+1.71828*(float(i) + 1.0 - d1.x)/100.0)); }
#elif (colouring==6)
                { z.y = ((1.0-d1.x)*abs(atan(z.y,z.x)) + d1.x*abs(atan(zold.y,zold.x)))/3.1415926; }
#elif (colouring==7)
                {
                    z.x = atan(z.y,z.x);
                    z.y = atan(zold.y,zold.x);
                    if (z.x<0.0) { z.x += 6.2831852; }
                    if (z.y<0.0) { z.y += 6.2831852; }
                    z.y = (z.x + d1.x*(z.y - z.x) + 0.5)/7.2831852;
                }
#elif (colouring==8)
                {
                    a.x /= (float(i)+2.0);
                    a.y /= (float(i)+1.0);
                    z.y = 0.5*(a.x + d1.x*(a.y-a.x));
                }
#elif (colouring==9)
                {
                    a.x = abs(atan(z.y,z.x)/3.1415926);
                    z.y = abs(2.0*d1.x - 1.0);
                    if (a.x>z.y) { z.y = a.x; }
                }
#elif (colouring==10)
                {
                    d1.x -= 0.5;
                    a.x = atan(z.y,z.x)/3.1415926;
                    z.y = (1.0 - a.x*a.x)*(1.0 - 4.0*d1.x*d1.x);
                }
#elif (colouring==11)
                {
                    a.x = atan(z.y,z.x)/6.2831852;
                    if (a.x<0.0) { a.x = 1.0 + a.x; }
                    d1.y = a.x*floor(d1.y+0.5);
                    a.x = abs(1.0 - 2.0*a.x);
                    z.y = a.x + (1.0-d1.x)*(abs(1.0 - 2.0*(d1.y - floor(d1.y))) - a.x);
                }
#elif (colouring==12)
                {
                    a.x = atan(z.y,z.x)/6.2831852;
                    a.z = atan(zold.y,zold.x)/6.2831852;
                    d1.y = floor(d1.y+0.5);
                    if (a.x<0.0) { a.x = 1.0 + a.x; }
                    a.y = a.x*d1.y;
                    if (a.z<0.0) { a.z = 1.0 + a.z; }
                    a.x = abs(1.0 - 2.0*a.x);
                    d1.y = a.z*d1.y;
                    a.z = abs(1.0 - 2.0*a.z);
                    a.x = a.x + (1.0-d1.x)*(abs(1.0 - 2.0*(a.y - floor(a.y))) - a.x);
                    a.z = a.z + d1.x*(abs(1.0 - 2.0*(d1.y - floor(d1.y))) - a.z);
                    z.y = a.x + (a.z-a.x)*d1.x;
                }
#elif (colouring==13)
                {
                    a.x /= (float(i)+1.0);
                    if (i>0)
                    {
                        a.y /= float(i);
                        z.y = 2.0*(a.x + (a.y-a.x)*d1.x);
                    }
                    else { z.y = a.x; }
                }
#elif (colouring==14)
                {
                    if (i>0)
                    {
                        a.x /= (float(i)+1.0);
                        a.y /= float(i);
                        z.y = sqrt(a.x + (a.y-a.x)*d1.x);
                    }
                    else { z.y = 0.0; }
                }
#endif
            }
#endif
            z.y = cscale*(z.y + offset);
            z.x = pal;
            z.y = z.y - floor(z.y);
            break;
        }

        // Track the smallest squared magnitude seen so far.
        if (a.z<a.w) { a.w = a.z; }
#if (orbitfx>0)
        {
            s = z;
#if (orbitfx==1)
            {
                z.x = z.x - cos(2.0*z.y);
                z.y = z.y - 2.0*sin(s.x);
            }
#else // if (orbitfx==2)
            {
                z.x = 0.2*sqrt(a.z);
                z.y = atan(z.y,s.x);
                z.x = 3.1415926*(z.x-floor(z.x));
            }
#endif
        }
#endif

#if (usemap==1)
        {
            a.z = length(d2 = z-trapcentre);
            if (a.z<a.x)
            {
                a.x = a.z;
                d1 = d2;
            }
        }
#elif (usemap==0)
        {
#if (colouring<5)
            {
                d1 = z - trapcentre;
#if (shape==0)
                { a.z = length(d1); }
#elif (shape==1)
                { a.z = (5.0/3.1415926)*abs(atan(d1.y,d1.x)); }
#elif (shape==2)
                { a.z = abs(d1.x); }
#else
                {
                    a.z = abs(d1.x);
                    d1.y = abs(d1.y);
                    a.z = a.z + d1.y;
                }
#endif
                if (((method==0)&&(a.z<a.x))||(((method>0)&&(a.z>a.x)&&(a.z<threshold))))
                {
                    a.x = a.z;
                    d2 = z;
                    a.y = float(i);
                }
            }
#elif (colouring==8)
            {
                a.y = a.x;
                a.x = a.x + log(1.0 + 0.5*log(a.z));
            }
#elif (colouring==13)
            {
                a.y = a.x;
                d1.y = 0.1 + length(z - pixel);
                d2.x = 0.5*abs(d1.y - d1.x);
                d1.y = d1.y + d1.x - d2.x;
                a.x += (sqrt(a.z) - d2.x)/d1.y;
            }
#elif (colouring==14)
            {
                a.y = a.x;
                if (i>0)
                {
                    d1 = z - vec2(zold.x,zold.y);
                    d2 = vec2(zold.x,zold.y) - vec2(zold.z,zold.w);
                    a.z = d1.x*d2.x + d1.y*d2.y;
                    d1.y = d1.y*d2.x - d1.x*d2.y;
                    a.x += abs(atan(d1.y,a.z));
                }
            }
#endif
        }
#endif

#if (orbitfx>0)
        { z = s; }
#endif
    } while (++i<maxiter);

    if (i>=maxiter)
    {
        z.y = 2.0*sqrt(a.w);//float(maxiter);//
    }
    gl_FragColor = texture2D(palettes, z);
}
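The recompile-on-change step described in this post could look roughly like the following on the host side. This is a Python sketch for illustration only: the real app was written in Xcode, and build_fragment_source plus the example option values are hypothetical. The only thing taken from the post is that the generated "#define" lines replace the bare #define line before the source is handed to the shader compiler.

```python
def build_fragment_source(template, options):
    """Replace the bare `#define` placeholder line with one #define per option."""
    defines = "\n".join(
        "#define {} {}".format(name, value) for name, value in options.items()
    )
    lines = template.split("\n")
    out = [defines if line.strip() == "#define" else line for line in lines]
    return "\n".join(out)


if __name__ == "__main__":
    template = "uniform int maxiter;\n#define\nvoid main() { }"
    # Option names follow the shader above; the values are just examples.
    print(build_fragment_source(template, {"formula": 0, "mandy": 1, "colouring": 5}))
```

Because every "#if" is then resolved by the preprocessor, the compiled shader only contains the code path for the current settings, at the price of a recompile each time the user changes an option.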
cKleinhuis
« Reply #7 on: November 16, 2010, 01:14:19 PM »
Damn, those are a lot of branches. I think by using constants and recompiling, most of the branches will be removed.
cbuchner1
« Reply #8 on: November 16, 2010, 03:40:37 PM »
Yes, by "shader 2" I meant using OpenGL ES2..
How do you emulate OpenGL ES2 on the Intel GMA 950? As far as I know Intel only provides a (rather buggy) OpenGL 1.4 implementation for this chip - no ES in sight. And this chip is EOL'ed too, meaning driver bugs won't ever get fixed.
David Makin
« Reply #9 on: November 16, 2010, 09:04:29 PM »
Yes, by "shader 2" I meant using OpenGL ES2..
How do you emulate OpenGL ES2 on the Intel GMA 950? As far as I know Intel only provides a (rather buggy) OpenGL 1.4 implementation for this chip - no ES in sight. And this chip is EOL'ed too, meaning driver bugs won't ever get fixed.
If you look here: http://www.intel.com/products/chipsets/gma950/index.htm
you'll see it's quoted as having DirectX 9 shader 2 acceleration - I assume Apple have simply extended support for this into the Mac implementation of OpenGL ES2. But to answer your question directly - I don't know exactly; I just wrote the code in Xcode for the iPhone/iPad and it ran fine in the iPad/iPhone simulators on my mini.
Incidentally, if I try this using the WebKit-enhanced Safari: http://www.ibiblio.org/e-notes/webgl/makin.html
in a full-screen window on the mini I get 0 to 3 fps - on the Mac Pro at work I get 60 to 200.
David Makin
« Reply #10 on: November 16, 2010, 09:13:36 PM »
damn, those are alot branches, i think by using constants and recompiling, most of the branches will be removed
Erm - it is using #defines and recompiling. I think there is an absolute maximum of 5 runtime conditionals once the "#" conditionals have been applied; the runtime ones aren't nested more than 1 level deep and most are just if..endif without any else's.
« Last Edit: November 16, 2010, 09:17:36 PM by David Makin »