Enforcer
Guest
|
|
« on: November 24, 2009, 12:25:01 PM » |
|
It's not fast enough for 2560x1600 yet, but who knows... the Fermi chip is hopefully coming soon.
60 FPS in 1280x800, GT200b
[image: 3 iterations]
[image: 4 iterations]
"Improvement" over what I've seen on the forum:
- scalar derivative computation
Those images were produced by the following HLSL code:
#define P 8
inline void powN1(inout float3 z, float zr0, inout float dr)
{
    // float zr = sqrt( dot(z,z) );
    float zo0 = asin( z.z/zr0 );
    float zi0 = atan2( z.y,z.x );
    float zr = pow( zr0, P-1 );
    float zo = zo0 * P;
    float zi = zi0 * P;
    dr = zr*dr*P + 1;
    zr *= zr0;
    z = zr*float3( cos(zo)*cos(zi), cos(zo)*sin(zi), sin(zo) );
}
inline float DE(float3 z0)
{
    float3 z=z0;
    float r;
    float dr=1;
    int i=4;
    r=length(z);
    while(r<4 && i--)
    {
        powN1(z,r,dr);
        z+=z0;
        r=length(z);
    }
    return -0.5*log(r)*r/dr;
}
DX10 bytecode disassembly:
dp3 r0.w, r1.xyzx, r1.xyzx
sqrt r0.w, r0.w
mov r2.xyz, r1.xyzx
mov r1.w, r0.w
mov r2.w, l(1.000000)
mov r3.x, l(4)
loop
    lt r3.y, r1.w, l(4.000000)
    iadd r3.z, r3.x, l(-1)
    ine r3.w, r3.x, l(0)
    and r3.y, r3.y, r3.w
    mov r3.x, r3.z
    breakc_z r3.y
    div r3.y, r2.z, r1.w
    add r3.w, -|r3.y|, l(1.000000)
    sqrt r3.w, r3.w
    mad r4.x, |r3.y|, l(-0.018729), l(0.074261)
    mad r4.x, r4.x, |r3.y|, l(-0.212114)
    mad r4.x, r4.x, |r3.y|, l(1.570729)
    mul r4.y, r3.w, r4.x
    mad r4.y, r4.y, l(-2.000000), l(3.141593)
    lt r3.y, r3.y, -r3.y
    and r3.y, r4.y, r3.y
    mad r3.y, r4.x, r3.w, r3.y
    add r3.y, -r3.y, l(1.570796)
    min r3.w, |r2.x|, |r2.y|
    max r4.x, |r2.x|, |r2.y|
    div r4.x, l(1.000000, 1.000000, 1.000000, 1.000000), r4.x
    mul r3.w, r3.w, r4.x
    mul r4.x, r3.w, r3.w
    mad r4.y, r4.x, l(0.020835), l(-0.085133)
    mad r4.y, r4.x, r4.y, l(0.180141)
    mad r4.y, r4.x, r4.y, l(-0.330299)
    mad r4.x, r4.x, r4.y, l(0.999866)
    mul r4.y, r3.w, r4.x
    lt r4.z, |r2.x|, |r2.y|
    mad r4.y, r4.y, l(-2.000000), l(1.570796)
    and r4.y, r4.z, r4.y
    mad r3.w, r3.w, r4.x, r4.y
    lt r4.x, r2.x, -r2.x
    and r4.x, r4.x, l(0xc0490fdb)
    add r3.w, r3.w, r4.x
    min r4.x, r2.x, r2.y
    max r4.y, r2.x, r2.y
    lt r4.x, r4.x, -r4.x
    ge r4.y, r4.y, -r4.y
    and r4.x, r4.x, r4.y
    movc r3.w, r4.x, -r3.w, r3.w
    log r4.x, r1.w
    mul r4.x, r4.x, l(7.000000)
    exp r4.x, r4.x
    mul r3.yw, r3.yyyw, l(0.000000, 8.000000, 0.000000, 8.000000)
    mul r4.y, r2.w, r4.x
    mad r2.w, r4.y, l(8.000000), l(1.000000)
    mul r4.x, r1.w, r4.x
    sincos null, r4.yz, r3.yywy
    mul r5.x, r4.z, r4.y
    sincos r3.w, null, r3.w
    mul r5.y, r4.y, r3.w
    sincos r5.z, null, r3.y
    mad r2.xyz, r4.xxxx, r5.xyzx, r1.xyzx
    dp3 r3.y, r2.xyzx, r2.xyzx
    sqrt r1.w, r3.y
    mov r3.x, r3.z
endloop
log r0.w, r1.w
mul r0.w, r1.w, r0.w
mul r0.w, r0.w, l(-0.346574)
div r0.w, r0.w, r2.w
------------------------------------
Added: exe + source.
Edit shader.fx to change:
- power of z^p+c
- max iteration count (4)
- max raytrace step count (50)
- distance threshold (-0.00025)
- 4x / 16x AA
30 fps in 1920x1080 on default settings, GT200b 1620MHz
F1 - fly mode
F8 - stereo; O, P, K, L - separation/convergence
http://rapidshare.de/files/48733881/mandelbulb.enforcer.v1.zip.html
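For anyone who wants to check values without a DX10 card, the scalar-derivative estimator in the HLSL above can be sketched on the CPU. This is only a Python port for illustration: `pow_n1` and `de` mirror `powN1` and `DE`, the sign convention (negative outside the set) is kept as in the shader, and the clamp and the zero-radius guard are additions for the port that the shader does not have.

```python
import math

P = 8  # power, as in "#define P 8"

def pow_n1(z, zr0, dr):
    """One z -> z^P step in spherical form; returns (new_z, new_dr)."""
    x, y, zz = z
    # Clamp is an addition for the port: rounding can push the ratio past +/-1.
    zo0 = math.asin(max(-1.0, min(1.0, zz / zr0)))
    zi0 = math.atan2(y, x)
    zr = zr0 ** (P - 1)
    zo = zo0 * P
    zi = zi0 * P
    dr = zr * dr * P + 1.0  # running derivative kept as a single scalar
    zr *= zr0
    new_z = (zr * math.cos(zo) * math.cos(zi),
             zr * math.cos(zo) * math.sin(zi),
             zr * math.sin(zo))
    return new_z, dr

def de(z0, max_iter=4, bailout=4.0):
    """Distance estimate with the shader's sign: negative outside the set."""
    z = z0
    dr = 1.0
    r = math.sqrt(sum(c * c for c in z))
    i = max_iter
    while 0.0 < r < bailout and i > 0:
        i -= 1
        z, dr = pow_n1(z, r, dr)
        z = tuple(a + b for a, b in zip(z, z0))
        r = math.sqrt(sum(c * c for c in z))
    if r == 0.0:
        return 0.0  # guard added for the port; log(0) is undefined
    return -0.5 * math.log(r) * r / dr

if __name__ == "__main__":
    print(de((2.0, 0.0, 0.0)))   # a point outside the set
    print(de((0.1, 0.2, 0.3)))   # a point that never escapes in 4 iterations
```

Points that escape give a negative value, and points that stay below radius 1 after the last iteration give a positive one, which is what the distance threshold in the shader tests against.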
|
|
« Last Edit: November 24, 2009, 07:58:34 PM by Enforcer »
cKleinhuis
|
|
« Reply #1 on: November 24, 2009, 02:07:47 PM » |
|
any executables ?!
---
divide and conquer - iterate and rule - chaos is No random!
|
|
|
cbuchner1
|
|
« Reply #2 on: November 24, 2009, 09:17:36 PM » |
|
Excellent. Any chance this would compile against DirectX 9 as well? DX10 is so Vista-only.
By the way, you rule. The scalar derivative is so much faster, also in my Optix based raytracer.
Christian
|
|
« Last Edit: November 24, 2009, 10:05:14 PM by cbuchner1 »
Enforcer
Guest
|
|
« Reply #3 on: November 25, 2009, 01:32:43 AM » |
|
Excellent. Any chance this would compile against DirectX 9 as well? DX10 is so Vista-only.
This source obviously wouldn't. It seems like quite a bit of work, as there is no similar DX9 sample in the SDK. However, there is nothing (yet) that wouldn't work in DX9. I found the DX10 API more programming-friendly.
lycium
|
|
« Reply #4 on: November 25, 2009, 02:22:49 AM » |
|
zomg, those are extremely impressive performance figures! i'm looking forward to trying out the code you've kindly provided at home, finally getting rid of the fixed(ish) step raymarching. great job and thanks for sharing.
cbuchner1
|
|
« Reply #5 on: November 25, 2009, 05:21:17 PM » |
|
By the way, you rule. The scalar derivative is so much faster, also in my Optix based raytracer.
Hmm the positive power Mandelbulbs render faster with the scalar derivative, but the scalar derivative seems to break the rendering for negative powers (Mandeliers, as I call them). Will investigate further.
Enforcer
Guest
|
|
« Reply #6 on: November 27, 2009, 02:47:52 AM » |
|
Pre-computing DE makes rendering ~2-4 times faster (at comparable image quality):
- 60 FPS in 2560x1600, 3 iterations
- 44 FPS in 2560x1600, 6 iterations
In theory, sampling from a 3D texture should not interfere with computation (as there are many threads (warps) in flight, each at its own point in the code). So a balance between sampling throughput and computation could lead to the best possible performance. Unfortunately, that's not what I see. ALU instructions between texture fetches are "free", but instructions after all fetches do decrease performance.
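The post does not say how the DE was pre-computed, so the following is only a rough CPU sketch of the general idea: evaluate a distance field once on a regular 3D grid, then answer later queries by trilinear interpolation, much as a linearly filtered 3D texture fetch would. The names `bake` and `sample`, the grid size, the cube bounds and the unit-sphere stand-in field are all assumptions made for illustration.

```python
import math

def bake(field, n, lo, hi):
    """Evaluate field once on an n x n x n grid covering the cube [lo, hi]^3."""
    step = (hi - lo) / (n - 1)
    return [[[field((lo + i * step, lo + j * step, lo + k * step))
              for k in range(n)] for j in range(n)] for i in range(n)]

def lerp(a, b, t):
    return a + (b - a) * t

def sample(grid, p, lo, hi):
    """Trilinear lookup, clamped at the grid border (like Clamp addressing)."""
    n = len(grid)
    idx, frac = [], []
    for c in p:
        u = (c - lo) / (hi - lo) * (n - 1)
        u = max(0.0, min(float(n - 1), u))
        cell = min(int(u), n - 2)   # keep cell + 1 inside the grid
        idx.append(cell)
        frac.append(u - cell)
    (i, j, k), (fx, fy, fz) = idx, frac
    c00 = lerp(grid[i][j][k], grid[i + 1][j][k], fx)
    c10 = lerp(grid[i][j + 1][k], grid[i + 1][j + 1][k], fx)
    c01 = lerp(grid[i][j][k + 1], grid[i + 1][j][k + 1], fx)
    c11 = lerp(grid[i][j + 1][k + 1], grid[i + 1][j + 1][k + 1], fx)
    return lerp(lerp(c00, c10, fy), lerp(c01, c11, fy), fz)

def sphere(p):
    """Stand-in field (hypothetical): unit sphere, positive outside."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - 1.0

if __name__ == "__main__":
    grid = bake(sphere, 17, -2.0, 2.0)
    print(sample(grid, (1.5, 0.0, 0.0), -2.0, 2.0))
```

The trade-off is the usual one: the lookup cost no longer depends on the iteration count, but the accuracy is limited by the grid resolution.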
|
|
« Last Edit: November 27, 2009, 03:17:37 AM by Enforcer »
keldor314
Guest
|
|
« Reply #7 on: November 27, 2009, 02:14:16 PM » |
|
I modified the shader a bit. It can now render with a dynamic level of detail, so you can zoom in and see more detail without causing aliasing. I'm simply multiplying the ray march minimum distance by the distance from the camera. Here's the modified shader:
//--------------------------------------------------------------------------------------
// Constant Buffer Variables
//--------------------------------------------------------------------------------------
TextureCube txEnv;
SamplerState samLinear
{
    Filter = MIN_MAG_LINEAR_MIP_POINT;
    AddressU = Clamp;
    AddressV = Clamp;
    AddressW = Clamp;
};
cbuffer cbNeverChanges
{
    matrix View;
};
cbuffer cbChangeOnResize
{
    matrix Projection;
    float2 vReverseRes;
};
cbuffer cbChangesEveryFrame
{
    matrix World;
    matrix InvWorldViewProjection;
    matrix InvProjection;
    float4 vMeshColor;
};
struct VS_INPUT
{
    float4 Pos : POSITION;
    float3 Tex : TEXCOORD;
};
struct GS_INPUT
{
    float4 Pos : SV_POSITION;
    float3 Tex : TEXCOORD0;
    float4 View: POSITION;
};
struct PS_INPUT
{
    float4 Pos : SV_POSITION;
    float3 Tex1 : TEXCOORD0;
    float3 Tex2 : TEXCOORD1;
};
//--------------------------------------------------------------------------------------
// Vertex Shader
//--------------------------------------------------------------------------------------
PS_INPUT QuakeVS( VS_INPUT input )
{
    PS_INPUT output = (PS_INPUT)0;
    input.Pos.z = 1;
    output.Tex1.xyz = 0.15+vMeshColor.xyz*0.05; //WSAD movement
    output.Tex2 = normalize(mul( input.Pos, InvWorldViewProjection ));
    output.Tex1 += output.Tex2*0.01;
    output.Pos = input.Pos;
    output.Pos.z = 0;
    return output;
}
GS_INPUT VS( VS_INPUT input )
{
    GS_INPUT output = (GS_INPUT)0;
    output.Pos = mul( input.Pos, World );
    output.Pos = mul( output.Pos, View );
    output.View = output.Pos;
    output.Pos = mul( output.Pos, Projection );
    output.Tex = input.Tex;
    return output;
}
[maxvertexcount(3)]
void GS( triangle GS_INPUT input[3], inout TriangleStream<PS_INPUT> TriStream )
{
    PS_INPUT output = (PS_INPUT)0;
    float3x3 m,n;
    m[0] = input[1].Tex - input[0].Tex;
    m[1] = input[2].Tex - input[0].Tex;
    m[2] = cross(m[0], m[1]);
    n[0] = normalize(input[1].View - input[0].View);
    n[1] = normalize(input[2].View - input[0].View);
    n[2] = cross(n[0], n[1]);
    for(int i=0; i<3; i++)
    {
        output.Pos = input[i].Pos;
        output.Tex1 = input[i].Tex;
        float3 Norm;
        Norm = input[i].View;
        Norm = mul(n,Norm);
        Norm = -mul(Norm,m);
        output.Tex2 = Norm;
        TriStream.Append( output );
    }
    TriStream.RestartStrip();
}
// power
#define P 8
inline void powN1(inout float3 z, float zr0, inout float dr)
{
    // float zr = sqrt( dot(z,z) );
    float zo0 = asin( z.z/zr0 );
    float zi0 = atan2( z.y,z.x );
    float zr = pow( zr0, P-1 );
    float zo = zo0 * P;
    float zi = zi0 * P;
    dr = zr*dr*P + 1;
    zr *= zr0;
    z = zr*float3( cos(zo)*cos(zi), cos(zo)*sin(zi), sin(zo) );
}
inline float DE(float3 z0)
{
    float3 z=z0;
    float r;
    float dr=1;
    int i=20; //max iteration count
    r=length(z);
    while(r<16. && i--)
    {
        powN1(z,r,dr);
        z+=z0;
        r=length(z);
    }
    return -0.5*log(r)*r/dr;
}
// 5% faster but pow8 only, rename to use
inline float DE1(float3 z0)
{
    float3 z=z0;
    float r,r2;
    float dr=1;
    int i=4; //max iteration count
    r2=dot(z,z);
    r =sqrt(r2);
    while(r<2 && i--)
    {
        float zo0 = asin( z.z/r );
        float zi0 = atan2( z.y,z.x );
        float zr = r2*r2*r2*r;//pow( zr0, P-1 );
        float zo = zo0 * P;
        float zi = zi0 * P;
        dr = zr*dr*P + 1;
        zr *= r;
        z = zr*float3( cos(zo)*cos(zi), cos(zo)*sin(zi), sin(zo) );
        z+=z0;
        r2=dot(z,z);
        r =sqrt(r2);
    }
    return -0.5*log(r)*r/dr;
}
inline float Tex(float3 t)
{
    float c2 = DE( t );
    return c2;
}
inline float3 CalcNorm(float3 t, float c)
{
    float delta=4.0/25600.0;
    float3 tx1 = t; tx1.x+=delta; float cx1 = Tex( tx1 );
    float3 ty1 = t; ty1.y+=delta; float cy1 = Tex( ty1 );
    float3 tz1 = t; tz1.z+=delta; float cz1 = Tex( tz1 );
    float3 d1 = float3(c-cx1,c-cy1,c-cz1);
    return normalize(d1);//*25600;
}
inline float3 CalcNormDD(float3 t, float c)
{
    float3 n1=ddx(t);
    float3 n2=ddy(t);
    return normalize(cross(n1,n2));
}
inline void Ray1(inout float3 t, inout float c, in float3 Norm)
{
    //max raytrace step count
    //distance threshold
    float3 t0 = t;
    for (int i = 0;i<450;i++)
    {
        t += .75*Norm*c;
        c = Tex(t);
        [branch] if(c>-0.0003*length(t-t0)) break;
    };
}
//--------------------------------------------------------------------------------------
// Pixel Shader
//--------------------------------------------------------------------------------------
float4 PS2( PS_INPUT input) : SV_Target
{
    float3 Norm = normalize(input.Tex2);
    float3 t = input.Tex1;
    t-=0.5;t*=2.5;
    float c;
    c = Tex(t);
    Ray1(t,c,Norm);
    float3 dx = CalcNorm(t,c);
    float ao=-Tex(t+dx*0.05)*40+0.2;
    float3 reflVec = reflect(Norm,dx);
    float3 refl = txEnv.Sample( samLinear, -reflVec.zxy);
    float l =dot(dx,Norm);
    l *= l;
    // return float4(0,ao,0,c*256+1.0f);// * vMeshColor;+vMeshColor.x*4
    return float4(refl*2*l*ao,c*256*0.4+1.0f);// * vMeshColor;+vMeshColor.x*4
}
float4 PS2AA( PS_INPUT input) : SV_Target
{
    float3 x=ddx(input.Tex1)*0.5;
    float3 y=ddy(input.Tex1)*0.5;
    float3 nx=ddx(input.Tex2)*0.5;
    float3 ny=ddy(input.Tex2)*0.5;
    float4 c;
    c=PS2(input);
    input.Tex1+=x; input.Tex2+=nx; c+=PS2(input);
    input.Tex1+=y; input.Tex2+=ny; c+=PS2(input);
    input.Tex1-=x; input.Tex2-=nx; c+=PS2(input);
    return c*0.25;
}
float4 PS2AAA( PS_INPUT input) : SV_Target
{
    float3 x=ddx(input.Tex1)*0.5;
    float3 y=ddy(input.Tex1)*0.5;
    float3 nx=ddx(input.Tex2)*0.5;
    float3 ny=ddy(input.Tex2)*0.5;
    float4 c;
    c=PS2AA(input);
    input.Tex1+=x; input.Tex2+=nx; c+=PS2AA(input);
    input.Tex1+=y; input.Tex2+=ny; c+=PS2AA(input);
    input.Tex1-=x; input.Tex2-=nx; c+=PS2AA(input);
    return c*0.25;
}
BlendState NoBlending
{
    AlphaToCoverageEnable = FALSE;
    BlendEnable[0] = FALSE;
};
BlendState SrcBlending
{
    AlphaToCoverageEnable = FALSE;
    BlendEnable[0] = TRUE;
    SrcBlend = SRC_ALPHA;
    DestBlend = INV_SRC_ALPHA;
    BlendOp = ADD;
};
DepthStencilState DisableDepth
{
    DepthEnable = FALSE;
    DepthWriteMask = ZERO;
};
//--------------------------------------------------------------------------------------
technique10 Render1
{
    pass P0
    {
        SetVertexShader( CompileShader( vs_4_0, VS() ) );
        SetGeometryShader( NULL );
        SetPixelShader( NULL );
        SetBlendState( SrcBlending, float4( 0.0f, 0.0f, 0.0f, 0.0f ), 0xFFFFFFFF );
        SetDepthStencilState( DisableDepth, 0 );
    }
}
technique10 Render2
{
    pass P0
    {
        SetVertexShader( CompileShader( vs_4_0, VS() ) );
        SetGeometryShader( CompileShader( gs_4_0, GS() ) );
        // PS2AA for 4x antialiasing
        // PS2AAA for 16x antialiasing
        SetPixelShader( CompileShader( ps_4_0, PS2() ) );
        SetBlendState( SrcBlending, float4( 0.0f, 0.0f, 0.0f, 0.0f ), 0xFFFFFFFF );
        SetDepthStencilState( DisableDepth, 0 );
        // SetPixelShader( NULL );
    }
}
// fly mode
technique10 RenderQuad
{
    pass P0
    {
        SetVertexShader( CompileShader( vs_4_0, QuakeVS() ) );
        SetGeometryShader( NULL );
        // SetPixelShader( NULL );
        // PS2AA for 4x antialiasing
        // PS2AAA for 16x antialiasing
        SetPixelShader( CompileShader( ps_4_0, PS2() ) );
        SetDepthStencilState( DisableDepth, 0 );
    }
}
I also increased the max iterations and ray marching steps, as well as pushing the bailout out to 16 to increase distance estimation accuracy. In addition, I'm multiplying the ray step distance by .75 to relax the stepping a bit, which removes some artifacting when you view certain parts of the fractal from certain angles.
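The two changes to the marching loop (a hit threshold that grows with the distance travelled, and a relaxed step of 0.75) can be tried in isolation with a small CPU sketch. This is not the shader itself: it uses the more common convention of a distance that is positive outside the surface, a unit sphere as a stand-in field, and a `max_t` cutoff that the shader does not have. `march` and `sphere` are hypothetical names; the constants 450, 0.75 and 0.0003 are taken from `Ray1`.

```python
import math

def sphere(p):
    """Stand-in distance field (hypothetical): unit sphere, positive outside."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - 1.0

def march(origin, direction, field, max_steps=450, relax=0.75, k=0.0003,
          max_t=100.0):
    """Return the travelled distance at the hit, or None if nothing is hit.

    direction is assumed to be normalized. The hit threshold k * t grows with
    the travelled distance t, so distant surfaces are resolved more coarsely.
    """
    t = 0.0
    d = field(origin)
    for _ in range(max_steps):
        t += relax * d              # relaxed step: only 75% of the estimate
        if t > max_t:               # cutoff added for this sketch
            return None
        p = tuple(o + t * v for o, v in zip(origin, direction))
        d = field(p)
        if d < k * t:               # distance-scaled threshold (the LOD part)
            return t
    return None

if __name__ == "__main__":
    print(march((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), sphere))  # hits near t = 2
    print(march((0.0, 0.0, -3.0), (0.0, 1.0, 0.0), sphere))  # misses: None
```

With a relaxed step the ray needs more iterations to converge, which is presumably why the step count was raised along with it.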
|
|
« Last Edit: November 27, 2009, 02:33:00 PM by keldor314 »
keldor314
Guest
|
|
« Reply #8 on: November 27, 2009, 02:54:02 PM » |
|
One big improvement to the camera controls would be to multiply the camera speed by the distance estimation from the fractal. Thus, the camera would move slower the closer it is to the fractal.
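A minimal sketch of that idea, assuming the application can evaluate the same distance field on the CPU at the camera position. `camera_step`, `base_speed`, `dt` and `min_scale` are hypothetical names, and the unit sphere again stands in for the fractal.

```python
import math

def sphere(p):
    """Stand-in distance field (hypothetical): unit sphere, positive outside."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - 1.0

def camera_step(pos, direction, field, base_speed, dt, min_scale=1e-4):
    """Advance the camera by a step proportional to its distance estimate.

    min_scale keeps the camera from stalling completely at the surface.
    """
    scale = max(abs(field(pos)), min_scale)
    return tuple(p + v * base_speed * scale * dt
                 for p, v in zip(pos, direction))

if __name__ == "__main__":
    far = camera_step((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), sphere, 1.0, 0.1)
    near = camera_step((0.0, 0.0, -1.5), (0.0, 0.0, 1.0), sphere, 1.0, 0.1)
    print(far, near)  # the second step is shorter
```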
lycium
|
|
« Reply #9 on: November 27, 2009, 05:03:14 PM » |
|
iq reports in another thread for his zooming video:
I have used LOD here. The contact-epsilon used in the distance field raymarcher depends on the distance from the point to the camera and the field of view: eps = k * t / sqrt( 1 + focalLength^2) or something similar, you do the math again, I lost the paper somewhere.
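Taken literally, the quoted formula can be written as a one-line function. The quote itself says "or something similar", so this is only a transcription of the expression as quoted, not a verified derivation; `contact_epsilon` and the default `k` are hypothetical.

```python
import math

def contact_epsilon(t, focal_length, k=0.001):
    """eps = k * t / sqrt(1 + focalLength^2), as quoted above.

    t is the distance from the camera to the point; k is a tuning constant.
    """
    return k * t / math.sqrt(1.0 + focal_length ** 2)

if __name__ == "__main__":
    for t in (1.0, 10.0, 100.0):
        print(t, contact_epsilon(t, focal_length=1.0))
```

The threshold grows linearly with distance, which matches the distance-scaled threshold in the modified shader earlier in the thread, with the field of view folded into the constant.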
quaternion
Guest
|
|
« Reply #11 on: July 12, 2010, 08:52:07 PM » |
|
Great!!!
quaternion
Guest
|
|
« Reply #12 on: July 12, 2010, 09:01:56 PM » |
|
The above link does not work?
No
dapa
Guest
|
|
« Reply #13 on: September 20, 2010, 10:57:13 PM » |
|
Please, Enforcer or anyone else, post the source code and binaries again!
Nahee_Enterprises
|
|
« Reply #14 on: September 27, 2010, 03:25:50 PM » |
|
Please, Enforcer or anyone else, post the source code and binaries again!
Greetings, and Welcome to this particular Forum !!! I am afraid that "Enforcer" has not posted a thing to the Forums since 11/26/2009, and was last logged into the Forums on 07/21/2010. It appears they are rarely here anymore. It might be difficult to acquire what was once specified at the above links.