Fractal Math, Chaos Theory & Research => Mandelbulb Implementation => Topic started by: Enforcer on November 24, 2009, 12:25:01 PM




Title: Realtime rendering/optimisations
Post by: Enforcer on November 24, 2009, 12:25:01 PM
It's not fast enough for 2560x1600 yet, but who knows... the Fermi chip is hopefully coming soon.
60 FPS at 1280x800 on a GT200b
3 iterations:
(http://img403.imageshack.us/img403/92/mand60fps.jpg)
4 iterations:
(http://img187.imageshack.us/img187/8127/mand60fps2.jpg)

"Improvement" over what I've seen on the forum:
- scalar derivative computation

Those images were produced by the following HLSL code:
Code:
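#define P 8
// z <- z^P in polar form; dr accumulates the scalar derivative. zr0 must equal length(z).
inline void powN1(inout float3 z, float zr0, inout float dr) {
//  float zr = sqrt( dot(z,z) );
  float zo0 = asin( z.z/zr0 );      // elevation angle
  float zi0 = atan2( z.y,z.x );     // azimuth angle

  float zr = pow( zr0, P-1 );       // |z|^(P-1)
  float zo = zo0 * P;
  float zi = zi0 * P;
  
  dr = zr*dr*P + 1;                 // scalar derivative: P*|z|^(P-1)*dr + 1
  zr *= zr0;                        // |z|^P
  z  = zr*float3( cos(zo)*cos(zi), cos(zo)*sin(zi), sin(zo) );
}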

inline float DE(float3 z0)
{
  float3 z=z0;
  float r;
  float dr=1;
  int i=4;
  r=length(z);
  while(r<4 && i--) {
    powN1(z,r,dr);
    z+=z0;
    r=length(z);
  }
  return -0.5*log(r)*r/dr;
}
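
For anyone who wants to poke at the scalar-derivative DE outside a shader, here is a rough CPU transliteration of the HLSL above into Python. It is only a sketch under a few assumptions: the name `mandelbulb_de` and its keyword arguments are made up for illustration, the asin argument is clamped, and a zero radius is special-cased because the HLSL would divide by zero there. It keeps the same sign as the shader, so the result is negative once the orbit has escaped.

```python
import math

def mandelbulb_de(c, power=8, max_iter=4, bailout=4.0):
    """Distance estimate for z -> z^power + c using a scalar running derivative.

    Mirrors the HLSL DE()/powN1() pair; the name and defaults are illustrative.
    """
    cx, cy, cz = c
    x, y, z = cx, cy, cz
    r = math.sqrt(x * x + y * y + z * z)
    dr = 1.0
    for _ in range(max_iter):
        if r == 0.0 or r >= bailout:
            break
        # Polar angles, same convention as powN1: asin for elevation, atan2 for azimuth.
        elev = math.asin(max(-1.0, min(1.0, z / r))) * power
        azim = math.atan2(y, x) * power
        # Scalar derivative update: dr <- P * |z|^(P-1) * dr + 1
        dr = power * r ** (power - 1) * dr + 1.0
        rp = r ** power
        x = rp * math.cos(elev) * math.cos(azim) + cx
        y = rp * math.cos(elev) * math.sin(azim) + cy
        z = rp * math.sin(elev) + cz
        r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0  # assumption: the HLSL has no defined value here
    return -0.5 * math.log(r) * r / dr
```

Calling `mandelbulb_de((2.0, 0.0, 0.0))` runs one iteration before bailing out; the derivative stays a single float the whole time, which is the point of the trick.
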
DX10 bytecode disassembly:
Code:
            dp3 r0.w, r1.xyzx, r1.xyzx
            sqrt r0.w, r0.w
            mov r2.xyz, r1.xyzx
            mov r1.w, r0.w
            mov r2.w, l(1.000000)
            mov r3.x, l(4)
            loop
              lt r3.y, r1.w, l(4.000000)
              iadd r3.z, r3.x, l(-1)
              ine r3.w, r3.x, l(0)
              and r3.y, r3.y, r3.w
              mov r3.x, r3.z
              breakc_z r3.y
              div r3.y, r2.z, r1.w
              add r3.w, -|r3.y|, l(1.000000)
              sqrt r3.w, r3.w
              mad r4.x, |r3.y|, l(-0.018729), l(0.074261)
              mad r4.x, r4.x, |r3.y|, l(-0.212114)
              mad r4.x, r4.x, |r3.y|, l(1.570729)
              mul r4.y, r3.w, r4.x
              mad r4.y, r4.y, l(-2.000000), l(3.141593)
              lt r3.y, r3.y, -r3.y
              and r3.y, r4.y, r3.y
              mad r3.y, r4.x, r3.w, r3.y
              add r3.y, -r3.y, l(1.570796)
              min r3.w, |r2.x|, |r2.y|
              max r4.x, |r2.x|, |r2.y|
              div r4.x, l(1.000000, 1.000000, 1.000000, 1.000000), r4.x
              mul r3.w, r3.w, r4.x
              mul r4.x, r3.w, r3.w
              mad r4.y, r4.x, l(0.020835), l(-0.085133)
              mad r4.y, r4.x, r4.y, l(0.180141)
              mad r4.y, r4.x, r4.y, l(-0.330299)
              mad r4.x, r4.x, r4.y, l(0.999866)
              mul r4.y, r3.w, r4.x
              lt r4.z, |r2.x|, |r2.y|
              mad r4.y, r4.y, l(-2.000000), l(1.570796)
              and r4.y, r4.z, r4.y
              mad r3.w, r3.w, r4.x, r4.y
              lt r4.x, r2.x, -r2.x
              and r4.x, r4.x, l(0xc0490fdb)
              add r3.w, r3.w, r4.x
              min r4.x, r2.x, r2.y
              max r4.y, r2.x, r2.y
              lt r4.x, r4.x, -r4.x
              ge r4.y, r4.y, -r4.y
              and r4.x, r4.x, r4.y
              movc r3.w, r4.x, -r3.w, r3.w
              log r4.x, r1.w
              mul r4.x, r4.x, l(7.000000)
              exp r4.x, r4.x
              mul r3.yw, r3.yyyw, l(0.000000, 8.000000, 0.000000, 8.000000)
              mul r4.y, r2.w, r4.x
              mad r2.w, r4.y, l(8.000000), l(1.000000)
              mul r4.x, r1.w, r4.x
              sincos null, r4.yz, r3.yywy
              mul r5.x, r4.z, r4.y
              sincos r3.w, null, r3.w
              mul r5.y, r4.y, r3.w
              sincos r5.z, null, r3.y
              mad r2.xyz, r4.xxxx, r5.xyzx, r1.xyzx
              dp3 r3.y, r2.xyzx, r2.xyzx
              sqrt r1.w, r3.y
              mov r3.x, r3.z
            endloop
            log r0.w, r1.w
            mul r0.w, r1.w, r0.w
            mul r0.w, r0.w, l(-0.346574)
            div r0.w, r0.w, r2.w
------------------------------------
added: exe + source

Edit shader.fx to change:
  power of z^p+c
  max iteration count      (4)
  max raytrace step count  (50)
  distance threshold       (-0.00025)
  4x / 16x AA

30 FPS at 1920x1080 on default settings, GT200b at 1620 MHz

F1 - fly mode
F8 - stereo, O,P,K,L - separation/convergence

http://rapidshare.de/files/48733881/mandelbulb.enforcer.v1.zip.html


Title: Re: Realtime rendering/optimisations
Post by: cKleinhuis on November 24, 2009, 02:07:47 PM
Any executables?!


Title: Re: Realtime rendering/optimisations
Post by: cbuchner1 on November 24, 2009, 09:17:36 PM
Excellent. Any chance this would compile against DirectX 9 as well? DX10 is so Vista-only.

By the way, you rule. The scalar derivative is so much faster, also in my Optix based raytracer.

Christian


Title: Re: Realtime rendering/optimisations
Post by: Enforcer on November 25, 2009, 01:32:43 AM
Quote from: cbuchner1 on November 24, 2009, 09:17:36 PM
Excellent. Any chance this would compile against DirectX 9 as well? DX10 is so Vista-only.

This source obviously wouldn't.
It seems like quite a bit of work; there is no similar DX9 sample in the SDK.
However, there is nothing (yet) that wouldn't work in DX9.
I found the DX10 API more programmer-friendly.


Title: Re: Realtime rendering/optimisations
Post by: lycium on November 25, 2009, 02:22:49 AM
zomg those are extremely impressive performance figures! i'm looking forward to trying out the code you've kindly provided at home, finally getting rid of the fixed(ish) step raymarching :)

great job and thanks for sharing.


Title: Re: Realtime rendering/optimisations
Post by: cbuchner1 on November 25, 2009, 05:21:17 PM
Quote from: cbuchner1 on November 24, 2009, 09:17:36 PM
By the way, you rule. The scalar derivative is so much faster, also in my Optix based raytracer.

Hmm, the positive-power Mandelbulbs render faster with the scalar derivative, but it seems to break the rendering for negative powers (Mandeliers, as I call them). Will investigate further.


Title: Re: Realtime rendering/optimisations
Post by: Enforcer on November 27, 2009, 02:47:52 AM
Pre-computing DE makes rendering ~2-4 times faster (at comparable image quality)

60 FPS in 2560x1600, 3 iterations
(http://img228.imageshack.us/img228/9702/mandcol1.jpg)

44 FPS in 2560x1600, 6 iterations
(http://img301.imageshack.us/img301/9702/mandcol1.jpg)

In theory, sampling from a 3D texture should not interfere with computation, as there are many threads (warps) in flight, each at its own point in the code.
So balancing sampling throughput against computation could lead to the best possible performance.

Unfortunately, that's not what I see. ALU instructions between texture fetches are "free", but instructions after all the fetches do decrease performance.
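
The post doesn't show how the DE was baked, so the following is only a guess at the general idea: evaluate the field once on a regular grid, then answer later queries with a trilinear lookup, which is roughly what a linearly filtered 3D texture fetch does in hardware. The function names `bake_grid` and `sample_grid` and the cubic `[lo, hi]` domain are assumptions for illustration, and `field` stands for any distance function, not specifically the shader's DE.

```python
def bake_grid(field, lo, hi, n):
    """Evaluate field((x, y, z)) on an n*n*n grid spanning [lo, hi] on each axis."""
    step = (hi - lo) / (n - 1)
    return [[[field((lo + i * step, lo + j * step, lo + k * step))
              for k in range(n)] for j in range(n)] for i in range(n)]

def sample_grid(grid, lo, hi, p):
    """Trilinear lookup with clamp addressing, like a filtered 3D texture fetch."""
    n = len(grid)
    scale = (n - 1) / (hi - lo)
    cells, fracs = [], []
    for coord in p:
        g = min(max((coord - lo) * scale, 0.0), n - 1.0)  # clamp to the grid
        cell = min(int(g), n - 2)                         # lower corner index
        cells.append(cell)
        fracs.append(g - cell)
    i, j, k = cells
    fx, fy, fz = fracs

    def lerp(a, b, t):
        return a + (b - a) * t

    # Interpolate along x on the four cell edges, then along y, then along z.
    c00 = lerp(grid[i][j][k], grid[i + 1][j][k], fx)
    c10 = lerp(grid[i][j + 1][k], grid[i + 1][j + 1][k], fx)
    c01 = lerp(grid[i][j][k + 1], grid[i + 1][j][k + 1], fx)
    c11 = lerp(grid[i][j + 1][k + 1], grid[i + 1][j + 1][k + 1], fx)
    return lerp(lerp(c00, c10, fy), lerp(c01, c11, fy), fz)
```

The trade-off is the usual one for a cache: each lookup costs a handful of reads instead of a full iteration loop, but detail finer than the grid spacing is smoothed away.
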


Title: Re: Realtime rendering/optimisations
Post by: keldor314 on November 27, 2009, 02:14:16 PM
I modified the shader a bit: it now renders with dynamic level of detail, so you can zoom in and see more detail without causing aliasing. I simply multiply the ray march minimum distance by the distance from the camera.

(http://i4.photobucket.com/albums/y136/Keldor314/DynamicLoD.jpg)

(http://i4.photobucket.com/albums/y136/Keldor314/LookingOut.png)

Here's the modified shader:

Code:
//--------------------------------------------------------------------------------------
// Constant Buffer Variables
//--------------------------------------------------------------------------------------
TextureCube txEnv;

SamplerState samLinear
{
    Filter = MIN_MAG_LINEAR_MIP_POINT;
    AddressU = Clamp;
    AddressV = Clamp;
    AddressW = Clamp;
};

cbuffer cbNeverChanges
{
    matrix View;
};

cbuffer cbChangeOnResize
{
    matrix Projection;
    float2 vReverseRes;
};

cbuffer cbChangesEveryFrame
{
    matrix World;
    matrix InvWorldViewProjection;
    matrix InvProjection;
    float4 vMeshColor;
};

struct VS_INPUT
{
    float4 Pos : POSITION;
    float3 Tex : TEXCOORD;
};

struct GS_INPUT
{
    float4 Pos : SV_POSITION;
    float3 Tex : TEXCOORD0;
    float4 View: POSITION;
};

struct PS_INPUT
{
    float4 Pos : SV_POSITION;
    float3 Tex1 : TEXCOORD0;
    float3 Tex2 : TEXCOORD1;
};

//--------------------------------------------------------------------------------------
// Vertex Shader
//--------------------------------------------------------------------------------------
PS_INPUT QuakeVS( VS_INPUT input )
{
    PS_INPUT output = (PS_INPUT)0;
    input.Pos.z = 1;
    output.Tex1.xyz = 0.15+vMeshColor.xyz*0.05;   //WSAD movement
    output.Tex2 = normalize(mul( input.Pos, InvWorldViewProjection ));
    output.Tex1 += output.Tex2*0.01;
    output.Pos =input.Pos;
    output.Pos.z = 0;
    
    return output;
}

GS_INPUT VS( VS_INPUT input )
{
    GS_INPUT output = (GS_INPUT)0;
    output.Pos = mul( input.Pos, World );
    output.Pos = mul( output.Pos, View );
    output.View = output.Pos;
    output.Pos = mul( output.Pos, Projection );
    output.Tex = input.Tex;
    
    return output;
}

[maxvertexcount(3)]
void GS( triangle GS_INPUT input[3], inout TriangleStream<PS_INPUT> TriStream )
{
    PS_INPUT output = (PS_INPUT)0;

    float3x3 m,n;
    m[0] = input[1].Tex - input[0].Tex;
    m[1] = input[2].Tex - input[0].Tex;
    m[2] = cross(m[0], m[1]);

    n[0] = normalize(input[1].View - input[0].View);
    n[1] = normalize(input[2].View - input[0].View);
    n[2] = cross(n[0], n[1]);
    
    for(int i=0; i<3; i++)
    {
        output.Pos = input[i].Pos;
        output.Tex1 = input[i].Tex;
        float3 Norm;
        Norm = input[i].View;
        Norm = mul(n,Norm);
        Norm = -mul(Norm,m);
        output.Tex2 = Norm;
        
        TriStream.Append( output );
    }
    TriStream.RestartStrip();
}

// power
#define P 8
inline void powN1(inout float3 z, float zr0, inout float dr) {
//  float zr = sqrt( dot(z,z) );
  float zo0 = asin( z.z/zr0 );
  float zi0 = atan2( z.y,z.x );

  float zr = pow( zr0, P-1 );
  float zo = zo0 * P;
  float zi = zi0 * P;
  
  dr = zr*dr*P + 1;
  zr *= zr0;
  z  = zr*float3( cos(zo)*cos(zi), cos(zo)*sin(zi), sin(zo) );
}

inline float DE(float3 z0)
{
  float3 z=z0;
  float r;
  float dr=1;
  int i=20;                   //max iteration count
  r=length(z);
  while(r<16. && i--) {
    powN1(z,r,dr);
    z+=z0;
    r=length(z);
  }
  return -0.5*log(r)*r/dr;
}

// 5% faster but pow8 only,  rename to use
inline float DE1(float3 z0)
{
  float3 z=z0;
  float r,r2;
  float dr=1;
  int i=4;                   //max iteration count
  r2=dot(z,z);
  r =sqrt(r2);
  while(r<2 && i--) {
    float zo0 = asin( z.z/r );
    float zi0 = atan2( z.y,z.x );

    float zr = r2*r2*r2*r;//pow( zr0, P-1 );
    float zo = zo0 * P;
    float zi = zi0 * P;
    
    dr = zr*dr*P + 1;
    zr *= r;
    z  = zr*float3( cos(zo)*cos(zi), cos(zo)*sin(zi), sin(zo) );

    z+=z0;
    r2=dot(z,z);
    r =sqrt(r2);
  }
  return -0.5*log(r)*r/dr;
}

inline float Tex(float3 t)
{
   float c2 = DE( t );
   return c2;
}

inline float3 CalcNorm(float3 t, float c)
{
   float delta=4.0/25600.0;
   float3 tx1 = t;
   tx1.x+=delta;
   float cx1 = Tex( tx1 );
   float3 ty1 = t;
   ty1.y+=delta;
   float cy1 = Tex( ty1 );
   float3 tz1 = t;
   tz1.z+=delta;
   float cz1 = Tex( tz1 );
   float3 d1 = float3(c-cx1,c-cy1,c-cz1);
   return normalize(d1);//*25600;
}

inline float3 CalcNormDD(float3 t, float c)
{
   float3 n1=ddx(t);
   float3 n2=ddy(t);
   return normalize(cross(n1,n2));
}

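inline void Ray1(inout float3 t, inout float c, in float3 Norm)
{
   float3 t0 = t;                    // ray start, used for the distance-scaled threshold
   for (int i = 0; i < 450; i++) {   // max raytrace step count
      t += .75*Norm*c;               // relaxed step: 75% of the distance estimate
      c = Tex(t);
      // distance threshold, scaled by distance travelled from the ray start
      [branch] if (c > -0.0003*length(t-t0)) break;
   }
}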

//--------------------------------------------------------------------------------------
// Pixel Shader
//--------------------------------------------------------------------------------------
float4 PS2( PS_INPUT input) : SV_Target
{
   float3 Norm = normalize(input.Tex2);
   float3 t = input.Tex1;

   t-=0.5;t*=2.5;

   float c;
   c = Tex(t);
   Ray1(t,c,Norm);

   float3 dx = CalcNorm(t,c);

   float ao=-Tex(t+dx*0.05)*40+0.2;

   float3 reflVec = reflect(Norm,dx);
   float3 refl = txEnv.Sample( samLinear, -reflVec.zxy);

   float l =dot(dx,Norm);
   l *= l;

//   return float4(0,ao,0,c*256+1.0f);// * vMeshColor;+vMeshColor.x*4
   return float4(refl*2*l*ao,c*256*0.4+1.0f);// * vMeshColor;+vMeshColor.x*4
}

float4 PS2AA( PS_INPUT input) : SV_Target
{
  float3 x=ddx(input.Tex1)*0.5;
  float3 y=ddy(input.Tex1)*0.5;
  float3 nx=ddx(input.Tex2)*0.5;
  float3 ny=ddy(input.Tex2)*0.5;
  float4 c;
  c=PS2(input);
  input.Tex1+=x;
  input.Tex2+=nx;
  c+=PS2(input);
  input.Tex1+=y;
  input.Tex2+=ny;
  c+=PS2(input);
  input.Tex1-=x;
  input.Tex2-=nx;
  c+=PS2(input);
  return c*0.25;
}

float4 PS2AAA( PS_INPUT input) : SV_Target
{
  float3 x=ddx(input.Tex1)*0.5;
  float3 y=ddy(input.Tex1)*0.5;
  float3 nx=ddx(input.Tex2)*0.5;
  float3 ny=ddy(input.Tex2)*0.5;
  float4 c;
  c=PS2AA(input);
  input.Tex1+=x;
  input.Tex2+=nx;
  c+=PS2AA(input);
  input.Tex1+=y;
  input.Tex2+=ny;
  c+=PS2AA(input);
  input.Tex1-=x;
  input.Tex2-=nx;
  c+=PS2AA(input);
  return c*0.25;
}

BlendState NoBlending
{
    AlphaToCoverageEnable = FALSE;
    BlendEnable[0] = FALSE;
};

BlendState SrcBlending
{
    AlphaToCoverageEnable = FALSE;
    BlendEnable[0] = TRUE;
    SrcBlend = SRC_ALPHA;
    DestBlend = INV_SRC_ALPHA;
    BlendOp = ADD;
};

DepthStencilState DisableDepth
{
    DepthEnable = FALSE;
    DepthWriteMask = ZERO;
};

//--------------------------------------------------------------------------------------
technique10 Render1
{
    pass P0
    {
        SetVertexShader( CompileShader( vs_4_0, VS() ) );
        SetGeometryShader( NULL );
        SetPixelShader( NULL );
        SetBlendState( SrcBlending, float4( 0.0f, 0.0f, 0.0f, 0.0f ), 0xFFFFFFFF );
        SetDepthStencilState( DisableDepth, 0 );
    }
}

technique10 Render2
{
    pass P0
    {
        SetVertexShader( CompileShader( vs_4_0, VS() ) );
        SetGeometryShader( CompileShader( gs_4_0, GS() ) );
        // PS2AA  for 4x antialiasing
        // PS2AAA for 16x antialiasing
        SetPixelShader( CompileShader( ps_4_0, PS2() ) );
        SetBlendState( SrcBlending, float4( 0.0f, 0.0f, 0.0f, 0.0f ), 0xFFFFFFFF );
        SetDepthStencilState( DisableDepth, 0 );
//        SetPixelShader( NULL );
    }
}

// fly mode
technique10 RenderQuad
{
    pass P0
    {
        SetVertexShader( CompileShader( vs_4_0, QuakeVS() ) );
        SetGeometryShader( NULL );
//        SetPixelShader( NULL );
        // PS2AA  for 4x antialiasing
        // PS2AAA for 16x antialiasing
        SetPixelShader( CompileShader( ps_4_0, PS2() ) );
        SetDepthStencilState( DisableDepth, 0 );
    }
}

I also increased the max iterations and ray marching steps, and pushed the bailout out to 16 to increase distance estimation accuracy.  In addition, I multiply the ray step distance by .75 to relax the stepping a bit, which removes some artifacts when you view certain parts of the fractal from certain angles.
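
To make the two changes to the marching loop easier to see in isolation, here is a small CPU sketch in Python. It is not the shader: it assumes a positive distance convention (the shader works with negated values), it uses a unit sphere as a stand-in field, and the names `march`, `relax` and `eps_scale` are invented for the example. The 0.75 and 0.0003 defaults are taken from `Ray1`.

```python
import math

def march(origin, direction, de, max_steps=450, relax=0.75, eps_scale=0.0003):
    """Sphere-trace from origin along a unit direction.

    de(p) must return a non-negative distance bound (positive convention,
    an assumption of this sketch). Returns (hit, point, steps_taken).
    """
    p = tuple(origin)
    d = de(p)
    travelled = 0.0
    for step in range(1, max_steps + 1):
        stride = relax * d                  # relaxed step: 75% of the estimate
        p = tuple(pi + stride * di for pi, di in zip(p, direction))
        travelled += stride
        d = de(p)
        if d < eps_scale * travelled:       # hit threshold grows with distance
            return True, p, step
    return False, p, max_steps

def unit_sphere(p):
    """Stand-in distance field (not the Mandelbulb): distance to a unit sphere."""
    return math.sqrt(sum(x * x for x in p)) - 1.0
```

For example, `march((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), unit_sphere)` stops just in front of the sphere at z close to -1. Because the threshold is proportional to the distance travelled, nearby surfaces are resolved more finely than distant ones.
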


Title: Re: Realtime rendering/optimisations
Post by: keldor314 on November 27, 2009, 02:54:02 PM
One big improvement to the camera controls would be to multiply the camera speed by the distance estimation from the fractal.  Thus, the camera would move slower the closer it is to the fractal.
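
A minimal sketch of that camera idea in Python, assuming the application can evaluate the distance estimate at the camera position once per frame. The function name `camera_step` and the `min_speed` floor are assumptions of this example; the absolute value is there because the shader's DE is negative outside the fractal.

```python
def camera_step(position, forward, de_value, base_speed, dt, min_speed=1e-4):
    """Move the camera along forward at a speed proportional to |DE|."""
    # min_speed is an assumed floor so the camera never freezes at the surface.
    speed = max(base_speed * abs(de_value), min_speed)
    return tuple(p + f * speed * dt for p, f in zip(position, forward))
```

With this, flying toward the surface slows down smoothly, since each frame covers only a fraction of the remaining distance.
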


Title: Re: Realtime rendering/optimisations
Post by: lycium on November 27, 2009, 05:03:14 PM
iq reports in another thread for his zooming video:

I have used LOD here. The contact-epsilon used in the distance field raymarcher depends on the distance from the point to the camera and the field of view: eps = k * t / sqrt( 1 + focalLength^2) or something similar, you do the math again, I lost the paper somewhere.
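
Taking the quoted formula at face value (iq himself says "or something similar", so treat it as approximate), it can be written as a one-line helper in Python. The function name `contact_epsilon` and the default `k` are assumptions of this sketch.

```python
import math

def contact_epsilon(t, focal_length, k=1.0):
    """eps = k * t / sqrt(1 + focalLength^2), as quoted above (approximate)."""
    return k * t / math.sqrt(1.0 + focal_length ** 2)
```

The threshold grows linearly with the distance `t` along the ray and shrinks as the focal length increases, that is, as the field of view narrows.
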


Title: Re: Realtime rendering/optimisations
Post by: flexiverse on July 12, 2010, 05:41:36 PM
Has anyone got the link to this program/code? 

http://rapidshare.de/files/48733881/mandelbulb.enforcer.v1.zip.html

The above link does not work?


Title: Re: Realtime rendering/optimisations
Post by: quaternion on July 12, 2010, 08:52:07 PM
Great!!!


Title: Re: Realtime rendering/optimisations
Post by: quaternion on July 12, 2010, 09:01:56 PM
Quote from: flexiverse on July 12, 2010, 05:41:36 PM
The above link does not work?

No


Title: Re: Realtime rendering/optimisations
Post by: dapa on September 20, 2010, 10:57:13 PM
Please, Enforcer or anyone else, post the source code and binaries again!  :sad1:


Title: Re: Realtime rendering/optimisations
Post by: Nahee_Enterprises on September 27, 2010, 03:25:50 PM
Quote from: dapa on September 20, 2010, 10:57:13 PM
Please, Enforcer or anyone else, post the source code and binaries again!  :sad1:

Greetings, and Welcome to this particular Forum !!!    :)

I am afraid that "Enforcer" has not posted a thing to the Forums since 11/26/2009, and was last logged into the Forums on 07/21/2010.  It appears they are rarely here anymore.

It might be difficult to acquire what was once specified at the above links.
 


Title: Re: Realtime rendering/optimisations
Post by: keldor314 on October 04, 2010, 09:27:41 AM
Here's the version I was playing with.  I changed a number of things, so you'll want to run the exe in the debug folder.  Also, you might want to lower the number of iterations and the max number of raymarch steps in the .fx file, since I have it set to a very high quality which only runs at a few FPS on a high end GPU.  On a low end one, it would likely crash the driver due to the watchdog timer timing out.

http://dl.dropbox.com/u/424639/mandelbulb.enforcer.v1.zip