Topic: Realtime rendering/optimisations
Enforcer
Guest
« on: November 24, 2009, 12:25:01 PM »

It's not fast enough for 2560x1600 yet, but who knows... the Fermi chip is hopefully coming soon.
60 FPS at 1280x800 on a GT200b.
3 iterations:

4 iterations:


"Improvement" over what I've seen on the forum:
- scalar derivative computation
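For reference, the scalar-derivative recurrence used by `powN1` and `DE` in the code below can be written as follows, where $z^P$ is the spherical ("triplex") power that `powN1` computes and the iteration starts from $z_0 = c$ with $dr_0 = 1$, as the code does:

```latex
\begin{aligned}
z_{n+1} &= z_n^{P} + c, \qquad z_0 = c, \qquad r_n = \lVert z_n \rVert,\\
dr_{n+1} &= P\, r_n^{P-1}\, dr_n + 1, \qquad dr_0 = 1,\\
\mathrm{DE}(c) &\approx \frac{r_n \ln r_n}{2\, dr_n}
\end{aligned}
```

The posted `DE` returns the negative of this estimate (`-0.5*log(r)*r/dr`), so its values are negative outside the set.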

Those images were produced by the following HLSL code:
Code:
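#define P 8   // exponent of the Mandelbulb formula z^P + c

// One "triplex" power step: z -> z^P in spherical coordinates,
// while updating the running scalar derivative dr.
// zr0 must be length(z) and non-zero (asin(z.z/zr0) is undefined at the origin).
inline void powN1(inout float3 z, float zr0, inout float dr) {
//  float zr = sqrt( dot(z,z) );
  float zo0 = asin( z.z/zr0 );      // elevation angle
  float zi0 = atan2( z.y,z.x );     // azimuth angle

  float zr = pow( zr0, P-1 );       // r^(P-1)
  float zo = zo0 * P;
  float zi = zi0 * P;

  dr = zr*dr*P + 1;                 // scalar derivative: dr' = P*r^(P-1)*dr + 1
  zr *= zr0;                        // r^P
  z  = zr*float3( cos(zo)*cos(zi), cos(zo)*sin(zi), sin(zo) );
}

// Distance estimate for the point z0 (which plays the role of c).
// Sign convention: the result is negative when the final r > 1, i.e. outside the set.
inline float DE(float3 z0)
{
  float3 z=z0;
  float r;
  float dr=1;
  int i=4;                  // max iteration count
  r=length(z);
  while(r<4 && i--) {       // bailout radius 4
    powN1(z,r,dr);
    z+=z0;                  // z = z^P + c
    r=length(z);
  }
  return -0.5*log(r)*r/dr;
}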
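To check the shader's arithmetic off the GPU, here is a minimal C++ port of `powN1` and `DE`. It is a sketch that assumes double precision in place of HLSL `float`; `Vec3`, `length3` and `kPower` are stand-ins introduced here for HLSL's `float3`, `length()` and `#define P`.

```cpp
#include <cassert>
#include <cmath>

// CPU-side sketch of the HLSL powN1/DE above, using double instead of float.
struct Vec3 { double x, y, z; };

double length3(const Vec3& v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

const int kPower = 8;  // corresponds to "#define P 8"

// z -> z^P in spherical coordinates, updating the running scalar derivative dr.
// r must be length3(z); the origin is excluded because asin(z.z / r) is undefined there.
void powN1(Vec3& z, double r, double& dr) {
    assert(r > 0.0);
    double theta = std::asin(z.z / r);      // elevation angle (zo0 in the shader)
    double phi = std::atan2(z.y, z.x);      // azimuth angle (zi0 in the shader)
    double rPm1 = std::pow(r, kPower - 1);  // r^(P-1)
    dr = rPm1 * dr * kPower + 1.0;          // dr' = P * r^(P-1) * dr + 1
    double rP = rPm1 * r;                   // r^P
    theta *= kPower;
    phi *= kPower;
    z = {rP * std::cos(theta) * std::cos(phi),
         rP * std::cos(theta) * std::sin(phi),
         rP * std::sin(theta)};
}

// Same loop as the shader's DE: at most 4 iterations, bailout radius 4.
// Returns -0.5 * ln(r) * r / dr, so the value is negative when the final r > 1.
double DE(const Vec3& c) {
    Vec3 z = c;
    double dr = 1.0;
    int i = 4;
    double r = length3(z);
    while (r < 4.0 && i--) {
        powN1(z, r, dr);
        z = {z.x + c.x, z.y + c.y, z.z + c.z};  // z = z^P + c
        r = length3(z);
    }
    return -0.5 * std::log(r) * r / dr;
}
```

With this port, a point near the origin such as (0.1, 0, 0) gets a positive value and a point well outside such as (3, 0, 0) a negative one, matching the sign convention of the shader's return statement.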
DX10 bytecode disassembly:
Code:
            dp3 r0.w, r1.xyzx, r1.xyzx
            sqrt r0.w, r0.w
            mov r2.xyz, r1.xyzx
            mov r1.w, r0.w
            mov r2.w, l(1.000000)
            mov r3.x, l(4)
            loop
              lt r3.y, r1.w, l(4.000000)
              iadd r3.z, r3.x, l(-1)
              ine r3.w, r3.x, l(0)
              and r3.y, r3.y, r3.w
              mov r3.x, r3.z
              breakc_z r3.y
              div r3.y, r2.z, r1.w
              add r3.w, -|r3.y|, l(1.000000)
              sqrt r3.w, r3.w
              mad r4.x, |r3.y|, l(-0.018729), l(0.074261)
              mad r4.x, r4.x, |r3.y|, l(-0.212114)
              mad r4.x, r4.x, |r3.y|, l(1.570729)
              mul r4.y, r3.w, r4.x
              mad r4.y, r4.y, l(-2.000000), l(3.141593)
              lt r3.y, r3.y, -r3.y
              and r3.y, r4.y, r3.y
              mad r3.y, r4.x, r3.w, r3.y
              add r3.y, -r3.y, l(1.570796)
              min r3.w, |r2.x|, |r2.y|
              max r4.x, |r2.x|, |r2.y|
              div r4.x, l(1.000000, 1.000000, 1.000000, 1.000000), r4.x
              mul r3.w, r3.w, r4.x
              mul r4.x, r3.w, r3.w
              mad r4.y, r4.x, l(0.020835), l(-0.085133)
              mad r4.y, r4.x, r4.y, l(0.180141)
              mad r4.y, r4.x, r4.y, l(-0.330299)
              mad r4.x, r4.x, r4.y, l(0.999866)
              mul r4.y, r3.w, r4.x
              lt r4.z, |r2.x|, |r2.y|
              mad r4.y, r4.y, l(-2.000000), l(1.570796)
              and r4.y, r4.z, r4.y
              mad r3.w, r3.w, r4.x, r4.y
              lt r4.x, r2.x, -r2.x
              and r4.x, r4.x, l(0xc0490fdb)
              add r3.w, r3.w, r4.x
              min r4.x, r2.x, r2.y
              max r4.y, r2.x, r2.y
              lt r4.x, r4.x, -r4.x
              ge r4.y, r4.y, -r4.y
              and r4.x, r4.x, r4.y
              movc r3.w, r4.x, -r3.w, r3.w
              log r4.x, r1.w
              mul r4.x, r4.x, l(7.000000)
              exp r4.x, r4.x
              mul r3.yw, r3.yyyw, l(0.000000, 8.000000, 0.000000, 8.000000)
              mul r4.y, r2.w, r4.x
              mad r2.w, r4.y, l(8.000000), l(1.000000)
              mul r4.x, r1.w, r4.x
              sincos null, r4.yz, r3.yywy
              mul r5.x, r4.z, r4.y
              sincos r3.w, null, r3.w
              mul r5.y, r4.y, r3.w
              sincos r5.z, null, r3.y
              mad r2.xyz, r4.xxxx, r5.xyzx, r1.xyzx
              dp3 r3.y, r2.xyzx, r2.xyzx
              sqrt r1.w, r3.y
              mov r3.x, r3.z
            endloop
            log r0.w, r1.w
            mul r0.w, r1.w, r0.w
            mul r0.w, r0.w, l(-0.346574)
            div r0.w, r0.w, r2.w
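The listing shows that the compiler has no native `asin`, `atan2` or `pow`: it expands the first two into polynomial approximations and `pow` into `log`/`exp`. The C++ sketch below transcribes the `asin` sequence (the `mad` chain with constants -0.018729, 0.074261, -0.212114, 1.570729, which match the classic Abramowitz & Stegun cubic for arcsin) so its accuracy can be checked on the CPU. The function names are hypothetical; reading the bytecode `log`/`exp` as base-2 is consistent with the -0.346574 constant, which equals -0.5 * ln 2.

```cpp
#include <cassert>
#include <cmath>

// Transcription of the compiled asin: acos(|x|) ~= sqrt(1 - |x|) * poly(|x|),
// then a sign fix-up for negative x, then asin(x) = pi/2 - acos(x).
double asinAsCompiled(double x) {
    const double kPi = 3.141593;      // l(3.141593) in the listing
    const double kHalfPi = 1.570796;  // l(1.570796) in the listing
    double ax = std::fabs(x);
    double s = std::sqrt(1.0 - ax);
    // Horner form of the three mad instructions.
    double poly = ((-0.018729 * ax + 0.074261) * ax - 0.212114) * ax + 1.570729;
    double acosAbs = s * poly;                            // ~ acos(|x|)
    double acosX = (x < 0.0) ? kPi - acosAbs : acosAbs;   // acos(-a) = pi - acos(a)
    return kHalfPi - acosX;
}

// The final "-0.5 * log(r)" becomes -0.346574 * log2(r), since ln(r) = ln(2) * log2(r).
double negHalfLnAsCompiled(double r) { return -0.346574 * std::log2(r); }
```

The polynomial keeps the error around 5e-5 over [-1, 1], which is well below what is visible in a distance estimate used for ray marching.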
------------------------------------
Added: exe + source.

Edit shader.fx to change:
- the power p in z^p+c
- the max iteration count (4)
- the max raytrace step count (50)
- the distance threshold (-0.00025)
- 4x or 16x AA

30 FPS at 1920x1080 with default settings, on a GT200b at 1620 MHz.

F1: fly mode
F8: stereo (O, P, K, L adjust separation/convergence)

http://rapidshare.de/files/48733881/mandelbulb.enforcer.v1.zip.html
« Last Edit: November 24, 2009, 07:58:34 PM by Enforcer »
cKleinhuis
Administrator
Fractal Senior
formerly known as 'Trifox'
« Reply #1 on: November 24, 2009, 02:07:47 PM »

Any executables?!
cbuchner1
Fractal Phenom
« Reply #2 on: November 24, 2009, 09:17:36 PM »

Excellent. Any chance this would compile against DirectX 9 as well? DX10 is so Vista-only.

By the way, you rule. The scalar derivative is so much faster, in my OptiX-based raytracer too.

Christian
« Last Edit: November 24, 2009, 10:05:14 PM by cbuchner1 »
Enforcer
Guest
« Reply #3 on: November 25, 2009, 01:32:43 AM »

Quote from: cbuchner1
Excellent. Any chance this would compile against DirectX 9 as well? DX10 is so Vista-only.

This source obviously wouldn't.
Porting it looks like quite a bit of work; there is no similar DX9 sample in the SDK.
However, there is nothing (yet) that wouldn't work in DX9.
I found the DX10 API more programmer-friendly.
lycium
Fractal Supremo
« Reply #4 on: November 25, 2009, 02:22:49 AM »

Zomg, those are extremely impressive performance figures! I'm looking forward to trying out the code you've kindly provided at home, and finally getting rid of the fixed(ish)-step raymarching :)

Great job, and thanks for sharing.
cbuchner1
Fractal Phenom
« Reply #5 on: November 25, 2009, 05:21:17 PM »

Quote from: cbuchner1
By the way, you rule. The scalar derivative is so much faster, in my OptiX-based raytracer too.

Hmm, positive-power Mandelbulbs render faster with the scalar derivative, but the scalar derivative seems to break rendering for negative powers (Mandeliers, as I call them). I will investigate further.
Enforcer
Guest
« Reply #6 on: November 27, 2009, 02:47:52 AM »

Pre-computing the DE into a 3D texture makes rendering roughly 2-4 times faster (at comparable image quality).

60 FPS in 2560x1600, 3 iterations


44 FPS in 2560x1600, 6 iterations


In theory, sampling from the 3D texture should not interfere with computation, since there are many threads (warps) in flight, each at its own point in the code.
So balancing sampling throughput against computation could lead to the best possible performance.

Unfortunately, that's not what I see: ALU instructions between texture fetches are "free", but instructions after all the fetches do decrease performance.
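The baking code was not posted, so the following is only a hypothetical C++ sketch of the pre-computation idea: sample the DE once on a regular n^3 grid over a cube (the `DEGrid` name, resolution and bounds are assumptions), then read it back with trilinear interpolation, which is what a linearly filtered 3D texture fetch computes in hardware. Detail finer than the grid spacing cannot be recovered from the lookup.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch: a DE baked into an n^3 grid over [lo, hi]^3,
// sampled with trilinear interpolation like a linearly filtered 3D texture.
struct DEGrid {
    int n;
    double lo, hi;
    std::vector<float> values;  // n*n*n samples, x varies fastest

    DEGrid(int n_, double lo_, double hi_,
           const std::function<double(double, double, double)>& de)
        : n(n_), lo(lo_), hi(hi_), values(static_cast<std::size_t>(n_) * n_ * n_) {
        assert(n >= 2 && hi > lo);
        for (int k = 0; k < n; ++k)
            for (int j = 0; j < n; ++j)
                for (int i = 0; i < n; ++i)
                    values[idx(i, j, k)] = static_cast<float>(de(coord(i), coord(j), coord(k)));
    }

    std::size_t idx(int i, int j, int k) const {
        return (static_cast<std::size_t>(k) * n + j) * n + i;
    }
    double coord(int i) const { return lo + (hi - lo) * i / (n - 1); }

    // Trilinear lookup; points outside the cube are clamped to its faces.
    double sample(double x, double y, double z) const {
        auto toGrid = [&](double v, int& i0, double& f) {
            double g = (v - lo) / (hi - lo) * (n - 1);
            g = std::max(0.0, std::min(g, static_cast<double>(n - 1)));
            i0 = std::min(static_cast<int>(g), n - 2);
            f = g - i0;
        };
        int i, j, k;
        double fx, fy, fz;
        toGrid(x, i, fx);
        toGrid(y, j, fy);
        toGrid(z, k, fz);
        auto v = [&](int di, int dj, int dk) {
            return static_cast<double>(values[idx(i + di, j + dj, k + dk)]);
        };
        auto mix = [](double a, double b, double t) { return a + (b - a) * t; };
        double c00 = mix(v(0, 0, 0), v(1, 0, 0), fx);
        double c10 = mix(v(0, 1, 0), v(1, 1, 0), fx);
        double c01 = mix(v(0, 0, 1), v(1, 0, 1), fx);
        double c11 = mix(v(0, 1, 1), v(1, 1, 1), fx);
        return mix(mix(c00, c10, fy), mix(c01, c11, fy), fz);
    }
};
```

In a renderer, `de` would be the Mandelbulb DE and `sample` would replace the per-step DE call; trilinear interpolation reproduces any linear function exactly, which makes it easy to sanity-check.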
« Last Edit: November 27, 2009, 03:17:37 AM by Enforcer »
keldor314
Guest
« Reply #7 on: November 27, 2009, 02:14:16 PM »

I modified the shader a bit: it now renders with a dynamic level of detail, so you can zoom in and see more detail without causing aliasing. I simply multiply the ray-march minimum distance (the hit threshold) by the distance from the camera.





Here's the modified shader:

Code:
//--------------------------------------------------------------------------------------
// Constant Buffer Variables
//--------------------------------------------------------------------------------------
TextureCube txEnv;

SamplerState samLinear
{
    Filter = MIN_MAG_LINEAR_MIP_POINT;
    AddressU = Clamp;
    AddressV = Clamp;
    AddressW = Clamp;
};

cbuffer cbNeverChanges
{
    matrix View;
};

cbuffer cbChangeOnResize
{
    matrix Projection;
    float2 vReverseRes;
};

cbuffer cbChangesEveryFrame
{
    matrix World;
    matrix InvWorldViewProjection;
    matrix InvProjection;
    float4 vMeshColor;
};

struct VS_INPUT
{
    float4 Pos : POSITION;
    float3 Tex : TEXCOORD;
};

struct GS_INPUT
{
    float4 Pos : SV_POSITION;
    float3 Tex : TEXCOORD0;
    float4 View: POSITION;
};

struct PS_INPUT
{
    float4 Pos : SV_POSITION;
    float3 Tex1 : TEXCOORD0;
    float3 Tex2 : TEXCOORD1;
};

//--------------------------------------------------------------------------------------
// Vertex Shader
//--------------------------------------------------------------------------------------
PS_INPUT QuakeVS( VS_INPUT input )
{
    PS_INPUT output = (PS_INPUT)0;
    input.Pos.z = 1;
    output.Tex1.xyz = 0.15+vMeshColor.xyz*0.05;   //WSAD movement
    output.Tex2 = normalize(mul( input.Pos, InvWorldViewProjection ));
    output.Tex1 += output.Tex2*0.01;
    output.Pos =input.Pos;
    output.Pos.z = 0;
    
    return output;
}

GS_INPUT VS( VS_INPUT input )
{
    GS_INPUT output = (GS_INPUT)0;
    output.Pos = mul( input.Pos, World );
    output.Pos = mul( output.Pos, View );
    output.View = output.Pos;
    output.Pos = mul( output.Pos, Projection );
    output.Tex = input.Tex;
    
    return output;
}

[maxvertexcount(3)]
void GS( triangle GS_INPUT input[3], inout TriangleStream<PS_INPUT> TriStream )
{
    PS_INPUT output = (PS_INPUT)0;

    float3x3 m,n;
    m[0] = input[1].Tex - input[0].Tex;
    m[1] = input[2].Tex - input[0].Tex;
    m[2] = cross(m[0], m[1]);

    n[0] = normalize(input[1].View - input[0].View);
    n[1] = normalize(input[2].View - input[0].View);
    n[2] = cross(n[0], n[1]);
    
    for(int i=0; i<3; i++)
    {
        output.Pos = input[i].Pos;
        output.Tex1 = input[i].Tex;
        float3 Norm;
        Norm = input[i].View;
        Norm = mul(n,Norm);
        Norm = -mul(Norm,m);
        output.Tex2 = Norm;
        
        TriStream.Append( output );
    }
    TriStream.RestartStrip();
}

// power
#define P 8
inline void powN1(inout float3 z, float zr0, inout float dr) {
//  float zr = sqrt( dot(z,z) );
  float zo0 = asin( z.z/zr0 );
  float zi0 = atan2( z.y,z.x );

  float zr = pow( zr0, P-1 );
  float zo = zo0 * P;
  float zi = zi0 * P;
  
  dr = zr*dr*P + 1;
  zr *= zr0;
  z  = zr*float3( cos(zo)*cos(zi), cos(zo)*sin(zi), sin(zo) );
}

inline float DE(float3 z0)
{
  float3 z=z0;
  float r;
  float dr=1;
  int i=20;                   //max iteration count
  r=length(z);
  while(r<16. && i--) {
    powN1(z,r,dr);
    z+=z0;
    r=length(z);
  }
  return -0.5*log(r)*r/dr;
}

// 5% faster but pow8 only,  rename to use
inline float DE1(float3 z0)
{
  float3 z=z0;
  float r,r2;
  float dr=1;
  int i=4;                   //max iteration count
  r2=dot(z,z);
  r =sqrt(r2);
  while(r<2 && i--) {
    float zo0 = asin( z.z/r );
    float zi0 = atan2( z.y,z.x );

    float zr = r2*r2*r2*r;//pow( zr0, P-1 );
    float zo = zo0 * P;
    float zi = zi0 * P;
    
    dr = zr*dr*P + 1;
    zr *= r;
    z  = zr*float3( cos(zo)*cos(zi), cos(zo)*sin(zi), sin(zo) );

    z+=z0;
    r2=dot(z,z);
    r =sqrt(r2);
  }
  return -0.5*log(r)*r/dr;
}

inline float Tex(float3 t)
{
   float c2 = DE( t );
   return c2;
}

inline float3 CalcNorm(float3 t, float c)
{
   float delta=4.0/25600.0;
   float3 tx1 = t;
   tx1.x+=delta;
   float cx1 = Tex( tx1 );
   float3 ty1 = t;
   ty1.y+=delta;
   float cy1 = Tex( ty1 );
   float3 tz1 = t;
   tz1.z+=delta;
   float cz1 = Tex( tz1 );
   float3 d1 = float3(c-cx1,c-cy1,c-cz1);
   return normalize(d1);//*25600;
}

inline float3 CalcNormDD(float3 t, float c)
{
   float3 n1=ddx(t);
   float3 n2=ddy(t);
   return normalize(cross(n1,n2));
}

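inline void Ray1(inout float3 t, inout float c, in float3 Norm)
{
   float3 t0 = t;
   for (int i = 0; i < 450; i++) {      // max raytrace step count
      t += .75*Norm*c;                   // relaxed step: 75% of the distance estimate
      c = Tex(t);
      // distance threshold, scaled by the distance marched from t0 (dynamic LOD);
      // DE is negative outside, so this stops once c is within 0.0003*length(t-t0)
      // of zero (or has become positive)
      [branch] if (c > -0.0003*length(t-t0)) break;
   }
}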

//--------------------------------------------------------------------------------------
// Pixel Shader
//--------------------------------------------------------------------------------------
float4 PS2( PS_INPUT input) : SV_Target
{
   float3 Norm = normalize(input.Tex2);
   float3 t = input.Tex1;

   t-=0.5;t*=2.5;

   float c;
   c = Tex(t);
   Ray1(t,c,Norm);

   float3 dx = CalcNorm(t,c);

   float ao=-Tex(t+dx*0.05)*40+0.2;

   float3 reflVec = reflect(Norm,dx);
   float3 refl = txEnv.Sample( samLinear, -reflVec.zxy);

   float l =dot(dx,Norm);
   l *= l;

//   return float4(0,ao,0,c*256+1.0f);// * vMeshColor;+vMeshColor.x*4
   return float4(refl*2*l*ao,c*256*0.4+1.0f);// * vMeshColor;+vMeshColor.x*4
}

float4 PS2AA( PS_INPUT input) : SV_Target
{
  float3 x=ddx(input.Tex1)*0.5;
  float3 y=ddy(input.Tex1)*0.5;
  float3 nx=ddx(input.Tex2)*0.5;
  float3 ny=ddy(input.Tex2)*0.5;
  float4 c;
  c=PS2(input);
  input.Tex1+=x;
  input.Tex2+=nx;
  c+=PS2(input);
  input.Tex1+=y;
  input.Tex2+=ny;
  c+=PS2(input);
  input.Tex1-=x;
  input.Tex2-=nx;
  c+=PS2(input);
  return c*0.25;
}

float4 PS2AAA( PS_INPUT input) : SV_Target
{
  float3 x=ddx(input.Tex1)*0.5;
  float3 y=ddy(input.Tex1)*0.5;
  float3 nx=ddx(input.Tex2)*0.5;
  float3 ny=ddy(input.Tex2)*0.5;
  float4 c;
  c=PS2AA(input);
  input.Tex1+=x;
  input.Tex2+=nx;
  c+=PS2AA(input);
  input.Tex1+=y;
  input.Tex2+=ny;
  c+=PS2AA(input);
  input.Tex1-=x;
  input.Tex2-=nx;
  c+=PS2AA(input);
  return c*0.25;
}

BlendState NoBlending
{
    AlphaToCoverageEnable = FALSE;
    BlendEnable[0] = FALSE;
};

BlendState SrcBlending
{
    AlphaToCoverageEnable = FALSE;
    BlendEnable[0] = TRUE;
    SrcBlend = SRC_ALPHA;
    DestBlend = INV_SRC_ALPHA;
    BlendOp = ADD;
};

DepthStencilState DisableDepth
{
    DepthEnable = FALSE;
    DepthWriteMask = ZERO;
};

//--------------------------------------------------------------------------------------
technique10 Render1
{
    pass P0
    {
        SetVertexShader( CompileShader( vs_4_0, VS() ) );
        SetGeometryShader( NULL );
        SetPixelShader( NULL );
        SetBlendState( SrcBlending, float4( 0.0f, 0.0f, 0.0f, 0.0f ), 0xFFFFFFFF );
        SetDepthStencilState( DisableDepth, 0 );
    }
}

technique10 Render2
{
    pass P0
    {
        SetVertexShader( CompileShader( vs_4_0, VS() ) );
        SetGeometryShader( CompileShader( gs_4_0, GS() ) );
        // PS2AA  for 4x antialiasing
        // PS2AAA for 16x antialiasing
        SetPixelShader( CompileShader( ps_4_0, PS2() ) );
        SetBlendState( SrcBlending, float4( 0.0f, 0.0f, 0.0f, 0.0f ), 0xFFFFFFFF );
        SetDepthStencilState( DisableDepth, 0 );
//        SetPixelShader( NULL );
    }
}

// fly mode
technique10 RenderQuad
{
    pass P0
    {
        SetVertexShader( CompileShader( vs_4_0, QuakeVS() ) );
        SetGeometryShader( NULL );
//        SetPixelShader( NULL );
        // PS2AA  for 4x antialiasing
        // PS2AAA for 16x antialiasing
        SetPixelShader( CompileShader( ps_4_0, PS2() ) );
        SetDepthStencilState( DisableDepth, 0 );
    }
}

I also increased the max iteration count (to 20) and the max ray-marching step count (to 450), and pushed the bailout radius out to 16 to improve distance-estimation accuracy. In addition, I multiply the ray step distance by 0.75 to relax the stepping a bit, which removes some artifacts when certain parts of the fractal are viewed from certain angles.
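A minimal C++ sketch of the marching loop described here: each step advances 0.75 times the distance estimate, and the hit threshold is 0.0003 times the distance travelled from the ray's start point, which is where the dynamic level of detail comes from. It assumes a DE that is positive outside (the posted shader uses a negated DE, so its test reads `c > -0.0003*length(t-t0)`); `march`, `MarchResult` and the unit-sphere test scene are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Sketch of the Ray1 loop with a positive-outside distance estimate.
struct Vec3 { double x, y, z; };

struct MarchResult { bool hit; Vec3 p; int steps; };

MarchResult march(Vec3 origin, Vec3 dir, const std::function<double(const Vec3&)>& de,
                  int maxSteps = 450,           // max raytrace step count
                  double relax = 0.75,          // fraction of the DE taken per step
                  double lodFactor = 0.0003) {  // threshold per unit distance travelled
    Vec3 p = origin;
    double d = de(p);
    for (int i = 0; i < maxSteps; ++i) {
        p = {p.x + relax * d * dir.x, p.y + relax * d * dir.y, p.z + relax * d * dir.z};
        d = de(p);
        double dx = p.x - origin.x, dy = p.y - origin.y, dz = p.z - origin.z;
        double travelled = std::sqrt(dx * dx + dy * dy + dz * dz);
        // Dynamic LOD: the hit threshold grows linearly with distance from the ray start.
        if (d < lodFactor * travelled) return {true, p, i + 1};
    }
    return {false, p, maxSteps};
}
```

For a unit sphere and a ray from (0, 0, -5) towards +z, the loop stops just outside z = -1 after a handful of steps; farther surfaces get a proportionally looser threshold, so distant geometry is resolved more coarsely than nearby geometry.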
« Last Edit: November 27, 2009, 02:33:00 PM by keldor314 »
keldor314
Guest
« Reply #8 on: November 27, 2009, 02:54:02 PM »

One big improvement to the camera controls would be to multiply the camera speed by the distance estimate to the fractal, so the camera moves more slowly the closer it gets to the fractal.
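The suggestion can be sketched in C++ as a fly-mode update that scales the step by the DE at the camera position. `moveCamera`, `speedPerUnitDistance` and `minSpeed` are hypothetical names, and `std::fabs` is used because the posted shaders return a negated DE.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>

// Hypothetical fly-mode update: camera speed proportional to the distance estimate.
struct Vec3 { double x, y, z; };

Vec3 moveCamera(const Vec3& pos, const Vec3& dir, double dt,
                const std::function<double(const Vec3&)>& de,
                double speedPerUnitDistance = 1.0,  // assumed tuning constant
                double minSpeed = 1e-6) {           // assumed floor so the camera never stalls
    double speed = std::max(std::fabs(de(pos)) * speedPerUnitDistance, minSpeed);
    double step = speed * dt;
    return {pos.x + dir.x * step, pos.y + dir.y * step, pos.z + dir.z * step};
}
```

If the DE never overestimates the true distance and `speedPerUnitDistance * dt` stays below 1, a single update (apart from the tiny `minSpeed` floor) moves at most the estimated distance, so it cannot carry the camera through the surface.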
lycium
Fractal Supremo
« Reply #9 on: November 27, 2009, 05:03:14 PM »

iq writes in another thread, about his zooming video:

I have used LOD here. The contact-epsilon used in the distance field raymarcher depends on the distance from the point to the camera and the field of view: eps = k * t / sqrt( 1 + focalLength^2) or something similar, you do the math again, I lost the paper somewhere.
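Taken literally, iq's rule is a one-liner; here it is as a C++ sketch, with the caveat from his own quote that the exact form may differ. Compared with keldor314's shader, which uses a fixed factor (`0.0003*length(t-t0)`), it adds a dependence on the field of view through the focal length. `contactEpsilon` and its parameter names are hypothetical, and the focal length is assumed to be in whatever units the camera setup uses.

```cpp
#include <cassert>
#include <cmath>

// iq's (self-described approximate) LOD rule: the contact epsilon grows linearly
// with the distance t along the ray and shrinks as the focal length increases.
// k is a quality constant to be tuned.
double contactEpsilon(double k, double t, double focalLength) {
    return k * t / std::sqrt(1.0 + focalLength * focalLength);
}
```

Because the result is linear in `t`, doubling the distance doubles the epsilon, which is the same scaling keldor314 applies.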
flexiverse
Safarist
« Reply #10 on: July 12, 2010, 05:41:36 PM »

Has anyone got the link to this program/code? 

http://rapidshare.de/files/48733881/mandelbulb.enforcer.v1.zip.html

The above link does not work?
quaternion
Guest
« Reply #11 on: July 12, 2010, 08:52:07 PM »

Great!!!
quaternion
Guest
« Reply #12 on: July 12, 2010, 09:01:56 PM »

Quote from: flexiverse
The above link does not work?

No
dapa
Guest
« Reply #13 on: September 20, 2010, 10:57:13 PM »

Please, Enforcer or anyone else, post the source code and binaries again! :(
Nahee_Enterprises
World Renowned
Fractal Senior
« Reply #14 on: September 27, 2010, 03:25:50 PM »

Quote from: dapa
Please, Enforcer or anyone else, post the source code and binaries again! :(

Greetings, and welcome to this particular forum!!! :)

I am afraid that "Enforcer" has not posted a thing to the Forums since 11/26/2009, and was last logged into the Forums on 07/21/2010.  It appears they are rarely here anymore.

It might be difficult to obtain what was once available at the above links.
 