Realtime rendering/optimisations

Enforcer

Guest

« on: November 24, 2009, 12:25:01 PM »

It's not fast enough for 2560x1600 yet, but who knows... the Fermi chip is hopefully coming soon.
60 FPS at 1280x800 on a GT200b
3 iterations:

4 iterations:

"Improvement" over what I've seen on the forum:
- scalar derivative computation

These images were produced by the following HLSL code:

Code:

#define P 8
inline void powN1(inout float3 z, float zr0, inout float dr) {
// float zr = sqrt( dot(z,z) );
  // spherical angles of z (zr0 is |z|, passed in by the caller)
  float zo0 = asin( z.z/zr0 );
  float zi0 = atan2( z.y,z.x );

  float zr = pow( zr0, P-1 );
  float zo = zo0 * P;
  float zi = zi0 * P;

  // scalar derivative: dr' = P*|z|^(P-1)*dr + 1
  dr = zr*dr*P + 1;
  zr *= zr0; // zr is now |z|^P
  z = zr*float3( cos(zo)*cos(zi), cos(zo)*sin(zi), sin(zo) );
}

inline float DE(float3 z0)
{
  float3 z=z0;
  float r;
  float dr=1;
  int i=4;
  r=length(z);
  while(r<4 && i--) {
   powN1(z,r,dr);
   z+=z0;
   r=length(z);
  }
  return -0.5*log(r)*r/dr;
}
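
A CPU-side sketch of the same distance estimate, for checking values outside the shader. This is a Python port for illustration only: the names `pow_n1` and `de`, the `max_iter`/`bailout` parameters and the clamp before `asin` are not from the shader. It shows the scalar derivative: a single float `dr` updated as dr = P*r^(P-1)*dr + 1 per iteration.

```python
import math

P = 8  # power, as in the #define above


def pow_n1(z, r0, dr):
    """One z -> z^P step in the shader's spherical form; returns (z, dr)."""
    x, y, zz = z
    # Clamp guards asin against rounding; the HLSL has no such clamp.
    zo = math.asin(max(-1.0, min(1.0, zz / r0))) * P
    zi = math.atan2(y, x) * P
    zr = r0 ** (P - 1)
    dr = zr * dr * P + 1.0          # scalar running derivative
    zr *= r0                        # zr is now r0**P
    return (zr * math.cos(zo) * math.cos(zi),
            zr * math.cos(zo) * math.sin(zi),
            zr * math.sin(zo)), dr


def de(z0, max_iter=4, bailout=4.0):
    """Port of DE(): -0.5*log(r)*r/dr. Undefined at the exact origin."""
    z, dr = z0, 1.0
    r = math.sqrt(sum(c * c for c in z))
    i = max_iter
    while r < bailout and i > 0:
        i -= 1
        z, dr = pow_n1(z, r, dr)
        z = tuple(a + b for a, b in zip(z, z0))
        r = math.sqrt(sum(c * c for c in z))
    return -0.5 * math.log(r) * r / dr
```

For example, `de((2.0, 0.0, 0.0))` takes one iteration (z becomes (258, 0, 0), dr becomes 1025) and returns -0.5*ln(258)*258/1025. Note the sign: with this formula the estimate is negative wherever the final r is above 1.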

DX10 bytecode disassembly:

Code:

dp3 r0.w, r1.xyzx, r1.xyzx
   sqrt r0.w, r0.w
   mov r2.xyz, r1.xyzx
   mov r1.w, r0.w
   mov r2.w, l(1.000000)
   mov r3.x, l(4)
   loop
   lt r3.y, r1.w, l(4.000000)
   iadd r3.z, r3.x, l(-1)
   ine r3.w, r3.x, l(0)
   and r3.y, r3.y, r3.w
   mov r3.x, r3.z
   breakc_z r3.y
   div r3.y, r2.z, r1.w
   add r3.w, -|r3.y|, l(1.000000)
   sqrt r3.w, r3.w
   mad r4.x, |r3.y|, l(-0.018729), l(0.074261)
   mad r4.x, r4.x, |r3.y|, l(-0.212114)
   mad r4.x, r4.x, |r3.y|, l(1.570729)
   mul r4.y, r3.w, r4.x
   mad r4.y, r4.y, l(-2.000000), l(3.141593)
   lt r3.y, r3.y, -r3.y
   and r3.y, r4.y, r3.y
   mad r3.y, r4.x, r3.w, r3.y
   add r3.y, -r3.y, l(1.570796)
   min r3.w, |r2.x|, |r2.y|
   max r4.x, |r2.x|, |r2.y|
   div r4.x, l(1.000000, 1.000000, 1.000000, 1.000000), r4.x
   mul r3.w, r3.w, r4.x
   mul r4.x, r3.w, r3.w
   mad r4.y, r4.x, l(0.020835), l(-0.085133)
   mad r4.y, r4.x, r4.y, l(0.180141)
   mad r4.y, r4.x, r4.y, l(-0.330299)
   mad r4.x, r4.x, r4.y, l(0.999866)
   mul r4.y, r3.w, r4.x
   lt r4.z, |r2.x|, |r2.y|
   mad r4.y, r4.y, l(-2.000000), l(1.570796)
   and r4.y, r4.z, r4.y
   mad r3.w, r3.w, r4.x, r4.y
   lt r4.x, r2.x, -r2.x
   and r4.x, r4.x, l(0xc0490fdb)
   add r3.w, r3.w, r4.x
   min r4.x, r2.x, r2.y
   max r4.y, r2.x, r2.y
   lt r4.x, r4.x, -r4.x
   ge r4.y, r4.y, -r4.y
   and r4.x, r4.x, r4.y
   movc r3.w, r4.x, -r3.w, r3.w
   log r4.x, r1.w
   mul r4.x, r4.x, l(7.000000)
   exp r4.x, r4.x
   mul r3.yw, r3.yyyw, l(0.000000, 8.000000, 0.000000, 8.000000)
   mul r4.y, r2.w, r4.x
   mad r2.w, r4.y, l(8.000000), l(1.000000)
   mul r4.x, r1.w, r4.x
   sincos null, r4.yz, r3.yywy
   mul r5.x, r4.z, r4.y
   sincos r3.w, null, r3.w
   mul r5.y, r4.y, r3.w
   sincos r5.z, null, r3.y
   mad r2.xyz, r4.xxxx, r5.xyzx, r1.xyzx
   dp3 r3.y, r2.xyzx, r2.xyzx
   sqrt r1.w, r3.y
   mov r3.x, r3.z
   endloop
   log r0.w, r1.w
   mul r0.w, r1.w, r0.w
   mul r0.w, r0.w, l(-0.346574)
   div r0.w, r0.w, r2.w
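
Some of the literals in the listing are easier to read with a quick check. The sketch below is Python, not part of the original post, and the helper names are made up. It relies on the D3D10 assembly `log` and `exp` being base 2, which is why `pow(r, 7)` shows up as log / mul 7 / exp and why the final `-0.5*log(r)` becomes a multiply by -0.346574. The first mad chain looks like a cubic-times-sqrt approximation of acos (asin is then pi/2 minus that), and `0xc0490fdb` is the bit pattern of -pi as a 32-bit float.

```python
import math
import struct

# -0.5*ln(r) written with a base-2 log is -0.5*ln(2)*log2(r)
FINAL_SCALE = -0.5 * math.log(2.0)   # about -0.346574


def pow7_via_log2(r):
    """pow(r, 7) evaluated as exp2(7 * log2(r)), like log / mul / exp."""
    return 2.0 ** (7.0 * math.log2(r))


def acos_poly(x):
    """acos(x) for 0 <= x <= 1 using the four literals of the first mad chain."""
    p = -0.018729 * x + 0.074261
    p = p * x - 0.212114
    p = p * x + 1.570729
    return math.sqrt(1.0 - x) * p


def float_from_bits(bits):
    """Reinterpret a 32-bit pattern as an IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


NEG_PI = float_from_bits(0xC0490FDB)  # added to atan when x < 0
```

On a 0..1 grid the polynomial stays within about 1e-4 of `math.acos`, which is consistent with the compiler inlining a short approximation instead of calling a library routine.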

------------------------------------
Added: exe + source

Edit shader.fx to change:
- power P in z^P+c
- max iteration count (4)
- max raytrace step count (50)
- distance threshold (-0.00025)
- 4x / 16x AA

30 FPS at 1920x1080 on default settings, GT200b at 1620 MHz

F1 - fly mode
F8 - stereo, O,P,K,L - separation/convergence

http://rapidshare.de/files/48733881/mandelbulb.enforcer.v1.zip.html


« Last Edit: November 24, 2009, 07:58:34 PM by Enforcer »

cKleinhuis

Administrator
Fractal Senior

Posts: 7044

formerly known as 'Trifox'

Re: Realtime rendering/optimisations

« Reply #1 on: November 24, 2009, 02:07:47 PM »

any executables ?!



---

divide and conquer - iterate and rule - chaos is No random!

cbuchner1

Fractal Phenom

Posts: 443

Re: Realtime rendering/optimisations

« Reply #2 on: November 24, 2009, 09:17:36 PM »

Excellent. Any chance this would compile against DirectX 9 as well? DX10 is so Vista-only.

By the way, you rule. The scalar derivative is so much faster, also in my Optix based raytracer.

Christian


« Last Edit: November 24, 2009, 10:05:14 PM by cbuchner1 »

Enforcer

Guest

Re: Realtime rendering/optimisations

« Reply #3 on: November 25, 2009, 01:32:43 AM »

Quote from: cbuchner1 on November 24, 2009, 09:17:36 PM

Excellent. Any chance this would compile against DirectX 9 as well? DX10 is so Vista-only.

This source obviously wouldn't.
It looks like quite a bit of work, as there is no similar DX9 sample in the SDK.
There is nothing (yet) that wouldn't work in DX9, however.
I found the DX10 API more programming-friendly.



lycium

Fractal Supremo

Posts: 1158

Re: Realtime rendering/optimisations

« Reply #4 on: November 25, 2009, 02:22:49 AM »

zomg those are extremely impressive performance figures! i'm looking forward to trying out the code you've kindly provided at home, finally getting rid of the fixed(ish) step raymarching

great job and thanks for sharing.



http://chaoticafractals.com | http://indigorenderer.com | http://lyc.deviantart.com

cbuchner1

Fractal Phenom

Posts: 443

Re: Realtime rendering/optimisations

« Reply #5 on: November 25, 2009, 05:21:17 PM »

Quote from: cbuchner1 on November 24, 2009, 09:17:36 PM

By the way, you rule. The scalar derivative is so much faster, also in my Optix based raytracer.

Hmm the positive power Mandelbulbs render faster with the scalar derivative, but the scalar derivative seems to break the rendering for negative powers (Mandeliers, as I call them). Will investigate further.



Enforcer

Guest

Re: Realtime rendering/optimisations

« Reply #6 on: November 27, 2009, 02:47:52 AM »

Pre-computing DE makes rendering ~2-4 times faster (at comparable image quality)

60 FPS in 2560x1600, 3 iterations

44 FPS in 2560x1600, 6 iterations

In theory, sampling from a 3D texture should not interfere with computation, since there are many threads (warps) in flight, each at its own point in the code.
So balancing sampling throughput against computation should give the best possible performance.

Unfortunately, that's not what I see. ALU instructions between texture fetches are "free", but instructions after all the fetches do decrease performance.
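
A rough CPU-side sketch of the pre-computation idea, in Python. The post gives no details, so everything concrete here is an assumption: the grid size `n`, the bounds `lo`/`hi`, the function names, and the hand-written trilinear lookup standing in for a hardware 3D texture fetch. The DE is evaluated once per grid node, and later lookups only interpolate.

```python
import math


def de(z0, power=8, max_iter=4, bailout=4.0):
    """CPU port of the thread's DE(); the r == 0 guard is added here."""
    z, dr = z0, 1.0
    r = math.sqrt(sum(c * c for c in z))
    for _ in range(max_iter):
        if r >= bailout or r == 0.0:
            break
        zo = math.asin(max(-1.0, min(1.0, z[2] / r))) * power
        zi = math.atan2(z[1], z[0]) * power
        zr = r ** (power - 1)
        dr = zr * dr * power + 1.0
        zr *= r
        z = (zr * math.cos(zo) * math.cos(zi) + z0[0],
             zr * math.cos(zo) * math.sin(zi) + z0[1],
             zr * math.sin(zo) + z0[2])
        r = math.sqrt(sum(c * c for c in z))
    return -0.5 * math.log(r) * r / dr if r > 0.0 else 0.0


def bake(n=8, lo=-1.5, hi=1.5):
    """Evaluate DE once per grid node; returns (grid, step)."""
    step = (hi - lo) / (n - 1)
    axis = [lo + i * step for i in range(n)]
    return [[[de((x, y, z)) for z in axis] for y in axis] for x in axis], step


def sample(grid, lo, step, p):
    """Trilinear lookup with clamped addressing."""
    n = len(grid)
    idx, frac = [], []
    for c in p:
        u = min(max((c - lo) / step, 0.0), n - 1.0)
        i0 = min(int(u), n - 2)
        idx.append(i0)
        frac.append(u - i0)
    (i, j, k), (fx, fy, fz) = idx, frac

    def lerp(a, b, t):
        return a + (b - a) * t

    c00 = lerp(grid[i][j][k], grid[i + 1][j][k], fx)
    c10 = lerp(grid[i][j + 1][k], grid[i + 1][j + 1][k], fx)
    c01 = lerp(grid[i][j][k + 1], grid[i + 1][j][k + 1], fx)
    c11 = lerp(grid[i][j + 1][k + 1], grid[i + 1][j + 1][k + 1], fx)
    return lerp(lerp(c00, c10, fy), lerp(c01, c11, fy), fz)
```

Usage would be `grid, step = bake()` once, then `sample(grid, -1.5, step, p)` per ray step. A real implementation would use a much finer grid; 8 nodes per axis only keeps the sketch quick to run.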


« Last Edit: November 27, 2009, 03:17:37 AM by Enforcer »

keldor314

Guest

Re: Realtime rendering/optimisations

« Reply #7 on: November 27, 2009, 02:14:16 PM »

I modified the shader a bit: it now renders with dynamic level of detail, so you can zoom in and see more detail without causing aliasing. I simply multiply the ray march minimum distance by the distance from the camera.

Here's the modified shader:

Code:

//--------------------------------------------------------------------------------------
// Constant Buffer Variables
//--------------------------------------------------------------------------------------
TextureCube txEnv;

SamplerState samLinear
{
   Filter = MIN_MAG_LINEAR_MIP_POINT;
   AddressU = Clamp;
   AddressV = Clamp;
   AddressW = Clamp;
};

cbuffer cbNeverChanges
{
   matrix View;
};

cbuffer cbChangeOnResize
{
   matrix Projection;
   float2 vReverseRes;
};

cbuffer cbChangesEveryFrame
{
   matrix World;
   matrix InvWorldViewProjection;
   matrix InvProjection;
   float4 vMeshColor;
};

struct VS_INPUT
{
   float4 Pos : POSITION;
   float3 Tex : TEXCOORD;
};

struct GS_INPUT
{
   float4 Pos : SV_POSITION;
   float3 Tex : TEXCOORD0;
   float4 View: POSITION;
};

struct PS_INPUT
{
   float4 Pos : SV_POSITION;
   float3 Tex1 : TEXCOORD0;
   float3 Tex2 : TEXCOORD1;
};

//--------------------------------------------------------------------------------------
// Vertex Shader
//--------------------------------------------------------------------------------------
PS_INPUT QuakeVS( VS_INPUT input )
{
   PS_INPUT output = (PS_INPUT)0;
   input.Pos.z = 1;
   output.Tex1.xyz = 0.15+vMeshColor.xyz*0.05; //WSAD movement
   output.Tex2 = normalize(mul( input.Pos, InvWorldViewProjection ));
   output.Tex1 += output.Tex2*0.01;
   output.Pos =input.Pos;
   output.Pos.z = 0;

   return output;
}

GS_INPUT VS( VS_INPUT input )
{
   GS_INPUT output = (GS_INPUT)0;
   output.Pos = mul( input.Pos, World );
   output.Pos = mul( output.Pos, View );
   output.View = output.Pos;
   output.Pos = mul( output.Pos, Projection );
   output.Tex = input.Tex;

   return output;
}

[maxvertexcount(3)]
void GS( triangle GS_INPUT input[3], inout TriangleStream<PS_INPUT> TriStream )
{
   PS_INPUT output = (PS_INPUT)0;

   float3x3 m,n;
   m[0] = input[1].Tex - input[0].Tex;
   m[1] = input[2].Tex - input[0].Tex;
   m[2] = cross(m[0], m[1]);

   n[0] = normalize(input[1].View - input[0].View);
   n[1] = normalize(input[2].View - input[0].View);
   n[2] = cross(n[0], n[1]);

   for(int i=0; i<3; i++)
   {
      output.Pos = input[i].Pos;
      output.Tex1 = input[i].Tex;
      float3 Norm;
      Norm = input[i].View;
      Norm = mul(n,Norm);
      Norm = -mul(Norm,m);
      output.Tex2 = Norm;

      TriStream.Append( output );
   }
   TriStream.RestartStrip();
}

// power
#define P 8
inline void powN1(inout float3 z, float zr0, inout float dr) {
// float zr = sqrt( dot(z,z) );
  float zo0 = asin( z.z/zr0 );
  float zi0 = atan2( z.y,z.x );

  float zr = pow( zr0, P-1 );
  float zo = zo0 * P;
  float zi = zi0 * P;

  dr = zr*dr*P + 1;
  zr *= zr0;
  z = zr*float3( cos(zo)*cos(zi), cos(zo)*sin(zi), sin(zo) );
}

inline float DE(float3 z0)
{
  float3 z=z0;
  float r;
  float dr=1;
  int i=20; //max iteration count
  r=length(z);
  while(r<16. && i--) {
   powN1(z,r,dr);
   z+=z0;
   r=length(z);
  }
  return -0.5*log(r)*r/dr;
}

// 5% faster but pow8 only, rename to use
inline float DE1(float3 z0)
{
  float3 z=z0;
  float r,r2;
  float dr=1;
  int i=4; //max iteration count
  r2=dot(z,z);
  r =sqrt(r2);
  while(r<2 && i--) {
   float zo0 = asin( z.z/r );
   float zi0 = atan2( z.y,z.x );

   float zr = r2*r2*r2*r;//pow( zr0, P-1 );
   float zo = zo0 * P;
   float zi = zi0 * P;

   dr = zr*dr*P + 1;
   zr *= r;
   z = zr*float3( cos(zo)*cos(zi), cos(zo)*sin(zi), sin(zo) );

   z+=z0;
   r2=dot(z,z);
   r =sqrt(r2);
  }
  return -0.5*log(r)*r/dr;
}

inline float Tex(float3 t)
{
   float c2 = DE( t );
   return c2;
}

inline float3 CalcNorm(float3 t, float c)
{
   float delta=4.0/25600.0;
   float3 tx1 = t;
   tx1.x+=delta;
   float cx1 = Tex( tx1 );
   float3 ty1 = t;
   ty1.y+=delta;
   float cy1 = Tex( ty1 );
   float3 tz1 = t;
   tz1.z+=delta;
   float cz1 = Tex( tz1 );
   float3 d1 = float3(c-cx1,c-cy1,c-cz1);
   return normalize(d1);//*25600;
}

inline float3 CalcNormDD(float3 t, float c)
{
   float3 n1=ddx(t);
   float3 n2=ddy(t);
   return normalize(cross(n1,n2));
}

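inline void Ray1(inout float3 t, inout float c, in float3 Norm)
{
   float3 t0 = t; // ray start, used to scale the threshold
   //max raytrace step count
   for (int i = 0; i < 450; i++) {
      t += .75*Norm*c; // relaxed step: 0.75 of the distance estimate
      c = Tex(t);
      //distance threshold, scaled by the distance travelled from t0
      [branch] if (c > -0.0003*length(t-t0)) break;
   }
}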

//--------------------------------------------------------------------------------------
// Pixel Shader
//--------------------------------------------------------------------------------------
float4 PS2( PS_INPUT input) : SV_Target
{
   float3 Norm = normalize(input.Tex2);
   float3 t = input.Tex1;

   t-=0.5;t*=2.5;

   float c;
   c = Tex(t);
   Ray1(t,c,Norm);

   float3 dx = CalcNorm(t,c);

   float ao=-Tex(t+dx*0.05)*40+0.2;

   float3 reflVec = reflect(Norm,dx);
   float3 refl = txEnv.Sample( samLinear, -reflVec.zxy);

   float l =dot(dx,Norm);
   l *= l;

// return float4(0,ao,0,c*256+1.0f);// * vMeshColor;+vMeshColor.x*4
   return float4(refl*2*l*ao,c*256*0.4+1.0f);// * vMeshColor;+vMeshColor.x*4
}

float4 PS2AA( PS_INPUT input) : SV_Target
{
  float3 x=ddx(input.Tex1)*0.5;
  float3 y=ddy(input.Tex1)*0.5;
  float3 nx=ddx(input.Tex2)*0.5;
  float3 ny=ddy(input.Tex2)*0.5;
  float4 c;
  c=PS2(input);
  input.Tex1+=x;
  input.Tex2+=nx;
  c+=PS2(input);
  input.Tex1+=y;
  input.Tex2+=ny;
  c+=PS2(input);
  input.Tex1-=x;
  input.Tex2-=nx;
  c+=PS2(input);
  return c*0.25;
}

float4 PS2AAA( PS_INPUT input) : SV_Target
{
  float3 x=ddx(input.Tex1)*0.5;
  float3 y=ddy(input.Tex1)*0.5;
  float3 nx=ddx(input.Tex2)*0.5;
  float3 ny=ddy(input.Tex2)*0.5;
  float4 c;
  c=PS2AA(input);
  input.Tex1+=x;
  input.Tex2+=nx;
  c+=PS2AA(input);
  input.Tex1+=y;
  input.Tex2+=ny;
  c+=PS2AA(input);
  input.Tex1-=x;
  input.Tex2-=nx;
  c+=PS2AA(input);
  return c*0.25;
}

BlendState NoBlending
{
   AlphaToCoverageEnable = FALSE;
   BlendEnable[0] = FALSE;
};

BlendState SrcBlending
{
   AlphaToCoverageEnable = FALSE;
   BlendEnable[0] = TRUE;
   SrcBlend = SRC_ALPHA;
   DestBlend = INV_SRC_ALPHA;
   BlendOp = ADD;
};

DepthStencilState DisableDepth
{
   DepthEnable = FALSE;
   DepthWriteMask = ZERO;
};

//--------------------------------------------------------------------------------------
technique10 Render1
{
   pass P0
   {
   SetVertexShader( CompileShader( vs_4_0, VS() ) );
   SetGeometryShader( NULL );
   SetPixelShader( NULL );
   SetBlendState( SrcBlending, float4( 0.0f, 0.0f, 0.0f, 0.0f ), 0xFFFFFFFF );
   SetDepthStencilState( DisableDepth, 0 );
   }
}

technique10 Render2
{
   pass P0
   {
   SetVertexShader( CompileShader( vs_4_0, VS() ) );
   SetGeometryShader( CompileShader( gs_4_0, GS() ) );
   // PS2AA for 4x antialiasing
   // PS2AAA for 16x antialiasing
   SetPixelShader( CompileShader( ps_4_0, PS2() ) );
   SetBlendState( SrcBlending, float4( 0.0f, 0.0f, 0.0f, 0.0f ), 0xFFFFFFFF );
   SetDepthStencilState( DisableDepth, 0 );
// SetPixelShader( NULL );
   }
}

// fly mode
technique10 RenderQuad
{
   pass P0
   {
   SetVertexShader( CompileShader( vs_4_0, QuakeVS() ) );
   SetGeometryShader( NULL );
// SetPixelShader( NULL );
   // PS2AA for 4x antialiasing
   // PS2AAA for 16x antialiasing
   SetPixelShader( CompileShader( ps_4_0, PS2() ) );
   SetDepthStencilState( DisableDepth, 0 );
   }
}

I also increased the max iterations and ray marching steps, as well as pushing the bailout out to 16 to increase distance estimation accuracy. In addition, I'm multiplying the ray step distance by .75 to relax the stepping a bit, which removes some artifacting when you view certain parts of the fractal from certain angles.
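
The stepping loop described above can be sketched on the CPU like this (Python, illustration only). Two things differ from the shader and are assumptions of the sketch: `dist_fn` returns a positive distance outside the surface, whereas the shader's DE is negative there, and a unit sphere stands in for the fractal so the result is easy to verify. The names `march`, `sphere`, `relax` and `k` are made up; the defaults 450, 0.75 and 0.0003 are the values from the modified shader.

```python
import math


def length(v):
    return math.sqrt(sum(c * c for c in v))


def march(origin, direction, dist_fn, max_steps=450, relax=0.75, k=0.0003):
    """Step along a unit direction by relax * distance estimate.

    Stops when the estimate drops below k times the distance travelled,
    so the hit threshold grows with distance from the camera.
    Returns (point, hit).
    """
    p = origin
    d = dist_fn(p)
    for _ in range(max_steps):
        p = tuple(a + relax * d * b for a, b in zip(p, direction))
        d = dist_fn(p)
        travelled = length(tuple(a - b for a, b in zip(p, origin)))
        if d < k * travelled:
            return p, True
    return p, False


def sphere(p):
    """Stand-in distance function: unit sphere at the origin."""
    return length(p) - 1.0
```

With `march((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), sphere)` the remaining distance shrinks by a factor of 4 per step (2, 0.5, 0.125, ...), and the loop stops once it falls under 0.0003 times the roughly 2 units travelled.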


« Last Edit: November 27, 2009, 02:33:00 PM by keldor314 »

keldor314

Guest

Re: Realtime rendering/optimisations

« Reply #8 on: November 27, 2009, 02:54:02 PM »

One big improvement to the camera controls would be to multiply the camera speed by the distance estimation from the fractal. Thus, the camera would move slower the closer it is to the fractal.
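
A minimal sketch of that camera rule in Python. The function name, `base_speed`, `dt` and the `min_dist` floor are all assumptions, and a unit sphere stands in for the fractal's distance estimate. The absolute value is there because the thread's DE is negative outside the set.

```python
import math


def camera_step(pos, direction, dist_fn, base_speed=1.0, dt=1.0 / 60.0,
                min_dist=1e-4):
    """Move the camera by base_speed * |distance estimate| * dt."""
    # Floor the scale so the camera never stalls completely at the surface.
    scale = max(abs(dist_fn(pos)), min_dist)
    step = base_speed * scale * dt
    return tuple(p + step * d for p, d in zip(pos, direction))


def sphere(p):
    """Stand-in distance function: unit sphere at the origin."""
    return math.sqrt(sum(c * c for c in p)) - 1.0
```

As long as `base_speed * dt` stays below 1, each step is shorter than the estimated distance to the surface, so the camera slows down on approach instead of flying through.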



lycium

Fractal Supremo

Posts: 1158

Re: Realtime rendering/optimisations

« Reply #9 on: November 27, 2009, 05:03:14 PM »

iq reports in another thread for his zooming video:

I have used LOD here. The contact-epsilon used in the distance field raymarcher depends on the distance from the point to the camera and the field of view: eps = k * t / sqrt( 1 + focalLength^2) or something similar, you do the math again, I lost the paper somewhere.
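
Taking the quoted formula at face value (the quote itself says "or something similar"), it can be written as a one-liner. The Python below is only a sketch: the function name and the default `k` are made up, and the units of `focal_length` are not given in the quote.

```python
import math


def contact_eps(t, focal_length, k=0.001):
    """eps = k * t / sqrt(1 + focal_length**2), as quoted above."""
    return k * t / math.sqrt(1.0 + focal_length * focal_length)
```

The epsilon grows linearly with the distance t and shrinks as the focal length increases. For a fixed focal length it reduces to a constant times t, which is the same form as the `0.0003*length(t-t0)` threshold in the modified shader earlier in the thread.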



http://chaoticafractals.com | http://indigorenderer.com | http://lyc.deviantart.com

flexiverse

Safarist

Posts: 99

Re: Realtime rendering/optimisations

« Reply #10 on: July 12, 2010, 05:41:36 PM »

Has anyone got the link to this program/code?

http://rapidshare.de/files/48733881/mandelbulb.enforcer.v1.zip.html

The above link does not work?



quaternion

Guest

Re: Realtime rendering/optimisations

« Reply #11 on: July 12, 2010, 08:52:07 PM »

Great!!!



quaternion

Guest

Re: Realtime rendering/optimisations

« Reply #12 on: July 12, 2010, 09:01:56 PM »

Quote from: flexiverse on July 12, 2010, 05:41:36 PM

The above link does not work?



dapa

Guest

Re: Realtime rendering/optimisations

« Reply #13 on: September 20, 2010, 10:57:13 PM »

Please, Enforcer or anyone else, post the source code and binaries again!



Nahee_Enterprises

World Renowned
Fractal Senior

Posts: 2250

use email to contact

Re: Realtime rendering/optimisations

« Reply #14 on: September 27, 2010, 03:25:50 PM »

Quote from: dapa on September 20, 2010, 10:57:13 PM

Please, Enforcer or anyone else, post the source code and binaries again!

Greetings, and Welcome to this particular Forum !!!

I am afraid that "Enforcer" has not posted a thing to the Forums since 11/26/2009, and was last logged into the Forums on 07/21/2010. It appears they are rarely here anymore.

It might be difficult to acquire what was once specified at the above links.



_Sincerely, Paul N. Lee_ _ _ _ _ _ _ _ _
http://www.Nahee.com/PNL/Fractals.html
http://www.Nahee.com/Fractals/


END OF AN ERA, FRACTALFORUMS.COM IS CONTINUED ON FRACTALFORUMS.ORG

it was a great time but no longer maintainable by c.Kleinhuis contact him for any data retrieval,
thanks and see you perhaps in 10 years again

The All New FractalForums is now in Public Beta Testing! Visit FractalForums.org and check it out!
