Welcome to Fractal Forums

Fractal Software => Programming => Topic started by: ker2x on July 12, 2010, 06:41:02 PM




Title: Buddhabrot on GPU
Post by: ker2x on July 12, 2010, 06:41:02 PM
I'm moving this discussion: http://www.fractalforums.com/images-showcase-(rate-my-fractal)/the-infinity-fields-(detailed-buddhabrot-zoom)/
to this thread (as requested).

The discussion is about finding an efficient way to compute a Buddhabrot on a GPU:

The GPU is not supposed to be efficient with that kind of computation, but let's try! :)

The speed problem is with the scattered read-modify-writes to global memory. I am thinking mainly about the CUDA architecture now:

Wouldn't it be faster to use the 16 KB of shared memory (64 KB on Fermi) as some kind of independent "mini framebuffer" tiles, accumulating the writes within shared memory only - as if one rendered a lot of independent deep zooms? Techniques to render Buddhabrot zooms exist (with appropriate non-uniform sampling optimizations), and applying these techniques to individual tiles that make up a larger image might just work. Every multiprocessor would get to work on its own tile (shared memory is individual to each multiprocessor). For those tiles that finish rendering, the multiprocessor will immediately get to work on another tile (on Fermi at least). One would not need any writes to global memory - speeding up the process by an order of magnitude maybe - until the very end, when the completed tile is written out.

Christian
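
A minimal sketch of that tile idea, written in OpenCL C to match the code later in this thread (a CUDA version would use __shared__ memory instead of __local). The kernel shape, the TILE_W/TILE_H sizes, and the plotting details are illustrative assumptions, not code from this project:

Code:
#define TILE_W 64
#define TILE_H 64

__kernel void buddhabrotTile(
    __global uint* frame,        // full accumulation buffer
    const uint frameWidth,
    const uint tileOriginX,      // where this work-group's tile sits in the frame
    const uint tileOriginY)
{
    __local uint tile[TILE_W * TILE_H];

    // cooperatively clear this work-group's tile
    for (int i = get_local_id(0); i < TILE_W * TILE_H; i += get_local_size(0))
        tile[i] = 0;
    barrier(CLK_LOCAL_MEM_FENCE);

    // ... iterate orbits here; every hit that lands inside this tile does:
    // atomic_inc(&tile[ty * TILE_W + tx]);   // stays in fast on-chip memory

    barrier(CLK_LOCAL_MEM_FENCE);

    // flush the finished tile to global memory once, at the very end;
    // safe without atomics as long as exactly one work-group owns each tile
    for (int i = get_local_id(0); i < TILE_W * TILE_H; i += get_local_size(0)) {
        uint x = tileOriginX + (i % TILE_W);
        uint y = tileOriginY + (i / TILE_W);
        frame[y * frameWidth + x] += tile[i];
    }
}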




Title: Re: Buddhabrot on GPU
Post by: ker2x on July 12, 2010, 07:00:19 PM
I'm planning to try in C# (using Visual Express 2010) + OpenTK and Cloo ( http://www.opentk.com ; Cloo is the OpenCL framework for C#)

Edit:
I finished writing the boring code (and learning OpenTK and Cloo).
For now I just have a very classic Mandelbrot that proves I understood how to do OpenCL in C# :)
In the next few days, I'll rewrite my Mandelbrot app to (try to) render a Buddhabrot in OpenCL \o/



Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 12, 2010, 08:39:46 PM
I'm planning to try in C# (using Visual Express 2010) + OpenTK and Cloo ( http://www.opentk.com ; Cloo is the OpenCL framework for C#)

More details.

The NVIDIA Fermi architecture has up to 48 KB of shared memory (+16 KB L1 cache) per multiprocessor. This fits a 64x64 pixel RGB tile with 32 bits per color channel (64 x 64 x 3 channels x 4 bytes = 49,152 bytes = 48 KB exactly).

And Fermi has 768 KB of common L2 cache. This should greatly boost the performance of read-modify-write operations, as needed for Buddhabrots, when accessing the card's main (global) memory. Atomic operations are also said to be much faster on Fermi compared to previous generations.

Not sure how many of these hardware features are accessible through OpenCL. I am more of a CUDA person myself.

I just ordered a GTX 460 card with 1GB for some tinkering. My first Fermi-based card. ;-) Finally a Fermi card that does not set the house on fire and has a price point somewhere below insanity.

EDIT: a DX11 compute shader version can be found here: http://www.yakiimo3d.com/2010/03/29/dx11-directcompute-buddhabrot-nebulabrot/
some further AMD optimizations are in this thread: http://forum.beyond3d.com/showthread.php?t=57042


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 12, 2010, 11:40:35 PM
EDIT: a DX11 compute shader version can be found here: http://www.yakiimo3d.com/2010/03/29/dx11-directcompute-buddhabrot-nebulabrot/

Interesting to see that it's still fast with a non-optimized version.

If anyone wants a working codebase for OpenTK + Cloo, here is my code (it renders a Mandelbrot for now):
Code:
using System;
using System.Collections.Generic;
using System.Text;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

using OpenTK;
using OpenTK.Graphics;
using OpenTK.Graphics.OpenGL;
using OpenTK.Input;



using Cloo;


namespace TKBuddhabrot
{
    class TKBuddhabrot : GameWindow
    {

private static string kernelSource = @"
__kernel void mandelbrot(
  const float deltaReal,
  const float deltaImaginary,
  const float realMin,
  const float imaginaryMin,
  const unsigned int maxIter,
  const unsigned int escapeOrbit,
  const unsigned int hRes,
  __global int* outputi
) {

  int xId = get_global_id(0);
  int yId = get_global_id(1);

  float realPos = realMin + (xId * deltaReal);
  float imaginaryPos = imaginaryMin + (yId * deltaImaginary);
  float real = realPos;
  float imaginary = imaginaryPos;
  float realSquared = real * real;
  float imaginarySquared = imaginary * imaginary;

  int iter = 0;
  while ( (iter < maxIter) && ((realSquared + imaginarySquared) < escapeOrbit) )
  {
    imaginary = (2 * (real * imaginary)) + imaginaryPos;
    real = realSquared - imaginarySquared + realPos;
    realSquared = real * real;
    imaginarySquared = imaginary * imaginary;
    iter++;
  }
  if(iter >= maxIter){
        iter = 0;   // never escaped: treat the point as inside the set (rendered black)
  }
  outputi[(yId * hRes) + xId] = iter;
}



";

        ComputePlatform platform;
        ComputeContextPropertyList properties;
        ComputeContext context;

        Bitmap bmp;

        float realMin, realMax, imaginaryMin, imaginaryMax, deltaReal, deltaImaginary;
        int maxiter, screenSizeInPixel, escapeOrbit;

        static int initialScreenWidth = 800;
        static int initialScreenHeight = 800;



        /// <summary>Creates a window with the specified title.</summary>
        public TKBuddhabrot() : base(initialScreenWidth, initialScreenHeight, GraphicsMode.Default, "TKBuddhabrot")
        {
            VSync = VSyncMode.On;
        }

        /// <summary>Load resources here.</summary>
        /// <param name="e">Not used.</param>
        protected override void OnLoad(EventArgs e)
        {
            base.OnLoad(e);

            //Create Bitmap
            bmp = new Bitmap(ClientRectangle.Width, ClientRectangle.Height);

            //OpenGL Stuff
            GL.ClearColor(0.1f, 0.2f, 0.5f, 0.0f);

            //OpenCL initialisation
            platform = ComputePlatform.Platforms[0];
            Console.WriteLine("Compute platform : " + platform.ToString());

            properties = new ComputeContextPropertyList(platform);           
            context = new ComputeContext(platform.Devices, properties, null, IntPtr.Zero);
            Console.WriteLine("Compute context : " + context.ToString());

            //Mandelbrot Specific
            realMin = -2.25f;
            realMax = 0.75f;
            imaginaryMin = -1.5f;
            imaginaryMax = 1.5f;
            maxiter = 64;
            escapeOrbit = 4;

            deltaReal = (realMax - realMin) / (ClientRectangle.Width - 1);
            deltaImaginary = (imaginaryMax - imaginaryMin) / (ClientRectangle.Height - 1);
            screenSizeInPixel = ClientRectangle.Width * ClientRectangle.Height;

            //OpenCL buffer (int elements, matching the kernel's __global int* output)
            ComputeBuffer<int> kernelOutput = new ComputeBuffer<int>(context, ComputeMemoryFlags.WriteOnly, screenSizeInPixel);

            //Build OpenCL kernel
            ComputeProgram program = new ComputeProgram(context, new string[] { kernelSource });
            program.Build(null, null, null, IntPtr.Zero);
            ComputeKernel kernel = program.CreateKernel("mandelbrot");

            //OpenCL args
            //  const float deltaReal,
            //  const float deltaImaginary,
            //  const float realMin,
            //  const float imaginaryMin,
            //  const unsigned int maxIter,
            //  const unsigned int escapeOrbit,
            //  const unsigned int hRes,
            //  __global int* outputi

            kernel.SetValueArgument<float>(0, deltaReal);
            kernel.SetValueArgument<float>(1, deltaImaginary);
            kernel.SetValueArgument<float>(2, realMin);
            kernel.SetValueArgument<float>(3, imaginaryMin);
            kernel.SetValueArgument<int>(4, maxiter);
            kernel.SetValueArgument<int>(5, escapeOrbit);
            kernel.SetValueArgument<int>(6, ClientRectangle.Width);
            kernel.SetMemoryArgument(7, kernelOutput);

            //Execute
            ComputeCommandQueue commands = new ComputeCommandQueue(context, context.Devices[0], ComputeCommandQueueFlags.None);
            ComputeEventList events = new ComputeEventList();
            commands.Execute(kernel, null, new long[] { ClientRectangle.Width, ClientRectangle.Height }, null, events);

            //Get result
            int[] kernelResult = new int[screenSizeInPixel];
            GCHandle kernelResultHandle = GCHandle.Alloc(kernelResult, GCHandleType.Pinned);

            commands.Read(kernelOutput, false, 0, screenSizeInPixel, kernelResultHandle.AddrOfPinnedObject(), events);
            commands.Finish();

            //Finish openCL stuff
            kernelResultHandle.Free();
           
            int maxfound = 0;
            foreach (int iter in kernelResult)
            {
                if (iter > maxfound) maxfound = iter;
            }

            //Use the result
            int x, y;
            for (x = 0; x < bmp.Width; x++)
            {
                for (y = 0; y < bmp.Height; y++)
                {
                    Color c = Color.FromArgb((int)((float)kernelResult[x + y * bmp.Width] / maxfound * 255.0),
                        (int)((float)kernelResult[x + y * bmp.Width] / maxfound * 255.0),
                        (int)((float)kernelResult[x + y * bmp.Width] / maxfound * 255.0)
                        );

                    bmp.SetPixel(x, y, c);
                }
            }

            BitmapData bmp_data = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.ReadOnly, System.Drawing.Imaging.PixelFormat.Format32bppArgb);

            GL.TexImage2D(TextureTarget.Texture2D, 0, PixelInternalFormat.Rgba, bmp_data.Width, bmp_data.Height, 0,
                OpenTK.Graphics.OpenGL.PixelFormat.Bgra, PixelType.UnsignedByte, bmp_data.Scan0);

            bmp.UnlockBits(bmp_data);
            GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
            GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);

            Console.WriteLine("done");

        }

        /// <summary>
        /// Called when your window is resized. Set your viewport here. It is also
        /// a good place to set up your projection matrix (which probably changes
        /// along when the aspect ratio of your window).
        /// </summary>
        /// <param name="e">Not used.</param>
        protected override void OnResize(EventArgs e)
        {
            base.OnResize(e);

            GL.Viewport(ClientRectangle.X, ClientRectangle.Y, ClientRectangle.Width, ClientRectangle.Height);

            Matrix4 projection = Matrix4.CreatePerspectiveFieldOfView((float)Math.PI / 4, Width / (float)Height, 1.0f, 64.0f);
            GL.MatrixMode(MatrixMode.Projection);
            GL.LoadMatrix(ref projection);
        }

        /// <summary>
        /// Called when it is time to setup the next frame. Add you game logic here.
        /// </summary>
        /// <param name="e">Contains timing information for framerate independent logic.</param>
        protected override void OnUpdateFrame(FrameEventArgs e)
        {
            base.OnUpdateFrame(e);

            if (Keyboard[Key.Escape])
                Exit();
        }

        /// <summary>
        /// Called when it is time to render the next frame. Add your rendering code here.
        /// </summary>
        /// <param name="e">Contains timing information.</param>
        protected override void OnRenderFrame(FrameEventArgs e)
        {
            base.OnRenderFrame(e);

            GL.Clear(ClearBufferMask.ColorBufferBit | ClearBufferMask.DepthBufferBit);

            GL.Disable(EnableCap.DepthTest);
            GL.Enable(EnableCap.Texture2D);
            GL.Enable(EnableCap.Blend);
            GL.BlendFunc(BlendingFactorSrc.SrcAlpha, BlendingFactorDest.OneMinusSrcAlpha);


            GL.MatrixMode(MatrixMode.Projection);

            GL.LoadIdentity();
            GL.Ortho(0, ClientRectangle.Width, ClientRectangle.Height, 0, 0, 1);
         
 
            GL.Begin(BeginMode.Polygon);
                GL.TexCoord2(0.0, 1.0);
                GL.Vertex2(0, 0);

                GL.TexCoord2(1.0, 1.0);
                GL.Vertex2(ClientRectangle.Width, 0);

                GL.TexCoord2(1.0, 0.0);
                GL.Vertex2(ClientRectangle.Width, ClientRectangle.Height);

                GL.TexCoord2(0.0, 0.0);
                GL.Vertex2(0,ClientRectangle.Height);
            GL.End();
           
            SwapBuffers();
        }

        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main()
        {
            // The 'using' idiom guarantees proper resource cleanup.
            // We request 30 UpdateFrame events per second, and unlimited
            // RenderFrame events (as fast as the computer can handle).
            using (TKBuddhabrot game = new TKBuddhabrot())
            {
                Console.WriteLine("Display device list (may be useful for debugging)");
                foreach (DisplayDevice device in DisplayDevice.AvailableDisplays)
                {

                    Console.WriteLine("-------------");
                    Console.WriteLine("is primary : " + device.IsPrimary);
                    Console.WriteLine("bound : " + device.Bounds);
                    Console.WriteLine("Refresh rate : " + device.RefreshRate);
                    Console.WriteLine("bpp : " + device.BitsPerPixel);
                    //foreach (DisplayResolution res in device.AvailableResolutions) { Console.WriteLine(res); }

                }
                Console.WriteLine("-------------");
                game.Run(30.0);
            }
        }


    }
}


Title: Re: Buddhabrot on GPU
Post by: lycium on July 13, 2010, 12:45:17 AM
I'm planning to try in C# (using Visual Express 2010) + OpenTK and Cloo ( http://www.opentk.com ; Cloo is the OpenCL framework for C#)

More details.

The NVIDIA Fermi architecture has up to 48 KB of shared memory (+16 KB L1 cache) per multiprocessor. This fits a 64x64 pixel RGB tile with 32 bits per color channel (64 x 64 x 3 channels x 4 bytes = 49,152 bytes = 48 KB exactly).

And Fermi has 768 KB of common L2 cache. This should greatly boost the performance of read-modify-write operations, as needed for Buddhabrots, when accessing the card's main (global) memory. Atomic operations are also said to be much faster on Fermi compared to previous generations.

Not sure how many of these hardware features are accessible through OpenCL. I am more of a CUDA person myself.

I just ordered a GTX 460 card with 1GB for some tinkering. My first Fermi-based card. ;-) Finally a Fermi card that does not set the house on fire and has a price point somewhere below insanity.

EDIT: a DX11 compute shader version can be found here: http://www.yakiimo3d.com/2010/03/29/dx11-directcompute-buddhabrot-nebulabrot/
some further AMD optimizations are in this thread: http://forum.beyond3d.com/showthread.php?t=57042

Chris, you're right of course about the Fermi arch, but you didn't actually buy one of those chips! The GTX 460 lacks the cache and many compute-oriented features...


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 13, 2010, 09:53:58 AM
Chris, you're right of course about the Fermi arch, but you didn't actually buy one of those chips! The GTX 460 lacks the cache and many compute-oriented features...

Oh wow... and I was ready to buy one. Thx for the info!


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 13, 2010, 10:28:52 AM
Chris, you're right of course about the Fermi arch, but you didn't actually buy one of those chips! The GTX 460 lacks the cache and many compute-oriented features...

(*) citation needed

GTX 460 is Compute Capability 2.1 and has a higher double precision throughput than the GTX 470 for example. I am all excited about it.

After some research, I found out the L2 caches were indeed shrunk a bit.
L2 caches on GTX 460: 512 KB on the 1GB model, 384 KB on the 768MB model
                   (768 KB on GF100, i.e. GTX 470/480)


Title: Re: Buddhabrot on GPU
Post by: hobold on July 13, 2010, 01:58:32 PM
All the details you could ever want and more:

http://www.anandtech.com/show/3809/nvidias-geforce-gtx-460-the-200-king

In short, the new GF104 chip (as used in the GeForce 460) has the same computing capabilities as the older GF100 chip (as used in the GeForce 465 and upwards). The only difference is that the GF104 has slightly lower capacity in a few areas, but not in any fundamentally significant way.

The main differences are related to silicon fabrication technology (which is good for power consumption, and good for price due to better silicon yield) and the fact that the GF104 is the first superscalar GPU. Being superscalar means the GF104 implements another type of parallelism that has been around in CPUs since 1992 (IBM's single-chip RIOS implementation as a commercial product, better known as POWER1), but not in GPUs until today. The end result is that the GF104 is considerably smarter at utilizing its computational resources, which helps both absolute performance and performance per Watt.

The GF104 is not the new king of the hill, but it finally closes the gap to AMD and re-establishes a little bit of a technological advantage for Nvidia. Superscalar execution should be a significant step for GPU computing, because it is a stepping stone to dynamic out-of-order execution.

At the $200 price point, the GeForce 460 is now the best alternative. But if you can afford to spend more, or have less to spend, AMD/ATI may have the better options for you.


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 13, 2010, 03:24:21 PM
All the details you could ever want and more:

http://www.anandtech.com/show/3809/nvidias-geforce-gtx-460-the-200-king

In short, the new GF104 chip (as used in the GeForce 460) has the same computing capabilities as the older GF100 chip (as used in the GeForce 465 and upwards). The only difference is that the GF104 has slightly lower capacity in a few areas, but not in any fundamentally significant way.

The main differences are related to silicon fabrication technology (which is good for power consumption, and good for price due to better silicon yield) and the fact that the GF104 is the first superscalar GPU. Being superscalar means the GF104 implements another type of parallelism that has been around in CPUs since 1992 (IBM's single-chip RIOS implementation as a commercial product, better known as POWER1), but not in GPUs until today. The end result is that the GF104 is considerably smarter at utilizing its computational resources, which helps both absolute performance and performance per Watt.

The GF104 is not the new king of the hill, but it finally closes the gap to AMD and re-establishes a little bit of a technological advantage for Nvidia. Superscalar execution should be a significant step for GPU computing, because it is a stepping stone to dynamic out-of-order execution.

At the $200 price point, the GeForce 460 is now the best alternative. But if you can afford to spend more, or have less to spend, AMD/ATI may have the better options for you.

very interesting article.

But I'm confused:
- The 460 1GB is much, much, much better than the 460 768MB
- The 460 seems to be better than the 465
- The GF104 seems to be better than the GF100.

But... isn't the overpriced GTX 480 based on a GF100? (and not a GF104)
What's the point in buying a GTX 480 then? (other than saving money, of course)

thx. (PS: tomorrow is a non-working day here, I'll work on the OpenCL Buddhabrot  ;D )


Title: Re: Buddhabrot on GPU
Post by: hobold on July 13, 2010, 05:34:05 PM
The only point in buying a GF100 is if you desperately need double precision and don't care about the downsides (price, power).

I don't want to incite a flame war, so please do not read the following as favouritism or prejudice. The fact of the matter is that Nvidia made a few unfortunate design decisions in the original GF100 (Fermi) that came back to bite them. The newer GF104 fixes most of those issues, and is scaled down a bit to better target the mass market instead of the computing high end. For example, the GF104 lacks the ECC-protected memory the GF100 had, but that was really overkill for graphics (and even for our fractal purposes, an incorrect pixel here or there doesn't matter all that much). AMD/ATI played it safe by aiming lower, and was lucky enough to take the crown for the moment. The real losers are in the high performance computing market, because now there is no GPU product anymore that tries to match the reliability of good server hardware.

Rumour has it that the GF100 is already out of production after a rather small run of a few thousand chips. The newer GF104 is the immediate future, and will probably see noteworthy clock speed increases during its product life cycle. The slightly weird distinction between the GeForce 460 with 768MB and 1024MB is probably just a result of Nvidia being overly cautious right now. But once production has ramped up, we might see Nvidia attacking AMD more aggressively. And we can hope that there will be a bit of a price war between the two. :)


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 13, 2010, 05:36:45 PM
There is a way to generate random numbers (non-crypto) on the GPU:

Parallel Random Number Generation Using OpenMP, OpenCL and PGI Accelerator Directives
http://www.pgroup.com/lit/articles/insider/v2n2a4.htm

The OpenCL implementation shows: throughput = 3476.850472 [MB/s]
Hum.... should be enough  ;D

While the Mersenne Twister is not suitable for cryptography (predictability), it has a very long period of 2^19937 − 1; that should be enough for our usage :)

Edit: and another one http://forums.nvidia.com/index.php?showtopic=101390
Edit2: and an example code directly from NVidia http://developer.download.nvidia.com/compute/cuda/3_0/sdk/website/OpenCL/website/samples.html#oclMersenneTwister


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 13, 2010, 07:46:40 PM
The only point in buying a GF100 is if you desperately need double precision and don't care about the downsides (price, power).

I am certainly not flaming, but having 480 cores (GTX 480) vs. 240 (GTX 285) may be a valid argument too.

And the support for function pointers, recursion, new/delete operators (in one of the upcoming CUDA toolkits) which clearly grants the programmer more options in algorithm design... Especially the recursion could be useful with fractals.


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 13, 2010, 09:42:12 PM
My port to Buddhabroth-OpenCL is going well, but I need help with this please: http://www.fractalforums.com/programming/need-help-to-convert-(abs(-1-0-sqrt(1-(4*c))-)-to-c/


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 13, 2010, 10:59:41 PM
I finally have a working OpenCL buddhabroth. (inspired by my Fortran code)

Not efficient or interesting, but it works \o/

(http://fractals.s3.amazonaws.com/buddhabrot/buddhacl.jpg)

Here is the (still ugly and undocumented) OpenCL code (good luck deciphering it):

I'll happily help decipher it, comment it, and accept any (good or bad) criticism.

Code:

bool isInMSet(
  float cr,
  float ci,
  const unsigned int maxIter,
  const float escapeOrbit
)
{
    int iter = 0;
    float zr = 0.0;
    float zi = 0.0;
    float zr2 = zr * zr;
    float zi2 = zi * zi;
    float temp = 0.0;

    //Check if c is in the 2nd order period bulb.
    if( sqrt( ((cr+1.0) * (cr+1.0)) + (ci * ci) ) < 0.25 )
    {
        return true;
    }

    //Check if c is in the main cardiod
    //IF ((ABS( 1.0 - SQRT(1-(4*c)) ))  < 1.0 ) THEN RETURN TRUE (main cardioid)

     while ( (iter < maxIter) && ((zr2 + zi2) < escapeOrbit) )
    {
        temp = zr * zi;
        zr2 = zr * zr;
        zi2 = zi * zi;
        zr = zr2 - zi2 + cr;
        zi = temp + temp + ci;
        iter++;
    }

    if ( iter < maxIter )
    {
        return false;
    } else {
        return true;
    }

}   

__kernel void mandelbrot(
  const float realMax,
  const float imaginaryMax,
  const float realMin,
  const float imaginaryMin,
  const unsigned int maxIter,
  const unsigned int escapeOrbit,
  const unsigned int hRes,
  __global int* outputi
) {

  const int xId = get_global_id(0);
  const int yId = get_global_id(1);
  const int maxX = get_global_size(0);
  const int maxY = get_global_size(1);

  float deltaReal = (realMax - realMin) / (maxX - 1);
  float deltaImaginary = (imaginaryMax - imaginaryMin) / (maxY - 1);

  float realPos = realMin + (xId * deltaReal);
  float imaginaryPos = imaginaryMin + (yId * deltaImaginary);

  float real = realPos;
  float imaginary = imaginaryPos;

  if(isInMSet(real, imaginary, maxIter, escapeOrbit) == false)
  {

    int iter = 0;
    float zr = 0.0;
    float zi = 0.0;
    float zr2 = zr * zr;
    float zi2 = zi * zi;
    float cr = real;
    float ci = imaginary;
    float temp = 0.0;
   
    while ((iter < maxIter))
    {
        temp = zr * zi;
        zr2 = zr * zr;
        zi2 = zi * zi;
        zr = zr2 - zi2 + cr;
        zi = temp + temp + ci;
        int x = (int)(maxX * (zr - realMin) / (realMax - realMin));
        int y = (int)(maxY * (zi - imaginaryMin) / (imaginaryMax - imaginaryMin));
       
        if( (x > 0) && (y > 0) && (x < maxX) && (y < maxY) && (iter > 2))
        {
            outputi[(y * hRes) + x] += 1;   // note: plain += races between work-items, so some hits can be lost
        }
        iter++;
    }

 
  }

}



Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 13, 2010, 11:33:53 PM
I finally have a working OpenCL buddhabroth. (inspired by my Fortran code)

Nice code, but we're not cooking soup here ( http://en.wikipedia.org/wiki/Broth )  :evil1:


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 13, 2010, 11:41:45 PM
I finally have a working OpenCL buddhabroth. (inspired by my Fortran code)

Nice code, but we're not cooking soup here ( http://en.wikipedia.org/wiki/Broth )  :evil1:


woops :)))

BTW... here is a nicer buddhabrot

(http://fractals.s3.amazonaws.com/buddhabrot/buddhacl2.jpg)


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 13, 2010, 11:45:17 PM
really much better picture, but where does the asymmetry come from?


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 14, 2010, 12:10:35 AM
really much better picture, but where does the asymmetry come from?



Huhhh, weird... I don't know.
It may come from here:
x = (int)(maxX * (zr - realMin) / (realMax - realMin));
y = (int)(maxY * (zi - imaginaryMin) / (imaginaryMax - imaginaryMin));

I'm not sure, but with the (int) cast the fractional part is discarded (truncated) instead of being rounded to the nearest integer.
I'll check that problem. (I have a bigger problem with an "out of resource" exception when I have too many loops, but I need to read much more doc to understand it, so I'll check the problem you found first)

Edit : nope, it's something else

Edit : problem solved :
const int maxX = get_global_size(0)-1;
const int maxY = get_global_size(1)-1;

instead of:
const int maxX = get_global_size(0);
const int maxY = get_global_size(1);


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 14, 2010, 12:23:08 AM
Huhhh, weird... I don't know.

after reading the discussion here: http://erleuchtet.org/2010/07/ridiculously-large-buddhabrot.html I think the bright circles in your image are mostly caused by escape trajectories with very long orbits. And you may not run enough input samples to get a good "average" of all possible orbits. So a few orbits will stand out (all these bright "curls" in the image).

If you happen to sample in a regular grid (instead of random sampling) and this grid is not perfectly symmetrical on the imaginary axis, maybe that would be causing the asymmetry.


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 14, 2010, 12:32:59 AM
Huhhh, weird... I don't know.

after reading the discussion here: http://erleuchtet.org/2010/07/ridiculously-large-buddhabrot.html I think the bright circles in your image are mostly caused by escape trajectories with very long orbits. And you may not run enough input samples to get a good "average" of all possible orbits. So a few orbits will stand out (all these bright "curls" in the image).

If you happen to sample in a regular grid (instead of random sampling) and this grid is not perfectly symmetrical on the imaginary axis, maybe that would be causing the asymmetry.

Yes, I haven't implemented the RNG in OpenCL yet; I directly map pixels to complex plane coordinates.
See my edit above: the grid wasn't symmetrical because I assumed that maxX was equal to the number of pixels on X, but it's the number of pixels - 1 (indices run from 0 to size-1).

Still not that... I'll see tomorrow. I need to solve this "out of resource" exception problem.


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 14, 2010, 01:50:43 PM
The latest version of the working OpenCL code:

Code:
//A function to check if the choosen point is in the mandelbrot set
bool isInMSet(
  float cr,
  float ci,
  const unsigned int maxIter,
  const float escapeOrbit
)
{
    int iter = 0;
    float zr = 0.0;
    float zi = 0.0;
    float zr2 = zr * zr;
    float zi2 = zi * zi;
    float temp = 0.0;

    //Quick rejection check if c is in the 2nd order period bulb.
    if( sqrt( ((cr+1.0) * (cr+1.0)) + (ci * ci) ) < 0.25 )
    {
        return true;
    }

    //Quick rejection check if c is in the main cardiod
    //IF ((ABS( 1.0 - SQRT(1-(4*c)) ))  < 1.0 ) THEN RETURN TRUE (main cardioid)
    float tempi = ci*(-4.0);
    float tempr = 1.0 - cr*4.0;
    float theta = atan2(tempi,tempr)/2.0;
    float r = pow((tempr*tempr + tempi*tempi),0.25);
    tempr = 1.0 - r * cos(theta);
    tempi = -r * sin(theta);
    if( (tempr * tempr + tempi * tempi) < 1.0)
    {
        return true;
    }


    //Bruteforce check if c is escaping escapeOrbit (with a good old iteration up to maxIter)
    while( (iter < maxIter) && ((zr2 + zi2) < escapeOrbit) )
    {
        temp = zr * zi;
        zr2 = zr * zr;
        zi2 = zi * zi;
        zr = zr2 - zi2 + cr;
        zi = temp + temp + ci;
        iter++;
    }

    if ( iter < maxIter )
    {
        return false;
    } else {
        return true;
    }

}   

//Main kernel

__kernel void buddhabrot(
  const float realMax,
  const float imaginaryMax,
  const float realMin,
  const float imaginaryMin,
  const unsigned int maxIter,
  const unsigned int escapeOrbit,
  const unsigned int hRes,
  const float offset,
  __global int* outputi
) {

  const int xId = get_global_id(0);
  const int yId = get_global_id(1);
  const int offsetStep = get_global_id(2);

  const int maxX = get_global_size(0);
  const int maxY = get_global_size(1);

  const float deltaReal = (realMax - realMin) / (maxX - 1);
  const float deltaImaginary = (imaginaryMax - imaginaryMin) / (maxY - 1);

  float cr = realMin + (xId * deltaReal) + (offsetStep * offset);
  float ci = imaginaryMin + (yId * deltaImaginary);

  int iter = 0;
  float zr = 0.0;
  float zi = 0.0;
  float zr2 = zr * zr;
  float zi2 = zi * zi;
  float temp = 0.0;
  int x, y;
   
  if(isInMSet(cr, ci, maxIter, escapeOrbit) == false)
  {
    iter = 0;
    zr = 0.0;
    zi = 0.0;
    zr2 = zr * zr;
    zi2 = zi * zi;
    temp = 0.0;
    while ((iter < maxIter) && ((zr2 + zi2) < escapeOrbit) )
    {
      temp = zr * zi;
      zr2 = zr * zr;
      zi2 = zi * zi;
      zr = zr2 - zi2 + cr;
      zi = temp + temp + ci;
      x = (maxX * (zr - realMin) / (realMax - realMin));
      y = (maxY * (zi - imaginaryMin) / (imaginaryMax - imaginaryMin));
       
      if( (x > 0) && (y > 0) && (x < maxX) && (y < maxY) && (iter > 2))
      {
        outputi[(y * hRes) + x] += 1;   // note: plain += races between work-items, so some hits can be lost
      }
      iter++;
    } //EndWhile
  } //EndIf
} //EndKernel



I still have this out-of-resource problem returned by "clEnqueueReadBuffer" when the OpenCL code takes too much time to process.
It seems to be an NVIDIA/Windows problem. From what I understand, Windows thinks that the NVIDIA driver is frozen and restarts it.

I'm planning to code the random number generator, then do something like this: http://www.yakiimo3d.com/2010/03/29/dx11-directcompute-buddhabrot-nebulabrot/
Refresh the display every second, or something like that, so it won't time out.


Title: Re: Buddhabrot on GPU
Post by: hobold on July 14, 2010, 03:21:16 PM
The only point in buying a GF100 is if you desperately need double precision and don't care about the downsides (price, power).

I am certainly not flaming, but having 480 cores (GTX 480) vs. 240 (GTX 285) may be a valid argument too.

And the support for function pointers, recursion, new/delete operators (in one of the upcoming CUDA toolkits) which clearly grants the programmer more options in algorithm design... Especially the recursion could be useful with fractals.
To clarify, my comparison was between GeForce 480 (GF100 chip) and GeForce 460 (GF104 chip). The GeForce 285 (GT200 chip) is an older generation and lacks capabilities such as those you mentioned from the latest CUDA version.


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 14, 2010, 03:35:44 PM
Quote
To clarify, my comparison was between GeForce 480 (GF100 chip) and GeForce 460 (GF104 chip). The GeForce 285 (GT200 chip) is an older generation and lacks capabilities such as those you mentioned from the latest CUDA version.

Ah, OK, I misunderstood what you meant.

Check out this thread: http://forums.nvidia.com/index.php?s=ef4693f2411102e7259ba57bdba8f89f&showtopic=173877&pid=1087178&st=0&#entry1087178

"Double precision [on GTX 460] is 1/6th of the FP32 performance which is better than the 1/8th performance on the GTX470/480."

Unless you get a Tesla based on the GF100 (which has all double precision ALUs enabled), the GF104 might be a better deal when you intend to run double precision arithmetic. The ratio of enabled double precision ALUs to CUDA cores has been improved.


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 14, 2010, 03:56:09 PM
Currently working on the Mersenne Twister implementation... a real pain in the *ss  :sad1:


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 14, 2010, 05:21:11 PM
Currently working on the Mersenne Twister implementation... a real pain in the *ss  :sad1:

I might try this three-step approach in CUDA. Maybe I'll even draw the random numbers on the CPU; the speed-up from moving this to the GPU might be negligible.

a) find good candidates (i.e. very long escape trajectories; store the corresponding starting coordinates and orbit length)
b) sort this list by descending orbit length
c) process trajectories of comparable length in the same work units ("blocks" in CUDA)

With b+c I can guarantee that all threads belonging to the same work unit will terminate at roughly the same time. This may give a performance boost because no threads will be idling.
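
A rough sketch of step (a), in OpenCL C for consistency with the rest of this thread (the plan above is CUDA, but the idea is identical): each work-item measures one candidate's orbit length so the host can sort the list (step b) and regroup the work (step c). The buffer layout and the minIter/maxIter parameters are illustrative assumptions:

Code:
__kernel void findCandidates(
    __global const float2* candidates,   // random starting points c
    __global uint* orbitLength,          // 0 = rejected, else the escape iteration
    const uint minIter,
    const uint maxIter)
{
    const int gid = get_global_id(0);
    const float cr = candidates[gid].x;
    const float ci = candidates[gid].y;

    float zr = 0.0f, zi = 0.0f;
    uint iter = 0;
    while (iter < maxIter && (zr * zr + zi * zi) < 4.0f) {
        float t = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
        iter++;
    }
    // keep only long escaping orbits; points that never escape are in the set
    orbitLength[gid] = (iter >= minIter && iter < maxIter) ? iter : 0;
}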


My GTX 460 is waiting for me in the mail... UPDATE: what a pity! It's not working properly in my Windows XP Prof. 64bit machine. The driver does not initialize the card properly. Hmm...



Title: Re: Buddhabrot on GPU
Post by: ker2x on July 15, 2010, 01:18:10 AM
I rewrote the whole app. I use System.Windows.Forms instead of OpenTK (but still Cloo for OpenCL).
The Mersenne Twister is still a work in progress, much harder than expected.

(http://fractals.s3.amazonaws.com/buddhabrot/buddhacl5.jpg)


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 15, 2010, 10:38:37 AM
Some docs about Monte Carlo (that's what we usually do to render a Buddhabrot) on the GPU:
- http://www.atomic.physics.lu.se/biophotonics/our_research/monte_carlo_simulations/gpu_monte_carlo/
- http://omlc.ogi.edu/software/mc/

edit:
According to this paper: http://www.atomic.physics.lu.se/biophotonics/our_research/monte_carlo_simulations/gpu_monte_carlo/
I may try this http://www.ast.cam.ac.uk/~stg20/cuda/random/index.html instead of the Mersenne Twister; the 2^60 period is still good enough for a Buddhabrot
( http://en.wikipedia.org/wiki/Multiply-with-carry )

edit2 : or this http://en.wikipedia.org/wiki/Xorshift  ?

Edit3: many different RNGs implemented in C# http://www.hvass-labs.org/projects/randomops/cs/ (can easily be ported to OpenCL)


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 16, 2010, 12:33:38 AM
I still have a lot of problems implementing the RNG,
so I implemented it on the host CPU for now.
It's not optimized yet (by far!), but this result takes ~5s.
I'm sure I can do much better :)

(http://fractals.s3.amazonaws.com/buddhabrot/buddhacl6.jpg)


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 16, 2010, 12:45:36 AM

Splendid! Now we need the nebula color scheme.


Title: Re: Buddhabrot on GPU
Post by: kram1032 on July 16, 2010, 12:59:39 AM
getting better :D


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 16, 2010, 07:45:48 PM
I just got a new laptop (Asus EeePC 1201PN, with an Ion2 GPU) and a legal version of Visual Studio 2008 and 2010.
So I'll be able to use the OpenCL/CUDA profiler and ... tadammmm: http://developer.nvidia.com/object/nsight.html  \o/ \o/

So the development of the Buddhabrot is on hold while I learn to use Nsight and redevelop the app in C/C++ (eek!)


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 16, 2010, 07:51:26 PM
Congratulations!

Nsight support for 2010 is not official, although there is a thread on the NVIDIA forum about how to do a manual setup.

You're staying with OpenCL, right? Now that you have an NVIDIA GPU you might also want to look in the direction of CUDA, maybe.


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 16, 2010, 08:03:52 PM
Congratulations!

Nsight support for 2010 is not official, although there is a thread on the NVIDIA forum about how to do a manual setup.

You're staying with OpenCL, right? Now that you have an NVIDIA GPU you might also want to look in the direction of CUDA, maybe.


I already had an NVIDIA card (8800GTX), and the NVIDIA book I bought (Programming Massively Parallel Processors) focuses on CUDA, with just a few pages about OpenCL.
But I'll try my best to focus on OpenCL. I may do some CUDA for early development, if I find it better than OpenCL, but I'll always rewrite the kernel in OpenCL :)

(Edit: I'll probably buy the pro version, the price is fair and it can profile OpenCL)
(Edit2: I hope it will work on the Ion2  :'( )


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 16, 2010, 10:14:29 PM

Last time I checked, nVidia's nSight required the GPU being debugged to be separate from the display device, either by connecting two PCs remotely or by having a second GPU in the PC. Apparently the second option won't apply to your netbook.



Title: Re: Buddhabrot on GPU
Post by: ker2x on July 16, 2010, 10:19:27 PM
I have a bigger problem ... http://developer.nvidia.com/forums/index.php?showtopic=4899
I can't install the dev drivers for some unknown reason  :'(  :angry:

Last time I checked, nVidia's nSight required the GPU being debugged to be separate from the display device, either by connecting two PCs remotely or by having a second GPU in the PC. Apparently the second option won't apply to your netbook.

Yes, it's in the knowledge base... mmm... I'll see... for now I can't do anything at all :(


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 16, 2010, 10:40:29 PM
I have a bigger problem ... http://developer.nvidia.com/forums/index.php?showtopic=4899
I can't install the dev drivers for some unknown reason  :'(  :angry:

Sometimes editing the nvdisp.inf file (or whatever it's called for the mobile drivers) can help.
There is also a site, laptopvideo2go.com, which specializes in providing pre-modified .inf files
for most nVidia driver releases. These usually get posted in a forum thread.

The forums on forums.nvidia.com seem to be more active than those on developer.nvidia.com;
you're more likely to find help there.




Title: Re: Buddhabrot on GPU
Post by: ker2x on July 17, 2010, 12:14:01 AM
I have a bigger problem ... http://developer.nvidia.com/forums/index.php?showtopic=4899
I can't install the dev drivers for some unknown reason  :'(  :angry:

Sometimes editing the nvdisp.inf file (or whatever it's called for the mobile drivers) can help.
There is also a site, laptopvideo2go.com, which specializes in providing pre-modified .inf files
for most nVidia driver releases. These usually get posted in a forum thread.

The forums on forums.nvidia.com seem to be more active than those on developer.nvidia.com;
you're more likely to find help there.

Thank you, thank you, thank you \o/
It works! It's not the developer driver but, at least, my gfx card is seen as an OpenCL-capable GPU by Geeks3D GPU Caps Viewer, and the OpenCL demo works \o/
I have now successfully installed the CUDA SDK (which includes the OpenCL SDK)

*hugs*  :embarrass:


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 17, 2010, 12:32:05 AM
I ran some tests :
8800GTX Bandwidth (PCIE 16x) :
host -> device : 1600MB/s
device -> host : 1250MB/s
device -> device : 10GB/s
 
Ion (PCIE 1x?) :
host -> device : 128MB/s
device -> host : 160MB/s
device -> device : 7GB/s

Hum... this is going to be fun  :angel1:


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 17, 2010, 11:54:05 AM
I ran some tests :
8800GTX Bandwidth (PCIE 16x) :
host -> device : 1600MB/s
device -> host : 1250MB/s
device -> device : 10GB/s

Huh, device<->device bandwidth on an 8800 GTX should be higher. I'd expect something in the 70GB/sec range (86.4 GB/s theoretical peak).


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 17, 2010, 09:04:50 PM
I built a simple separate console app to test the xorshift random number generator; it works.  ;D


Code:
__kernel void xorshift(
    uint s1,
    uint s2,
    uint s3,
    uint s4,
    const int bufferSize,
    __global uint* outputUInt,
    __global float* outputFloat
)
{
    uint st;

    for(int i=0; i < bufferSize; i++)
    {
        st = s1 ^ (s1 << 11);
        s1 = s2;
        s2 = s3;
        s3 = s4;
        s4 = s4 ^ (s4 >> 19) ^ ( st ^ (st >> 18));
        outputUInt[i] = s4;
        outputFloat[i] = s4 / 4294967295.0f;   // map to [0,1]; the float literal avoids requiring fp64 support
    }
}


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 18, 2010, 03:31:42 AM
I published the source code to GitHub, including a working random number generator :)

http://github.com/ker2x/WinBuddhaOpenCL



Title: Re: Buddhabrot on GPU
Post by: ker2x on July 18, 2010, 04:05:45 PM
It's becoming interesting: 4 million samples/second at 1000 iterations on my slow Ion2 GPU.
Time to add colors  ;D

Edit:
I tested this http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx to disable the driver timeout when the GPU code takes too much time.
It worked on my Win7 (I had to create the key).
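
For reference, the registry values the linked WDDM page describes live under the GraphicsDrivers key; something along these lines (the exact numbers are examples - TdrDelay is the timeout in seconds, and TdrLevel=0 would disable detection entirely; a reboot is required):

Code:
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000000a
"TdrLevel"=dword:00000003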

Edit2: I added a samples/s counter; it's more like 2.5 million/s instead of 4 million   :angry:
Edit3: 15 million/s on my 8800 GTX \o/
Edit4: 18 million/s on a GTX 260  :hmh:


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 18, 2010, 08:22:49 PM
A not-so-deep zoom

            minIter = 100;
            maxIter = 10000;

            realMin = -1.2f;
            realMax = -0.8f;
            imaginaryMin = -0.4f;
            imaginaryMax = -0.1f;

(Yes, for some reason, the positive y axis is the negative imaginary axis. I'll fix it.)

Considering the impressive computation speed as long as the point is out of the screen (so no read-write in global memory), I do not see an obvious way to improve the deep zoom speed. (The statistical methods like the Metropolis-Hastings thingy involve a lot of memory access, AFAIK, so they're probably not good.)

I need to do some rewriting and learning to implement colors, so colors will wait a few days.  :sad1:



Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 18, 2010, 09:30:34 PM
Quote
Considering the impressive computation speed as long as the point is out of the screen (so no read-write in global memory), I do not see an obvious way to improve the deep zoom speed.

When most of your pixels are out of the screen (i.e. deep zoom) you're bound only by the floating point throughput of the card. You may want to make sure that all threads are busy at all times (e.g. by making a thread that ran out of work pick up a new random number and continue from there).


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 18, 2010, 10:00:24 PM
Quote
Considering the impressive computation speed as long as the point is out of the screen (so no read-write in global memory), I do not see an obvious way to improve the deep zoom speed.

When most of your pixels are out of the screen (i.e. deep zoom) you're bound only by the floating point throughput of the card. You may want to make sure that all threads are busy at all times (e.g. by making a thread that ran out of work pick up a new random number and continue from there).


I run many more threads than available cores (usually 1 million per frame, much more if I have a low (<1000) maxiter).
When a thread completes, a new thread picks a new pair of random numbers (according to its unique global_id) from the 2 random buffers.
The 2 random buffers are filled by another kernel, so I never need to transfer randoms from host to device.
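
A minimal sketch of that scheme (the names are illustrative, not the actual WinBuddhaOpenCL code): an RNG kernel refills two buffers of uniform floats on the device, and the rendering kernel seeds each work-item from its global id, so no random data ever crosses the PCIe bus:

Code:
__kernel void buddhabrotSample(
    __global const float* randomReal,        // refilled on-device by the RNG kernel
    __global const float* randomImaginary,   // refilled on-device by the RNG kernel
    const float realMin, const float realMax,
    const float imaginaryMin, const float imaginaryMax,
    __global uint* outputi)
{
    const int gid = get_global_id(0);

    // map this work-item's pair of uniforms into the sampled rectangle
    float cr = realMin + randomReal[gid] * (realMax - realMin);
    float ci = imaginaryMin + randomImaginary[gid] * (imaginaryMax - imaginaryMin);

    // ... iterate the orbit of (cr, ci) and accumulate hits as before ...
}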


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 18, 2010, 10:34:58 PM
I run many more threads than available cores (usually 1 million per frame, much more if I have a low (<1000) maxiter).
When a thread completes, a new thread picks a new pair of random numbers (according to its unique global_id) from the 2 random buffers.
The 2 random buffers are filled by another kernel, so I never need to transfer randoms from host to device.

It may take longer than you expect for a new thread to start up:

The scheduling granularity of the nVidia hardware is in so-called "warps" of 32 threads. So while one thread within a warp is still active but the other threads have terminated, these threads will sit idle until the last working thread finishes.

A similar inefficiency exists at the block level in nVidia's Compute Capability 1.x hardware: New blocks are only fed to the GPU by the driver when about half of all active blocks have terminated (as an example you have 14 multiprocessors on a nVidia 8800GT, so 7 blocks must have terminated before new blocks are fed to the hardware). On Fermi architecture (Compute Capability >= 2.0) this behavior has been improved (block scheduling was moved from the driver into the hardware, which they named "Gigathread engine")

This should also apply to OpenCL in much the same way, even though the naming of threads, warps and blocks may be slightly different (a "block" becomes a "work group" in OpenCL)

I still can't get my GTX 460 to work under Linux or Windows XP Prof. (seems my mainboard BIOS doesn't like this card!), so I am starting to do Buddhabrots in CUDA on my laptop's 9600M graphics card (32 shaders). I will be trying to utilize shared memory as much as possible to prevent the scattered writes to global memory.



Title: Re: Buddhabrot on GPU
Post by: ker2x on July 18, 2010, 11:16:03 PM
in progress ... 175k sample/s at 100k max iteration

(http://fractals.s3.amazonaws.com/buddhabrot/buddha-100k-on-gpu.JPG)

A deeper zoom on the left "bulb" of the above pic at 10k iterations. (took a few hours to render)

(http://fractals.s3.amazonaws.com/buddhabrot/buddhaCLion4.jpg)


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 18, 2010, 11:46:31 PM
in progress ... 175k sample/s at 100k max iteration

How do you define sample? Does a sample represent a scattered write to the accumulation buffer, or is it a complex starting point for an orbit?

Is this number for your 8800 card or for the Ion2 ?


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 18, 2010, 11:55:42 PM
in progress ... 175k sample/s at 100k max iteration

How do you define sample? Does a sample represent a scattered write to the accumulation buffer, or is it a complex starting point for an orbit?

Is this number for your 8800 card or for the Ion2 ?


Numbers are for the 8800GTX.
Technically, a "sample" is a thread, so it is what you call "a complex starting point for an orbit".


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 19, 2010, 10:56:14 PM
And there was light... ("Et la lumière fut")

Color \o/  :D

(http://fractals.s3.amazonaws.com/buddhabrot/BuddhaOpenCL-color.jpg)


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 19, 2010, 11:42:34 PM
Okay, a nicer one. Around 10 min of computation on my laptop.
Sorry about the ugly JPG compression; it's a screenshot. I need to implement a way to export the real result (in HDR if possible) with lossless compression.
Source code available here: http://github.com/ker2x/WinBuddhaOpenCL

(http://fractals.s3.amazonaws.com/buddhabrot/buddhabrot-opencl-color-wide.jpg)


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 20, 2010, 12:26:01 AM
Nice, I also got my first B/W images today with CUDA. Using my tiled rendering method in shared memory seems to slow things down currently.

I also tried two Buddhabrot variations:
1) I multiplied the contribution of each orbit by its length.
2) I divided the contribution of each orbit by its total length.

One could combine the original Buddhabrot with these two variations in the three R,G,B channels, resulting in a much different colorization.

I could also imagine using some nonlinear functions, such as Gaussians centered at different locations on the "number of iterations" axis, that emphasize different orbit lengths for the R, G, and B color channels.
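
A sketch of how those Gaussians could look in OpenCL C (my own illustration, not cbuchner1's code; the centers and widths are arbitrary examples, and weighted accumulation implies float channel buffers):

Code:
// weight of an orbit of length n for a channel centered at 'center'
float gaussianWeight(float n, float center, float sigma)
{
    float d = (n - center) / sigma;
    return exp(-0.5f * d * d);
}

// per orbit, once its length n is known:
//   wR = gaussianWeight(n,   50.0f,  20.0f);
//   wG = gaussianWeight(n,  200.0f,  80.0f);
//   wB = gaussianWeight(n, 1000.0f, 400.0f);
// then add wR/wG/wB to the R/G/B channels at every pixel the orbit visits.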


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 20, 2010, 12:42:10 AM
Nice, I also got my first B/W images today with CUDA

I also tried two Buddhabrot variations:
1) I multiplied the contribution of each orbit by its length.
2) I divided the contribution of each orbit by its total length.


Yay \o/
I'd love to see your progress and results. I'm planning to try CUDA too.
Feel free to post your dev diary here (with code, if open source (hopefully)).


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 20, 2010, 04:45:56 PM
Selectively plotting only the contributions of orbits of length <= 10 iterations, and in the second picture those of length 10 to 20.
Or how about this third image: selectively plotting only the 10th step of ANY orbit below the cutoff (256 iterations).

Combining all of this opens some interesting new ways of coloring. Creating animations by generating image sequences with parameter sweeps of the above restrictions is also possible. One could also compute "smooth" orbit lengths (instead of integer ones), which would allow for more gradual parameter sweeps.

So, as you see, I am currently more interested in exploring variations of the original Buddha- and Nebulabrots. The original horse has been beaten to death already; I am trying to cross-breed some donkeys and zebras now.

I have a last one for you: Plotting iteration 5 of all orbits of length 100 and greater (but below the cutoff of 256)
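
All of these selections fit one small change to the plotting loop. A sketch in OpenCL C (illustrative, not the actual CUDA code): measure the orbit first, then replay it and plot only the steps that pass the filters. minLen/maxLen/plotStep are assumed parameters, with plotStep < 0 meaning "plot every step of a qualifying orbit":

Code:
void plotSelective(float cr, float ci, uint cutoff,
                   uint minLen, uint maxLen, int plotStep,
                   __global uint* outputi /* plus the screen-mapping params */)
{
    float zr = 0.0f, zi = 0.0f;
    uint n = 0;
    while (n < cutoff && (zr * zr + zi * zi) < 4.0f) {
        float t = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
        n++;
    }
    if (n >= cutoff || n < minLen || n > maxLen)
        return;                          // orbit fails the length filter

    zr = 0.0f; zi = 0.0f;
    for (uint i = 1; i <= n; i++) {      // replay the accepted orbit
        float t = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
        if (plotStep < 0 || (int)i == plotStep) {
            // map (zr, zi) to pixel coordinates and increment outputi here
        }
    }
}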


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 20, 2010, 06:58:12 PM
I have a last one for you: Plotting iteration 5 of all orbits of length 100 and greater (but below the cutoff of 256)

Very nice, and interesting :)
My code should be able to do the same thing. I'm going to try :)


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 20, 2010, 07:02:54 PM
A new colorful buddha:
(http://fractals.s3.amazonaws.com/buddhabrot/buddha-highcolor.jpg)

Red : 20k -> 60k iterations
Green : 60k -> 100k iterations
Blue : 100k -> 200k iterations


Title: Re: Buddhabrot on GPU
Post by: kram1032 on July 20, 2010, 10:15:14 PM
those are some nice variations :D


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 20, 2010, 11:38:11 PM
There are more things I want to try tomorrow.

I could try to restrict my random generator to a specific region of the complex plane. So the set of starting points for iterating will be from a small, confined region - so this will render only a small part of the orbits contributing to the "complete" buddhabrot.

I could then move this source region over the complex plane and watch the resulting Buddhabrot change its shape. Also, I could pick three different source regions and generate three distinct color channels from them.

But first I need a way to create proper HDR exposures, save frames, combine three grayscale frames into a color image, and assemble an animation from this source data. Plenty of "boring" routine jobs before I get the exciting animations.


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 21, 2010, 08:55:05 AM
I still have a problem with coloring, so I added just a pinch of contrast with IrfanView:

(http://fractals.s3.amazonaws.com/buddhabrot/opencl-buddha-deep-sea.jpg)

the same one with heavy-duty post-processing (reveals interesting details):

(http://fractals.s3.amazonaws.com/buddhabrot/opencl-buddha-postprocessing.jpg)

same position, different iterations:

(http://fractals.s3.amazonaws.com/buddhabrot/buddha-nebula.jpg)


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 21, 2010, 02:27:55 PM
so this will render only a small part of the orbits contributing to the "complete" buddhabrot.

No longer the average Buddhabrot:

- all orbits start within an axis-aligned ellipse bounded by (-2.0, 2.0), i*(-1.0, 0.0)
- only plotting orbits of length 100 or more (but below the cutoff of 256)
- weighting each orbit with a factor proportional to ln(orbit length)

I have started looking into the Metropolis-Hastings algorithm to speed up zooming. I have also split off my random number generation from the actual rendering code, and I think I can fully integrate this algorithm into the random generator. EDIT: yup, works.


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 22, 2010, 11:00:32 PM
I have started looking into the Metropolis-Hastings algorithm to speed up zooming.

Here is a first result of a moderately deep zoom. Rendered at 1280 x 800 pixels, centered around the coordinates given below. It took about 30 minutes on a laptop graphics card with 32 CUDA cores. It would take about 2 minutes on a GTX 460, I guess. If I could only get this card to work in my designated PC...

Brightness mapping was done using the formula powf(L, 0.5). I still had to use some contrast adjustment in a paint program.

Code:
float zx = -0.0423594f, zy = -0.985749f, zoom = 120.f;

// the formula for computing the real and imaginary bounds is like this, ih/iw is the height to width ratio.
float zrmin = zx - 1.0f / zoom, zrmax = zx + 1.0f / zoom;
float zimin = zy - 1.0f / zoom * ih/iw, zimax = zy + 1.0f / zoom * ih/iw;

About the Metropolis-Hastings:

The "reference code" on Alexander Boswell's web site was a little confusing; he used way more exp and log expressions in his probability model than mathematically needed. In the end I made some deviating design decisions: a normal distribution for the small mutations instead of his exponential distribution, and I chose one of 3 mutations in each CUDA thread, according to a probability model that is much different from Boswell's (I kept his idea that "good" mutations are more likely to win).

I want to try something else for adding color now. Maybe different color spaces (not the usual R,G,B). How about HSV for a change? We already have a V channel (intensity); now we need to find some coordinates for hue and saturation.

Hmm, maybe I could integrate the nVidia denoise SDK sample into this code. I still see a lot of image grain that a good filter could get rid of.


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 23, 2010, 07:58:19 AM
I have started looking into the Metropolis-Hastings algorithm to speed up zooming.
I want to try something else for adding color now. Maybe different color spaces (not the usual R,G,B). How about HSV for a change? We already have a V channel (intensity); now we need to find some coordinates for hue and saturation.

What about CMYK with 4 iteration levels? That's something on my todo list.
(That will require conversion to RGB to display on the screen.)
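
The conversion back is cheap; a sketch of the standard naive CMYK-to-RGB mapping (assuming the four accumulated channels are normalized to 0..1):

Code:
float3 cmykToRgb(float c, float m, float y, float k)
{
    return (float3)((1.0f - c) * (1.0f - k),
                    (1.0f - m) * (1.0f - k),
                    (1.0f - y) * (1.0f - k));
}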


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 23, 2010, 09:00:55 AM
The Metropolis-Hastings looks like a good idea, but:
- I don't understand it
- I don't know how to implement it efficiently (mostly because I don't understand it  :angry: )
- I'm not convinced

I haven't seen a Buddhabrot using this algorithm that still has a nice "background". It seems to select only the orbits that draw Mandelbrot shapes and to focus on the center of the screen.
Additionally, on the GPU, the real bottleneck happens when the computed point is inside the screen (requiring one read+write on global memory per color); computing points that won't be on screen is incredibly fast (unless you have extremely high iteration counts, of course - then it could be a major waste of GPU cycles).

I may still try to implement it, just to be sure, if I can put my hands on code I can understand. But it's really not a priority (fixing bugs and improving coloring are my priorities).

I've seen enough Mandelbrot shapes in my life; I'm not chasing after them in the Buddhabrot :)


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 23, 2010, 11:52:03 AM
I think Metropolis-Hastings is actually a misnomer. What it really is, is "nonuniform random sampling" using tiny mutations of previous samples. The better mutation has a higher rate of survival - a genetic algorithm, really. I was able to hack it in a day.

Basically, if you know that your previous sample has hit the screen at least once, you make a tiny mutation to the starting location (on the order of 1/100th to 1/10th of the zoom level). Find a metric that honors the number of times the screen is hit (and the total iteration length as well) and decide randomly, using the metric as weights, whether or not the new sample will replace the previous one. Occasionally draw an entirely new sample to allow the system to find entirely new orbits.

There should be a bias towards longer iterations, and the more often the screen is hit, the better. This is basically why this algorithm may produce images that are biased towards longer orbits (and may indeed look different from uniformly sampled images).
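
A CPU-side sketch of that loop in plain C (all the constants - the example window, the 5% restart rate, the mutation radius - and the exact score are illustrative assumptions, not my actual kernel):

Code:
#include <stdlib.h>

#define MAX_ITER 1000
static const double wxmin = -0.06, wxmax = -0.02;   /* example visible window */
static const double wymin = -1.00, wymax = -0.97;

static double uniform01(void) { return rand() / (double)RAND_MAX; }

/* Iterate z -> z^2 + c; report orbit length and how often it hit the window. */
static void iterate_orbit(double cx, double cy, int *hits, int *len)
{
    double zx = 0.0, zy = 0.0;
    *hits = 0;
    for (*len = 0; *len < MAX_ITER && zx*zx + zy*zy < 4.0; (*len)++) {
        double t = zx*zx - zy*zy + cx;
        zy = 2.0*zx*zy + cy;
        zx = t;
        if (zx > wxmin && zx < wxmax && zy > wymin && zy < wymax) (*hits)++;
    }
}

/* Metric honoring both screen hits and total orbit length. */
static double orbit_score(int hits, int len) { return (double)hits * (double)len; }

/* One step: mutate (or restart), then keep the new sample probabilistically. */
static void next_sample(double *cx, double *cy, double *score, double zoom)
{
    double nx, ny;
    if (uniform01() < 0.05) {                 /* fresh sample: find new orbits */
        nx = uniform01() * 4.0 - 2.0;
        ny = uniform01() * 4.0 - 2.0;
    } else {                                  /* tiny mutation near the old sample */
        double r = 0.01 / zoom;
        nx = *cx + r * (uniform01() - 0.5);
        ny = *cy + r * (uniform01() - 0.5);
    }
    int hits, len;
    iterate_orbit(nx, ny, &hits, &len);
    double s = orbit_score(hits, len);
    if (*score <= 0.0 || uniform01() < s / (s + *score)) {
        *cx = nx; *cy = ny; *score = s;       /* "good" mutations survive more often */
    }
}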


Title: Re: Buddhabrot on GPU
Post by: lycium on July 23, 2010, 02:52:38 PM
Having implemented it in a 3D rendering engine, I should say that M-H sampling is neither a misnomer (?) nor a simple trick where you just add little offsets to random numbers...

Nevertheless, you've done some interesting work here Chris :)


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 23, 2010, 03:34:46 PM
Let's just say my implementation is no longer Metropolis-Hastings, because I deviate from the original Markov model. The problem is that the original M-H (as described in the Wikipedia article) creates a whole lot of branch divergence, which is bad for performance on this particular hardware architecture. The GPU works best when all threads of a scheduling unit perform the same work at the same time. When a few threads branch off, performance suffers because the hardware serializes the different branches (one branch after the other instead of all executing in parallel).

In M-H there is a significant chance that the new (mutated) sample won't be accepted. Going back and making another mutation in that case causes the mentioned branch divergence. Not attempting another mutation leaves the affected thread idle (we cannot render any orbits for a rejected mutation), which is also very wasteful of computing resources. So whatever you do is bad for performance, because only some of the CUDA cores will do useful work.

So I chose to test three mutated samples in each thread, weighing each sample's survival probability using a metric similar (but not identical) to the original M-H algorithm. There is also a certain probability of sticking with the previous sample and not drawing the orbit, but that probability has been much reduced. I guess my current approach breaks the underlying maths for Markov chains. I would have to study some statistics textbooks to fix this; my higher maths skills have become rusty ;)
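
The selection step itself is the same amount of work for every thread; in plain C it amounts to a weighted random pick along these lines (a sketch only - the real metric and probabilities differ):

Code:
#include <stdlib.h>

/* Pick one of 3 candidate samples with probability proportional to its score,
   so every thread does the same amount of work (no divergent retry loop). */
static int pick_weighted(const double score[3])
{
    double total = score[0] + score[1] + score[2];
    if (total <= 0.0) return 0;                    /* degenerate case: keep the first */
    double r = (rand() / (double)RAND_MAX) * total;
    if (r < score[0]) return 0;
    if (r < score[0] + score[1]) return 1;
    return 2;
}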


Title: Re: Buddhabrot on GPU
Post by: kram1032 on July 23, 2010, 05:59:29 PM
the RGB colourspace is pretty natural...
Well, you could do a direct frequency-to-frequency colour space, so to say...

Just like sound: Amplitude * e^(i*(f*phi + phase))

where your f is basically the iteration count, or maybe the period of that specific point.
Then map that (linearly, logarithmically or whatever) to actual frequencies of light perceivable by us, and finally convert those via La*b* to RGB.

Not sure if that'd work like this... Maybe it's way more complicated^^
But if it does, you should get some nice colours :)
Shifting or stretching the shape causes different frequencies to become visible :)


Title: Re: Buddhabrot on GPU
Post by: hobold on July 23, 2010, 06:00:54 PM
Conditional execution is a weak point of SIMD machines (just rephrasing your statement about GPUs in the terminology of the "multimedia" instruction set extensions of CPUs). However, I think it is wasted effort to try and achieve 100% utilization across the whole vector width. Scalar CPUs do not predict conditional branches with 100% accuracy either.

In other words, a bit of "warp divergence" is not necessarily a killer criterion for a particular algorithm. As long as you can maintain good average utilization, you will beat scalar processors handily. Re-arranging computation across vector lanes is rather expensive, so you might well be better off not doing it after every iteration, but only once after the current utilization drops below some threshold.

This used to be a bigger problem with SSE or AltiVec, where vectors are four lanes wide (in the case of 32 bit floats). One inactive lane already costs 25% of potential performance. But GPU vectors are 32 lanes wide, so you don't lose as much potential when one lane goes inactive.


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 23, 2010, 08:28:32 PM
Here is a new binary + source distribution. The binary is found in the "Release" folder. You may need to place the DLLs found in separate attachments in the same folder.

It also contains the CUDA source code. Tested with CUDA SDK 2.3. Just unzip the folder into the SDK's C/src directory and build.

I am allowing for interactive zooming and exploration now.
Left mouse button and dragging selects an area.
Hold Ctrl to zoom around the point where you clicked first.
Hold Shift to remove the aspect ratio lock during zooming.

The program shows the other available keystrokes on the console, mainly for controlling brightness/exposure.

The window can now be resized. The program also accepts --width=x and --height=y parameters on the command line
to set an initial window size. It may be easier to just hit the "maximize" window button though ;)

Christian


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 23, 2010, 11:33:16 PM
Code:
The following error has occurred during XML parsing:

File: D:\projets\Buddhabrot-cuda\Buddhabrot\Buddhabrot_vc90.vcproj
Line: 22
Column: 4
Error Message:
Custom build rules file 'd:\projets\common\Cuda.rules' was not found or failed to load.
The file 'D:\projets\Buddhabrot-cuda\Buddhabrot\Buddhabrot_vc90.vcproj' has failed to load.


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 24, 2010, 01:54:54 AM
Custom build rules file 'd:\projets\common\Cuda.rules' was not found or failed to load.

Attaching my cuda.rules file. It is part of the CUDA 2.3 SDK, which you can get from the nVidia site.

http://developer.nvidia.com/object/cuda_2_3_downloads.html

Without the SDK, I guess my code won't compile. The Buddhabrot folder must go into the SDK's
C\src folder. Make sure you can compile and run the other graphical SDK samples first, before
trying the Buddhabrot. My code was based on the "Sobel" SDK sample code.

One more thing: my Analysiskernel requires the sm_11 architecture (compute capability 1.1) because
it makes use of global atomics. On your 8800 GTX you may want to comment out the Analysiskernel
call and change the project options to use sm_10 for compilation. This kernel only reports statistics
on how many threads actually plot useful orbits, but it won't run on an 8800 GTX card with compute
capability 1.0.


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 24, 2010, 02:57:20 AM
What about CMYB with 4 iteration levels? That's something on my todo list.
(That will require conversion to RGB to display on the screen.)

CMYK, you mean? That's what Google suggests after searching for
CMYB.

I think the black (or Key) channel is only used to save ink on the cyan, magenta
and yellow, and to allow printing high-detail text - so it is really only useful
for printing.

So maybe a CMY color space (without the black) would suffice for experiments
with the Buddhabrot.
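
The conversion back to RGB for display would then be trivial - each channel is just the complement (a sketch, assuming channel values already normalized to [0,1]):

Code:
/* CMY -> RGB for display: each channel is simply the complement. */
static void cmy_to_rgb(float c, float m, float y,
                       float *r, float *g, float *b)
{
    *r = 1.0f - c;
    *g = 1.0f - m;
    *b = 1.0f - y;
}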



Title: Re: Buddhabrot on GPU
Post by: kram1032 on July 24, 2010, 11:50:37 AM
I made an error above that stopped the colour stuff from making much sense...
What I meant was:
directly map either the iteration count or the periodicity to the frequency of the light spectrum, and then convert that to RGB, possibly based on receptor measures of the human eye.

Just tweak the mapping to highlight different parts.

Would that work?


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 24, 2010, 01:22:12 PM
directly map either the iteration count or the periodicity to the frequency of the light spectrum, and then convert that to RGB, possibly based on receptor measures of the human eye.

Iteration count mapped to a spectral wavelength would mean I would only get the "pure" colors found in a rainbow. Still sounds nice. Maybe it will give the look of a soap bubble.


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 24, 2010, 06:39:53 PM
CUDA Buddhabrot has been updated to allow interactive zooming. I've updated it once more
tonight to allow resizing of the window.

The EXE file, source code and first part of the DLLs are found here: http://www.fractalforums.com/index.php?topic=3614.msg19874#msg19874

Find the second DLL archive for the updated CUDA Buddhabrot.zip (binary EXE) attached. Use them if
the EXE can't find these DLLs on your system. Just place them in the same folder as the EXE itself.

*Really* enjoy this on an nVidia GTX 260 or better. ;) You definitely need a CUDA capable graphics
card and not-too-ancient drivers. The more shaders (CUDA cores) the better.

The next thing I'm shooting for will be color. But even in black & white it's already fun exploring.
I've placed 3 sample pictures in the gallery. All taken as screen shots and cropped in MS Paint ;)


Title: Re: Buddhabrot on GPU
Post by: kram1032 on July 25, 2010, 11:52:31 AM
if you use smooth iterations, you'd get a continuous spectrum... (Or am I wrong about that?^^)
Also, if you use fairly high iteration counts, you'd get fairly smooth spectra as well... :)

Too bad I have an ATI card that came before all the OpenCL stuff...
For CUDA, I'd need an NVIDIA card, I guess?


Title: Re: Buddhabrot on GPU
Post by: hobold on July 25, 2010, 07:12:40 PM
CUDA is limited to Nvidia hardware. The various CUDA versions track the evolution of the hardware. So the newest features require the latest GPUs.

OpenCL is newer, and specifically created as a cross platform interface. As far as I am aware, AMD provides implementations for both their x86 CPUs and their (i.e. ATI's) newer GPUs. I believe there is an experimental implementation for IBM's Cell processor. And Nvidia supports it as well, even though it is theoretically a threat to CUDA.

Right now, CUDA is more feature rich and more mature than OpenCL. But the industry at large doesn't like lock-in to a single vendor much. So my personal expectation is that OpenCL will quickly reach a point where it is reliable and available. But that is only one opinion.


Old GPUs are unlikely to retroactively gain support for CUDA and/or OpenCL. In most cases, the hardware simply doesn't have the capability.


Title: Re: Buddhabrot on GPU
Post by: kram1032 on July 25, 2010, 09:56:37 PM
yeah I know.

...which currently stops me from checking out your app.

Neither CUDA (which would have worked for a graphics card of the same age, if only it was an NVIDIA) nor OpenCL for me :(


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 26, 2010, 06:23:23 PM
Neither CUDA (which would have worked for a graphics card of the same age, if only it was an NVIDIA) nor OpenCL for me :(

Well sorry, but you're missing out on the new color feature ;)  Source and binary attached. And the good news is that it isn't noticeably slower than the previous grayscale version (given that you use the same maximum iteration count).

With these command line parameters you can now pass in the maximum iteration count per R, G, B channel.

For example, to get good old grayscale you'd use these arguments (note that they ARE case sensitive):
--maxR=1000 --maxG=1000 --maxB=1000

The defaults for maxR, maxG, maxB are 1000, 200, 40, so there is always a factor of 5 between color channels.

I have to admit that it is more difficult to get aesthetically pleasing deep zooms with colors, because my code for HDR-to-LDR mapping treats the RGB channels separately but with the same parameters. Hence one color channel may be overexposed while another appears too faint. You'll see some of these problems in the second screenshot. So I need to do some research on tone mapping algorithms.
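
One obvious experiment (not what the program currently does) would be to expose each channel against its own maximum before the gamma step; a plain C sketch:

Code:
#include <math.h>

/* Per-channel exposure: normalize a channel by its own maximum, then gamma.
   A sketch of one possible fix, not the program's current mapping. */
static void tonemap_channel(const float *acc, unsigned char *out,
                            int n, float gamma)
{
    float maxv = 1e-6f;
    for (int i = 0; i < n; i++)
        if (acc[i] > maxv) maxv = acc[i];
    for (int i = 0; i < n; i++) {
        float v = powf(acc[i] / maxv, gamma);   /* e.g. gamma = 0.5 as above */
        out[i] = (unsigned char)(255.0f * v);
    }
}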

Remember to grab the DLL archives from previous posts if you don't already have them and put the contents into the Release folder where the EXE file is.

If you want to get into GPU accelerated buddhabrot exploration for under $100, I'd recommend the nVidia GT240 card (96 shaders). For under $200 I can recommend the GTX 460 with 768MB RAM - but that's a power guzzler... ATI may be faster at the same price, but doesn't have CUDA compatibility.

I am officially getting bored with the "traditional" coloring method, so I am venturing into physics-based models (wavelength to RGB color mapping etc). Check back tomorrow for updates...


Christian



Title: Re: Buddhabrot on GPU
Post by: kram1032 on July 26, 2010, 06:51:19 PM
that's a really nice render yet again :)


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 26, 2010, 11:16:06 PM
that's a really nice render yet again :)

I think I can top this. Bring on the containment booms: we've got a fractal leak!

This is what you get when you map orbit length to a wavelength and then map that to the RGB color that the orbit will contribute.
Simply fractastic. Also check out the larger images I posted to the image gallery. The new coloring method is pretty intense.

I am attaching the new binary here. There are .BAT files in the Release folder now to launch the program in various coloring modes.
Edit these files to suit your liking (initial window size, color parameters etc). Remember to grab the DLLs from previous postings in
this thread if you don't have them yet.

Except for a screenshot feature and load/save of coordinates, I think my program is now almost ready to be used for exploration.


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 27, 2010, 03:28:04 PM
directly map either the iteration count or the periodicity to the frequency of the light spectrum, and then convert that to RGB, possibly based on receptor measures of the human eye.

Iteration count mapped to a spectral wavelength would mean I would only get the "pure" colors found in a rainbow. Still sounds nice. Maybe it will give the look of a soap bubble.


Indeed some renders have the "oil sheen" or "soap bubble" effect. Thank you for the inspiration. The code I used to map wavelength to RGB is essentially the one at
http://www.physics.sfasu.edu/astro/color/spectra.html , but ported to C/CUDA.
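
The mapping boils down to a piecewise-linear approximation over the visible range. My transcription of the idea in plain C (values as on the linked page; treat the details as approximate):

Code:
#include <math.h>

/* Approximate wavelength (nm, ~380-780) to RGB, after the code linked above.
   Outside 380-780nm everything is black; intensity fades near both edges
   of the visible range, and a gamma of 0.8 is applied at the end. */
static void wavelength_to_rgb(float w, float *r, float *g, float *b)
{
    float R = 0.0f, G = 0.0f, B = 0.0f;
    if      (w >= 380.0f && w < 440.0f) { R = -(w - 440.0f) / 60.0f; B = 1.0f; }
    else if (w >= 440.0f && w < 490.0f) { G =  (w - 440.0f) / 50.0f; B = 1.0f; }
    else if (w >= 490.0f && w < 510.0f) { G = 1.0f; B = -(w - 510.0f) / 20.0f; }
    else if (w >= 510.0f && w < 580.0f) { R =  (w - 510.0f) / 70.0f; G = 1.0f; }
    else if (w >= 580.0f && w < 645.0f) { R = 1.0f; G = -(w - 645.0f) / 65.0f; }
    else if (w >= 645.0f && w <= 780.0f){ R = 1.0f; }

    /* intensity falloff near the ends of the visible spectrum */
    float f = 0.0f;
    if      (w >= 380.0f && w < 420.0f)  f = 0.3f + 0.7f * (w - 380.0f) / 40.0f;
    else if (w >= 420.0f && w < 700.0f)  f = 1.0f;
    else if (w >= 700.0f && w <= 780.0f) f = 0.3f + 0.7f * (780.0f - w) / 80.0f;

    const float gamma = 0.8f;
    *r = (R > 0.0f) ? powf(R * f, gamma) : 0.0f;
    *g = (G > 0.0f) ? powf(G * f, gamma) : 0.0f;
    *b = (B > 0.0f) ? powf(B * f, gamma) : 0.0f;
}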



Title: Re: Buddhabrot on GPU
Post by: ker2x on July 27, 2010, 06:00:22 PM
I'm planning to rewrite some parts of my app to use Image2D. Nice tutorial here:  http://www.cmsoft.com.br/index.php?option=com_content&view=category&layout=blog&id=115&Itemid=172

Edit: I planned to... but StarCraft 2 is out \o/ brb  :dink:


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 27, 2010, 09:31:34 PM
I'm planning to rewrite some parts of my app to use Image2D. Nice tutorial here:  http://www.cmsoft.com.br/index.php?option=com_content&view=category&layout=blog&id=115&Itemid=172

Edit: I planned to... but StarCraft 2 is out \o/ brb  :dink:

Ah, the equivalent of texture access in CUDA. I use that only in the post-filtering technique for noise reduction, but the algorithm is currently too aggressive; I prefer the look of the original (noisier) images.
I could also use textures to map iterations to color. Basically, a freely definable color texture would be more versatile than a fixed wavelength-to-RGB mapping formula. And it could make use of a second dimension too.

I also gamed a lot during the last week, mainly "Alan Wake" on Xbox 360 and "Dead Space" on PC (with nVidia 3DVision goggles).


Title: Re: Buddhabrot on GPU
Post by: kram1032 on July 27, 2010, 10:33:32 PM
yay, that looks great :D

So I had a good idea :)

Does this colouring technique generally highlight different aspects of the fractal, or do the overall details look just like the usual nebulabrot colouring methods? (e.g.: how does the full object look in comparison to the standard colouring method?^^)


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on July 28, 2010, 12:44:31 AM
Does this colouring technique generally highlight different aspects of the fractal, or do the overall details look just like the usual nebulabrot colouring methods? (e.g.: how does the full object look in comparison to the standard colouring method?^^)

Generally the new method renders everything up to the configured maximum iteration count; however, some short or long orbits may fall outside the visible spectrum, depending on how the mapping from iterations to wavelength is configured.

The current wavelength-to-RGB code limits the wavelength to 380 to 780nm; larger or smaller values are clamped to those limits. The color intensity near the edges gets reduced somewhat, but because of the clamping the emission never entirely fades to "invisible" light.

(http://www.physics.sfasu.edu/astro/color/rgb.gif)

This may be one of the reasons why there is so much red in the deeper zooms. I will try making the function unbounded, fading to zero at the edges, and see if that improves the situation.

Here are more ideas I am toying with:

A) I am playing with the idea of rendering the whole thing in three dimensions, because I own one of these modern 3D monitors and shutter glasses. (Two views are generated, either using orbit length as a z value and adding a distance-dependent parallax displacement, or alternatively rendering a Buddhagram and rotating one axis slightly to create parallaxed views.)

B) not just creating emissive light spectra, but also absorption spectra similar to the ones observed in space. Some orbits or iteration counts would then remove some light again. I need to think more about this, and about how it could be integrated into the render process (alpha blending or similar methods?). Individual orbits - possibly very long ones - would then swallow light at certain wavelengths.



Title: Re: Buddhabrot on GPU
Post by: kram1032 on July 28, 2010, 02:31:20 AM
All those ideas sound awesome :D
I'd love to see a buddhagram version of this :D
You could also try a buddhabulb...
or a, err.... bulbagram?

However, all I meant was: how does the full spectral version of the buddhabrot look? (As you only showed a zoom of it^^)


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 28, 2010, 08:44:06 PM
updated the public git repository : http://github.com/ker2x/WinBuddhaOpenCL


Title: Re: Buddhabrot on GPU
Post by: ker2x on July 28, 2010, 09:01:57 PM
I found a weird optimization. It is, in fact, documented in the NVIDIA OpenCL Best Practices Guide:

Register dependencies arise when an instruction uses a result stored in a register written by an instruction before it. The latency on current CUDA-enabled GPUs is approximately 24 cycles, so threads must wait 24 cycles before using an arithmetic result.


So this code :
Code:
        while( (iter < maxIter) && ((zr*zr+zi*zi) < escapeOrbit) )
        {
            temp = zr * zi;
            zr = zr*zr - zi*zi + cr;
            zi = temp + temp + ci;
            //etc ....
        }

is faster than :

Code:
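        // note: zr2 and zi2 must be initialized to zr*zr and zi*zi before this loop,
        // and the condition now tests values computed one iteration earlier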
        while( (iter < maxIter) && ((zr2+zi2) < escapeOrbit) )
        {
            temp = zr * zi;
            zr2 = zr * zr;
            zi2 = zi * zi;
            zr = zr2 - zi2 + cr;
            zi = temp + temp + ci;
            //etc ....
        }


Title: Re: Buddhabrot on GPU
Post by: ker2x on August 07, 2010, 02:10:59 AM
I found an insane bug in my code.
I was generating random complex points only in the range of the visible screen, instead of points in the range -2 to 2. (Orbits that cross the visible window can start from points well outside it, so clipping the sample range loses orbits.) Oddly enough, the result was not bad at all.
I patched it, but now it takes much more time to generate a good looking buddhabrot  :sad1:


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on August 09, 2010, 01:36:52 AM
I found an insane bug in my code.

oops! ;)  It appears you've switched to Mandelbulbs now.

I am making some progress with nVidia's 3DVision stereo blit API. Now I need to merge this code with some nVidia sample code that allows CUDA to write to a DirectX texture; when that's done, I should be able to do some Buddhabrot renders in 3D stereo.

The year-old nVidia 186.18 drivers have an anaglyph mode ("3DVision Discover") that shows the output in red/cyan anaglyph. Too bad they removed this option for older GPUs in later drivers. I actually had to go back to this ancient driver to do some testing on my laptop.

The thing needs to run in full screen mode; that is the only downside.


Title: Re: Buddhabrot on GPU
Post by: ker2x on August 09, 2010, 06:43:58 AM
I found an insane bug in my code.

oops! ;)  It appears you've switched to Mandelbulbs now.


I'm learning the mandelbulb and this thing named "triplex" by coding them.
I'll be back to the buddhabrot soon after that; I'm not satisfied with the current "implementations" of either the 3D Mandelbrot or the 3D Buddhabrot.
I'm not sure if I'll find anything, but I have to try... because I can :)


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on August 11, 2010, 02:11:55 AM
I am making some progress with nVidia's 3DVision stereo blit API.

I finally managed to render something to screen using CUDA in 3D stereoscopic mode. Both red/cyan glasses and LCD shutter goggles work.  3D Buddhas approaching any time now ;)


Title: Re: Buddhabrot on GPU
Post by: ker2x on September 06, 2010, 09:42:06 PM
Guess what I bought today ...

http://www.nvidia.com/object/3d-vision-home-users.html   :embarrass:


Title: Re: Buddhabrot on GPU
Post by: cbuchner1 on September 06, 2010, 10:13:16 PM
Guess what I bought today ...

What's the monitor you're using with these fantastic nerd goggles?


Title: Re: Buddhabrot on GPU
Post by: ker2x on September 07, 2010, 08:50:56 AM
Guess what I bought today ...

What's the monitor you're using with these fantastic nerd goggles?


samsung 2233RZ, 22"
It was one of the first 3D Vision screens. Very expensive (450€) when it came out, it now costs 250€ (bought it last week), which is fair for a good screen (still a bit expensive for a 22"), and it was the cheapest 120Hz "3D Vision" screen I found.

A lot of games are not really playable with 3D Vision: ugly 2D effects, 2D billboards that render at the wrong depth, ...
For some games it's "ok": texts render at the wrong depth (e.g. Guild Wars).
And for a few games it's perfect and mind blowing: Left4Dead, RUSE, Avatar, ...


Title: Re: Buddhabrot on GPU
Post by: ker2x on December 03, 2010, 08:57:47 PM
I cleaned my openCL code.

Code:
//Check if the chosen point is in the M-Set
bool isInMSet(
    const float2 c,
    const uint minIter,
    const uint maxIter,
    const float escapeOrbit)
{
    int iter = 0;
    float2 z = (float2)(0.0f, 0.0f);

    if( !(((c.x-0.25)*(c.x-0.25) + (c.y * c.y))*(((c.x-0.25)*(c.x-0.25) + (c.y * c.y))+(c.x-0.25)) < 0.25* c.y * c.y))  //main cardioid
    {
        if( !((c.x+1.0) * (c.x+1.0) + (c.y * c.y) < 0.0625))            //2nd order period bulb
        {
            if (!(( ((c.x+1.309)*(c.x+1.309)) + c.y*c.y) < 0.00345))    //smaller bulb left of the period-2 bulb
            {
                if (!((((c.x+0.125)*(c.x+0.125)) + (c.y-0.744)*(c.y-0.744)) < 0.0088))      // smaller bulb bottom of the main cardioid
                {
                    if (!((((c.x+0.125)*(c.x+0.125)) + (c.y+0.744)*(c.y+0.744)) < 0.0088))  //smaller bulb top of the main cardioid
                    {
                        while( (iter < maxIter) && (z.x*z.x + z.y*z.y < escapeOrbit) )      //Bruteforce check 
                        {
                            z = (float2)(z.x * z.x - z.y * z.y, (z.x * z.y * 2.0f)) + c;
                            iter++;
                        }
                        if( (iter > minIter) && (iter < maxIter))
                        {
                            return false;
                        }
                    }
                }
            }
        }
    }
    return true;
}

//Main kernel
__kernel void buddhabrot(
    const float realMin,
    const float realMax,
    const float imaginaryMin,
    const float imaginaryMax,
    const uint  minIter,
    const uint  maxIter,
    const uint  width,
    const uint  height,
    const float escapeOrbit,
    const uint4 minColor,
    const uint4 maxColor,
    __global float2* randomXYBuffer,
    __global uint4*  outputBuffer)
{

    float2 rand = randomXYBuffer[get_global_id(0)];   

    const float deltaReal = (realMax - realMin);
    const float deltaImaginary = (imaginaryMax - imaginaryMin);

    //mix(a,b,c) = a + (b-a)*c   (c must be in the range 0.0 ... 1.0)
    float2 c = (float2)(mix(realMin, realMax, rand.x) , mix(imaginaryMin, imaginaryMax, rand.y));

    if( isInMSet(c, minIter, maxIter, escapeOrbit) == false)
    {
        int x, y;
        int iter = 0;
        float2 z = (float2)(0.0f, 0.0f);
   
        while( (iter < maxIter) && ((z.x*z.x+z.y*z.y) < escapeOrbit) )
        {
            z = (float2)(z.x * z.x - z.y * z.y, (z.x * z.y * 2.0f)) + c;
            x = (width * (z.x - realMin) / deltaReal);
            y = (height * (z.y - imaginaryMin) / deltaImaginary);

            if( (iter > minIter) && (x>0) && (y>0) && (x<width) && (y<height) )
            {
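                //Note: these increments are not atomic, so concurrent work items
                //hitting the same pixel can lose counts (shows up as slight noise)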
                if( (iter > minColor.x) && (iter < maxColor.x) ) { outputBuffer[x + (y * width)].x++; }
                if( (iter > minColor.y) && (iter < maxColor.y) ) { outputBuffer[x + (y * width)].y++; }
                if( (iter > minColor.z) && (iter < maxColor.z) ) { outputBuffer[x + (y * width)].z++; }
            }
            iter++;
        }
    }
}

__kernel void xorshift(
    uint s1,
    uint s2,
    const int bufferSize,
    __global float2* randomXYBuffer
)
{
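    //Serial xorshift-style PRNG: fills the whole randomXYBuffer with floats in [0,1)
    //(no get_global_id here, so this is meant to run as a single work item)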
    uint st;

    for(int i=0; i < bufferSize; i++)
    {
        st = s1 ^ (s1 << 11);
        s1 = s2;
        s2 = s2 ^ (s2 >> 19) ^ ( st ^ (st >> 18));
        randomXYBuffer[i] = (float2)((float)st / UINT_MAX,(float)s1 / UINT_MAX);
    }
}


Now i'll be able to test some other idea :)


Title: Re: Buddhabrot on GPU
Post by: ant123 on April 03, 2011, 11:58:16 PM
You might be able to make it using shaders for a Milkdrop visualisation; although it would be faster, it could look a bit lamer.