Topic: Buddhabrot on GPU
ker2x
« on: July 12, 2010, 06:41:02 PM »

I'm moving this discussion: http://www.fractalforums.com/images-showcase-(rate-my-fractal)/the-infinity-fields-(detailed-buddhabrot-zoom)/
to this thread (as requested).

The discussion is about finding an efficient way to compute a Buddhabrot on a GPU:

The GPU is not supposed to be efficient at that kind of computation, but let's try! smiley

The speed problem is with the scattered read-modify-writes to global memory. I am thinking mainly about the CUDA architecture now:

Wouldn't it be faster to use the 16 KB of shared memory (64 KB on Fermi) as independent "mini framebuffer" tiles and accumulate the writes within shared memory only, as if one rendered a lot of independent deep zooms? Techniques to render Buddhabrot zooms exist (with appropriate non-uniform sampling optimizations), and applying them to the individual tiles that make up a larger image might just work. Every multiprocessor would work on its own tile (shared memory is private to each multiprocessor). When a tile finishes rendering, the multiprocessor immediately starts on another tile (on Fermi at least). No writes to global memory would be needed until the very end, when the completed tile is written out, which might speed up the process by an order of magnitude.

Christian
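
A minimal CPU-side sketch of the tiling idea above, in plain C rather than CUDA: each tile keeps a small local histogram (standing in for shared memory), escaping orbits are binned into it, and the tile is added to the full image (standing in for global memory) only once at the end. The names `Tile`, `tile_init`, `tile_add_orbit` and `tile_flush`, the 64x64 tile size and the escape radius of 2 are assumptions made for illustration, not something taken from the posts.

```c
#include <stdint.h>
#include <string.h>

#define TILE_W 64
#define TILE_H 64

/* Hypothetical tile: a small local histogram (standing in for shared
   memory) plus the region of the complex plane it covers. */
typedef struct {
    float re_min, re_max, im_min, im_max;
    uint32_t hits[TILE_W * TILE_H];
} Tile;

void tile_init(Tile *t, float re_min, float re_max, float im_min, float im_max)
{
    t->re_min = re_min; t->re_max = re_max;
    t->im_min = im_min; t->im_max = im_max;
    memset(t->hits, 0, sizeof t->hits);
}

/* One step of z -> z^2 + c. */
static void mandel_step(float *zr, float *zi, float cr, float ci)
{
    float next_zr = (*zr) * (*zr) - (*zi) * (*zi) + cr;
    *zi = 2.0f * (*zr) * (*zi) + ci;
    *zr = next_zr;
}

/* Bins the orbit of c into the tile if the orbit escapes within max_iter
   steps. Returns the number of hits added to the tile. */
int tile_add_orbit(Tile *t, float cr, float ci, int max_iter)
{
    float zr = 0.0f, zi = 0.0f;
    int steps = 0, escaped = 0;

    /* First pass: does the orbit escape, and after how many steps? */
    while (steps < max_iter && !escaped) {
        mandel_step(&zr, &zi, cr, ci);
        steps++;
        escaped = (zr * zr + zi * zi >= 4.0f);
    }
    if (!escaped)
        return 0;

    /* Second pass: replay the orbit and count the points inside the tile. */
    int added = 0;
    zr = 0.0f; zi = 0.0f;
    for (int i = 0; i < steps; i++) {
        mandel_step(&zr, &zi, cr, ci);
        if (zr < t->re_min || zr >= t->re_max ||
            zi < t->im_min || zi >= t->im_max)
            continue;
        int x = (int)(TILE_W * (zr - t->re_min) / (t->re_max - t->re_min));
        int y = (int)(TILE_H * (zi - t->im_min) / (t->im_max - t->im_min));
        if (x >= TILE_W) x = TILE_W - 1;  /* guard against float rounding */
        if (y >= TILE_H) y = TILE_H - 1;
        t->hits[y * TILE_W + x]++;
        added++;
    }
    return added;
}

/* The single write-out: add the finished tile into the full image at
   pixel offset (x0, y0). image_w is the image width in pixels. */
void tile_flush(const Tile *t, uint32_t *image, int image_w, int x0, int y0)
{
    for (int y = 0; y < TILE_H; y++)
        for (int x = 0; x < TILE_W; x++)
            image[(y0 + y) * image_w + (x0 + x)] += t->hits[y * TILE_W + x];
}
```

On a GPU the scattered increments would go to the on-chip tile and only `tile_flush` would touch global memory. The cost is that every orbit has to be replayed for each tile it might cross, which is where the non-uniform sampling tricks mentioned above would have to come in.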



often times... there are other approaches which are kinda crappy until you put them in the context of parallel machines
(en) http://www.blog-gpgpu.com/ , (fr) http://www.keru.org/ ,
Sysadmin & DBA @ http://www.over-blog.com/
ker2x
« Reply #1 on: July 12, 2010, 07:00:19 PM »

I'm planning to try it in C# (using Visual Express 2010) with OpenTK and Cloo ( http://www.opentk.com ; Cloo is the OpenCL framework for C#).

Edit:
I finished writing the boring code (and learning OpenTK and Cloo).
For now I just have a very classic Mandelbrot, which proves that I understood how to do OpenCL in C# smiley
In the next few days, I'll rewrite my Mandelbrot app to (try to) render a Buddhabrot in OpenCL \o/

« Last Edit: July 12, 2010, 11:29:33 PM by ker2x »
cbuchner1
« Reply #2 on: July 12, 2010, 08:39:46 PM »

Quote from: ker2x
I'm planning to try in C# (Using Visual Express 2010) + OpenTk and Cloo ( http://www.opentk.com (Cloo is the OpenCL framework for C#))

More details.

NVIDIA's Fermi architecture has up to 48 KB of shared memory (+16 KB L1 cache) per multiprocessor. This fits a 64x64 pixel RGB tile with 32 bits per color channel.
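
The arithmetic behind that tile size, as a small C sketch (the helper names `tile_bytes` and `tile_fits` are made up for illustration): 64 x 64 pixels x 3 channels x 4 bytes = 49,152 bytes, which is exactly 48 KB.

```c
#include <stddef.h>

/* Bytes needed for a w x h tile with `channels` counters per pixel and
   `bytes_per_channel` bytes per counter. */
size_t tile_bytes(size_t w, size_t h, size_t channels, size_t bytes_per_channel)
{
    return w * h * channels * bytes_per_channel;
}

/* Nonzero if such a tile fits into `shared_bytes` of on-chip memory. */
int tile_fits(size_t w, size_t h, size_t channels, size_t bytes_per_channel,
              size_t shared_bytes)
{
    return tile_bytes(w, h, channels, bytes_per_channel) <= shared_bytes;
}
```

So the tile uses the 48 KB completely and leaves no shared memory for anything else, and it would not fit into the 16 KB of the older chips.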

And Fermi has 768 KB of common L2 cache. This should greatly boost the performance of the read-modify-write operations that Buddhabrots need when accessing the card's main (global) memory. Atomic operations are also said to work much faster on Fermi than on previous generations.

Not sure how much of these hardware features are accessible through OpenCL. I am more the CUDA person myself.

I just ordered a GTX 460 card with 1GB for some tinkering. My first Fermi based card. ;-) Finally a fermi card that does not set the house on fire and has a price point somewhere below insanity.

EDIT: a DX11 compute shader version is found here: http://www.yakiimo3d.com/2010/03/29/dx11-directcompute-buddhabrot-nebulabrot/
some further AMD optimizations in this thread: http://forum.beyond3d.com/showthread.php?t=57042
« Last Edit: July 12, 2010, 10:01:31 PM by cbuchner1 »
ker2x
« Reply #3 on: July 12, 2010, 11:40:35 PM »


Interesting to see that it's still fast with a non-optimized version.

If anyone wants a working codebase for OpenTK + Cloo, here is my code (it renders a Mandelbrot for now):
Code:
using System;
using System.Collections.Generic;
using System.Text;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

using OpenTK;
using OpenTK.Graphics;
using OpenTK.Graphics.OpenGL;
using OpenTK.Input;



using Cloo;


namespace TKBuddhabrot
{
    class TKBuddhabrot : GameWindow
    {

private static string kernelSource = @"
__kernel void mandelbrot(
  const float deltaReal,
  const float deltaImaginary,
  const float realMin,
  const float imaginaryMin,
  const unsigned int maxIter,
  const unsigned int escapeOrbit,
  const unsigned int hRes,
  __global int* outputi
) {

  int xId = get_global_id(0);
  int yId = get_global_id(1);

  float realPos = realMin + (xId * deltaReal);
  float imaginaryPos = imaginaryMin + (yId * deltaImaginary);
  float real = realPos;
  float imaginary = imaginaryPos;
  float realSquared = real * real;
  float imaginarySquared = imaginary * imaginary;

  int iter = 0;
  while ( (iter < maxIter) && ((realSquared + imaginarySquared) < escapeOrbit) )
  {
    imaginary = (2 * (real * imaginary)) + imaginaryPos;
    real = realSquared - imaginarySquared + realPos;
    realSquared = real * real;
    imaginarySquared = imaginary * imaginary;
    iter++;
  }
  if(iter >= maxIter){
        iter = 0;
  }
  outputi[(yId * hRes) + xId] = iter;
}



";

        ComputePlatform platform;
        ComputeContextPropertyList properties;
        ComputeContext context;

        Bitmap bmp;

        float realMin, realMax, imaginaryMin, imaginaryMax, deltaReal, deltaImaginary;
        int maxiter, screenSizeInPixel, escapeOrbit;

        static int initialScreenWidth = 800;
        static int initialScreenHeight = 800;



        /// <summary>Creates a window with the specified title.</summary>
        public TKBuddhabrot() : base(initialScreenWidth, initialScreenHeight, GraphicsMode.Default, "TKBuddhabrot")
        {
            VSync = VSyncMode.On;
        }

        /// <summary>Load resources here.</summary>
        /// <param name="e">Not used.</param>
        protected override void OnLoad(EventArgs e)
        {
            base.OnLoad(e);

            //Create Bitmap
            bmp = new Bitmap(ClientRectangle.Width, ClientRectangle.Height);

            //OpenGL Stuff
            GL.ClearColor(0.1f, 0.2f, 0.5f, 0.0f);

            //OpenCL initialisation
            platform = ComputePlatform.Platforms[0];
            Console.WriteLine("Compute platform : " + platform.ToString());

            properties = new ComputeContextPropertyList(platform);           
            context = new ComputeContext(platform.Devices, properties, null, IntPtr.Zero);
            Console.WriteLine("Compute context : " + context.ToString());

            //Mandelbrot Specific
            realMin = -2.25f;
            realMax = 0.75f;
            imaginaryMin = -1.5f;
            imaginaryMax = 1.5f;
            maxiter = 64;
            escapeOrbit = 4;

            deltaReal = (realMax - realMin) / (ClientRectangle.Width - 1);
            deltaImaginary = (imaginaryMax - imaginaryMin) / (ClientRectangle.Height - 1);
            screenSizeInPixel = ClientRectangle.Width * ClientRectangle.Height;

            //OpenCL Buffer (int elements, matching the kernel's __global int* output)
            ComputeBuffer<int> kernelOutput = new ComputeBuffer<int>(context, ComputeMemoryFlags.WriteOnly, screenSizeInPixel);

            //Build OpenCL kernel
            ComputeProgram program = new ComputeProgram(context, new string[] { kernelSource });
            program.Build(null, null, null, IntPtr.Zero);
            ComputeKernel kernel = program.CreateKernel("mandelbrot");

            //OpenCL args
            //  const float deltaReal,
            //  const float deltaImaginary,
            //  const float realMin,
            //  const float imaginaryMin,
            //  const unsigned int maxIter,
            //  const unsigned int escapeOrbit,
            //  const unsigned int hRes,
            //  __global int* outputi

            kernel.SetValueArgument<float>(0, deltaReal);
            kernel.SetValueArgument<float>(1, deltaImaginary);
            kernel.SetValueArgument<float>(2, realMin);
            kernel.SetValueArgument<float>(3, imaginaryMin);
            kernel.SetValueArgument<int>(4, maxiter);
            kernel.SetValueArgument<int>(5, escapeOrbit);
            kernel.SetValueArgument<int>(6, ClientRectangle.Width);
            kernel.SetMemoryArgument(7, kernelOutput);

            //Execute
            ComputeCommandQueue commands = new ComputeCommandQueue(context, context.Devices[0], ComputeCommandQueueFlags.None);
            ComputeEventList events = new ComputeEventList();
            commands.Execute(kernel, null, new long[] { ClientRectangle.Width, ClientRectangle.Height }, null, events);

            //Get result
            int[] kernelResult = new int[screenSizeInPixel];
            GCHandle kernelResultHandle = GCHandle.Alloc(kernelResult, GCHandleType.Pinned);

            commands.Read(kernelOutput, false, 0, screenSizeInPixel, kernelResultHandle.AddrOfPinnedObject(), events);
            commands.Finish();

            //Finish openCL stuff
            kernelResultHandle.Free();
           
            int maxfound = 0;
            foreach (int iter in kernelResult)
            {
                if (iter > maxfound) maxfound = iter;
            }

            //Use the result
            int x, y;
            for (x = 0; x < bmp.Width; x++)
            {
                for (y = 0; y < bmp.Height; y++)
                {
                    Color c = Color.FromArgb((int)((float)kernelResult[x + y * bmp.Width] / maxfound * 255.0),
                        (int)((float)kernelResult[x + y * bmp.Width] / maxfound * 255.0),
                        (int)((float)kernelResult[x + y * bmp.Width] / maxfound * 255.0)
                        );

                    bmp.SetPixel(x, y, c);
                }
            }

            BitmapData bmp_data = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.ReadOnly, System.Drawing.Imaging.PixelFormat.Format32bppArgb);

            GL.TexImage2D(TextureTarget.Texture2D, 0, PixelInternalFormat.Rgba, bmp_data.Width, bmp_data.Height, 0,
                OpenTK.Graphics.OpenGL.PixelFormat.Bgra, PixelType.UnsignedByte, bmp_data.Scan0);

            bmp.UnlockBits(bmp_data);
            GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
            GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);

            Console.WriteLine("done");

        }

        /// <summary>
        /// Called when your window is resized. Set your viewport here. It is also
        /// a good place to set up your projection matrix (which probably changes
        /// along when the aspect ratio of your window).
        /// </summary>
        /// <param name="e">Not used.</param>
        protected override void OnResize(EventArgs e)
        {
            base.OnResize(e);

            GL.Viewport(ClientRectangle.X, ClientRectangle.Y, ClientRectangle.Width, ClientRectangle.Height);

            Matrix4 projection = Matrix4.CreatePerspectiveFieldOfView((float)Math.PI / 4, Width / (float)Height, 1.0f, 64.0f);
            GL.MatrixMode(MatrixMode.Projection);
            GL.LoadMatrix(ref projection);
        }

        /// <summary>
        /// Called when it is time to setup the next frame. Add you game logic here.
        /// </summary>
        /// <param name="e">Contains timing information for framerate independent logic.</param>
        protected override void OnUpdateFrame(FrameEventArgs e)
        {
            base.OnUpdateFrame(e);

            if (Keyboard[Key.Escape])
                Exit();
        }

        /// <summary>
        /// Called when it is time to render the next frame. Add your rendering code here.
        /// </summary>
        /// <param name="e">Contains timing information.</param>
        protected override void OnRenderFrame(FrameEventArgs e)
        {
            base.OnRenderFrame(e);

            GL.Clear(ClearBufferMask.ColorBufferBit | ClearBufferMask.DepthBufferBit);

            GL.Disable(EnableCap.DepthTest);
            GL.Enable(EnableCap.Texture2D);
            GL.Enable(EnableCap.Blend);
            GL.BlendFunc(BlendingFactorSrc.SrcAlpha, BlendingFactorDest.OneMinusSrcAlpha);


            GL.MatrixMode(MatrixMode.Projection);

            GL.LoadIdentity();
            GL.Ortho(0, ClientRectangle.Width, ClientRectangle.Height, 0, 0, 1);
         
 
            GL.Begin(BeginMode.Polygon);
                GL.TexCoord2(0.0, 1.0);
                GL.Vertex2(0, 0);

                GL.TexCoord2(1.0, 1.0);
                GL.Vertex2(ClientRectangle.Width, 0);

                GL.TexCoord2(1.0, 0.0);
                GL.Vertex2(ClientRectangle.Width, ClientRectangle.Height);

                GL.TexCoord2(0.0, 0.0);
                GL.Vertex2(0,ClientRectangle.Height);
            GL.End();
           
            SwapBuffers();
        }

        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main()
        {
            // The 'using' idiom guarantees proper resource cleanup.
            // We request 30 UpdateFrame events per second, and unlimited
            // RenderFrame events (as fast as the computer can handle).
            using (TKBuddhabrot game = new TKBuddhabrot())
            {
                Console.WriteLine("Display device list (may be useful for debugging)");
                foreach (DisplayDevice device in DisplayDevice.AvailableDisplays)
                {

                    Console.WriteLine("-------------");
                    Console.WriteLine("is primary : " + device.IsPrimary);
                    Console.WriteLine("bound : " + device.Bounds);
                    Console.WriteLine("Refresh rate : " + device.RefreshRate);
                    Console.WriteLine("bpp : " + device.BitsPerPixel);
                    //foreach (DisplayResolution res in device.AvailableResolutions) { Console.WriteLine(res); }

                }
                Console.WriteLine("-------------");
                game.Run(30.0);
            }
        }


    }
}
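
To sanity-check what the GPU returns, the kernel's per-pixel loop can be mirrored on the CPU and compared against a few entries of `kernelResult`. This is only a sketch in plain C and not part of the posted program; `mandelbrot_iter_ref` is a hypothetical name and its parameters follow the kernel's arguments.

```c
/* CPU mirror of the kernel's escape-time loop for one point.
   Like the kernel, it returns 0 for points that never escape. */
int mandelbrot_iter_ref(float realPos, float imaginaryPos,
                        unsigned int maxIter, unsigned int escapeOrbit)
{
    float real = realPos;
    float imaginary = imaginaryPos;
    float realSquared = real * real;
    float imaginarySquared = imaginary * imaginary;
    unsigned int iter = 0;

    while (iter < maxIter &&
           (realSquared + imaginarySquared) < (float)escapeOrbit) {
        imaginary = 2.0f * real * imaginary + imaginaryPos;
        real = realSquared - imaginarySquared + realPos;
        realSquared = real * real;
        imaginarySquared = imaginary * imaginary;
        iter++;
    }
    if (iter >= maxIter)
        iter = 0;
    return (int)iter;
}
```

For pixel (x, y) the matching call would be `mandelbrot_iter_ref(realMin + x * deltaReal, imaginaryMin + y * deltaImaginary, maxiter, escapeOrbit)`; small differences near the boundary of the set are possible because the GPU and the CPU may round floats differently.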
« Last Edit: July 12, 2010, 11:46:48 PM by ker2x »
lycium
« Reply #4 on: July 13, 2010, 12:45:17 AM »

chris, you're right of course about the fermi arch, but you didn't actually buy one of those chips! the gtx 460 lacks the cache and many compute-oriented features...

ker2x
« Reply #5 on: July 13, 2010, 09:53:58 AM »

Quote from: lycium
chris, you're right of course about the fermi arch, but you didn't actually buy one of those chips! the gtx 460 lacks the cache and many compute-oriented features...

Ho wow... and i was ready to buy one. Thx for the info !
cbuchner1
« Reply #6 on: July 13, 2010, 10:28:52 AM »

Quote from: lycium
chris, you're right of course about the fermi arch, but you didn't actually buy one of those chips! the gtx 460 lacks the cache and many compute-oriented features...

(*) citation needed

GTX 460 is Compute Capability 2.1 and has a higher double precision throughput than the GTX 470 for example. I am all excited about it.

After some research, I found out the L2 caches were indeed shrunk a bit.
L2 cache on the GTX 460: 512 KB on the 1GB model, 384 KB on the 768MB model
(768 KB on GF100, i.e. GTX 470/480)
« Last Edit: July 13, 2010, 11:49:39 AM by cbuchner1 »
hobold
« Reply #7 on: July 13, 2010, 01:58:32 PM »

All the details you could ever want and more:

http://www.anandtech.com/show/3809/nvidias-geforce-gtx-460-the-200-king

In short, the new GF104 chip (as used in the GeForce 460) has the same computing capabilities as the older GF100 chip (as used in the GeForce 465 and upwards). The only difference is that the GF104 has a slightly lower capacity in a few areas, but not in any fundamentally significant way.

The main differences are related to silicon fabrication technology (which is good for power consumption, and good for price due to better silicon yield) and the fact that the GF104 is the first superscalar GPU. Being superscalar means the GF104 implements another type of parallelism that has been around in CPUs since 1992 (IBM's single chip RIOS implementation as a commercial product, better known as POWER1), but not yet in GPUs until today. The end result is that GF104 is considerably smarter utilizing its computational resources, which helps both absolute performance and performance per Watt.

The GF104 is not the new king of the hill, but finally closes the gap to AMD, and re-establishes a little bit of a technological advantage at Nvidia. Superscalarism should be a significant step for GPU computing, because it is a stepping stone to dynamic out of order execution.

At the $200 price point, the GeForce460 is now the best alternative. But if you can afford to spend more, or have less to spend, AMD/ATI may have the better options for you.
ker2x
« Reply #8 on: July 13, 2010, 03:24:21 PM »

very interesting article.

But I'm confused:
- The 460 1GB is much, much, much better than the 460 768MB
- The 460 seems to be better than the 465
- The GF104 seems to be better than the GF100.

But... isn't the overpriced GTX 480 based on a GF100? (and not a GF104)
What's the point in buying a GTX 480 then? (other than saving money, of course)

thx. (PS: tomorrow is a non-working day here, i'll work on OpenCL buddhabrot  grin )
hobold
« Reply #9 on: July 13, 2010, 05:34:05 PM »

The only point in buying a GF100 is if you desperately need double precision and don't care about the downsides (price, power).

I don't want to incite a flame war, so please do not read the following as favouritism or prejudice. The fact of the matter is that Nvidia made a few unfortunate design decisions in the original GF 100 (Fermi) that came back to bite them. The newer GF 104 fixes most of those issues, and is a bit scaled down to better target the mass market instead of the computing high end. For example, GF 104 lacks the ECC protected memory which GF 100 had, but which was really overkill for graphics (and even for our fractal purposes, an incorrect pixel here or there doesn't matter all that much). AMD/ATI played it safe by aiming lower, and was lucky enough to take the crown for the moment. The real losers are in the high performance computing market, because now there is no GPU product anymore that tries to match the level of reliability of good server hardware.

Rumour has it that GF 100 is already out of production after a rather small run of a few thousand chips. The newer GF 104 is the immediate future, and will probably see noteworthy clock speed increases during its product life cycle. The slightly weird distinction between the GeForce 460 with 768MB and with 1024MB is probably just a result of Nvidia being overly cautious right now. But once production has ramped up, we might see Nvidia attacking AMD more aggressively. And we can hope that there will be a bit of a price war between the two. smiley
ker2x
« Reply #10 on: July 13, 2010, 05:36:45 PM »

There is a way to generate (non-cryptographic) random numbers on the GPU:

Parallel Random Number Generation Using OpenMP, OpenCL and PGI Accelerator Directives
http://www.pgroup.com/lit/articles/insider/v2n2a4.htm

The OpenCL implementation shows: throughput = 3476.850472 [MB/s]
Hum.... should be enough  grin

While the Mersenne Twister is not suitable for cryptography (its output is predictable), it has a very long period of 2^19937 − 1, which should be enough for our usage smiley

Edit: and another one: http://forums.nvidia.com/index.php?showtopic=101390
Edit2: and example code directly from NVIDIA: http://developer.download.nvidia.com/compute/cuda/3_0/sdk/website/OpenCL/website/samples.html#oclMersenneTwister
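
For reference, a compact single-threaded sketch of the 32-bit Mersenne Twister (MT19937) in plain C, with a helper that turns its output into a random sample coordinate. It is not the parallel OpenCL version from the links above; the names `Mt19937`, `mt_seed`, `mt_next` and `mt_uniform` are made up for this example.

```c
#include <stdint.h>

#define MT_N 624
#define MT_M 397

/* Generator state: 624 words plus the read position. */
typedef struct {
    uint32_t mt[MT_N];
    int index;
} Mt19937;

void mt_seed(Mt19937 *g, uint32_t seed)
{
    g->mt[0] = seed;
    for (int i = 1; i < MT_N; i++)
        g->mt[i] = 1812433253u * (g->mt[i - 1] ^ (g->mt[i - 1] >> 30))
                   + (uint32_t)i;
    g->index = MT_N;  /* forces a twist on the first call to mt_next */
}

/* Regenerates the whole state array in place. */
static void mt_twist(Mt19937 *g)
{
    for (int i = 0; i < MT_N; i++) {
        uint32_t y = (g->mt[i] & 0x80000000u)
                   | (g->mt[(i + 1) % MT_N] & 0x7fffffffu);
        g->mt[i] = g->mt[(i + MT_M) % MT_N] ^ (y >> 1)
                 ^ ((y & 1u) ? 0x9908b0dfu : 0u);
    }
    g->index = 0;
}

/* Next 32-bit output (state word plus tempering). */
uint32_t mt_next(Mt19937 *g)
{
    if (g->index >= MT_N)
        mt_twist(g);
    uint32_t y = g->mt[g->index++];
    y ^= y >> 11;
    y ^= (y << 7) & 0x9d2c5680u;
    y ^= (y << 15) & 0xefc60000u;
    y ^= y >> 18;
    return y;
}

/* Uniform float between lo and hi, built from the top 24 bits so the
   fraction is exactly representable in a float. */
float mt_uniform(Mt19937 *g, float lo, float hi)
{
    float u = (float)(mt_next(g) >> 8) * (1.0f / 16777216.0f);
    return lo + (hi - lo) * u;
}
```

A random sample point c for the Buddhabrot would then be `cr = mt_uniform(&g, realMin, realMax)` and `ci = mt_uniform(&g, imaginaryMin, imaginaryMax)`. On a GPU every work-item needs its own independent stream, which is the part the linked articles deal with.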
« Last Edit: July 13, 2010, 05:57:45 PM by ker2x »
cbuchner1
« Reply #11 on: July 13, 2010, 07:46:40 PM »

Quote from: hobold
The only point in buying a GF100 is if you desperately need double precision and don't care about the downsides (price, power).

I am certainly not flaming, but having 480 cores (GTX 480) vs. 240 (GTX 285) may be a valid argument too.

And the support for function pointers, recursion, new/delete operators (in one of the upcoming CUDA toolkits) which clearly grants the programmer more options in algorithm design... Especially the recursion could be useful with fractals.
« Last Edit: July 13, 2010, 08:08:58 PM by cbuchner1 »
ker2x
« Reply #12 on: July 13, 2010, 09:42:12 PM »

my port to Buddhabroth-openCL is going well, but i need help with this please : http://www.fractalforums.com/programming/need-help-to-convert-(abs(-1-0-sqrt(1-(4*c))-)-to-c/
ker2x
« Reply #13 on: July 13, 2010, 10:59:41 PM »

i finally have a working openCL buddhabroth. (inspired from my fortran code)

Not efficient or interesting, but it works \o/



Here is the (still ugly and undocumented) OpenCL code (good luck deciphering it):

I'll happily help to decipher it, comment it, and accept any (good or bad) criticism.

Code:

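// Returns true when c = cr + i*ci is treated as a member of the Mandelbrot
// set, i.e. its orbit does not escape within maxIter iterations.
bool isInMSet(
  float cr,
  float ci,
  const unsigned int maxIter,
  const float escapeOrbit
)
{
    int iter = 0;
    float zr = 0.0f;
    float zi = 0.0f;
    float zr2 = zr * zr;
    float zi2 = zi * zi;
    float temp = 0.0f;

    //Check if c is in the 2nd order period bulb (disc of radius 1/4 around -1).
    //Float literals (f suffix) keep the test in single precision.
    if( sqrt( ((cr+1.0f) * (cr+1.0f)) + (ci * ci) ) < 0.25f )
    {
        return true;
    }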

    //Check if c is in the main cardiod
    //IF ((ABS( 1.0 - SQRT(1-(4*c)) ))  < 1.0 ) THEN RETURN TRUE (main cardioid)

     while ( (iter < maxIter) && ((zr2 + zi2) < escapeOrbit) )
    {
        temp = zr * zi;
        zr2 = zr * zr;
        zi2 = zi * zi;
        zr = zr2 - zi2 + cr;
        zi = temp + temp + ci;
        iter++;
    }

    if ( iter < maxIter )
    {
        return false;
    } else {
        return true;
    }

}   

__kernel void mandelbrot(
  const float realMax,
  const float imaginaryMax,
  const float realMin,
  const float imaginaryMin,
  const unsigned int maxIter,
  const unsigned int escapeOrbit,
  const unsigned int hRes,
  __global int* outputi
) {

  const int xId = get_global_id(0);
  const int yId = get_global_id(1);
  const int maxX = get_global_size(0);
  const int maxY = get_global_size(1);

  float deltaReal = (realMax - realMin) / (maxX - 1);
  float deltaImaginary = (imaginaryMax - imaginaryMin) / (maxY - 1);

  float realPos = realMin + (xId * deltaReal);
  float imaginaryPos = imaginaryMin + (yId * deltaImaginary);

  float real = realPos;
  float imaginary = imaginaryPos;

  if(isInMSet(real, imaginary, maxIter, escapeOrbit) == false)
  {

    int iter = 0;
    float zr = 0.0;
    float zi = 0.0;
    float zr2 = zr * zr;
    float zi2 = zi * zi;
    float cr = real;
    float ci = imaginary;
    float temp = 0.0;
   
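    // Replay the orbit of an escaping point and count every pixel it visits.
    // The loop always runs maxIter times; points that land outside the image
    // are skipped by the bounds test below.
    while ((iter < maxIter))
    {
        temp = zr * zi;
        zr2 = zr * zr;
        zi2 = zi * zi;
        zr = zr2 - zi2 + cr;
        zi = temp + temp + ci;
        // Map z back to pixel coordinates.
        int x = (int)(maxX * (zr - realMin) / (realMax - realMin));
        int y = (int)(maxY * (zi - imaginaryMin) / (imaginaryMax - imaginaryMin));
       
        // The first iterations (iter <= 2) are not plotted.
        if( (x > 0) && (y > 0) && (x < maxX) && (y < maxY) && (iter > 2))
        {
            // Not atomic: work-items that hit the same pixel at the same time
            // can lose increments. atomic_inc(&outputi[...]) (OpenCL 1.1) is
            // one way to avoid that, at some cost in speed.
            outputi[(y * hRes) + x] += 1;
        }
        iter++;
    }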

 
  }

}
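
The cardioid check that is still only a comment in `isInMSet` can be done without complex numbers. The condition |1 - sqrt(1 - 4c)| < 1 is commonly rewritten in real arithmetic as q * (q + (cr - 1/4)) < ci^2 / 4 with q = (cr - 1/4)^2 + ci^2. A sketch in plain C (the function names are made up here; in the kernel the bodies could be used as they are, since they only use float arithmetic):

```c
/* Nonzero if c = cr + i*ci lies inside the main cardioid. */
int in_main_cardioid(float cr, float ci)
{
    float x = cr - 0.25f;
    float q = x * x + ci * ci;
    return q * (q + x) < 0.25f * ci * ci;
}

/* Nonzero if c lies inside the period-2 bulb (disc of radius 1/4 around -1).
   Comparing squared distances avoids the sqrt. */
int in_period2_bulb(float cr, float ci)
{
    float x = cr + 1.0f;
    return x * x + ci * ci < 0.0625f;
}
```

Points for which either test is true never escape, so they can be rejected before the iteration loop starts. That is where most of the time goes otherwise, because those points always run the full maxIter iterations.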

cbuchner1
« Reply #14 on: July 13, 2010, 11:33:53 PM »

Quote from: ker2x
i finally have a working openCL buddhabroth. (inspired from my fortran code)

Nice code, but we're not cooking soup here ( http://en.wikipedia.org/wiki/Broth ) evil