Topic: Speeding up Nebulabrot
PurpleBlu3s
« on: December 12, 2011, 09:59:00 AM »

Hi, after a break from fractal generating, I've gone back to my Buddhabrot/Nebulabrot program and am looking to speed it up. As far as I can see it is not possible to produce a sort of generic data set that produces Nebulabrot using different colouring - the idea being to do the long generation once, then apply different colours instead of having to generate the whole fractal every time I want to try a different colour scheme. If there is a way to do this I would greatly appreciate guidance!

Additionally, I would like to know whether it's worth porting my code to C, as it's currently written in Java, which isn't a fast language. I don't really know how to generate images in C though, so any pointers (forgive the pun) would be appreciated.

Just as a benchmark, my Java program takes about 16 hours to produce a Nebulabrot with 8000 iterations, 1 billion orbits, 2400x2400 pixels - the image: http://dl.dropbox.com/u/16319566/8000it-1borbits-01c.png.

Thanks in advance for any help.
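
On the first question, one possible approach (an assumption, not something proposed in this thread) is to save the raw per-pixel hit counts instead of finished colours. The slow orbit pass then runs once, and each colour scheme is only a cheap mapping from counts to 8-bit values. A minimal C sketch follows; `counts_to_channel` and its `white` parameter are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map saved hit counts to 8-bit intensities for one colour channel.
 * counts: hit counts kept from the long generation run (n pixels).
 * white:  the count that maps to full brightness; larger counts are clipped.
 * out:    receives n bytes, one intensity per pixel. */
void counts_to_channel(const uint32_t *counts, size_t n,
                       uint32_t white, uint8_t *out)
{
    assert(counts != NULL && out != NULL && white > 0);

    for (size_t i = 0; i < n; i++) {
        /* Clip to the white point, then scale to 0..255 with rounding. */
        uint32_t c = counts[i] < white ? counts[i] : white;
        out[i] = (uint8_t)(((uint64_t)c * 255u + white / 2) / white);
    }
}
```

For a Nebulabrot, one count buffer per iteration band could be stored and the function called once per buffer. Changing `white`, or which buffer feeds which channel, then gives a new colour scheme without re-iterating any orbits. The iteration bands themselves stay fixed once the counts are accumulated.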
huminado
« Reply #1 on: December 12, 2011, 07:07:20 PM »

If you are already using multithreading and are still not happy with the performance, porting to C is probably a good idea.
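
Since the original post also asks how to generate images from C, here is a minimal sketch that avoids any image library by writing a binary PPM ("P6") file, which common image tools can open or convert to PNG. The function name `write_ppm` and the interleaved RGB layout are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write interleaved 8-bit RGB data (3 bytes per pixel, row by row)
 * as a binary PPM ("P6") file. Returns 0 on success, -1 on failure. */
int write_ppm(const char *path, const uint8_t *rgb,
              unsigned width, unsigned height)
{
    assert(path != NULL && rgb != NULL);

    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;

    size_t nbytes = (size_t)width * height * 3;

    /* Header: magic number, dimensions, maximum channel value. */
    int ok = fprintf(f, "P6\n%u %u\n255\n", width, height) > 0
          && fwrite(rgb, 1, nbytes, f) == nbytes;

    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}
```

A 2400x2400 render would pass a buffer of 2400 * 2400 * 3 bytes, for example `write_ppm("nebulabrot.ppm", rgb, 2400, 2400)`.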
cKleinhuis
« Reply #2 on: December 12, 2011, 09:57:40 PM »

You know, the biggest speed-up would surely come from rendering on the GPU, and you could keep your Java code base.

---

divide and conquer - iterate and rule - chaos is No random!
PurpleBlu3s
« Reply #3 on: December 13, 2011, 01:19:05 PM »

Quote from: cKleinhuis on December 12, 2011, 09:57:40 PM
You know, the biggest speed-up would surely come from rendering on the GPU, and you could keep your Java code base.


I have no clue how to do that. Where do I start?
cKleinhuis
« Reply #4 on: December 13, 2011, 02:41:45 PM »

Check out Fragmentarium:
http://syntopia.github.com/Fragmentarium/

Its author is a member of this forum too. You need a GPU-enabled graphics card, but most current cards support it.
Syntopia
« Reply #5 on: December 13, 2011, 08:03:01 PM »

Hi, I think there is a thread on this: http://www.fractalforums.com/programming/buddhabrot-on-gpu/

I don't think the approach used in Fragmentarium will help here: for something to run efficiently on a GPU you must be able to divide the job into many separate tasks (threads). On a GPU, the number of threads must be even larger than the number of computational cores, to hide the different kinds of latency. This means 1000+ tasks on a modern GPU. In Fragmentarium each pixel is calculated in its own thread, which is the dream job for a GPU: roughly a million threads that can be executed independently. This is perfect for ray-marched, distance-estimated 3D fractals, but I don't think it is easy to frame the Buddhabrot formula in this setting.
ker2x
« Reply #6 on: December 21, 2011, 01:35:01 AM »

Quote from: PurpleBlu3s on December 12, 2011, 09:59:00 AM (original post, quoted in full)

Feel free to take a look at our work (emmmile's and mine) on buddha++, and to use parts of the code in your own software if you wish: https://github.com/emmmile/buddha


often times... there are other approaches which are kinda crappy until you put them in the context of parallel machines
(en) http://www.blog-gpgpu.com/ , (fr) http://www.keru.org/ ,
Sysadmin & DBA @ http://www.over-blog.com/
ker2x
« Reply #7 on: December 21, 2011, 01:45:01 AM »

Quote from: Syntopia on December 13, 2011, 08:03:01 PM (reply #5, quoted in full)


That's correct.

Below is an OpenCL implementation of the Buddhabrot, but it is inefficient. GPU memory latency (and branching) is a huge bottleneck, and there isn't anything you can do about it.
The threads can write anywhere in global memory (unpredictably, of course), and the global memory latency kills performance.

Code:
// Returns true when c = (cr, ci) is treated as inside the Mandelbrot set,
// i.e. its orbit does not escape within maxIter iterations.
bool isInMSet(
    float cr,
    float ci,
    const uint maxIter,
    const float escapeOrbit)
{
    uint iter = 0;
    float zr = 0.0f;
    float zi = 0.0f;
    float ci2 = ci*ci;
    float temp;

    // Quick rejection: c lies in the period-2 bulb, so its orbit never escapes
    if( (cr+1.0f) * (cr+1.0f) + ci2 < 0.0625f) return true;

    // Quick rejection: c lies in the main cardioid
    float q = (cr-0.25f)*(cr-0.25f) + ci2;
    if( q*(q+(cr-0.25f)) < 0.25f*ci2) return true;

    // Approximate test for the smaller bulb left of the period-2 bulb
    if (( ((cr+1.309f)*(cr+1.309f)) + ci2) < 0.00345f) return true;

    // Approximate tests for the smaller bulbs on top and bottom of the cardioid
    if ((((cr+0.125f)*(cr+0.125f)) + (ci-0.744f)*(ci-0.744f)) < 0.0088f) return true;
    if ((((cr+0.125f)*(cr+0.125f)) + (ci+0.744f)*(ci+0.744f)) < 0.0088f) return true;

    // Iterate z -> z^2 + c until the orbit escapes or maxIter is reached
    while( (iter < maxIter) && ((zr*zr+zi*zi) < escapeOrbit) )
    {
        temp = zr * zi;
        zr = zr*zr - zi*zi + cr;
        zi = temp + temp + ci;
        iter++;
    }

    // The point counts as inside when the orbit never escaped
    return iter >= maxIter;
}


//Main kernel
__kernel void buddhabrot(
    const float realMin,
    const float realMax,
    const float imaginaryMin,
    const float imaginaryMax,
    const uint  minIter,
    const uint  maxIter,
    const uint  width,
    const uint  height,
    const float escapeOrbit,
    const uint4 minColor,
    const uint4 maxColor,
    __global float2* randomXYBuffer,
    __global uint4*  outputBuffer)
{
    float2 rand = randomXYBuffer[get_global_id(0)];   

    const float deltaReal = (realMax - realMin);
    const float deltaImaginary = (imaginaryMax - imaginaryMin);

    // mix(a,b,c) = a + (b-a)*c, where c must be in the range 0.0 ... 1.0
    //float cr = realMin + rand.x * deltaReal ;
    //float cr = realMin + (realMax - realMin) * rand.x ;
    //float ci = imaginaryMin + rand.y * deltaImaginary ;
    float cr = mix(realMin, realMax, rand.x);
    float ci = mix(imaginaryMin, imaginaryMax, rand.y);

    int x, y;
    int iter   = 0;
    float zr   = 0.0f;
    float zi   = 0.0f;
    float temp = 0.0f;


    if( isInMSet(cr,ci, maxIter, escapeOrbit) == false)
    {   
        while( (iter < maxIter) && ((zr*zr+zi*zi) < escapeOrbit) )
        {
            temp = zr * zi;
            zr = zr*zr - zi*zi + cr;
            zi = temp + temp + ci;

            x = ((width) * (zr - realMin) / deltaReal);
            y = ((height) * (zi - imaginaryMin) / deltaImaginary);

            // Note: these increments are not atomic, so work-items that hit the
            // same pixel at the same time can lose counts.
            if( (iter > minIter) && (x>0) && (y>0) && (x<width) && (y<height) )
            {
                if( (iter > minColor.x) && (iter < maxColor.x) ) { outputBuffer[x + (y * width)].x++; }
                if( (iter > minColor.y) && (iter < maxColor.y) ) { outputBuffer[x + (y * width)].y++; }
                if( (iter > minColor.z) && (iter < maxColor.z) ) { outputBuffer[x + (y * width)].z++; }
            }
            iter++;
        }
    }
}

__kernel void xorshift(
    uint s1,
    uint s2,
    uint s3,
    uint s4,
    const int bufferSize,
    __global float2* randomXYBuffer
)
{
    uint st;
    float2 tmp;

    for(int i=0; i < bufferSize; i++)
    {
        st = s1 ^ (s1 << 11);
        s1 = s2;
        s2 = s3;
        s3 = s4;
        s4 = s4 ^ (s4 >> 19) ^ ( st ^ (st >> 18));
        tmp.x = (float)s4 / UINT_MAX;

        st = s1 ^ (s1 << 11);
        s1 = s2;
        s2 = s3;
        s3 = s4;
        s4 = s4 ^ (s4 >> 19) ^ ( st ^ (st >> 18));
        tmp.y = (float)s4 / UINT_MAX;
        randomXYBuffer[i] = tmp;

    }
}