ker2x
Fractal Molossus
Posts: 795
« Reply #15 on: July 13, 2010, 11:41:45 PM »
Woops! BTW, here is a nicer Buddhabrot.
cbuchner1
« Reply #16 on: July 13, 2010, 11:45:17 PM »
Really a much better picture, but where does the asymmetry come from?
ker2x
« Reply #17 on: July 14, 2010, 12:10:35 AM »
Quote from: cbuchner1
Really a much better picture, but where does the asymmetry come from?

Huhhh, weird... I don't know. It may come from here:

x = (int)(maxX * (zr - realMin) / (realMax - realMin));
y = (int)(maxY * (zi - imaginaryMin) / (imaginaryMax - imaginaryMin));

I'm not sure, but with (int) the fractional part is discarded (truncated toward zero) instead of being rounded to the nearest integer. I'll check that problem. (I have a bigger problem with an "out of resource" exception when I have too many loops, but I need to read much more documentation to understand that one, so I'm checking the problem you found first.)

Edit: nope, it's something else.

Edit: problem solved:

const int maxX = get_global_size(0) - 1;
const int maxY = get_global_size(1) - 1;

instead of:

const int maxX = get_global_size(0);
const int maxY = get_global_size(1);
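For reference, a minimal sketch of the float-to-pixel mapping in plain C (so it can be tested on the host; the function names are hypothetical, not taken from the kernel). A cast to int truncates toward zero, so a coordinate slightly left of realMin (a value in (-1, 0) after scaling) lands in pixel 0 instead of being rejected. For a histogram, one fix is to check the float range first and then take the floor; rounding to the nearest integer would instead shift every bin by half a pixel.

```c
/* Map a coordinate v in [vmin, vmax) to a pixel index in [0, n).
   Hypothetical helper; returns -1 when v falls outside the image. */
int to_pixel(double v, double vmin, double vmax, int n)
{
    double f = n * (v - vmin) / (vmax - vmin);
    /* Reject before converting: values in (-1, 0) must not end up in pixel 0. */
    if (f < 0.0 || f >= n)
        return -1;
    /* f is non-negative here, so truncation is the same as floor(). */
    return (int)f;
}

/* The mapping as written in the post: the (int) cast truncates toward zero. */
int to_pixel_trunc(double v, double vmin, double vmax, int n)
{
    return (int)(n * (v - vmin) / (vmax - vmin));
}
```

With n equal to the image width, pixel indices run from 0 to n - 1, which is why the bounds check is f < n rather than f <= n; mirrored coordinates then land in mirrored pixels k and n - 1 - k.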
« Last Edit: July 14, 2010, 12:30:01 AM by ker2x »
cbuchner1
« Reply #18 on: July 14, 2010, 12:23:08 AM »
Quote from: ker2x
Huhhh, weird... I don't know.

After reading the discussion here: http://erleuchtet.org/2010/07/ridiculously-large-buddhabrot.html I think the bright circles in your image are mostly caused by escape trajectories with very long orbits. You may not be running enough input samples to get a good "average" over all possible orbits, so a few orbits stand out (all those bright "curls" in the image). If you happen to sample on a regular grid (instead of random sampling) and that grid is not perfectly symmetrical along the imaginary axis (i.e. the sampled ci values are not mirrored around 0), that could be causing the asymmetry.
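A minimal sketch of random sampling in plain C, assuming a tiny xorshift32 generator (Marsaglia's shift constants 13/17/5) as a stand-in for a Mersenne Twister; the helper names are hypothetical. Each call draws c uniformly from the rectangle [realMin, realMax) x [imaginaryMin, imaginaryMax), replacing the fixed grid position cr = realMin + xId * deltaReal.

```c
#include <stdint.h>

/* xorshift32 (Marsaglia, 2003) with shifts 13/17/5.
   Assumption: a stand-in for the Mersenne Twister; state must never be 0. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Uniform double in [0, 1): divide the 32-bit output by 2^32. */
double uniform01(uint32_t *state)
{
    return xorshift32(state) / 4294967296.0;
}

/* Draw c = cr + i*ci uniformly from [realMin, realMax) x [imaginaryMin, imaginaryMax)
   instead of taking it from a fixed grid position. */
void random_c(uint32_t *state,
              double realMin, double realMax,
              double imaginaryMin, double imaginaryMax,
              double *cr, double *ci)
{
    *cr = realMin + (realMax - realMin) * uniform01(state);
    *ci = imaginaryMin + (imaginaryMax - imaginaryMin) * uniform01(state);
}
```

On a GPU each work-item would need its own non-zero seed (for example derived from its global id); how to seed them well is left open here.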
ker2x
« Reply #19 on: July 14, 2010, 12:32:59 AM »
Quote from: cbuchner1
After reading the discussion here: http://erleuchtet.org/2010/07/ridiculously-large-buddhabrot.html I think the bright circles in your image are mostly caused by escape trajectories with very long orbits. You may not be running enough input samples to get a good "average" over all possible orbits, so a few orbits stand out (all those bright "curls" in the image). If you happen to sample on a regular grid (instead of random sampling) and that grid is not perfectly symmetrical along the imaginary axis (i.e. the sampled ci values are not mirrored around 0), that could be causing the asymmetry.

Yes, I haven't implemented the RNG in OpenCL yet; I map each pixel directly to a complex-plane coordinate. See my edit above: the grid wasn't symmetrical because I assumed that maxX was equal to the number of pixels on X, but it's the number of pixels - 1 (from 0 to size-1).

Still not that... I'll look at it tomorrow. I need to solve this "out of resource" exception problem first.
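To see why the sample spacing matters, here is a small plain-C check (hypothetical helper names) of whether a grid of n ci samples is mirrored around 0. It assumes imaginaryMin == -imaginaryMax. With spacing (imaginaryMax - imaginaryMin) / (n - 1), the first and last samples hit both ends of the range and the grid is symmetric; with spacing / n it stops one step short of imaginaryMax and is not.

```c
/* Hypothetical helper: the k-th ci sample when the range is split into
   `steps` intervals, as in ci = imaginaryMin + yId * deltaImaginary. */
double grid_ci(int k, double imaginaryMin, double imaginaryMax, int steps)
{
    double deltaImaginary = (imaginaryMax - imaginaryMin) / steps;
    return imaginaryMin + k * deltaImaginary;
}

/* Returns 1 if the n samples are mirrored around ci = 0, i.e.
   ci[k] == -ci[n - 1 - k] for every k (up to rounding), else 0. */
int grid_is_symmetric(int n, int steps, double imaginaryMin, double imaginaryMax)
{
    for (int k = 0; k < n; k++) {
        double sum = grid_ci(k, imaginaryMin, imaginaryMax, steps)
                   + grid_ci(n - 1 - k, imaginaryMin, imaginaryMax, steps);
        if (sum > 1e-12 || sum < -1e-12)
            return 0;
    }
    return 1;
}
```

For 400 rows, steps = 399 corresponds to dividing by (maxY - 1) with maxY = get_global_size(1), and steps = 400 to dividing by the global size itself.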
« Last Edit: July 14, 2010, 01:19:01 AM by ker2x »
ker2x
« Reply #20 on: July 14, 2010, 01:50:43 PM »
The latest version of the working OpenCL code:

// A function to check if the chosen point is in the Mandelbrot set
bool isInMSet(float cr, float ci, const unsigned int maxIter, const float escapeOrbit)
{
    unsigned int iter = 0;
    float zr = 0.0f;
    float zi = 0.0f;
    float zr2 = 0.0f;
    float zi2 = 0.0f;
    float temp = 0.0f;

    // Quick rejection check if c is in the period-2 bulb (disc of radius 1/4 around -1)
    if (sqrt(((cr + 1.0f) * (cr + 1.0f)) + (ci * ci)) < 0.25f) {
        return true;
    }

    // Quick rejection check if c is in the main cardioid:
    // IF (ABS(1.0 - SQRT(1 - (4*c))) < 1.0) THEN RETURN TRUE (main cardioid)
    float tempi = ci * (-4.0f);
    float tempr = 1.0f - cr * 4.0f;
    float theta = atan2(tempi, tempr) / 2.0f;
    float r = pow((tempr * tempr + tempi * tempi), 0.25f);
    tempr = 1.0f - r * cos(theta);
    tempi = -r * sin(theta);
    if ((tempr * tempr + tempi * tempi) < 1.0f) {
        return true;
    }

    // Brute-force check if c escapes escapeOrbit (good old iteration up to maxIter)
    while ((iter < maxIter) && ((zr2 + zi2) < escapeOrbit)) {
        temp = zr * zi;
        zr2 = zr * zr;
        zi2 = zi * zi;
        zr = zr2 - zi2 + cr;
        zi = temp + temp + ci;
        iter++;
    }

    // true: c did not escape within maxIter iterations
    return iter >= maxIter;
}

// Main kernel
__kernel void buddhabrot(const float realMax, const float imaginaryMax,
                         const float realMin, const float imaginaryMin,
                         const unsigned int maxIter, const unsigned int escapeOrbit,
                         const unsigned int hRes, const float offset,
                         __global int* outputi)
{
    const int xId = get_global_id(0);
    const int yId = get_global_id(1);
    const int offsetStep = get_global_id(2);

    const int maxX = get_global_size(0);
    const int maxY = get_global_size(1);

    const float deltaReal = (realMax - realMin) / (maxX - 1);
    const float deltaImaginary = (imaginaryMax - imaginaryMin) / (maxY - 1);

    float cr = realMin + (xId * deltaReal) + (offsetStep * offset);
    float ci = imaginaryMin + (yId * deltaImaginary);

    unsigned int iter = 0;
    float zr = 0.0f;
    float zi = 0.0f;
    float zr2 = 0.0f;
    float zi2 = 0.0f;
    float temp = 0.0f;

    if (!isInMSet(cr, ci, maxIter, escapeOrbit)) {
        while ((iter < maxIter) && ((zr2 + zi2) < escapeOrbit)) {
            temp = zr * zi;
            zr2 = zr * zr;
            zi2 = zi * zi;
            zr = zr2 - zi2 + cr;
            zi = temp + temp + ci;

            // Range-check the float position before converting to a pixel, so that
            // row/column 0 is kept (the old "x > 0" test dropped it, which skews the
            // image by one row) and values in (-1, 0) are not truncated into pixel 0.
            float fx = maxX * (zr - realMin) / (realMax - realMin);
            float fy = maxY * (zi - imaginaryMin) / (imaginaryMax - imaginaryMin);
            if ((fx >= 0.0f) && (fy >= 0.0f) && (fx < maxX) && (fy < maxY) && (iter > 2)) {
                // Many work-items hit the same pixel, so a plain "+= 1" loses updates.
                // atomic_inc is built in since OpenCL 1.1 (OpenCL 1.0: atom_inc with
                // the cl_khr_global_int32_base_atomics extension).
                atomic_inc(&outputi[((int)fy * hRes) + (int)fx]);
            }
            iter++;
        } // EndWhile
    } // EndIf
} // EndKernel
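Both quick-rejection tests in isInMSet can be done without sqrt, atan2, pow, cos or sin. Below is a sketch in plain C using double (function names hypothetical). The main-cardioid test uses the well-known closed form: with q = (x - 1/4)^2 + y^2, the point c = x + iy is inside when q(q + x - 1/4) < y^2/4. The bulb test compares squared distances. This is an alternative to consider, not the author's code; in OpenCL C the same expressions would use float and f-suffixed literals.

```c
/* Main cardioid, closed form: with q = (x - 1/4)^2 + y^2,
   c = x + iy is inside when q * (q + (x - 1/4)) < y^2 / 4. */
int in_cardioid(double cr, double ci)
{
    double xm = cr - 0.25;
    double q = xm * xm + ci * ci;
    return q * (q + xm) < 0.25 * ci * ci;
}

/* Period-2 bulb: disc of radius 1/4 around -1.
   Comparing the squared distance against 1/16 avoids the sqrt. */
int in_bulb2(double cr, double ci)
{
    return (cr + 1.0) * (cr + 1.0) + ci * ci < 0.0625;
}

/* Quick rejection used before the brute-force iteration. */
int quick_reject(double cr, double ci)
{
    return in_cardioid(cr, ci) || in_bulb2(cr, ci);
}
```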
I still have this "out of resource" problem, returned by clEnqueueReadBuffer, when the OpenCL code takes too much time to process. It seems to be an NVIDIA/Windows problem: as far as I understand, Windows thinks the NVIDIA driver is frozen and restarts it (Windows' GPU timeout detection and recovery, TDR). I'm planning to code the random number generator, then do something like this: http://www.yakiimo3d.com/2010/03/29/dx11-directcompute-buddhabrot-nebulabrot/ and refresh the display every second or so, so that it won't time out.
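One way around the watchdog, sketched in plain C under stated assumptions: split the total number of samples into many launches that each stay well below the timeout (the default Windows TDR delay is 2 seconds), and refresh the display between launches. plan_launches is a hypothetical helper, and per_launch would have to be tuned by timing one launch on the actual card.

```c
/* Split `total` samples into launches of at most `per_launch` samples each.
   Writes the start offset and size of every launch into the arrays and
   returns the number of launches, or -1 if max_launches is too small. */
int plan_launches(long total, long per_launch,
                  long *offsets, long *sizes, int max_launches)
{
    int n = 0;
    for (long start = 0; start < total; start += per_launch) {
        if (n == max_launches)
            return -1;
        long remaining = total - start;
        offsets[n] = start;
        /* The last launch may be shorter than per_launch. */
        sizes[n] = remaining < per_launch ? remaining : per_launch;
        n++;
    }
    return n;
}
```

Each (offset, size) pair would become one kernel launch (with the offset passed as a kernel argument, as offset/offsetStep already are), followed by clFinish and a display refresh before the next launch is queued.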
hobold
Fractal Bachius
Posts: 573
« Reply #21 on: July 14, 2010, 03:21:16 PM »
Quote from: cbuchner1
Quote from: hobold
The only point in buying a GF100 is if you desperately need double precision and don't care about the downsides (price, power).
I am certainly not flaming, but having 480 cores (GTX 480) vs. 240 (GTX 285) may be a valid argument too. And the support for function pointers, recursion and new/delete operators (in one of the upcoming CUDA toolkits) clearly grants the programmer more options in algorithm design... Especially recursion could be useful with fractals.

To clarify, my comparison was between the GeForce 480 (GF100 chip) and the GeForce 460 (GF104 chip). The GeForce 285 (GT200 chip) is an older generation and lacks capabilities such as those you mentioned from the latest CUDA version.
cbuchner1
« Reply #22 on: July 14, 2010, 03:35:44 PM »
Quote from: hobold
To clarify, my comparison was between the GeForce 480 (GF100 chip) and the GeForce 460 (GF104 chip). The GeForce 285 (GT200 chip) is an older generation and lacks capabilities such as those you mentioned from the latest CUDA version.

Ah, OK, I misunderstood what you meant. Check out this thread: http://forums.nvidia.com/index.php?s=ef4693f2411102e7259ba57bdba8f89f&showtopic=173877&pid=1087178&st=0&#entry1087178

"Double precision [on GTX 460] is 1/6th of the FP32 performance which is better than the 1/8th performance on the GTX470/480."

Unless you get a Tesla based on GF100 (which has all double-precision ALUs enabled), the GF104 might be the better deal if you intend to run double-precision arithmetic: the ratio of enabled double-precision ALUs to CUDA cores has been improved.
ker2x
« Reply #23 on: July 14, 2010, 03:56:09 PM »
Currently working on the Mersenne Twister implementation... a real pain in the *ss
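A plain-C port of the reference MT19937 algorithm (Matsumoto and Nishimura), useful as a host-side reference for checking a GPU version against known outputs; the type and function names are hypothetical. Note that every generator carries 624 words of state, which is part of what makes one Mersenne Twister per work-item awkward on a GPU.

```c
#include <stdint.h>

#define MT_N 624
#define MT_M 397
#define MT_MATRIX_A   0x9908b0dfu  /* twist matrix constant */
#define MT_UPPER_MASK 0x80000000u  /* most significant bit */
#define MT_LOWER_MASK 0x7fffffffu  /* remaining 31 bits */

typedef struct {
    uint32_t mt[MT_N];
    int index;
} MtState;

/* Initialise the state from a 32-bit seed (reference init_genrand). */
void mt_seed(MtState *s, uint32_t seed)
{
    s->mt[0] = seed;
    for (int i = 1; i < MT_N; i++)
        s->mt[i] = 1812433253u * (s->mt[i - 1] ^ (s->mt[i - 1] >> 30)) + (uint32_t)i;
    s->index = MT_N;  /* force a twist on the first call */
}

/* Return the next 32-bit output (reference genrand_int32). */
uint32_t mt_next(MtState *s)
{
    static const uint32_t mag01[2] = { 0x0u, MT_MATRIX_A };
    uint32_t y;

    if (s->index >= MT_N) {
        /* Regenerate all 624 words at once ("twist"). */
        int kk;
        for (kk = 0; kk < MT_N - MT_M; kk++) {
            y = (s->mt[kk] & MT_UPPER_MASK) | (s->mt[kk + 1] & MT_LOWER_MASK);
            s->mt[kk] = s->mt[kk + MT_M] ^ (y >> 1) ^ mag01[y & 0x1u];
        }
        for (; kk < MT_N - 1; kk++) {
            y = (s->mt[kk] & MT_UPPER_MASK) | (s->mt[kk + 1] & MT_LOWER_MASK);
            s->mt[kk] = s->mt[kk + (MT_M - MT_N)] ^ (y >> 1) ^ mag01[y & 0x1u];
        }
        y = (s->mt[MT_N - 1] & MT_UPPER_MASK) | (s->mt[0] & MT_LOWER_MASK);
        s->mt[MT_N - 1] = s->mt[MT_M - 1] ^ (y >> 1) ^ mag01[y & 0x1u];
        s->index = 0;
    }

    y = s->mt[s->index++];
    /* Tempering */
    y ^= (y >> 11);
    y ^= (y << 7) & 0x9d2c5680u;
    y ^= (y << 15) & 0xefc60000u;
    y ^= (y >> 18);
    return y;
}
```

Seeded with 5489, the first output is 3499211612 and the 10000th is 4123659995 (the value the C++ standard requires for std::mt19937), which makes a convenient regression check for a port.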
cbuchner1
« Reply #24 on: July 14, 2010, 05:21:11 PM »
Quote from: ker2x
Currently working on the Mersenne Twister implementation... a real pain in the *ss

I might try this three-step approach in CUDA. Maybe I'll even draw the random numbers on the CPU; the speed-up from moving this to the GPU might be negligible.
a) find good candidates (i.e. very long escape trajectories) and store the corresponding starting coordinates and orbit lengths
b) sort this list by descending orbit length
c) process trajectories of comparable length in the same work units ("blocks" in CUDA)
With b) and c) I can guarantee that all threads belonging to the same work unit will terminate at roughly the same time. This may give a performance boost because no threads will be idling.

My GTX 460 is waiting for me in the mail...

UPDATE: what a pity! It's not working properly in my Windows XP Professional 64-bit machine. The driver does not initialize the card properly. Hmm...
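A sketch of steps a) to c) in plain C, assuming the candidates are found on the host; Candidate, escape_length, sort_candidates and block_of are hypothetical names. escape_length counts iterations until |z|^2 reaches 4 (that escape radius is an assumption; the OpenCL kernel in this thread uses escapeOrbit), qsort orders the candidates by descending orbit length, and consecutive runs of blockSize candidates then form one work unit.

```c
#include <stdlib.h>

/* A candidate starting point and the length of its escape orbit. */
typedef struct {
    float cr, ci;
    int orbitLength;
} Candidate;

/* Step a): iterations until |z|^2 >= 4; a result of maxIter means "did not escape". */
int escape_length(float cr, float ci, int maxIter)
{
    float zr = 0.0f, zi = 0.0f;
    int iter = 0;
    while (iter < maxIter && zr * zr + zi * zi < 4.0f) {
        float t = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
        iter++;
    }
    return iter;
}

/* qsort comparator: longest orbits first. */
static int by_length_desc(const void *a, const void *b)
{
    int la = ((const Candidate *)a)->orbitLength;
    int lb = ((const Candidate *)b)->orbitLength;
    return (la < lb) - (la > lb);
}

/* Step b): sort so that neighbouring candidates have similar orbit lengths. */
void sort_candidates(Candidate *c, size_t n)
{
    qsort(c, n, sizeof *c, by_length_desc);
}

/* Step c): candidate i of the sorted list goes into work unit i / blockSize. */
size_t block_of(size_t index, size_t blockSize)
{
    return index / blockSize;
}
```

Because neighbouring candidates have similar orbit lengths after sorting, the threads of one block do similar amounts of work.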
« Last Edit: July 14, 2010, 11:35:49 PM by cbuchner1 »
ker2x
« Reply #25 on: July 15, 2010, 01:18:10 AM »
I rewrote the whole app. I use System.Windows.Forms instead of OpenTK (but still use Cloo for OpenCL). The Mersenne Twister is still a work in progress, much harder than expected.
ker2x
« Reply #27 on: July 16, 2010, 12:33:38 AM »
I still have a lot of problems implementing the RNG, so I implemented it on the CPU host for now. It's not optimised yet (by far!), but this result takes ~5 s. I'm sure I can do much better.
cbuchner1
« Reply #28 on: July 16, 2010, 12:45:36 AM »
Splendid! Now we need the nebula color scheme.
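The Nebulabrot scheme renders several Buddhabrot histograms with different iteration limits and uses them as the red, green and blue channels. A minimal plain-C sketch of the compositing step (hypothetical names; which iteration limit goes to which channel is left to the caller), scaling each channel linearly by its own maximum count.

```c
#include <stddef.h>
#include <stdint.h>

/* Scale one histogram to 0..255 by its own maximum and write every
   `stride`-th byte of `out` (stride 3 for interleaved RGB). */
void normalize_channel(const int *hist, size_t n, uint8_t *out, size_t stride)
{
    int maxCount = 0;
    for (size_t i = 0; i < n; i++)
        if (hist[i] > maxCount)
            maxCount = hist[i];
    for (size_t i = 0; i < n; i++)
        out[i * stride] = maxCount ? (uint8_t)((255LL * hist[i]) / maxCount) : 0;
}

/* Combine three histograms (rendered with different maxIter values)
   into an interleaved RGB image of n pixels (3 * n bytes). */
void compose_nebula(const int *red, const int *green, const int *blue,
                    size_t n, uint8_t *rgb)
{
    normalize_channel(red, n, rgb + 0, 3);
    normalize_channel(green, n, rgb + 1, 3);
    normalize_channel(blue, n, rgb + 2, 3);
}
```

Linear scaling is the simplest choice; a square-root or gamma curve is often used instead to bring out faint orbits.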
kram1032
« Reply #29 on: July 16, 2010, 12:59:39 AM »
Getting better.