ker2x,
Thanks for the description. Does the NVIDIA book cover OpenCL?
Only a small chapter.
If you follow the NVidia book chapter by chapter (and you should), you learn CUDA first; then there is a chapter that explains how to write OpenCL code based on what you wrote in CUDA.
It's not so bad if you didn't know OpenCL first.
The problem is: there is a *lot* to remember, the names are sometimes very confusing, and the naming differences between OpenCL and CUDA are even more confusing.
Besides that, it's the very same architecture: you write the very same C code for the CUDA/OpenCL kernels and functions, and the optimisation tips and tricks are exactly the same.
OpenCL and CUDA are just two different APIs to "talk" to the GPU card. The code you upload to the GPU is the same code for both; it's only the "host" (CPU) code that changes.
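To illustrate that point, here is a sketch of the same trivial vector-add kernel written for both APIs (the name `vecAdd` and the parameter names are my own, not from the book): the body is identical C, only the qualifiers and the way a thread finds its index differ.

```cuda
// CUDA version: __global__ marks a kernel; the thread index comes from
// the built-in variables blockIdx, blockDim and threadIdx.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

/* OpenCL version of the very same kernel: __kernel marks it, buffers
 * carry an address-space qualifier (__global), and the index comes from
 * get_global_id().  The body is otherwise the same C code.
 *
 * __kernel void vecAdd(__global const float *a, __global const float *b,
 *                      __global float *c, int n)
 * {
 *     int i = get_global_id(0);
 *     if (i < n)
 *         c[i] = a[i] + b[i];
 * }
 */
```

The real divergence is on the host side: allocating buffers, compiling or loading the kernel, and launching it look quite different in the two APIs, even though the device code barely changes.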
And is there any built-in support for higher precision math?
I don't know of any usable arbitrary-precision lib for the GPU yet. You should probably take a close look at http://www.mpir.org/

I really want a card that supports double precision in hardware. However, Apple is really terrible about video card support. They support only a couple of cards for any given machine, and they are always yesterday's middle-of-the-road cards.
What you (usually) want is the latest generation of GPU (with the latest features, e.g. double precision), not the fastest card.
I'm perfectly happy developing GPU code on my laptop powered by an ION2.
I'm not buying a $150 latest-generation GPU card for my desktop, because I'm a gamer and I'm happier with my old high-end 8800GTX than with a latest-generation low-end card.
If you're not gaming (or you're happy with gaming on Apple hardware (lol?)), I suggest buying a low-end PC with a low-end latest-generation graphics card, and there you go!
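If you want to check whether a given card does double precision in hardware, a minimal sketch using the CUDA runtime API looks like this (hardware doubles arrived with compute capability 1.3, which is the check used here; exact behaviour depends on your toolkit version):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Query every CUDA device and report its compute capability.
// Compute capability >= 1.3 means double precision in hardware.
int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        int has_double = (prop.major > 1) ||
                         (prop.major == 1 && prop.minor >= 3);
        printf("Device %d: %s, compute capability %d.%d, doubles: %s\n",
               dev, prop.name, prop.major, prop.minor,
               has_double ? "yes" : "no");
    }
    return 0;
}
```

Build with `nvcc` and run on the machine in question; it needs a CUDA-capable driver to report anything.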
On another subject, from the reading I've done about GPGPU, GPUs are not great for fractal calculations because they are highly tuned for SIMD operations. How do you handle situations where each GPU thread needs to run a different code path in order to iterate a separate pixel?
I don't know any nice way to avoid that problem. So... brute force, and confidence in the GPU scheduler.
As far as I understand, a different code path does not block all the other threads, just one CUDA core (mid-range and high-end cards have between 256 and 512 cores).
Considering our (fractal) problem, you will not get 100% GPU occupancy; live with it.
But even 50% occupancy is still incredibly fast.
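To make the divergence concrete, here is a sketch of a per-pixel Mandelbrot kernel (the names and parameters are mine, not from this thread). Each thread iterates its own pixel; threads whose point escapes early simply fall out of the loop, which is exactly the divergent-code-path situation discussed above.

```cuda
// One thread per pixel.  iters receives the escape iteration count.
__global__ void mandelbrot(int *iters, int width, int height,
                           float x0, float y0, float step, int max_iter)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= width || py >= height)
        return;

    float cr = x0 + px * step;
    float ci = y0 + py * step;
    float zr = 0.0f, zi = 0.0f;
    int i = 0;
    // The iteration count differs per pixel, so neighbouring threads
    // diverge; occupancy drops, but the scheduler keeps the rest of
    // the threads running.
    while (i < max_iter && zr * zr + zi * zi < 4.0f) {
        float t = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
        ++i;
    }
    iters[py * width + px] = i;
}
```

Launched over a 2D grid (e.g. 16x16 thread blocks covering the image), this is the brute-force approach: let every pixel run to its own depth and trust the scheduler to fill the gaps.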