Work on an OpenCL implementation for Mandelbulber is in progress.
Sweet!
I've just done my first trials with it. It works pretty nicely... but it is much more complicated than I thought before I started. I want to share some observations:
Negatives:
- OpenCL only allows the C99 version of the C language, which has a lot of limitations (no classes, no global variables, etc.). I have to convert everything from C++ to C99, which means rewriting every function.
It has richer native types than C99, no? Like vecN, matNxM, etc. And you can declare variables where you use them.
I've been playing with some C++ 'glue' that allows compiling and running (boxplorer2) GLSL (~OpenCL) shader code as C++. That works remarkably well, given some discipline. It could be used for debugging the shader or, my main goal, to make the exact same DE available to scripting on the CPU.
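As a rough illustration of the C++ to C99 conversion, here is a minimal sketch assuming a hypothetical `Box` class with one `halfSize` member and a `DE()` method (not Mandelbulber's actual code). The class data becomes a plain struct and the method becomes a free function that receives its parameters explicitly, since there are no classes or globals to hold them. It is kept to plain C99 so the same function also compiles on the CPU; in OpenCL C you would more likely use `float3` and the built-in `fabs()`/`fmax()`.

```c
#include <assert.h>

/* Hypothetical C++ original:
 *   class Box {
 *     float halfSize;                  // member data
 *   public:
 *     float DE(const Vec3 &p) const;   // member function
 *   };
 * In C99 the data becomes a plain struct and the method a free function
 * that gets the struct passed in explicitly. */

typedef struct {
    float x, y, z;
} Vec3;

typedef struct {
    float half_size;   /* was a class member */
} BoxParams;

/* Hand-written helpers so the sketch needs no libm;
   OpenCL C provides fabs() and fmax() as built-ins. */
static float abs_f(float v) { return v < 0.0f ? -v : v; }
static float max_f(float a, float b) { return a > b ? a : b; }

/* Distance estimate to an axis-aligned cube centred at the origin.
   Outside the cube, the largest absolute coordinate minus half_size never
   exceeds the true Euclidean distance, so it is safe for ray marching.
   Inside the cube the result is negative. */
static float box_de(BoxParams params, Vec3 p)
{
    assert(params.half_size >= 0.0f);
    float m = max_f(abs_f(p.x), max_f(abs_f(p.y), abs_f(p.z)));
    return m - params.half_size;
}
```

For example, with `half_size = 1.0f` the point (3, 0.5, -2) gets an estimate of 2, and the origin gets -1.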
- Only float variables can be used (no doubles). This limits calculation accuracy and, of course, the maximum zoom.
Yeah, that gets video card specific. And doubles are at least 4x slower.
Video card / driver / OS support issues are most annoying for an application if you're not its only user.
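A back-of-the-envelope sketch of why float limits the zoom: near a coordinate of magnitude about 1, adjacent floats are about `FLT_EPSILON` (roughly 1.2e-7) apart, so once the pixel spacing drops below that, neighbouring pixels land on the same coordinate. The function name `max_zoom` and the example values (a 4-unit-wide view, a 1000-pixel-wide image) are assumptions for illustration. Treat the result as an optimistic upper bound, since rounding errors also accumulate during iteration.

```c
#include <assert.h>
#include <float.h>

/* Rough estimate of the deepest zoom before neighbouring pixels map to the
 * same floating-point coordinate.
 *   view_width : width of the unzoomed view (zoom = 1), an assumed value
 *   pixels     : image width in pixels
 *   coord      : typical magnitude of the coordinates being rendered
 *   epsilon    : FLT_EPSILON for float, DBL_EPSILON for double
 * At zoom Z the pixel spacing is view_width / (pixels * Z); it has to stay
 * above the spacing of representable numbers, about epsilon * coord. */
static double max_zoom(double view_width, double pixels, double coord,
                       double epsilon)
{
    assert(pixels > 0.0 && coord > 0.0 && epsilon > 0.0);
    return view_width / (pixels * epsilon * coord);
}
```

With those values, `max_zoom(4.0, 1000.0, 1.0, FLT_EPSILON)` is about 3.4e4, while the same call with `DBL_EPSILON` gives about 1.8e13.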
- Only the video card's built-in memory can be used. In some cases this limits the maximum image resolution. Of course it is possible to render the image in smaller blocks, but then some effects cannot be rendered by the GPU.
- Debugging the kernel program is very difficult. There is no way to use printf() or other functions to observe what is going on in the program, and there is no debugger. Only compiler errors can be seen.
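Rendering in smaller blocks can be planned on the host before any kernel runs. A minimal sketch, assuming the image is split into horizontal strips and that every pixel needs a fixed number of bytes of GPU buffer (16 bytes, e.g. four floats per pixel, is an assumption); `Strip`, `rows_per_strip()` and `plan_strips()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One horizontal strip of the image, rendered in a separate GPU pass. */
typedef struct {
    size_t first_row;
    size_t rows;
} Strip;

/* Largest number of full image rows whose buffers fit in budget_bytes. */
static size_t rows_per_strip(size_t width, size_t height,
                             size_t bytes_per_pixel, size_t budget_bytes)
{
    assert(width > 0 && bytes_per_pixel > 0);
    size_t row_bytes = width * bytes_per_pixel;
    size_t rows = budget_bytes / row_bytes;
    return rows < height ? rows : height;
}

/* Fills up to max_strips entries of `strips` and returns how many strips
   the image needs, or 0 if not even one row fits (or the image is empty). */
static size_t plan_strips(size_t width, size_t height, size_t bytes_per_pixel,
                          size_t budget_bytes, Strip *strips, size_t max_strips)
{
    size_t rows = rows_per_strip(width, height, bytes_per_pixel, budget_bytes);
    if (rows == 0)
        return 0;
    size_t count = 0;
    for (size_t y = 0; y < height; y += rows) {
        if (count < max_strips) {
            strips[count].first_row = y;
            /* the last strip may be shorter than the others */
            strips[count].rows = (height - y < rows) ? height - y : rows;
        }
        ++count;
    }
    return count;
}
```

For example, a 4000 x 3000 image at 16 bytes per pixel with a 64 MiB budget needs 3 strips of at most 1048 rows, the last one 904 rows tall. Effects that need pixels outside the current strip would not see them within a single pass.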
Neutral:
- The philosophy of writing programs is completely different. Parallel computation needs a different program structure than ordinary CPUs do. The program has to be compiled dynamically and built from small "bricks" to get good performance, because overly long code with many branches forces smaller work-group sizes (slower computation).
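Building the program from bricks can be done by concatenating OpenCL C source fragments at run time and handing the result to the OpenCL compiler (`clCreateProgramWithSource()` followed by `clBuildProgram()`). A minimal sketch of the concatenation step only, assuming hypothetical bricks that call hypothetical helpers `mandelbulb_step()` and `box_fold()`; it does not call OpenCL itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical source "bricks": small OpenCL C fragments kept as strings. */
static const char *const BRICK_HEADER =
    "float4 iterate(float4 z, float4 c) {\n";
static const char *const BRICK_MANDELBULB =
    "  z = mandelbulb_step(z, c);\n";   /* hypothetical helper */
static const char *const BRICK_BOX_FOLD =
    "  z = box_fold(z);\n";             /* hypothetical helper */
static const char *const BRICK_FOOTER =
    "  return z;\n}\n";

/* Concatenates the selected bricks into `out` (always NUL-terminated).
   Returns 0 on success, -1 if the buffer is too small. */
static int build_kernel_source(char *out, size_t out_size,
                               const char *const *bricks, size_t count)
{
    assert(out != NULL && out_size > 0);
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < count; ++i) {
        size_t len = strlen(bricks[i]);
        if (used + len + 1 > out_size)
            return -1;
        memcpy(out + used, bricks[i], len + 1); /* copies the terminator too */
        used += len;
    }
    return 0;
}
```

Only the bricks selected for the current formula end up in the source string, so the compiled kernel stays short and avoids run-time branches between formulas.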
Positives:
- Very fast computation. In some cases it runs 20-30 times faster than on the CPU (I'm comparing an Intel Core 2 Quad Q8200 and a GeForce 9600 GT). On my graphics card it is possible to have up to 512 parallel threads (about 128 for longer code).
- It is possible to use the native_xxxx() math functions, which are incredibly fast (at the cost of implementation-defined accuracy).
- Many built-in math functions and vector types.
- My graphics card heats my room.
Implementing OpenCL in Mandelbulber will take a lot of time, but it is worth doing. The preview speed is much faster.
Real-time navigation is very addictive
Also, GLSL shaders could be made to run in browsers using WebGL, etc. Since fractal rendering is such a sweet-spot parallel problem (low memory bandwidth, high FLOPS, well suited to ray-tracing effects), it is nicely aligned with the GPU manufacturers' efforts. Increases in performance and in the number of deployment platforms (tablets, phones) will only widen the gap with pure CPU implementations.