Yesss, it works! Thank you. I tried all three modes and they worked.
Thanks! This is great information! I will fix the M_PI constants problem in the code.
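For example, a common workaround (just a sketch, not necessarily how it will end up in Mandelbulber) is to define the constant in the kernel source only when the OpenCL compiler does not already provide it:

// Guard the definition so kernels build both on compilers that define M_PI and on those that don't.
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif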
I still have the original problem with the include paths - I have to hardcode them in the shaders. One thing I noticed is that the path separator is a slash instead of a Windows backslash. Perhaps the Nvidia compiler is more picky about this?
I will change it to use only "\" for the Windows version. This could be the reason why it ignores the include paths.
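For reference, something like this on the host side would pass the include directory to the OpenCL compiler with the platform's own separators instead of hard-coding paths inside the .cl files (a sketch only; the directories are examples, not Mandelbulber's actual layout):

// Pass the directory containing the .cl headers via the standard -I build option.
#ifdef _WIN32
const char *buildOptions = "-I C:\\mandelbulber\\cl";   // backslash separators on Windows
#else
const char *buildOptions = "-I /usr/share/mandelbulber/cl";
#endif
cl_int err = clBuildProgram(program, 1, &device, buildOptions, NULL, NULL);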
One of the engines (the cl_engine.cl) has the following pragma, which makes Mandelbulber crash for me:
#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
After removing it, it seems to work. The pragma is not present in the other cl_engine files.
I left it in the code by mistake. I will remove it entirely.
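An alternative (just a sketch, not what Mandelbulber does) would be to enable the pragma only when the device actually reports the extension, by checking CL_DEVICE_EXTENSIONS on the host before the source is built:

#include <CL/cl.h>
#include <string.h>

// Returns non-zero if the device lists cl_khr_byte_addressable_store among its extensions.
// A fixed buffer is used for brevity; real code should query the required size first.
int HasByteAddressableStore(cl_device_id device)
{
	char extensions[4096] = "";
	clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, sizeof(extensions), extensions, NULL);
	return strstr(extensions, "cl_khr_byte_addressable_store") != NULL;
}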
Once in a while (not reproducibly), Mandelbulber crashes with the following error:
ERROR: Context::Context() (-2)
This error means CL_DEVICE_NOT_AVAILABLE. This is strange.
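For reference, -2 is the value of CL_DEVICE_NOT_AVAILABLE returned by context creation, so it can be checked explicitly - a sketch only (device is assumed to be a valid cl_device_id), not Mandelbulber's actual error handling:

// Check the status from clCreateContext() instead of letting the exception propagate.
cl_int err = CL_SUCCESS;
cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
if (err == CL_DEVICE_NOT_AVAILABLE)   /* == -2 */
	fprintf(stderr, "OpenCL device temporarily not available\n");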
If I enable Depth of Field, the rendering becomes very slow: an image taking 25s on the CPU now takes more like 25 minutes on the GPU (I stopped it before it finished). I guess you are switching to DOF using multiple samples on the GPU?
You are right. The GPU renders DOF using multiple samples. At the moment only this mode is available. On fast GPUs the rendering time is acceptable and the result is worth the wait.
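Roughly, the idea looks like this (a sketch only, not the actual Mandelbulber kernel - RenderRay(), Random01() and the parameter names are hypothetical): the ray origin is jittered over the aperture disc, the ray is re-aimed at the focal point, and the samples are averaged:

// Multi-sample depth of field: average several rays whose origins are spread over the aperture.
float3 RenderWithDOF(float3 camPos, float3 camRight, float3 camUp, float3 rayDir,
	float aperture, float focalDistance, int dofSamples, uint *seed)
{
	float3 colour = (float3)(0.0f, 0.0f, 0.0f);
	float3 focalPoint = camPos + focalDistance * rayDir;
	for (int s = 0; s < dofSamples; s++)
	{
		// random point on the aperture disc
		float angle = 2.0f * M_PI_F * Random01(seed);
		float radius = aperture * sqrt(Random01(seed));
		float3 origin = camPos + radius * (cos(angle) * camRight + sin(angle) * camUp);
		// re-aim the ray so the focal plane stays sharp
		float3 dir = normalize(focalPoint - origin);
		colour += RenderRay(origin, dir);
	}
	return colour / (float)dofSamples;
}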
Here is an example:
http://krzysztofmarczak.deviantart.com/art/Mandelbox-rendered-with-OpenCL-engine-392452343
Finally, there seem to be some artifacts (overstepping) when rendering in Fast and Normal modes, but Full looks close to the CPU version.
The Fast and Normal modes use simplified algorithms; that's why the quality is lower.
I have attached an image with some rendering examples and the rendering times (at 800x600). As can be seen, the 'Full' renderer is not much faster than CPU rendering, but my graphics card (a 310M) is also a low-end card, several generations old. In Fragmentarium, with the standard DE-raytracer, an 800x600 Mandelbulb renders in around 0.15s (Fast-raytracer.frag) or 0.3s (DE-raytracer.frag), but that does not say much since the algorithms are not the same.
This needs investigation. I tested it on an nVidia GeForce 9600GT (on Linux), which is not so new either, and the results were much better: 0.8s in fast mode and 3s with full shaders. When you measured the rendering time, was it the time of the first render (including kernel compilation)? The first render takes much longer than subsequent ones. It is also possible that I used some instructions in the code which execute very slowly on older cards. Is the code of Fragmentarium available to look at? Maybe by comparing the code I will find some way to optimize mine.
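One way to separate the two (a sketch on the host side, assuming the command queue was created with CL_QUEUE_PROFILING_ENABLE) is to time just the kernel execution with OpenCL profiling events, so clBuildProgram() is excluded from the measurement:

#include <CL/cl.h>

// Returns the pure kernel execution time in milliseconds, excluding compilation.
double KernelTimeMs(cl_command_queue queue, cl_kernel kernel, size_t width, size_t height)
{
	size_t globalSize[2] = {width, height};
	cl_ulong start = 0, end = 0;
	cl_event event;
	clEnqueueNDRangeKernel(queue, kernel, 2, NULL, globalSize, NULL, 0, NULL, &event);
	clWaitForEvents(1, &event);
	clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(start), &start, NULL);
	clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(end), &end, NULL);
	clReleaseEvent(event);
	return (double)(end - start) * 1e-6;   // event timestamps are in nanoseconds
}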
On my ATI Radeon 7800 it takes 0.22s in all shaders mode and 0.085s in fast mode.
I know it's not the standard approach, but this example shows the difference in palette handling. Rendered with identical settings - custom palette, color speed 100.
Edit: it seems OpenCL ignores the Mandelbox coloring parameters...
That is because for the Mandelbox I had to use different code in the OpenCL program. The function from the CPU version ran terribly slowly on the GPU (too many conditions), so I use an optimized version (what was changed is marked in bold). There was no way to use the same coloring algorithm. However, I'm planning to also implement the full Mandelbox formula.
Mandelbox code:
int3 cond1, cond2;
// box fold: reflect each component of z that exceeds the folding limit
cond1 = isgreater(z, foldingLimit);
cond2 = isless(z, -foldingLimit);
z = select(z, foldingValue - z, cond1);
z = select(z, -foldingValue - z, cond2);

// sphere fold: choose the scaling factor from the squared length of z
float rr = dot(z, z);
float m = scale;
if (rr < mr2) m = native_divide(scale, mr2);
else if (rr < fr2) m = native_divide(scale, rr);

// rotation, scaling and translation
z = Matrix33MulFloat3(consts->fractal.mandelbox.mainRot, z);
z = z * m + c;

// running distance-estimation derivative and colour accumulator
tgladDE = tgladDE * fabs(m) + 1.0f;
r = length(z);
colourMin += fabs(m);

// bailout
if (r > 1024.0f)
{
	distance = r / fabs(tgladDE);
	out.colourIndex = colourMin / i * 300.0f;
	break;
}
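For comparison, this is roughly the shape of the branch-based box fold of the CPU-style code with "too many conditions" mentioned above (a sketch, not copied from Mandelbulber's sources); in the CPU version a colour term is also accumulated inside each branch, which is what the OpenCL path replaces with the single colourMin += fabs(m):

// Branch-based box fold: one if/else per component instead of isgreater/isless + select().
float3 BoxFoldBranchy(float3 z, float foldingLimit, float foldingValue)
{
	if (z.x > foldingLimit)       z.x = foldingValue - z.x;
	else if (z.x < -foldingLimit) z.x = -foldingValue - z.x;
	if (z.y > foldingLimit)       z.y = foldingValue - z.y;
	else if (z.y < -foldingLimit) z.y = -foldingValue - z.y;
	if (z.z > foldingLimit)       z.z = foldingValue - z.z;
	else if (z.z < -foldingLimit) z.z = -foldingValue - z.z;
	return z;
}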