Whoops, I didn't notice there was a reply. Probably because I didn't expect any.
Sooo ....
@quaz0r : yes, the website isn't coder-friendly and the documentation isn't always clear.
My impression is that these guys have taken the array management (and a bunch of other stuff) that coders would otherwise have to do "by hand" anyway, and wrapped it all into convenient libraries, so you can invest time developing ideas instead of wasting it on low-level code.
@3dickulus : YES !
I'll try to explain :
- you don't write CUDA/OpenCL kernels (you can; there is documentation about interoperability between ArrayFire and your regular CUDA/OpenCL code).
- It's not software that "converts" regular code into something that runs on the GPU.
- It's all about arrays.
This code, from the link in my first post :
array mandelbrot(const array &in, int iter, float maxval)
{
    array C = in;
    array Z = C;
    array mag = constant(0, C.dims());
    for (int ii = 1; ii < iter; ii++) {
        // Do the calculation
        Z = Z * Z + C;
        // Get indices where abs(Z) crosses maxval
        array cond = (abs(Z) > maxval).as(f32);
        mag = af::max(mag, cond * ii);
        // If abs(Z) crosses maxval, turn off those locations
        C = C * (1 - cond);
        Z = Z * (1 - cond);
        // Ensure the JIT tree does not become too large
        C.eval();
        Z.eval();
    }
    // Normalize
    return mag / maxval;
}
will run on the GPU. It's still specialized code, in no way general-purpose code: if you don't use af:: methods, it won't run on the GPU. ArrayFire is just a library. But a cool one.
The cool stuff :
- You can create a (bloated) "unified" binary that tries, in order, the following devices: CUDA -> OpenCL -> CPU. It tries to detect and use your CUDA device; if the user doesn't have one, it tries an OpenCL device; if that fails too, it falls back to CPU only. You don't want to do this manually (you sure can, but then you write your code three times).
- Device detection & initialization is painless. You can do some custom stuff, but you can also pretty much ignore it, let the lib do its thing, and hope for the best (works for me).
- lots of commonly used sets of methods (computer vision, neural networks, random number generation, linear algebra, signal processing, image processing, statistics, ...).
- e.g. random number generation will use cuRAND if a CUDA device is detected.
- provides a very simple (too limited, imho: visualisation only, no interaction) set of graphics methods: open a window, 2D/3D plots, histograms, ...
I'm pretty sure it's not as efficient as handcrafted CUDA/OpenCL kernel code.
But ... writing usable & efficient CUDA/OpenCL code can be a major pain in the back.
Writing this "unified binary" thingy *is* a major pain in the proverbial anatomy.
imho, it's worth trying ArrayFire first for new code, then falling back to handcrafted kernel code if ArrayFire is doing it wrong.
As someone who likes to write ASM code for fun, I know that handcrafted code isn't always faster.
That's pretty much it.
- Write a few lines to initialize your CUDA/OpenCL/CPU device
- write your af::array stuff
- use
http://www.arrayfire.com/docs/interop_cuda.htm and/or
http://www.arrayfire.com/docs/interop_opencl.htm for your handcrafted kernel code as needed (but you lose the unified-binary thingy as soon as you do it)
At the very least, it's a cool (but bloated) wrapper that still lets you use your handcrafted kernels.
At the very best, it's what's written on the website and it just works. Sometimes a x10 speedup is enough, and you don't always need more at the cost of extra development time.
That's why we have stuff like Java, C#, Ruby, Python, PHP, you-name-it: they are usually slower than C/Fortran/asm, but your development time is reduced.