"The code has to be compiled on every different GPU (again: in OpenCL. CUDA is more portable, but works only on NVIDIA's GPUs)"
This statement is a bit ... imprecise.
OpenCL's default way of delivering GPU code is as C99 source (more precisely "OpenCL C", a slightly reduced C99 plus some extensions). CUDA's default way of delivering GPU code is as precompiled bytecode, targeted at a specific GPU feature set, which Nvidia calls a "compute capability" level and which closely tracks the historical generations of Nvidia GPUs.
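To make the CUDA side concrete, here is a sketch of how nvcc's -gencode option can embed code for more than one compute capability level into a single "fat binary" (the file names and the particular capability levels chosen are illustrative, not from the original text):

```shell
# Hypothetical build line: embed machine code for two GPU generations,
# plus PTX bytecode for forward compatibility, into one fat binary.
nvcc -gencode arch=compute_13,code=sm_13 \
     -gencode arch=compute_20,code=sm_20 \
     -gencode arch=compute_20,code=compute_20 \
     -o app app.cu
```

At load time the CUDA driver picks the best-matching version for the GPU that is actually present.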
OpenCL C will be compiled at application runtime for whatever compute device is present, be it a multicore CPU, a GPU from AMD, Nvidia, ARM, Qualcomm, ..., a non-graphical compute accelerator (like the Cell SPUs in a PlayStation 3), or even an FPGA (field-programmable gate array). CUDA C will not be compiled at runtime, but a CUDA application can pack several versions of bytecode for different Nvidia GPU generations.
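A minimal sketch of what "compiled at runtime" means on the OpenCL side, assuming an OpenCL 1.x implementation is installed (error handling mostly omitted; the kernel itself is a made-up placeholder):

```c
/* Sketch: an OpenCL host program hands its kernel SOURCE to the
 * driver, which compiles it for whatever device is present. */
#include <CL/cl.h>
#include <stdio.h>

/* OpenCL C source, shipped as a plain string inside the application. */
static const char *kernel_src =
    "__kernel void scale(__global float *v, float f) {"
    "    v[get_global_id(0)] *= f;"
    "}";

int main(void) {
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    /* DEFAULT may resolve to a CPU, a GPU, or an accelerator. */
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

    /* The source string goes to the driver's compiler here ... */
    cl_program prog =
        clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, NULL);
    /* ... and is compiled for the concrete device at runtime. */
    cl_int err = clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    printf("build %s\n", err == CL_SUCCESS ? "succeeded" : "failed");

    clReleaseProgram(prog);
    clReleaseContext(ctx);
    return 0;
}
```

This is also where the "compiler available at runtime" point below comes from: nothing stops the application from generating that source string on the fly.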
Having acquainted myself with both programming platforms in recent weeks, I cannot name a clear winner. Both have unique strengths:
- CUDA is easier to learn, faster to get something up and running, and has more mature tools
- OpenCL is more portable; the limitations of OpenCL C force you to better understand the limits of GPUs and to program accordingly; and having a compiler available at runtime can be very powerful (and very dangerous, too, if you consider malware)
To anyone out there thinking about getting their feet wet with GPGPU programming, I recommend starting with CUDA if you happen to have an Nvidia GPU. In the long run you might want to write your GPU functions in a more minimalistic, OpenCL-like style, and perhaps even switch over to OpenCL for portability and to use all processors (CPU + GPU) in a machine.
Those who want to start with OpenCL (or have no choice) should find some good example code and tweak it. Alternatively, there exist a number of wrappers for OpenCL that reduce the amount of boilerplate code. But this can cost some portability, especially in the case of platform-specific wrappers like, say, Apple's under Mac OS X.
I will use OpenCL for my own projects, but I have had the luxury of starting out with CUDA. Starting with OpenCL directly would have been harder.