Topic: Mandelbulber and OpenCL
Buddhi
Fractal Iambus
« on: September 26, 2011, 07:32:03 PM »

Work on an OpenCL implementation in Mandelbulber is in progress. I've just finished my first trials with it. It works pretty nicely... but it is much more complicated than I thought before I started. I want to share some observations:

Negatives:
- OpenCL only allows the C99 version of the C language, which has a lot of limitations (no classes, no global variables, etc.). I have to convert everything from C++ to C99, which means rewriting every function.
- Only float variables can be used (no doubles). This limits calculation accuracy and, of course, the maximum zoom.
- Only the video card's built-in memory is available. In some cases this limits the maximum image resolution. Of course, it is possible to render the image in smaller blocks, but then some effects cannot be rendered by the GPU.
- Debugging the kernel program is very difficult. There is no way to use printf() or other functions to observe what is going on in the program, and there is no debugger. Only compiler errors can be seen.
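To put a rough number on the zoom limit, here is a small sketch in plain C rather than OpenCL C. The helper names are hypothetical. It finds the largest power-of-ten zoom at which two neighbouring pixels still land on different x coordinates. It assumes an 800-pixel-wide image, a view that is 4 units wide at zoom 1, and a view centred at x = 1.0. It only models coordinate resolution; the fractal iterations themselves can lose accuracy earlier.

```c
#include <assert.h>

/* Hypothetical helper: largest zoom in the sequence 1, 10, 100, ... at which two
 * horizontally adjacent pixels near `center` still map to different float x
 * coordinates. The view is `view_width` units wide at zoom 1 and the image is
 * `image_width` pixels wide. `center` must be nonzero. */
static double max_zoom_float(float center, float view_width, int image_width) {
    assert(center != 0.0f && view_width > 0.0f && image_width > 0);
    double zoom = 1.0;
    for (;;) {
        double next = zoom * 10.0;
        float step = (float)(view_width / (image_width * next)); /* pixel spacing */
        /* The cast forces the sum to be rounded to float before comparing. */
        if ((float)(center + step) == center)
            return zoom; /* at `next`, neighbouring pixels collapse onto one value */
        zoom = next;
    }
}

/* Same check carried out in double precision. */
static double max_zoom_double(double center, double view_width, int image_width) {
    assert(center != 0.0 && view_width > 0.0 && image_width > 0);
    double zoom = 1.0;
    for (;;) {
        double next = zoom * 10.0;
        double step = view_width / (image_width * next);
        if ((double)(center + step) == center)
            return zoom;
        zoom = next;
    }
}
```

Under these assumptions, the float version stops at a zoom of 1e4 and the double version at 1e13. This is why deep zooms need doubles or an emulated higher-precision type.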

Neutral:
- The philosophy of writing programs is completely different. Parallel computation needs a different program structure than ordinary CPUs do. To get good performance, the program has to be compiled dynamically and built from small "bricks", because long code with many branches leads to smaller workgroup sizes (slower computation).
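The "bricks" approach can be sketched on the host side. This is a minimal illustration under stated assumptions, not Mandelbulber's code: the brick strings, the toy formulas and the helper names are all hypothetical. The sketch relies on one property of the standard OpenCL host function clCreateProgramWithSource(): it accepts an array of source strings that together make up the program. As a result, only the pieces needed for the selected formula are compiled, which keeps the kernel short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bricks: toy OpenCL C snippets, not real fractal formulas. */
static const char BRICK_HEADER[] = "#define MAX_ITER 64\n";

static const char *const BRICK_FORMULAS[] = {
    /* 0: toy quadratic step using component-wise float3 arithmetic */
    "float3 formula_step(float3 z, float3 c) { return z * z + c; }\n",
    /* 1: toy contraction step */
    "float3 formula_step(float3 z, float3 c) { return 0.5f * z + c; }\n",
};
#define NUM_FORMULAS (sizeof BRICK_FORMULAS / sizeof BRICK_FORMULAS[0])

static const char BRICK_KERNEL[] =
    "__kernel void render(__global float *out, float3 c) {\n"
    "    size_t id = get_global_id(0);\n"
    "    float3 z = (float3)(0.0f, 0.0f, 0.0f);\n"
    "    for (int i = 0; i < MAX_ITER; i++) z = formula_step(z, c);\n"
    "    out[id] = length(z);\n"
    "}\n";

/* Picks only the bricks needed for the selected formula. The array could then
 * be passed to clCreateProgramWithSource(context, (cl_uint)count, bricks, NULL, &err);
 * a NULL lengths argument means every string is NUL-terminated. */
static size_t select_bricks(size_t formula, const char *bricks[3]) {
    assert(formula < NUM_FORMULAS);
    bricks[0] = BRICK_HEADER;
    bricks[1] = BRICK_FORMULAS[formula];
    bricks[2] = BRICK_KERNEL;
    return 3;
}

/* Total size of the assembled source, e.g. for logging. */
static size_t total_source_length(const char *const *bricks, size_t count) {
    size_t n = 0;
    for (size_t i = 0; i < count; i++) n += strlen(bricks[i]);
    return n;
}
```

Swapping a formula then means choosing a different brick and rebuilding the program, rather than compiling one large kernel that branches between all formulas.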

Positives:
- Very fast computation. In some cases it is 20-30 times faster than the CPU (I'm comparing an Intel Core 2 Quad Q8200 with a GeForce 9600 GT). On my graphics card it is possible to have up to 512 parallel threads (about 128 for longer code).
- It is possible to use the native_xxxx() math functions, which are incredibly fast
- Many built-in math functions and vector types
- My graphics card heats my room :D

Implementing OpenCL in Mandelbulber will take a lot of time, but it is worth doing. The preview speed is much faster. I will do this step by step: in the first version I will implement only a few formulas and basic shaders. I hope it will still be possible to compile the program on other platforms like Windows or MacOS.

Attached is an example image rendered using Mandelbulber with OpenCL. The rendering time was 0.53 s at 800x600 (with shadows and simple ambient occlusion).


* zrzut ekranu3.jpg (214.78 KB, 877x714)

cbuchner1
Fractal Phenom
« Reply #1 on: September 26, 2011, 08:05:41 PM »


I am utterly impressed by the energy you put into this.

Compared to OpenCL, CUDA would offer some advantages. It allows you to define your own classes that provide custom maths implementations, and with operator overloading, arithmetic programming is very natural. Template metaprogramming is also possible. The major disadvantage, however, is that CUDA is tied to nVidia hardware.

Alternative arithmetic would be possible in OpenCL by defining custom maths functions that operate on your own data types (e.g. bignums, or double-floats that emulate double precision) through a pure "C"-style API, comparable to how GMP or MPIR operate.
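Here is a minimal sketch of the "double-float" idea in plain C99. It should carry over to OpenCL C with little change, but that is an assumption, not something tested on a GPU. A value is stored as an unevaluated sum hi + lo of two floats, which gives roughly 48 bits of significand. The error-free building blocks are Knuth's TwoSum and Dekker's TwoProduct. The dfloat type and all function names are hypothetical. The sketch assumes IEEE round-to-nearest float arithmetic with no excess precision and no FMA contraction, so it must not be built with -ffast-math. For the same reason, OpenCL's -cl-fast-relaxed-math would likely break it.

```c
#include <assert.h>

/* Hypothetical double-float type: the value is hi + lo, with |lo| much smaller than |hi|. */
typedef struct { float hi, lo; } dfloat;

/* Knuth's TwoSum: r.hi + r.lo == a + b exactly (round-to-nearest assumed). */
static dfloat two_sum(float a, float b) {
    float s = a + b;
    float bb = s - a;
    float err = (a - (s - bb)) + (b - bb);
    dfloat r = { s, err };
    return r;
}

/* Veltkamp split: a == *hi + *lo, each half holding at most 12 significant bits
 * (4097 = 2^12 + 1 for a 24-bit float significand). */
static void split(float a, float *hi, float *lo) {
    float t = 4097.0f * a;
    *hi = t - (t - a);
    *lo = a - *hi;
}

/* Dekker's TwoProduct: r.hi + r.lo == a * b exactly, barring overflow/underflow. */
static dfloat two_prod(float a, float b) {
    float p = a * b;
    float ah, al, bh, bl;
    split(a, &ah, &al);
    split(b, &bh, &bl);
    float err = ((ah * bh - p) + ah * bl + al * bh) + al * bl;
    dfloat r = { p, err };
    return r;
}

/* Double-float addition followed by renormalisation. */
static dfloat df_add(dfloat a, dfloat b) {
    dfloat s = two_sum(a.hi, b.hi);
    float lo = s.lo + a.lo + b.lo;
    return two_sum(s.hi, lo);
}

/* Double-float multiplication; the lo*lo term is dropped as negligible. */
static dfloat df_mul(dfloat a, dfloat b) {
    dfloat p = two_prod(a.hi, b.hi);
    float lo = p.lo + (a.hi * b.lo + a.lo * b.hi);
    return two_sum(p.hi, lo);
}
```

A plain float sum 1.0f + 1e-8f rounds back to 1.0f. With two_sum, the lost 1e-8f is kept in the lo part.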

fractower
Iterator
« Reply #2 on: September 26, 2011, 08:18:09 PM »

If history is any indication, speeding up the calculations will not decrease the time it takes to produce a fractal. Instead, it will open up new spaces for exploration.

marius
Fractal Lover
« Reply #3 on: September 26, 2011, 09:29:27 PM »

Quote
Work on an OpenCL implementation in Mandelbulber is in progress.
Sweet!
Quote
- OpenCL only allows the C99 version of the C language, which has a lot of limitations (no classes, no global variables, etc.). I have to convert everything from C++ to C99, which means rewriting every function.
It has richer native types than C99, no? Like vecN, matNxM, etc. And you can declare variables where you use them.
I've been playing with some C++ 'glue' that allows compiling and running (boxplorer2) GLSL (~OpenCL) shader code as C++. That works remarkably well, given some discipline. It could be used for debugging the shader or, my main goal, to make the exact same DE available to scripting on the CPU.
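In the spirit of marius's glue, one way to get the printf() debugging that GPU kernels lack is to keep the kernel body in a subset of C that also compiles for the CPU. An ordinary loop then drives it. Here is a minimal sketch under that assumption. The function names are hypothetical. The work-item index is passed in explicitly where a real kernel would call get_global_id(0). A simple 2D escape-time formula stands in for a real distance estimator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical kernel body in portable C: `gid` replaces get_global_id(0). */
static void escape_time_kernel(size_t gid, int width, float x0, float y0,
                               float step, int max_iter, int *out, int debug) {
    int px = (int)(gid % (size_t)width);
    int py = (int)(gid / (size_t)width);
    float cx = x0 + px * step, cy = y0 + py * step;
    float zx = 0.0f, zy = 0.0f;
    int i = 0;
    while (i < max_iter && zx * zx + zy * zy <= 4.0f) {
        float t = zx * zx - zy * zy + cx; /* z = z^2 + c */
        zy = 2.0f * zx * zy + cy;
        zx = t;
        i++;
    }
    if (debug) /* the printf() that the GPU build cannot offer */
        printf("gid=%zu c=(%g, %g) iterations=%d\n", gid, cx, cy, i);
    out[gid] = i;
}

/* Host loop playing the role of the NDRange: one call per work item.
 * Only the work item equal to `debug_gid` prints its state. */
static void run_on_cpu(int width, int height, float x0, float y0, float step,
                       int max_iter, int *out, size_t debug_gid) {
    assert(width > 0 && height > 0 && out != NULL);
    for (size_t gid = 0; gid < (size_t)width * (size_t)height; gid++)
        escape_time_kernel(gid, width, x0, y0, step, max_iter, out, gid == debug_gid);
}
```

Because the same body is compiled twice, a wrong pixel found on the GPU can be reproduced and traced on the CPU by passing its index as debug_gid.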
Quote
- Only float variables can be used (no doubles). This limits calculation accuracy and, of course, the maximum zoom.
Yeah, that gets video-card specific. And doubles are at least 4x slower.
The video card / driver / OS support issues are most annoying for an application if you're not the only person using it.
Quote
Implementing OpenCL in Mandelbulber will take a lot of time, but it is worth doing. The preview speed is much faster.
Real-time navigation is very addictive ;D

Also, GLSL shaders could be made to run in browsers using WebGL etc. Since fractal rendering is such a sweet-spot parallel problem (low memory bandwidth, high FLOPS, well suited to ray-tracing effects), it is nicely aligned with the GPU manufacturers' efforts. Increases in performance and in the number of deployment platforms (pads, phones) will only widen the gap with pure CPU implementations.

Syntopia
Fractal Molossus
« Reply #4 on: September 26, 2011, 10:25:14 PM »

Hi,

You are making progress quickly, Buddhi! Really looking forward to trying this!

Since I've just evaluated an OpenCL implementation of some CUDA code at work, I have a few comments too:

The cross-platform promise of OpenCL is nice, but the implementations are very different. Our OpenCL code was developed using NVidia's SDK, and we tried to see how portable it was. The results were not impressive:

- AMD GPU OpenCL (512 MB ATI Radeon HD 4350): no support for images (textures) at all on older-generation AMD cards!
- Intel OpenCL 1.1: crashes; we don't know why yet
- Mac CPU OpenCL: has a max local_work_size of 1 (meaning no synchronization between threads is possible)
- Mac GPU OpenCL: only supports OpenCL 1.0 (we required 1.1)
- AMD CPU: works, but very slow on the CPU, ~2x slower than our single-threaded non-SSE CPU reference code.

So in the end only our Nvidia OpenCL version (both Windows and Linux, though) was acceptable. It should also be said that our OpenCL version is ~50% slower than the corresponding CUDA version.

I still believe OpenCL is the best choice in the long run - the implementations will obviously keep getting better and better.

Quote
It has richer native types than C99, no? Like vecN, matNxM, etc. And you can declare variables where you use them.
Actually, OpenCL is way behind GLSL here. OpenCL 1.0 didn't even have three-component vector types; 'float3' first appeared in OpenCL 1.1! And there are no matrix types yet (though 'floatnxm' is a reserved keyword for future use).
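Until matrix types arrive, one workaround is to define a small matrix type and helper functions yourself. Here is a minimal C sketch. It assumes a row-major 3x3 layout, and vec3, mat3 and the function names are hypothetical. In OpenCL C 1.1 the vector could be the built-in float3 instead, with the built-in dot() replacing the hand-written row products.

```c
#include <assert.h>

/* Hypothetical stand-ins for a float3 and for the missing 3x3 matrix type. */
typedef struct { float x, y, z; } vec3;
typedef struct { float m[3][3]; } mat3; /* m[row][column], row-major */

/* Matrix-vector product: each output component is one row dotted with v. */
static vec3 mat3_mul_vec3(mat3 a, vec3 v) {
    vec3 r;
    r.x = a.m[0][0] * v.x + a.m[0][1] * v.y + a.m[0][2] * v.z;
    r.y = a.m[1][0] * v.x + a.m[1][1] * v.y + a.m[1][2] * v.z;
    r.z = a.m[2][0] * v.x + a.m[2][1] * v.y + a.m[2][2] * v.z;
    return r;
}

/* Matrix-matrix product c = a * b, e.g. to combine two rotations. */
static mat3 mat3_mul(mat3 a, mat3 b) {
    mat3 c;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            float s = 0.0f;
            for (int k = 0; k < 3; k++) s += a.m[i][k] * b.m[k][j];
            c.m[i][j] = s;
        }
    return c;
}
```

For example, applying a 90-degree rotation about z to (1, 0, 0) gives (0, 1, 0). Multiplying that rotation by itself gives the 180-degree rotation.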

As a final note: for me the dynamic compilation is the greatest thing of OpenCL and GLSL - it allows you to live code your fractal formulas and preview the result right away.

Btw, for debugging you could try Intel's OpenCL implementation; it supports printf(...).

lycium
Fractal Supremo
« Reply #5 on: September 26, 2011, 10:54:29 PM »

Cross-platform OpenCL has been getting a lot better with recent driver updates from both AMD and NVIDIA. Interestingly, of the CPU implementations, AMD's seems a lot faster than Intel's.

David Makin
Global Moderator
Fractal Senior
« Reply #6 on: September 26, 2011, 11:53:19 PM »


Quote
As a final note: for me the dynamic compilation is the greatest thing of OpenCL and GLSL - it allows you to live code your fractal formulas and preview the result right away.


IMO that is the single most important thing after the potential speed, especially as an interface can be added so that non-programmers, even those without any maths ability, can essentially "program" and see results "immediately" too ;)

The meaning and purpose of life is to give life purpose and meaning.

http://www.fractalgallery.co.uk/
"Makin' Magic Music" on Jango

marius
Fractal Lover
« Reply #7 on: September 27, 2011, 12:09:06 AM »

Quote
Actually, OpenCL is way behind GLSL here. OpenCL 1.0 didn't even have three-component vector types; 'float3' first appeared in OpenCL 1.1! And there are no matrix types yet (though 'floatnxm' is a reserved keyword for future use).

I stand corrected. I knew there was a reason to use GLSL over OpenCL ;D
The GLSL WebGL tie-in is also very compelling to me.

Loadus
Forums Freshman
« Reply #8 on: November 09, 2011, 07:04:57 PM »

Buddhi, any news on this? A chance to test the early beta maybe?

Just got myself a new video card and I'm aching to test this. :)

ker2x
Fractal Molossus
« Reply #9 on: November 09, 2011, 07:33:57 PM »

Quote
Negatives:
- OpenCL only allows the C99 version of the C language, which has a lot of limitations (no classes, no global variables, etc.). I have to convert everything from C++ to C99, which means rewriting every function.
- Only float variables can be used (no doubles). This limits calculation accuracy and, of course, the maximum zoom.
- Only the video card's built-in memory is available. In some cases this limits the maximum image resolution. Of course, it is possible to render the image in smaller blocks, but then some effects cannot be rendered by the GPU.
- Debugging the kernel program is very difficult. There is no way to use printf() or other functions to observe what is going on in the program, and there is no debugger. Only compiler errors can be seen.

- Most recent cards (gamers' cards) support doubles now :)
- I think you can use host memory in some way. Or maybe that's a CUDA-only thing, I'm not sure. But... really, you don't want to do that.
- I don't understand what the problem is with the card's built-in memory. Every card has at least 512 MB, and 1 GB is common now. Isn't that enough? ???
- Yes, debugging sux :)
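Whether the built-in memory is enough depends on the resolution and on how much data is kept per pixel. Here is a rough worked sketch in C. The 64 bytes per pixel and the helper names are hypothetical assumptions, not Mandelbulber's real buffer layout. It computes how many image rows fit in a memory budget and how many horizontal tiles the image then needs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tiling helper: how many full image rows fit into the given
 * device memory budget (0 means not even one row fits). */
static size_t rows_per_tile(size_t width, size_t height, size_t bytes_per_pixel,
                            size_t budget_bytes) {
    assert(width > 0 && bytes_per_pixel > 0);
    size_t row_bytes = width * bytes_per_pixel;
    size_t rows = budget_bytes / row_bytes;
    return rows < height ? rows : height; /* never more rows than the image has */
}

/* Number of horizontal tiles needed to cover the image: ceil(height / rows). */
static size_t tile_count(size_t height, size_t rows) {
    assert(rows > 0);
    return (height + rows - 1) / rows;
}
```

With those assumptions and a 512 MB budget, an 800x600 image fits in a single tile. A 10000x10000 image gives 838 rows per tile and needs 12 tiles.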

Quote
Neutral:
- The philosophy of writing programs is completely different. Parallel computation needs a different program structure than ordinary CPUs do. To get good performance, the program has to be compiled dynamically and built from small "bricks", because long code with many branches leads to smaller workgroup sizes (slower computation).

Indeed, coding for the GPU is completely different because the architecture is completely different.

Quote
Positives:
- Very fast computation. In some cases it is 20-30 times faster than the CPU (I'm comparing an Intel Core 2 Quad Q8200 with a GeForce 9600 GT). On my graphics card it is possible to have up to 512 parallel threads (about 128 for longer code).

- Huh... the 9600GT is slow :(
- Only 512 threads? You're doing something wrong, or I don't understand what you wrote. To be efficient you should launch millions of threads (e.g. 1 per pixel, 1 per ray, ...)
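Assume the 512 figure refers to the maximum work-group (local) size rather than the total number of work items. ker2x's point can then be illustrated with a little host-side arithmetic:
- launch one work item per pixel;
- round the global size up to a multiple of the local size, because OpenCL 1.x requires the local size to divide the global size evenly;
- let the surplus work items exit early.

The helper names are hypothetical. In a real program, the rounded global size would be passed to clEnqueueNDRangeKernel().

```c
#include <assert.h>
#include <stddef.h>

/* Round the global work size up to a multiple of the local work-group size. */
static size_t round_up_global(size_t n, size_t local) {
    assert(local > 0);
    return (n + local - 1) / local * local;
}

/* Map a 1D global id to pixel coordinates. Ids past the last pixel are padding
 * work items, which the kernel should skip; the function returns 0 for them. */
static int gid_to_pixel(size_t gid, size_t width, size_t height, size_t *x, size_t *y) {
    if (gid >= width * height) return 0;
    *x = gid % width;
    *y = gid / width;
    return 1;
}
```

For an 800x600 image with 512-wide work groups, this gives 480256 work items, of which the last 256 are padding.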

One piece of advice that has been repeated so many times that I'm not sure I should write it again:
Global memory access is painfully slow. A good rule of thumb is ~800 cycles of latency per global memory access. Considering that you can do up to 8 add operations per cycle (using FMAD), you should avoid global memory access at all costs ;D
And, of course (but you said it too): code branches slow down your computation a lot, too.
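Take ker2x's rule-of-thumb figures at face value: roughly 800 cycles of latency and up to 8 operations per cycle, both approximate and hardware-dependent. A back-of-the-envelope estimate then shows how much arithmetic one unhidden global access can cost:

$$
N_{\text{ops}} \approx 800\ \text{cycles} \times 8\ \frac{\text{ops}}{\text{cycle}} = 6400\ \text{ops}
$$

In practice the GPU hides part of this latency by switching to other work items while one is waiting for memory. That is one more reason to launch far more work items than there are cores.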

often times... there are other approaches which are kinda crappy until you put them in the context of parallel machines
(en) http://www.blog-gpgpu.com/ , (fr) http://www.keru.org/ ,
Sysadmin & DBA @ http://www.over-blog.com/

Buddhi
Fractal Iambus
« Reply #10 on: November 09, 2011, 08:08:38 PM »

Quote
Buddhi, any news on this? A chance to test the early beta maybe?

Just got myself a new videocard and aching to test this. : )

I had a 2-month break in development, but a few days ago I started again (my very little daughter is consuming a lot of time :) ). If anybody wants to follow the progress of development or test the program, please visit: http://code.google.com/p/mandelbulber/source/list

isosceles
Alien
« Reply #11 on: November 11, 2011, 05:10:09 AM »

Very exciting work! Glad to see you back. How is life with the baby?

Jason Fletcher
Charles Hayden Planetarium

Loadus
Forums Freshman
« Reply #12 on: November 16, 2011, 11:04:25 AM »

Quote
I had a 2-month break in development, but a few days ago I started again (my very little daughter is consuming a lot of time :) ). If anybody wants to follow the progress of development or test the program, please visit: http://code.google.com/p/mandelbulber/source/list

Hehe, totally understandable. :D :D

No hurry at all. : )