Welcome to Fractal Forums

Fractal Software => Programming => Topic started by: Duncan C on August 29, 2011, 03:04:24 AM




Title: Roll-your-own 64 or 128 bit fixed point math for Mandelbrot/Julia calculations?
Post by: Duncan C on August 29, 2011, 03:04:24 AM
I've been chomping at the bit to try my hand at OpenCL for fractal rendering. However, most graphics cards don't have hardware support for double precision math.

Even double precision is limiting. I've found that if I zoom in to a plot width of around 2e-13, the plot starts to degrade due to round-off errors, and by 5e-14, it falls apart completely.

64-bit integer math is pretty universal, and very fast. Treat a 64-bit number as a signed value: one bit goes to the sign, the next 2 high-order bits hold the integer part, and the remaining 61 bits hold the fraction. 2^-61 is about 4.3E-19 of resolution in the representation itself, before any rounding in the arithmetic.

You can do fixed-point math using signed integer instructions, and just apply a scaling factor at the end to interpret the results.

With that layout, representable values span just under +/-4, which covers z itself for quadratic Mandelbrot/Julia calculations. The intermediate z^2 + c can transiently reach magnitude 6 when |c| <= 2, though, so a sign bit plus 3 integer bits and 60 fractional bits leaves a safer margin, and 2^-60 (about 8.7E-19) is still far finer than double precision.

It should be trivial to write code that calculates quadratic Mandelbrot and Julia set values using fixed-point arithmetic. What's more, scaling up to arbitrary precision should be straightforward: you'd just propagate carries from the low-order to the high-order parts of the calculation. Each doubling of precision makes the multiplies about 4 times slower, but that should still be much faster than a general-purpose arbitrary-precision software floating-point library.

Have any of you experimented with high precision fixed-point math for fractal rendering? Especially GPU-based fixed-point math?

More sophisticated calculations like distance estimates and fractional iteration values could be harder. I'd have to go back and review those algorithms and see how hard they would be to do in fixed point. If I remember correctly, both require logarithms, which would be a problem in a custom fixed-point notation. It would no doubt be a great deal slower to calculate logs and other transcendentals in software than with the equivalent hardware-accelerated floating point.


Thoughts?


Title: Re: Roll-your-own 64 or 128 bit fixed point math for Mandelbrot/Julia calculations?
Post by: lycium on August 29, 2011, 05:12:00 AM
Thoughts?
Zooming on the Mandelbrot set has been efficiently solved on the CPU for decades...

In fact, I'm worried that in 2050 we'll have computers with billions of petaflops, direct neural interfaces bringing high-dimensional data straight into our consciousness... and there will still be people asking how to extend precision for zooming on the Mandelbrot set :|


Title: Re: Roll-your-own 64 or 128 bit fixed point math for Mandelbrot/Julia calculations?
Post by: A Noniem on August 29, 2011, 12:57:39 PM
A site I found on this is http://www.bealto.com/mp-mandelbrot_fp128-opencl.html, which does basically exactly what you want. The only problem is that not all of the code is on that page, but it's definitely worth a look. In that example he uses 4 32-bit ints to emulate a 128-bit float; in theory you could use 16 64-bit longs to emulate a 1024-bit float (OpenCL supports up to 16-component vectors, which speeds things up compared to 16 scalar longs), even though that will be a lot slower  ;D


Title: Re: Roll-your-own 64 or 128 bit fixed point math for Mandelbrot/Julia calculations?
Post by: lkmitch on August 29, 2011, 04:49:38 PM
Back in the day when floating-point units were not part of the CPU, folks used fixed-point math for their fractals.  That's how FractInt got its name.  I believe the source is available; you may want to have a look at it.


Title: Re: Roll-your-own 64 or 128 bit fixed point math for Mandelbrot/Julia calculations?
Post by: real_het on August 31, 2011, 12:21:58 PM
Hello!

You can read about multiplication algorithms on Wikipedia -> http://en.wikipedia.org/wiki/Multiplication_algorithm
There are several multi-precision multiplication algorithms there. Generally, the fewer calculations an algorithm does, the more memory it uses. In my experience the best one on the GPU is 'long multiplication' optimized to use just a few registers, while the best algorithm for the CPU is FFT multiplication, because the CPU can easily access memory in a non-linear pattern (unlike the GPU).

By using the 128 registers on an ATI HD4xxx+ GPU you can go up to 1280-bit precision with the 'long multiplication' algorithm (32-bit int math). Beyond 1280 bits it's impossible to do the math in 128 registers, so memory spilling kicks in and it becomes slow as hell. At that point I think FFT multiplication, with its heavier memory usage, would outperform this method, but I've never tried that algorithm on the GPU yet.

Here are my measurements on the 'long' algo at different precisions:
bits | million mandel iters/sec
192 | 177
256 | 446
384 | 667
512 | 558
640 | 328
768 | 215
1024 | 78
1280 | 52
(those numbers use 32-bit integer parts, measured on 2x HD6990 @ 950 MHz (around 11.6 MAD TFLOPS))
The best performance is at 384 bits of precision; beyond that, more and more execution units have to be turned off because there aren't enough registers for them (the 128 registers are shared across 64 execution units).
At 192 and 256 bits of precision, efficiency could be improved by launching much larger batches of work to achieve better parallelism.

So, those were my thoughts, hope it helps