Botond Kósa
« Reply #90 on: February 26, 2014, 12:52:15 AM »
Mandel Machine v1.0.6 is now available with the following bugfixes:
- FIXED: Application sometimes hangs when using perturbation method with pixel guessing off
- FIXED: Bogus pixels with infinite or NaN iteration count appear in super dense areas
Botond Kósa
« Reply #91 on: February 26, 2014, 01:02:37 AM »
Found something strange in Mandel Machine which I did not see before. Perhaps a bug?
<Quoted Image Removed>
The blue rectangles don't seem to fit the picture. I've seen similar (smaller) things a few zoom levels back. Param file attached. Rerendering from this file reproduces the error (without restarting the program).
<EDIT> Notice the lowest big black circle. Between the big black circle and the smaller one, there is a stripe of what looks like the same blue color. Also seen at deeper zoom levels. </EDIT>
Thanks youhn for the location. This bug has been haunting me for years, but only in a few pixels per image, so I didn't bother fixing it. It is caused by the adjacency optimization in super dense areas with iteration values spanning several orders of magnitude. It is now fixed in the newest version.
« Last Edit: February 26, 2014, 10:59:23 AM by Botond Kósa »
Botond Kósa
« Reply #92 on: February 26, 2014, 11:28:37 AM »
Bruce Dawson of Fractal eXtreme investigated the use of GPGPU for deep zooming. His conclusion was that GPUs could be used, and that they would be good at low depths, but they are least efficient exactly where they are needed most: at the deeper zoom levels. Mandelbrot rendering up to the maximum depth of floating-point precision is already so fast that it doesn't really need further optimization; the development effort would be too large for an almost unnoticeable performance gain. However, perturbation requires only a few high-precision calculations even at high zoom levels, so maybe those GPUs aren't such a bad idea after all!
Our experience shows that double precision (fp64) is needed to correctly render images with perturbation. The fp64 performance of consumer GPUs and APUs is deliberately kept low in order to drive professional customers towards pro cards like the nVidia Tesla and AMD FirePro, which cost several times more than consumer cards with the same GPUs. (Consumer cards' fp64 performance is 1/8 - 1/16 of their fp32 performance, compared to 1/2 - 1/4 on pro cards.) There is a great article at AnandTech comparing the floating-point performance of recent AMD and Intel CPUs ( http://anandtech.com/show/7711/floating-point-peak-performance-of-kaveri-and-other-recent-amd-and-intel-chips ). The conclusion is that even the fastest AMD APU is no match for a Core i7 in fp64 performance. Discrete GPUs on high-end consumer cards may have fp64 performance comparable to an Intel quad-core CPU.
« Last Edit: February 26, 2014, 11:31:10 AM by Botond Kósa »
Botond Kósa
« Reply #93 on: February 26, 2014, 11:34:35 AM »
CUDA Device count = 1
Some properties of CUDA device 0:
=================================
Name: GeForce GTX 760
Compute capability: 3.0
Number of multiprocessors: 6
Total global memory: 2147155968 bytes
Shared Mem/Block: 49152 bytes
Shared Mem Access: 8 bytes
=================================
*********************
n_digits = 156
prec_words = 12, 12
MAX_PREC_WORDS = 145
n_words = 17
numElement = 307200
*********************
Prepare data................................. done.
test_add ........................................
numElement = 307200, interval = 307200
numBlock = 2400, numThread = 128
interval memory layout...
*** GPU add: 0.003 sec ***
*** CPU add: 0.164 sec ***
*** The abs. of max. rel. error = 10 ^ 0 x 0 ***
*** The abs. of avg. rel. error = 10 ^ 0 x 0 ***
A sample when i = 164661
GOLD = 10 ^ 0 x 1.14651496771230580964991571638732989009769748639424549743430822149742446483432563179364833248365700345339165485223330578732787146025929987360878945595128796
REF  = 10 ^ 0 x 1.14651496771230580964991571638732989009769748639424549743430822149742446483432563179364833248365700345339165485223330578732787146025929987360878945595128796
...
Could you provide some hints on how to interpret these results?
ellarien
« Reply #94 on: February 26, 2014, 11:54:52 AM »
Thanks for the new version! I'm now getting a complete, silent crash (instead of a hang of the rendering engine) when panning in perturbation mode with glitches present. I'm not sure whether that's progress or not. In some ways it's better than having to kill the hung program from the task manager, but there's no opportunity to save the location.
Also, starting with the previous version and continuing in this one, I'm sometimes seeing image corruption when zooming or glitch-fixing as well as when panning. Recompute then removes the glitch fix, but it usually comes right on the second attempt.
Botond Kósa
« Reply #95 on: February 26, 2014, 12:10:15 PM »
The panning function is not working properly; I advise you to avoid it completely until I fix it. A workaround is to drag a selection rectangle from the center of the image to the edge, so that the selection becomes the same size as the image. Then move the selection and apply it by double-clicking.
3dickulus
« Reply #96 on: February 26, 2014, 02:40:00 PM »
Could you provide some hints on how to interpret these results?
What would you like to know? It's a consumer-grade card, around $200 CDN: 6 multiprocessors x 192 cores each @ 3 GHz. The test fills two arrays (numElement = 307200 = 640x480) with random numbers, performs a math op between corresponding elements, like a[n] * b[n] = c[n], and compares the results with the CPU. The time indicated includes the time it takes to copy data to the GPU and back again. CPU = Core2 Duo @ 2.8 GHz.
edit: blocks and threads are GPU concepts; Shared Mem is the GPU's on-chip cache, with 64-bit read/write (normally 32-bit)
« Last Edit: February 26, 2014, 02:57:39 PM by 3dickulus, Reason: upd »
Botond Kósa
« Reply #97 on: February 26, 2014, 02:58:49 PM »
What do n_digits, prec_words, MAX_PREC_WORDS and n_words mean?
3dickulus
« Reply #98 on: February 26, 2014, 03:39:36 PM »
What do n_digits, prec_words, MAX_PREC_WORDS and n_words mean?
n_digits = the number of decimal digits in a value
prec_words = the number of words used to store a value
MAX_PREC_WORDS = the maximum number of words, set at GARPREC compile time (can be adjusted up to 680 on my system, around 10000 digits)
n_words = the actual number of words required = prec_words + n extra words for overflow, so you don't lose bits of precision
edit: my mistake, the times do not include the copy to/from the GPU, but that is on the order of 0.0003 sec depending on array size
« Last Edit: February 26, 2014, 04:22:11 PM by 3dickulus, Reason: oops »
Botond Kósa
« Reply #99 on: February 26, 2014, 04:38:35 PM »
So IIUC n_digits=156 means you have 156 decimal digits, which equals 518 binary digits. Using 32-bit words, that requires 17 words, which is why n_words=17, right?
3dickulus
« Reply #100 on: February 26, 2014, 04:53:06 PM »
yes
prec_words = 12, 12 is the number of words used on the GPU and CPU for a single value
n_words = 17 is the number of words used "internally" by the GARPREC library to do the actual calculation; a 12-word value is returned
1 word = 1 double
the GARPREC library also includes a "dd" double-double 128-bit type and a "qd" quad-double 256-bit type if you don't need all that precision
it works very much like the MPFR library.
Botond Kósa
« Reply #101 on: February 26, 2014, 05:06:06 PM »
How does GARPREC represent high-precision numbers? Does it really use double-precision floating-point values? This seems a little odd given the low fp64 performance of consumer GPUs.
3dickulus
« Reply #102 on: February 26, 2014, 05:26:57 PM »
The best way to understand it is to grab the source code and poke around the internals. I'm not a mathematician or PhD, so admittedly there are many things I just take for granted, like that the people who developed this know what they are doing. I found that GARPREC did not compile with CUDA 5.5, so I modified it where needed and put together a cmake project that does compile the static library and test program. Here is my slightly modified version: http://www.digilanti.org/cudabrot/garprec_1.2.1.zip
You will also need http://crd.lbl.gov/~dhbailey/mpdist/arprec-2.2.17.tar.gz and http://crd.lbl.gov/~dhbailey/mpdist/qd-2.3.14.tar.gz
The tar.gz files include documentation and instructions for compiling and installation. I think this was designed for Tesla-type GPUs but should work with anything capable of Compute 1.3 and up.
edit: ARPREC and QD (CPU) have well-defined operators, so the math is very straightforward:
mp_real x;
dd_real y = 5.5;
qd_real z = x + y;
double n = 0.1;
x = n*y+z;
The GPU side of things is a bit more complicated:
double d[MAX_D_SIZE]; // MAX_D_SIZE defined in lib
gmpadd(d_a, interval, d_b, interval, d_c, interval, prec_words, d);
where d_a, d_b, d_c are the value->words[] arrays and d is a temp scratch area (from the GARPREC sources: "d is a temperoral buffer, should be allocated outside with the size (prec_words+7)").
« Last Edit: February 26, 2014, 05:44:50 PM by 3dickulus, Reason: clarity? »
Botond Kósa
« Reply #103 on: February 26, 2014, 11:27:25 PM »
Version 1.1 is out, with automatic correction of flat blobs. In cases where some unsolved blobs remain, they can be corrected manually by right-clicking inside them. List of changes:
- NEW: Automatic correction of flat blobs caused by the perturbation algorithm
- FIXED: Stack overflow can occur when rendering super dense areas with 1000+ magnification and pixel grouping of 2
ellarien
« Reply #104 on: February 26, 2014, 11:56:38 PM »
Yay! That works a treat, and takes a lot of the frustration out of exploring.