Topic: Mandel Machine
Description: A highly efficient Mandelbrot set explorer
Kalles Fraktaler
« Reply #255 on: September 04, 2014, 09:44:43 AM »

Thanks, now it works!
Nice to see that it's now solving glitches at previously unbelievable speed!

Attached location makes SA crash though.

* glitch49.kfr.txt (1.64 KB)

Want to create DEEP Mandelbrot fractals 100 times faster than the commercial programs, for FREE? One hour or one minute? Three months or one day? Try Kalles Fraktaler http://www.chillheimer.de/kallesfraktaler
http://www.facebook.com/kallesfraktaler
Botond Kósa
« Reply #256 on: September 05, 2014, 03:19:23 AM »

I have corrected some of the recently discovered bugs and uploaded a new version.
List of changes: http://web.t-online.hu/kbotond/mandelmachine/#changelog

Check out my Mandelbrot set explorer:
http://web.t-online.hu/kbotond/mandelmachine/
Kalles Fraktaler
« Reply #257 on: September 05, 2014, 03:07:44 PM »

Thanks.
You are now close to making a final release.

The location I posted still breaks the approximation, unfortunately.
Or fortunately, since you now have a stress location to test with.

I tested it against some of the locations in the gallery I have on my site.
MM is 2-10 times faster than KF for most locations.
But for some locations, KF is actually faster than MM, e.g. "candy".
The reason is that KF is able to approximate many more iterations for the same number of approximation terms.

You wrote that you were able to speed up the calculation of the approximation by 65%. How did you do that?
Are you "caching" the alignment/normalization of your version of "fixedfloat"? I think that is a big reason why it is much more than twice as slow as an ordinary double, when in theory it should be only about twice as slow. But that might also be a reason why precision may be limited.

Hopefully I will upload my latest code in a few days; maybe that can give you some tips on how I calculate the approximation.

Botond Kósa
« Reply #258 on: September 05, 2014, 04:26:21 PM »

The breakage at your latest location and the speedup of SA are actually closely related.

Previously I was using two ASFloat objects (each with a double-precision mantissa and an integer exponent) for the re and im parts of an ASComplex type. Profiling revealed that the performance bottleneck was the addition of two ASFloats (both ASComplex addition and ASComplex multiplication require 2 ASFloat additions). Before performing the addition, the two ASFloats had to be adjusted to have equal exponents (meaning the smaller exponent was set to equal the larger, and its mantissa was rescaled accordingly).
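A minimal sketch of that old layout, assuming an ASFloat is simply a double mantissa m and an integer exponent e with value m * 2^e (the C++ names below are hypothetical, not Mandel Machine's Java classes). It shows where the alignment cost comes from: every ASFloat addition rescales one mantissa, a complex addition needs two such additions, and so does a complex multiplication.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for an ASFloat: value = m * 2^e.
struct ASFloatSketch {
    double m;
    std::int64_t e;
};

double value(const ASFloatSketch& x) { return std::ldexp(x.m, static_cast<int>(x.e)); }

// Multiplication needs no alignment: multiply mantissas, add exponents.
ASFloatSketch mul(const ASFloatSketch& a, const ASFloatSketch& b) { return {a.m * b.m, a.e + b.e}; }

ASFloatSketch neg(const ASFloatSketch& a) { return {-a.m, a.e}; }

// Addition: give both operands the larger exponent by rescaling the mantissa
// of the operand with the smaller exponent, then add the mantissas.
ASFloatSketch add(ASFloatSketch a, ASFloatSketch b) {
    if (a.e < b.e) std::swap(a, b);
    // Clamp the shift so it fits an int; ldexp underflows to 0 for huge shifts.
    const int shift = static_cast<int>(std::min<std::int64_t>(a.e - b.e, 2000));
    return {a.m + std::ldexp(b.m, -shift), a.e};
}

// Complex number built from two independent ASFloats (the old layout).
struct TwoFloatComplex {
    ASFloatSketch re, im;
};

// Complex addition: 2 aligned ASFloat additions.
TwoFloatComplex add(const TwoFloatComplex& a, const TwoFloatComplex& b) {
    return {add(a.re, b.re), add(a.im, b.im)};
}

// (a+bi)(c+di) = (ac-bd) + (bc+ad)i: 4 multiplications, 2 aligned additions.
TwoFloatComplex mul(const TwoFloatComplex& x, const TwoFloatComplex& y) {
    return {add(mul(x.re, y.re), neg(mul(x.im, y.im))),
            add(mul(x.im, y.re), mul(x.re, y.im))};
}
```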

I had an idea that complex numbers could be represented with two mantissas (re and im) and a shared exponent. This way, adding two complex numbers requires only one rescale operation. Even better, multiplying two complexes requires no rescale at all, because (a+bi)(c+di) = (ac-bd)+(bc+ad)i, and the terms ac, bd, bc and ad have equal exponents. This is how my new ASComplex2 type works.
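The shared-exponent idea can be sketched as follows. This is hypothetical C++, not Mandel Machine's Java ASComplex2, and it assumes the value is (re + i*im) * 2^exp. Addition computes a single rescale factor for the operand with the smaller exponent and applies it to both parts; multiplication only adds exponents, because all four products share the exponent a.exp + b.exp.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical shared-exponent complex: value = (re + i*im) * 2^exp.
struct SharedExpComplex {
    double re, im;
    std::int64_t exp;
};

// Addition: one rescale factor, computed once, applied to both parts
// of the operand with the smaller exponent.
SharedExpComplex add(const SharedExpComplex& a, const SharedExpComplex& b) {
    const bool aBigger = a.exp >= b.exp;
    const SharedExpComplex& big = aBigger ? a : b;
    const SharedExpComplex& small = aBigger ? b : a;
    const int shift = static_cast<int>(std::min<std::int64_t>(big.exp - small.exp, 2000));
    const double s = std::ldexp(1.0, -shift);  // 2^-shift, 0 if the shift is huge
    return {big.re + small.re * s, big.im + small.im * s, big.exp};
}

// Multiplication: ac, bd, bc and ad all carry exponent a.exp + b.exp,
// so no rescale is needed at all.
SharedExpComplex mul(const SharedExpComplex& a, const SharedExpComplex& b) {
    return {a.re * b.re - a.im * b.im,
            a.im * b.re + a.re * b.im,
            a.exp + b.exp};
}
```

For example, (1+2i)*2^3 times (3+4i)*2^5 gives (-5+10i)*2^8 without touching any mantissa exponent.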

ASComplex2 was introduced in Mandel Machine in two steps:
  • In version 1.2.3, the calculation of SA coefficients was switched to use ASComplex2. This resulted in an up to 5x speed improvement for that phase of the rendering. The resulting coefficients were still stored in the old ASComplex format.
  • From version 1.2.5, the coefficients are stored in both ASComplex and ASComplex2 formats. For pixels not in the same row or column as the reference (that is, when both dx0 and dy0 are nonzero), the actual approximation of deltaN is calculated using ASComplex2. If either of them is zero, the regular ASComplex is used. This yields up to a 65% speedup in the approximation itself, which of course is noticeable only when SA can skip so many iterations that the time to calculate the remaining iterations is comparable to the time spent on the approximation. The 65% speedup was measured at the tick-tock location at huge resolution (~20000x10000) with 33 SA coefficients.
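For readers new to series approximation (SA), here is a minimal sketch of the two phases mentioned above: computing the coefficients along the reference orbit, then evaluating the truncated series per pixel. It uses the standard three-term form obtained by substituting delta_n = A*d0 + B*d0^2 + C*d0^3 into the perturbation formula delta_{n+1} = 2*Z_n*delta_n + delta_n^2 + d0, with d0 presumably being dx0 + i*dy0. Mandel Machine uses extended-exponent types and many more terms (33 in the test above); this sketch uses plain std::complex<double> and 3 terms, so it is only meaningful at shallow zooms. All names are hypothetical.

```cpp
#include <complex>

using cd = std::complex<double>;

// Three SA coefficients for delta_n ~= A*d0 + B*d0^2 + C*d0^3.
struct SACoeffs { cd A, B, C; };

// Iterate the reference orbit Z_{n+1} = Z_n^2 + c (Z_0 = 0) together with
// the coefficient recurrences derived from the perturbation formula.
SACoeffs seriesCoefficients(cd c, int n) {
    cd Z = 0.0, A = 0.0, B = 0.0, C = 0.0;
    for (int i = 0; i < n; ++i) {
        const cd A1 = 2.0 * Z * A + 1.0;
        const cd B1 = 2.0 * Z * B + A * A;
        const cd C1 = 2.0 * Z * C + 2.0 * A * B;
        A = A1; B = B1; C = C1;
        Z = Z * Z + c;
    }
    return {A, B, C};
}

// Evaluate the truncated series for one pixel offset d0 (Horner form).
cd approximateDelta(const SACoeffs& k, cd d0) {
    return d0 * (k.A + d0 * (k.B + d0 * k.C));
}

// Reference result: iterate the perturbation formula directly for n steps.
cd iterateDelta(cd c, cd d0, int n) {
    cd Z = 0.0, d = 0.0;
    for (int i = 0; i < n; ++i) {
        d = 2.0 * Z * d + d * d + d0;
        Z = Z * Z + c;
    }
    return d;
}
```

With the coefficients computed once, every pixel can jump straight to iteration n with approximateDelta instead of running n perturbation steps.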

Unfortunately, ASComplex2 has two shortcomings:
  • When storing numbers whose re and im parts have very different orders of magnitude, the arithmetic operations can potentially have larger rounding errors than ASComplex with two distinct ASFloats. I have not encountered such problems so far, but this needs further testing.
  • Zero re or im values cannot be represented in ASComplex2, because zero has no meaningful exponent. This is why your location breaks SA since version 1.2.3. Zeros could be handled as special cases, but checking the mantissas against zero at every operation would slow them down considerably. In fact, when im=0, all the calculations can be done using real numbers represented in a single ASFloat. I plan to implement this in a future release.

During the initial calculation of the SA coefficients, I normalize them every 16 iterations, just like I did with the old ASComplex type. For some locations, this is not enough and causes the number of skipped iterations to drop to near zero when the number of coefficients exceeds a certain amount. It seems the frequency of the normalization has to be adapted to the magnitude of the coefficients. There is still much work to do in this area.
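A toy model of why a fixed normalization interval can fail, assuming every iteration multiplies the mantissas by a constant growth factor (real SA coefficients grow less regularly). The normalize step, the factor 2^100 and the function names are illustrative choices, not Mandel Machine's code. With 100 bits of growth per step, normalizing every 16 steps lets the mantissas overflow the double range, while normalizing every 4 steps keeps them finite.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical shared-exponent layout: value = (re + i*im) * 2^exp.
struct SharedExpComplex {
    double re, im;
    std::int64_t exp;
};

// Move the magnitude of the larger part into the exponent, so that
// max(|re|, |im|) ends up in [0.5, 1). Zero and non-finite values are left alone.
void normalize(SharedExpComplex& z) {
    const double m = std::max(std::fabs(z.re), std::fabs(z.im));
    if (m == 0.0 || !std::isfinite(m)) return;
    int e = 0;
    std::frexp(m, &e);
    z.re = std::ldexp(z.re, -e);
    z.im = std::ldexp(z.im, -e);
    z.exp += e;
}

// Toy coefficient growth: multiply the mantissas by growthPerStep each step and
// normalize only every `interval` steps. Returns true if the mantissas stayed finite.
bool staysFinite(int steps, int interval, double growthPerStep) {
    SharedExpComplex z{1.0, 0.5, 0};
    for (int i = 1; i <= steps; ++i) {
        z.re *= growthPerStep;
        z.im *= growthPerStep;
        if (i % interval == 0) normalize(z);
    }
    return std::isfinite(z.re) && std::isfinite(z.im);
}
```

One possible way to adapt the frequency, again only an assumption, is to normalize whenever a mantissa leaves a chosen safe range instead of on a fixed schedule.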

There is also some room for further speed improvements. ASComplex2 is currently implemented in Java, where rescaling of the mantissas has to be done using floating point multiplications. Using ASM it could be done by integer additions to the exponent parts of the double precision mantissas, since the rescale factor is always an integral power of 2. SSE could also be used to perform many operations (addition, multiplication, rescaling) on the re and im parts simultaneously.
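The integer-addition rescale can be sketched in C++ (the function name is made up; in Java the same bit manipulation would go through Double.doubleToRawLongBits and Double.longBitsToDouble). Adding k to the 11-bit biased exponent field multiplies the value by 2^k, but only under the assumption that the input is a normal, nonzero double and the result stays normal; zeros, subnormals, infinities and NaNs are not handled.

```cpp
#include <cstdint>
#include <cstring>

// Multiply x by 2^k by adding k to the biased exponent field (bits 52..62).
// Assumes x is normal and nonzero and that x * 2^k is still a normal double;
// otherwise the result is garbage.
double scaleByPow2Bits(double x, std::int64_t k) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);           // well-defined type punning
    bits += static_cast<std::uint64_t>(k) << 52;   // unsigned wraparound handles negative k
    std::memcpy(&x, &bits, sizeof x);
    return x;
}
```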

Botond Kósa
« Reply #259 on: September 05, 2014, 04:49:44 PM »

Quote from Kalles Fraktaler:
Are you "caching" the alignment/normalization of your version of "fixedfloat"? Because I think that is a big reason why it is much more than twice slower than ordinary double (which I think it should be in theory). But that might also be a reason why precision may be limited.

I use a cache of rescale factors that fit into the range of double-precision floating-point numbers: 2^-1022 to 2^1023. When a mantissa has to be rescaled, the rescale factor is obtained from the cache by using the shifted exponent as the array address. This way no actual floating-point exponentiation is performed.
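A cache like the one described might look like this (class and method names are hypothetical): one precomputed power of two per exponent from -1022 to 1023, looked up by offsetting the exponent by 1022 so that it becomes a valid array index.

```cpp
#include <array>
#include <cmath>

// Precomputed powers of two 2^-1022 .. 2^1023, i.e. the normal double range.
class RescaleCache {
public:
    static constexpr int kMinExp = -1022;
    static constexpr int kMaxExp = 1023;

    RescaleCache() {
        for (int e = kMinExp; e <= kMaxExp; ++e)
            table_[e - kMinExp] = std::ldexp(1.0, e);
    }

    // 2^e via table lookup; e must lie in [kMinExp, kMaxExp].
    double factor(int e) const { return table_[e - kMinExp]; }

private:
    std::array<double, kMaxExp - kMinExp + 1> table_{};
};
```

Rescaling a mantissa m by 2^-diff then becomes m * cache.factor(-diff), a single multiplication, as long as -diff stays inside the table range.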

In fact, I initially thought that any rescale with an exponent outside the [-52, 52] range could be eliminated, because a double has only 52 stored significand bits, so adding two doubles whose exponents differ by more than 52 yields the larger one. Unfortunately, this is only true when adding two normalized ASFloats. Since normalization is performed only every 16 iterations, rescales with an exponent larger than 52 are necessary. So I gradually increased the limit from 52 to 80, then 100, 200, and so on, from version to version, as more and more problematic locations were found by Dinkydau and stardust4ever. This fixed those particular locations, but slowed down the calculation of the previously unproblematic ones a bit. I know this is quite a silly approach, but so far I have had neither the time nor the patience to analyze the problem numerically and come up with a reliable solution.
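The failure mode can be reproduced with a hypothetical ASFloat-like pair (mantissa m, exponent e, value m * 2^e; names are illustrative). If one operand's mantissa has drifted far from 1 since its last normalization, its stored exponent understates its size, so a test on the exponent gap alone can drop the larger number. Raising the cutoff hides the problem; normalizing both operands first means a cutoff near the significand width only drops terms that are negligible at double precision.

```cpp
#include <cmath>
#include <cstdint>
#include <utility>

// Hypothetical ASFloat-like value: m * 2^e, where m is NOT necessarily
// normalized (between normalizations it may drift far from [0.5, 1)).
struct ASFloatSketch {
    double m;
    std::int64_t e;
};

double value(const ASFloatSketch& x) { return std::ldexp(x.m, static_cast<int>(x.e)); }

// Bring m into [0.5, 1) and fold the difference into e (zero left unchanged).
ASFloatSketch normalized(ASFloatSketch x) {
    if (x.m == 0.0) return x;
    int k = 0;
    x.m = std::frexp(x.m, &k);
    x.e += k;
    return x;
}

// Drops the operand with the smaller stored exponent when the exponent gap
// exceeds `cutoff`. This is only safe if both operands are normalized.
ASFloatSketch addWithCutoff(ASFloatSketch a, ASFloatSketch b, std::int64_t cutoff) {
    if (a.e < b.e) std::swap(a, b);
    const std::int64_t diff = a.e - b.e;
    if (diff > cutoff) return a;
    return {a.m + std::ldexp(b.m, -static_cast<int>(diff)), a.e};
}
```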

stardust4ever
« Reply #260 on: September 05, 2014, 11:39:26 PM »

Just wanted to say that the automatic glitch solving in 1.2.7 is awesome. Also as far as I can tell, it appears the scroll wheel crashing I mentioned a few posts earlier has been fixed. Thank you.

However, you claim that huge renders of up to nearly half a gigapixel (23000x23000) are possible, but I keep getting an error where the Mandelbrot image turns gray and freezes as I approach 16000x16000. I can successfully render at 15875x15875, but it hangs consistently at 15900x15900 or greater. I understand bitmaps get corrupted somewhere above 23040x23040 ((360x64)^2 is just shy of 0.5 binary gigapixels but larger than 500 decimal megapixels, and 23040 is a mathematically nice round number to use, with no prime factor larger than 5), but I'm unable to render at the 23000x23000 resolution you advertised.
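The arithmetic in the parenthesis checks out; the compile-time assertions below restate it (the constant names are arbitrary).

```cpp
#include <cstdint>

constexpr std::int64_t kSide = 360 * 64;                              // 23040
constexpr std::int64_t kPixels = kSide * kSide;                       // 530,841,600
constexpr std::int64_t kHalfBinaryGigapixel = std::int64_t{1} << 29;  // 536,870,912

static_assert(kSide == 23040, "360 x 64 = 23040");
static_assert(kSide == 512 * 9 * 5, "23040 = 2^9 * 3^2 * 5");
static_assert(kPixels == 530841600, "23040^2");
static_assert(kPixels < kHalfBinaryGigapixel, "just shy of 2^29 pixels");
static_assert(kPixels > 500000000, "more than 500 decimal megapixels");
```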

In the bottom right-hand corner of the window, I see the text 2591M / 3073M / 3911M. I'm not sure what that means, but it appears that Windows, Java or something else is not allocating enough memory to Mandel Machine. I recently upgraded to 16 GB (2 x 8 GB) of 1866 MHz DDR3 Radeon dual-channel memory from my old 8 GB (4 x 2 GB) of DDR3 1333, and have configured the BIOS to run the new RAM at its rated speed. (My CPU is an 8-core AMD FX Bulldozer @ 4.2 GHz, running Windows 7 Pro 64-bit.) I'm not sure why Mandel Machine isn't getting the memory it needs, but I see no reason why it shouldn't be able to allocate more than 3911 MB. Are there settings in Windows 7 or Java that need to be adjusted? Worse, if I increase the resolution beyond the threshold, the application just hangs indefinitely. Windows 7 Task Manager currently says I have a little over 10 GB of memory available, so why can't Mandel Machine use more of it? EDIT: I closed Mandel Machine and the available memory jumped to 13 GB.

Parameter (MMF):
Code:
image.width=15875
image.height=15875
image.supersampling=0
position.re=-0.7500000
position.im=0.0000000
position.magnification=1.25
position.rotation=0.0
computation.iteration_limit=1000
rendering.computed_only=false
rendering.inner_color=000000
rendering.outer_color=dddd00
rendering.empty_color_1=ffffff
rendering.empty_color_2=e0e0e0
rendering.transfer_function=1
rendering.color_density=0
rendering.dwell_bands=0
rendering.de_transfer_function=0
rendering.de_color_density=0
rendering.palette=0.0,0,7,100,0.15625,32,107,203,0.41015625,237,255,255,0.62744140625,255,170,0,0.83740234375,50,2,0
rendering.color_offset=0

stardust4ever
« Reply #261 on: September 06, 2014, 12:04:05 AM »

Quote from Kalles Fraktaler:
Attached location makes SA crash though.
Looks fine to me; the second attempt to open it worked for some reason. If it doesn't work for you, zoom out and back in.

* glitch49.kfr.mmf.txt (3.07 KB)

Botond Kósa
« Reply #262 on: September 06, 2014, 12:31:27 AM »

The numbers in the lower right corner show the RAM usage of the application in megabytes: the smallest is the amount actually used, the middle one is the amount allocated, and the largest is the maximum amount that can be allocated.

When Mandel Machine is started, the maximum amount of RAM allocable by the JVM is set to 4400 MB. This was enough in an older version, but since then I have made a few changes and forgot to adjust the limit. These are the two major changes that affect the RAM usage of the application (both introduced in version 1.2):
  • The previously bundled Oracle JRockit 1.6 JVM was dropped from the distribution package. MM now searches for a preinstalled JVM of appropriate version. JRockit could allocate the whole 4400 MB, but newer 1.7 HotSpot JVMs can only allocate up to ~3900 MB when a 4400 MB limit is specified.
  • The storage of iteration data was changed from float to double, increasing the memory footprint of the image from 8 to 12 bytes per pixel. This was necessary because the granularity of floats became visible in locations with slowly changing gradients and very high iteration counts (millions).

There are other memory-consuming new features as well, e.g. glitch correction. So the limit of 4400 MB will have to be raised in a future version. Until then you can manually change it in the recently provided mm_start.cmd file. I also have 16 GB and could render a 23000x23000 image successfully by specifying -Xmx10000m instead of -Xmx4400m. You will have to start the application with the cmd, not the exe.
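The figures above allow a quick back-of-the-envelope check. At 12 bytes per pixel, a 23000x23000 image needs about 6 GB for the iteration data alone, far above the ~3900 MB a HotSpot JVM gets with a 4400 MB limit, and even the old 8-byte format would need about 4 GB. The helper names are hypothetical, and the estimate deliberately ignores the bitmap, glitch-correction data and JVM overhead, so it is a lower bound on the heap actually required.

```cpp
#include <cstdint>

// Bytes needed for the per-pixel iteration buffer alone.
constexpr std::uint64_t iterationBufferBytes(std::uint64_t width, std::uint64_t height,
                                             std::uint64_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Whole mebibytes, rounded down.
constexpr std::uint64_t toMiB(std::uint64_t bytes) { return bytes / (1024 * 1024); }
```

For example, 15875x15875 at 12 bytes per pixel already comes to about 2884 MiB before anything else is counted.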

stardust4ever
« Reply #263 on: September 06, 2014, 01:37:57 AM »

Quote from Botond Kósa:
There are other memory-consuming new features as well, e.g. glitch correction. So the limit of 4400 MB will have to be raised in a future version. Until then you can manually change it in the recently provided mm_start.cmd file. I also have 16 GB and could render a 23000x23000 image successfully by specifying -Xmx10240m instead of -Xmx4400m. You will have to start the application with the cmd, not the exe.
LOL, I loathe using the command line. I may just make a *.bat file to click on to unlock that extra RAM whenever I want to do "Hue Jazz" style renders.

EDIT: It worked, thanks! Editing the CMD file didn't do anything, but I put the following line in a *.bat file in the MandelMachine folder and it worked. (Not recommended unless you have 16 GB or more of RAM.) Now I can render flawlessly at 23040x23040!
Code:
java -showversion -Xmx12288m -jar mm.jar

Does anything bad happen if I exceed 0.5 gigapixels?

EDIT: Answered my own question. Mandel Machine blocks you from exceeding 536 megapixels. That's probably a good thing.
« Last Edit: September 06, 2014, 01:47:45 AM by stardust4ever »

Kalles Fraktaler
« Reply #264 on: September 06, 2014, 12:20:13 PM »

Quote from Botond Kósa:
Before performing the addition, the two ASFloats had to be adjusted to have equal exponents (meaning the smaller exponent was set to equal the larger, and its mantissa was rescaled accordingly).
When I do addition in floatexp, I only change the exponent part in the mantissa of the smaller object. I don't manipulate the mantissa with any double arithmetic...

Code:
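static __inline double setExp(double newval, __int64 newexp)
{
    // Overwrite the 11-bit biased exponent field of newval, keeping its sign
    // and fraction bits, so that newval becomes +/-1.fraction * 2^newexp.
    *((__int64*)&newval) = (*((__int64*)&newval) & 0x800FFFFFFFFFFFFF) | ((newexp + 1023) << 52);
    return newval;
}

__inline floatexp operator +(const floatexp &a)
{
    floatexp r;
    __int64 diff;
    if (exp > a.exp) {
        diff = exp - a.exp;
        r.exp = exp;
        if (diff > MAX_PREC)
            // If the smaller term is too small, ignore it.
            r.val = val;
        else {
            // Scale the smaller term and do the addition. Setting the exponent
            // field to -diff equals multiplying by 2^-diff only if the mantissa's
            // own exponent is 0 (val in [1,2)), presumably ensured by _ALIGN_.
            double aval = setExp(a.val, -diff);
            r.val = val + aval;
        }
    }
    else {
        // Mirror case: this object has the smaller exponent.
        diff = a.exp - exp;
        r.exp = a.exp;
        if (diff > MAX_PREC)
            r.val = a.val;
        else {
            double aval = setExp(val, -diff);
            r.val = a.val + aval;
        }
    }
    // Renormalize the result (mantissa/exponent alignment).
    _ALIGN_(r.val, r.exp)
    return r;
}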

Botond Kósa
« Reply #265 on: September 07, 2014, 09:31:18 PM »

Quote from Kalles Fraktaler:
When I do addition in floatexp, I only change the exponent part in the mantissa of the smaller object. I don't manipulate the mantissa with any double arithmetic...

I tried exactly the same, but it did not result in a significant speedup. Maybe because replacing a floating-point multiplication by an integer addition only saves a few clock cycles per floatexp addition. The nested IF constructs cost much more, especially in the case of a branch misprediction.
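One way to reduce the branching, sketched here as an untested idea rather than KF's or MM's code: always scale the smaller term with std::ldexp, which simply underflows to 0 when the exponent gap is huge, so no MAX_PREC test is needed, and pick the operands with ternaries that a compiler may (but is not guaranteed to) turn into conditional moves. Zero gets a very negative exponent so it never wins the comparison. The type and function names are hypothetical, and whether this is actually faster depends on the compiler and on the cost of ldexp/frexp, so it would have to be benchmarked.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical floatexp-like value: val * 2^exp, val in [0.5, 1) or exactly 0.
struct FloatExpSketch {
    double val;
    std::int64_t exp;
};

// Exponent used for zero, so zero always loses the "larger exponent" comparison.
constexpr std::int64_t kZeroExp = std::numeric_limits<std::int64_t>::min() / 4;

FloatExpSketch make(double v, std::int64_t e) {
    int k = 0;
    const double m = std::frexp(v, &k);  // m in [0.5, 1), or 0
    return (m == 0.0) ? FloatExpSketch{0.0, kZeroExp} : FloatExpSketch{m, e + k};
}

// Addition without a MAX_PREC branch: ldexp turns negligible terms into 0.
FloatExpSketch add(const FloatExpSketch& a, const FloatExpSketch& b) {
    const bool aBigger = a.exp >= b.exp;
    const FloatExpSketch big = aBigger ? a : b;
    const FloatExpSketch small = aBigger ? b : a;
    const int shift = static_cast<int>(std::min<std::int64_t>(big.exp - small.exp, 2000));
    return make(big.val + std::ldexp(small.val, -shift), big.exp);
}
```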

stardust4ever
« Reply #266 on: September 07, 2014, 10:07:44 PM »

Quote from Botond Kósa:
I tried exactly the same, but it did not result in a significant speedup. Maybe because replacing a floating-point multiplication by an integer addition only saves a few clock cycles per floatexp addition. The nested IF constructs cost much more, especially in the case of a branch misprediction.
This sounds like one of those things where the relative speed of integer and floating-point operations could vary between processor architectures, for instance Intel Hyper-Threading cores versus AMD's shared FPUs. You might get more or less speedup depending on the processor design.

Botond Kósa
« Reply #267 on: September 09, 2014, 10:01:14 AM »

A new beta is available. It contains some bug fixes and performance improvements.
List of changes: http://web.t-online.hu/kbotond/mandelmachine/#changelog

stardust4ever
« Reply #268 on: September 09, 2014, 12:30:17 PM »

Glad to see you have improved the AVX implementation. I'm not entirely sure how many pixel groupings AMD Bulldozer/Piledriver cores can handle at the moment. On my 8-core desktop and 4-core laptop (both AMD), a grouping of 12 beats 16 by only a very slight margin, a couple percent on the benchmark.

Can't update to 1.2.8 yet, as I've got a render on the back burner. 6679 zooms!

One thing I notice while zooming in is that most frames appear nearly instantly once the first orbit is completed, but when approaching areas with high density or a highly advanced Julia formation, rendering suddenly slows down by nearly an order of magnitude, then speeds back up after leaving the area or zooming through it. Kind of ironic that the feature I'm trying to render takes longer than the stuff immediately before or after it. And no, it's not one of those "blobby" areas either. It's still strange that the complexity of the image or the density of iteration data has such a profound impact on render speed (the orbit calculations take about the same time at this depth and iteration count, though).

Food for thought: at a zoom depth of 6679 and an iteration depth of 1,400,000, the orbit alone takes 30-40 seconds, so zooming in can still be slow. Turn glitch correction off or wait for the image to render completely, as the application can crash if you attempt to zoom in during a second or subsequent orbit calculation. In most areas you can turn glitch correction off and be fairly sure that the centroid is in the center of the black circle thingy. On my big monitor, I average a little over 5 zooms per frame advance, manually drawing tiny rectangles. I still have to wait 30-40 seconds between zooms for the orbits to complete, though. It is still possible to crash the application with the scroll wheel in version 1.2.7, but it happens less frequently, so the bug isn't 100% fixed.

Still awesome that I can zoom incredibly deep, though. I've got an "X of Xs of Xs of Xs" formation that I will be posting online soon!
« Last Edit: September 09, 2014, 12:35:32 PM by stardust4ever »

Botond Kósa
« Reply #269 on: September 09, 2014, 01:39:36 PM »

Quote from stardust4ever:
I'm not entirely sure how many pixel groupings AMD Bulldozer/Piledriver cores can handle atm. My 8-core Desktop and 4-core laptop (both AMD) give a very slight margin at 12 over 16, only a couple percent on the benchmark.
In the AMD Bulldozer architecture each processor module includes two CPU cores and one shared FPU (your 8-core CPU has only 4 FPUs). Inside a module, the two CPU cores already saturate the execution units of the shared FPU at a pixel grouping of 12, so going to 16 results in no further speed improvement. (Btw, the situation is the same on Intel CPUs with HyperThreading.)

Even worse, the Bulldozer modules have only two 128-bit FPU pipelines that can be unified into one 256-bit unit when running AVX code, so theoretically their performance under SSE2 and AVX is the same. By comparison, Intel processors have two 256-bit wide FPU pipelines per core. One Sandy Bridge or newer core has the same floating point throughput as 4 Bulldozer cores under AVX workloads. For more details and measurements see this article at AnandTech: http://www.anandtech.com/show/7711/floating-point-peak-performance-of-kaveri-and-other-recent-amd-and-intel-chips
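A rough illustration of why grouping pixels helps, assuming "pixel grouping" means iterating several pixels per thread in lockstep (a reading of the term, not Mandel Machine's documented design). Each pixel's iteration is a serial chain of dependent multiplies and adds, so interleaving G pixels gives the FPU G independent chains to overlap; once the cores sharing an FPU supply enough independent work, extra pixels only queue up. This is plain scalar double-precision C++ with hypothetical names; MM's SIMD kernels, perturbation and glitch handling are not shown, and a compiler is not guaranteed to vectorize it.

```cpp
#include <array>
#include <cstddef>

// Iterate z -> z^2 + c for G pixels in lockstep; returns per-pixel iteration counts
// (maxIter for pixels that never escape |z| > 2).
template <std::size_t G>
std::array<int, G> iterateGroup(const std::array<double, G>& cr,
                                const std::array<double, G>& ci, int maxIter) {
    std::array<double, G> zr{}, zi{};
    std::array<int, G> iters{};
    std::array<bool, G> done{};
    for (int n = 0; n < maxIter; ++n) {
        bool allDone = true;
        for (std::size_t k = 0; k < G; ++k) {  // G independent dependency chains
            if (done[k]) continue;
            const double r2 = zr[k] * zr[k];
            const double i2 = zi[k] * zi[k];
            if (r2 + i2 > 4.0) { done[k] = true; continue; }
            zi[k] = 2.0 * zr[k] * zi[k] + ci[k];
            zr[k] = r2 - i2 + cr[k];
            iters[k] = n + 1;
            allDone = false;
        }
        if (allDone) break;
    }
    return iters;
}
```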