Topic: Needs faster Desktop Processor for Deep Zooming (Integer)
Description: AMD FX-8150 (8-Core, 3.6Ghz) ???
stardust4ever
« on: December 10, 2011, 12:11:58 PM »

I've been doing some extremely deep zoom renders in the Mandelbrot set using Fractal Extreme 64-bit, and have recently started zooming in on features on the order of 2,000 zooms and beyond. My current CPU is the AMD Phenom II X4 955, running at its stock clock speed of 3.2 GHz. I used to have it clocked at 3.6 GHz and slightly overvolted, with a beefy heatsink, but over time the processor became more and more prone to errors and crashes, so I bumped the multiplier back down from 18x to the stock 16x. Anyway, the renders are becoming mind-numbingly slow at these extraordinary zoom levels, and one project goal I've dreamt up is a giant X shape made up of X shapes which are made up of smaller Xs. Click the link for an example of the kind of ridiculously deep features I've been rendering (this one took two straight weeks): http://stardust4ever.deviantart.com/art/X7-Hot-Fusion-2431-Zooms-270452576 I haven't reached the target area yet, but I'm getting close with the periodic doubling, and I'm predicting the final formation will be around 3,127 zoom levels at 375,000 iterations per pixel. I'm planning on rendering it at 4096 pixels square, and if my estimates are correct, the final render will take nearly 3 months of continuous processing on my current setup. As long as I return to the computer and save the render progress on a periodic basis, I shouldn't have to worry too much about losing my progress due to computer crashes or blackouts.

I built the system in 2009, with 8 gigs of dual-channel 1333 MHz DDR3 RAM and Vista 64-bit, but lately I've been thinking of upgrading my motherboard and processor. Basically, I'm looking for the CPU that will give the best multi-threaded performance doing nothing but 64-bit integer calculations, and the $269 AMD FX-8150 seems like the best option without spending a fortune on server-grade components. I'm hoping I can keep my existing Windows installation, RAM, power supply, etc. and just upgrade the processor and motherboard (the new processor utilizes the same AM3+ socket, but Biostar is dumb and won't provide BIOS updates for the new CPUs - the motherboard uses the 790 chipset and doesn't support the dual power plane necessary for turbo functionality). Hopefully, I can swap out the motherboard without Windows Vista pitching a fit.

The AMD FX-8150 has a stock clock of 3.6 GHz and 8 integer processing cores, but it has gotten some lackluster reviews because of shortcuts AMD took that resulted in less than stellar floating-point performance. Single-threaded performance per clock cycle is poor as well. Like I said, I couldn't care less about FPU performance, but at stock speeds, the FX-8150 seems to beat the socks off of all but the high-end Intel Xeons when it comes to multi-threaded double-precision integer calculations, which is really all I care about right now.

So should I invest in the new FX-8150 processor, or should I hold off for another year or so until AMD works all the bugs out and releases its next CPU architecture? I could really use a performance boost.
« Last Edit: December 10, 2011, 12:25:21 PM by stardust4ever » Logged
David Makin
« Reply #1 on: December 10, 2011, 01:20:39 PM »

Not sure about specifically integer calculations rather than using the FPU, but the UF benchmarks probably still give a reasonable indication of relative processor performance. Unfortunately, I don't think they've been updated for a while, but here they are anyway:

http://www.shadoworld.co.uk/UF5/Benchmark/Benchmark%20results.htm
cbuchner1
« Reply #2 on: December 10, 2011, 04:02:29 PM »

The new Intel AVX Vector extensions (which are implemented by current Intel processors and will be by future AMD processors as well) might give a near 2x performance boost at same clock rates, simply because it allows 8 simultaneous computations, instead of 4 (as previously with SSE).
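The "8 instead of 4" figure follows from register width divided by element width. A minimal sketch of that arithmetic, assuming 32-bit single-precision elements (the `lanes` helper is made up for illustration and is not part of any fractal program):

```python
def lanes(register_bits, element_bits):
    """Number of elements that fit in one SIMD register."""
    return register_bits // element_bits

SSE_BITS = 128   # SSE register width
AVX_BITS = 256   # AVX register width

# Assumption: 32-bit single-precision floats, which matches the 4 vs 8 figure.
sse_float_lanes = lanes(SSE_BITS, 32)
avx_float_lanes = lanes(AVX_BITS, 32)

# With 64-bit double-precision floats the counts halve, but the 2x ratio holds.
sse_double_lanes = lanes(SSE_BITS, 64)
avx_double_lanes = lanes(AVX_BITS, 64)

print(sse_float_lanes, avx_float_lanes, sse_double_lanes, avx_double_lanes)
```

Either way, the ratio between the two register widths is 2, which is where the "near 2x at the same clock rate" estimate comes from.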

However this would require a software update... Existing fractal software doesn't make use of AVX yet, but I am sure programmers will adopt this technology soon.

Also I strongly believe that the GPU will be able to contribute a significant amount to the computation, especially in AMD's "APU" offerings, which integrate a CPU and a programmable GPU on the same die. This also depends on having proper software support.

Christian
« Last Edit: December 10, 2011, 04:22:39 PM by cbuchner1 » Logged
panzerboy
« Reply #3 on: December 11, 2011, 02:46:58 PM »

The new Intel AVX Vector extensions (...) might give a near 2x performance boost at same clock rates,

See this discussion I've been having with Bruce Dawson the author of Fractal Extreme.
http://randomascii.wordpress.com/2011/11/28/faster-fractals-again/

The AVX arithmetic instructions that use the 256-bit registers are all floating point.
About the only thing you might achieve for integers is to cache values in the upper 128 bits.
Forthcoming versions of AVX may include integer arithmetic on 256-bit vectors, and the register size may also be increased to 512 bits or more.

As for processors, I'd go for an i7. Even though Intel's focus is obviously floating point (for H.264 video?), nothing can touch the i7's integer performance today.
(I know Intel has Quick Sync H.264 acceleration, but it's poorly supported and you have to disable your GPU card to use it, hence they still have an interest in excellent FP CPU performance.)
« Last Edit: December 11, 2011, 03:02:09 PM by panzerboy » Logged
hobold
« Reply #4 on: December 12, 2011, 05:17:02 PM »

One more comment that will further complicate things:

When people talk about "integer performance", they are generally referring to anything but floating point arithmetic. It seems to me that the original poster is really interested specifically in "integer arithmetic performance", which is not represented all that much in the usual integer performance benchmarks.

I have dabbled in performance optimization for more than 20 years now, and throughout all that time, neither the user community nor the computer industry managed to come up with a comprehensive set of micro benchmarks that would offer some standardized way of measuring such details.

I have heard rumours that AMD's Bulldozer is the first microprocessor with a dedicated, almost fully pipelined, integer divider (per core, obviously). Ever since the original Athlon, AMD also has a history of dedicating transistors to fully pipelined, high performance integer multipliers. So despite the Bulldozer's documented weakness in general integer performance, this processor could very well be a beast for integer arithmetic, and particularly for multiprecision/bignum computations.

But without some credible measurement, I dare not give an actual recommendation for Bulldozer. The closest to a standard I know of is GMP (the GNU Multiprecision Library), but they do not have code and results for Bulldozer yet:
http://gmplib.org/gmpbench.html

And you can see in these scores that Intel's offerings, even if second place, are not far behind. So they are probably the overall better option.
stardust4ever
« Reply #5 on: December 13, 2011, 07:41:54 AM »

Thank you Hobold for your thoughts. I think this benchmark test sums it up nicely (see thumbnail). I know the thumbnail is for AVX processing and shows only the 32-bit results, but I imagine they would be comparable for 64-bit computations.
http://www.tomshardware.co.uk/fx-8150-zambezi-bulldozer-990fx,review-32295-5.html
While they do have an actual Mandelbrot fractal benchmark, it uses floating point calculations, not integer, so I can't really use that as an accurate performance benchmark.

The FX-8150 is almost continuously sold out on Newegg.com, so there must be some continued interest in the processor by the general public, either that or the production facility can't keep up.

Fractal Extreme doesn't support AVX instructions yet, but if I understand it correctly, this would eventually allow 64-bit processors to perform enhanced precision 128/256-bit arithmetic with single instructions, so I could imagine nothing but huge performance gains in the future when the software catches up. The software almost exclusively uses squaring (multiplication) and addition performed using 64-bit integers. At least for the Mandelbrot zooms, it's almost exclusively simple arithmetic operators, and most of those are multiplications. The 64-bit multiplications are scaled repeatedly using lattice multiplication to perform the higher-precision math. Because they are doing squares, half of the partial products are duplicates and thus only need to be computed once, and because half of the least significant bits are discarded, another half of the calculations can be omitted, resulting in huge performance boosts through clever software shortcuts. And I don't think I'll have to worry too much about any performance penalty due to cache misses, especially for a program that often takes up less memory on my computer than the total cache size of some modern processors.
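To make the squaring shortcut concrete, here is a minimal sketch of multiprecision squaring on 64-bit limbs. It is only an illustration of the symmetry idea and not Fractal Extreme's actual code: all function names are invented, Python's built-in big integers stand in for the carry handling, and the second shortcut (skipping partial products that only feed the discarded low bits) is not shown.

```python
MASK64 = (1 << 64) - 1

def to_limbs(n, count):
    """Split a non-negative integer into `count` 64-bit limbs, least significant first."""
    return [(n >> (64 * i)) & MASK64 for i in range(count)]

def from_limbs(limbs):
    """Reassemble an integer from 64-bit limbs."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

def square_schoolbook(limbs):
    """Square by multiplying every limb pair: n * n limb products."""
    total, products = 0, 0
    for i, a in enumerate(limbs):
        for j, b in enumerate(limbs):
            total += (a * b) << (64 * (i + j))
            products += 1
    return total, products

def square_symmetric(limbs):
    """Square using a_i * a_j == a_j * a_i: compute each cross product once, then double it."""
    total, products = 0, 0
    for i, a in enumerate(limbs):
        total += (a * a) << (128 * i)          # diagonal term
        products += 1
        for j in range(i + 1, len(limbs)):
            total += (2 * a * limbs[j]) << (64 * (i + j))  # doubled cross term
            products += 1
    return total, products

# 13 limbs of 64 bits give 832 bits of precision.
x = (1 << 831) + 123456789
limbs = to_limbs(x, 13)
full, full_count = square_schoolbook(limbs)
sym, sym_count = square_symmetric(limbs)
print(full == sym == x * x, full_count, sym_count)
```

For n limbs the schoolbook version needs n² limb multiplications, while the symmetric version needs n(n+1)/2, which approaches half as n grows.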

Regardless of how well the graph actually applies to deep fractal rendering, it shows that, everything else being equal, the Sandy Bridge processor performs floating point instructions better than integer ones, while the Bulldozer performs those same integer instructions better than float. And I imagine, from the way the benchmark is presented, that it is using both computation methods (integer, float) to do the same work. Another issue to consider is that while Intel may have high-end desktop processors on the market that outperform AMD's selection in most areas, AMD has a history of providing better processing power per dollar than Intel. And don't get me started on server-grade components. It would be really easy to get suckered into buying a "$5,000 lemon" that gets outperformed by cheap desktops two years later. It's also worth noting that the next line of AMD desktop processors may very well keep all of the benefits of the Bulldozer platform tech while fixing a lot of its shortcomings.

And Fractal Extreme certainly is not "ordinary desktop software" (games, productivity, ray tracing, etc) by any means, which is what most typical benchmarks try to portray.


* sandra multimedia avx.png (13.44 KB, 450x465 - viewed 766 times.)
« Last Edit: December 13, 2011, 07:48:26 AM by stardust4ever » Logged
stardust4ever
« Reply #6 on: January 10, 2012, 11:59:28 AM »

Well, I set up my new rig with the FX-8150 8-core processor. Using 64-bit integer arithmetic to render deep zoom fractals, the new 8-core processor gets less done per clock per core than the old Phenom II X4.

I did a test fractal on my old rig before I tore out the motherboard. Using 832-bit precision math, the file took 4:55 (4 minutes, 55 seconds) to complete on the Phenom II X4 955 (stock 3.2 GHz).

On my new rig, that same file took 3:11 on the FX-8150, clocked at the stock 3.6 GHz.

That's about 54% faster, or approximately 1.54 times as fast.

If each core did the same amount of work per clock cycle, the new processor should have been 2.25 times as fast running at stock speed (twice the cores at 3.6 GHz instead of 3.2 GHz). Hence, the online reviews are right that the older Phenom IIs get more work done per clock per core than the newer Bulldozer platform. If the charts are any indication, floating point performance is probably even worse.
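The arithmetic behind those figures can be checked with a short sketch. It uses only the times, core counts and clock speeds quoted above; the variable names are just for illustration.

```python
def seconds(minutes, secs):
    """Convert a m:ss render time to seconds."""
    return 60 * minutes + secs

phenom_time = seconds(4, 55)    # Phenom II X4 955, 4 cores at 3.2 GHz
fx_stock_time = seconds(3, 11)  # FX-8150, 8 cores at 3.6 GHz

# Measured speedup: how many times faster the new chip finished the same file.
measured_speedup = phenom_time / fx_stock_time

# Ideal speedup if every core did the same work per clock on both chips.
ideal_speedup = (8 / 4) * (3.6 / 3.2)

# Work per core per clock on the FX-8150 relative to the Phenom II.
per_core_per_clock = measured_speedup / ideal_speedup

print(f"measured {measured_speedup:.2f}x, ideal {ideal_speedup:.2f}x, "
      f"relative per-core-per-clock {per_core_per_clock:.2f}")
```

By this estimate each Bulldozer core gets roughly 69% as much done per clock as a Phenom II core on this particular test file.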

Much to my delight, by manipulating the multiplier, the new processor runs completely stable at 4.2 GHz, even at stock voltage! (You'll need a good heat sink, though.) I ran the test again at a number of different multipliers, and the render time scaled inversely with the clock speed, meaning raw clock speed is the only bottleneck for Fractal Extreme.

The render time on the same test file at 4.2 GHz (a 16.67% overclock) is 2:48.

So, even with double the processing cores and a massive overclock, the new processor is still only 75% faster, or 1.75 times as fast. It's still a nice gain in performance, but not worth the cost to upgrade. AMD likely cheated themselves with their cost-cutting measures.
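As a rough check on the inverse scaling with clock speed, the stock render time can be rescaled to 4.2 GHz and compared against the measured 2:48. This is a sketch using only the numbers quoted in this post; `predicted_time` is a made-up helper name.

```python
def predicted_time(base_seconds, base_ghz, new_ghz):
    """Render time expected at a new clock if time scales inversely with clock speed."""
    return base_seconds * base_ghz / new_ghz

phenom_time = 4 * 60 + 55     # 295 s on the Phenom II X4 955 at 3.2 GHz
fx_stock_time = 3 * 60 + 11   # 191 s on the FX-8150 at 3.6 GHz
fx_oc_time = 2 * 60 + 48      # 168 s on the FX-8150 at 4.2 GHz

expected_oc_time = predicted_time(fx_stock_time, 3.6, 4.2)  # about 163.7 s
deviation = fx_oc_time / expected_oc_time - 1                # about 2.6% slower than ideal
overall_speedup = phenom_time / fx_oc_time                   # about 1.76x over the old rig

print(f"expected {expected_oc_time:.1f} s, measured {fx_oc_time} s, "
      f"deviation {deviation:.1%}, overall speedup {overall_speedup:.2f}x")
```

The measured time lands within a few percent of the ideal prediction, which is consistent with clock speed being the dominant bottleneck.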

My verdict: wait another year or so for AMD tech to improve, or you can pay up the nose for an Intel platform.
« Last Edit: January 10, 2012, 12:06:25 PM by stardust4ever » Logged
hobold
« Reply #7 on: January 11, 2012, 12:12:37 AM »

You can realistically hope for 10% or 20% higher performance if anybody ever bothers to hand tune the arithmetic subroutines. Bulldozer doesn't run well on code tuned for other processor models. But don't hold your breath. As long as Bulldozer is viewed as the inferior solution, no one is going to bother investing all that much effort.

On the other hand, I don't think you should be disappointed. I am not at all sure Sandy Bridge would do better for that kind of workload.
stardust4ever
« Reply #8 on: January 11, 2012, 09:37:48 AM »

I definitely believe it's worth the upgrade if you're running an old AMD Athlon 64 X2 or an Intel Core 2 Duo, but not if you're already using a Phenom II X4 955 and need to upgrade both the motherboard and CPU because the 790GX chipset doesn't support Bulldozer, like I did. A lot of the people at Tom's Hardware recommended the Phenom II X6 1090T as a cheaper alternative, which performed on par with the FX-8150 in many of the FPU benchmarks.

I did some more testing and I think I have figured out that the bottleneck is AMD's shared pipeline. Since the CPU has four pipelines and 8 integer processing cores, 2 cores share one pipeline. Through some experimenting, I have concluded that the cores are arranged like this: (0,4) (1,5) (2,6) (3,7) with (X,X) being a pipe.

I did some testing by changing the affinity in Windows Task Manager, limiting Fractal Extreme first to four cores and finally to two.

With affinity set to four Cores, (0,1,2,3) was about 20% faster than (0,2,4,6) and (0,1,4,5). (0,2,4,6) and (0,1,4,5) were nearly identical.

Limiting myself to two cores, I shrunk my test render file to 200 pixels square and set affinity to cores (0,1), (0,2), and (0,4). (0,4) was about 20% slower than (0,1) and (0,2)

Further complicating the issue is that the threads tend to collide with each other when they are on the same pipe, but if the threads on different pipes share the same data, there is a higher latency when fetching said data, so threads that make a lot of data calls might actually perform better on the same pipe. That said, the performance gain, at least for integer calculations, is probably much higher per the additional cores, than Intel's hyperthreading, which creates an additional "virtual" core for every real one.

Because the Windows 7 scheduler is unaware of the dual-core pipelines and the turbo core functionality, if you use a software app that only utilizes four threads, you may want to manually set the affinity to (0,1,2,3) or (4,5,6,7) so that each thread gets its own dedicated pipe. Or you could try setting even cores (0,2,4,6) or odd cores (1,3,5,7) so that the four threads are shared across two pipes, which would enable turbo functionality. Experimentation would reveal which setting is fastest, and it would probably vary per software.
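For anyone who wants to script these experiments, here is a small sketch that turns a core list into an affinity bitmask and counts how many pipes it touches. The pairing table is just my conclusion from the timings above, not something from AMD documentation, and the helper names are made up.

```python
# Assumption from the experiments above: cores (0,4), (1,5), (2,6), (3,7) share a pipe.
ASSUMED_PIPES = [(0, 4), (1, 5), (2, 6), (3, 7)]

def affinity_mask(cores):
    """Bitmask with one bit set per allowed core (bit 0 = core 0)."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask

def pipes_used(cores, pipes=ASSUMED_PIPES):
    """Number of shared pipes touched by the given cores."""
    return sum(1 for pipe in pipes if any(core in pipe for core in cores))

def has_shared_pipe(cores, pipes=ASSUMED_PIPES):
    """True if at least two of the given cores sit on the same pipe."""
    return pipes_used(cores, pipes) < len(set(cores))

for cores in [(0, 1, 2, 3), (0, 2, 4, 6), (0, 1, 4, 5), (0, 4)]:
    print(cores, hex(affinity_mask(cores)), pipes_used(cores), has_shared_pipe(cores))
```

Under that assumed pairing, (0,1,2,3) gives mask 0xf with four pipes and no sharing, while (0,2,4,6) gives 0x55 with only two pipes, which lines up with the roughly 20% difference measured above. The hexadecimal form is what mask-based tools generally expect, for example the `/affinity` switch of the Windows `start` command.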

If anyone has access to a Sandy Bridge 2600K and Fractal Extreme 64-bit, PM me so I can send over my Mandelbrot deep-zoom test file (728 zooms). I'd love to see how the FX-8150 stacks up against the 2600K with just pure integer math (mostly repeated 64-bit multiplication). My test file got 2m48s at 400x400 pixels in FX v2.20 (on all 8 cores, overclocked to 4.2 GHz). I think integer math was really the one benchmark area in which the Bulldozer blew away the Sandy Bridge.