Author Topic: New version of FractalWorks with full multiprocessor support  (Read 11545 times)
Duncan C
Fractal Fanatic
Posts: 348
« on: March 17, 2007, 12:42:53 AM »

I just posted a new demo version of my FractalWorks fractal generating program. It runs natively on PowerPC and Intel Macs running OS X 10.4 (or later).

This version now fully supports multiprocessor machines. A dual-processor machine renders about twice as fast as a single-processor machine at the same clock speed.

This version also takes into account the symmetry of Mandelbrot and Julia sets, and only renders the unique parts of each plot.

Performance on Intel Macs is amazing. I clocked it on my wife's Intel MacBook at over 211 million iterations/second.

You can download a demo version at http://homepage.mac.com/dmchampney1/FileSharing1.html. This version expires 4/31/07; there should be a new version available by then, however.

There is a readme file in the archive that includes very rough directions that should be enough to get you started. Better documentation will be available at some future date.


Duncan C

Regards,

Duncan C
Duncan C
« Reply #1 on: March 18, 2007, 05:51:45 PM »

This is a work in progress, so I'm very interested in comments, feedback, and suggestions.


Duncan C
lycium
Fractal Supremo
Posts: 1158
« Reply #2 on: March 18, 2007, 06:54:32 PM »

since you ask, i'll bite. by now there are an abundance of programs to generate julia and mandelbrot sets, why make another and then have it expire? next up, the speed is completely irrelevant (though still technically impressive to us programmers) since even an old processor can generate unviewably huge julia/mandelbrot images in seconds, using a completely naive, 10-seconds-to-program algorithm.

in short, what are your goals with this program?

Duncan C
« Reply #3 on: March 19, 2007, 03:35:21 AM »

since you ask, i'll bite. by now there are an abundance of programs to generate julia and mandelbrot sets, why make another and then have it expire? next up, the speed is completely irrelevant (though still technically impressive to us programmers) since even an old processor can generate unviewably huge julia/mandelbrot images in seconds, using a completely naive, 10-seconds-to-program algorithm.

in short, what are your goals with this program?

My goals are:

1. Eye candy. While it's true that there are lots of programs to generate Mandelbrot and Julia sets, I don't find many of them visually appealing. I last worked with this some 20 years ago, and found that getting aesthetically pleasing plots was tricky. I'm focusing on a flexible approach to color mapping in order to create appealing plots.

I am also planning to create animations of various sorts ("movies"). Still images are just the beginning. I think there's plenty of room for unique approaches to the problem.

As far as performance, you say "...even an old processor can generate unviewably huge julia/mandelbrot images in seconds, using a completely naive, 10-seconds-to-program algorithm." That's news to me. Modern processors are much faster than the machines available back when I first tackled this, but a high magnification plot at a very high max iterations still makes even a high end computer sweat. My program supports up to 64k iterations, and allows you to specify a unique color for every iteration in a plot. Is there some magic to rendering these plots that I'm not aware of? As far as I can tell, doing a plot involving hundreds of millions, or even billions, of iterations at double precision still takes enough time that fast algorithms matter.

I've created high resolution plots with 40 million pixels, and those take a good long time to render, even on a fast machine.

2. I'm also pursuing this as a learning project. I let my Macintosh development skills go stale, and have a lot of catching up to do. There are lots of challenges in this project that I find interesting.



Duncan C
ericbigas
Guest
« Reply #4 on: March 19, 2007, 06:57:26 AM »

Animation, you say?  Bring it on.  I'd love something new and fast for parameter-interpolated zoom animations on OS X.  Have you used Graham Anderson's Escape or Jesse Jones' Mandella?
lycium
« Reply #5 on: March 19, 2007, 11:11:04 AM »

I am also planning to create animations of various sorts ("movies"). Still images are just the beginning. I think there's plenty of room for unique approaches to the problem.

fully agree with you. right now the only way to do real exploration (that i'm aware of) is to use ultrafractal or write your own app.

along those lines, i rendered some julia set animations a while back, and i think the results would be difficult to replicate with any of the apps out there today (they're in my fractographer.com/wip folder if you're interested but i'm not sure if quicktime will play that h.264 stream - they're encoded with x264). the problem is of course that it's all done from c++ code, and is not tweakable/saveable through a nice interface; while i firmly believe the one true way to fractal exploration is to code it yourself, compromises must be made for those who don't walk the programmer's path... the difficulty is maintaining flexibility while still being usable.

As far as performance, you say "...even an old processor can generate unviewably huge julia/mandelbrot images in seconds, using a completely naive, 10-seconds-to-program algorithm." That's news to me. Modern processors are much faster than the machines available back when I first tackled this, but a high magnification plot at a very high max iterations still makes even a high end computer sweat.

depends what you call sweat. with two 2.2ghz cores (amd x2, so weaker than your intel core 2 duo mac) i can produce this in some seconds: http://www.fractographer.com/propaganda/tlset.png

that has 144 supersamples per pixel, i don't remember the maximum iteration depth unfortunately. the iteration itself is done with my generic c++ complex number class, via operator overloading; i also wrote a very fast (double precision) sse-based iteration that would totally scream on your core 2 duo (since it's twice as fast doing sse than the athlon64), especially in 64bit mode... but the end result is that it's just not necessary (and specific to the mandelbrot/julia), and was done only for exercise.

My program supports up to 64k iterations, and allows you to specify a unique color for every iteration in a plot. Is there some magic to rendering these plots that I'm not aware of? As far as I can tell, doing a plot involving hundreds of millions, or even billions, of iterations at double precision still takes enough time that fast algorithms matter.

a somewhat indirect comparison from my own experiences: when ray tracing you need billions and billions of rays to generate images with global illumination; sampling a 2d brightness function for something simple like a julia is a walk in the park for any cpu since the pentium2, since that can do even realtime ray tracing and basic global illumination.

2. I'm also pursuing this as a learning project. I let my Macintosh development skills go stale, and have a lot of catching up to do. There are lots of challenges in this project that I find interesting.

that's a fine reason of course, and people seem to be interested in the animation support, so definitely it's not just for you!

the reason i asked is because, i'm sure you're aware, it's really trivial to render mandelbrots and julia sets and we've all been rendering them since at least the early 90s (interesting animations notwithstanding, but then again it is 2007!). in essence, i was hoping to see something really new... mutatorkammer for example takes a holistic approach to this by trying a huge variety (hmm perhaps too huge, but that's a different point) of random iteration formulae and variable types.

ericbigas
Guest
« Reply #6 on: March 19, 2007, 12:35:13 PM »

along those lines, i rendered some julia set animations a while back, and i think the results would be difficult to replicate with any of the apps out there today (they're in my fractographer.com/wip folder if you're interested but i'm not sure if quicktime will play that h.264 stream - they're encoded with x264).

Nope.  QuickTime won't even open them.  Apple made a huge deal out of H.264 being the next big thing but is doing a poor job of supporting it.  I'm already tired of having to avoid using b-frames in order to maintain QT compatibility.

They play in VLC, of course.  I don't know what's going on in the Julia2 anim, but it looks very cool.

the problem is of course that it's all done from c++ code, and is not tweakable/saveable through a nice interface; while i firmly believe the one true way to fractal exploration is to code it yourself, compromises must be made for those who don't walk the programmer's path

* meekly raises hand *

I can code web apps but not C++ or the like.  I understand what you mean, though.  When you're trying to get a specific task done as well as possible, good custom code smokes general-purpose apps (or libraries).  But that's why the rest of us rely on you 1337 programmers.  If you share your know-how and tools, we'll plug away with them and create some fairly cool stuff, even if it's not cutting-edge.
lycium
« Reply #7 on: March 19, 2007, 02:12:41 PM »

Nope.  QuickTime won't even open them.  Apple made a huge deal out of H.264 being the next big thing but is doing a poor job of supporting it.  I'm already tired of having to avoid using b-frames in order to maintain QT compatibility.

i hear that a lot from apple folks, it's a shame :|

They play in VLC, of course.  I don't know what's going on in the Julia2 anim, but it looks very cool.

thanks, unfortunately it's incomplete though - had to go to airport before the frames finished rendering!

I can code web apps but not C++ or the like.  I understand what you mean, though.  When you're trying to get a specific task done as well as possible, good custom code smokes general-purpose apps (or libraries).  But that's why the rest of us rely on you 1337 programmers.  If you share your know-how and tools, we'll plug away with them and create some fairly cool stuff, even if it's not cutting-edge.

careful who you call l33t, the truly l33t ones are easily offended and i would prefer not to incur their wrath...

as for making good tools, i've mentioned elsewhere on the forums that my partner in crime and i will be starting a commercial project soon; that ought to be flexible (read: scriptable) and have really good 3d rendering, so it should be quite a good foothold for exploration.

Duncan C
« Reply #8 on: March 20, 2007, 12:28:22 AM »

I am also planning to create animations of various sorts ("movies"). Still images are just the beginning. I think there's plenty of room for unique approaches to the problem.

fully agree with you. right now the only way to do real exploration (that i'm aware of) is to use ultrafractal or write your own app.

Sounds like ultrafractal is the 700 lb gorilla of the fractal rendering programs. Is it expensive?
along those lines, i rendered some julia set animations a while back, and i think the results would be difficult to replicate with any of the apps out there today (they're in my fractographer.com/wip folder if you're interested but i'm not sure if quicktime will play that h.264 stream - they're encoded with x264). the problem is of course that it's all done from c++ code, and is not tweakable/saveable through a nice interface; while i firmly believe the one true way to fractal exploration is to code it yourself, compromises must be made for those who don't walk the programmer's path... the difficulty is maintaining flexibility while still being usable.

Those animations are cool. Thanks for showing them to me.

As far as performance, you say "...even an old processor can generate unviewably huge julia/mandelbrot images in seconds, using a completely naive, 10-seconds-to-program algorithm." That's news to me. Modern processors are much faster than the machines available back when I first tackled this, but a high magnification plot at a very high max iterations still makes even a high end computer sweat.

depends what you call sweat. with two 2.2ghz cores (amd x2, so weaker than you intel core 2 duo mac) i can produce this in some seconds: http://www.fractographer.com/propaganda/tlset.png
I'm still limping along with a 1.25 GHz PowerPC G4. I'm several generations behind the Core 2 Duo. My wife has a Core Duo (not Core 2 Duo) MacBook. That's what I did my Intel timing on.

that has 144 supersamples per pixel, i don't remember the maximum iteration depth unfortunately. the iteration itself is done with my generic c++ complex number class, via operator overloading; i also wrote a very fast (double precision) sse-based iteration that would totally scream on your core 2 duo (since it's twice as fast doing sse than the athlon64), especially in 64bit mode... but the end result is that it's just not necessary (and specific to the mandelbrot/julia), and was done only for exercise.
I had never heard of SSE until your post. It sounds very much like AltiVec for the PowerPC G4. I wonder if Apple's compiler generates SSE instructions for their Intel machines?

I've been toying with the idea of writing AltiVec-specific code for G4-based Macs, but that would be a fair amount of work for an older processor.

Back 20 years ago, I wrote the floating point code for my program in machine code for the FPU. I'm not really inclined to get that down and dirty any more. As you say, it's not really necessary any more.


My program supports up to 64k iterations, and allows you to specify a unique color for every iteration in a plot. Is there some magic to rendering these plots that I'm not aware of? As far as I can tell, doing a plot involving hundreds of millions, or even billions, of iterations at double precision still takes enough time that fast algorithms matter.

a somewhat indirect comparison from my own experiences: when ray tracing you need billions and billions of rays to generate images with global illumination; sampling a 2d brightness function for something simple like a julia is a walk in the park for any cpu since the pentium2, since that can do even realtime ray tracing and basic global illumination.

2. I'm also pursuing this as a learning project. I let my Macintosh development skills go stale, and have a lot of catching up to do. There are lots of challenges in this project that I find interesting.

that's a fine reason of course, and people seem to be interested in the animation support, so definitely it's not just for you!

the reason i asked is because, i'm sure you're aware, it's really trivial to render mandelbrots and julia sets and we've all been rendering them since at least the early 90s (interesting animations notwithstanding, but then again it is 2007!). in essence, i was hoping to see something really new... mutatorkammer for example takes a holistic approach to this by trying a huge variety (hmm perhaps too huge, but that's a different point) of random iteration formulae and variable types.

I'm still pretty early in my development, and nowhere near feature complete. I'm definitely planning to generate movies. Right now, my program will animate changes to the color tables, and that looks quite cool. If you have access to a Mac, it's worth downloading my application and trying the color table animation tutorial I included. You can even share a color definition ("color scheme") between several plots, and changes to the color scheme are applied in real time to all the windows.


Duncan C
lycium
« Reply #9 on: March 20, 2007, 01:18:57 PM »

Sounds like ultrafractal is the 700 lb gorilla of the fractal rendering programs. Is it expensive?

depends what you call expensive, but they certainly charge enough to tempt my friend aexion and i into joining the market!

I'm still limping along with a 1.25 gHz PowerPC G4. I'm several generations behind the Core 2 duo. My wife has a Core duo (not Core 2 duo) MacBook. That's what I did my Intel timing on.

the core duo chip is still a bit faster (and certainly more efficient) than the x2 afaik, but the core 2 duo completely smashes it when it comes to most number crunching apps (128bit sse instructions take only a single cycle, versus 2 on core duo and athlon64). amd's upcoming quadcore chip will do two 128bit sse instructions per cycle, but that's another matter...

I had never heard of SSE until your post. It sounds very much like AltiVec for the PowerPC G4. I wonder if Apple's compiler generates SSE instructions for their Intel machines?

indeed sse (and sse2, sse3, and sse4 on core 2 duo) is a lot like altivec in that they are simd extensions for coherent/streaming execution. as for "apple's" compiler, they now use gcc, which version i'm not sure, but i do know that the latest gcc is much better than microsoft visual c++ in producing sse code, particularly in 64bit mode. that's because the register allocator in gcc is apparently poor compared to intel and microsoft's compilers, and with amd64 you have a lot more registers to exploit.

i totally love 64bit computing :) it's incredibly useful for specific things like random number generation, and in general helps to plough through complex code where juggling just a few general purpose registers with the stack introduces a ton of overhead.

I've been toying with the idea of writing AltiVec-specific code for G4-based Macs, but that would be a fair amount of work for an older processor.

definitely don't bother unless you're feeling nostalgic (see my next point).

Back 20 years ago, I wrote the floating point code for my program in machine code for the FPU. I'm not really inclined to get that down and dirty any more. As you say, it's not really necessary any more.

i had a long (and excruciating) discussion with another member of the forums about that (in this thread: http://www.fractalforums.com/index.php?topic=132.0). i too was an asm optimisation junkie in my earlier years, but these days cpus are so complex that a compiler will almost always do better - a sentiment that is echoed by very, very knowledgeable low-level programmers.

the One True Way to fast code these days (that's a mild contradiction, i know ;)) is:

1. plan your attack, organise your program's execution (and only the critical parts) into wider runs
2. test it with plain c/c++
3. write an additional optimised path, using the intel supplied intrinsics for the instructions you wish to use (you should have a plain c/c++ fallback path anyway)

this is the One True Way because:

1. it requires minimal effort from the programmer
2. that super-intelligent compiler knows exactly what you'd like to do, but is free to schedule the instructions optimally according to which processor architecture you're compiling for (p4 is wildly different from core duo and athlon64, for example)
3. with just a recompile you can take advantage of improved compilers and/or the enhanced capabilities of future architectures (for example the extra registers 64bit mode offers)
4. it requires minimal effort from the programmer ;)

regarding points 1 and 4, a programming proverb might go "a programmer that is not lazy invariably ends up wasting their time." ;)

anyway, that's enough advocacy for now. unfortunately i don't have access to a mac (and as a simple programmer am not nearly trendy enough to use one anyway ;)) so i can't try out your program, but if you paste code we can talk speed for sure :D

edit: i just realised that i dropped some rather vague terms like "coherent" and "plan your attack"; to clarify, therein lies the difficulty of doing good simd optimisation: you should have low branching complexity (high coherence), since your instructions will be executed on all the data elements and obviously you don't want to be too wasteful there. a certain degree of redundancy is fine and desirable if it reduces code complexity, since it takes just as long to do 4 adds as it does 1, but the code you want to simd-ify should have "nice" execution properties, and unless they are really nice you have to take a trip back to the 80s and use execution masks and so on to unify your computation. sorry if that's a poor explanation, but i needed to clarify that there are typically complications you have to deal with, and it's not always as simple as i made it out to be above :)
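[Editor's note: a plain-C++ sketch of the lockstep, masked execution described above — four pixels advance together, escaped lanes are masked off but the loop keeps running until every lane is done. The function name and layout are illustrative, not taken from any program in this thread; a real SSE/AltiVec path would replace the inner per-lane loop with vector instructions and a compare mask.]

```cpp
#include <cassert>

// Iterate four Mandelbrot points in lockstep. alive[] plays the role of
// the SIMD execution mask: finished lanes are skipped, but the pass over
// all four lanes continues until none remain alive or maxIter is reached.
void iterate4(const double cr[4], const double ci[4], int maxIter, int out[4]) {
    double zr[4] = {0, 0, 0, 0}, zi[4] = {0, 0, 0, 0};
    bool alive[4] = {true, true, true, true};
    for (int i = 0; i < 4; ++i) out[i] = maxIter;  // default: never escaped
    for (int n = 0; n < maxIter; ++n) {
        bool any = false;
        for (int i = 0; i < 4; ++i) {              // conceptually one vector op per line
            if (!alive[i]) continue;               // masked-off lane
            double r2 = zr[i] * zr[i], i2 = zi[i] * zi[i];
            if (r2 + i2 > 4.0) { alive[i] = false; out[i] = n; continue; }
            zi[i] = 2.0 * zr[i] * zi[i] + ci[i];
            zr[i] = r2 - i2 + cr[i];
            any = true;
        }
        if (!any) break;                           // all lanes escaped early
    }
}
```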
« Last Edit: March 20, 2007, 01:36:44 PM by lycium »

Duncan C
« Reply #10 on: March 23, 2007, 02:31:42 AM »
« Reply #10 on: March 23, 2007, 02:31:42 AM »

...was an asm optimisation junkie in my earlier years, but these days cpus are so complex that a compiler will almost always do better - a sentiment that is echoed by very, very knowledgeable low-level programmers.

the One True Way to fast code these days (that's a mild contradiction, i know ;)) is:

1. plan your attack, organise your program's execution (and only the critical parts) into wider runs
2. test it with plain c/c++
3. write an additional optimised path, using the intel supplied intrinsics for the instructions you wish to use (you should have a plain c/c++ fallback path anyway)

I'd really prefer not writing hardware-specific optimizations if I can help it. I'd like my code to run on PowerPC G4 and G5 Macs as well as the newer intel-based models.

I need to look into library support for vector functions, or perhaps complex math functions. My HOPE is that the complex math libraries are written to run optimized code on the target platform. I found some trig functions that operate on complex numbers, but not simple functions like addition and multiplication. (I don't claim to understand how you take the cosine of a complex number)

edit: i just realised that i dropped some rather vague terms like "coherent" and "plan your attack"; to clarify, therein lies the difficulty of doing good simd optimisation: you should have low branching complexity (high coherence), since your instructions will be executed on all the data elements and obviously you don't want to be too wasteful there. a certain degree of redundancy is fine and desirable if it reduces code complexity, since it takes just as long to do 4 adds as it does 1, but the code you want to simd-ify should have "nice" execution properties, and unless they are really nice you have to take a trip back to the 80s and use execution masks and so on to unify your computation. sorry if that's a poor explanation, but i needed to clarify that there are typically complications you have to deal with, and it's not always as simple as i made it out to be above :)

Right now, the code I want to optimize is the "iterate a point" routine. That's where my program spends the VAST majority of its time, and it's pretty simple. It does the Zn+1 = Zn^2 + c iteration, checks for escape, and does it again. Right now I've unrolled the complex math into floating point functions and written it so I re-use the Z^2 value for both the iteration and checking for escape. If I could rewrite the code to use a complex math library, it might execute much faster.
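[Editor's note: a minimal sketch of the unrolled inner loop Duncan describes — the squared terms are computed once per pass and reused both for the next iterate and for the |z|² escape test. Names are illustrative, not FractalWorks' actual code.]

```cpp
#include <cassert>

// Escape-time iteration for z -> z^2 + c. zr2 and zi2 hold zr*zr and
// zi*zi; each is computed once and reused for the escape test (|z|^2 > 4)
// and for the next real part (a^2 - b^2). Returns the iteration count,
// or maxIter if the point never escapes.
int iterate(double cr, double ci, int maxIter) {
    double zr = 0.0, zi = 0.0;
    double zr2 = 0.0, zi2 = 0.0;
    int n = 0;
    while (n < maxIter && zr2 + zi2 <= 4.0) {
        zi = 2.0 * zr * zi + ci;   // imaginary part: 2ab + ci
        zr = zr2 - zi2 + cr;       // real part: a^2 - b^2 + cr
        zr2 = zr * zr;
        zi2 = zi * zi;
        ++n;
    }
    return n;
}
```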


Duncan
Nahee_Enterprises
World Renowned Fractal Senior
Posts: 2250
« Reply #11 on: March 23, 2007, 06:30:57 AM »

Duncan C. wrote:
>
>    I don't claim to understand how you take the cosine of a complex number

I believe you could create a function called "ccos" that would return the cosine of a complex number represented by "x + iy". You would supply it two parameter values, such as: ccos(x,y).

Where "ccos(x,y)" is defined as cos(x + iy), which is equal to: (cos x cosh y − i sin x sinh y).




gandreas
Explorer
Posts: 53
« Reply #12 on: March 23, 2007, 03:25:17 PM »

I'd really prefer not writing hardware-specific optimizations if I can help it. I'd like my code to run on PowerPC G4 and G5 Macs as well as the newer intel-based models.
If you want speed on an OS X based machine, you'll need to do two things:
1) Use AltiVec/SSE code where possible (which means you'll have different code for G4/G5 and x86)
2) Use multi-threading.

Properly done, these can easily accelerate your code by a factor of 8 or more.  Note that it's harder to get really good speed improvements with SSE than with AltiVec, due to the limited number of vector registers (until you go to 64-bit x86 code, which you won't be able to reasonably do until Leopard ships), but depending on the image, it's not that hard to get 3x+ from AltiVec (quadrium often gets 3.5 times the performance with AltiVec enabled, but only about 2x for SSE).

Threading is also a must - basically split the image into parts, and have each part calculated by a different thread.  It's not all that hard to add in once you've got the basics of threading (and this is a fairly simple threading approach).  You pretty much get N times the performance then (where N is the number of cores your machine has).
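[Editor's note: a sketch of the banding scheme gandreas describes — split the image into one band of rows per core and render each band on its own thread. It uses C++11 std::thread, which postdates this 2007 thread; on OS X of the era the same structure would be written with pthreads. renderParallel and the per-row callback are illustrative names.]

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Split [0, height) rows into roughly equal bands, one per hardware
// thread, and invoke renderRow(y, width) for every row of each band.
void renderParallel(int width, int height, std::function<void(int, int)> renderRow) {
    unsigned nThreads = std::thread::hardware_concurrency();
    if (nThreads == 0) nThreads = 2;                        // count unknown: assume 2
    int rowsPerBand = (height + nThreads - 1) / nThreads;   // ceiling division
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t) {
        int y0 = t * rowsPerBand;
        int y1 = std::min(height, y0 + rowsPerBand);
        workers.emplace_back([=] {                          // each band on its own thread
            for (int y = y0; y < y1; ++y) renderRow(y, width);
        });
    }
    for (auto& w : workers) w.join();                       // wait for every band
}
```

As lycium notes below, equal-sized bands only balance well when every pixel costs about the same; deep zooms usually want smaller work units handed out dynamically.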


I need to look into library support for vector functions, or perhaps complex math functions. My HOPE is that the complex math libraries are written to run optimized code on the target platform. I found some trig functions that operate on complex numbers, but not simple functions like addition and multiplication. (I don't claim to understand how you take the cosine of a complex number)

gcc offers two different complex number packages - one from the C99 complex data type (an intrinsic complex type that allows the compiler to do things like basic operations), and another that involves the <complex> C++ template-based library.  Neither is accelerated to support AltiVec/PPC (but due to the way that C++ works, you can extend the latter via template specialization, though it's a fair amount of work).
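[Editor's note: for reference, one escape-time step written with the <complex> template library mentioned above — readable, though gcc of that era would not auto-vectorize it for AltiVec or SSE. The function name is illustrative.]

```cpp
#include <cassert>
#include <complex>

// One Mandelbrot step, z -> z^2 + c; operator* and operator+
// come straight from the <complex> library.
std::complex<double> mandelStep(std::complex<double> z, std::complex<double> c) {
    return z * z + c;
}
```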

There is also the "Accelerate.framework" that Apple provides, which offers a full range of basic AltiVec/SSE-based numerics (the usual assortment of transcendental and trigonometric functions), as well as routines to apply simple operations to large arrays of numbers (which are highly tuned to the specific machine; unfortunately these aren't of much use for fractal generation).

There is also macstl, a derivative of STL that includes support for AltiVec- and SSE-based operations, which might be of use to you (though the last time I checked, it was still missing a few features I needed).

lycium
« Reply #13 on: March 23, 2007, 04:03:32 PM »

Properly done, these can easily accelerate your code by a factor of 8 or more.  Note that it's hard to get really good speed improvements on SSE as easily as you can with AltiVec, due to the limited number of vector registers (until you go to 64-bit x86 code, which you won't be able to reasonably do until Leopard ships), but depending on the image, it's not that hard to get 3x+ due to AltiVec (quadrium often gets 3.5 times the performance with AltiVec enabled, but only about 2x for SSE)

i got nearly a clean 4x speedup from using sse, and that's on the simple mandelbrot/julia - not something more compute-limited like quadrium; what gives?

Threading is also a must - basically split the image into parts, and have each part calculated by a different thread.  It's not all that hard to add in once you've got the basics of threading (and this is a fairly simple threading approach).  You pretty much get N times the performance then (where N is the number of cores your machine has).

that's of course assuming you have an equal workload for all pixels, otherwise you'll have threads twiddling thumbs after a while. to be fair you did mention that uniform subdivision is a simple approach, so this is obviously @ duncan and not at you.

gcc offers two different complex number packages - one from the C99 (?) complex data type (which offers an intrinsic complex data type that allows to compiler to do things like basic operations), and another that involves the <complex> C++ template based library.  Neither are accelerated to support AltiVec/PPC (but due to that way that C++ works, you can extend that one via template specialization, but it's a fair amount of work).

i'm pretty sure core 2 duo has specific instructions for accelerating complex number computations (finally!), so it's reasonable to assume that a recent gcc will emit those instructions if compiled with the right -march flag (microarchitecture). of course that doesn't help our friend duncan, but i'd be interested to know if that's indeed the case.

finally, there are ways to speed up general fractal computations without resorting to special cases for various fractal types, but in the end your time will be more rewarded by just biting the bullet and going for the big speedup. put another way, if you ever want to catch up with quadrium, you'll have to get your hands dirty wink

Duncan C
« Reply #14 on: March 23, 2007, 05:21:14 PM »

I'd really prefer not writing hardware-specific optimizations if I can help it. I'd like my code to run on PowerPC G4 and G5 Macs as well as the newer intel-based models.
If you want speed on an OS X based machine, you'll need to do two things
1) Use AltiVec/SSE code where possible (which means you'll have different code for G4/G5 and x86)
2) Use multi-threading.


Properly done, these can easily accelerate your code by a factor of 8 or more.  Note that it's hard to get really good speed improvements on SSE as easily as you can with AltiVec, due to the limited number of vector registers (until you go to 64-bit x86 code, which you won't be able to reasonably do until Leopard ships), but depending on the image, it's not that hard to get 3x+ due to AltiVec (quadrium often gets 3.5 times the performance with AltiVec enabled, but only about 2x for SSE)
I have multi-threading implemented already. My code detects multiprocessor machines, breaks the plot into pieces, and keeps a compute thread running on each processor. This speeds things up nicely on multiprocessor machines.

I have also written logic that detects the symmetric parts of Mandelbrot and Julia sets, and only renders the unique parts. It copies and flips the symmetric parts. (Mandelbrot sets have mirror symmetry across the x axis, and Julia sets have point symmetry about the origin.)
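[Editor's note: a sketch of the row-mirroring idea for the Mandelbrot case, assuming the plot window is symmetric about the real axis. mirrorRow is an illustrative helper, not FractalWorks code: for each row below the axis it returns the already-rendered partner row to copy from, or -1 if the row must be computed.]

```cpp
#include <cassert>
#include <cmath>

// Map a pixel row to its mirror across the real axis. Rows whose center
// lies at y >= 0 must be rendered (-1); rows below the axis return the
// index of the upper-half row they can be copied from.
int mirrorRow(int row, int height, double yMin, double yMax) {
    double dy = (yMax - yMin) / height;
    double y = yMax - (row + 0.5) * dy;   // imaginary coordinate of the pixel center
    if (y >= 0.0) return -1;              // upper half: compute it
    int partner = (int)std::floor((yMax - (-y)) / dy);
    return (partner >= 0 && partner < height) ? partner : -1;
}
```

For a window with yMin = -yMax the partner works out to height - row - 1, i.e. a straight vertical flip of the rendered half.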


Threading is also a must - basically split the image into parts, and have each part calculated by a different thread.  It's not all that hard to add in once you've got the basics of threading (and this is a fairly simple threading approach).  You pretty much get N times the performance then (where N is the number of cores your machine has).
I need to look into library support for vector functions, or perhaps complex math functions. My HOPE is that the complex math libraries are written to run optimized code on the target platform. I found some trig functions that operate on complex numbers, but not simple functions like addition and multiplication. (I don't claim to understand how you take the cosine of a complex number)
gcc offers two different complex number packages - one from the C99 (?) complex data type (which offers an intrinsic complex data type that allows to compiler to do things like basic operations), and another that involves the <complex> C++ template based library.  Neither are accelerated to support AltiVec/PPC (but due to that way that C++ works, you can extend that one via template specialization, but it's a fair amount of work).

There is also the "Accelerate.framework" that Apple provides, which provides a full range of basic AltiVec/SSE based numerics (the usual assortment of transcendental and trigonometric functions), as well as routines to apply simple operations to large arrays of numbers (which are highly tuned to the specific machine, unfortunately these aren't of much use for fractal generation).

There is also macstl, a derivative of STL that includes support for AltiVec and SSE based operation, which might be of use to you (though the last time I checked, it was still missing a few features I needed).

Can you point me in the direction of some resources on AltiVec/SSE optimization? I haven't tackled either before. Also, isn't the optimization different for G4 vs. G5, and Intel Core Duo vs. Core 2 Duo (SSE vs. SSE2)?

It seems to me I'd get decent performance benefits by teaching my code to do simultaneous operations for complex add and multiply operations. (Since Z^2 works out to (a+bi)^2, or a^2 − b^2 + 2abi, the a^2, b^2, and 2ab terms could all be computed at once. Adding two complex numbers, I should also be able to use AltiVec/SSE to add the real and imaginary components simultaneously.)

Is it true that PPC G4 AltiVec is single-precision floating point only? That's a deal-killer for me. My app is based on double precision. Single precision falls apart too fast for high "magnification" zooms. Double precision seems to fall apart at around 10^-14.


Duncan C