Title: x86 assembly optimized code for fractals needed ? Post by: Kuemmel on November 15, 2009, 03:41:20 PM Hi guys,
I just recently found this forum. Over some time now, being interested in assembler coding and fractals, I have developed my own little Mandelbrot benchmark to see how the new CPU architectures can be used efficiently, which leads to surprising results and speed-ups. You can find my code here (all based on double-precision SSE2/FPU variants): http://www.mikusite.de/pages/x86.htm (http://www.mikusite.de/pages/x86.htm) Looking around here I just wondered what would be fun and more or less useful to code in assembler next. In other words, what is really in need of a speed-up - something like the new 3D algorithms, or is the lack of speed there more the rendering and not the iterations?
Title: Re: x86 assembly optimized code for fractals needed ? Post by: David Makin on November 15, 2009, 05:35:54 PM Rendering the 3D fractals has two main routes to optimisation: improving the ray-stepping algorithm so that fewer steps are required to find the solid boundary, and improving the efficiency of the main iteration loop (as you say). The first can only really be improved if there's a breakthrough in the maths theory, but the second can certainly be improved by using optimised assembly code. Personally I am no longer interested in assembly code until such time that I can program for a Mac instead of Windows.
Title: Re: x86 assembly optimized code for fractals needed ? Post by: cKleinhuis on November 15, 2009, 06:28:48 PM In my opinion pure single-core assembler code is no longer needed nowadays; today development focusses on parallel graphics card (GPU) hardware. GPU-optimised algorithms for those areas are still under development, but many different approaches for those purposes exist already
:police:
Title: Re: x86 assembly optimized code for fractals needed ? Post by: David Makin on November 15, 2009, 09:37:44 PM I disagree, if the "pure single core assembler" is written so it will multi-thread on, say, a dual quad-core Nehalem system :)
Title: Re: x86 assembly optimized code for fractals needed ? Post by: lycium on November 15, 2009, 10:28:18 PM yup, writing assembly is basically a waste of time; will you write it for 64-bit computers, or will you write it for 32-bit computers? how about both?
if you use intel's sse intrinsic instructions, the compiler does the grunt work of combinatorial instruction scheduling and register allocation.
Title: Re: x86 assembly optimized code for fractals needed ? Post by: David Makin on November 15, 2009, 10:54:23 PM SSE? I didn't think that was accurate enough - guess it's a while since I read a processor manual :)
Title: Re: x86 assembly optimized code for fractals needed ? Post by: lycium on November 15, 2009, 11:00:25 PM x87 has been dead for years now; actually in 64-bit mode (the AMD64 architecture) it's not supported at all, everything is done via (scalar) SSE, and that's a great thing, since that horrible, horrible stack-based register architecture needed to die a long time ago!
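For anyone wondering what the intrinsics route looks like in practice, here is a minimal sketch of a double-precision SSE2 Mandelbrot inner loop in C using emmintrin.h, iterating two points per 128-bit register with a per-lane escape mask. It only illustrates the technique under discussion - it is not code from Kuemmel's benchmark, and the function and variable names are made up for the example.

Code:
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Iterate two points (cr[0],ci[0]) and (cr[1],ci[1]) at once.
   Per-lane iteration counts are written to counts[0..1]. */
static void mandel2(const double *cr, const double *ci,
                    int max_iter, double *counts)
{
    __m128d cre = _mm_loadu_pd(cr);
    __m128d cim = _mm_loadu_pd(ci);
    __m128d zre = _mm_setzero_pd();
    __m128d zim = _mm_setzero_pd();
    __m128d cnt = _mm_setzero_pd();
    const __m128d one  = _mm_set1_pd(1.0);
    const __m128d four = _mm_set1_pd(4.0);

    for (int i = 0; i < max_iter; i++) {
        __m128d zr2 = _mm_mul_pd(zre, zre);
        __m128d zi2 = _mm_mul_pd(zim, zim);
        /* mask is all-ones in lanes where |z|^2 <= 4 (still iterating) */
        __m128d mask = _mm_cmple_pd(_mm_add_pd(zr2, zi2), four);
        if (_mm_movemask_pd(mask) == 0)
            break;                              /* both lanes escaped */
        cnt = _mm_add_pd(cnt, _mm_and_pd(mask, one)); /* count only active lanes */
        /* z = z^2 + c */
        __m128d zrzi = _mm_mul_pd(zre, zim);
        zim = _mm_add_pd(_mm_add_pd(zrzi, zrzi), cim);
        zre = _mm_add_pd(_mm_sub_pd(zr2, zi2), cre);
    }
    _mm_storeu_pd(counts, cnt);
}

The compiler handles the register allocation and scheduling; the programmer only decides the data layout (two pixels per register) and the masking trick for lanes that have already escaped.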
Title: Re: x86 assembly optimized code for fractals needed ? Post by: David Makin on November 15, 2009, 11:08:11 PM I meant that last time I looked, SSE was just float, not double :)
Title: Re: x86 assembly optimized code for fractals needed ? Post by: Kuemmel on November 16, 2009, 12:00:56 AM Hi David, okay, I see, so the iterations seem to be the interesting part - can you send me some C code for the iteration loop of a nice 3D fractal formula so that I can use it as a base and play with it? Regarding the others: of course it may sometimes be a waste of time, but I still think it's a factor of 2 or even more faster than some C code. I already support multi-core up to 32 cores in my benchmark. So one could say that on an i7 quad-core with hyper-threading it's 8 cores * SSE2 (2 doubles), which makes for a parallelism of about 16. Isn't it still a problem of GPUs that they only support single precision, or some kind of imprecise double? For fun I also coded my benchmark with the x87 FPU (a real pain in the ass with the stack), a lot slower on modern CPUs compared to SSE2 of course... I just wondered if the extended precision could be of some use for fractals, but I guess it's not a big deal. For the moment I stick to 32-bit coding, as 64-bit OSes don't seem to be that widespread yet. Depending on the algorithm, 64-bit mainly helps through the doubled number of registers (16 SSE2 double-precision registers), which can be quite helpful, as it means less access to memory is needed.
Title: Re: x86 assembly optimized code for fractals needed ? Post by: cKleinhuis on November 16, 2009, 12:38:43 AM precision actually IS the DEAL for fractal calculation :)
it is all about precision: with floating point you can achieve zoom levels of a factor of 100,000, which is quite nice, but at zoom factors like <Quoted Image Removed> you can wonder about the decimal length calculable. Nowadays fractal programs implement their own floating-point algorithms to achieve that high precision :dink: :angel1:
Title: Re: x86 assembly optimized code for fractals needed ? Post by: David Makin on November 16, 2009, 12:56:24 AM To me, for fine-art quality images double precision is a minimum, but for animations float is OK, so at the moment I'd only be interested in the GPU for animation - or for preview renders while setting up.
The new GPU thingy from Intel that Thomas keeps mentioning sounds more interesting :) Title: Re: x86 assembly optimized code for fractals needed ? Post by: lycium on November 16, 2009, 01:01:49 AM i don't "keep" mentioning it, do i? :P
anyway, you guys can stick to your cpu asm coding if it makes you happy ;)
Title: Re: x86 assembly optimized code for fractals needed ? Post by: Zom-B on November 16, 2009, 02:22:05 PM Indeed. With 32-bit floating point one can only reach depths of <Quoted Image Removed>. In my own deep-zoom programs I made an implementation of 128-bit double-double-precision emulation (not to be confused with 128-bit quad-precision), and now I can reach zooms of up to <Quoted Image Removed> with reasonable speed. I would say it is about 6 times slower than normal double-precision math, but it is still a lot faster than arbitrary precision as in Ultra Fractal. One can also implement 160-bit double-extended-precision, 256-bit quad-double-precision, or even higher orders of double-precision extensions, but the math involved gets exponentially more complex. If anyone wants my library (in Java), just give me a sign and I'll put it up somewhere. It contains all the usual arithmetic, polynomial, root, power, logarithmic and unary functions (not in the complex plane). I also have templates of how to convert these into complex numbers, including the trigonometric and hyperbolic functions.
Title: Re: x86 assembly optimized code for fractals needed ? Post by: David Makin on November 16, 2009, 03:04:33 PM Yes please - though I may not use it immediately, I would like to convert it to Objective C (and maybe C++).
Title: Re: x86 assembly optimized code for fractals needed ? Post by: Duncan C on November 17, 2009, 03:23:57 AM
I'd also be interested in seeing that library. I'd have to convert it to C, but that shouldn't be that hard... Duncan
Title: Re: x86 assembly optimized code for fractals needed ? Post by: Zom-B on November 17, 2009, 01:32:04 PM see http://www.fractalforums.com/programming/(java)-double-double-library-for-128-bit-precision/ (http://www.fractalforums.com/programming/(java)-double-double-library-for-128-bit-precision/)
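For those planning a C port: the core trick behind any double-double library is the error-free "two-sum" of Knuth/Dekker. A minimal C sketch of that idea, written from the standard textbook algorithm rather than taken from Zom-B's Java library, might look like this (it must be compiled with strict IEEE semantics, i.e. no -ffast-math):

Code:
/* A double-double value: the unevaluated sum hi + lo, with |lo| <= ulp(hi)/2. */
typedef struct { double hi, lo; } dd;

/* Knuth's two-sum: returns s (in hi) and e (in lo) such that a + b == s + e exactly. */
static dd two_sum(double a, double b)
{
    dd r;
    r.hi = a + b;
    double v = r.hi - a;
    r.lo = (a - (r.hi - v)) + (b - v);
    return r;
}

/* Double-double addition (the simple variant, plenty accurate for deep zooms). */
static dd dd_add(dd a, dd b)
{
    dd s = two_sum(a.hi, b.hi);
    s.lo += a.lo + b.lo;
    return two_sum(s.hi, s.lo);   /* renormalise so lo is again a small correction */
}

Multiplication works the same way with an error-free "two-product" (splitting each double in half, or using a fused multiply-add where available), which is why the whole scheme ends up several times slower than plain doubles, as Zom-B mentions.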
Title: Re: x86 assembly optimized code for fractals needed ? Post by: ker2x on March 07, 2010, 10:05:33 AM if you disassemble code, you'll see that x87 (FPU) is still heavily used :'(
Also, you can do 64-bit arithmetic using SSE. SSE registers are 128 bits; using ADDPS or ADDPD (just an example) you can add 4x32-bit (ADDPS) or 2x64-bit (ADDPD) packed values. The compiler can vectorize some obvious operations, but sometimes it's useful to write assembly or use intrinsics (which are basically ASM with C syntax). I'm learning Fortran; the optimizations made by Fortran compilers are much more aggressive and smarter than those of any C compiler I've tested, and they can do much more auto-vectorization and auto-parallelisation (yes, multithreading without a single additional instruction - impressive).
Title: Re: x86 assembly optimized code for fractals needed ? Post by: johandebock on March 08, 2010, 01:39:19 PM Nice link.
I'm investigating the usefulness of assembly-optimized code for my multi-threaded Buddhabrot renderer. I'm still looking for an easy setup for working with assembly code - ideally as simple as changing the assembly code generated by a compiler and using that file to create the executable.
Title: Re: x86 assembly optimized code for fractals needed ? Post by: ker2x on March 08, 2010, 06:04:23 PM When I'm planning to write asm, I first write the application in PureBasic, which generates a nicely commented asm file (FASM); I patch it and tell PureBasic to recompile my patched FASM code into a standalone application.
It also supports inline asm. If you're already coding your app in C/C++, I suggest using the SSE* intrinsics. I use http://siyobik.info/index.php?module=x86 and http://softpixel.com/~cwright/programming/simd/mmx.php as quick references for asm instructions (and of course the Intel documentation for the complete reference).
Title: Re: x86 assembly optimized code for fractals needed ? Post by: Duncan C on March 08, 2010, 06:13:46 PM Indeed, double-double math would be interesting.
Title: Re: x86 assembly optimized code for fractals needed ? Post by: Duncan C on March 08, 2010, 06:17:36 PM David, you're a fellow Mac developer? I didn't realize that. Did you know that Xcode includes native support for "long double" (128-bit quad)? Based on limited benchmarking, it only appears to be about 1/2 as fast as double (64-bit) floating point; I would expect it to be 1/4 as fast. Duncan C
Title: Re: x86 assembly optimized code for fractals needed ? Post by: hobold on March 08, 2010, 08:16:44 PM If you want the fastest possible speed, then you are probably better off not using Objective-C. The object model and the method binding of this language were never meant to be used for computational workloads. ObjC does have its strengths, but low overhead is not one of them.
C++ does have a different object model that was specifically designed for low overhead, but even here the overhead is greater than zero in some relevant cases. If at all possible, try to stick with plain old C for the lowest level of compute-intensive stuff. That way you can later use it from either C++ or ObjC.
Title: Re: x86 assembly optimized code for fractals needed ? Post by: Duncan C on March 08, 2010, 08:51:43 PM I would put that more generally: if you want the fastest compute code, use vanilla C, period. Don't use ANY object-oriented language. However, that does not mean your app can't be written in an object-oriented language. You just have to draw a circle around your compute-intensive code and write that in carefully optimized vanilla C. I am willing to put my app, FractalWorks, against any other Mandelbrot/Julia renderer out there for raw speed. It is highly optimized: it takes symmetry into account; it uses boundary following to identify contiguous "blobs" of pixels with the same iteration value, and only computes the perimeter of those blobs; it is multi-threaded and will spawn an arbitrary number of worker threads based on the number of cores available; etc. It uses Objective C for most of the above, but the inner code to iterate a strip of pixels is pure C. I question whether or not it's still practical to create hand-written assembler for such things. These days there is a great deal going on that isn't well documented or understood: predictive execution, pipelining, multi-level caches, parallel execution paths, etc. Modern optimizing compilers are very sophisticated and create very well-tuned code. If you really, really know what you are doing, and spend months writing the equivalent of a few hundred lines of C code, you may be able to beat well-written C by a small margin, but I think the days of doubling the performance of a block of code in assembler are basically over.
Title: Re: x86 assembly optimized code for fractals needed ? Post by: David Makin on March 08, 2010, 09:05:08 PM
iPhone/iTouch/iPad at the moment, Mac as soon as I get a Mac at home as well as at work :)
Title: Re: x86 assembly optimized code for fractals needed ? Post by: David Makin on March 08, 2010, 09:18:46 PM When I said convert it to Objective C and C++ I just meant so it could be used from GUI controllers in those formats; knowing me, I'd probably attempt conversion of the number crunching to assembler - I still prefer programming in assembler to higher-level languages and do not trust any higher-level compiler to produce optimum run-time code, especially where the FPU is concerned (unless of course I wrote the compiler myself). Having said that, the last ASM code I did was for the ARM and not the x86 (a blitter library done for the older WM devices).
Title: Re: x86 assembly optimized code for fractals needed ? Post by: Duncan C on March 09, 2010, 01:55:50 AM Really? Small world. My company is also focusing on iPhone/iPod touch/iPad development. We're a self-funded startup, so work and home are the same place at the moment... Do you have a background in Objective C/Cocoa development?
Duncan
Title: Re: x86 assembly optimized code for fractals needed ? Post by: David Makin on March 09, 2010, 03:09:38 AM Not really, I started from scratch with respect to Apple-based development for the iPhone, and (although 47) I've only been coding in C/C++ since around 2002, when I was forced to in order to develop for the Windows PPC and Smartphone - prior to that all my coding was ASM, e.g. MMFrac (PC/32-bit DOS), Crystal Dragon (Amiga), Tower of Souls (Amiga/PC/mobile devices), the 3D Pets series (PC) and a number of unused/unreleased items such as a GameAPI for Windows Mobile and a CPU/FPU-based 3D engine (first Pentiums) that I unfortunately had only just finished when it was made redundant by the first true 3D graphics cards :)
Title: Re: x86 assembly optimized code for fractals needed ? Post by: David Makin on March 09, 2010, 03:11:56 AM I should add that for me work (at Parys Technografx) is just Steve Parys' front room :)
Title: Re: x86 assembly optimized code for fractals needed ? Post by: utak3r on March 09, 2010, 08:43:51 PM As for the subject of this topic... if you're using only CPU power, it pays off well. In Apophysis I have almost every formula written in both Delphi and asm. The render-time differences are really worth it.
Of course it's unportable, but still...
Title: Re: x86 assembly optimized code for fractals needed ? Post by: Duncan C on March 10, 2010, 12:59:53 PM I've written quite a bit of assembler in my day as well. In the early days it was 6502, for the Apple II. Then 8086/8088, for the early IBM PC. (I wrote a word processor for the IBM PC in assembler, for a now-defunct company called Muse Software.) Then the 68000 family - this is when I wrote hand-tuned floating-point code for Macs with an FPU. I haven't written much assembler in years, however. It takes too long, and is too hard to maintain. I also question the payback, given the sophistication of modern processors and compilers. I would like to see a comparison of well-written C vs. well-written assembler for fractal calculation. Regards, Duncan C
Title: Re: x86 assembly optimized code for fractals needed ? Post by: utak3r on March 15, 2010, 12:08:42 PM It all depends on both the compiler and the code itself. The easiest example is the fsincos function of the FPU (computing both the sine and cosine of the same angle) - what I do is inline a one-line piece of asm. Normal code in today's compilers is optimized really well (I have studied the generated code). So my humble conclusion is: if you can optimize your code before compiling it, that's good :) Just think of situations like the one I described, and perhaps drop in short inline asm here and there, but let the whole code be in C or whatever.
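To illustrate what such a one-liner can look like, here is a sketch in GCC's extended inline-asm syntax - not utak3r's actual Delphi/Apophysis code, and x87-only and non-portable, exactly as he says:

Code:
/* Compute sine and cosine of x with a single x87 instruction.
   fsincos leaves cos(x) in ST(0) and sin(x) in ST(1). */
static inline void sincos_x87(double x, double *s, double *c)
{
    __asm__ ("fsincos" : "=t" (*c), "=u" (*s) : "0" (x));
}

On glibc there is also the sincos() extension in the math library, which lets the compiler and libm pick the best implementation instead of hard-wiring the x87 instruction.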
Title: Re: x86 assembly optimized code for fractals needed ? Post by: David Makin on March 15, 2010, 02:20:13 PM The major issue with compilers is that, since you presumably did not write the compiler yourself, you don't actually know what optimisations it's capable of (certainly not in detail), so relying on the compiler itself is "iffy", and it's also difficult to optimise your higher-level code correctly so that the compiler will produce the best results.
In the case of the UF compiler it's not so bad - if I'm unsure how the compiler optimises something I just ask Frederik - but you can't do that with, say, Xcode or Visual Studio (they are unlikely to want to reveal that level of detail, or simply do not have the time to do so; in fact I suspect the folks who would actually know never talk directly to the public).
Title: Re: x86 assembly optimized code for fractals needed ? Post by: Duncan C on March 15, 2010, 02:43:13 PM David, Apple uses the GNU compiler family, which is open source. You could download the compiler and study it yourself. You can also use the Xcode debugger to look at the machine code that the compiler generates. The code it generates is very different in debug and release mode, because release mode has a lot more optimization; if you want to study the code for efficiency, be sure to look at the optimized version. I don't have time to write hand-coded assembler any more. I'm a lot more productive writing in a high-level language and using an optimizing compiler. Having a background in assembler, I know how to write efficient code, which helps.
An efficient algorithm is often much more important than the most optimized code: hand-tuned assembler implementing an N-squared algorithm will still be really, really slow for a large data set. Duncan
Title: Re: x86 assembly optimized code for fractals needed ? Post by: David Makin on March 15, 2010, 03:45:58 PM I confess I'd forgotten that Xcode uses the GNU compiler, so what I said is really more relevant to VS than Xcode; however, unless there is clear and complete documentation for the compiler, trying to follow the intricate detail of how the optimisations work directly from the source code is not going to be efficient in terms of time taken :)