I do not accept challenges, nor give out challenges. Life is not war.
i was very explicit in pointing out that i don't see this as any kind of war :/ that two programmers interested in low-level efficiency might compare performance ideals is natural, no?
frankly i'm very interested to see how fast your integer-asm iteration goes, and if i could egg you on to produce a strong result while having a bit of fun myself (i also mentioned that my forthcoming contract work involves similar optimisation), where's the harm? i really must say i'm disappointed by this negative interpretation of what i thought might be a bit of fun. producing a 24576x16384 image (1.18gb) with 12x12 supersamples per pixel at up to 256 iterations each, in under 5 1/2 hours, is fun for me at least.
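(a rough back-of-envelope on those numbers, for scale: 24576 x 16384 pixels x 144 supersamples is about 5.8 x 10^10 samples, and 5 1/2 hours is about 19,800 seconds, so that works out to roughly 3 million samples per second, each iterated up to 256 times.)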
However, I have been there and done that. I have been in the world of floating-point, and way beyond 64 bits. I wrote mathematics to a precision of 65 DECIMAL places. That was a 28-BYTE mantissa and a two-byte exponent.
holy smokes... what kind of application needs such precision?! i know pi off by heart to sufficiently many decimals (far fewer than 65, i can assure you) to describe a circle from the sun to pluto accurate to a metre, and that's widely considered sufficiently insane ;)
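(to put a rough number on that: taking pluto's orbit radius as roughly 6 x 10^12 m, truncating pi after n decimals perturbs the circumference 2*pi*r by about 2r x 10^-n, so keeping the error under a metre needs 10^-n < 1/(2 x 6 x 10^12) - about 14 decimal places. far fewer than 65, indeed.)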
i read that, but must admit it's a bit secondary to my interests, since i need accuracy across many scales. modern code also shuns the horribly inefficient stack-based x87 fpu in favour of the direct-access scalar computation model offered by cpus from the pentium 4 onward (which itself has been around for quite some time now).
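to make the contrast concrete, a minimal sketch (the flag names assume gcc):

```c
/* the same scalar c can target either fpu model. with gcc,
   "-mfpmath=387" emits stack-based x87 code (fmul/faddp juggling
   st0..st7), while "-msse2 -mfpmath=sse" emits direct-access scalar
   sse2 (mulsd/addsd on the flat xmm register file). */
double mag2(double re, double im)
{
    return re * re + im * im;   /* the escape-time bailout quantity */
}
```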
furthermore, i must admit i "cheated" and used 128bit simd instructions - they admit a remarkably efficient mapping to escape-time fractal computation. on my amd k8 box such 128bit operations issue one per 2 cycles (the k8 cracks each into two 64bit halves), while the latest intels (the so-called "core" architecture) issue one per cycle - a true 128bit machine! there are now quad-core versions of that architecture, scaling to 3.4ghz, and i'm going to look into getting a benchmark result from one of those machines soon. i would like to point out that this incredible performance scaling happens with exactly the same source code and binary - flexibility which i believe is not easily achieved without an intermediate language (ignoring the fact that it took zero effort on my part; as it should be, since replicating a job across cores is a very mechanical task and shouldn't be a human burden where avoidable).
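just to illustrate how mechanical that replication can be, here's a minimal sketch (not my actual code; i'm assuming openmp and a hypothetical per-row kernel render_row):

```c
#include <stddef.h>

void render_row(unsigned *row, int y, int width);  /* hypothetical per-row kernel */

/* one pragma splits the image rows among however many cores the machine
   has at runtime - the same source and the same binary whether that's
   one, two or four cores. build with e.g. gcc -fopenmp. */
void render(unsigned *pixels, int width, int height)
{
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; y++)
        render_row(&pixels[(size_t)y * (size_t)width], y, width);
}
```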
the relevance of all of the above is that the straightforward c/c++ code needs just a recompile to make very good use of these new technologies. what's more, i wrote those 128bit operations via so-called "intrinsics": essentially a set of functions and macros that map (often one-to-one) onto the intended cpu instructions. the difference is this: the operands are just variables, and the compiler handles the arduous, combinatorial task of instruction scheduling and register allocation. you're still coding the algorithm very close to the metal, but thanks to this elegant way of specifying it, the compiler - which can try very many combinations of code and measure which is fastest in the blink of an eye - is free to produce whatever is most efficient given the architectural resources available. i find it very difficult to believe that this isn't a huge step up from hand-written assembly in terms of resulting code speed, programmer effort and sanity, ease of debugging, future proofing, ...
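here's the flavour of it - a sketch in the spirit of my code, not the code itself: four single-precision points iterate z <- z^2 + c in lockstep, and every operand is an ordinary variable left to the compiler to allocate and schedule.

```c
#include <emmintrin.h>  /* sse2 intrinsics */

/* iterate z <- z^2 + c for four points at once; returns per-lane
   iteration counts. */
__m128i mandel4(__m128 cre, __m128 cim, int max_iter)
{
    __m128  zre = _mm_setzero_ps(), zim = _mm_setzero_ps();
    const __m128  four = _mm_set1_ps(4.0f), two = _mm_set1_ps(2.0f);
    const __m128i one  = _mm_set1_epi32(1);
    __m128i counts = _mm_setzero_si128();

    for (int i = 0; i < max_iter; i++) {
        __m128 re2 = _mm_mul_ps(zre, zre);
        __m128 im2 = _mm_mul_ps(zim, zim);
        /* which lanes are still inside the |z|^2 <= 4 bailout? */
        __m128 alive = _mm_cmple_ps(_mm_add_ps(re2, im2), four);
        if (_mm_movemask_ps(alive) == 0)
            break;  /* all four points have escaped */
        /* bump the count only in the lanes that are still alive */
        counts = _mm_add_epi32(counts,
                     _mm_and_si128(_mm_castps_si128(alive), one));
        __m128 t = _mm_add_ps(_mm_sub_ps(re2, im2), cre); /* re^2 - im^2 + cre */
        zim = _mm_add_ps(_mm_mul_ps(two, _mm_mul_ps(zre, zim)), cim);
        zre = t;
    }
    return counts;
}
```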
When a language like C or C++ is created, somebody first writes the inner core in assembler. When Inmos introduced the Transputer for supercomputers, they said that nobody ever programs in assembler any more. That is a joke. At an exhibition I met Sir Clive Sinclair and repeated the joke. He said, "That is what they want you to believe." Later, Inmos published the op-codes of the Transputer (which they had been keeping secret).
everyone who's programmed in asm has encountered such ignorant attitudes towards their beloved "dark art". i'm 23, and already feel rather old telling younger programmers war stories about optimising inner loops for the pentium, trying to make the best of both the u pipe and the famously-crippled v pipe via careful scheduling. such an attitude is, unfortunately, rather justified - both in terms of business sense (development is costly and execution power is, in most cases, dirt cheap, so these days code is written more for human consumption than machine consumption) and in terms of common sense: if cpus are so very capable of re-arranging in-flight instructions, and optimising compilers are so incredibly good, it just makes sense to focus on what really matters - getting the algorithm clean and minimal, and avoiding poor cache behaviour. that's the #1 performance killer these days: cpus run in gigahertz while memory effectively runs in megahertz, so a cache miss costs hundreds of cycles (the equivalent of a great many alu operations).
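a toy example of the cache point, but the effect is very real - identical work, two traversal orders:

```c
#include <stddef.h>

/* the row order walks memory sequentially and stays in cache; the
   column order strides by a whole row and misses on nearly every
   access once n is large. */
double sum_by_rows(const double *m, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            s += m[i * n + j];          /* sequential: cache-friendly */
    return s;
}

double sum_by_cols(const double *m, size_t n)
{
    double s = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            s += m[i * n + j];          /* strided: cache-hostile */
    return s;
}
```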
- i hope you don't mind if i don't comment on your forth war stories; while i certainly appreciate them, i am far too ignorant of the language and its history to offer meaningful commentary in that regard -
Explain to me how a machine can run without a processor. If there is no machine code, one can remove the CPU and the system will still run - in one's dreams!
So there is always machine code somewhere - even if it is hidden from the programmer.
this is the crux of my little discussion, and the main thrust of why i wrote the program i linked above: i am not suggesting that assembly is dead or useless, only that there are much easier ways to achieve much better results (by the metrics referenced previously) for the same time investment as writing it in pure asm. that is the whole of it - there is no war. i saw that you were frustrated by things with which one should not be frustrated in 2006/2007, thought about how much you stand to gain from a slight change of gear (the way i code is still very close to the machine), and sought to show it to you. please don't read animosity into this :/