David Makin
« Reply #30 on: March 23, 2009, 09:45:14 PM »
Actually, on second thoughts I may not be correct - it probably doesn't make that much difference in this case. p would probably end up a stack variable and #pixel would be at a fixed address; either would be cached, and I can't remember if there's an overhead for indexing off the stack compared with reading from a fixed address. I'm now a bit over-eager to avoid using variables after finding out that UF does absolutely the most it can at compile time. For instance, if you have a user parameter @power, then you should use (@power-1) directly rather than assigning @power-1 to a variable - that will most definitely make a difference, since using (@power-1) directly means no indirection or calculation at all in the run-time code.
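The same constant-folding idea can be seen in other compilers too. Here is a minimal Python sketch (illustrative only - it demonstrates CPython's own compile-time folding, not UF's compiler) of the difference between an expression folded at compile time and a run-time variable read:

```python
import dis

def folded():
    # A literal expression like (5 - 1) is folded to the constant 4
    # at compile time, so no subtraction happens at run time.
    return 5 - 1

POWER = 5

def loaded():
    # Reading the value through a variable forces a run-time load
    # and a run-time subtraction instead.
    return POWER - 1

dis.dis(folded)  # the folded constant 4 appears directly in the bytecode
dis.dis(loaded)  # POWER is loaded and subtracted at run time
```

Both functions return the same value, but only `loaded` pays for the lookup and arithmetic on every call.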
Nahee_Enterprises
« Reply #31 on: March 24, 2009, 11:19:07 AM »
Quote from: David Makin
Actually I may not be correct on second thoughts - .... ....I can't remember if there's an overhead for indexing off the stack compared with reading from a fixed address. I'm now a bit over-eager to avoid using variables after finding out that UF does absolutely the most it can at compile time - ....

There should be a document made available to all formula writers on how the fractal application handles various situations, including what goes on within the compilers and interpreters. It sure would make for better formula writers, not to mention better formulae, if such detailed knowledge were required reading before everybody decided they wanted to try their hand at such things.
David Makin
« Reply #32 on: March 24, 2009, 12:05:24 PM »
Quote from: Nahee_Enterprises
There should be a document made available to all formula writers on how the fractal application handles various situations, which includes what goes on within compilers and interpreters. It sure would make better formula writers, not to mention better formulae, if such detailed knowledge were required reading before everybody decided they wanted to try their hand at such things.

Well, the material is there - it's the Intel documentation on their assembly language. With respect to UF code, just think about the best way of converting the UF source to assembly code, and that's basically what Frederik's compiler does. Of course, not everyone can be bothered to learn assembly code....
Nahee_Enterprises
« Reply #33 on: March 24, 2009, 01:48:03 PM »
Quote from: David Makin
....with respect to UF code just think what's the best way of converting the UF source to assembly code and that's basically what Frederik's compiler does. Of course not everyone can be bothered learning assembly code....

Which is why there should be a document explaining exactly what the compiler is doing - something that the many ordinary formula writers would be capable of understanding. A document that gives details and examples of the correct way to write formulas, and of what happens when one does otherwise. A document with various tips, tricks, and "I Gotchas" to assist the formula writer. It has been many years since I wrote code directly in assembler language, but I do know that this is something the average and/or new formula writer would never wish to learn. So a document about the compiler and what it does should be essential to a fractal application.
HPDZ
Iterator

Posts: 157
« Reply #34 on: March 24, 2009, 05:51:49 PM »
My two cents on some of these comments:
1. There is essentially no difference on Intel processors between retrieving a variable from the stack and retrieving one from some non-stack memory location. There may be a slight overhead for dereferencing a pointer versus accessing a direct address, depending on what the cache is doing, how well the look-ahead logic is working, which processor you have, etc. Generally, all modern CPUs have filled the cache with the data your instruction will need long before the instruction executes, unless you're accessing a whole bunch of widely separated memory locations in a row.
2. Does the UF compiler actually compile down to machine code?
3. If UF does indeed produce "good" machine code, then by far the dominant time-consuming part of evaluating an expression such as z^p, where p and z are both complex numbers, will be all the trigonometric, exponential, and logarithmic functions involved. It will probably take on the order of 1000-10000 times as much time as accessing any number in memory, or any other housekeeping-type operation in a fractal formula's loop (i.e. looking up a user parameter, incrementing and testing a loop control variable, etc.).
4. Documentation...is good, of course. One problem with documenting nitty-gritty details is that if you ever change anything, you have to change the documentation. But as far as documenting the UF language for purposes of speed optimization goes, I would speculate that it won't make much difference in most circumstances. If the code is getting down to the point where details like the CPU cache and branch prediction are important considerations, then documenting the compiler won't help much, because the different flavors of CPU all do these things slightly differently (e.g. the Core 2 works differently than the Pentium 4). When this kind of optimization is important, you generally need special performance-monitoring tools that can report on things like cache misses, branch prediction failures, and concurrency failures due to dependencies in the code. Often these optimizations are tailored to a specific processor based on empirical testing. Neither AMD nor Intel is good about documenting the details of the more sophisticated stuff in their processors (e.g. the cache algorithms, branch prediction algorithms, scheduling across multiple execution units, etc.), I suspect because it's proprietary information they don't want to give away.
5. That said, of course, good programming practices should be used, I'm sure. I suspect that's what Nahee_Enterprises was referring to.... Things like saving results of complicated calculations that are constant across a loop rather than calculating them over and over in each loop iteration, etc. I'd love to help with this, but I don't know if my experience with C and assembly language programming has much applicability to UF programming, which I am still quite new at.
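To see where the time in point 3 goes, here is a Python sketch of the full complex power computed the textbook way, z^p = exp(p·ln z). This is only an illustration of the operations involved, not UF's actual code path:

```python
import math

def complex_pow(z, p):
    """Compute z**p via exp(p * ln z) -- the transcendental-heavy route."""
    r = abs(z)                              # sqrt(re^2 + im^2)
    theta = math.atan2(z.imag, z.real)      # one trig-class call for the angle
    ln_z = complex(math.log(r), theta)      # one log
    w = p * ln_z                            # a cheap complex multiply
    mag = math.exp(w.real)                  # one exp
    # ...and two more trig calls to convert back to rectangular form:
    return complex(mag * math.cos(w.imag), mag * math.sin(w.imag))

# The handful of transcendental calls above dwarfs the cost of any
# variable lookup in the surrounding iteration loop.
print(complex_pow(1 + 2j, 0.5 + 0.5j))
print((1 + 2j) ** (0.5 + 0.5j))  # Python's built-in operator for comparison
```

The log, exp, and trig calls each cost far more than the multiplies and adds around them, which is why they dominate the loop time.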
David Makin
« Reply #35 on: March 24, 2009, 06:08:27 PM »
Quote from: HPDZ
2. Does the UF compiler actually compile down to machine code?
3. If UF does indeed produce "good" machine code, then by far the dominant time-consuming part of evaluating an expression such as z^p where p and z are both complex numbers will be all the trigonometric, exponential, and logarithmic functions involved. It will probably be on the order of 1000-10000 times as much time as accessing any number in memory, or any other housekeeping-type operation in a fractal formula's loop (i.e. looking up a user parameter, incrementing and testing a loop control variable, etc.).

2. Yes, I think so (it would be a whole lot slower otherwise).
3. Point taken on the times. Actually, I'd guess evaluating z^p is probably only around 500 times slower than looking up a variable on a modern FPU. Also, if that's what your formula needs to do then you can't optimise it away - you can only optimise what it's possible to optimise. Even shaving 0.1% off the render time can make a big difference when doing large disk renders or long animations (which may take days or weeks in total).
« Last Edit: March 24, 2009, 06:11:10 PM by David Makin »
David Makin
« Reply #36 on: March 24, 2009, 07:44:26 PM »
<snip> Actually I'd guess evaluating z^p is probably only around 500* slower than looking up a variable on a modern FPU
I meant when computing the full complex calculation. I just checked the following in UF to verify: if you have a value z^p where p is a run-time variable, then UF has to compile code allowing for computation of the full complex calculation. If, however, p is a known value at compile time (such as using z^(@power-1)), then UF checks the value and compiles the code accordingly - so raising to small integer powers 2, 3, 4 etc. is orders of magnitude faster, and in fact it's also quite a bit faster if the compile-time value is purely real or purely imaginary.
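The integer-power fast path presumably comes down to replacing the exp/log route with plain multiplications. Here is a sketch in Python of exponentiation by squaring (my illustration of the general technique, not UF's actual generated code):

```python
def int_pow(z, n):
    """Raise complex z to a non-negative integer power n using only
    multiplications (exponentiation by squaring) -- no log, exp, or trig."""
    result = 1 + 0j
    base = z
    while n:
        if n & 1:          # fold in the current bit's contribution
            result *= base
        base *= base       # square the base for the next bit
        n >>= 1
    return result

# z^5 needs just 3 squarings and 2 multiplies this way, versus a log,
# an exp, and several trig calls for the general z^p route.
print(int_pow(1 + 2j, 5))
```

The multiply count grows only with the bit length of the exponent, which is why small integer powers compile to something so much faster than the general case.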
Nahee_Enterprises
« Reply #37 on: March 24, 2009, 07:52:01 PM »
Quote from: HPDZ
4. Documentation...is good, of course. One problem with documenting nitty-gritty details is that if you ever change anything, you have to change the documentation. .....
5. That said, of course, good programming practices should be used, I'm sure. I suspect that's what Nahee_Enterprises was referring to.... Things like saving results of complicated calculations that are constant across a loop rather than calculating them over and over in each loop iteration, etc.

Thank you, that was pretty much what I was getting at. And without sufficient documentation, it is difficult to have good programming practices (something I have always tried to follow for many years, and something I attempt to teach my students). Yes, change usually does require updating the documentation. That is just something that needs to be done with every version/release level, whether it be an Installation Guide, a User Manual, or a Formula Writer's Guide.
David Makin
« Reply #38 on: March 24, 2009, 08:57:50 PM »
Quote from: Nahee_Enterprises
4. Documentation...is good, of course. One problem with documenting nitty-gritty details is that if you ever change anything, you have to change the documentation. .....
5. That said, of course, good programming practices should be used, I'm sure. I suspect that's what Nahee_Enterprises was referring to.... Things like saving results of complicated calculations that are constant across a loop rather than calculating them over and over in each loop iteration, etc.
Thank you, that was pretty much what I was getting at. And without sufficient documentation, it is difficult to have good programming practices (something I have always tried to follow for many years, and something I attempt to teach my students). Yes, change usually does require updating the documentation. Just something that needs to be done with every version/release level, whether it be an Installation Guide, a User Manual or a Formula Writer's Guide.

Hi Paul - looks like you're after the sort of documentation that isn't even included in the Visual Studio manuals, or the Xcode manuals......
gamma
« Reply #39 on: March 25, 2009, 12:40:49 AM »
Matlab is probably the slowest environment, based on some continuing standard from 1915. Yet it is so nice to get back to an interpreter - just coding, having fun... but it can link external code, and it has a C compiler.
Nahee_Enterprises
« Reply #40 on: March 26, 2009, 01:52:43 AM »
Quote from: David Makin
Hi Paul - looks like you're after the sort of documentation that isn't even included in the Visual Studio manuals, or the Xcode manuals......

In my own opinion, Microsoft has never been able to produce good documentation. And that includes their books, CDs, PDFs, Help files (plain, compiled, or HTML), and anything they have online. I was thinking of something more like what O'Reilly publishes with their technical books, in particular their "In A Nutshell" series. Or maybe something like the old IBM programming manuals from the mainframe days.
lycium
« Reply #41 on: March 26, 2009, 08:32:22 AM »
Quote from: Nahee_Enterprises
In my own opinion, Microsoft has never been able to produce good documentation.

While that's generally the case, "never" is perhaps too strong - I have fond childhood memories of QBASIC's help system, and these days I find MSDN to be quite helpful, especially the DirectX documentation and examples. Still, the unimaginable worldwide frustration caused by Clippy will not soon be forgotten!
HPDZ
« Reply #42 on: March 27, 2009, 01:00:35 AM »
Quote from: David Makin
Actually I'd guess evaluating z^p is probably only around 500 times slower than looking up a variable on a modern FPU

The "1000-10000" figure I put out was pretty much a SWAG, partially influenced by my outdated memory of the old 8087. A quick check of the relevant documentation for modern CPUs shows your estimate of 500 is closer to the truth. We need one log, one exp, two trig functions, and a few multiplications. Depending on the exact CPU type, this can all be done with a throughput of about 200-300 clock cycles, ignoring load/save time for intermediate calculations and neglecting any cache or latency issues. Loads usually have a throughput of 1 clock cycle.
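Rather than reasoning from cycle tables, the ratio can also be measured directly. A rough Python sketch (interpreter overhead dominates here, so the measured ratio will come out far smaller than the native-code estimates above, but the ordering still shows):

```python
import timeit

z = 0.5 + 0.3j
p = 2.5 + 0.1j
n = 200_000

# Time a bare variable read versus a full complex power, n times each.
lookup = timeit.timeit('x = z', globals={'z': z}, number=n)
power = timeit.timeit('x = z ** p', globals={'z': z, 'p': p}, number=n)

print(f'variable read: {lookup:.4f}s, z**p: {power:.4f}s, '
      f'ratio ~{power / lookup:.1f}x')
```

For native-code estimates, per-instruction latency and throughput tables in the CPU vendor's optimisation manuals are the place to look.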
lycium
« Reply #43 on: March 27, 2009, 08:39:29 AM »
Quote from: HPDZ
Actually I'd guess evaluating z^p is probably only around 500* slower than looking up a variable on a modern FPU
The "1000-10000" figure I put out was pretty much a SWAG, partially influenced by my outdated memory of the old 8087. A quick check of the relevant documentation for modern CPUs shows your estimate of 500 is closer to the truth. We need one log, one exp, two trig functions, and a few multiplications. Depending on the exact CPU type, this can all be done with a throughput of about 200-300 clock cycles, ignoring load/save time for intermediate calculations, and neglecting any cache or latency issues. Loads have a throughput of 1 clock cycle usually.

memory accesses run into the thousands of cpu cycles these days, partly due to these things being massively speculative and superscalar; x86 performance programming has been all about caching and careful memory access since the pentium1. so while memory access has gotten much much faster (in both latency and throughput - ~50ns and 25gb/s these days), computational throughput has grown exponentially faster still, to the point where you have arrays of x86 threads, all memory-starved! it is generally faster to recompute a lot of things (even a 3-vector normalisation, the evil 1/sqrt pair) than to do an uncached load. anyway, no more speculation: these things can easily be measured. on something like a core 2 duo (and newer) you'll be really amazed at what you find
« Last Edit: March 27, 2009, 09:02:20 AM by lycium »
lycium
« Reply #44 on: March 27, 2009, 09:06:53 AM »