Title: Implemented the Mandelbrot loop in Lisp. Post by: Pauldelbrot on June 30, 2009, 09:04:12 AM

The really interesting thing is, it's a macro, and it can be called simply with "(:= z (+ (** z 2) c))" and it will spit out a loop that calculates the M-set.
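To give a flavor of what that means, here is a hand-written sketch of the kind of loop it emits for that input over IEEE doubles (the real thing is generated by the macro, and differs in detail):

Code:
;; Hand-written sketch of the emitted loop for (:= z (+ (** z 2) c))
;; over doubles; returns the iteration count at escape or max-iter.
(defn mandel-count ^long [^double c-r ^double c-i ^long max-iter]
  (loop [z-r 0.0, z-i 0.0, i 0]
    (let [zr2 (* z-r z-r)                ; squares shared with the bailout test
          zi2 (* z-i z-i)]
      (if (or (== i max-iter) (> (+ zr2 zi2) 4.0))
        i
        (recur (+ (- zr2 zi2) c-r)       ; Re(z^2 + c)
               (+ (* 2.0 z-r z-i) c-i)   ; Im(z^2 + c)
               (inc i))))))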
Not only that, it takes two parameters, bitsprecision and a number-system object. If bitsprecision is high enough, it will spit out a loop that uses bignums instead of IEEE doubles. Orthogonally, the number-system object is defined in terms of a list of unit-symbols, a unit multiplication table, and a collection of functions for reciprocal and transcendental functions. Passing different number systems to the macro generates different versions of the loop (a sketch of the shape of such an object appears below). I could generate a loop that iterated only real numbers (say, for generating a bifurcation diagram), and I've also implemented the number-system objects for quaternions and what Fractint calls hypercomplex numbers (except no transcendentals yet; I have to implement those for bignum reals and for non-reals).

Basically, it does symbolic algebra to convert the input assignment-like expressions and number-system info into more assignment-like expressions, representing the same math but decomposed into the individual real-number components. So two input expressions become four with complex arithmetic and eight with quaternions. These are then subjected to algebraic simplification, which can extract common factors, apply the distributive rule, collect like terms, and so on -- this step turns a (+ (* z-r z-i) (* z-i z-r)) into a (* 2 z-r z-i) in the case of the standard Mandelbrot fractal, for example. Yes, it does detect that terms are the same even if products or sums are rearranged. And since this step is applied after reduction to real-number arithmetic, it doesn't matter that quaternions are noncommutative; by the time simplify sees quaternion expressions, they're groups of four real expressions rather than quaternion expressions, and real multiplication is commutative.

The upshot: by implementing a fair bit of a symbolic-algebra package and one lousy macro I have ANY non-transcendental map, to ANY precision, in real, complex, quaternion, or hypercomplex arithmetic. I don't have to write the inner loops for everything separately. The computer does it for me. :)

Best of all, the loops are fast. The IEEE double loop it spits out for plain old complex quadratic Mandelbrot is 40% faster than Ultra Fractal's inner loop. The bignum loop is 7 times slower than UF's at 22 decimals, for now, but I plan to fix that. Plus this thing will get speed enhancements elsewhere. For starters, the system can in fact compile a loop on the fly for each specific image, or even each layer, with parameter values baked in as constants. So if I want to generate a Julia set image for c = i, the loop it generates will be short one add, because the real part of c is zero. Furthermore, some stuff you can't do in UF without multiple layers that duplicate one another's calculations, this software will do with a single calculation of each pixel and smarter handling of the resulting data.

I also plan to have faster bailout tests: instead of looking for convergence to a possibly-expensive-to-test shape, it will run a high-speed loop that tests for exit from a square or circular area, then iterate a lower-speed loop until it hits the target shape. So for most iterations, say 999,700+ out of 1 million for a deep point, it will not be using the expensive bailout test.

The Lisp being used is Clojure, which runs on a JVM. This means this stuff is portable to every system that has a JVM, fast on every system that has a good JITting JVM, and interoperates nicely with Java. (The current bignum implementation, which sucks, uses java.math.BigDecimal. I will probably adapt the BSD-licensed JScience LargeInteger code to make a custom one that's faster, particularly by using Karatsuba multiplication when it gets really REALLY big. With that, it will probably beat Fractint(!) at hundreds of decimals.)
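A guess at the shape of such a number-system object, for the complex case (assumed structure; only the name *n-complex*, which appears in later posts, is taken from the real code):

Code:
;; Assumed shape, illustrative only: unit symbols, a unit multiplication
;; table giving each product as [coefficient unit] pairs, and special
;; functions such as the reciprocal.
(defn complex-recip
  "1/(a+bi) = (a - bi)/(a^2 + b^2), with components as a vector [a b]."
  [[a b]]
  (let [d (+ (* a a) (* b b))]
    [(/ a d) (/ (- b) d)]))

(def *n-complex*
  {:units     '[r i]
   :mul-table '{[r r] [[ 1 r]]    ; 1*1 =  1
                [r i] [[ 1 i]]    ; 1*i =  i
                [i r] [[ 1 i]]    ; i*1 =  i
                [i i] [[-1 r]]}   ; i*i = -1
   :fns       {:recip complex-recip}})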
The Java interoperation means programs built on this core have ready access to GUI, PNG encode/decode, other image formats, the network, and so forth. Last but not least, Clojure's eval is available at deployment time, so the formula parser is nearly fully written: the user will be able to just type "z^4 - z^2 + c" and the computer will grind away, compile a function to JVM byte-codes to iterate that really, really fast, and off it goes, with the JIT putting the finishing touch on it.

Did I mention that the loop-generating code does common subexpression elimination? And that all of this has been tested and found to work? 100 million iterations in under 700 msec on my E5200; UF takes about 1 second to do that many iterations on one core on the standard Mandelbrot zoomed into the cardioid, with guessing and periodicity checking turned off.

That, by the way, is what's next: adding periodicity logic to the emitted loops, controlled by a parameter, and a slightly smarter bailout. Right now it generates a circular/spherical bailout zone, though it also does common subexpression elimination between the iteration steps and computing the squared modulus. It needs to be smarter: detecting whether it's cheaper to test a sphere or a square and choosing accordingly what code to emit. For the normal M-set, a sphere is cheap because x^2 and y^2 are needed anyway. Add an add and a compare and you've got your bailout test, versus four compares for a square bailout. On the other hand, if there were no x^2 or y^2 already needed elsewhere, the circular bailout test becomes two multiplies, an add, and a compare instead of four compares. Since adds, subtracts, and compares tend to be comparable in expense, and multiplies tend to be more expensive than any of those, four compares is cheaper in that instance. A multiply, an add, and a compare might be cheaper than four compares, though, so it might be worthwhile to use a sphere if it's a complex number and one of the squares is needed. For higher dimensions d, there are 2d compares needed for a hypercube, versus d multiplies, d - 1 adds, and a compare for a sphere, so many of the squares would need to be already needed for this to be a winner. How many? Probably hardware-dependent. I'll have to experiment.

Further down the line: add distance estimator capability. I've determined that to support most of the common colorings, and many of my own custom ones, you just need the last iterate and the number of iterations. The major exceptions:
Clojure supports a concept of lazy sequences, so we can generate a lazy sequence of iterates that, under the hood, runs a fast iteration loop to spit out 64,000 at a time (or however many), then trickles these out to the sequence's consumer, then generates some more. (A sketch of this appears at the end of this post.) Alternatively, the caller can just feed the existing macro a few extra expressions to compute, to accumulate a weighted sum or whatever. Heck, I don't even need to add a flag to compute a derivative/Jacobian; I can just create symbolic-algebra code to work out how to compute the derivative/Jacobian and then append those expressions to the list fed to the macro. The output loop will compute them all, and the derivative will be right after the last iterate in the array of output values. And the macro will do common subexpression elimination between all of these various calculations, so if xy appears in a lot of places, the output code will assign it to a temporary and then use that repeatedly.

Oh, and did I mention I've implemented what I've implemented so far in four days? In a week at that rate I'll have derivatives, Jacobians, and what-have-you. (Yes, it already will iterate functions of more than one variable, such as the Henon map.) In a month at that rate, a working demo that renders PNG images and breaks all kinds of speed records. It's shocking that nobody much else is apparently developing fractal software in Lisp, given what it's capable of. :)
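A minimal sketch of that chunked lazy sequence (illustrative: the real inner block would be a generated loop rather than clojure.core/iterate):

Code:
;; Produce iterates lazily, but realize them block-size at a time with
;; an eager inner step, then trickle them out to the consumer.
(defn iterate-seq [step z0 block-size]
  (letfn [(block [z]
            (lazy-seq
              (let [chunk (vec (take block-size (iterate step z)))]
                (concat chunk (block (step (peek chunk)))))))]
    (block z0)))

;; (take 3 (iterate-seq #(+ (* % %) 0.25) 0.0 64000)) => (0.0 0.25 0.3125)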
Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Pauldelbrot on June 30, 2009, 09:15:47 AM

For higher dimensions d, there are 2d compares needed for a hypercube, versus d multiplies, d - 1 adds, and a compare for a sphere, so many of the squares would need to be already needed for this to be a winner. How many? Probably hardware-dependent. I'll have to experiment.

To hell with experimenting. I'll just have the computer do it. It would be easy to add a flag parameter to generate either spherical or cubical bailout zones and then a wrapper function to generate both, test them on an interior point, and return the faster one. Finding an interior point might be difficult. Sometimes there are none -- disconnected quadratic Julia sets come to mind. Maybe for the speed tests it can generate loops that do the bailout test but don't actually quit if it's triggered, run each a few times so the JIT kicks in, then generate a normal bailing-out loop using whichever method was faster. Of course, stopping the compiler optimizer from eliminating the unused test would be tricky. Perhaps use the test to flip the sign of a number, so it matters semantically, in dummy loops. In those dummy loops the actual iterations don't matter anyway, other than how fast or slow they are.

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Pauldelbrot on July 01, 2009, 08:43:30 AM

To hell with experimenting. I'll just have the computer do it. [...] In those dummy loops the actual iterations don't matter anyway, other than how fast or slow they are.

Done. In fact, the original macro is now a function that returns a function (so, a higher-order function, albeit one that actually compiles its return value :)). It now takes several new or changed parameters.

The bailout may now be positive, negative, zero, or nil. Nil means no bailout test, for mappings with no attractor at infinity. Zero means only use a passed-in bailout function. Positive means spherical bailout with that radius. Negative means cubical bailout with twice its absolute value as the side length. The emitted function will, if the bailout wasn't zero, run a high-speed loop until the bailout or the iteration limit is hit. Then, if the caller supplied a symbol instead of nil for the bailout test function, it will loop some more, this time calling the function bound to the supplied symbol for the bailout test.

So suppose, for instance, we want a star-shaped bailout area, one in which a circle of radius 5 centered on the origin, or a square of side length 4 centered on the origin, can be inscribed without clipping the inner points. We can call the function with a bailout number of 5 or -4 and the syntax-quoted name of a test function that takes two arguments (for the complex case) and decides whether the point is outside the star yet. We get back a loop that runs until the point lands outside the circle or square, using the fast test for such a shape, and then iterates a bit more until the star-test function returns true to indicate the point landed outside the star. The function returns with the first iterate that landed outside the star, but a lot faster than if it did a complicated test for a star-shaped region every iteration.

In addition, it does loop unrolling, controlled by another parameter. 1 yields normal loops. 2 or higher unrolls the loop and moves the iteration counting outside of the unrolled portion; so instead of a loop that runs maxit = 1000 passes of one copy of the iteration, it might run maxit = 500 passes of two copies, or even maxit = 250 passes of four, and in theory this could save time by only testing and incrementing i every nth iteration.

Last but not least, there's a driver function that takes some of the parameters, including separate circular and square bailouts, and calls the function-generator with various combinations of square or circular bailout and unrolling amount, passing a flag that generates a dummy function that just takes maxiters, treats the other input parameters as zeros, and doesn't really bail out, though it does the bailout tests. These are all timed, and whichever combination of parameters yields the fastest time is used to produce a non-dummy iteration function that is then returned. The timing function runs the dummy iterator through a million iterations fifteen times in a row, averaging the time taken for the last ten runs; the first five are discarded, since the JIT won't have kicked in for the first few.
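In sketch form (hypothetical names, not the actual driver):

Code:
;; Run the dummy iterator through a million iterations fifteen times;
;; discard the first five runs (JIT warm-up) and average the last ten.
(defn avg-dummy-ns [dummy-iter]
  (let [runs (vec (for [_ (range 15)]
                    (let [t0 (System/nanoTime)]
                      (dummy-iter 1000000)
                      (- (System/nanoTime) t0))))]
    (/ (reduce + (subvec runs 5)) 10.0)))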
So far, it almost always chooses the circular bailout and a non-unrolled loop; almost every nontrivial nonlinear function of z involves x^2 and y^2. Interestingly, it prefers circular bailout even to iterate z = z + c, which doesn't involve x^2 or y^2. Perhaps the E5200 I'm running it on has a very fast fmul instruction?

Regardless, I've continued testing the stock M-set iteration, and it continues to emit a loop for it that completes in a blistering 7 nanoseconds per iteration on this box. That's maybe 17 CPU cycles. I can account for most of them: integer compare, branch (if i > max), fmul, fmul, fadd, float compare, branch, fmul, fmul, fadd, fsub, fadd, inc, branch; that's at least fourteen cycles right there. A couple of the instructions are probably not able to be pipelined, accounting for the other three. The code I wrote, combined with Clojure's compiler and Sun's HotSpot server VM JIT, appears to be generating about the fastest loop possible.

Did I mention I've added a distance estimator? Well, not quite. But the iterator-generator can take some forms representing the mapping and now also another collection of forms representing auxiliary computations, which will be performed each time around the loop but whose loop variables won't participate in bailout calculations. The auxiliary computations can, and generally will, refer to the variables in the mapping (e.g. z). Perfect for accumulating the derivative for a distance estimator, or similar.

Next up: adding another flag to emit a loop either with or without periodicity checking, perhaps even a specific period to look for.

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: cKleinhuis on July 01, 2009, 07:19:24 PM

wonderful, i love such detailed monologues :D
cheers

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Pauldelbrot on July 02, 2009, 07:48:45 AM

wonderful, i love such detailed monologues :D

cheers

Thanks.

The loop-generator can now emit iteration functions like before, or with extra parameters and with a spherical or a square test for the iterated point hitting a point specified by those extra parameters. This will be used for:
It may seem limited that it will only check for one point, but this is actually a feature, not a bug. In practice, when any rendering method hits a point that goes to a particular attractor, it tends to hit lots more in a row that share the same fate, simply because basins, M-set components, and the like tend to be unions of contiguous regions of space. So, the iterator will do something like this:
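In rough sketch form (hypothetical, with made-up names, not the actual plan):

Code:
;; Reuse the attractor the previous non-escaping pixel converged to,
;; since neighbouring pixels tend to share the same fate; fall back to
;; the fast no-check loop when escape looks likely. Sketch only.
(defn iterate-pixel [no-check-loop checking-loop last-attractor c max-iter]
  (if-let [attr @last-attractor]     ; atom holding the last attractor seen
    (checking-loop attr c max-iter)  ; can stop after a handful of iterations
    (no-check-loop c max-iter)))     ; faster when the point escapes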
For the quadratic M-set iteration, the no-check loop emitted runs at 7 ns per iteration; the checking loop runs at about 10 ns per iteration. The no-check loop, being 30% faster, is therefore preferred if it's likely the current point will escape. The checking loop is clearly preferred otherwise, with some check or another: my test point 0.2 + 0.1i (near the set's cardioid cusp) lands on 0.21556775949710844 + 0.1757887921270715i. With an iteration limit of 100,000,000, the no-check loop takes 700 ms to discover this fact, whereas the checking loop with that attractor plugged in iterates 25 times and stops, taking 150 microseconds, at 0.21556774298091957 + 0.17578884455030708i. 25 iterations of the checking loop take 250 ns, so nearly all of those 150 microseconds are function-call overhead! This indicates that the minimum number of sub-loop iterations when chaining should be 150 microseconds' worth, or 15,000 iterations(!) in the case of the normal M-set loop. I will add a function to the code to compute this minimum for whatever actual mapping is being used; this should be simple: 150,000 divided by the average time to execute one iteration, in nanoseconds.

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Nahee_Enterprises on July 03, 2009, 12:52:13 AM

It's shocking that nobody much else is apparently developing fractal software in Lisp, given what it's capable of. :)

I can see that you are really on a roll with this, and very much stoked. I would cheer you on, but I am afraid what it would do to you in your current state of arousal. ;) :evil1: Keep us informed as to the current progress. I am curious about comparison times once you have completed the bulk of the coding. (And would love trying it out.) :D

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: David Makin on July 03, 2009, 01:35:32 AM

It's shocking that nobody much else is apparently developing fractal software in Lisp, given what it's capable of. :)

When your project is in a form others can use then please count me in as a beta tester :) Also - what development software are you using ?

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Pauldelbrot on July 03, 2009, 01:42:35 AM

When your project is in a form others can use then please count me in as a beta tester :) Also - what development software are you using ?

NetBeans.

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Pauldelbrot on July 03, 2009, 12:32:32 PM

Periodicity checking has been added more fully to the loop generator. I found that the Clojure function-call overhead was fairly steep, to the point that one function call took as long as 15,000 raw Mandelbrot iterations, so I rewrote the loop generator to take two more parameters, per-check? and per-check-brent?, and if either is present, wrap its output loop in an outer loop that implements the appropriate algorithm.
It can now output four kinds of loop:
The inner loops are limited to 32-bit integer iteration counts, so if for instance the step size in the specific-period check is bigger than about 2 billion, the largest factor of the period smaller than that will be used instead, and the Brent algorithm will eventually switch from powers to a linear sequence. The first two loop types actually come out with a period-check loop checking for a very large (around 2 billion) period. The outer loops can use bignums to count iterations, though on present-day hardware, only counts in the low billions will be very useful. :)

All this has, as usual, already been tested and seen to work. Including running a Mandelbrot point through five billion iterations, more than a 32-bit integer can track, at decent speed, because bignum arithmetic is only used every couple of billion iterations. (It stopped at about the size limit of a 32-bit unsigned integer, actually, but not because of any bug; rather, because that iteration was the same as iteration 2147483647, and the way it handles bigger max-iters values is to force on period-testing with that period.)

And last but not least, this was generated from a generated loop:

Code: [ASCII-art M-set render]

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Pauldelbrot on July 03, 2009, 12:55:34 PM

Code: [ASCII-art M-set render]

What's missing from this picture is the impressive accuracy. Iteration limit: five billion, with a "b". It took three seconds to spit that out, thanks to Brent's algorithm.

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: cKleinhuis on July 03, 2009, 05:54:16 PM

oh, with that iteration depth it would be interesting to make a new "DEEPEST MANDELBROT" picture :D
O0 O0 O0

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Pauldelbrot on July 04, 2009, 08:36:05 AM

Didn't do as much today, as I was busy much of the day with other things.
But it now has code to measure the costs of the basic arithmetic operations -- add, subtract, negate, multiply, and divide -- on whatever type of number is being used. So far, the formula compiler only uses this to optimize some expressions of the form (* 2 x y) into (+ (* x y) (* x y)) and the like (which, with common subexpression elimination, will end up as an add and a multiply instead of two multiplies). Any multiplication with an integer coefficient n is potentially replaceable with a sum, a negation, or a negation of a sum. The question is whether a multiply costs more than n - 1 adds and possibly a negation. The costs of adds, multiplies, and negations are benchmarked for each newly-seen precision of bignum, and for doubles, and when the compiler is called upon to supply a loop for a particular mapping at a particular precision, it will use this data to optimize the loop.

On my hardware, all the basic FP operations take the same amount of time, so the normal Mandelbrot inner loop's y recurrence is (* 2 (* x y)) after being broken down into unary and binary expressions. But for bignum arithmetic, a multiply is almost always costlier than an add, so this becomes (+ (* x y) (* x y)), with CSE reducing that to one multiply and one add. It changes * 2 and * 3 to sums immediately when bignums first are needed, but * 37 is not changed into adds before the number of bits of precision gets into the upper hundreds. :)

The next planned optimization is to generate all the possible breakdowns into binary operations, including extracting common factors from sums, and tot up the costs of each version, picking the version that is cheapest.
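A sketch of the kind of measurement involved (assumed code, not the actual benchmarker):

Code:
;; Time one arithmetic op on BigDecimals at a given precision. A real
;; benchmark also needs JIT warm-up; acc is kept live so the work
;; can't be discarded as dead code.
(import 'java.math.MathContext)

(defn mean-op-ns [op ^long reps]
  (let [x  (BigDecimal. "1.2345678901234567890123")
        t0 (System/nanoTime)]
    (loop [i 0, acc x]
      (if (== i reps)
        (do (.signum ^BigDecimal acc)   ; keep acc live
            (/ (double (- (System/nanoTime) t0)) reps))
        (recur (inc i) (op acc x))))))

(let [mc (MathContext. 22)]   ; 22 decimal digits, as mentioned earlier
  {:mul-ns (mean-op-ns (fn [^BigDecimal a ^BigDecimal b] (.multiply a b mc)) 100000)
   :add-ns (mean-op-ns (fn [^BigDecimal a ^BigDecimal b] (.add a b mc)) 100000)})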
Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Pauldelbrot on July 05, 2009, 10:00:09 PM

Not gotten much done. I did build the expression rearranger, but testing it revealed a problem. I knew the number of possible rearrangements would grow exponentially, but it turns out to be a very FAST exponential. The real component of z^2 + c has six possible variations; that of z^3 + c, 29; that of z^4 + c, over 700; and that of z^5 + c, enough to crash the computer.

Needless to say, another approach to optimizing is needed. There are three major optimization possibilities:
The trickiness is that CSE interacts unfavorably with common factors -- if we have (+ (* 2 x y) (* 2 x z)) we can extract a common factor of (* 2 x) and turn four multiplies into three. On the other hand, if elsewhere we have (+ (* 2 y) (* 2 z)), CSE can share the products (* 2 y) and (* 2 z), saving two multiplies, if the sums don't get simplified; if they do, it can factor out (+ y z). CSE then saves an add, but no multiplies. On the other hand, factoring saved two multiplies in this case, one from each sum. If the expression grouper chooses to group the first factored sum as (* x (* 2 (+ y z))), CSE can factor out the inner product, saving another multiply, but not if the expression grouper groups the product with x in the inner product. So this example shows that the expression grouper needs to be smarter about CSE opportunities somehow.

Another case would be a three-term sum with two terms sharing a common factor, and a different two terms a different one. How should (+ (* 2 x) (* 2 y z) (* z x)) be factored? As (+ (* 2 (+ x (* y z))) (* z x)), as (+ (* 2 x) (* z (+ (* 2 y) x))), or as (+ (* x (+ 2 z)) (* 2 y z))? All three save a multiply, so any difference between them will come from how they interact with CSE; if there are lots of occurrences of 2z as a factor in various products, the third one is the winner, whereas if there are lots of occurrences of yz elsewhere in the expressions to be calculated, the first one is the winner, while lots of xy would make number two come out on top.

One thought I had was to find every possible subexpression and see which ones occurred the most, eliminating those first and rearranging to suit, but that runs afoul of exponential growth again, and my experimental code for it also seems to provoke an implementation bug of some sort, which I have not yet heard back from the tool-chain developers about. Now I'm thinking I may have to resort to a few heuristics, such as preferring to keep squares and higher powers together (they can benefit from internal CSE and from CSE with circular-bailout calculations), and building a list of the subexpressions as it groups terms, rather than later, letting the already-listed subexpressions influence future term-grouping (to favor keeping copies of such subexpressions together, enabling CSE). This can be done in one optimization pass, with small-integer products optimized into adds, or shifts-and-adds, in a second pass afterward.

* It occurs to me that it might be more efficient to bit-shift a BigDecimal than to add it to itself, perhaps recursively; though BigDecimal has no shift methods, (BigDecimal. (.shiftLeft (.unscaledValue x) n) (.scale x)) will multiply x by 2^n and may be faster, at least for larger n. (Tested, works; with x = 17 and n = 2, it produced 68, as expected.)
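Wrapped as a function (a hypothetical wrapper around the expression in the footnote above):

Code:
;; Multiply a BigDecimal by 2^n by shifting its unscaled BigInteger and
;; keeping the scale; likely cheaper than repeated .add for larger n.
(defn bd-shift-left ^BigDecimal [^BigDecimal x n]
  (BigDecimal. (.shiftLeft (.unscaledValue x) (int n)) (.scale x)))

;; (bd-shift-left 17M 2) => 68M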
Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Pauldelbrot on July 07, 2009, 02:17:02 PM

The first part of the optimizer is done. It behaves more or less as described:

(defn optimize-forms-for [bitsprecision iter-forms other-forms squares?]
  ... )

user=> (optimize-forms-for 55 '[(:= x (* x z))] '[(:= y (+ y x))] false) ; Does nothing
[((:= x (* x z))) ((:= y (+ y x)))]
user=> (optimize-forms-for 55 '[(:= x (* z x x))] '[(:= y (+ y x))] false) ; Doesn't keep x^2 together.
[((:= x (* x (* z x)))) ((:= y (+ y x)))]
user=> (optimize-forms-for 55 '[(:= x (* z x x))] '[(:= y (+ y x))] true) ; Now does keep x^2 together.
[((:= x (* z (* x x)))) ((:= y (+ y x)))]
user=> (optimize-forms-for 55 '[(:= x (* z (/ a) x x (/ b)))] '[(:= y (+ y x))] true) ; Keeps reciprocals together.
[((:= x (* (* (/ a) (/ b)) (* z (* x x))))) ((:= y (+ y x)))]
user=> (optimize-forms-for 55 '[(:= x (* z (/ a) x x (/ b)))] '[(:= y (+ (* y q y) x))] true) ; Only keeps squares used for bailouts together.
[((:= x (* (* (/ a) (/ b)) (* z (* x x))))) ((:= y (+ (* y (* y q)) x)))]
user=> (optimize-forms-for 55 '[(:= x (* z (/ a) x x (/ b)))] '[(:= y (+ (* x q x) x))] true) ; CSE can combine the x^2s here.
[((:= x (* (* (/ a) (/ b)) (* z (* x x))))) ((:= y (+ (* q (* x x)) x)))]
user=> (optimize-forms-for 55 '[(:= x (* z (/ a) x x (/ b)))] '[(:= y (+ (* -1 q x) x (* -1 z) y))] true) ; Keeps negative terms together.
[((:= x (* (* (/ a) (/ b)) (* z (* x x))))) ((:= y (+ (+ (* -1 (* q x)) (* -1 z)) (+ x y))))]
user=> (optimize-forms-for 55 '[(:= x (* z (/ a) x x (/ b)))] '[(:= y (+ (* -1 q x) x (* -1 z) y k))] true) ; x and k stayed separated.
[((:= x (* (* (/ a) (/ b)) (* z (* x x))))) ((:= y (+ (+ x y) (+ k (+ (* -1 (* q x)) (* -1 z))))))]
user=> (optimize-forms-for 55 '[(:= x (* z (/ a) x x (/ b) (+ x k)))] '[(:= y (+ (* -1 q x) x (* -1 z) y k))] true) ; Now x and k kept together.
[((:= x (* (* z (+ x k)) (* (* x x) (* (/ a) (/ b)))))) ((:= y (+ (+ (* -1 (* q x)) (* -1 z)) (+ y (+ x k)))))]
user=> (optimize-forms-for 55 '[(:= x (* x y (/ (+ (* 3 y x z) 1)) z))] [] true) ; Groups the product xyz twice.
[((:= x (* (/ (+ (* 3 (* z (* y x))) 1)) (* z (* y x))))) ()]
user=> (optimize-forms-for 55 '[(:= x (* x y (/ (+ (* 3 y w z) 1)) z))] [] true) ; Misses an opportunity to factor out yz.
[((:= x (* (/ (+ (* 3 (* z (* y w))) 1)) (* z (* x y))))) ()]
user=> (optimize-forms-for 55 '[(:= x (+ x -1 (* x y (/ (+ a 1))))) (:= y (* 2 y (/ b) y (/ (+ a 1)) x))] [] true) ; Groups xy/(a + 1) for CSE in preference to keeping divisions together.
[((:= x (+ (* (/ (+ a 1)) (* x y)) (+ x -1))) (:= y (* 2 (* (/ b) (* y (* (/ (+ a 1)) (* x y))))))) ()]
user=> (optimize-forms-for 55 '[(:= x (* 3 x z))] '[(:= y (* 3 y z))] true) ; Here it misses the opportunity to factor out 3z, unfortunately.
[((:= x (* 3 (* x z)))) ((:= y (* 3 (* y z))))]
user=> (optimize-forms-for 55 '[(:= x (+ a (* 3 x)))] '[(:= y (+ b (* 3 x y)))] true) ; Here it doesn't though.
[((:= x (+ a (* 3 x)))) ((:= y (+ b (* y (* 3 x)))))]
user=> (optimize-forms-for 55 '[(:= x (+ a (* 3 x)))] '[(:= y (+ b (* 6 x y)))] true) ; It will even factor the 3x out of 6xy. Doesn't save a multiply though.
[((:= x (+ a (* 3 x)))) ((:= y (+ b (* 2 (* y (* 3 x))))))]
user=> (optimize-forms-for 55 '[(:= x (+ a (* 3 x) (* 2 y)))] '[(:= y (+ b (* 6 x y)))] true) ; This time it does!
[((:= x (+ (* 2 y) (+ a (* 3 x))))) ((:= y (+ b (* (* 3 x) (* 2 y)))))]
user=> (optimize-forms-for 55 (simplify-forms (real-forms *n-complex* '[(:= z (+ (** z 2) c))])) [] true) ; Standard M-set iteration gives the usual result.
[((:= z-r (+ (* -1 (* z-i z-i)) (+ c-r (* z-r z-r)))) (:= z-i (+ c-i (* 2 (* z-i z-r))))) ()]
user=> (optimize-forms-for 55 (simplify-forms (real-forms *n-complex* '[(:= z (+ (** z 3) c))])) [] true) ; Cubic M-set iteration gives useful grouping of the squares.
[((:= z-r (+ (* z-r (* z-r z-r)) (+ c-r (* -3 (* z-r (* z-i z-i)))))) (:= z-i (+ (* -1 (* z-i (* z-i z-i))) (+ c-i (* 3 (* z-i (* z-r z-r))))))) ()]
user=> (optimize-forms-for 55 (simplify-forms (real-forms *n-complex* '[(:= z (+ (** z 2) c))])) ; Distance estimator aux iteration added to normal M-set calc.
         (simplify-forms (real-forms *n-complex* '[(:= dz (+ (* 2 z dz) 1))])) true) ; Opportunity missed to group either (* 2 z-r) or (* 2 z-i).
[((:= z-r (+ (* -1 (* z-i z-i)) (+ c-r (* z-r z-r)))) (:= z-i (+ c-i (* 2 (* z-i z-r))))) ((:= dz-r (+ (* 2 (* dz-r z-r)) (+ 1 (* -2 (* dz-i z-i))))) (:= dz-i (+ (* 2 (* dz-r z-i)) (* 2 (* dz-i z-r)))))]

It does three things:
To do:
I may also see if I can't pre-load the expressions to try to keep together, by looking for common factors between pairs of sums and pairs of products using the common-factor-finding code. (THAT code has been written and tested, and is employed elsewhere, but isn't yet used in the optimizer.) That might get rid of many of the red entries above (where it failed to find CSE-optimization opportunities). The distance estimator case, in particular, could save two multiplies if the optimizer had a bit more brains under the hood. I should have that done sometime tomorrow, and be on to other aspects of the job.
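For reference, the kernel of a common-factor finder over flat products might look like this (a guess, not the code referred to above):

Code:
;; Treat each flat product as a multiset of factors and intersect them:
;; '(2 x y) and '(2 x z) share the factors (2 x).
(defn common-factors [prod-a prod-b]
  (let [counts-b (frequencies prod-b)]
    (mapcat (fn [[f n]] (repeat (min n (get counts-b f 0)) f))
            (frequencies prod-a))))

;; (common-factors '(2 x y) '(2 x z)) => (2 x), in some order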
Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: cKleinhuis on July 07, 2009, 02:43:51 PM

cool, perhaps you could try every known fractal formula (at most the large Newtonian ones) and provide optimized versions for them :D

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Pauldelbrot on July 08, 2009, 05:31:34 AM

This code does better than that. Fed a formula in quasi-mathematical notation, it compiles and returns a function that, when called, executes a tight, optimized loop.
I've now got more smarts in it. The optimizer now factors common factors out of sums -- but only if doing so will save a multiply (factoring the x out of (x + 3x^2) doesn't: either way there are three multiplies, and factoring it out loses if you want to CSE x^2s later). Though not factoring may lose the ability to CSE a (1 + 3x) later, losing the chance to CSE one sum in return for the chance to CSE a product and a different sum is worth it, heuristically. Furthermore, the grouping now prefers pairs seen elsewhere, prefers to CSE out binary products with large numbers over ones with small numbers, and prefers to CSE out nonnumeric binary products, since other tricks may speed up integer multiplies.

The above test cases, plus two additional ones, now yield:

user=> (optimize-forms-for 55 '[(:= x (* x z))] '[(:= y (+ y x))] false)
[((:= x (* x z))) ((:= y (+ y x)))]
No change.
user=> (optimize-forms-for 55 '[(:= x (* z x x))] '[(:= y (+ y x))] false)
[((:= x (* x (* z x)))) ((:= y (+ y x)))]
No change.
user=> (optimize-forms-for 55 '[(:= x (* z x x))] '[(:= y (+ y x))] true)
[((:= x (* z (* x x)))) ((:= y (+ y x)))]
No change.
user=> (optimize-forms-for 55 '[(:= x (* z (/ a) x x (/ b)))] '[(:= y (+ y x))] true)
[((:= x (* (* z (* x x)) (* (/ a) (/ b))))) ((:= y (+ y x)))]
No meaningful change.
user=> (optimize-forms-for 55 '[(:= x (* z (/ a) x x (/ b)))] '[(:= y (+ (* y q y) x))] true)
[((:= x (* (* z (* x x)) (* (/ a) (/ b))))) ((:= y (+ (* y (* y q)) x)))]
No meaningful change.
user=> (optimize-forms-for 55 '[(:= x (* z (/ a) x x (/ b)))] '[(:= y (+ (* x q x) x))] true)
[((:= x (* (* z (* x x)) (* (/ a) (/ b))))) ((:= y (+ (* q (* x x)) x)))]
No meaningful change. With indiscriminate sum factoring, this loses the (* x x) grouping in the second expression, but with sum factoring limited to cases where it saves a multiply, it keeps the squares together for the win.
user=> (optimize-forms-for 55 '[(:= x (* z (/ a) x x (/ b)))] '[(:= y (+ (* x q x) (* 7 x)))] true)
[((:= x (* (* z (* x x)) (* (/ a) (/ b))))) ((:= y (* x (+ (* q x) 7))))]
Now factoring the sum does save a multiply. It still keeps the other x^2 in one piece, continuing to save a multiply there if a circular bailout is used.
user=> (optimize-forms-for 55 '[(:= x (* z (/ a) x x (/ b)))] '[(:= y (+ (* -1 q x) x (* -1 z) y))] true)
[((:= x (* (* z (* x x)) (* (/ a) (/ b))))) ((:= y (+ (+ (* -1 (* q x)) (* -1 z)) (+ x y))))]
No meaningful change.
user=> (optimize-forms-for 55 '[(:= x (* z (/ a) x x (/ b)))] '[(:= y (+ (* -1 q x) x (* -1 z) y k))] true)
[((:= x (* (* z (* x x)) (* (/ a) (/ b))))) ((:= y (+ (+ x y) (+ k (+ (* -1 (* q x)) (* -1 z))))))]
No meaningful change.
user=> (optimize-forms-for 55 '[(:= x (* z (/ a) x x (/ b) (+ x k)))] '[(:= y (+ (* -1 q x) x (* -1 z) y k))] true)
[((:= x (* (* (/ a) (/ b)) (* (* x x) (* z (+ x k)))))) ((:= y (+ (+ (* -1 (* q x)) (* -1 z)) (+ y (+ x k)))))]
No meaningful change.
user=> (optimize-forms-for 55 '[(:= x (* x y (/ (+ (* 3 y x z) 1)) z))] [] true)
[((:= x (* (/ (+ (* 3 (* x (* y z))) 1)) (* x (* y z))))) ()]
No meaningful change.
user=> (optimize-forms-for 55 '[(:= x (* x y (/ (+ (* 3 y w z) 1)) z))] [] true)
[((:= x (* (/ (+ (* 3 (* w (* y z))) 1)) (* x (* y z))))) ()]
Now groups the yz products.
user=> (optimize-forms-for 55 '[(:= x (+ x -1 (* x y (/ (+ a 1))))) (:= y (* 2 y (/ b) y (/ (+ a 1)) x))] [] true)
[((:= x (+ (* x (* y (/ (+ a 1)))) (+ x -1))) (:= y (* 2 (* (/ b) (* (* y y) (* (/ (+ a 1)) x)))))) ()]
Now groups y^2 instead of xy/(a + 1), unfortunately. But ...
user=> (optimize-forms-for 55 '[(:= x (+ x -1 (* x y (/ (+ a 1))))) (:= y (* 2 y (/ b) y (/ (+ a 1)) x))] [] false)
[((:= x (+ (* x (* y (/ (+ a 1)))) (+ x -1))) (:= y (* 2 (* (/ b) (* y (* x (* y (/ (+ a 1))))))))) ()]
The square-bailout version keeps that fraction together, and would probably be faster anyway with this system. And the formula compiler will test both and keep the faster one, automatically. Heh, heh, heh.
user=> (optimize-forms-for 55 '[(:= x (* 3 x z))] '[(:= y (* 3 y z))] true)
[((:= x (* x (* 3 z)))) ((:= y (* y (* 3 z))))]
Now groups the 3z products.
user=> (optimize-forms-for 55 '[(:= x (+ a (* 3 x)))] '[(:= y (+ b (* 3 x y)))] true)
[((:= x (+ a (* 3 x)))) ((:= y (+ b (* y (* 3 x)))))]
No change.
user=> (optimize-forms-for 55 '[(:= x (+ a (* 3 x)))] '[(:= y (+ b (* 6 x y)))] true)
[((:= x (+ a (* 3 x)))) ((:= y (+ b (* 2 (* y (* 3 x))))))]
No change.
user=> (optimize-forms-for 55 '[(:= x (+ a (* 3 x) (* 2 y)))] '[(:= y (+ b (* 6 x y)))] true)
[((:= x (+ (* 2 y) (+ a (* 3 x))))) ((:= y (+ b (* (* 3 x) (* 2 y)))))]
No change.
user=> (optimize-forms-for 55 (simplify-forms (real-forms *n-complex* '[(:= z (+ (** z 2) c))])) [] true)
[((:= z-r (+ (* -1 (* z-i z-i)) (+ c-r (* z-r z-r)))) (:= z-i (+ c-i (* 2 (* z-i z-r))))) ()]
Standard M-set iteration still gives the usual result.
user=> (optimize-forms-for 55 (simplify-forms (real-forms *n-complex* '[(:= z (+ (** z 3) c))])) [] true)
[((:= z-r (+ (* -1 (* (* z-i z-i) (* 3 z-r))) (+ c-r (* z-r (* z-r z-r))))) (:= z-i (+ (* (* z-i z-i) (* -1 z-i)) (+ c-i (* (* z-r z-r) (* 3 z-i)))))) ()]
Cubic M-set iteration still gives useful grouping of the squares. Different grouping of numeric terms, because it now sees opportunities to CSE two occurrences each of (* -1 z-i) and (* 3 z-r) but ends up choosing to keep the squares together instead. The (* -1 (* (* z-i z-i) (* 3 z-r))) is inefficient, but will be cleaned up by the not-yet-implemented fourth pass, which will move numeric factors out of non-CSE-able terms like that non-replicated (* 3 z-r).
user=> (optimize-forms-for 55 (simplify-forms (real-forms *n-complex* '[(:= z (+ (** z 2) c))])) (simplify-forms (real-forms *n-complex* '[(:= dz (+ (* 2 z dz) 1))])) true)
[((:= z-r (+ (* -1 (* z-i z-i)) (+ c-r (* z-r z-r)))) (:= z-i (+ c-i (* z-i (* 2 z-r))))) ((:= dz-r (+ (* -1 (* dz-i (* 2 z-i))) (+ 1 (* dz-r (* 2 z-r))))) (:= dz-i (* 2 (+ (* dz-r z-i) (* dz-i z-r)))))]
Now groups (* 2 z-r) in two places, but at the third possible site prefers to factor the 2 out of the sum instead of factoring out (* 2 z-i) and (* 2 z-r). This saves only one multiply by 2 instead of two, unfortunately.

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Pauldelbrot on July 08, 2009, 07:38:15 AM

Now groups (* 2 z-r) in two places, but at the third possible site prefers to factor the 2 out of the sum instead of factoring out (* 2 z-i) and (* 2 z-r). This saves only one multiply by 2 instead of two, unfortunately.

Fixed. I made it smarter. Pass 1 does sum common-factor extraction. Pass 2 gathers repeated possible binary subexpressions.
Pass 3 is the one I improved further: when it sees a product, it sees if it simplifies into a sum via the distributive rule. If so, then it tries both versions, counting how many binary product subexpressions it can eliminate because they occur elsewhere in each case. It weighs the change in this number from undoing the common-factor extraction (extra multiplies) against the multiplies saved by the extraction (one fewer than the number of terms in the post-extraction sum that are not 1 -- one fewer for the single multiply by the whole sum, and ignoring sum terms equal to 1 because 1 * common-factor is not a real multiply). Then it chooses. (This glosses over some major subtleties I have spent the past two hours working out.)

All of the tests except the last continue to give the same results as before, while the last yields:

[((:= z-r (+ (* -1 (* z-i z-i)) (+ c-r (* z-r z-r)))) (:= z-i (+ c-i (* z-i (* 2 z-r))))) ((:= dz-r (+ (* -1 (* dz-i (* 2 z-i))) (+ 1 (* dz-r (* 2 z-r))))) (:= dz-i (+ (* dz-i (* 2 z-r)) (* dz-r (* 2 z-i)))))]

Saved: two squarings (assuming circular bailout), one doubling of z-i, two doublings of z-r, and a partridge in a pear tree. Distance estimator can blaze now.

Next up: make it preferentially group expressions involving no variables appearing on the LHS of assignments, overriding all else. These can be factored out of the iteration loop entirely. Then I'll implement and test the fourth pass and run integration tests.

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Pauldelbrot on July 08, 2009, 08:13:57 AM

Next up: make it preferentially group expressions involving no variables appearing on the LHS of assignments, overriding all else. These can be factored out of the iteration loop entirely.

Done:

user=> (optimize-forms-for 55 '[(:= x (* z x w))] '[(:= y (* z y w))] true)
[((:= x (* x (* z w)))) ((:= y (* y (* z w))))]
user=> (optimize-forms-for 55 '[(:= x (* z x w)) (:= w z) (:= z w)] '[(:= y (+ z y w))] true)
[((:= x (* w (* z x))) (:= w z) (:= z w)) ((:= y (+ w (+ z y))))]

Note how it preferentially clumps the uniterated, per-pixel constants z and w in the first instance; in the second, when those have been made no longer constant by the added assignment expressions, the preference disappears. This means mappings expressed with lots of (* 2 pi) or similar won't have that calculated every iteration. It will get, or stay, grouped, and as a big common subexpression with only constant terms it will get factored out of the inner-loop code. This will be particularly important for Antimatter, which has an expensive (exp (* 2 pi i a)) in its expression.

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Pauldelbrot on July 08, 2009, 12:55:20 PM

user=> (optimize-forms-for 55 (simplify-forms (real-forms *n-complex* '[(:= z (+ (** z 3) c))])) [] true)
[((:= z-r (+ (* -1 (* (* z-i z-i) (* 3 z-r))) (+ c-r (* z-r (* z-r z-r))))) (:= z-i (+ (* (* z-i z-i) (* -1 z-i)) (+ c-i (* (* z-r z-r) (* 3 z-i)))))) ()]
[...] The (* -1 (* (* z-i z-i) (* 3 z-r))) is inefficient, but will be cleaned up by the not-yet-implemented fourth pass, which will move numeric factors out of non-CSE-able terms like that non-replicated (* 3 z-r).
That fourth pass has now been implemented and tested fairly thoroughly, and it works. In particular, it changes the above to:

[((:= z-r (+ (* -3 (* (* z-i z-i) z-r)) (+ c-r (* z-r (* z-r z-r))))) (:= z-i (- (+ c-i (* 3 (* (* z-r z-r) z-i))) (* (* z-i z-i) z-i)))) ()]

Note that the 3 has been pulled up and combined with the -1. The 3 would have stayed put if it had been part of a repeated subexpression, but it was not. The distance estimator becomes:

[((:= z-r (- (+ c-r (* z-r z-r)) (* z-i z-i))) (:= z-i (+ c-i (* z-i (* 2 z-r))))) ((:= dz-r (- (+ 1 (* dz-r (* 2 z-r))) (* dz-i (* 2 z-i)))) (:= dz-i (+ (* dz-i (* 2 z-r)) (* dz-r (* 2 z-i)))))]

Note that here and above some expressions like (* -1 foo) have been changed into subtraction operations in their enclosing additive expressions. Additionally, note that (* dz-i (* 2 z-i)) did not have its 2 brought out and combined with the -1 to produce a -2; instead, the factor of -1 became a subtraction. This is in contrast to the 3 and -1 combining in the previous example, and is a result of (* 2 z-i) being a recurrent subexpression. As a result of this behavior, a multiply is saved in preference to saving a subtraction. (This looks slightly bad with bignums, since the (* 2 z-i) would become a (+ z-i z-i), and it therefore appears to be saving a mere add at the cost of a more expensive subtraction in this case. But if it did hoist the 2 and make a -2 factor, this would be either a very expensive multiply, or else a subtraction and an add. So the subtraction is actually unavoidable, except at the cost of a more expensive multiply, and by keeping the 2 where it is it saves a bignum add.)

The upshot of these last few posts is this: the optimizer is essentially complete. Oh, it will probably get the odd tweak now and again, but the bulk of the job is done there, and it's time to move on to integration testing, and then to start layering functionality above the iteration loops.

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Pauldelbrot on July 08, 2009, 03:51:01 PM

Results of the integration tests are in:
.............................................................
...........................*#*...............................
............................*................................
.......................**#######**#..........................
.....................**############..........................
.............*.#....*##############*.........................
............*#####***##############*.........................
#*********#########*##############...........................
............*#####***##############*.........................
.............*.#....*##############*.........................
.....................**############..........................
.......................**#######**#..........................
............................*................................
...........................*#*...............................
.............................................................

That's with an iteration max of 1000 and 72-bit mantissas. Yeah, it's a very shallow deep zoom, and it's using the new expression optimizer -- whose only effect on this particular mapping is to change 2xy to (xy + xy) in the next-y part of the calculations (which it does because bignum multiplies are so much slower than bignum adds). (*s indicate points whose periods were not detected.)

Still, everything seems to hang together and work together at this point. Now work can begin on building the higher-level stuff. The code I executed at my REPL to generate that, by the way, was:

(def *l* (make-iteration-loops 72 *n-complex* 2 2 '[(:= z (+ (** z 2) c))] '[] nil '(0 0) '(-1.9913844699254 0)))
(asciibrot 30 15 *l* 1000)

Yes, it really does transform '[(:= z (+ (** z 2) c))] on the fly into an optimized loop -- one that, for non-bignums, runs at 7 nanoseconds per iteration -- that can be called upon at any future time, all by itself, without further human intervention.

Well, almost. The quoted lists at the end of the first line are a z and c pair that gives rise to an ergodic orbit on the spike. The benchmarking make-iteration-loops uses to choose among alternative loop-generation parameters didn't work too well with dummy loops -- whether for code-cache or branch-prediction reasons, I'm not sure. Using a live loop presents a problem if it bails out too early. So an interior point needs to be supplied, and for it to decide between circular and square traps around finite attractors/period checks, it needs not to converge too fast. I'll experiment to see how sensitive the timing data are to the speed of escape/convergence of the supplied point. If worst comes to worst, for complex single-variable maps and quats, hypers, and any other algebraic rings that embed the complex plane, solving for the fixed points, and then for parameter points such that the derivative at some fixed point is just inside the unit circle, would supply a guaranteed trapped, slowly-converging point.

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Pauldelbrot on July 10, 2009, 03:41:58 PM

Added another loop-generation option: fixed attractor search. The loop in this case is generated to check every iteration for approach to a particular point that is baked in, rather than passed as a parameter to the loop function. The advantage is that the tests can be CSE'd with the rest of the calculations. If the point is the origin, and both tests are for circular rather than square targets, the same squares and sum used for bailout get recycled, for instance.
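For the quadratic map with a circular bailout and a circular trap at the origin, the emitted loop would be shaped roughly like this (hand-written sketch, not the generated code; the trap radius is made up):

Code:
;; Escape test and baked-in origin trap share the same squares and sum,
;; as described above. Illustrative only.
(defn iter-origin-trap ^long [^double c-r ^double c-i ^long max-iter]
  (loop [z-r 0.0, z-i 0.0, i 0]
    (let [zr2 (* z-r z-r)
          zi2 (* z-i z-i)
          m2  (+ zr2 zi2)]                   ; reused by both tests
      (if (or (== i max-iter)
              (> m2 4.0)                     ; circular bailout
              (and (pos? i) (< m2 1.0e-24))) ; circular trap at the origin
        i
        (recur (+ (- zr2 zi2) c-r)
               (+ (* 2.0 z-r z-i) c-i)
               (inc i))))))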
I also have derivatives, and code to take an input expression and spit out an expression that, when iterated, performs Newton's method on the original one. I've used this to make a working root-finder. Combining these yields the critical points for Mandelbrot calculations. Oh, and the derivative code can output Jacobian matrix calculations for multi-variable systems, too.

The bad news: tests have determined that the loop-timings code really does need a trapped point to do its job at all well. Even adding code to calculate the function-call overhead (for the call itself, and the loop function's preamble code that executes before entering the loop proper) and account for it, the results were all over the board for z = 2.0001 as a test point (the worst case -- one iteration and it escapes). I think I shall have to code it to just generate a normal escaping-points loop with a circular bailout, start rendering, and if it hits a trapped point, do the timings and everything and switch to the original "plan" then. With a modification: it should identify basins whose attractors are unvarying with pixel coordinates, and generate loops for those attractors to use whenever it hits those basins again.
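As an aside on the Newton's-method transform mentioned above: in the forms notation used throughout this thread, the transform z -> z - f(z)/f'(z) is tiny. A sketch, assuming the derivative expression is supplied by the caller (the real code derives it itself):

Code:
;; Emit the Newton's-method map z -> z - f/f' as a (:= ...) form;
;; (/ e) is the reciprocal, as elsewhere in this thread. Sketch only.
(defn newton-form [f-expr df-expr]
  (list := 'z (list '- 'z (list '* f-expr (list '/ df-expr)))))

;; (newton-form '(+ (** z 3) -1) '(* 3 (** z 2)))
;; => (:= z (- z (* (+ (** z 3) -1) (/ (* 3 (** z 2))))))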
Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Nahee_Enterprises on January 02, 2010, 11:02:23 PM

It's shocking that nobody much else is apparently developing fractal software in Lisp, given what it's capable of. :)

The first part of the optimizer is done. It behaves more or less as described....

Whatever happened to this ?? Was another thread started somewhere else, or was this project dropped to the wayside ??

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Pauldelbrot on January 03, 2010, 04:24:40 AM

Got back-burnered as I ran into a thorny problem to solve (unrelated to iteration loop generation) and got busy with other things. Don't worry, more will be done, and perhaps soon.

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: Nahee_Enterprises on January 03, 2010, 04:47:16 AM

Got back-burnered... and got busy with other things. Don't worry, more will be done, and perhaps soon.

Thanks for the quick reply. No rush on my part, was just curious after going through some of the postings over the past few months. (Still catching up on several hundred postings.) ;) Will keep an eye open for when you do get back around to it.

Title: Re: Implemented the Mandelbrot loop in Lisp. Post by: erstwhile on September 03, 2014, 08:44:22 PM

I'm probably going to transfer my SuperCollider Mandelbrot music generator to Clojure/Overtone, since SuperCollider doesn't support arbitrary-precision numbers. This could be a good starting point for implementing M-set calculations in Clojure.