Finally got around to doing the math on the zoom-movie computation idea.
|
Heavy Math Ahead -- Proceed With Caution
| |
The previous best known method for computing a zoom movie computes keyframes at zooms that are successive factors of some quantity μ. Supposing we want a final frame resolution of 1024x768 and don't mind if the outer part of each frame is not antialiased, but require it not be undersampled, the keyframe resolution will have to be larger than 1024x768, as a final frame will be composited using the last keyframe bigger than the target frame and (conceptually) all the ones smaller than the target frame, so the center part of a given final frame is densely oversampled and the outer part is undersampled if the keyframe resolution is not larger than 1024x768 by a factor of μ.
Supposing the final zoom magnification is ω and the initial zoom magnification is 1. The number of keyframes is then the base-μ logarithm of ω, or ln ω/ln μ. The number of square pixels per keyframe is 1024∙768∙μ
2, and the total number of pixels to compute is 1024∙768∙μ
2ln ω/ln μ. The first derivative of this in μ is 1024∙768∙ln ω∙(μ
2/ln μ)' and has zeros where (μ
2/ln μ)' has zeros. This latter expression is, by a combination of the product rule and the chain rule, 2μ/ln μ - μ/(ln μ)
2 = (μ/ln μ)(2 - 1/ln μ). The latter is the one component in the product that can be zero for nonpathological values of μ and attains that value when ln μ = 1/2, or μ = √
e. The second derivative of the total pixel count in μ is 1024∙768∙ln ω∙(μ
2/ln μ)'' and has the same sign as (μ
2/ln μ)'' = (μ/ln μ)(2 - 1/ln μ)' = (1/(ln μ)
3) + (2 - 1/ln μ)(1/ln μ - 1/(ln μ)
2). Substituting ln μ = 1/2 in this gives 8, since the first term in the sum is 8 and the second is zero due to the factor of (2 - 1/ln μ). So, the second derivative is positive at μ = √
e and this is therefore a minimum, rather than a maximum, an inflection point, or a singularity.
It follows that a zoom factor μ of √
e is optimum for computing the minimum number of pixels to generate via keyframe interpolation a movie whose frames are not undersampled anywhere.
For a final zoom factor of ω = 10
143 and a target resolution of 1024x768, the keyframe resolution needs to be 1689x1267 and the number of keyframes is 673 (using a corrected ω = 10
146 to multiply in the roughly thousand-fold range of magnifications contained in the initial frame at that resolution), for a pixel count of 1,440,195,099 -- nearly one and one-half billion.
Using the Mercator strip trick gives a different result. To avoid undersampling the final frames, the horizontal resolution of the strip needs to be the circumference, in pixels, of a circle through the frame corners. For a 1024x768 frame, the diameter of this circle is the diagonal length of the image, which is 1280 pixels, and the circumference is π times this, or 4022 pixels. The height is more complex to calculate. Each row of pixels corresponds to a circle of some radius about the zoom's target. To avoid undersampling, the one through a particular frame's corners and the next smaller one must not have gaps, so the next smaller one should be only two pixels less in radius; the zoom factor to shrink the larger circle to the smaller will therefore be 1280/1278. This gives us a μ value that can be used to compute the total number of circles needed to reach our target zoom.
So, the number of vertical rows is the logarithm of ω = 10
146 divided by the logarithm of μ; it comes out as 214986 rows. Multiplying by the 4022-pixel width gives 863,467,092 -- fewer than two-thirds the pixels to calculate as in the optimal keyframe-interpolation case!
It's not hard to see where most of the improved efficiency comes from. Each keyframe in the keyframe case contains samples close to samples from earlier keyframes -- 1/
e or nearly forty percent of a keyframe's pixels contain a sample from the immediately preceding keyframe, which could be recycled as a sample in that keyframe. Presuming that samples in still earlier keyframes landing in the current keyframe land in a subset of the same pixels, using the samples from previous keyframes in subsequent keyframes allows all keyframes except the first (which is very low iteration and low precision, so super-fast) to be reduced to 60% new-to-calculate pixels. For a 1689x1267 keyframe, computing only 1 - 1/
e of the pixels (rounded up, as with all pixel counts in the foregoing) gives 1352633 pixels to compute per keyframe and 910,322,009 pixels total. This is still larger than the Mercator method's 863,467,092 for the same zoom sequence.
There is also one more advantage to the Mercator method and one more disadvantage to the keyframe method with recycled pixels. The latter has the problem that the samples "jitter" relative to the pixel centers, and indeed with the linear dimensions expanding by a factor of only about 1.6 per keyframe, new pixels are inserted between only some rows and columns of existing pixels to generate each new keyframe (for new pixels to be between every row and every column of existing pixels they'd have to comprise at least 75% of the whole, not just 63 or so) and distortions may get magnified as successive keyframes are generated.
The Mercator method, meanwhile, enjoys a smooth, rather than a step-function, increase in oversampling level towards the center of a given final frame from its outer edges. Noticeable rectangular boundaries between regions with little and with more antialiasing can result from the layering of keyframes into the final frame with the keyframe method, which the Mercator method avoids.
The Mercator method, however, may be a bit more math-intensive. Each frame of our 1024x768 video spans roughly three orders of magnitude of size, and turns out to involve 4576 of the concentric circles that become rows in the Mercator mapping. So, instead of being made from a few 1689x1267 keyframes via affine transformations, it's made from a 4022x4576 block of raw data via a more complex, nonlinear transformation that maps that raw data into the image to drop samples into the frame's pixels. (All of the remaining circles either ring the frame without intersecting it, or lie entirely in the frame's central group of four pixels.)
Another complication is the question of whether png or other image formats permit dimensions as large as 214986 pixels in width or height, and whether that much data would fit in memory happily -- 8-bit-per-channel RGB data for this hypothetical zoom would fill more than 2.5GB of RAM. These numbers get worse the deeper the zoom goes, as the height depends on ω as well as frame resolution, though the width depends only on frame resolution.
Most likely, the data would have to be generated as a sequence of smaller images, say 4022x4576 blocks, which when stacked vertically amount to the equivalent of a single, much taller image. Choosing the block height to equal the height corresponding to the scale range spanned in a single frame means the movie maker software would have to read exactly two of these blocks to generate one output frame in the general case. The blocks' number, rather than their height, now depends on ω, and their resolution depends only on frame resolution; the blocks take roughly 60 megabytes in this case for 8-bit-per-channel RGB. More generally, they will take roughly 24 times as much space as the frames themselves will, more or less independently of resolution. (The height, if made equal to a frame's magnification span, grows a bit faster than linearly due to an added logarithmic factor.) 120 megabytes suffices to hold the raw data to compute a frame. The amount of I/O is not appreciably increased either from discarding and reloading of blocks, as adjacent frames will usually use the same two blocks; for a zoom movie that never goes into reverse at any point, the movie maker can load and discard blocks once each, sequentially, holding only two in memory at a time and reading each byte of block data exactly once, in principle performing identically to one loading a monolithic image file holding all of the data. (In practice, if the data is stored in a common format like png, there will be a bit of overhead from each one having separate headers.)
Of course, using fixed-size blocks runs into a potential problem if the zoom target is reached partway down the bottom block, instead of exactly at its bottom. But the extra rows needn't be computed; the bottom block could be padded with black instead, which would work particularly well if the zoom is centered on a point contained in a minibrot, as the smallest computed circle can then just be the smallest about that point that contains escaping points, and in principle the zoom then goes on infinitely.
More generally, the bottom block can be padded with the color of the leftmost pixel on the bottom actually-computed row, and if this smallest circle about the zoom center is contained in a minibrot colored a solid color or is away from dendrites and small enough in comparison to nearby dwell bands that the color of the disk it bounds is pretty much uniform, this will give good results.
As with the advanced keyframe interpolation mechanism, a movie maker based on the Mercator algorithm can synthesize any frame, with any orientation, whose magnification lies within the range spanned by the computed circles. Indeed, rotation of a frame is actually simpler in this scheme, as it can be accomplished simply by cyclically permuting the columns of the input data! Whereas in the keyframe interpolator it requires complex, extra coordinated affine transforms of keyframes and it requires one extra keyframe, zoomed out from the first keyframe needed without rotation, as the image corners will in general not lie in the first keyframe of larger width but will lie in the second (for image proportions not ridiculously far from square -- the Mercator method, by contrast, doesn't have any problems with long, thin frames or tall, narrow frames, but loses efficiency there as larger and larger portions of the outer circles are not used in non-heavily-oversampled regions of the frames -- unless, of course, the movie is also spinning quite fast! In which case no version of the keyframe method would avoid horrid undersampling of the regions far from the image center, so, advantage Mercator again).