Welcome to Fractal Forums

Fractal Math, Chaos Theory & Research => Mandelbrot & Julia Set => Topic started by: lycium on November 30, 2006, 09:13:48 AM




Title: mini supercomputer
Post by: lycium on November 30, 2006, 09:13:48 AM
i recently bought a geforce 8800 gts and have managed to tear myself away from playing neverwinter nights 2 long enough to make this post ;)

it's no secret that modern gpus are monsters of floating point computation, and they have plentiful, dedicated memory bandwidth with which to support it. the variant of the 8800 i've purchased has 96 processors, almost completely ieee754 compliant, and 640mb of memory rated at 64gb/s. the most exciting thing about this new, general architecture is that it can be used for non-graphics applications, and to this end nvidia have a c compiler, libraries and a special driver (which can work concurrently alongside the normal opengl/directx one) that programmers can use to tap the hardware capabilities directly - without the need to recast the computational problem in terms of textures and pixel shaders etc. nvidia recently accepted my cuda developer application request, and while i have the software, it's unfortunately not compatible with 64bit windows xp yet :( in fact the current xp64 driver is terrible:

(http://www.fractographer.com/propaganda/geforce8800.gif)

however, when stuff does work, the results can be quite impressive indeed: http://www.fractographer.com/propaganda/gf8800mandelbrot.png (this is why i posted in this section)

when they get their 64bit stuff up to snuff i'll certainly be looking to move large parts of my 2d and 3d fractal apps' computation to the gpu in an effort to get them working in realtime and/or being able to render high quality hd resolution video :)

more info on the architecture: http://techreport.com/reviews/2006q4/geforce-8800/index.x?pg=1
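
to give a flavour of what cuda code looks like, here's a minimal sketch of a mandelbrot kernel, one thread per pixel. names and parameters are illustrative only - this is not the sdk demo shown above:

Code:
// minimal cuda mandelbrot sketch: one thread per pixel, single precision.
// illustrative names/parameters; launch with e.g. 16x16 thread blocks
// covering the image.
__global__ void mandelbrot(unsigned int *iters, int width, int height,
                           float xmin, float ymin, float step, int maxIter)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= width || py >= height) return;

    float cx = xmin + px * step;                 // map pixel to the complex plane
    float cy = ymin + py * step;
    float zx = 0.0f, zy = 0.0f;
    int i = 0;
    while (i < maxIter && zx * zx + zy * zy < 4.0f) {
        float t = zx * zx - zy * zy + cx;        // z = z^2 + c, real part
        zy = 2.0f * zx * zy + cy;                // imaginary part
        zx = t;
        ++i;
    }
    iters[py * width + px] = i;                  // colour mapping happens on readback
}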


Title: Re: mini supercomputer
Post by: David Makin on November 30, 2006, 03:03:33 PM
Hmm - I can't afford more than around 100 for a GFX card :-(
I'd be most interested in using the GPU to render 3D fractals (of all types).


Title: Re: mini supercomputer
Post by: lycium on December 01, 2006, 02:57:21 AM
ray tracing quaternionic julia sets is one of the earliest examples of using the gpu to render 3d fractals; as i alluded to previously, earlier implementations had to recast the problem in terms of "standard" 3d rendering operations - in this case a pixel shader was rendering to a screen-sized quad:

http://graphics.cs.uiuc.edu/svn/kcrane/web/project_qjulia.html

that runs on really cheap boards these days, btw. i'm pretty sure you could pick up a geforce 7600 gt quite affordably, and that's already a very competent board - it doesn't have the outright ridiculous speed of the expensive ones, but it's highly programmable via glsl (opengl) or hlsl (d3d). it would trounce even one of intel's latest quad-core machines in that mandelbrot test, i'm sure.
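
the inner loop of that technique is tiny, btw. here's a hedged sketch of the quaternion iteration with the distance estimate (written in cuda-style c for clarity - the demo itself is a pixel shader, and the helper names here are mine, not from the demo):

Code:
// sketch of the quaternion julia distance estimate (q -> q^2 + c),
// the quantity a gpu ray marcher steps by. illustrative only.
struct Quat { float x, y, z, w; };               // x is the real part

__device__ Quat qmul(Quat a, Quat b) {           // quaternion product
    Quat r;
    r.x = a.x*b.x - a.y*b.y - a.z*b.z - a.w*b.w;
    r.y = a.x*b.y + a.y*b.x + a.z*b.w - a.w*b.z;
    r.z = a.x*b.z - a.y*b.w + a.z*b.x + a.w*b.y;
    r.w = a.x*b.w + a.y*b.z - a.z*b.y + a.w*b.x;
    return r;
}

__device__ float juliaDistance(Quat q, Quat c, int maxIter) {
    Quat dq = {1.0f, 0.0f, 0.0f, 0.0f};          // running derivative, dq -> 2*q*dq
    float r2 = q.x*q.x + q.y*q.y + q.z*q.z + q.w*q.w;
    for (int i = 0; i < maxIter && r2 < 16.0f; ++i) {
        Quat t = qmul(q, dq);
        dq.x = 2.0f*t.x; dq.y = 2.0f*t.y; dq.z = 2.0f*t.z; dq.w = 2.0f*t.w;
        q = qmul(q, q);                          // q = q^2 + c
        q.x += c.x; q.y += c.y; q.z += c.z; q.w += c.w;
        r2 = q.x*q.x + q.y*q.y + q.z*q.z + q.w*q.w;
    }
    float r  = sqrtf(r2);
    float dr = sqrtf(dq.x*dq.x + dq.y*dq.y + dq.z*dq.z + dq.w*dq.w);
    return 0.5f * r * logf(r) / dr;              // <= 0 means essentially inside the set
}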


Title: Re: mini supercomputer
Post by: dentaku2 on January 27, 2007, 12:13:34 PM
This sounds very exciting! But it has to be supported by the standard graphics APIs (OpenGL). If there were an OpenGL/Java API for this, I would love to integrate it into my fractal app! Are there any notes about this?


Title: Re: mini supercomputer
Post by: David Makin on January 27, 2007, 03:57:39 PM
This sounds very exciting! But it has to be supported by the standard graphics APIs (OpenGL). If there were an OpenGL/Java API for this, I would love to integrate it into my fractal app! Are there any notes about this?

There is sample code using GPU code fragments to generate the Mandelbrot set on ATI's site in the developer section - it uses OpenGL:

http://ati.amd.com/developer/indexsc.html

I've downloaded the relevant stuff but haven't tried it yet.


Title: Re: mini supercomputer
Post by: Nahee_Enterprises on January 27, 2007, 05:08:58 PM
David Makin wrote:
>
>    There is sample code using GPU code fragments to generate the
>    Mandelbrot on ATI's site in the developer section - uses OpenGL:
>    http://ati.amd.com/developer/indexsc.html

It appears that the only working download link from the above URL is:
    http://ati.amd.com/developer/sdk/RadeonSDK/Samples/Downloads/Mandelbrot.zip

The text file code fragments are not available.



Title: Re: mini supercomputer
Post by: lycium on January 28, 2007, 06:03:46 PM
the screenshot i posted is in fact of an opengl demo, one that comes with the nvidia sdk available from their website (src included).


Title: Pixel shader fractals?
Post by: dentaku2 on January 29, 2007, 07:56:22 PM
Would it be possible to squeeze fractals into pixel shader programs? Fortunately, many fractals can be generated with very little code, so the max shader size of today's graphics cards should be enough for this, right? That could be a good possibility for real time fractals ... give me OpenGL code! :o :o :o


Title: Re: mini supercomputer
Post by: lycium on January 30, 2007, 12:05:03 AM
Would it be possible to squeeze fractals into pixel shader programs? Fortunately, many fractals can be generated with very little code, so the max shader size of today's graphics cards should be enough for this, right? That could be a good possibility for real time fractals ... give me OpenGL code! :o :o :o

the screenshot i posted is in fact of an opengl demo, one that comes with the nvidia sdk available from their website (src included).

http://www.google.co.za/search?q=nvidia+opengl+mandelbrot&start=0&ie=utf-8&oe=utf-8&client=firefox-a&rls=org.mozilla:en-US:official
http://download.nvidia.com/developer/SDK/Individual_Samples/featured_samples.html

then ctrl+f, "mandelbrot", ...

just download the nvidia sdk, get some geforce6-level hardware, find some java opengl bindings, see if they support the glsl extensions, get familiar with the syntax and begin to want a g80 :D

there are 96 stream processors on the gts, capable of both near perfect ieee754 floating point and integer ops, running at 1.2ghz and attached to 640mb of fast memory via a 320bit bus. the shaders/execution kernels can also write memory, so the whole thing is very flexible. nvidia's cuda program provides a c compiler and some very nice libraries (relevant to us fractalists); you have to agree to a nondisclosure agreement and i really can't give perf numbers. in any case, whatever you do with the gpu had better be more computationally intensive than a mandelbrot ;)


Title: Re: mini supercomputer
Post by: dentaku2 on January 30, 2007, 12:11:27 AM
Of course, this looks very good. But I don't touch anything that is not pure Java. Even Java 3D and JOGL need system-specific libraries and don't offer software-only modes - for performance reasons, of course. But my first requirement is platform independence and ease of installation/deployment.


Title: Re: mini supercomputer
Post by: Duncan C on May 10, 2007, 05:27:18 PM
lycium,

That card sounds like quite a beast.

Did you ever write a fractal renderer using its floating point abilities?

And does the math library that's included support double precision? It would be slower than single precision, obviously, but 96 processors doing double precision math with only single precision hardware support would still be a WHOLE LOT faster than 10 or 20 double-precision processors, especially with the fast memory in graphics cards.

I read up on this card based on your post. How much do you have to worry about the architecture of the card? (where the processors are in groups with shared memory access, etc.)  Or is all that transparent to you? How does the compiler deal with parallel processing?

This card would be worth buying as a floating point engine if it can be used easily.

Duncan C


Title: Re: mini supercomputer
Post by: lycium on May 11, 2007, 02:07:59 AM
the CUDA platform, which has been liberated from its NDA-only status btw, does indeed provide double precision functionality, but i think it's both:

1. beta only
2. not supported by hardware, i.e. emulated (10x slower or so than native fp)

the next generation after g80 will apparently have native double precision capabilities.

and yeah, talking about CUDA, you do have to be mindful of the 3-level memory hierarchy to get the best possible performance. since i've written low level code from an early age it's not really all that different a landscape (modern performance coding on the cpu is mostly about cache awareness too), but if you're new to it you might find it difficult; again, the same is true of writing fast code on the cpu, so in the end there's just no free lunch. this is slowly becoming the case too with cpus: ghz is out, multicore is in - that's simply how you best spend transistors these days.
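
to make that hierarchy concrete, a small toy example (my own illustration, not from the cuda docs): per-thread values live in registers, a block-wide scratch buffer lives in the fast on-chip shared memory, and only the final result touches the slow global memory.

Code:
// toy example of the register / shared / global hierarchy: summing the
// per-pixel iteration counts of a block. assumes blockDim.x == 256.
__global__ void sumIterations(const unsigned int *iters,
                              unsigned int *blockSums, int n)
{
    __shared__ unsigned int scratch[256];        // on-chip, shared by the block
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;

    scratch[tid] = (idx < n) ? iters[idx] : 0;   // one coalesced global read
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            scratch[tid] += scratch[tid + stride];  // tree reduction on-chip
        __syncthreads();
    }
    if (tid == 0)
        blockSums[blockIdx.x] = scratch[0];      // one global write per block
}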

however, the performance of gpus for certain types of parallel workloads (high arithmetic intensity) is just much higher than for cpus, and if you can exploit that then i'd say it's worth it.


Title: Re: mini supercomputer
Post by: doncasteel8587 on June 13, 2007, 11:39:36 PM
Of course, this looks very good. But I don't touch anything that is not pure Java. Even Java 3D and JOGL need system-specific libraries and don't offer software-only modes - for performance reasons, of course. But my first requirement is platform independence and ease of installation/deployment.

Just a note.... Recent J3D forum discussions had some comments about doing away with the platform specific libraries. I didn't get the details, but if it happens it will be a huge boost to J3D!


Title: Re: mini supercomputer
Post by: alister on August 14, 2007, 05:59:02 AM
I've often wondered how I could use the gpu for a little extra processing power. The best I ever did was use the OpenGL libs to do some matrix operations. That was some time ago, and I've long since lost the project files in a hard drive crash.


Title: Re: mini supercomputer
Post by: Duncan C on May 04, 2008, 04:12:35 PM
i recently bought a geforce 8800 gts and have managed to tear myself away from playing neverwinter nights 2 long enough to make this post ;)

it's no secret that modern gpus are monsters of floating point computation, and they have plentiful, dedicated memory bandwidth with which to support it. the variant of the 8800 i've purchased has 96 processors, almost completely ieee754 compliant, and 640mb of memory rated at 64gb/s. the most exciting thing about this new, general architecture is that it can be used for non-graphics applications, and to this end nvidia have a c compiler, libraries and a special driver (which can work concurrently alongside the normal opengl/directx one) that programmers can use to tap the hardware capabilities directly.

lycium,

I browsed through the CUDA APIs the other day. Just about everything I looked at was geared towards same instructions, different data (SIDD) processing, and even tailoring your code to avoid branching, because that reduces concurrency and cache hits.

I'm interested in using the processor farm in one of these cards for general computing, where sometimes large numbers of processors will be doing completely different work. (Imagine running a hybrid genetic algorithm with neural networks, so the processor farm evolves optimized solutions to a problem.)

Where their work is parallel, I would, of course, follow the guidelines for parallel processing to get the most out of the multi-level cache architecture on the chip.

I've only done a little multi-threaded development before, and that on traditional multi-processor machines where each processor has its own cache. I'm hardly an expert on the subject.

Do you think the architecture of NVIDIA's 8xxx series (and later) chips is so tuned to SIDD applications that you'd bring it to its knees doing separate execution paths on different chips?


Duncan


Title: Re: mini supercomputer
Post by: lycium on May 05, 2008, 12:43:28 PM
it's SIMD (http://en.wikipedia.org/wiki/SIMD), and i'd strongly suggest tackling altivec in your situation, since you have access to an older mac and this experience will let you dabble with cell spu programming (via a ps3), which will be a kind of nirvana for you in terms of mandelbrot flops i think ;)

yeah, it's a good idea on modern processors to use masks instead of small-scale branching, because processors are deeply pipelined to reach high levels of speed (in ghz) and efficiency (in terms of balancing compute vs bandwidth resources); this applies even more so to gpus, whereas the cell broadband engine is a nice agile mix of both.


Title: Re: mini supercomputer
Post by: keldor314 on June 10, 2008, 10:47:41 PM
Actually, concerning the Geforce 8x00, SIMD isn't completely correct either - the 8800GTX, for instance, has 16 fully separate multiprocessors, each with 16 ALUs which are truly SIMD, for a total of 128 ALUs.  What this means is that within each multiprocessor, the 16 SIMD processors will run in lockstep with each other, BUT each of the 16 multiprocessors can branch separately.  Think of it like a 16 core processor, where each processor operates on a 16-wide group of registers simultaneously.

In practice, this will give you perhaps 90% efficiency on fractal rendering, because these 16 units of a given "core" are most likely operating on a 4x4 grid of pixels, meaning that the vast majority of the time, the bailout of one of these pixels is not much sooner than any of the others.  The unit must iterate until all 16 pixels have bailed out (or MAX_ITERATIONS is reached), but since all 16 pixels are in a pretty tiny area (about the size of this comma, assuming you're using a small font) they will mostly end up finishing at about the same time.

Branches are handled by masking out the computations of the units that take the branch, so that they effectively do nothing until all units are "finished" and ready to join in beyond the branch.  This means that for an iteration loop, for instance, the units that have hit the bailout will "sit around" and do nothing until all units (in the 4x4 grid, remember) have bailed out.  At this point, the multiprocessor can proceed down to the next line of code.  Note that the multiprocessor doesn't necessarily have to do the whole MAX_ITERATIONS, just enough that all the processors have reached bailout.
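
In code terms you can picture it like this - a conceptual sketch (NOT actual generated code) of an escape-time loop where escaped units keep running but stop contributing:

Code:
// conceptual sketch of lockstep bailout: a per-unit "alive" mask turns the
// update into a no-op once that unit has escaped. real hardware also exits
// the loop early once ALL units in the group have escaped - which is why
// grouping nearby pixels (the 4x4 grid) matters.
__device__ int iterateMasked(float cx, float cy, int maxIter)
{
    float zx = 0.0f, zy = 0.0f;
    int count = 0;
    for (int i = 0; i < maxIter; ++i) {
        float alive = (zx * zx + zy * zy < 4.0f) ? 1.0f : 0.0f;
        float t  = zx * zx - zy * zy + cx;       // z = z^2 + c
        float ny = 2.0f * zx * zy + cy;
        zx = alive * t  + (1.0f - alive) * zx;   // escaped units keep their state
        zy = alive * ny + (1.0f - alive) * zy;
        count += (int)alive;                     // iteration count stops advancing
    }
    return count;
}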

It's interesting to contrast this to the Radeon 2900 (I THINK - I'm not up as well on their specific GPU numbers...), which has 4 separate multiprocessors, each with 16 SIMD processors, each of THOSE in turn having 5 ALUs for a collective total of 320 ALUs.  The thing to note here (and why the 320 ALUs do not trump the 128 of the GeForce) is that the groups of 5 ALUs are bound together on the same pixel, rather than split among multiple pixels.  This leaves it to the programmer to somehow get the code for a single pixel to spread its arithmetic among the 5 ALUs.  (Actually, it's a bit more complicated than that, since the 5 ALUs are not exactly SIMD; rather, they are 4+1 SIMD, with 4 standard ALUs in SIMD and one special super-ALU which kinda supplements the others - e.g. the 5th unit is the only one that has hardware accelerated transcendental operations (sin, cos, e^x, etc) and you might have an opcode that tells the 4 units to multiply, then add the 5th unit's result to each of their results.)

ANYWAY... before I make your headaches from reading this any worse (it's HARD to explain this sort of thing without going even further into the jargon than I already did...), the ultimate point is GPUZ DO L33T FRACTALZ!!11!  XD

For example, I have a screensaver I wrote which can render some pretty hairy fractals (e^(sin(a*z)+cos(b*z)) for instance) at a smooth framerate (30-60fps on average) with 50 or so iterations at a whopping screen resolution of 2048x1536!  What do you do with fractals rendered in 1/40th of a second?  Realtime animation, of course!  Watch the fractal change as the values of a and b change...

The downside to all this (and it's a big one) is that the GPU only does native single precision floating point - no zooming (at least not much) for you!  Still, you can of course write code in the shader to emulate higher precision the same way some fractal programs do deep zoom - it will just run a good deal slower, although still faster than the CPU...
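
For the curious: the usual emulation trick is "double-single" arithmetic - keep each value as an unevaluated sum of two floats, giving roughly 46 bits of mantissa.  A sketch of the two core operations, following the classic Knuth/Dekker error-free transforms (illustrative only, and it assumes the compiler doesn't fuse or reorder these float operations):

Code:
// "double-single" arithmetic sketch: value = hi + lo.
struct dsfloat { float hi, lo; };

__device__ dsfloat ds_add(dsfloat a, dsfloat b)  // knuth two-sum + renormalise
{
    float s = a.hi + b.hi;
    float v = s - a.hi;
    float e = (a.hi - (s - v)) + (b.hi - v);     // exact rounding error of s
    float lo = e + a.lo + b.lo;
    dsfloat r;
    r.hi = s + lo;
    r.lo = lo - (r.hi - s);
    return r;
}

__device__ dsfloat ds_mul(dsfloat a, dsfloat b)  // dekker split + renormalise
{
    const float split = 4097.0f;                 // 2^12 + 1 for a 24-bit mantissa
    float ca = split * a.hi, cb = split * b.hi;
    float ahi = ca - (ca - a.hi), alo = a.hi - ahi;
    float bhi = cb - (cb - b.hi), blo = b.hi - bhi;
    float p = a.hi * b.hi;
    float e = ((ahi * bhi - p) + ahi * blo + alo * bhi) + alo * blo
              + a.hi * b.lo + a.lo * b.hi;       // error term plus cross terms
    dsfloat r;
    r.hi = p + e;
    r.lo = e - (r.hi - p);
    return r;
}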

By the way (insert shameless plug) if you want to see that screensaver, you can download it yourself from https://sourceforge.net/projects/gpujuliascreens/.  Note that it uses DirectX 10, and will not run on anything less, so don't bother if you don't have Windows Vista and a new graphics card (Geforce 8x00+, Radeon 2*00+).


Title: Re: mini supercomputer
Post by: cKleinhuis on June 10, 2008, 10:58:49 PM
hey there, great to see another GPU fanatic. it is true, single floating point values have far too few bits to achieve deep zooms, but this way you stay realtime :D
do you have experience with the extended floating point unit on nvidia cards? i heard they can do 64 bit floating point precision - or something, but on the gpu ... O0 O0


Title: Re: mini supercomputer
Post by: cKleinhuis on June 10, 2008, 11:06:17 PM
i see you are using directx for rendering ... why? i do hope you are not using dx10 .... but be sure to read about my alternating mandelbrot method - this way you can easily combine each of your fractal formulas with many, many parameters. i have used this method in my fractalmovies.com screensaver

in fact the method yields fantastic new fractals without changing the calculation complexity

it is simply done by using more than one seed in the iteration, e.g. mandelbrot is:

z^2+c

so, if you are using c1 and c2 throughout the iteration, it yields smooth animation. in fact you can use any number of seeds in one iteration, clearly no more than the iteration depth,

but besides that, it is easy to just switch the whole formula in each step; even this yields smoothly animatable results

please check out my screensaver demonstrating this method with a plain mandelbrot formula
http://fractalmovies.com/index.php?option=com_remository&Itemid=21&func=fileinfo&id=19

i have also written some tiny explanation about that:
http://fractalmovies.com/index.php?option=com_content&task=view&id=31&Itemid=34

please take your time to think about it, i think it would be perfect in a gpu accelerated screensaver running on pixel shader 3.0 hardware :D

cheers
ck
 O0


Title: Re: mini supercomputer
Post by: lycium on June 10, 2008, 11:30:26 PM
hmm, there's a few extremes going on in this topic ;) on the one hand, i wrote it quite a while ago to explain to a then-uninterested audience (e.g. charles wehner who is probably STILL hacking around with x86 and x87 asm); now there are 3 of us who all code 3d graphics professionally explaining to each other things we already know (the simd comment was directed at duncan c) :P


Title: Re: mini supercomputer
Post by: keldor314 on June 11, 2008, 12:09:21 AM
LOL, maybe you didn't provide enough shiny pictures?  XD

Anyway, thinking about it, multiple parameters alternating between c1 and c2 would produce a 4th order julia set, in that every 2 iterations would be z = (z^2+c1)^2+c2,
which is z = z^4 + 2*c1*z^2 + c1*c1 + c2, right?

The problem would be that the image produced by iteration 39 would be quite different from that from iteration 40, though a lot like iteration 41, so your users might be confused when you tweak the iteration value, and suddenly end up with a much different fractal!

In any case, it does indeed use dx10, partly because part of the reason I started it was to learn dx10 (it's quite a bit different from dx9), and partly because one of the features of dx10 is the newer hlsl compiler, which supports dynamic branching.  The dx9 hlsl compiler only supported static branching - e.g. SM 2.  In order to take advantage of dynamic branching, you had to code the shaders in assembler.  Ick!  After digging through the documentation, I just found that the dx10 hlsl compiler is now the one used for dx9 as well, so it will be happy targeting SM 3 at least.

I guess this means it won't be too horrific to backport the thing to dx9, since it's not actually using any dx10 exclusive features.  Still, given that setting up the DX api is about as fun as gouging one's eyes out with a dinner fork...


Title: Re: mini supercomputer
Post by: lycium on June 11, 2008, 12:17:05 AM
LOL, maybe you didn't provide enough shiny pictures?  XD
well i haven't done much fractal computing with a gpu :P still prefer my 64bit x86 processor cores.

In any case, it does indeed use dx10, partly because part of the reason I started it was to learn dx10 (it's quite a bit different from dx9), and partly because one of the features of dx10 is the newer hlsl compiler, which supports dynamic branching.  The dx9 hlsl compiler only supported static branching - e.g. SM 2.
dx9 definitely supports dynamic branching, in hlsl.

I guess this means it won't be too horrific to backport the thing to dx9, since it's not actually using any dx10 exclusive features.  Still, given that setting up the DX api is about as fun as gouging one's eyes out with a dinner fork...
even for crysis, dx9 is enough ;) there's a funny fiasco about the dx9/dx10/vista-only issues with that game, where actually you can just enable "dx10 only" features in dx9 via a registry key!

sure, there are things that really only dx10 can do and dx9 can't, but if you're worried about computational capabilities skip dx altogether and go straight for cuda  O0


Title: Re: mini supercomputer
Post by: keldor314 on June 11, 2008, 12:36:55 AM
sure, there are things that really only dx10 can do and dx9 can't, but if you're worried about computational capabilities skip dx altogether and go straight for cuda  O0

Problem with that is then instead of being Vista only, it becomes Nvidia only.  Besides, CUDA took FOREVER to be released for Vista.  As in, the first public Vista beta was only a month or two ago >.<  Hence, CUDA will have to wait for my next project.


Title: Re: mini supercomputer
Post by: cKleinhuis on June 11, 2008, 12:36:04 PM
Anyway, thinking about it, multiple parameters alternating between c1 and c2 would produce a 4th order julia set, in that every 2 iterations would be z = (z^2+c1)^2+c2,
which is z = z^4 + 2*c1*z^2 + c1*c1 + c2, right?

yes, you are absolutely right. this is the outcome, and you see that the repetition appears every second iteration; you have decent control over the evaluation :D

Quote
The problem would be that the image produced by iteration 39 would be quite different from that from iteration 40, though a lot like iteration 41, so your users might be confused when you tweak the iteration value, and suddenly end up with a much different fractal!


this is not true !!!

consider the calculated fractal up to iteration 39: certain areas of the picture have already bailed out. this means that the next iteration can only create new pixels in the areas that have not yet bailed out. now the distance between c1 and c2 comes into play: if they are nearly the same, you would not see any difference compared to the c1-only fractal.

if c1 and c2 are far away from each other the image simply bails out earlier and at some unexpected positions, but iteration depth alone does not lead to unwanted results, believe me.

the only thing which really changes the outcome is adding more seeds. please give it a try: replace your seed variable with a seed[iteration%numSeeds] statement, and animate those seeds with some nice sine curves.
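
a minimal sketch of that inner loop (my own illustration of the idea, not the actual screensaver code):

Code:
// multi-seed escape-time iteration: the seed changes every step.
// numSeeds and the seed values are illustrative; animate them over time.
__device__ int iterateMultiSeed(float zx, float zy, const float2 *seeds,
                                int numSeeds, int maxIter)
{
    int i = 0;
    while (i < maxIter && zx * zx + zy * zy < 4.0f) {
        float2 c = seeds[i % numSeeds];          // this iteration's seed
        float t = zx * zx - zy * zy + c.x;       // z = z^2 + c_i
        zy = 2.0f * zx * zy + c.y;
        zx = t;
        ++i;
    }
    return i;
}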


have fun
c.k



Title: Re: mini supercomputer
Post by: David Makin on June 11, 2008, 08:52:46 PM
Hi,

I'd love to have any one of you guys utilising the GPUs see if you can come up with a realtime GPU version of my escape-time 3D IFS algorithm (based on Hart's method).
It should be possible to get it running just about in realtime - for instance, the CPU/FPU render version of the 3D Menger Sponge takes around 13 seconds at 640*480 on my P4HT, and the algorithm as written allows for any affine transforms - obviously there's a lot of potential for optimisation if the GPU version is restricted to a particular fractal such as the Sponge.

If anyone's interested the algorithm is the 3D IFS formula in mmf4.ufm from the Ultrafractal Formula database - http://formulas.ultrafractal.com/ - go to browse collection and "Makin, Dave".

bye
Dave


Title: Re: mini supercomputer
Post by: lycium on June 13, 2008, 08:50:19 AM
i've tried to grok that src a few times, always finding:

1. the source is HUGE!
2. the variable names aren't very descriptive :/
3. lack of vector functions etc (everything is super inline) means the code is quite opaque and just dropping it into cg or hlsl/glsl wouldn't work.
4. no idea where the meat of the algorithm is, because there's so much uf-specific stuff around it...

could you post the bit that would actually get ported here, please?


Title: Re: mini supercomputer
Post by: David Makin on June 13, 2008, 12:38:24 PM
i've tried to grok that src a few times, always finding:

1. the source is HUGE!
2. the variable names aren't very descriptive :/
3. lack of vector functions etc (everything is super inline) means the code is quite opaque and just dropping it into cg or hlsl/glsl wouldn't work.
4. no idea where the meat of the algorithm is, because there's so much uf-specific stuff around it...

could you post the bit that would actually get ported here, please?

1. I know, there's quite a bit that's superfluous to a GPU implementation (such as calculating the fractal dimension of the object).
2. That's just me, I'm a lazy typist (1 hand only).
3. In UF4 you can only do it "inline".
4. I'll post something here tonight - with comments :) (Am at work now)

bye
Dave


Title: Re: mini supercomputer
Post by: David Makin on June 14, 2008, 02:06:40 AM
Hi Thomas and all,

Here's the meat of the routine:
I should point out that the transforms here in the arrays are the reverse transforms, i.e. the expanding ones, though I haven't changed the sign of the translations from the original, and the scale values are the original contracting values.
You should also note that the order you put the transforms in will greatly affect the render speed, especially for fractals without rotation/skew such as the Menger - it's best to have a pre-test routine to find which transforms have areas closest to the viewpoint and order the transforms so the closest are done first.

    x1 = gtestx - tx       ; gtestx, gtesty and gtestz are the centre of the
    y1 = gtesty - ty      ; bailout test sphere
    z1 = gtestz - tz      ; tx, ty and tz are the base point on the viewing ray
    found = 1e200        ; "found" is a distance, large value means "not found"
    nomore = true        ; flag is initially true to indicate no extra rays to traverse
    if x1*x1 + y1*y1 + z1*z1  - (x1*dx + y1*dy + z1*dz)^2<=gdist
; tests if distance of nearest approach of ray to centre of test sphere is less
; than the bailout distance (gdist), if not then we've missed the fractal
      xv[0] = t5 = tx   ; set temp variables and ray position for depth 0
      yv[0] = t6 = ty
      zv[0] = t7 = tz
      dxv[0] = dx       ; dx,dy and dz are the viewing ray's direction vector
      dyv[0] = dy      ; here we set the direction vectors at depth 0
      dzv[0] = dz
      k = gft             ; k = the first transform, gft
      s[0] = 1.0         ; scale at depth 0 = 1.0
      t12 = 1e200      ; not really necessary (see t12 below)
      j = 0               ; start at depth 0 in the transform tree
      repeat
        ss = s[j]*gt[(i[j]=k),12]  ; scale = oldscale*scale for this transform
; In the above note the i[j]=k i.e. set transform at depth j to k
; also note that the scale values are the original contracting values
; now transform the base point on the ray, storing in "current" values
; temp variables and variables for the next depth - note the (j=j+1)
        xv[(j=j+1)] = t5 = (x1=t5-gt[k,9])*gt[k,0] + (y1=t6-gt[k,10])*gt[k,3] \
                           + (z1=t7-gt[k,11])*gt[k,6]
        yv[j] = t6 = x1*gt[k,1] + y1*gt[k,4] + z1*gt[k,7]
        zv[j] = t7 = x1*gt[k,2] + y1*gt[k,5] + z1*gt[k,8]
; transform the ray's direction vectors and store in current and for depth
        dxv[j] = dx = (x1=dx)*gt[k,0] + (y1=dy)*gt[k,3] + dz*gt[k,6]
        dyv[j] = dy = x1*gt[k,1] + y1*gt[k,4] + dz*gt[k,7]
        dzv[j] = dz = x1*gt[k,2] + y1*gt[k,5] + dz*gt[k,8]
        x1 = gtestx - t5   ; distances from bailout sphere centre
        y1 = gtesty - t6
        z1 = gtestz - t7
        f = false      ; a flag to make things easier (see below)
        s[j] = ss      ; set scale for this depth
; Now test if we're within the bailout distance
        if (t9 = gdist - x1*x1 - y1*y1 - z1*z1 \
                 + (x2=1.0/(dx*dx+dy*dy+dz*dz))*(t11=x1*dx+y1*dy+z1*dz)^2)>=0.0
; if we are compute the distances along the ray to the intersections with the
; sphere (in original unscaled terms)
          t12 = (t11=x2*t11) - (t9=sqrt(x2*t9))
; if the nearest distance is <= the distance to the back clip (end)
; and the farthest is >= to the front clip
          if t12<=end && t11+t9>=t4
; if we've reached the max depth (@igen) or
; the depth is >= the minimum depth (@imin) and we've gone below the min scale
            if j>=@igen || (j>=@imin && ss<minscale)
; You can safely ignore the "if @flat" bit and use the "else" only
              if @flat
                if t11<found
                  found = end = t11
                  if (@colouring==0 || @colouring>3) && @colmeth==0
                    nomore = false
                  endif
                  if (@colouring>0 || @colmeth==1) && ii==0
                    jj = l = j
                    while (l=l-1)>=0
                      is[l] = i[l]
                    endwhile
                  endif
                endif
              else
                if t12<found  ; if the nearest intersection distance is < the best found
                  found = end = t12  ; set found AND the back clip to the new distance
; if using lighting set nomore to false so we calculate the adjacent rays
; for getting the normals
                  if (@colouring==0 || @colouring>3) && @colmeth==0
                    nomore = false
                  endif
; store the transform numbers from each depth for certain colourings
                  if (@colouring>0 || @colmeth==1) && ii==0
                    jj = l = j
                    while (l=l-1)>=0
                      is[l] = i[l]
                    endwhile
                  endif
                endif
              endif
; else (within clips but not found a collision) get first transform at next depth
            elseif (k=gn[i[j-1],gntm1])>=0
; check depth flag for next transform k at depth j
              if !(f=df[k,j])   ; if k not allowed
                i[j] = k        ; adjust values for
                j = j + 1      ; routine to get the next valid one (below)
              endif
            endif
          endif
        endif
        if !f      ; if not going to new depth
          k = i[(j=j-1)]   ; goto previous depth, get transform from previous depth
          repeat
            if (k=k-1)<0  ; next transform from previous depth, OK ?
              if (j=j-1)<0 ; no so go up another depth, if done
                f = true   ; set flag to exit the repeat
              else
                k = i[j]   ; transform from previous
              endif
            elseif j>0   ; if not on first depth
              if (k=gn[i[j-1],k])>=0  ; transform k valid ?
                f = df[k,j]  ; yes, flag if k valid at depth j
              else
                k = i[(j=j-1)] ; no, go back up
              endif
            else
              f = df[k,0]   ; f = k valid at depth 0
            endif
          until f
          if (j>=0)          ; depth>=0
            t5 = xv[j]       ; retrieve ray base point and direction vectors
            t6 = yv[j]      ; for the new depth
            t7 = zv[j]
            dx = dxv[j]
            dy = dyv[j]
            dz = dzv[j]
          endif
        endif
      until j<0
    endif

; If found a point on the fractal store values accordingly
    if (sc[ii]=found)<1e100
      if ii==0
        vv = found
        px[0] = xx + found*xc
        py[0] = yy + found*yc
        pz[0] = zz + found*zc
      elseif abs(vv-found)<onepix
        px[ii] = xx + found*xc
        py[ii] = yy + found*yc
        pz[ii] = zz + found*zc
      else
        sc[ii] = 1e200
      endif
    endif
    ii = ii + 1
  until (ii==1 && nomore) || ii>=5



Title: Re: mini supercomputer
Post by: David Makin on June 14, 2008, 01:41:54 PM
Hi again,

I should have mentioned that I missed out the top of the ii loop; all that does is set up the viewing ray base position, the viewing ray direction vectors, and the initial back-clip (end) and front clip (t4).
Also (fairly important) the initial direction vector of the viewing ray (dx,dy,dz) is normalised.

bye
Dave


Title: Re: mini supercomputer
Post by: lycium on June 14, 2008, 02:55:06 PM
Also (fairly important) the initial direction vector of the viewing ray (dx,dy,dz) is normalised.

of course, but it seems you forgot about that:

Code:
; Now test if we're within the bailout distance
        if (t9 = gdist - x1*x1 - y1*y1 - z1*z1 \
                 + (x2=1.0/(dx*dx+dy*dy+dz*dz))*(t11=x1*dx+y1*dy+z1*dz)^2)>=0.0

 O0

thanks for the code! it doesn't look very gpu-friendly though, with all that branching... can you explain it in pseudocode perhaps?

to make sure i'm getting this right, you start out with some linear maps M_k which would be used for the normal "chaos game" (stochastic) ifs, from those get their inverses M^{-1}_k and then... well, i've read hart's method back when and would like to brush up on it again, but exactly how does your implementation differ?

thanks, and sorry for all the questions!  :-X


Title: Re: mini supercomputer
Post by: David Makin on June 14, 2008, 04:21:08 PM
Hi,

I thought that was transparent !!

However there are a couple of differences from Hart's basic method:

1. It allows for RIFS - that's why the code to find the next transform is looking up in an array.

2. It allows for depth restrictions on transforms - that's why there's a look up in an array to find if a given transform is allowed at a given depth.

Here's some pseudo-code - I think I've got it correct. Hope it helps - the only bit I'm unsure of is the loop that goes to the next transform at the current depth and, if we've done them all, goes up a depth.

Here's the pseudo-code for a single viewing ray:

calculate the base point for the ray
calculate the normalised direction vector for the ray
get the front clip (distance along ray from base point)
get the back clip offset (ditto)
set the "found" distance to a big value

if we've hit the bounding sphere

  store base point for depth 0
  store direction vectors for depth 0
  set current transform to first transform
  (normally number of transforms - 1)
  set scale at depth 0 to 1
  set depth to 0

  while depth>=0

    store current transform as transform for this depth
    current scale = scale at this depth*scale for current transform
    increment depth
    transform base point and direction vector storing values as "current" and for the new depth
    set flag to false (=="goto next transform at current depth")
    store current scale as scale for new depth

    if we're within the bailout distance

      if nearest intersection with sphere is <= distance to back clip and farthest intersection with sphere is >= front clip

        if reached max depth or (depth>min depth and reached minscale)

          if new nearest distance<found distance

            set found distance and back clip to new distance
            store any colouring information

          else

            current transform = first transform (normally number of transforms - 1)
            set flag to true (=="goto next depth" i.e. NOT "goto next transform at current depth")

          endif

        endif (reached max depth etc)

      endif (intersection with sphere within scan)

    endif (within bailout distance)

    if flag false (== "goto next transform at current depth")

      depth = depth - 1
      current transform = transform[depth]
      while flag==false

        current transform = current transform - 1
        if current transform<0

          depth = depth - 1
          if depth<0

            f = true

          else

            current transform = transform[depth]

          endif

        else

          f = true

        endif

      endwhile

      if depth>=0

        set current base point and direction vectors to values from depth

      endif

    endif

  endwhile (depth>=0)

  if found distance<big value (set at start)

    point = base point + direction vector * found

  endif


Title: Re: mini supercomputer
Post by: David Makin on June 14, 2008, 06:55:00 PM
Also (fairly important) the initial direction vector of the viewing ray (dx,dy,dz) is normalised.

of course, but it seems you forgot about that:

Code:
; Now test if we're within the bailout distance
        if (t9 = gdist - x1*x1 - y1*y1 - z1*z1 \
                 + (x2=1.0/(dx*dx+dy*dy+dz*dz))*(t11=x1*dx+y1*dy+z1*dz)^2)>=0.0

No I didn't forget - the transforms scale the direction vectors.
For transforms that have uniform scaling (x,y,z) and no skewing/shearing etc. - such as those for the Menger Sponge - the calculations can be simplified to avoid recalculating the inverse of the scale of the normal, since the value will be the same as the overall contractive scale (that's why the routine still uses the contractive scale values).


Title: Re: mini supercomputer
Post by: lycium on June 15, 2008, 11:08:47 AM
Also (fairly important) the initial direction vector of the viewing ray (dx,dy,dz) is normalised.

of course, but it seems you forgot about that:

Code:
; Now test if we're within the bailout distance
        if (t9 = gdist - x1*x1 - y1*y1 - z1*z1 \
                 + (x2=1.0/(dx*dx+dy*dy+dz*dz))*(t11=x1*dx+y1*dy+z1*dz)^2)>=0.0

No I didn't forget - the transforms scale the direction vectors.
For transforms that have uniform scaling (x,y,z) and no skewing/shearing etc. - such as those for the Menger Sponge - the calculations can be simplified to avoid recalculating the inverse of the scale of the normal, since the value will be the same as the overall contractive scale (that's why the routine still uses the contractive scale values).
x2=1.0/(dx*dx+dy*dy+dz*dz) = 1.0


Title: Re: mini supercomputer
Post by: David Makin on June 15, 2008, 09:22:29 PM
Also (fairly important) the initial direction vector of the viewing ray (dx,dy,dz) is normalised.

of course, but it seems you forgot about that:

Code:
; Now test if we're within the bailout distance
        if (t9 = gdist - x1*x1 - y1*y1 - z1*z1 \
                 + (x2=1.0/(dx*dx+dy*dy+dz*dz))*(t11=x1*dx+y1*dy+z1*dz)^2)>=0.0

No I didn't forget - the transforms scale the direction vectors.
For transforms that have uniform scaling (x,y,z) and no skewing/shearing etc. - such as those for the Menger Sponge - the calculations can be simplified to avoid recalculating the inverse of the scale of the normal, since the value will be the same as the overall contractive scale (that's why the routine still uses the contractive scale values).
x2=1.0/(dx*dx+dy*dy+dz*dz) = 1.0

No because just prior to that we transformed the direction vectors:

        dxv[j] = dx = (x1=dx)*gt[k,0] + (y1=dy)*gt[k,3] + dz*gt[k,6]
        dyv[j] = dy = x1*gt[k,1] + y1*gt[k,4] + dz*gt[k,7]
        dzv[j] = dz = x1*gt[k,2] + y1*gt[k,5] + dz*gt[k,8]

Which means that they were at least scaled, possibly skewed, sheared etc.



Title: Re: mini supercomputer
Post by: lycium on June 15, 2008, 09:51:20 PM
ah sorry, all clear now (i am used to code with const-declarations, which helps a lot in matters like these)! however, i think you can still avoid those operations if you know:

1. that (dx,dy,dz) is normalised in the beginning
2. 1 / |M| for all the xform matrices M

then you could compute (dx,dy,dz) = M(dx,dy,dz) * Mscale, where Mscale is the inverse of det M, to simplify some of the later computations.


Title: Re: mini supercomputer
Post by: David Makin on June 16, 2008, 02:44:29 PM
As I said:

"For transforms that have uniform scaling (x,y,z) and no skewing/shearing etc. - such as those for the Menger Sponge - the calculations can be simplified to avoid the recalculation of inverse of the scale of the normal since the value will be the same as the overall contractive scale (that's why the routine still uses the contractive scale values)."

I mean *all* the transforms in the IFS of course.
Note that with pure rotations you could also still use the overall scale value.
However if there is non-uniform scaling, or skewing or shearing etc., then you have to effectively re-normalise (because the overall transformation of dx,dy,dz is the accumulation of all the transforms from each depth in the tree).
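
Put symbolically (a sketch of the identity behind this exchange, assuming each map really is a uniform scale times a pure rotation):

Code:
M_k = s_k * R_k   (R_k a pure rotation)   =>   |M_k * d| = s_k * |d|

so for a unit-length direction d, dividing the transformed direction by s_k renormalises it without any square root; with non-uniform scaling, skewing or shearing, |M_k * d| depends on the direction of d, hence the explicit renormalisation.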