Logo by haltenny - Contribute your own Logo!

END OF AN ERA, FRACTALFORUMS.COM IS CONTINUED ON FRACTALFORUMS.ORG

it was a great time but no longer maintainable by c.Kleinhuis contact him for any data retrieval,
thanks and see you perhaps in 10 years again

this forum will stay online for reference
News: Follow us on Twitter
 
*
Welcome, Guest. Please login or register. April 25, 2024, 10:07:47 AM


Login with username, password and session length


The All New FractalForums is now in Public Beta Testing! Visit FractalForums.org and check it out!


Pages: 1 [2] 3   Go Down
  Print  
Share this topic on DiggShare this topic on FacebookShare this topic on GoogleShare this topic on RedditShare this topic on StumbleUponShare this topic on Twitter
Author Topic: mini supercomputer  (Read 17172 times)
0 Members and 2 Guests are viewing this topic.
lycium
Fractal Supremo
*****
Posts: 1158



WWW
« Reply #15 on: May 05, 2008, 12:43:28 PM »

it's SIMD, and i'd strongly suggest tackling altivec in your situation, since you have access to an older mac and this experience will let you dabble with cell spu programming (via a ps3), which will be a kind of nirvana for you in terms of mandelbrot flops i think wink

yeah it's a good idea on modern processors to use masks instead of small-scale branching because processors are deeply pipelined to reach high levels of speed (in ghz) and efficiency (in terms of balancing compute vs bandwidth resources); this applies even moreso to gpus, whereas the cell broadband engine is a nice agile mix of both.
Logged

keldor314
Guest
« Reply #16 on: June 10, 2008, 10:47:41 PM »

Actually, concerning the Geforce 8x00, SIMD isn't completely correct either - the 8800GTX, for instance, has 16 fully seperate multiprocessors, each with 16 ALUs which are truely SIMD, for a total of 128 ALUs.  What this means is that within each multiprocessor, the 16 SIMD processors will run in lockstep with each other, BUT each of the 16 multiprocessors can branch seperately.  Think of it like a 16 core processor, where each processor operates on a 16-wide group of registers simultaneously.

In practice, this will give you perhaps 90% efficiency on fractal rendering, because these 16 units of a given "core" most likely are operating on a 4x4 grid of pixels, meaning that the vast majority of the time, the bailout of one of these pixels is not much sooner than any of the others.  The unit must iterate until all 16 pixels are bailed out (or MAX_ITERATIONS is reached), but since all 16 pixels are in a pretty tiny area (about the size of this comma , assuming you're using a small font) they will mostly end up finishing all about at the same time.

Branches are handled by masking out the computations of the units that take the branch, so that they effectively do nothing until all units are "finished" and ready to join in beyond the branch.  This means that for a iteration loop, for instance, the units that have hit the bailout will "sit around" and do nothing until all units (in the 4x4 grid, remember) have bailed out.  At this point, the multiprocessor can proceed down to the next line of code.  Note that the multiprocessor doesn't necessarily have to do the whole MAX_ITERATIONS, just enough that all the processors have reached bailout.

It's interesting to contrast this to the Radeon 2800 (I THINK - I'm not up as well on their specific GPU numbers...), which has 4 separate multiprocessors, each with 16 SIMD processors, each of THOSE in turn having 5 ALUs for a collective total of 320 ALUs.  The thing to note here (and why the 320 ALUs do not trump the 128 of the GeForce) is that the groups of 5 ALUs are bound together on the same pixel, rather than split among multiple pixels.  This leaves it to the programmer to somehow get their code to do the work for a single pixel to somehow spread out its arithmetic among the 5 ALUs.  (Actually, it's a bit more complicated than that, since the 5 ALUs are not exactly SIMD, rather, they are 4+1 SIMD, with 4 standard ALUs in SIMD, and one special super-ALU which kinda supplements the others - e.g. the 5th unit is the only one that has hardware accelerated transcendental operations (sin, cos, e^x, ect) and you might have a opcode that tells the 4 units to multiply, then add the 5th unit to each of their results)

ANYWAY... before I make your headaches from reading this any worse (it's HARD to explain this sort of thing without going even further into the jargon than I already did...), the ultimate point is GPUZ DO L33T FRACTALZ!!11!  XD

For example, I have a screensaver I wrote which can render some pretty hairy fractals (e^(sin(a*z)+cos(b*z) for instance) at a smooth framerate (30-60fps on average) with 50 or so iterations at a whopping screen resolution of 2048x1536!  What do you do with fractals rendered in 1/40th of a second?  Realtime animation, of course!  Watch the fractal change as the values of a and b change...

The downside to all this (and it's a big one) is that the GPU only does native single precision floating point - no zooming (at least not much) for you!  Still, you can of course write code in the shader to emulate higher precision the same way some fractal programs do deep zoom - it will just run a good deal slower, although still faster than the CPU...

By the way (insert shameless plug) if you want to see that screensaver, you can download it yourself from https://sourceforge.net/projects/gpujuliascreens/.  Note that it uses DirectX 10, and will not run on anything less, so don't bother if you don't have Windows Vista and a new graphics card (Geforce 8x00+, Radeon 2*00+).
Logged
cKleinhuis
Administrator
Fractal Senior
*******
Posts: 7044


formerly known as 'Trifox'


WWW
« Reply #17 on: June 10, 2008, 10:58:49 PM »

hey there, great to see another GPU fanatic, it is true, the single floating point values are far to few to achieve high deep rates, but this way you stay realtime cheesy
have you experience with the extended floating point unit on nvidia cards ? i heard they can do 64 bit floating point precision - or something, but on the gpu ... afro afro
Logged

---

divide and conquer - iterate and rule - chaos is No random!
cKleinhuis
Administrator
Fractal Senior
*******
Posts: 7044


formerly known as 'Trifox'


WWW
« Reply #18 on: June 10, 2008, 11:06:17 PM »

i see you are using directx for rendering ... why ? i do hope you are not using dx10 .... but be sure to read about my alternating mandelbrot method, this way you can easily combine each of your fractal formula together with many many parameters, i have used this method in my fractalmovies.com screensaver

in fact the method yields fantastic new fractals without changing the calculation complexity

it is simply done by using more than one seed in the iteration, e.g. mandelbrot is:

z2+c

so, if you are using c1 and c2 throughout the iteration, it yields in smooth animation, in fact you can use any number of seeds in one iteration, clearly no more than iteration depth,

but beside of that, it is easy to just switch the whole formula in each step, even this yields smooth animatable results

please check out my screensaver demonstrating this method with a plain mandelbrot formula
http://fractalmovies.com/index.php?option=com_remository&Itemid=21&func=fileinfo&id=19

i have also written some tiny explanation about that:
http://fractalmovies.com/index.php?option=com_content&task=view&id=31&Itemid=34

please take your time to think about it, i think it would be perfect in a gpu accellerated screensaver running on pixel shader 3.0 hardware cheesy

cheers
ck
 afro
Logged

---

divide and conquer - iterate and rule - chaos is No random!
lycium
Fractal Supremo
*****
Posts: 1158



WWW
« Reply #19 on: June 10, 2008, 11:30:26 PM »

hmm, there's a few extremes going on in this topic wink on the one hand, i wrote it quite a while ago to explain to a then-uninterested audience (e.g. charles wehner who is probably STILL hacking around with x86 and x87 asm); now there are 3 of us who all code 3d graphics professionally explaining to each other things we already know (the simd comment was directed at duncan c) tongue stuck out
Logged

keldor314
Guest
« Reply #20 on: June 11, 2008, 12:09:21 AM »

LOL, maybe you didn't provide enough shiny pictures?  XD

Anyway, thinking about that multiple parameters alternating between c1 and c2 would produce a 4th order julia set, in that every 2 iterations would be z = (z^2+c1)^2+c2
which is z = z^4+2*c1*z^2+c1*c1+c2, right?

The problem would be that the image produced by iteration 39 would be quite different from that from iteration 40, though a lot like iteration 41, so your users might be confused when you tweak the iteration value, and suddenly end up with a much different fractal!

In any case, it does indeed use dx10, partly because part of the reason I started is was to learn dx10 (it's quite a bit different from dx 9), and partly because one of the features of dx10 is the newer hlsl compiler, which supports dynamic branching.  The dx9 hlsl compiler only supported static branching - e.g. SM 2.  In order to take advantage of dynamic branching, you had to code the shaders in assembler.  Ick!  After digging through the documentation, I just found that the dx10 hlsl compiler is now the one used for dx9 as well, so it will be happy targeting SM 3 at least.

I guess this means it won't be too horrific to backport the thing to dx9, since it's not actually using any dx10 exclusive features.  Still, given that setting up the DX api is about as fun as gouging ones eyes out with a dinner fork...
Logged
lycium
Fractal Supremo
*****
Posts: 1158



WWW
« Reply #21 on: June 11, 2008, 12:17:05 AM »

LOL, maybe you didn't provide enough shiny pictures?  XD
well i haven't done much fractal computing with a gpu tongue stuck out still prefer my 64bit x86 processor cores.

In any case, it does indeed use dx10, partly because part of the reason I started is was to learn dx10 (it's quite a bit different from dx 9), and partly because one of the features of dx10 is the newer hlsl compiler, which supports dynamic branching.  The dx9 hlsl compiler only supported static branching - e.g. SM 2.
dx9 definitely supports dynamic branching, in hlsl.

I guess this means it won't be too horrific to backport the thing to dx9, since it's not actually using any dx10 exclusive features.  Still, given that setting up the DX api is about as fun as gouging ones eyes out with a dinner fork...
even for crysis, dx9 is enough wink there's a funny fiasco about the dx9/dx10/vista-only issues with that game, where actually you can just enable "dx10 only" features in dx9 via a registry key!

sure, there are things that really only dx10 can do and dx9 can't, but if you're worried about computational capabilities skip dx alltogether and go straight for cuda  afro
Logged

keldor314
Guest
« Reply #22 on: June 11, 2008, 12:36:55 AM »

sure, there are things that really only dx10 can do and dx9 can't, but if you're worried about computational capabilities skip dx alltogether and go straight for cuda  afro

Problem with that is then instead of being Vista only, it becomes Nvidia only.  Besides, CUDA took FOREVER to be released for Vista.  As in, the first public Vista beta was only a month or two ago >.<  Hence, CUDA will have to wait for my next project.
Logged
cKleinhuis
Administrator
Fractal Senior
*******
Posts: 7044


formerly known as 'Trifox'


WWW
« Reply #23 on: June 11, 2008, 12:36:04 PM »

Anyway, thinking about that multiple parameters alternating between c1 and c2 would produce a 4th order julia set, in that every 2 iterations would be z = (z^2+c1)^2+c2
which is z = z^4+2*c1*z^2+c1*c1+c2, right?

yes, you are absolutely right. this is the outcome, and you see, that the repeating appears every second iteration, you have a decent control about the evaluation cheesy

Quote
The problem would be that the image produced by iteration 39 would be quite different from that from iteration 40, though a lot like iteration 41, so your users might be confused when you tweak the iteration value, and suddenly end up with a much different fractal!


this is not true !!!

consider the calculated fractal until iteration 39, certain areas of the picture are already bailed out. this means that the next iteration can only create new pixels in the not yet bailouted areas. now the distance between c1 and c2 come into play, if they are nearly the same, you would not see any difference between the c1 only fractal compared to this.

if c1 and c2 are far away from eachother the image simply bails out earlier and at some unexpected positions, but iteration depth alone does not lead to unwanted results, believe me.

the only thing which really changes the outcome is when adding more seeds, please give it a try, replace
your seed variable with a seed[iteration%numSeeds] statement, and animate those seeds with some nice sine curves.


have fun
c.k

Logged

---

divide and conquer - iterate and rule - chaos is No random!
David Makin
Global Moderator
Fractal Senior
******
Posts: 2286



Makin' Magic Fractals
WWW
« Reply #24 on: June 11, 2008, 08:52:46 PM »

Hi,

I'd love to have any one of you guys utilising the GPUs see if you can come up with a realtime GPU version of my escape-time 3D IFS algorithm (based on Hart's method).
It should be possible to get it done just about realtime - for instance the CPU/FPU render version of the 3D Menger Sponge takes around 13 seconds at 640*480 on my P4HT and the algorithm as written allows for any affine transforms - obviously there's a lot of potential for optimisation if the GPU version is restricted to a particular fractal such as the Sponge.

If anyone's interested the algorithm is the 3D IFS formula in mmf4.ufm from the Ultrafractal Formula database - http://formulas.ultrafractal.com/ - go to browse collection and "Makin, Dave".

bye
Dave
Logged

The meaning and purpose of life is to give life purpose and meaning.

http://www.fractalgallery.co.uk/
"Makin' Magic Music" on Jango
lycium
Fractal Supremo
*****
Posts: 1158



WWW
« Reply #25 on: June 13, 2008, 08:50:19 AM »

i've tried to grok that src a few times, always finding:

1. the source is HUGE!
2. the variable names aren't very descriptive Sceptical
3. lack of vector functions etc (everything is super inline) means the code is quite opaque and just dropping it into cg or hlsl/glsl wouldn't work.
4. no idea where the meat of the algorithm is, because there's so much uf-specific stuff around it...

could you post the bit that would actually get ported here, please?
Logged

David Makin
Global Moderator
Fractal Senior
******
Posts: 2286



Makin' Magic Fractals
WWW
« Reply #26 on: June 13, 2008, 12:38:24 PM »

i've tried to grok that src a few times, always finding:

1. the source is HUGE!
2. the variable names aren't very descriptive Sceptical
3. lack of vector functions etc (everything is super inline) means the code is quite opaque and just dropping it into cg or hlsl/glsl wouldn't work.
4. no idea where the meat of the algorithm is, because there's so much uf-specific stuff around it...

could you post the bit that would actually get ported here, please?

1. I know, there's quite a bit that's superfluos to a GPU implimentation (such as calculating the fractal dimension of thte object).
2. That's just me, I'm a lazy typist (1 hand only).
3. In UF4 you can only do it "inline".
4. I'll post something here tonight - with comments smiley (Am at work now)

bye
Dave
Logged

The meaning and purpose of life is to give life purpose and meaning.

http://www.fractalgallery.co.uk/
"Makin' Magic Music" on Jango
David Makin
Global Moderator
Fractal Senior
******
Posts: 2286



Makin' Magic Fractals
WWW
« Reply #27 on: June 14, 2008, 02:06:40 AM »

Hi Thomas and all,

Here's the meat of the routine:
I should point out that the transforms here in the arrays are the reverse transforms i.e. the expanding ones, though I haven't changed the sign of the translations from the original and the scale values are the original contracting values.
You should also note that the order you put the transforms in will greatly affect the render speed especially for fractals without rotation/skew such as the Menger - it's best to have a pre-test routine to find which transforms have areas closest to the viewpoint and order the transforms so the closest are done first.

    x1 = gtestx - tx       ; gtestx, gtesty and gtestz are the centre of the
    y1 = gtesty - ty      ; bailout test sphere
    z1 = gtestz - tz      ; tx, ty and tz are the base point on the viewing ray
    found = 1e200        ; "found" is a distance, large value means "not found"
    nomore = true        ; flag is initially true to indicate no extra rays to traverse
    if x1*x1 + y1*y1 + z1*z1  - (x1*dx + y1*dy + z1*dz)^2<=gdist
; tests if distance of nearest approach of ray to centre of test sphere is less
; than the bailout distance (gdist), if not then we've missed the fractal
      xv[0] = t5 = tx   ; set temp variables and ray position for depth 0
      yv[0] = t6 = ty
      zv[0] = t7 = tz
      dxv[0] = dx       ; dx,dy and dz are the viewing ray's direction vector
      dyv[0] = dy      ; here we set the direction vectors at depth 0
      dzv[0] = dz
      k = gft             ; k = the first transform, gft
      s[0] = 1.0         ; scale at depth 0 = 1.0
      t12 = 1e200      ; not really necessary (see t12 below)
      j = 0               ; start at depth 0 in the transform tree
      repeat
        ss = s[j]*gt[(i[j]=k),12]  ; scale = oldscale*scale for this transform
; In the above note the i[j]=k i.e. set transform at depth j to k
; also note that the scale values are the original contracting values
; now transform the base point on the ray, storing in "current" values
; temp variables and variables for the next depth - note the (j=j+1)
        xv[(j=j+1)] = t5 = (x1=t5-gt[k,9])*gt[k,0] + (y1=t6-gt[k,10])*gt[k,3] \
                           + (z1=t7-gt[k,11])*gt[k,6]
        yv[j] = t6 = x1*gt[k,1] + y1*gt[k,4] + z1*gt[k,7]
        zv[j] = t7 = x1*gt[k,2] + y1*gt[k,5] + z1*gt[k,8]
; transform the ray's direction vectors and store in current and for depth
        dxv[j] = dx = (x1=dx)*gt[k,0] + (y1=dy)*gt[k,3] + dz*gt[k,6]
        dyv[j] = dy = x1*gt[k,1] + y1*gt[k,4] + dz*gt[k,7]
        dzv[j] = dz = x1*gt[k,2] + y1*gt[k,5] + dz*gt[k,8]
        x1 = gtestx - t5   ; distances from bailout sphere centre
        y1 = gtesty - t6
        z1 = gtestz - t7
        f = false      ; a flag to make things easier (see below)
        s[j] = ss      ; set scale for this depth
; Now test if we're within the bailout distance
        if (t9 = gdist - x1*x1 - y1*y1 - z1*z1 \
                 + (x2=1.0/(dx*dx+dy*dy+dz*dz))*(t11=x1*dx+y1*dy+z1*dz)^2)>=0.0
; if we are compute the distances along the ray to the intersections with the
; sphere (in original unscaled terms)
          t12 = (t11=x2*t11) - (t9=sqrt(x2*t9))
; if the nearest distance is <= the distance to the back clip (end)
; and the farthest is >= to the front clip
          if t12<=end && t11+t9>=t4
; if we've reached the max depth (@igen) or
; the depth is >= the minimum depth (@imin) and we've gone below the min scale
            if j>=@igen || (j>=@imin && ss<minscale)
; You can safely ignore the "if @flat bit and use the "else" only
              if @flat
                if t11<found
                  found = end = t11
                  if (@colouring==0 || @colouring>3) && @colmeth==0
                    nomore = false
                  endif
                  if (@colouring>0 || @colmeth==1) && ii==0
                    jj = l = j
                    while (l=l-1)>=0
                      is[l] = i[l]
                    endwhile
                  endif
                endif
              else
                if t12<found  ; if the nearest intersection distance is < the best found
                  found = end = t12  ; set found AND the back clip to the new distance
; if using lighting set nomore to false so we calculate the adjacent rays
; for getting the normals
                  if (@colouring==0 || @colouring>3) && @colmeth==0
                    nomore = false
                  endif
; store the transform numbers from each depth for certain colourings
                  if (@colouring>0 || @colmeth==1) && ii==0
                    jj = l = j
                    while (l=l-1)>=0
                      is[l] = i[l]
                    endwhile
                  endif
                endif
              endif
; else (within clips but not found a collision) get first transform at next depth
            elseif (k=gn[i[j-1],gntm1])>=0
; check depth flag for next transform k at depth j
              if !(f=df[k,j])   ; if k not allowed
                i[j] = k        ; adjust values for
                j = j + 1      ; routine to get the next valid one (below)
              endif
            endif
          endif
        endif
        if !f      ; if not going to new depth
          k = i[(j=j-1)]   ; goto previous depth, get transform from previous depth
          repeat
            if (k=k-1)<0  ; next transform from previous depth, OK ?
              if (j=j-1)<0 ; no so go up another depth, if done
                f = true   ; set flag to exit the repeat
              else
                k = i[j]   ; transform from previous
              endif
            elseif j>0   ; if not on first depth
              if (k=gn[i[j-1],k])>=0  ; transform k valid ?
                f = df[k,j]  ; yes, flag if k valid at depth j
              else
                k = i[(j=j-1)] ; no, go back up
              endif
            else
              f = df[k,0]   ; f = k valid at depth 0
            endif
          until f
          if (j>=0)          ; depth>=0
            t5 = xv[j]       ; retreive ray base point and direction vectors
            t6 = yv[j]      ; for the new depth
            t7 = zv[j]
            dx = dxv[j]
            dy = dyv[j]
            dz = dzv[j]
          endif
        endif
      until j<0
    endif

; If found a point on the fractal store values accordingly
    if (sc[ii]=found)<1e100
      if ii==0
        vv = found
        px[0] = xx + found*xc
        py[0] = yy + found*yc
        pz[0] = zz + found*zc
      elseif abs(vv-found)<onepix
        px[ii] = xx + found*xc
        py[ii] = yy + found*yc
        pz[ii] = zz + found*zc
      else
        sc[ii] = 1e200
      endif
    endif
    ii = ii + 1
  until (ii==1 && nomore) || ii>=5

« Last Edit: June 14, 2008, 02:21:46 AM by David Makin » Logged

The meaning and purpose of life is to give life purpose and meaning.

http://www.fractalgallery.co.uk/
"Makin' Magic Music" on Jango
David Makin
Global Moderator
Fractal Senior
******
Posts: 2286



Makin' Magic Fractals
WWW
« Reply #28 on: June 14, 2008, 01:41:54 PM »

Hi again,

I should have mentioned that I missed out the top of the ii loop, all that does is set up the viewing ray base position, the viewing ray direction vectors and the initial back-clip (end) and the front clip (t4).
Also (fairly important) the initial direction vector of the viewing ray (dx,dy,dz) is normalised.

bye
Dave
« Last Edit: June 14, 2008, 02:23:12 PM by David Makin » Logged

The meaning and purpose of life is to give life purpose and meaning.

http://www.fractalgallery.co.uk/
"Makin' Magic Music" on Jango
lycium
Fractal Supremo
*****
Posts: 1158



WWW
« Reply #29 on: June 14, 2008, 02:55:06 PM »

Also (fairly important) the initial direction vector of the viewing ray (dx,dy,dz) is normalised.

of course, but it seems you forgot about that:

Code:
; Now test if we're within the bailout distance
        if (t9 = gdist - x1*x1 - y1*y1 - z1*z1 \
                 + (x2=1.0/(dx*dx+dy*dy+dz*dz))*(t11=x1*dx+y1*dy+z1*dz)^2)>=0.0

 afro

thanks for the code! it doesn't look very gpu-friendly though, with all that branching... can you explain it in pseudocode perhaps?

to make sure i'm getting this right, you start out with some linear maps M_k which would be used for the normal "chaos game" (stochastic) ifs, from those get their inverses M^{-1}_k and then... well, i've read hart's method back when and would like to brush up on it again, but exactly how does your implementation differ?

thanks, and sorry for all the questions!  lips are sealed
Logged

Pages: 1 [2] 3   Go Down
  Print  
 
Jump to:  


Powered by MySQL Powered by PHP Powered by SMF 1.1.21 | SMF © 2015, Simple Machines

Valid XHTML 1.0! Valid CSS! Dilber MC Theme by HarzeM
Page created in 0.269 seconds with 24 queries. (Pretty URLs adds 0.014s, 2q)