lycium
« Reply #15 on: May 05, 2008, 12:43:28 PM »
it's SIMD, and i'd strongly suggest tackling altivec in your situation, since you have access to an older mac; that experience will let you dabble with cell spu programming (via a ps3), which i think will be a kind of nirvana for you in terms of mandelbrot flops.

yeah, it's a good idea on modern processors to use masks instead of small-scale branching, because processors are deeply pipelined to reach high levels of speed (in ghz) and efficiency (in terms of balancing compute vs bandwidth resources); this applies even more so to gpus, whereas the cell broadband engine is a nice agile mix of both.
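to make the mask idea concrete, here's a minimal C sketch - not altivec intrinsics, just scalar code modelling what a SIMD group does. the names (mandel_lockstep, LANES) are mine, not from any library: all lanes run the mandelbrot iteration in lockstep, and a per-lane mask retires lanes that have bailed out instead of branching per pixel.

```c
/* run LANES mandelbrot pixels in lockstep, the way one simd group does:
   no per-lane branching, just a mask that retires finished lanes. */
enum { LANES = 4 };

void mandel_lockstep(const double *cr, const double *ci,
                     int maxiter, int *iters)
{
    double zr[LANES] = {0}, zi[LANES] = {0};
    int active[LANES];
    for (int l = 0; l < LANES; l++) { active[l] = 1; iters[l] = maxiter; }

    for (int n = 1; n <= maxiter; n++) {
        int any_active = 0;
        for (int l = 0; l < LANES; l++) {
            if (!active[l])            /* masked-out lane: real hardware would */
                continue;              /* still issue the ops, discarding them */
            double r = zr[l]*zr[l] - zi[l]*zi[l] + cr[l];
            double i = 2.0*zr[l]*zi[l] + ci[l];
            zr[l] = r; zi[l] = i;
            if (r*r + i*i > 4.0) {     /* |z| > 2: this lane bails out, but   */
                active[l] = 0;         /* the group keeps going until all do  */
                iters[l] = n;
            } else {
                any_active = 1;
            }
        }
        if (!any_active) break;        /* whole group finished early */
    }
}
```

the cost of a group is the slowest lane in it, which is exactly why packing nearby pixels into one group works so well for fractals.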
keldor314
Guest
« Reply #16 on: June 10, 2008, 10:47:41 PM »
Actually, concerning the Geforce 8x00, SIMD isn't completely correct either - the 8800GTX, for instance, has 16 fully separate multiprocessors, each with 16 ALUs which are truly SIMD, for a total of 128 ALUs. What this means is that within each multiprocessor, the 16 SIMD processors run in lockstep with each other, BUT each of the 16 multiprocessors can branch separately. Think of it like a 16 core processor, where each core operates on a 16-wide group of registers simultaneously.

In practice, this will give you perhaps 90% efficiency on fractal rendering, because the 16 units of a given "core" most likely operate on a 4x4 grid of pixels, meaning that the vast majority of the time, the bailout of one of these pixels is not much sooner than any of the others. The unit must iterate until all 16 pixels have bailed out (or MAX_ITERATIONS is reached), but since all 16 pixels are in a pretty tiny area (about the size of this comma , assuming you're using a small font) they will mostly end up finishing at about the same time. Branches are handled by masking out the computations of the units that take the branch, so that they effectively do nothing until all units are "finished" and ready to join in beyond the branch. This means that for an iteration loop, for instance, the units that have hit the bailout will "sit around" and do nothing until all units (in the 4x4 grid, remember) have bailed out. At this point, the multiprocessor can proceed to the next line of code. Note that the multiprocessor doesn't necessarily have to do the whole MAX_ITERATIONS, just enough that all the processors have reached bailout.

It's interesting to contrast this to the Radeon 2800 (I THINK - I'm not up as well on their specific GPU numbers...), which has 4 separate multiprocessors, each with 16 SIMD processors, each of THOSE in turn having 5 ALUs, for a collective total of 320 ALUs. The thing to note here (and why the 320 ALUs do not trump the 128 of the GeForce) is that the groups of 5 ALUs are bound together on the same pixel, rather than split among multiple pixels. This leaves it to the programmer to somehow spread the arithmetic for a single pixel out among the 5 ALUs. (Actually, it's a bit more complicated than that, since the 5 ALUs are not exactly SIMD; rather, they are 4+1 SIMD, with 4 standard ALUs in SIMD and one special super-ALU which kinda supplements the others - e.g. the 5th unit is the only one that has hardware accelerated transcendental operations (sin, cos, e^x, etc.), and you might have an opcode that tells the 4 units to multiply, then add the 5th unit's result to each of theirs.)

ANYWAY... before I make your headaches from reading this any worse (it's HARD to explain this sort of thing without going even further into the jargon than I already did...), the ultimate point is GPUZ DO L33T FRACTALZ!!11! XD For example, I have a screensaver I wrote which can render some pretty hairy fractals (e^(sin(a*z)+cos(b*z)) for instance) at a smooth framerate (30-60fps on average) with 50 or so iterations at a whopping screen resolution of 2048x1536! What do you do with fractals rendered in 1/40th of a second? Realtime animation, of course! Watch the fractal change as the values of a and b change...

The downside to all this (and it's a big one) is that the GPU only does native single precision floating point - no zooming (at least not much) for you! Still, you can of course write code in the shader to emulate higher precision the same way some fractal programs do deep zoom - it will just run a good deal slower, although still faster than the CPU...

By the way (insert shameless plug), if you want to see that screensaver, you can download it yourself from https://sourceforge.net/projects/gpujuliascreens/. Note that it uses DirectX 10 and will not run on anything less, so don't bother if you don't have Windows Vista and a new graphics card (Geforce 8x00+, Radeon 2*00+).
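the "emulate higher precision" idea is usually done with double-single ("df64"-style) arithmetic: a value is carried as an unevaluated sum of two floats, roughly doubling the effective precision. here's a C sketch of the addition step only (the names dsfloat/ds_add are mine; a full emulation also needs multiply and divide built the same way, and the shader version looks the same with float2):

```c
/* double-single arithmetic: represent x as hi + lo, two floats whose sum
   carries roughly twice the precision of a single float.  the error term
   below is knuth's two-sum; it requires strict IEEE float evaluation
   (no fast-math / contraction). */
typedef struct { float hi, lo; } dsfloat;

dsfloat ds_add(dsfloat a, dsfloat b)
{
    float s = a.hi + b.hi;                    /* rounded sum             */
    float v = s - a.hi;
    float e = (a.hi - (s - v)) + (b.hi - v);  /* exact rounding error    */
    e += a.lo + b.lo;                         /* fold in the low parts   */
    dsfloat r;
    r.hi = s + e;                             /* renormalise so that     */
    r.lo = e - (r.hi - s);                    /* |lo| << |hi|            */
    return r;
}
```

this is the usual trade keldor describes: maybe 10-20x slower per operation than native float, but still far ahead of falling back to the CPU.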
cKleinhuis
« Reply #17 on: June 10, 2008, 10:58:49 PM »
hey there, great to see another gpu fanatic! it is true, single precision floating point is far too little to achieve deep zooms, but this way you stay realtime. do you have experience with the extended floating point unit on nvidia cards? i heard they can do 64 bit floating point precision - or something, but on the gpu ...
---
divide and conquer - iterate and rule - chaos is No random!
cKleinhuis
« Reply #18 on: June 10, 2008, 11:06:17 PM »
i see you are using directx for rendering ... why? i do hope you are not using dx10 ....

but be sure to read about my alternating mandelbrot method; this way you can easily combine each of your fractal formulas together with many many parameters. i have used this method in my fractalmovies.com screensaver. in fact the method yields fantastic new fractals without changing the calculation complexity. it is simply done by using more than one seed in the iteration. e.g. mandelbrot is z^2+c, so if you alternate between c1 and c2 throughout the iteration, it yields smooth animation. in fact you can use any number of seeds in one iteration (clearly no more than the iteration depth), and beside that, it is easy to just switch the whole formula in each step - even this yields smoothly animatable results.

please check out my screensaver demonstrating this method with a plain mandelbrot formula: http://fractalmovies.com/index.php?option=com_remository&Itemid=21&func=fileinfo&id=19
i have also written a tiny explanation about it: http://fractalmovies.com/index.php?option=com_content&task=view&id=31&Itemid=34
please take your time to think about it, i think it would be perfect in a gpu accelerated screensaver running on pixel shader 3.0 hardware

cheers ck
---
divide and conquer - iterate and rule - chaos is No random!
lycium
« Reply #19 on: June 10, 2008, 11:30:26 PM »
hmm, there's a few extremes going on in this topic. on the one hand, i wrote it quite a while ago to explain to a then-uninterested audience (e.g. charles wehner, who is probably STILL hacking around with x86 and x87 asm); now there are 3 of us who all code 3d graphics professionally, explaining to each other things we already know (the simd comment was directed at duncan c).
keldor314
Guest
« Reply #20 on: June 11, 2008, 12:09:21 AM »
LOL, maybe you didn't provide enough shiny pictures? XD
Anyway, thinking about that multiple-parameters idea: alternating between c1 and c2 would produce a 4th order julia set, in that every 2 iterations would be z = (z^2+c1)^2+c2, which is z = z^4 + 2*c1*z^2 + c1*c1 + c2, right?
The problem would be that the image produced by iteration 39 would be quite different from that from iteration 40, though a lot like iteration 41, so your users might be confused when you tweak the iteration value and suddenly end up with a much different fractal!
In any case, it does indeed use dx10, partly because part of the reason I started it was to learn dx10 (it's quite a bit different from dx 9), and partly because one of the features of dx10 is the newer hlsl compiler, which supports dynamic branching. The dx9 hlsl compiler only supported static branching - e.g. SM 2. In order to take advantage of dynamic branching, you had to code the shaders in assembler. Ick! After digging through the documentation, I just found that the dx10 hlsl compiler is now the one used for dx9 as well, so it will be happy targeting SM 3 at least.
I guess this means it won't be too horrific to backport the thing to dx9, since it's not actually using any dx10 exclusive features. Still, given that setting up the DX api is about as fun as gouging one's eyes out with a dinner fork...
lycium
« Reply #21 on: June 11, 2008, 12:17:05 AM »
LOL, maybe you didn't provide enough shiny pictures? XD

well i haven't done much fractal computing with a gpu, still prefer my 64bit x86 processor cores.

In any case, it does indeed use dx10, partly because part of the reason I started it was to learn dx10 (it's quite a bit different from dx 9), and partly because one of the features of dx10 is the newer hlsl compiler, which supports dynamic branching. The dx9 hlsl compiler only supported static branching - e.g. SM 2.

dx9 definitely supports dynamic branching, in hlsl.

I guess this means it won't be too horrific to backport the thing to dx9, since it's not actually using any dx10 exclusive features. Still, given that setting up the DX api is about as fun as gouging one's eyes out with a dinner fork...

even for crysis, dx9 is enough. there's a funny fiasco about the dx9/dx10/vista-only issues with that game, where actually you can just enable "dx10 only" features in dx9 via a registry key! sure, there are things that really only dx10 can do and dx9 can't, but if you're worried about computational capabilities, skip dx altogether and go straight for cuda.
keldor314
Guest
« Reply #22 on: June 11, 2008, 12:36:55 AM »
sure, there are things that really only dx10 can do and dx9 can't, but if you're worried about computational capabilities, skip dx altogether and go straight for cuda

Problem with that is that then, instead of being Vista only, it becomes Nvidia only. Besides, CUDA took FOREVER to be released for Vista. As in, the first public Vista beta was only a month or two ago >.< Hence, CUDA will have to wait for my next project.
cKleinhuis
« Reply #23 on: June 11, 2008, 12:36:04 PM »
Anyway, thinking about that multiple parameters alternating between c1 and c2 would produce a 4th order julia set, in that every 2 iterations would be z = (z^2+c1)^2+c2 which is z = z^4+2*c1*z^2+c1*c1+c2, right?
yes, you are absolutely right, this is the outcome - and you see that the repetition appears every second iteration, so you have decent control over the evaluation.

The problem would be that the image produced by iteration 39 would be quite different from that from iteration 40, though a lot like iteration 41, so your users might be confused when you tweak the iteration value, and suddenly end up with a much different fractal!

this is not true !!! consider the fractal calculated up to iteration 39: certain areas of the picture have already bailed out. this means that the next iteration can only create new pixels in the areas that have not yet bailed out. now the distance between c1 and c2 comes into play: if they are nearly the same, you would not see any difference compared to the c1-only fractal. if c1 and c2 are far away from each other, the image simply bails out earlier and at some unexpected positions, but iteration depth alone does not lead to unwanted results, believe me. the only thing which really changes the outcome is adding more seeds. please give it a try: replace your seed variable with a seed[iteration%numSeeds] statement, and animate those seeds with some nice sine curves. have fun c.k
---
divide and conquer - iterate and rule - chaos is No random!
David Makin
« Reply #24 on: June 11, 2008, 08:52:46 PM »
Hi, I'd love to have any one of you guys utilising the GPUs see if you can come up with a realtime GPU version of my escape-time 3D IFS algorithm (based on Hart's method). It should be possible to get it done just about realtime - for instance, the CPU/FPU render version of the 3D Menger Sponge takes around 13 seconds at 640*480 on my P4HT, and the algorithm as written allows for any affine transforms. Obviously there's a lot of potential for optimisation if the GPU version is restricted to a particular fractal such as the Sponge.

If anyone's interested, the algorithm is the 3D IFS formula in mmf4.ufm from the Ultrafractal Formula database - http://formulas.ultrafractal.com/ - go to browse collection and "Makin, Dave".

bye Dave
lycium
« Reply #25 on: June 13, 2008, 08:50:19 AM »
i've tried to grok that src a few times, always finding:
1. the source is HUGE!
2. the variable names aren't very descriptive
3. lack of vector functions etc (everything is super inline) means the code is quite opaque, and just dropping it into cg or hlsl/glsl wouldn't work.
4. no idea where the meat of the algorithm is, because there's so much uf-specific stuff around it...

could you post the bit that would actually get ported here, please?
David Makin
« Reply #26 on: June 13, 2008, 12:38:24 PM »
i've tried to grok that src a few times, always finding: 1. the source is HUGE! 2. the variable names aren't very descriptive 3. lack of vector functions etc (everything is super inline) means the code is quite opaque and just dropping it into cg or hlsl/glsl wouldn't work. 4. no idea where the meat of the algorithm is, because there's so much uf-specific stuff around it... could you post the bit that would actually get ported here, please?

1. I know, there's quite a bit that's superfluous to a GPU implementation (such as calculating the fractal dimension of the object).
2. That's just me, I'm a lazy typist (1 hand only).
3. In UF4 you can only do it "inline".
4. I'll post something here tonight - with comments (am at work now)

bye Dave
David Makin
« Reply #27 on: June 14, 2008, 02:06:40 AM »
Hi Thomas and all,
Here's the meat of the routine. I should point out that the transforms in the arrays here are the reverse transforms, i.e. the expanding ones, though I haven't changed the sign of the translations from the original, and the scale values are the original contracting values. You should also note that the order you put the transforms in will greatly affect the render speed, especially for fractals without rotation/skew such as the Menger - it's best to have a pre-test routine to find which transforms have areas closest to the viewpoint, and order the transforms so the closest are done first.
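the first test in the routine below - rejecting rays whose nearest approach to the bounding-sphere centre exceeds the bailout distance - can be sketched in C like this (the function name is mine, not from the original formula; as I read the test, gdist is a squared radius, and this version assumes a normalised ray direction):

```c
/* ray-vs-bounding-sphere rejection test, as in the first 'if' of the
   routine: if the squared distance of nearest approach from the ray to
   the sphere centre exceeds gdist (squared bailout radius), the ray
   misses the fractal entirely.  assumes (dx,dy,dz) is unit length. */
int ray_hits_bound(double tx, double ty, double tz,      /* ray base point */
                   double dx, double dy, double dz,      /* unit direction */
                   double cx, double cy, double cz,      /* sphere centre  */
                   double gdist)                         /* radius squared */
{
    double x1 = cx - tx, y1 = cy - ty, z1 = cz - tz;
    double along = x1*dx + y1*dy + z1*dz;    /* projection onto the ray */
    return x1*x1 + y1*y1 + z1*z1 - along*along <= gdist;
}
```

the same test is reapplied at every node of the transform tree after mapping the ray through a transform, which is what prunes whole subtrees of the IFS per ray.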
x1 = gtestx - tx ; gtestx, gtesty and gtestz are the centre of the
y1 = gtesty - ty ; bailout test sphere
z1 = gtestz - tz ; tx, ty and tz are the base point on the viewing ray
found = 1e200 ; "found" is a distance, large value means "not found"
nomore = true ; flag is initially true to indicate no extra rays to traverse
if x1*x1 + y1*y1 + z1*z1 - (x1*dx + y1*dy + z1*dz)^2 <= gdist
  ; tests if distance of nearest approach of ray to centre of test sphere is less
  ; than the bailout distance (gdist), if not then we've missed the fractal
  xv[0] = t5 = tx ; set temp variables and ray position for depth 0
  yv[0] = t6 = ty
  zv[0] = t7 = tz
  dxv[0] = dx ; dx, dy and dz are the viewing ray's direction vector
  dyv[0] = dy ; here we set the direction vectors at depth 0
  dzv[0] = dz
  k = gft ; k = the first transform, gft
  s[0] = 1.0 ; scale at depth 0 = 1.0
  t12 = 1e200 ; not really necessary (see t12 below)
  j = 0 ; start at depth 0 in the transform tree
  repeat
    ss = s[j]*gt[(i[j]=k),12] ; scale = oldscale*scale for this transform
    ; In the above note the i[j]=k i.e. set transform at depth j to k
    ; also note that the scale values are the original contracting values
    ; now transform the base point on the ray, storing in "current" values,
    ; temp variables and variables for the next depth - note the (j=j+1)
    xv[(j=j+1)] = t5 = (x1=t5-gt[k,9])*gt[k,0] + (y1=t6-gt[k,10])*gt[k,3] \
                       + (z1=t7-gt[k,11])*gt[k,6]
    yv[j] = t6 = x1*gt[k,1] + y1*gt[k,4] + z1*gt[k,7]
    zv[j] = t7 = x1*gt[k,2] + y1*gt[k,5] + z1*gt[k,8]
    ; transform the ray's direction vectors and store in current and for depth
    dxv[j] = dx = (x1=dx)*gt[k,0] + (y1=dy)*gt[k,3] + dz*gt[k,6]
    dyv[j] = dy = x1*gt[k,1] + y1*gt[k,4] + dz*gt[k,7]
    dzv[j] = dz = x1*gt[k,2] + y1*gt[k,5] + dz*gt[k,8]
    x1 = gtestx - t5 ; distances from bailout sphere centre
    y1 = gtesty - t6
    z1 = gtestz - t7
    f = false ; a flag to make things easier (see below)
    s[j] = ss ; set scale for this depth
    ; Now test if we're within the bailout distance
    if (t9 = gdist - x1*x1 - y1*y1 - z1*z1 \
        + (x2=1.0/(dx*dx+dy*dy+dz*dz))*(t11=x1*dx+y1*dy+z1*dz)^2)>=0.0
      ; if we are, compute the distances along the ray to the intersections
      ; with the sphere (in original unscaled terms)
      t12 = (t11=x2*t11) - (t9=sqrt(x2*t9))
      ; if the nearest distance is <= the distance to the back clip (end)
      ; and the farthest is >= the front clip
      if t12<=end && t11+t9>=t4
        ; if we've reached the max depth (@igen) or the depth is >= the
        ; minimum depth (@imin) and we've gone below the min scale
        if j>=@igen || (j>=@imin && ss<minscale)
          ; You can safely ignore the "if @flat" bit and use the "else" only
          if @flat
            if t11<found
              found = end = t11
              if (@colouring==0 || @colouring>3) && @colmeth==0
                nomore = false
              endif
              if (@colouring>0 || @colmeth==1) && ii==0
                jj = l = j
                while (l=l-1)>=0
                  is[l] = i[l]
                endwhile
              endif
            endif
          else
            if t12<found ; if the nearest intersection distance is < the best found
              found = end = t12 ; set found AND the back clip to the new distance
              ; if using lighting set nomore to false so we calculate the
              ; adjacent rays for getting the normals
              if (@colouring==0 || @colouring>3) && @colmeth==0
                nomore = false
              endif
              ; store the transform numbers from each depth for certain colourings
              if (@colouring>0 || @colmeth==1) && ii==0
                jj = l = j
                while (l=l-1)>=0
                  is[l] = i[l]
                endwhile
              endif
            endif
          endif
        ; else (within clips but not found a collision) get first transform at next depth
        elseif (k=gn[i[j-1],gntm1])>=0
          ; check depth flag for next transform k at depth j
          if !(f=df[k,j]) ; if k not allowed
            i[j] = k ; adjust values for
            j = j + 1 ; routine to get the next valid one (below)
          endif
        endif
      endif
    endif
    if !f ; if not going to new depth
      k = i[(j=j-1)] ; go to previous depth, get transform from previous depth
      repeat
        if (k=k-1)<0 ; next transform from previous depth, OK ?
          if (j=j-1)<0 ; no, so go up another depth; if done
            f = true ; set flag to exit the repeat
          else
            k = i[j] ; transform from previous
          endif
        elseif j>0 ; if not on first depth
          if (k=gn[i[j-1],k])>=0 ; transform k valid ?
            f = df[k,j] ; yes, flag if k valid at depth j
          else
            k = i[(j=j-1)] ; no, go back up
          endif
        else
          f = df[k,0] ; f = k valid at depth 0
        endif
      until f
      if (j>=0) ; depth >= 0
        t5 = xv[j] ; retrieve ray base point and direction vectors
        t6 = yv[j] ; for the new depth
        t7 = zv[j]
        dx = dxv[j]
        dy = dyv[j]
        dz = dzv[j]
      endif
    endif
  until j<0
endif
; If found a point on the fractal store values accordingly
if (sc[ii]=found)<1e100
  if ii==0
    vv = found
    px[0] = xx + found*xc
    py[0] = yy + found*yc
    pz[0] = zz + found*zc
  elseif abs(vv-found)<onepix
    px[ii] = xx + found*xc
    py[ii] = yy + found*yc
    pz[ii] = zz + found*zc
  else
    sc[ii] = 1e200
  endif
endif
ii = ii + 1
until (ii==1 && nomore) || ii>=5
« Last Edit: June 14, 2008, 02:21:46 AM by David Makin »
David Makin
« Reply #28 on: June 14, 2008, 01:41:54 PM »
Hi again,
I should have mentioned that I left out the top of the ii loop; all that does is set up the viewing ray base position, the viewing ray direction vectors, the initial back-clip (end) and the front clip (t4). Also (fairly important) the initial direction vector of the viewing ray (dx,dy,dz) is normalised.
bye Dave
« Last Edit: June 14, 2008, 02:23:12 PM by David Makin »
lycium
« Reply #29 on: June 14, 2008, 02:55:06 PM »
Also (fairly important) the initial direction vector of the viewing ray (dx,dy,dz) is normalised.

of course, but it seems you forgot about that:

; Now test if we're within the bailout distance
if (t9 = gdist - x1*x1 - y1*y1 - z1*z1 \
    + (x2=1.0/(dx*dx+dy*dy+dz*dz))*(t11=x1*dx+y1*dy+z1*dz)^2)>=0.0

thanks for the code! it doesn't look very gpu-friendly though, with all that branching... can you explain it in pseudocode perhaps? to make sure i'm getting this right: you start out with some linear maps M_k which would be used for the normal "chaos game" (stochastic) ifs, from those you get their inverses M^{-1}_k, and then... well, i read up on hart's method back when and would like to brush up on it again, but exactly how does your implementation differ? thanks, and sorry for all the questions!