lycium
|
|
« on: November 30, 2006, 09:13:48 AM » |
|
i recently bought a geforce 8800 gts and have managed to tear myself away from playing neverwinter nights 2 long enough to make this post. it's no secret that modern gpus are monsters of floating point computation, and they have plentiful, dedicated memory bandwidth with which to support it. the variant of the 8800 i've purchased has 96 processors, almost completely ieee754 compliant, and 640mb of memory rated at 64gb/s.

the most exciting thing about this new, general architecture is that it can be used for non-graphics applications, and to this end nvidia have a c compiler, libraries and a special driver (which can work concurrently alongside the normal opengl/directx one) that programmers can use to tap the hardware capabilities directly - without the need to recast the computational problem in terms of textures and pixel shaders etc.

nvidia recently accepted my cuda developer application request, and while i have the software, it's unfortunately not compatible with 64bit windows xp yet; in fact the current xp64 driver is terrible. however, when stuff does work, the results can be quite impressive indeed: http://www.fractographer.com/propaganda/gf8800mandelbrot.png (this is why i posted in this section)

when they get their 64bit stuff up to snuff i'll certainly be looking to move large parts of my 2d and 3d fractal apps' computation to the gpu, in an effort to get them working in realtime and/or to render high quality hd resolution video.

more info on the architecture: http://techreport.com/reviews/2006q4/geforce-8800/index.x?pg=1
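the per-pixel work in that mandelbrot demo is just the classic escape-time loop; a minimal sketch in plain c (on the gpu the same loop runs in a shader or cuda kernel, one pixel per thread):

```c
/* escape-time iteration for one pixel: z -> z^2 + c, bailing out when
   |z|^2 > 4. returns the iteration count, or max_iter if c appears to
   be inside the set. */
static int mandel_iters(float cr, float ci, int max_iter)
{
    float zr = 0.0f, zi = 0.0f;
    int i;
    for (i = 0; i < max_iter; ++i) {
        float zr2 = zr * zr, zi2 = zi * zi;
        if (zr2 + zi2 > 4.0f)
            break;
        zi = 2.0f * zr * zi + ci;   /* imaginary part of z^2 + c */
        zr = zr2 - zi2 + cr;        /* real part of z^2 + c      */
    }
    return i;
}
```

the loop is branchy only at the bailout test and touches no memory, which is exactly the high-arithmetic-intensity shape gpus love.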
|
|
|
Logged
|
|
|
|
David Makin
|
|
« Reply #1 on: November 30, 2006, 03:03:33 PM » |
|
Hmm - I can't afford more than around 100 for a GFX card :-( I'd be most interested in using the GPU to render 3D fractals (of all types).
|
|
|
Logged
|
|
|
|
lycium
|
|
« Reply #2 on: December 01, 2006, 02:57:21 AM » |
|
ray tracing quaternionic julia sets is one of the earliest examples of using the gpu to render 3d fractals; as i alluded to previously, earlier implementations had to recast the problem in terms of "standard" 3d rendering operations - in this case a pixel shader rendering to a screen-sized quad: http://graphics.cs.uiuc.edu/svn/kcrane/web/project_qjulia.html

that runs on really cheap boards these days, btw. i'm pretty sure you could pick up a geforce 7600 gt quite affordably, and that's already a very competent board - it doesn't have the outright ridiculous speed of the expensive ones, but it's highly programmable via glsl (opengl) or hlsl (d3d). i'm sure it would trounce even one of intel's latest quad-core machines in that mandelbrot test.
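the inner loop there is the same escape-time idea two dimensions up: z -> z² + c with quaternion multiplication. a minimal sketch in plain c (the shader version runs this per pixel/ray, typically with a distance estimator on top):

```c
typedef struct { float w, x, y, z; } quat;

/* hamilton product of two quaternions */
static quat qmul(quat a, quat b)
{
    quat r;
    r.w = a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z;
    r.x = a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y;
    r.y = a.w*b.y - a.x*b.z + a.y*b.w + a.z*b.x;
    r.z = a.w*b.z + a.x*b.y - a.y*b.x + a.z*b.w;
    return r;
}

/* escape-time count for the quaternion julia iteration z -> z^2 + c */
static int qjulia_iters(quat z, quat c, int max_iter)
{
    int i;
    for (i = 0; i < max_iter; ++i) {
        if (z.w*z.w + z.x*z.x + z.y*z.y + z.z*z.z > 4.0f)
            break;
        z = qmul(z, z);
        z.w += c.w; z.x += c.x; z.y += c.y; z.z += c.z;
    }
    return i;
}
```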
|
|
|
Logged
|
|
|
|
dentaku2
Guest
|
|
« Reply #3 on: January 27, 2007, 12:13:34 PM » |
|
This sounds very exciting! But it has to be supported by the standard graphics APIs (OpenGL). If there were an OpenGL/Java API for this, I would love to integrate it into my fractal app! Are there any notes about this?
|
|
|
Logged
|
|
|
|
David Makin
|
|
« Reply #4 on: January 27, 2007, 03:57:39 PM » |
|
This sounds very exciting! But it has to be supported by the standard graphics APIs (OpenGL). If there were an OpenGL/Java API for this, I would love to integrate it into my fractal app! Are there any notes about this?
There is sample code using GPU code fragments to generate the Mandelbrot set on ATI's site in the developer section - uses OpenGL: http://ati.amd.com/developer/indexsc.html

I've downloaded the relevant stuff but haven't tried it yet.
|
|
|
Logged
|
|
|
|
|
lycium
|
|
« Reply #6 on: January 28, 2007, 06:03:46 PM » |
|
the screenshot i posted is in fact of an opengl demo, one that comes with the nvidia sdk available from their website (src included).
|
|
|
Logged
|
|
|
|
dentaku2
Guest
|
|
« Reply #7 on: January 29, 2007, 07:56:22 PM » |
|
|
|
|
Logged
|
|
|
|
lycium
|
|
« Reply #8 on: January 30, 2007, 12:05:03 AM » |
|
the screenshot i posted is in fact of an opengl demo, one that comes with the nvidia sdk available from their website (src included).
http://www.google.co.za/search?q=nvidia+opengl+mandelbrot&start=0&ie=utf-8&oe=utf-8&client=firefox-a&rls=org.mozilla:en-US:official
http://download.nvidia.com/developer/SDK/Individual_Samples/featured_samples.html

then ctrl+f, "mandelbrot", ... just download the nvidia sdk, get some geforce6-level hardware, find some java opengl bindings, see if they support the glsl extensions, get familiar with the syntax, and begin to want a g80.

there are 96 stream processors on the gts, capable of both near-perfect ieee754 floating point and integer ops, running at 1.2ghz and attached to 640mb of fast memory via a 320bit bus. the shaders/execution kernels can also write memory, so the whole thing is very flexible. nvidia's cuda program provides a c compiler and some very nice libraries (relevant to us fractalists); you have to agree to a nondisclosure agreement, and i really can't give perf numbers. in any case, whatever you do with the gpu had better be more computationally intensive than a mandelbrot
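the only other ingredient those samples need is mapping each fragment/pixel to its point c in the complex plane; a sketch in plain c (function and parameter names here are illustrative, not taken from the nvidia sdk sample):

```c
/* map integer pixel coordinates to a point c in the complex plane, given a
   view center and scale. on the gpu every fragment/thread evaluates this for
   its own coordinates before running the escape-time loop. */
static void pixel_to_c(int px, int py, int width, int height,
                       float center_x, float center_y, float scale,
                       float *cr, float *ci)
{
    float aspect = (float)width / (float)height;  /* keep circles round */
    *cr = center_x + scale * aspect * ((float)px / (float)width  - 0.5f) * 2.0f;
    *ci = center_y + scale *          ((float)py / (float)height - 0.5f) * 2.0f;
}
```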
|
|
|
Logged
|
|
|
|
dentaku2
Guest
|
|
« Reply #9 on: January 30, 2007, 12:11:27 AM » |
|
Of course, this looks very good. But I don't touch anything that is not pure Java. Even Java 3D and JOGL need system specific libraries and don't offer software-only modes - for performance reasons, of course. But my first requirement is platform independence and ease of installation/deployment.
|
|
|
Logged
|
|
|
|
Duncan C
|
|
« Reply #10 on: May 10, 2007, 05:27:18 PM » |
|
lycium,
That card sounds like quite a beast.
Did you ever write a fractal renderer using its floating point abilities?
And does the math library that's included support double precision? It would be slower than single precision, obviously, but 96 processors doing double precision math with only single precision hardware support would still be a WHOLE LOT faster than 10 or 20 double-precision processors, especially with the fast memory in graphics cards.
I read up on this card based on your post. How much do you have to worry about the architecture of the card? (where the processors are in groups with shared memory access, etc.) Or is all that transparent to you? How does the compiler deal with parallel processing?
This card would be worth buying as a floating point engine if it can be used easily.
Duncan C
|
|
|
Logged
|
Regards,
Duncan C
|
|
|
lycium
|
|
« Reply #11 on: May 11, 2007, 02:07:59 AM » |
|
the CUDA platform, which has been liberated from its NDA-only status btw, does indeed provide double precision functionality, but i think it's both:
1. beta only
2. not supported by hardware, i.e. emulated (10x or so slower than native fp)
the next generation after g80 will apparently have native double precision capabilities.
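the usual software trick behind those emulated doubles is "double-single" arithmetic: a value is held as an unevaluated sum of two floats (hi + lo), and error-free transformations recover the rounding error exactly. a minimal sketch of the core building block, knuth's two-sum, in plain c (illustrative only, not cuda's actual implementation):

```c
typedef struct { float hi, lo; } dsfloat;  /* value = hi + lo */

/* knuth's two-sum: computes a + b = s + err exactly, with s = fl(a + b).
   volatile forces each intermediate to round to float, so the compiler
   can't fold away exactly the rounding error we're trying to capture. */
static dsfloat two_sum(float a, float b)
{
    dsfloat r;
    volatile float s   = a + b;
    volatile float bb  = s - a;
    volatile float err = (a - (s - bb)) + (b - bb);
    r.hi = s;
    r.lo = err;
    return r;
}
```

chained through add/mul, this roughly doubles the effective precision at the cost of several float ops per operation, which is where the ~10x slowdown comes from.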
and yeah, talking about CUDA, you do have to be mindful of the 3-level memory hierarchy to get the best possible performance. since i've written low level code from an early age, it's not really all that different a landscape (modern performance coding on the cpu is mostly about cache awareness too), but if you're new to it you might find it difficult; again, the same is true of writing fast code on the cpu, so in the end there's just no free lunch. this is slowly becoming the case with cpus too: ghz is out, multicore is in - that's simply how you best spend transistors these days.
however, the performance of gpus for certain types of parallel workloads (high arithmetic intensity) is just much higher than for cpus, and if you can exploit that then i'd say it's worth it.
|
|
|
Logged
|
|
|
|
doncasteel8587
Guest
|
|
« Reply #12 on: June 13, 2007, 11:39:36 PM » |
|
Of course, this looks very good. But I don't touch anything that is not pure Java. Even Java 3D and JOGL need system specific libraries and don't offer software-only modes - for performance reasons, of course. But my first requirement is platform independence and ease of installation/deployment.
Just a note.... Recent J3D forum discussions had some comments about doing away with the platform specific libraries. I didn't get the details, but if it happens it will be a huge boost to J3D!
|
|
|
Logged
|
|
|
|
alister
Guest
|
|
« Reply #13 on: August 14, 2007, 05:59:02 AM » |
|
I've often wondered how I could use the GPU for a little extra processing power. The best I ever did was use the OpenGL libs to do some matrix operations. That was some time ago, and I've long since lost the project files in a hard drive crash.
|
|
|
Logged
|
|
|
|
Duncan C
|
|
« Reply #14 on: May 04, 2008, 04:12:35 PM » |
|
i recently bought a geforce 8800 gts and have managed to tear myself away from playing neverwinter nights 2 long enough to make this post. it's no secret that modern gpus are monsters of floating point computation, and they have plentiful, dedicated memory bandwidth with which to support it. the variant of the 8800 i've purchased has 96 processors, almost completely ieee754 compliant, and 640mb of memory rated at 64gb/s. the most exciting thing about this new, general architecture is that it can be used for non-graphics applications, and to this end nvidia have a c compiler, libraries and a special driver (which can work concurrently alongside the normal opengl/directx one) that programmers can use to tap the hardware capabilities directly.

lycium,

I browsed through the CUDA APIs the other day. Just about everything I looked at was geared towards single instruction, multiple data (SIMD) processing, even down to tailoring your code to avoid branching, because branching reduces concurrency and cache hits.

I'm interested in using the processor farm in one of these cards for general computing, where sometimes large numbers of processors will be doing completely different work. (Imagine running a hybrid genetic algorithm with neural networks, so the processor farm evolves optimized solutions to a problem.) Where their work is parallel, I would, of course, follow the guidelines for parallel processing to get the most out of the multi-level cache architecture on the chip.

I've only done a little multi-threaded development before, and that on traditional multi-processor machines where each processor has its own cache. I'm hardly an expert on the subject. Do you think the architecture of NVIDIA's 8xxx series (and later) chips is so tuned to SIMD applications that you'd bring it to its knees doing separate execution paths on different processors?

Duncan
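To make the divergence worry concrete, here's a toy cost model in C (a deliberate simplification, not NVIDIA's actual scheduler): lanes in a warp execute in lockstep, so a warp is busy until its slowest lane finishes, while a truly independent scalar core only pays for its own work.

```c
#define WARP 8  /* toy warp width; real hardware groups many more lanes */

/* SIMT-style cost: each warp of lanes runs in lockstep, so the whole warp
   pays the cycle count of its slowest lane - divergent per-lane workloads
   waste the cycles of every lane that finishes early. */
static int warp_cost(const int work[], int n)
{
    int cost = 0;
    for (int base = 0; base < n; base += WARP) {
        int worst = 0;
        for (int lane = base; lane < base + WARP && lane < n; ++lane)
            if (work[lane] > worst)
                worst = work[lane];
        cost += worst;
    }
    return cost;
}

/* fully independent processors: total cost is just the sum of the work */
static int scalar_cost(const int work[], int n)
{
    int cost = 0;
    for (int i = 0; i < n; ++i)
        cost += work[i];
    return cost;
}
```

With uniform work the warp wastes nothing; with one slow lane, seven lanes idle while it finishes, which is why completely different per-processor programs fit this architecture poorly.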
|
|
|
Logged
|
Regards,
Duncan C
|
|
|
|