Author Topic: GPU galore  (Read 1867 times)
Description: Looking to update
marius
Fractal Lover
**
Posts: 206


« on: April 18, 2012, 11:26:28 PM »

I'm running an AMD 5850 so far, which is fine capability-wise, but I want more flops for more fps at 720p+ in 3D. As in real time.

Anyone here have an AMD 7970? How about two in CrossFire? 7.5 TFlops sounds tasty..  grin

Anything that Nvidia offers in that budget range ($1K) that would beat it or otherwise be more compelling?

Anything in the rumor mill that suggests waiting? CrossFire 7990s for more $$ and more flops? confused
Logged
cKleinhuis
Administrator
Fractal Senior
*******
Posts: 7044


formerly known as 'Trifox'


« Reply #1 on: April 19, 2012, 12:45:42 AM »

I bought a Radeon card a year ago because of the single-precision teraflops: that is what made me take the HD 6800 for just 150€, and I am satisfied with it. I am unsure whether Nvidia has caught up in single-precision throughput by now, but I would guess that ATI/AMD is still in front on the single-precision side ... so just go for ATI/AMD cheesy but this is just a guess! angel
Logged

---

divide and conquer - iterate and rule - chaos is No random!
real_het
Forums Freshman
**
Posts: 13


« Reply #2 on: April 19, 2012, 08:06:36 AM »

Hello,

HD 7970: 925 MHz * 2048 streams * 2 (MAD) = 3.7888 TFlops at stock clock.
I've tested it at a +21% overclock for about 10 minutes; the temperature stabilized at 85 °C and the fan ran at about 40% speed.
So I think it has twice the overclocking headroom of the previous cards (which was around 10%). That's 4.583 TFlops cheesy

For the Nvidia card, I was curious and looked up its specs; it has two basic clock settings:
- base clock:  1006 (idle)..1058 MHz
- boost clock: 1058 (idle)..1113 MHz
Let's assume the maximum standard frequency specified by the vendor, so the TFlops will be:
GTX 680: 1113 MHz * 1536 CUDA cores * 2 (MAD) = 3.419 TFlops

If you want to use DP floats, the 7970 will do it at 1/4 of the SP rate, and the GTX 680 at a 1/24 rate (8 DP cores per 192 CUDA cores).

(I've never tried to render a 3D Mandelbrot, but I think the new GCN architecture will make it faster: for example, it can do jumps/conditional jumps in a single clock, not something like 40 clocks. Some other things I've found out: the 7970 needs 4x more threads in flight to work optimally, and kernels must not use more than 64 registers (128 was the limit earlier). It can allocate all 256 registers for a single wavefront, but that costs roughly a 30% penalty. Lol, I can't wait to finally have some free time.)
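
For anyone who wants to plug in other clocks or shader counts, here is a minimal C++ sketch of the peak-throughput arithmetic above (clock * ALU count * 2 ops per MAD). The card names and figures are just the ones quoted in this post; real-world throughput will of course be lower.

Code:
#include <cstdio>

// Theoretical peak: clock (MHz) * number of ALUs * 2 ops per MAD (multiply-add).
double peak_tflops(double clock_mhz, int alus) {
    return clock_mhz * 1e6 * alus * 2.0 / 1e12;
}

int main() {
    std::printf("HD 7970 stock   : %.4f TFlops\n", peak_tflops(925.0, 2048));
    std::printf("HD 7970 +21%% OC : %.4f TFlops\n", peak_tflops(925.0 * 1.21, 2048));
    std::printf("GTX 680 boost   : %.4f TFlops\n", peak_tflops(1113.0, 1536));
}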
Logged
Syntopia
Fractal Molossus
**
Posts: 681



« Reply #3 on: April 19, 2012, 05:09:06 PM »

Quote from: real_het on April 19, 2012, 08:06:36 AM
If you want to use DP floats, the 7970 will do it at 1/4 of the SP rate, and the GTX 680 at a 1/24 rate (8 DP cores per 192 CUDA cores).

Ouch, a 1/24 rate is amazingly bad for double precision. The GTX 580 did DP at a 1/8 rate, and the Tesla cards do DP at a 1/2 rate. Even though the GTX 680 is more than twice as fast as the GTX 580 for single precision, it will be slower for double precision!

I've always used Nvidia cards, but next time it is going to be an ATI card.
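
To put rough numbers on that, here is a quick sketch using the rates above. The GTX 680 and HD 7970 single-precision peaks come from real_het's post; the GTX 580 figure (512 cores at roughly a 1544 MHz shader clock, about 1.58 TFlops) is my own assumption, not something stated in this thread.

Code:
#include <cstdio>

int main() {
    // Single-precision peaks in TFlops (GTX 580 value assumed, see above).
    double sp_gtx580 = 1.58, sp_gtx680 = 3.419, sp_hd7970 = 3.7888;

    // Double precision = single precision * the DP:SP rate quoted in the thread.
    std::printf("GTX 580 DP (1/8) : %.3f TFlops\n", sp_gtx580 / 8.0);
    std::printf("GTX 680 DP (1/24): %.3f TFlops\n", sp_gtx680 / 24.0);
    std::printf("HD 7970 DP (1/4) : %.3f TFlops\n", sp_hd7970 / 4.0);
}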

Logged
taurus
Fractal Supremo
*****
Posts: 1175



« Reply #4 on: April 20, 2012, 08:27:02 AM »

Aren't there aspects beyond all that theoretical calculation speed?
I know from the professional CAD/CAM segment that the drivers are almost more important than raw processing power. That's why AMD/ATI is almost irrelevant in the professional segment: Nvidia's OpenGL drivers are far more effective than AMD/ATI's, so you need an ATI card twice as fast to reach the same OpenGL performance as an Nvidia card.
Are there similar effects for OpenCL or whatever other language, or is the driver not relevant for tasks besides graphics?
Logged

when life offers you a lemon, get yourself some salt and tequila!
Syntopia
Fractal Molossus
**
Posts: 681



« Reply #5 on: April 20, 2012, 09:17:56 AM »

Quote from: taurus on April 20, 2012, 08:27:02 AM
Aren't there aspects beyond all that theoretical calculation speed?

Sure - the reason I've always chosen Nvidia is the better drivers (and CUDA). ATI's GLSL compiler seems less robust than Nvidia's.

But in terms of double precision performance it is very difficult to ignore ATI - even though the theoretical numbers might not reflect reality, an HD 7970 will be almost an order of magnitude faster than the GTX 680 for double precision.
Logged
ker2x
Fractal Molossus
**
Posts: 795


« Reply #6 on: April 20, 2012, 10:59:55 AM »

Documentation !!!

When I had an ATI card (my only ATI, an X1600 Pro) I had to use the Nvidia documentation to learn about shaders & co.
ATI's website is horrible, and it's very hard to find anything but "Look at these AMAZING shiny things!".

And even though Nvidia heavily promotes CUDA over OpenCL, I bought an Nvidia CUDA-oriented book and still learned a lot about OpenCL and how GPUs work.
ATI is supposed to promote OpenCL (they don't have CUDA and killed off the other "GPGPU languages"), but it is really hard to find any useful documentation from ATI.
That's one of the reasons I like Nvidia more than ATI.

"Sans maîtrise, la puissance n'est rien" ("Power is nothing without control")  grin
Logged

often times... there are other approaches which are kinda crappy until you put them in the context of parallel machines
(en) http://www.blog-gpgpu.com/ , (fr) http://www.keru.org/ ,
Sysadmin & DBA @ http://www.over-blog.com/
real_het
Forums Freshman
**
Posts: 13


« Reply #7 on: April 20, 2012, 11:06:35 AM »

As far as I know, NV has a much more complex instruction decoder that heavily supports out-of-order execution across every four 32-bit ALUs.
When there are lots of dependencies in a sequential code stream, it pays to look ahead in the code and reorder execution where possible in order to keep all execution units fed. But that costs a lot of transistors.
This technique can extract close to maximum performance even from a poorly optimized piece of code. (By the way, the best at this are x86/64 processors; they are designed to dominate benchmarks even when those benchmarks aren't optimized for them at all, trading away raw performance to do so.)

AMD, however, did it a different way: the compiler must specify what each of the execution units (4 or 5) does and when. If you put data-dependent instructions into a single clock cycle it will calculate wrong values; no consistency check is done by the hardware. It is entirely the compiler's responsibility to generate code that can utilize all execution units in every cycle.
This design is more sensitive to compile-time optimization, but when your code fits all these requirements you can get something like 99% of the theoretical performance.

I think with the new cards they are getting closer: the new AMD now has an intelligent scheduler coordinating 16-wide 32-bit SIMD ALUs, but it's not out-of-order execution, more like sharing resources across wavefronts on every clock. There are four 16-wide 32-bit SIMD ALUs and one 64-bit scalar ALU. When you are able to feed 4 wavefronts at a time, these resources can be used at 100% capacity, meaning 64 vector ops and 16 scalar ops in a single clock (latency is still 2 clocks because it's pipelined). In the worst case, when you can only run one wavefront, performance drops to half of the optimum. Another important point is that you must enqueue at least 8192 wavefronts (of 64 threads each) to a 7970, and this leads to another constraint: you must not use more than 64 registers (out of 256) so that all 4 wavefronts have their own register space.

I think the driver is not an issue: once the card gets its job, it can do it all on its own until it finishes. The more important thing is how the compilers work: there is a difficult path from 'human readable' OpenCL, through the more or less readable AMD IL (an intermediate asm-like language), until your idea finally reaches the machine code of a super-complicated piece of hardware.
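
As a rough illustration of the register/wavefront trade-off described above, here is a sketch using the simple model from this post: a 256-entry vector register file per lane, with at least 4 wavefronts wanted in flight per compute unit. The cap of 10 wavefronts per SIMD is an assumption on my part; real occupancy also depends on LDS usage and other limits, so treat the numbers as approximate.

Code:
#include <cstdio>
#include <algorithm>

// Simple model: the number of wavefronts resident on a SIMD is limited by
// how many vector registers each work-item uses (256 available per lane),
// capped by an assumed hardware limit of 10 wavefronts per SIMD.
int wavefronts_per_simd(int regs_per_workitem) {
    const int kHardwareLimit = 10;   // assumption, not from the thread
    return std::min(kHardwareLimit, 256 / regs_per_workitem);
}

int main() {
    const int reg_counts[] = {32, 64, 128, 256};
    for (int regs : reg_counts) {
        int waves = wavefronts_per_simd(regs);
        std::printf("%3d regs/work-item -> %2d wavefront(s) per SIMD%s\n",
                    regs, waves,
                    waves >= 4 ? "" : "  (fewer than the 4 needed to fill the CU)");
    }
}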
Logged
real_het
Forums Freshman
**
Posts: 13


« Reply #8 on: April 20, 2012, 11:22:34 AM »

"Documentation !!!"

Absolutely true grin

Since January there has been no 7970 ISA specification.
They plan to deprecate CAL/AMD IL (but they can't, because OpenCL sits on top of it :p)

So the up-to-date specifications for the new GCN/Southern Islands architecture come from these sources:
A marketing brochure from July(?) 2011 -> http://developer.amd.com/afds/assets/presentations/2620_final.pdf
And a disassembler (included in the driver suite): with it you can inspect the low-level asm generated from OpenCL or AMD IL programs.

I guess they hate writing docs so much that I'd bet even the documentation writers have been ordered to write code for the OpenCL compiler instead  cheesy
Logged
Syntopia
Fractal Molossus
**
Posts: 681



« Reply #9 on: April 20, 2012, 01:37:47 PM »

Quote from: real_het on April 20, 2012, 11:06:35 AM
As far as I know, NV has a much more complex instruction decoder that heavily supports out-of-order execution across every four 32-bit ALUs.
When there are lots of dependencies in a sequential code stream, it pays to look ahead in the code and reorder execution where possible in order to keep all execution units fed. But that costs a lot of transistors.

I've only quickly browsed this review, but as I understand it, NVIDIA has gone back to a simpler in-order execution model for the Kepler architecture to save transistors: http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/3 - I'm no expert here, though - just annoyed about the bad double precision performance.
Logged
A Noniem
Alien
***
Posts: 38


« Reply #10 on: April 20, 2012, 02:12:20 PM »

GCN is supposed to be a big improvement when it comes to GPGPU. It's also nice that all GCN cards (starting with the 7750) support double precision. The only downside is that the double precision performance of the 7700/7800 series is relatively low compared to the 7900 series: the 7700/7800 series get 1/16th of their single precision rate in double precision, while it is 1/4th(!!!) for the 7900 series.

For single and double precision GFlops you might want to check out http://en.wikipedia.org/wiki/Southern_Islands_(GPU_family)#Chipset_table

AMD offers more raw GFlop performance than nVidia, and AMD's OpenCL drivers are as good as, if not better than, nVidia's (although if you have an nVidia card you probably want to use CUDA anyway).

It seems like you have a high budget, so I'd recommend the 7900 series cards. They have amazing double precision performance (almost 1 TFlop of double precision for the 7970).
Logged
Adam Majewski
Fractal Lover
**
Posts: 221


« Reply #11 on: May 14, 2012, 07:18:18 PM »

I'm thinking of buying a PC for GPGPU. If I understand the experts' opinion correctly, AMD is better (faster) but Nvidia has better docs. What should I choose? OpenCL or CUDA?
 huh?
Logged
ker2x
Fractal Molossus
**
Posts: 795


« Reply #12 on: May 15, 2012, 10:23:39 AM »

Quote from: Adam Majewski on May 14, 2012, 07:18:18 PM
I'm thinking of buying a PC for GPGPU. If I understand the experts' opinion correctly, AMD is better (faster) but Nvidia has better docs. What should I choose? OpenCL or CUDA?
 huh?

Only Nvidia cards support CUDA.
OpenCL is supported by both Nvidia and AMD cards, and some Intel and AMD CPUs now support OpenCL too.
I suggest OpenCL, unless you need something specific provided by a CUDA library, or CUDA's support and better documentation.

And if you're willing to write your own code, I suggest Nvidia smiley
The theoretical peak throughput is very hard to reach and requires very good knowledge of the GPU architecture ... which is exactly what good documentation provides  grin
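
If you go the OpenCL route, a first sanity check is just enumerating what the runtime can see. A minimal sketch with the plain OpenCL C API (builds against either vendor's SDK; error checking omitted) might look like this:

Code:
#include <CL/cl.h>
#include <cstdio>

int main() {
    // Enumerate every OpenCL platform (AMD, Nvidia, Intel, ...) and its devices.
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(8, platforms, &num_platforms);
    if (num_platforms > 8) num_platforms = 8;

    for (cl_uint p = 0; p < num_platforms; ++p) {
        char name[256];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(name), name, NULL);
        std::printf("Platform: %s\n", name);

        cl_device_id devices[8];
        cl_uint num_devices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devices, &num_devices);
        if (num_devices > 8) num_devices = 8;

        for (cl_uint d = 0; d < num_devices; ++d) {
            char dev[256];
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(dev), dev, NULL);
            std::printf("  Device: %s\n", dev);
        }
    }
    return 0;
}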
Logged

often times... there are other approaches which are kinda crappy until you put them in the context of parallel machines
(en) http://www.blog-gpgpu.com/ , (fr) http://www.keru.org/ ,
Sysadmin & DBA @ http://www.over-blog.com/
cbuchner1
Fractal Phenom
******
Posts: 443


« Reply #13 on: May 15, 2012, 12:45:07 PM »


Here's an argument for OpenCL:

OpenCL is supported by Intel, AMD and nVidia and available on Mac, Windows and Linux. On Intel CPUs the code will be automatically translated to the SSE or AVX vector instruction set.

OpenCL's major drawback is that it doesn't support some C++ features like templates (which CUDA does).

Christian
Logged
A Noniem
Alien
***
Posts: 38


« Reply #14 on: May 15, 2012, 06:19:49 PM »

CUDA, however, is a more mature language than OpenCL and does include some C++ features that OpenCL completely lacks. Personally I prefer OpenCL over CUDA because of the cross-hardware capability and the fact that CUDA is vendor-locked. By the way, it's not AMD + OpenCL vs nVidia + CUDA: nVidia also supports OpenCL.
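
To make the template point concrete, here is a sketch in plain C++ of the kind of reuse those features buy you: one generic routine that the compiler instantiates for both float and double. In CUDA you could mark such a function __device__ and call it from single- or double-precision kernels, whereas OpenCL C (a C99 dialect in the current 1.x versions) has no templates, so each precision needs its own copy or a macro that stamps the copies out. This is illustrative host-side C++, not actual kernel code.

Code:
#include <cstdio>

// One generic escape-time routine; the compiler generates a float and a
// double version from the same source.
template <typename Real>
int mandel_iterations(Real cx, Real cy, int max_iter) {
    Real x = 0, y = 0;
    int i = 0;
    while (i < max_iter && x * x + y * y < Real(4)) {
        Real xt = x * x - y * y + cx;
        y = Real(2) * x * y + cy;
        x = xt;
        ++i;
    }
    return i;
}

int main() {
    // Same source, two precisions - the reuse OpenCL C cannot express directly.
    std::printf("float : %d iterations\n", mandel_iterations<float>(-0.745f, 0.113f, 1000));
    std::printf("double: %d iterations\n", mandel_iterations<double>(-0.745, 0.113, 1000));
}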
« Last Edit: May 15, 2012, 07:05:00 PM by A Noniem » Logged