Author Topic: GPU galore  (Read 1867 times)
Description: Looking to update
marius
Fractal Lover
**
Posts: 206


« on: April 18, 2012, 11:26:28 PM »

I'm running an AMD 5850 so far, which is fine capability-wise, but I want more flops for more fps at 720p+ in 3D. As in real time.

Anyone here have an AMD 7970? How about two in CrossFire? 7.5 TFlops sounds tasty..  grin

Anything that Nvidia offers in that budget range ($1K) that would beat it or otherwise be more compelling?

Anything in the rumor mill that suggests waiting? CrossFire 7990s for more $$ and more flops? confused
Logged
cKleinhuis
Administrator
Fractal Senior
*******
Posts: 7044


formerly known as 'Trifox'


« Reply #1 on: April 19, 2012, 12:45:42 AM »

I bought a Radeon card a year ago because of the single-precision teraflops: that is what made me take the HD 6800 for just 150€, and I am satisfied with it. I am unsure whether Nvidia has caught up in single-precision throughput by now, but I would guess that ATI/AMD is still in front on the single-precision side ... so just go for ATI/AMD cheesy but this is just a guess! angel
Logged

---

divide and conquer - iterate and rule - chaos is No random!
real_het
Forums Freshman
**
Posts: 13


« Reply #2 on: April 19, 2012, 08:06:36 AM »

Hello,

HD 7970: 925 MHz * 2048 streams * 2 (MAD) = 3.7888 TFlops at stock clock.
I've tested it at a +21% overclock for about 10 minutes; the temperature stabilized at 85 °C and the fan ran at about 40% speed.
So I think it has twice the overclocking headroom of the previous cards (which was around 10%). That's 4.583 TFlops cheesy

For the Nvidia card, I was curious and looked up its specs; it has two basic clock settings:
- base clock:  1006 (idle)..1058 MHz
- boost clock: 1058 (idle)..1113 MHz
Let's assume the maximum standard frequency specified by the vendor, so the TFlops will be:
GTX 680: 1113 MHz * 1536 CUDA cores * 2 (MAD) = 3.419 TFlops

If you want to use DP floats, the 7970 will do it at 1/4 of the SP rate, and the GTX 680 at a 1/24 rate (8 DP cores per 192 CUDA cores).

(I've never tried to render a 3D Mandelbrot, but I think the new GCN architecture will make it faster: for example, it can do jumps/conditional jumps in a single clock, not something like 40 clocks. Some other things I've found out: the 7970 needs 4x more threads in flight to work optimally, and kernels must not use more than 64 registers (128 was the limit earlier). It can allocate all 256 registers for a single wavefront, but that costs roughly a 30% penalty. Lol, I can't wait to finally have some free time.)
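
For anyone who wants to plug in other clocks or shader counts, here is a minimal C++ sketch of the peak-throughput arithmetic above (clock * ALU count * 2 ops per MAD). The card names and figures are just the ones quoted in this post; real-world throughput will of course be lower.

Code:
#include <cstdio>

// Theoretical peak: clock (MHz) * number of ALUs * 2 ops per MAD (multiply-add).
double peak_tflops(double clock_mhz, int alus) {
    return clock_mhz * 1e6 * alus * 2.0 / 1e12;
}

int main() {
    std::printf("HD 7970 stock   : %.4f TFlops\n", peak_tflops(925.0, 2048));
    std::printf("HD 7970 +21%% OC : %.4f TFlops\n", peak_tflops(925.0 * 1.21, 2048));
    std::printf("GTX 680 boost   : %.4f TFlops\n", peak_tflops(1113.0, 1536));
}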
Logged
Syntopia
Fractal Molossus
**
Posts: 681



« Reply #3 on: April 19, 2012, 05:09:06 PM »

Quote from: real_het on April 19, 2012, 08:06:36 AM
If you want to use DP floats, the 7970 will do it at 1/4 of the SP rate, and the GTX 680 at a 1/24 rate (8 DP cores per 192 CUDA cores).

Ouch, a 1/24 rate is amazingly bad for double precision. The GTX 580 did DP at a 1/8 rate, and the Tesla cards do DP at a 1/2 rate. Even though the GTX 680 is more than twice as fast as the GTX 580 for single precision, it will be slower for double precision!

I've always used Nvidia cards, but next time it is going to be an ATI card.
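
To put rough numbers on that, here is a quick sketch using the rates above. The GTX 680 and HD 7970 single-precision peaks come from real_het's post; the GTX 580 figure (512 cores at roughly a 1544 MHz shader clock, about 1.58 TFlops) is my own assumption, not something stated in this thread.

Code:
#include <cstdio>

int main() {
    // Single-precision peaks in TFlops (GTX 580 value assumed, see above).
    double sp_gtx580 = 1.58, sp_gtx680 = 3.419, sp_hd7970 = 3.7888;

    // Double precision = single precision * the DP:SP rate quoted in the thread.
    std::printf("GTX 580 DP (1/8) : %.3f TFlops\n", sp_gtx580 / 8.0);
    std::printf("GTX 680 DP (1/24): %.3f TFlops\n", sp_gtx680 / 24.0);
    std::printf("HD 7970 DP (1/4) : %.3f TFlops\n", sp_hd7970 / 4.0);
}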

Logged
taurus
Fractal Supremo
*****
Posts: 1175



« Reply #4 on: April 20, 2012, 08:27:02 AM »

Aren't there aspects beyond all that theoretical calculation speed?
I know from the professional CAD/CAM segment that the drivers are almost more important than raw processing power. That's why AMD/ATI is almost irrelevant in the professional segment: Nvidia's OpenGL drivers are far more effective than AMD/ATI's, so you need an ATI card twice as fast to reach the same OpenGL performance as an Nvidia card.
Are there similar effects for OpenCL or whatever other language, or is the driver not relevant for tasks besides graphics?
Logged

when life offers you a lemon, get yourself some salt and tequila!
Syntopia
Fractal Molossus
**
Posts: 681



« Reply #5 on: April 20, 2012, 09:17:56 AM »

Quote from: taurus on April 20, 2012, 08:27:02 AM
Aren't there aspects beyond all that theoretical calculation speed?

Sure - the reason I've always chosen Nvidia is the better drivers (and CUDA). ATI's GLSL compiler seems less robust than Nvidia's.

But in terms of double precision performance it is very difficult to ignore ATI - even though the theoretical numbers might not reflect reality, an HD 7970 will be almost an order of magnitude faster than the GTX 680 for double precision.
Logged
ker2x
Fractal Molossus
**
Posts: 795


« Reply #6 on: April 20, 2012, 10:59:55 AM »

Documentation !!!

When I had an ATI card (my only ATI, an X1600 Pro) I had to use the Nvidia documentation to learn about shaders & co.
ATI's website is horrible, and it's very hard to find anything but "Look at these AMAZING shiny things!".

And even though Nvidia heavily promotes CUDA over OpenCL, I bought an Nvidia CUDA-oriented book and still learned a lot about OpenCL and how GPUs work.
ATI is supposed to promote OpenCL (they don't have CUDA and killed off the other "GPGPU languages"), but it is really hard to find any useful documentation from ATI.
That's one of the reasons I like Nvidia more than ATI.

"Sans maîtrise, la puissance n'est rien" ("Power is nothing without control")  grin
Logged

often times... there are other approaches which are kinda crappy until you put them in the context of parallel machines
(en) http://www.blog-gpgpu.com/ , (fr) http://www.keru.org/ ,
Sysadmin & DBA @ http://www.over-blog.com/
real_het
Forums Freshman
**
Posts: 13


« Reply #7 on: April 20, 2012, 11:06:35 AM »

As far as I know, NV has a much more complex instruction decoder that heavily supports out-of-order execution across every four 32-bit ALUs.
When there are lots of dependencies in a sequential code stream, it pays to look ahead in the code and reorder execution where possible in order to keep all execution units fed. But that costs a lot of transistors.
This technique can extract close to maximum performance even from a poorly optimized piece of code. (By the way, the best at this are x86/64 processors; they are designed to dominate benchmarks even when those benchmarks aren't optimized for them at all, trading away raw performance to do so.)

AMD, however, did it a different way: the compiler must specify what each of the execution units (4 or 5) does and when. If you put data-dependent instructions into a single clock cycle it will calculate wrong values; no consistency check is done by the hardware. It is entirely the compiler's responsibility to generate code that can utilize all execution units in every cycle.
This design is more sensitive to compile-time optimization, but when your code fits all these requirements you can get something like 99% of the theoretical performance.

I think with the new cards they are getting closer: the new AMD now has an intelligent scheduler coordinating 16-wide 32-bit SIMD ALUs, but it's not out-of-order execution, more like sharing resources across wavefronts on every clock. There are four 16-wide 32-bit SIMD ALUs and one 64-bit scalar ALU. When you are able to feed 4 wavefronts at a time, these resources can be used at 100% capacity, meaning 64 vector ops and 16 scalar ops in a single clock (latency is still 2 clocks because it's pipelined). In the worst case, when you can only run one wavefront, performance drops to half of the optimum. Another important point is that you must enqueue at least 8192 wavefronts (of 64 threads each) to a 7970, and this leads to another constraint: you must not use more than 64 registers (out of 256) so that all 4 wavefronts have their own register space.

I think the driver is not an issue: once the card gets its job, it can do it all on its own until it finishes. The more important thing is how the compilers work: there is a difficult path from 'human readable' OpenCL, through the more or less readable AMD IL (an intermediate asm-like language), until your idea finally reaches the machine code of a super-complicated piece of hardware.
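
As a rough illustration of the register/wavefront trade-off described above, here is a sketch using the simple model from this post: a 256-entry vector register file per lane, with at least 4 wavefronts wanted in flight per compute unit. The cap of 10 wavefronts per SIMD is an assumption on my part; real occupancy also depends on LDS usage and other limits, so treat the numbers as approximate.

Code:
#include <cstdio>
#include <algorithm>

// Simple model: the number of wavefronts resident on a SIMD is limited by
// how many vector registers each work-item uses (256 available per lane),
// capped by an assumed hardware limit of 10 wavefronts per SIMD.
int wavefronts_per_simd(int regs_per_workitem) {
    const int kHardwareLimit = 10;   // assumption, not from the thread
    return std::min(kHardwareLimit, 256 / regs_per_workitem);
}

int main() {
    const int reg_counts[] = {32, 64, 128, 256};
    for (int regs : reg_counts) {
        int waves = wavefronts_per_simd(regs);
        std::printf("%3d regs/work-item -> %2d wavefront(s) per SIMD%s\n",
                    regs, waves,
                    waves >= 4 ? "" : "  (fewer than the 4 needed to fill the CU)");
    }
}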
Logged
real_het
Forums Freshman
**
Posts: 13


« Reply #8 on: April 20, 2012, 11:22:34 AM »

"Documentation !!!"

Absolutely true grin

Since January there has been no 7970 ISA specification.
They plan to deprecate CAL/AMD IL (but they can't, because OpenCL sits on top of it :p)

So the up-to-date specifications for the new GCN/Southern Islands architecture come from these sources:
A marketing brochure from July(?) 2011 -> http://developer.amd.com/afds/assets/presentations/2620_final.pdf
And a disassembler (included in the driver suite): with it you can inspect the low-level asm generated from OpenCL or AMD IL programs.

I guess they hate writing docs so much that I'd bet even the documentation writers have been ordered to write code for the OpenCL compiler instead  cheesy
Logged
Syntopia
Fractal Molossus
**
Posts: 681



« Reply #9 on: April 20, 2012, 01:37:47 PM »

Quote from: real_het on April 20, 2012, 11:06:35 AM
As far as I know, NV has a much more complex instruction decoder that heavily supports out-of-order execution across every four 32-bit ALUs.
When there are lots of dependencies in a sequential code stream, it pays to look ahead in the code and reorder execution where possible in order to keep all execution units fed. But that costs a lot of transistors.

I've only quickly browsed this review, but as I understand it, NVIDIA has gone back to a simpler in-order execution model for the Kepler architecture to save transistors: http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/3 - I'm no expert here, though - just annoyed about the bad double precision performance.
Logged
A Noniem
Alien
***
Posts: 38


« Reply #10 on: April 20, 2012, 02:12:20 PM »

GCN is supposed to be a big improvement when it comes to GPGPU. It's also nice that all GCN cards (starting with the 7750) support double precision. The only downside is that the double precision performance of the 7700/7800 series is relatively low compared to the 7900 series: the 7700/7800 series get 1/16th of their single precision rate in double precision, while it is 1/4th(!!!) for the 7900 series.

For single and double precision GFlops you might want to check out http://en.wikipedia.org/wiki/Southern_Islands_(GPU_family)#Chipset_table

AMD offers more raw GFlop performance than nVidia, and AMD's OpenCL drivers are as good as, if not better than, nVidia's (although if you have an nVidia card you probably want to use CUDA anyway).

It seems like you have a high budget, so I'd recommend the 7900 series cards. They have amazing double precision performance (almost 1 TFlop of double precision for the 7970).
Logged
Adam Majewski
Fractal Lover
**
Posts: 221


« Reply #11 on: May 14, 2012, 07:18:18 PM »

I'm thinking of buying a PC for GPGPU. If I understand the experts' opinion correctly, AMD is better (faster) but Nvidia has better docs. What should I choose? OpenCL or CUDA?
 huh?
Logged
ker2x
Fractal Molossus
**
Posts: 795


« Reply #12 on: May 15, 2012, 10:23:39 AM »

Quote from: Adam Majewski on May 14, 2012, 07:18:18 PM
I'm thinking of buying a PC for GPGPU. If I understand the experts' opinion correctly, AMD is better (faster) but Nvidia has better docs. What should I choose? OpenCL or CUDA?
 huh?

Only Nvidia cards support CUDA.
OpenCL is supported by both Nvidia and AMD cards, and some Intel and AMD CPUs now support OpenCL too.
I suggest OpenCL, unless you need something specific provided by a CUDA library, or CUDA's support and better documentation.

And if you're willing to write your own code, I suggest Nvidia smiley
The theoretical peak throughput is very hard to reach and requires very good knowledge of the GPU architecture ... which is exactly what good documentation provides  grin
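
If you go the OpenCL route, a first sanity check is just enumerating what the runtime can see. A minimal sketch with the plain OpenCL C API (builds against either vendor's SDK; error checking omitted) might look like this:

Code:
#include <CL/cl.h>
#include <cstdio>

int main() {
    // Enumerate every OpenCL platform (AMD, Nvidia, Intel, ...) and its devices.
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(8, platforms, &num_platforms);
    if (num_platforms > 8) num_platforms = 8;

    for (cl_uint p = 0; p < num_platforms; ++p) {
        char name[256];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(name), name, NULL);
        std::printf("Platform: %s\n", name);

        cl_device_id devices[8];
        cl_uint num_devices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devices, &num_devices);
        if (num_devices > 8) num_devices = 8;

        for (cl_uint d = 0; d < num_devices; ++d) {
            char dev[256];
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(dev), dev, NULL);
            std::printf("  Device: %s\n", dev);
        }
    }
    return 0;
}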
Logged

often times... there are other approaches which are kinda crappy until you put them in the context of parallel machines
(en) http://www.blog-gpgpu.com/ , (fr) http://www.keru.org/ ,
Sysadmin & DBA @ http://www.over-blog.com/
cbuchner1
Fractal Phenom
******
Posts: 443


« Reply #13 on: May 15, 2012, 12:45:07 PM »


Here's an argument for OpenCL:

OpenCL is supported by Intel, AMD and nVidia and available on Mac, Windows and Linux. On Intel CPUs the code will be automatically translated to the SSE or AVX vector instruction set.

OpenCL's major drawback is that it doesn't support some C++ features like templates (which CUDA does).

Christian
Logged
A Noniem
Alien
***
Posts: 38


« Reply #14 on: May 15, 2012, 06:19:49 PM »

CUDA, however, is a more mature language than OpenCL and does include some C++ features that OpenCL completely lacks. Personally I prefer OpenCL over CUDA because of the cross-hardware capability and the fact that CUDA is vendor-locked. By the way, it's not AMD + OpenCL vs nVidia + CUDA: nVidia also supports OpenCL.
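
To make the template point concrete, here is a sketch in plain C++ of the kind of reuse those features buy you: one generic routine that the compiler instantiates for both float and double. In CUDA you could mark such a function __device__ and call it from single- or double-precision kernels, whereas OpenCL C (a C99 dialect in the current 1.x versions) has no templates, so each precision needs its own copy or a macro that stamps the copies out. This is illustrative host-side C++, not actual kernel code.

Code:
#include <cstdio>

// One generic escape-time routine; the compiler generates a float and a
// double version from the same source.
template <typename Real>
int mandel_iterations(Real cx, Real cy, int max_iter) {
    Real x = 0, y = 0;
    int i = 0;
    while (i < max_iter && x * x + y * y < Real(4)) {
        Real xt = x * x - y * y + cx;
        y = Real(2) * x * y + cy;
        x = xt;
        ++i;
    }
    return i;
}

int main() {
    // Same source, two precisions - the reuse OpenCL C cannot express directly.
    std::printf("float : %d iterations\n", mandel_iterations<float>(-0.745f, 0.113f, 1000));
    std::printf("double: %d iterations\n", mandel_iterations<double>(-0.745, 0.113, 1000));
}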
« Last Edit: May 15, 2012, 07:05:00 PM by A Noniem » Logged