Author Topic: If you liked Assembler, you’ll love OpenCL (and Cuda too)
ker2x
Fractal Molossus
Posts: 795


« on: July 30, 2010, 09:18:22 AM »

I found this article, http://www.streamcomputing.nl/blog/2010-07-22/the-rise-of-the-gpgpu-compilers , on the Khronos homepage http://www.khronos.org/news/C124/ (home of the OpenCL, OpenGL, ... standards).

I agree (of course grin ). I like assembler (as long as it supports SSE2 and higher instructions), I write some assembler (PureBasic + TASM), and I love OpenCL cheesy
(I've wanted to do GPGPU since the first 3Dfx; of course it was nearly impossible back then, and I had to wait for CUDA & OpenCL. I tried cGL, but it was too weird for me.)

I bought the NVIDIA book "Programming Massively Parallel Processors"; the book is all about speed, optimization, speed and optimization (and speed (and optimization)).
That makes sense: the only goal of GPGPU is ... being fast, and faster (and maybe the fastest, too). So the book focuses on speed, on optimization, and on how the GPU architecture can be used at its best (and there is a lot to learn; it's totally different from CPU architecture).

If you do not care about speed, forget about GPGPU. It's weird, it's a completely different architecture, it has some crazy limitations and behaviours, and memory access is painfully slow compared to arithmetic operations ... e.g. an addition takes about 1/8th of a cycle, while a global memory access has around 800 cycles of latency.  angry

If you care about speed, well ... as long as your problem is embarrassingly parallel, it's insanely, incredibly f*cking fast! Forget about what you know (easy for me, I don't know much about programming) and be ready to launch hundreds of thousands of threads (did I mention that high-end GPU cards have 300~500 cores, each running at more than 1 GHz?).
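To make that "one thread per pixel" idea concrete, here is a minimal escape-time kernel sketch in OpenCL C (not from the original post; the argument names and the flat output layout are my own assumptions). Each work-item computes one pixel independently, which is exactly the embarrassingly parallel case described above.

Code:
// Minimal escape-time (Mandelbrot) kernel: one work-item per pixel.
// Illustrative sketch; width/height/maxIter and the uint output buffer are assumptions.
__kernel void mandelbrot(__global uint *out,
                         const uint width, const uint height,
                         const float x0, const float y0,   // origin of the view
                         const float step,                 // pixel size in the complex plane
                         const uint maxIter)
{
    const uint px = get_global_id(0);
    const uint py = get_global_id(1);
    if (px >= width || py >= height) return;

    const float cr = x0 + px * step;   // map the pixel to the complex plane
    const float ci = y0 + py * step;
    float zr = 0.0f, zi = 0.0f;
    uint i = 0;
    while (i < maxIter && zr * zr + zi * zi <= 4.0f) {
        const float t = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
        ++i;
    }
    out[py * width + px] = i;   // a single global write per work-item
}

The host enqueues roughly width x height work-items; the scheduler swaps groups of threads in and out to hide the global memory latency mentioned above.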

High-end graphics cards support 64-bit floating-point operations; a Tesla M2050/M2070 delivers about 500 GFlops double precision, or 1 TeraFlops(!) single precision.
An Intel Q6600 delivers about 38 GFlops, and a high-end i7 CPU about 50 GFlops.

Most CPU optimizations are about trading memory for CPU, which is a bad thing to do on a GPU. On the other hand, you have more than 8000 registers (I forget the exact amount) and the "read-after-write" register latency is "only" 24 cycles. Okay, that still sucks considering you can do 8 additions per cycle, but memory latency can be hidden if you run many, many, many threads (one thread can run while another is waiting on memory).

My best optimization (so far) involved: brute force \o/

And... oh wait, I'm late for work, bbl ...   sad  cry angry
« Last Edit: July 30, 2010, 09:20:00 AM by ker2x »

often times... there are other approaches which are kinda crappy until you put them in the context of parallel machines
(en) http://www.blog-gpgpu.com/ , (fr) http://www.keru.org/ ,
Sysadmin & DBA @ http://www.over-blog.com/
kram1032
Fractal Senior
Posts: 1863


« Reply #1 on: July 30, 2010, 09:04:35 PM »

Quote from: ker2x
speed and optimization (and speed (and optimization)).

Are that few iterations enough to show the fractal structure of this?
ker2x
Fractal Molossus
Posts: 795


« Reply #2 on: July 31, 2010, 11:00:59 PM »

Quote from: kram1032
Are that few iterations enough to show the fractal structure of this?

Only if coded in LISP smiley

often times... there are other approaches which are kinda crappy until you put them in the context of parallel machines
(en) http://www.blog-gpgpu.com/ , (fr) http://www.keru.org/ ,
Sysadmin & DBA @ http://www.over-blog.com/
Synaesthesia
Forums Newbie
Posts: 2


« Reply #3 on: August 03, 2010, 02:39:25 PM »

Thank you, I wanna try this out..
Duncan C
Fractal Fanatic
Posts: 348



« Reply #4 on: August 24, 2010, 01:54:01 PM »

Quote from: ker2x on July 30, 2010, 09:18:22 AM (the original post, quoted in full)

ker2x,

Thanks for the description. Does the NVIDIA book cover OpenCL?

And is there any built-in support for higher precision math?

I really want a card that supports double precision in hardware. However, Apple is really terrible about video card support. They support only a couple of cards for any given machine, and they are always yesterday's middle-of-the-road cards.

On another subject, from the reading I've done about GPGPU, they are not great for fractal calculations because they are highly tuned to SIMD operations. How do you handle situations where each GPU thread needs to run a different code path in order to iterate a separate pixel?


Duncan C

Regards,

Duncan C
ker2x
Fractal Molossus
Posts: 795


« Reply #5 on: September 06, 2010, 09:11:31 AM »

Quote
ker2x,
Thanks for the description. Does the NVIDIA book cover OpenCL?

Only a small chapter.
If you follow the NVIDIA book chapter by chapter (and you should), you learn CUDA first; then there is a chapter that explains how to write OpenCL code based on what you wrote in CUDA.

It's not so bad if you didn't know OpenCL first.
The problem is: there is a *lot* to remember, the names are sometimes very confusing, and the naming differences between OpenCL and CUDA are even more confusing.

Besides that, it's the very same architecture: you write essentially the same C code for the CUDA/OpenCL kernels and functions, and the optimisation tips and tricks are exactly the same.
OpenCL and CUDA are just two different APIs to "talk" to the GPU card. The code you upload to the GPU is essentially the same for both; it's only the "host" (CPU) code that changes.
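As a rough illustration of how the two vocabularies line up (my own sketch, not from the book), here is a trivial per-element kernel written in OpenCL C, with the corresponding CUDA spellings noted in comments:

Code:
// OpenCL C version of a trivial per-element kernel; CUDA equivalents in comments.
// Illustrative sketch only.
__kernel void scale(__global float *data,   // CUDA: __global__ void scale(float *data, ...)
                    const float factor,
                    const uint n)
{
    // OpenCL: get_global_id(0)
    // CUDA:   blockIdx.x * blockDim.x + threadIdx.x
    const uint i = get_global_id(0);
    if (i < n)
        data[i] *= factor;
}
// Vocabulary mapping: work-item <-> thread, work-group <-> thread block,
// local memory <-> shared memory, NDRange <-> grid.

The kernel body is the part that stays the same; the confusing part is the host-side API and the renamed concepts around it.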

Quote
And is there any built-in support for higher precision math?

I don't know of any usable arbitrary-precision library for the GPU yet. You should probably take a close look at http://www.mpir.org/
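One workaround worth mentioning (my own addition, not something discussed above) is to emulate extra precision in software with "float-float" arithmetic: a value is stored as an unevaluated sum of two machine numbers, the second holding the rounding error of the first. A minimal sketch of the addition step in OpenCL C, using the classic Dekker/Knuth two-sum:

Code:
// "Float-float" addition sketch (illustrative only): a value is hi + lo,
// where lo carries the rounding error of hi. A real library needs the full
// set of renormalized operations (mul, div, sqrt, ...), and this relies on
// strict FP semantics (do not compile with -cl-fast-relaxed-math).
float2 two_sum(float a, float b)
{
    const float s = a + b;
    const float v = s - a;
    const float e = (a - (s - v)) + (b - v);   // exact rounding error of a + b
    return (float2)(s, e);
}

float2 ff_add(float2 a, float2 b)
{
    float2 s = two_sum(a.x, b.x);
    s.y += a.y + b.y;                  // fold in the low-order parts
    const float hi = s.x + s.y;        // quick renormalization
    const float lo = s.y - (hi - s.x);
    return (float2)(hi, lo);
}

This only buys roughly twice the native precision (at the cost of several extra operations per add), so it is a stopgap, not a replacement for a real arbitrary-precision library.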


Quote
I really want a card that supports double precision in hardware. However, Apple is really terrible about video card support. They support only a couple of cards for any given machine, and they are always yesterday's middle-of-the-road cards.

What you (usually) want is the latest generation of GPU (with the latest features, e.g. double precision), not the fastest card.
I'm perfectly happy developing GPU code on my laptop powered by an ION2.
I'm not buying a $150 latest-generation GPU card for my desktop, because I'm a gamer and I'm happier with my old high-end 8800GTX than with a last-generation low-end graphics card.

If you're not gaming (or are happy with gaming on Apple hardware (lol?)), I suggest buying a low-end PC with a low-end last-generation graphics card, and there you go!

Quote
On another subject, from the reading I've done about GPGPU, they are not great for fractal calculations because they are highly tuned to SIMD operations. How do you handle situations where each GPU needs to run a different code-path in order to iterate a separate pixel?

I don't know any nice way to avoid that problem. So ... brute force and confidence in the GPU scheduler.   embarrass
As far as I understand, a different code path does not block all the other threads, just one CUDA core (mid-range and high-end cards have between 256 and 512 cores).

Considering our (fractal) problem, you will not get 100% GPU occupancy; live with it.
But even 50% occupancy is still incredibly fast smiley

often times... there are other approaches which are kinda crappy until you put them in the context of parallel machines
(en) http://www.blog-gpgpu.com/ , (fr) http://www.keru.org/ ,
Sysadmin & DBA @ http://www.over-blog.com/
cbuchner1
Fractal Phenom
Posts: 443


« Reply #6 on: September 06, 2010, 11:07:09 AM »


Yeah, the scheduling granularity is "warps" of 32 threads. If you compute rectangular chunks of 8x4 pixels in each warp, for example, you get good spatial locality within each warp, meaning the iteration depths for all threads in that warp should be similar (statistically speaking). A sketch of that mapping is below.
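To illustrate (my own sketch, assuming an NVIDIA-style warp of 32 threads): launch the kernel with a work-group size of (8, 4), so each work-group is exactly one warp and covers a compact 8x4 pixel tile.

Code:
// 8x4 tiling sketch: with a work-group size of (8, 4), each work-group maps
// to one 32-thread warp on NVIDIA hardware, and that warp covers a compact
// 8x4 pixel rectangle, so iteration depths inside the warp tend to be similar.
__kernel void iterate_tiled(__global uint *out,
                            const uint width, const uint height,
                            const uint maxIter)
{
    // get_group_id() selects the 8x4 tile, get_local_id() the pixel inside it.
    const uint px = get_group_id(0) * 8 + get_local_id(0);   // 0..7 inside the tile
    const uint py = get_group_id(1) * 4 + get_local_id(1);   // 0..3 inside the tile
    if (px >= width || py >= height) return;

    // ... per-pixel escape-time loop as in the earlier sketch ...
    out[py * width + px] = maxIter;   // placeholder write
}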

Christian