CUDA Y.A.M.Z

ker2x

Fractal Molossus

Posts: 795

Re: CUDA Y.A.M.Z

« Reply #15 on: February 03, 2014, 11:44:00 AM »

nvcc fatal : Unsupported gpu architecture 'compute_30'

Looks like I need a software upgrade :(



often times... there are other approaches which are kinda crappy until you put them in the context of parallel machines
(en) http://www.blog-gpgpu.com/ , (fr) http://www.keru.org/ ,
Sysadmin & DBA @ http://www.over-blog.com/

ker2x

Fractal Molossus

Posts: 795

Re: CUDA Y.A.M.Z

« Reply #16 on: February 03, 2014, 11:54:05 AM »

Quote from: 3dickulus on January 24, 2014, 05:37:37 PM

accessing an array on gpu like

Code:

index=y*width+x;
n = array[index];

is slower than accessing like

Code:

index=y*width;
n = array[index+x];

I think it's because the compiler optimizes "index" as an incremented register (haven't checked assembler output)

I don't know where you found this in the source code (did a quick grep) but

Code:

n = array[y*width+x];

should be even faster, since it avoids read-after-write latency.
And since it's a fused multiply-add:

Code:

n = array[fma(y,width, x)];

no?
(sorry, can't try it right now)
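For comparison, here is a minimal CPU-side C++ sketch of the integer indexing forms discussed in this exchange, assuming a row-major buffer `width` elements wide (the function names are hypothetical). Both loops read the same elements in the same order; whether one is measurably faster on the GPU depends on the code nvcc generates, which is what the later benchmark in this thread checks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inline form: n = array[y*width + x]
// Plain integer multiply-add; the compiler is free to emit a single mad.
inline std::size_t index_inline(int x, int y, int width) {
    return static_cast<std::size_t>(y) * width + x;
}

// Hoisted form: compute the row offset once per row, then add x per access.
// Mirrors "offset = y*width; ... array[offset + x]".
long long sum_hoisted(const std::vector<int>& a, int width, int height) {
    long long total = 0;
    for (int y = 0; y < height; ++y) {
        const std::size_t offset = static_cast<std::size_t>(y) * width;
        for (int x = 0; x < width; ++x) total += a[offset + x];
    }
    return total;
}

// Same traversal, recomputing y*width + x for every element.
long long sum_inline(const std::vector<int>& a, int width, int height) {
    long long total = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) total += a[index_inline(x, y, width)];
    return total;
}
```

Both functions return the same sum for any buffer of `width*height` elements; only the arithmetic used to form the address differs.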



3dickulus

Global Moderator
Fractal Senior

Posts: 1558

Re: CUDA Y.A.M.Z

« Reply #17 on: February 03, 2014, 02:04:04 PM »

In the CMakeLists.txt file you can change the "GENCODE" option from 30 to 13 if your nvcc doesn't know compute_30 yet.

In "simplerender":

Code:

    // map from threadIdx/blockIdx to pixel position
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    // offset into output buffer
    int offset = (y*width);
    int clu;

    clu = fracfunc(lookup.x[offset+x], lookup.y[offset+x], d_divergence, d_maxiter, d_maxcol);

    // set pixel to color
    pixels[offset+x] = clookup[clu];
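The thread-to-pixel mapping in that kernel can be checked by emulating the launch on the CPU. This is a hedged C++ sketch, not CUDA: the nested loops stand in for the launch grid, and `Dim` is a hypothetical stand-in for CUDA's `dim3`. When `width` or `height` is not a multiple of the block size, the grid has to be rounded up and the surplus threads must skip out-of-range pixels. The quoted fragment doesn't show such a check, so it is assumed to live elsewhere (or the image size is always a multiple of the block size).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's dim3 (only x and y are needed here).
struct Dim { int x, y; };

// Blocks needed to cover n items with blocks of size b, rounded up.
inline int blocks_for(int n, int b) { return (n + b - 1) / b; }

// Runs the kernel's index mapping for every (block, thread) pair of a 2D launch
// and counts how many times each pixel would be written.
std::vector<int> emulate_launch(int width, int height, Dim block) {
    assert(width > 0 && height > 0 && block.x > 0 && block.y > 0);
    std::vector<int> writes(static_cast<std::size_t>(width) * height, 0);
    const Dim grid{blocks_for(width, block.x), blocks_for(height, block.y)};
    for (int by = 0; by < grid.y; ++by)
        for (int bx = 0; bx < grid.x; ++bx)
            for (int ty = 0; ty < block.y; ++ty)
                for (int tx = 0; tx < block.x; ++tx) {
                    // Same mapping as the kernel: threadIdx + blockIdx * blockDim
                    const int x = tx + bx * block.x;
                    const int y = ty + by * block.y;
                    // Surplus threads in the last row/column of blocks do nothing.
                    if (x >= width || y >= height) continue;
                    const int offset = y * width;  // row offset, as in the kernel
                    writes[offset + x] += 1;
                }
    return writes;
}
```

In the kernel itself the equivalent check would be an early `return` before touching `lookup` or `pixels`; `height` would have to be passed in, which the fragment doesn't show.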

Just speculating on my part:

using fma would calculate the value and store it in the variable named offset, then look it up three times;
using a register would initialize the register once, then for each use increment it once and read it three times.

The speedup was very small, a few microseconds.

Using a struct of arrays instead of an array of structs is better too, but again not by a lot.

I haven't tried explicitly specifying fma(y,width, x), and I haven't examined the PTX assembler output to see whether the compiler turns "x*y+n" into an fma. From what I've read it does, but I'll give it a go the next time I'm tinkering with that code.
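The `lookup.x[...]` / `lookup.y[...]` accesses in the kernel above already use a struct-of-arrays layout. A minimal C++ sketch of the two layouts, with hypothetical type names: in SoA, neighbouring threads reading `x[i]`, `x[i+1]`, ... touch consecutive addresses, which is generally what lets the GPU coalesce those loads into fewer memory transactions; in AoS each load strides over the unused `y` field.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structs: x and y interleaved in memory (x0 y0 x1 y1 ...).
struct PointAoS { double x, y; };

// Struct of arrays: all x values contiguous, then all y values.
struct PointsSoA {
    std::vector<double> x, y;
};

// Convert AoS to SoA once on the host, e.g. before uploading to the device.
PointsSoA to_soa(const std::vector<PointAoS>& aos) {
    PointsSoA soa;
    soa.x.reserve(aos.size());
    soa.y.reserve(aos.size());
    for (const PointAoS& p : aos) {
        soa.x.push_back(p.x);
        soa.y.push_back(p.y);
    }
    return soa;
}

// Byte distance between the x values read by neighbouring threads i and i+1.
constexpr std::size_t aos_stride() { return sizeof(PointAoS); }
constexpr std::size_t soa_stride() { return sizeof(double); }
```

The conversion costs one pass on the host; after that every kernel read of `x` is contiguous.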



Resistance is fertile...
You will be illuminated!

#B^] https://en.wikibooks.org/wiki/Fractals/fragmentarium

ker2x

Fractal Molossus

Posts: 795

Re: CUDA Y.A.M.Z

« Reply #18 on: February 03, 2014, 02:48:52 PM »

Code:

/usr/local/cuda/include/host_config.h:82:2: error: #error -- unsupported GNU version! gcc 4.5 and up are not supported!

I still need to do some software upgrades anyway >:(

In my experience (which is with OpenCL, but a GPU is a GPU), using a temporary variable to store a reusable intermediate result is not always the good idea it is on a CPU.
Memory access (any memory) is slowwwwwwwwww, while an fma is done in 1 cycle (well, 8 fmas per cycle actually, in the ideal case; I don't remember the details).

I found in my code that using a long brute-force formula was faster than splitting it into small parts that could be reused.
I'm busy learning FPGAs and haven't played with GPUs in a long time (that's why I need some software updates :D).

3dickulus

Global Moderator
Fractal Senior

Posts: 1558

Re: CUDA Y.A.M.Z

« Reply #19 on: February 03, 2014, 03:35:20 PM »

Global memory is slowest, __constant__ is faster, and __shared__ is supposed to be almost as fast as registers. I'm just taking it for granted that most operations (individually) are going to be as fast as or faster than on the CPU. If your gfx card supports it, you can configure __shared__ memory access as 8 bytes instead of the default 4 bytes, but this too raises some timing issues when mixing access to floats and doubles.

I'm very new to this GPU stuff and really just tinkering around; I'm sure I'll be making lots of mistakes :-[

The Mandelbrot function in this was originally written for a 16 MHz Motorola 68000 CPU and optimized to compare with assembly-language versions; I'm sure it's nowhere near as fast as more recent methods.



3dickulus

Global Moderator
Fractal Senior

Posts: 1558

Re: CUDA Y.A.M.Z

« Reply #20 on: February 04, 2014, 10:13:57 PM »

Quote from: ker2x on February 03, 2014, 02:48:52 PM

Code:

/usr/local/cuda/include/host_config.h:82:2: error: #error -- unsupported GNU version! gcc 4.5 and up are not supported!

I still need to do some software upgrades anyway >:(

gcc 4.7 works with CUDA 5.5

FPGA looks really interesting; will hard drives soon be obsolete?



3dickulus

Global Moderator
Fractal Senior

Posts: 1558

Re: CUDA Y.A.M.Z

« Reply #21 on: February 11, 2014, 12:05:06 AM »

Quote from: ker2x on February 03, 2014, 11:54:05 AM

should be even faster, since it avoids read-after-write latency.
And since it's a fused multiply-add:

Code:

n = array[fma(y,width, x)];

no?
(sorry, can't try it right now)

finally found some time to test this.

when I try

Code:

n = array[fma(y,width, x)];

I get

Code:

error: expression must have integral or enum type

fma() is a floating-point function, so its result can't be used directly as an array subscript.

Using...

Code:

index=y*width;
n = array[index+x];

Benchmark:
Number of frames: 1000
Avg msec per frame: 2.307000

using...

Code:

index=fma(y,width,x);
n = array[index];

Benchmark:
Number of frames: 1000
Avg msec per frame: 2.315000

Based on this result, it seems that in this particular case fma(x,y,z) is not faster than x*y+z: the fma version took 0.008 ms longer per frame (0.008 seconds over the 1000 frames). Really small, but still...
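A C++ sketch of what the two benchmarked forms compute, assuming CUDA's `fma` behaves like the standard C++ one here (the compile error above suggests it does): with `int` arguments, `fma` is evaluated in `double`, so the second version is really a double-precision fma plus conversions back to an integer, while `y*width + x` stays in integer arithmetic. That may account for the slightly slower result, but the PTX wasn't checked, so treat it as a guess. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// std::fma with integer arguments promotes them to double and returns double,
// which is why it can't be used directly as a subscript.
static_assert(std::is_same<decltype(std::fma(1, 2, 3)), double>::value,
              "fma on ints yields a double");

// What "index = fma(y, width, x)" amounts to: a double fma, then a conversion back.
inline int index_via_fma(int x, int y, int width) {
    return static_cast<int>(std::fma(y, width, x));
}

// Plain integer multiply-add; the compiler may fuse it into one integer mad.
inline int index_int(int x, int y, int width) {
    return y * width + x;
}
```

For image-sized values (well inside double's exact integer range) both functions return the same index; the difference is only in the types and instructions involved.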



3dickulus

Global Moderator
Fractal Senior

Posts: 1558

Re: CUDA Y.A.M.Z

« Reply #22 on: February 13, 2014, 09:51:43 PM »

using GPU ARPREC vs CPU ARPREC

Precision = 1000 digits
Iterations = 50000

CenterX=-1.6285250343823361883838262548545045410348661236584141601769446480597742613978652866884534147015438695866330324908358967896639954103891124710858983400930548513556561043206532203421245449026144184088972414350968083884667444419967905346296122369829125808715959589508004033683543856979653688355703357362596797912790215729307519005701670693821446556437233617104789888651797464159254445779960831204398799963531144151826157979487498653952347762098181463615781771842676200742084357646694586924130378944176232745311158540371210889162520250043231951

CenterY=0.0006786726672986807036534783258711032797071726013030485649555004108079612273532854099054877878854548091617523457504272839323659938218011891095623421488662732771512925574351926524067445041101561085174335116653613367712657068886759881133817710332110609908168838538010271006938137369081253790287372868878042144253940125139876166256963471716935289610917988374315494511164693787453719396359779881539576510119495792820935158658401193999362816789422637990442661756991633307202607735211837364599264587661238085446201036331218816037486284880927809

-Z=(distance from center to edge of rendered area) = 2.420740E-530

CPU = 1310.34 sec
GPU = 213.925 sec (about 6.1x faster)

No optimizations, estimations or perturbation: just calculate every pixel until it escapes (or reaches the iteration limit) using the standard M-set formula $X_{n+1} = X_n^2 + X_0$.
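A minimal double-precision C++ sketch of that per-pixel escape-time loop, with X written out as real and imaginary parts. It is an illustration only: the ARPREC and GPU ARPREC versions replace `double` with an arbitrary-precision type, and the bailout radius of 2 and the starting value X = 0 are the usual choices, assumed here. Starting from 0 instead of X_0 just shifts the orbit by one step.

```cpp
#include <cassert>

// Returns the number of iterations before |X| exceeds 2, or maxiter if it never does.
// (x0, y0) is the pixel's point X0 in the complex plane.
int escape_time(double x0, double y0, int maxiter) {
    double x = 0.0, y = 0.0;              // X starts at 0
    for (int n = 0; n < maxiter; ++n) {
        const double xx = x * x, yy = y * y;
        if (xx + yy > 4.0) return n;      // |X|^2 > 4, i.e. |X| > 2: escaped
        // X(n+1) = X(n)^2 + X0, expanded into real and imaginary parts
        y = 2.0 * x * y + y0;
        x = xx - yy + x0;
    }
    return maxiter;                       // never escaped: treated as inside the set
}
```

For example, X0 = 1 escapes after 3 iterations (0, 1, 2, 5), while X0 = -1 and X0 = i cycle forever and return `maxiter`.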



3dickulus

Global Moderator
Fractal Senior

Posts: 1558

Re: CUDA Y.A.M.Z

« Reply #23 on: February 20, 2014, 11:54:26 AM »

Something of a milestone for me, using my old code to zoom this deep. So now that I have a GUI for low-precision (double) zoom rendering on the GPU and garprec for high-precision rendering on the GPU, I am going to fiddle around with the SuperFractalThing maths and may try to build a garprec complex data type to use with those formulae.
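SuperFractalThing is built on perturbation (plus a series approximation to skip early iterations, not shown here). One reference orbit Z_n is computed at high precision for the centre C, and every other pixel c = C + δc is tracked only as the small difference δ_n = z_n - Z_n. Since z_{n+1} = z_n² + c, that difference obeys δ_{n+1} = 2 Z_n δ_n + δ_n² + δc, which can be iterated in ordinary hardware precision. The sketch below keeps everything in `double` and `std::complex` to stay short (an assumption; in a real deep zoom Z_n would come from ARPREC/garprec) and checks the perturbed orbit against direct iteration. Function names are hypothetical.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Reference orbit Z_0..Z_n for centre C (Z_0 = 0). In a deep zoom this is the
// part computed with arbitrary precision and then rounded to double.
std::vector<cplx> reference_orbit(cplx C, int n) {
    std::vector<cplx> Z{cplx(0.0, 0.0)};
    for (int i = 0; i < n; ++i) Z.push_back(Z.back() * Z.back() + C);
    return Z;
}

// Perturbed orbit for the pixel C + dc:
// delta_{i+1} = 2 Z_i delta_i + delta_i^2 + dc, starting from delta_0 = 0.
// Returns z_n = Z_n + delta_n.
cplx perturbed(const std::vector<cplx>& Z, cplx dc, int n) {
    assert(n >= 0 && static_cast<std::size_t>(n) < Z.size());
    cplx delta(0.0, 0.0);
    for (int i = 0; i < n; ++i) delta = 2.0 * Z[i] * delta + delta * delta + dc;
    return Z[n] + delta;
}

// Direct iteration z_{i+1} = z_i^2 + c, for comparison.
cplx direct(cplx c, int n) {
    cplx z(0.0, 0.0);
    for (int i = 0; i < n; ++i) z = z * z + c;
    return z;
}
```

The payoff is that only the single reference orbit needs expensive arithmetic; the per-pixel δ loop is plain floating point, which is what makes it attractive for a GPU.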

1.2420740E-530.jpg (140.68 KB, 512x512)

3dickulus

Global Moderator
Fractal Senior

Posts: 1558

Re: CUDA Y.A.M.Z

« Reply #24 on: March 15, 2014, 11:07:46 AM »

Another installment in the YAMZee file....
I've managed to cobble together a working(?) Qt C++ clone(?) of SuperFractalThing(?)

Recap:
1: Really fast double precision GPU Mandel zoomer
2: GpuARbitraryPRECision Mandel zoomer
3: SFT Qt C++ (ported from the Java source, uses the ARPREC lib)

Now all I have to do is mash them together into a GPU-friendly MandelZoomer :D

Anyone interested in the SFT C++ code, just drop me a note; GCC, CMake, ARPREC and Linux required.

Disclaimer: highly experimental miscellaneous hackings best described as "brutish butchery"

cudabrotSFT.jpg (49.7 KB, 800x651)

3dickulus

Global Moderator
Fractal Senior

Posts: 1558

Re: CUDA Y.A.M.Z

« Reply #25 on: March 19, 2014, 02:50:38 AM »

Well, after a lot of crashes and weird-looking fractal rejects, it is starting to stabilize: no more lost 25MB chunks, zoom/pan no longer chokes, and it seems to be OK with minor rendering tasks. No big zooms yet, but now I have some code to play with:

a C++ version of SFT

my humblest thanks and appreciation to K.I. Martin for the SuperFractalThing.

As always, source code is available. It's a little rough, may crash, and probably won't compile out-of-the-box, but it does work on my machine :D

Input is always appreciated. I have some ideas about where to apply GPU code, but I need to get things more stable first, i.e. checks and balances.

cudabrot2.jpg (135.41 KB, 600x532)
« Last Edit: March 26, 2014, 05:32:01 PM by 3dickulus, Reason: url »

3dickulus

Global Moderator
Fractal Senior

Posts: 1558

Re: CUDA Y.A.M.Z

« Reply #26 on: March 23, 2014, 05:17:24 AM »

So...

I have the SFT engine ported from Java to C++, shoehorned into a Qt GUI and functional enough to get some deep images. There are still a few blobs and other issues, but it is basically a clone of the Java version. The translation to equivalent Qt/GNU C++ was a breeze (it still needs fine tuning); the thread spawner was probably the hardest part. I think the 2DIndexBuffer class and the GNU threading can be replaced by QImage and QThread. Currently it renders to a 2DIndexBuffer, then converts that into a pixmap that gets set as a GL texture. I'm doing it this way so that moving code to the GPU will be easier, since the GPU has access to texture buffers, and this sets up all the GL code I need.
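The render path described here (iteration counts into an index buffer, then converted into pixels for a texture) comes down to one palette lookup per pixel, the same step as `pixels[offset+x] = clookup[clu]` in the CUDA kernel earlier in the thread. A hedged C++ sketch with hypothetical names (it is not the SFT 2DIndexBuffer API): pixels are assumed to be packed 0xAARRGGBB values, and pixels that hit the iteration limit get a fixed inside colour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts a buffer of iteration counts into packed 0xAARRGGBB pixels.
// Counts >= maxiter are treated as "inside" and get inside_colour;
// everything else cycles through the palette.
std::vector<std::uint32_t> colourise(const std::vector<int>& counts,
                                     const std::vector<std::uint32_t>& palette,
                                     int maxiter, std::uint32_t inside_colour) {
    assert(!palette.empty());
    std::vector<std::uint32_t> pixels(counts.size());
    for (std::size_t i = 0; i < counts.size(); ++i) {
        const int n = counts[i];
        assert(n >= 0);
        pixels[i] = (n >= maxiter)
                        ? inside_colour
                        : palette[static_cast<std::size_t>(n) % palette.size()];
    }
    return pixels;
}
```

Because the output is a flat 32-bit-per-pixel buffer, the same loop maps naturally onto one GPU thread per pixel writing straight into a texture buffer.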

Speed test between the C++ version and the Java version, using the attached parameters file at 1024x768, 250000 iterations, zoom E-1550:

3Dickulus C++ version (ARPREC)

(warm up) 1st run 58.88 Sec
2nd run 57.55 Sec

Java version standalone desktop app downloaded from SFT Sourceforge (BigDecimal)

(warm up?) 1st run 767.314 Sec
2nd run 766.096 Sec

I recall someone here saying "the math libs do all the work, so the language doesn't really matter, interpreted or compiled"; I beg to differ ;)

Moving this to GPU shouldn't be too hard, some stuff will work while other stuff just won't fly but in the end a LOT of stuff can be offloaded to the GPU.

Java > C++ = more than 10x faster (about 13x: 766 s vs 57.5 s on the 2nd runs). I'm hoping the GPU will be 10x faster again.

1.09550882553E-1550.txt (3.19 KB)

3dickulus

Global Moderator
Fractal Senior

Posts: 1558

Re: CUDA Y.A.M.Z

« Reply #27 on: March 26, 2014, 05:28:12 PM »

found some interesting things in the source code, removed some bits, optimized a few do-nothing-tests/loops (with the delete key)
I think I've got a stable base to work from so now the plan is to tweak the GUI a little, add a palette twiddler and a settings dialog then start on the GPU/GL part.

This is the deepest I've gone, but after spending a couple of hours zooming and panning I think I can say it runs pretty smoothly :D

attached settings are in SFT format, latest source code is available

1645.jpg (244.44 KB, 1024x768), 6.76383099125E-1645.txt (3.38 KB)

knighty

Fractal Iambus

Posts: 819

Re: CUDA Y.A.M.Z

« Reply #28 on: April 10, 2014, 03:44:05 PM »

Thank you for the c++ port of SFt!

I haven't been able to compile it yet (mostly because OpenGL extensions are not handled the same way under win32), but after reading some of the code I have a (noobish :D) question:
- It looks like most of the allocated memory is never freed. For example, in sftgui.cpp, whenever the user selects New, a new QPixmap is allocated on the heap. Does Qt provide a garbage collector?



3dickulus

Global Moderator
Fractal Senior

Posts: 1558

Re: CUDA Y.A.M.Z

« Reply #29 on: April 10, 2014, 04:59:47 PM »

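Qt doesn't have a garbage collector as such, but its object ownership does most of the work: a QObject created with new (not malloc) and given a parent is deleted automatically when the parent is destroyed, and if needed you can add cleanup code very easily.
EDIT: if you reuse a pixmap by assigning a new one to it, the old one is released first; a pixmap allocated with new, however, stays allocated until something deletes it.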
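Qt's automatic cleanup covers QObjects that have a parent, and value types such as QPixmap when they are assigned by value; a raw `new` that nobody deletes still leaks. A dependency-free C++ sketch of the usual fix for the "new pixmap on every New" pattern described above: hold the object in a `std::unique_ptr` (or simply by value), so that replacing it destroys the old one. `Pixmap` and `Gui` here are hypothetical stand-ins, with `Pixmap` counting its live instances.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an image object; counts live instances.
struct Pixmap {
    static int live;
    int w, h;
    Pixmap(int w_, int h_) : w(w_), h(h_) { ++live; }
    ~Pixmap() { --live; }
    Pixmap(const Pixmap&) = delete;
    Pixmap& operator=(const Pixmap&) = delete;
};
int Pixmap::live = 0;

// Hypothetical GUI class that owns the current image.
struct Gui {
    std::unique_ptr<Pixmap> pixmap;

    // Called when the user selects "New": reassigning the unique_ptr
    // destroys the previous pixmap automatically.
    void on_new(int w, int h) { pixmap = std::make_unique<Pixmap>(w, h); }
};
```

However many times "New" is selected, only one pixmap is alive, and it goes away with the `Gui` object.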

On my system the memory consumption looks like this...

before running...
KiB Mem: 6123220 total, 2733684 used, 3389536 free, 313704 buffers
while running...
KiB Mem: 6123220 total, 2792516 used, 3330704 free, 313780 buffers
after running...
KiB Mem: 6123220 total, 2718576 used, 3404644 free, 313864 buffers

There seems to be about 15,000 KiB more free memory after the run than before it (3404644 vs 3389536 KiB free), probably due to Firefox tossing out some caches or something.

I have made some changes since posting the code: separated the engine from the GUI, added a fractional iteration count, mapped the iteration count to light wavelengths, i.e. 380-780 nm (just for fun), and gave the color map 1024 places.
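A sketch of the colouring steps listed here. The exact formulas used in the port aren't given, so these are assumptions: the fractional (smooth) count uses the common n + 1 - log2(log|z|) formula, evaluated at the first iteration where |z| exceeded the bailout, and a normalised value t in [0, 1] is mapped linearly onto 380-780 nm and onto a 1024-entry colour map. Function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Fractional iteration count: n is the iteration at which |z| first exceeded
// the bailout, abs_z is |z| at that point (must be > 1 for the logs to be valid).
double smooth_count(int n, double abs_z) {
    assert(abs_z > 1.0);
    return n + 1.0 - std::log2(std::log(abs_z));
}

// Maps t in [0, 1] linearly onto the visible range 380-780 nm.
double wavelength_nm(double t) {
    return 380.0 + t * (780.0 - 380.0);
}

// Maps t in [0, 1] onto one of `entries` colour-map places, rounding to nearest.
std::size_t palette_index(double t, std::size_t entries = 1024) {
    assert(entries > 0);
    if (t <= 0.0) return 0;
    if (t >= 1.0) return entries - 1;
    return static_cast<std::size_t>(t * static_cast<double>(entries - 1) + 0.5);
}
```

How t is derived from the smooth count (e.g. dividing by the iteration limit, or taking a cyclic fraction of it) is a free design choice and is not specified in the post.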

I have been fiddling with the code a lot because I want to make sure that it's as stable as possible before trying to move it to the GPU, recently had this error when increasing zoom past E-2023

------------------------------
*** MPROUN: Exponent overflow.
*** mpabrt: execution terminated, error code =69
Segmentation fault
------------------------------

I'm in the process of tracking that down. Please be aware that this port is only a hack'n'chop job to get the engine running. The GL stuff is not required; the intent is to have the GPU write the texture buffer, but it could just as easily map the pixmap directly as the widget contents. A bonus of using QGLWidget is that it exploits hardware multisampling.

MultiSampled.jpg (28.67 KB, 575x433), noMultiSample.jpg (36.19 KB, 559x417)
« Last Edit: April 10, 2014, 05:15:41 PM by 3dickulus, Reason: verbosity »


Topic: CUDA Y.A.M.Z
Description: Yet Another Mandelbrot Zoomer


END OF AN ERA, FRACTALFORUMS.COM IS CONTINUED ON FRACTALFORUMS.ORG

it was a great time but no longer maintainable by c.Kleinhuis contact him for any data retrieval,
thanks and see you perhaps in 10 years again

The All New FractalForums is now in Public Beta Testing! Visit FractalForums.org and check it out!
