Welcome to Fractal Forums

Fractal Software => Programming => Topic started by: 3dickulus on January 16, 2014, 06:04:51 PM




Title: CUDA Y.A.M.Z
Post by: 3dickulus on January 16, 2014, 06:04:51 PM

CudaBrot Source code only. (http://www.digilanti.org/cudabrot/)

Resurrecting an old favorite from my collection of antiques...

This program opens a window and pans and zooms around the Mandelbrot set using the mouse or arrow keys. That's all it does, no fancy stuff (yet). It calculates every pixel every frame, with no optimization other than calculating the x,y values in one pass and feeding them into the Mandelbrot routine in a second pass.

I kept it as simple as possible to make it easy to tinker with Qt, CUDA and Fractals, about 25k of code and the really interesting bits are only a couple of hundred lines.

For the novice, this is an easy bit of code to get your head around. It demonstrates how to use Qt's QGLWidget, Mouse and Keyboard Events, Timers, QtDesigner Forms ( : add some menus if you like : ) , how to set variables on the GPU for CUDA kernels from C++, how to setup and access buffers/textures for writing on the GPU and rendering by OpenGL and  how to compile CUDA code in your C++ project.

For the expert, it's a simple bit of code that can be used to test crunching routines on CUDA GPUs.
Two kernels: one fills an array with x,y data, the other reads that data, does the calculation and stores the result in an RGBA pixel buffer.
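
A CPU-side sketch of that two-pass structure, in plain C++ rather than CUDA so it runs anywhere. The names (`Lookup`, `fill_lookup`, `escape_count`, `render`) and the view parameters are made up for illustration and are not taken from CudaBrot; on the GPU each loop body would be one thread of a kernel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Struct of arrays: one x (real) and one y (imaginary) value per pixel.
struct Lookup {
    std::vector<double> x, y;
};

// Pass 1: compute the complex-plane coordinate of every pixel.
// (cx, cy) is the view centre, half_width the distance from centre to edge.
Lookup fill_lookup(int width, int height, double cx, double cy, double half_width) {
    Lookup l;
    l.x.resize(static_cast<std::size_t>(width) * height);
    l.y.resize(l.x.size());
    const double step = 2.0 * half_width / width;
    for (int py = 0; py < height; ++py) {
        const int offset = py * width;
        for (int px = 0; px < width; ++px) {
            l.x[offset + px] = cx + (px - width / 2) * step;
            l.y[offset + px] = cy + (py - height / 2) * step;
        }
    }
    return l;
}

// Textbook escape-time count; returns 0 for points that never escape.
int escape_count(double r, double i, int maxiter) {
    double p = r, q = i;
    for (int n = 1; n <= maxiter; ++n) {
        if (p * p + q * q > 4.0) return n;
        const double t = p * p - q * q + r;
        q = 2.0 * p * q + i;
        p = t;
    }
    return 0;
}

// Pass 2: read the coordinates, iterate, and store one RGBA value per pixel.
std::vector<std::uint32_t> render(const Lookup& l,
                                  const std::vector<std::uint32_t>& palette,
                                  int maxiter) {
    std::vector<std::uint32_t> pixels(l.x.size());
    for (std::size_t k = 0; k < pixels.size(); ++k) {
        const int n = escape_count(l.x[k], l.y[k], maxiter);
        pixels[k] = palette[static_cast<std::size_t>(n) % palette.size()];
    }
    return pixels;
}
```

Keeping the coordinates in their own pass means the second pass only has to read two doubles per pixel, which is the part worth swapping out when testing a different crunching routine.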

Currently only double precision, but some preliminary tests with double double and quad double are promising. This is intended as a test bed leading to arbitrary precision on the GPU for calculating fractals.

The default start coords are Tick-Tock from dinkydau (http://www.fractalforums.com/images-showcase-(rate-my-fractal)/tick-tock/). If you get this compiled just hit the "+" key on the keypad to zoom in; on a good GPU it only takes a couple of seconds to exhaust the limits of 64-bit double precision at about 35-40 ms/frame. For a benchmark, run it from the command line with "benchmark" as the only option; you should see something like...
Code:
> Device 0: < GeForce GTX 760 >, Compute SM 3.0 detected
Benchmark:
        Max iterations:         1024
        Number of frames:       1000
        Avg msec per frame:     8.979000

It's an interesting experience seeing my crusty old routines crunch in milliseconds images that used to take minutes...

If you can write a doodler like this...

Code:
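  uint   count = 0;
  double p = r;     // real part of z, starting at c = (r, i)
  double q = i;     // imaginary part of z
  double a;

  do  {
    // bail out once |z|^2 exceeds the divergence limit
    if ( p * p + q * q > divergence ) break;

    a  =  p;
    p +=  q;
    p *= (a - q);   // (a + q)(a - q) = a^2 - q^2
    q *= (a + a);   // 2aq
    p +=  r;        // add c
    q +=  i;
  }
  while (count++ < maxiter );

  // 0 = never escaped, otherwise a palette index starting at 2
  return (count >= maxiter) ? 0 : ( count % maxcol + 2 );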

...then you can test it on nVidia GPUs in this little proggie.
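
Wrapped up as a self-contained function, the doodler might look like the sketch below. The signature is a guess: the parameter order follows how `fracfunc` is called further down this thread, while the host-only plain C++ form and the types are assumptions, so treat it as something to paste into a test harness rather than the actual CudaBrot source.

```cpp
// r, i       : the point c = (r, i) being tested
// divergence : squared escape radius, e.g. 4.0
// maxiter    : iteration limit
// maxcol     : number of palette entries to cycle through
// Returns 0 for points that never escaped, otherwise a palette index >= 2.
unsigned int fracfunc(double r, double i, double divergence,
                      unsigned int maxiter, unsigned int maxcol) {
    unsigned int count = 0;
    double p = r;
    double q = i;
    double a;

    do {
        if (p * p + q * q > divergence) break;

        a  = p;
        p += q;
        p *= (a - q);   // (a + q)(a - q) = a^2 - q^2
        q *= (a + a);   // 2aq
        p += r;
        q += i;
    } while (count++ < maxiter);

    return (count >= maxiter) ? 0 : (count % maxcol + 2);
}
```

For example, `fracfunc(0.0, 0.0, 4.0, 1024, 256)` returns 0 because the origin never escapes.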

My goal is to learn how to implement some of the optimizations to be found here (on FractalForums) in high precision GPU code.



Title: Re: CUDA Y.A.M.Z
Post by: Adam Majewski on January 16, 2014, 07:11:40 PM
Could you write something about CUDA installation? This is still the first problem for me. TIA


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on January 16, 2014, 07:47:19 PM
I'm using...

SuSE 12.3
kernel 3.7.10-1.24-desktop
x86_64 GNU/Linux

you?


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on January 16, 2014, 08:07:15 PM
https://developer.nvidia.com/cuda-downloads has install packages and "getting started" guides for win & lin

for linux you install the rpm according to the RPM / DEB Installation Instructions (http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#package-manager-installation)

for windows check the Windows Getting Started Guide (http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-microsoft-windows/index.html)

you can also add these repositories via your linux package manager...
http://developer.download.nvidia.com/compute/cuda/repos/opensuse122/x86_64 ( 12.2 package works for 12.3 too)
http://download.nvidia.com/opensuse/12.3


Title: Re: CUDA Y.A.M.Z
Post by: Adam Majewski on January 18, 2014, 02:06:43 PM
Linux 3.11.0-15-generic  
x86_64
Ubuntu 13.10
NVIDIA-SMI 5.319.60   Driver Version: 319.60 
gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu9)


I have tried :

sudo bash cuda_5.5.22_linux_64.run

result :


Do you accept the previously read EULA? (accept/decline/quit):  accept
You are attempting to install on an unsupported configuration. Do you wish to continue? ((y)es/(n)o) [ default is no ]: y
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 319.37? ((y)es/(n)o/(q)uit): n
Install the CUDA 5.5 Toolkit? ((y)es/(n)o/(q)uit): y
Enter Toolkit Location [ default is /usr/local/cuda-5.5 ]:
Install the CUDA 5.5 Samples? ((y)es/(n)o/(q)uit): y
Enter CUDA Samples Location [ default is /home/a/NVIDIA_CUDA-5.5_Samples ]:
   Unsupported compiler: 4.8.1
Missing recommended library: libXmu.so

Cannot find Toolkit in /usr/local/cuda-5.5

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installation Failed. Using unsupported Compiler.
Samples:  Cannot find Toolkit in /usr/local/cuda-5.5


Logfile is /tmp/cuda_install_10291.log

??
Is my gcc too new?



Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on January 18, 2014, 04:54:55 PM
NVIDIA-SMI 5.319.60   Driver Version: 319.60  

You have the same driver I do, I am using GCC 4.7

Some have had success with Ubuntu, review this posting for instructions

http://askubuntu.com/questions/380609/anyone-has-successfully-installed-cuda-5-5-on-ubuntu-13-10-64-bit

hopefully this will work for you :)

summary: The 319 driver works with the Ubuntu 12.10 deb package and GCC 4.7.2 or less


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on January 18, 2014, 06:31:47 PM
After a little googling, these are the simplest instructions I have found, but I don't have Ubuntu so I can't test them...

From: http://installion.co.uk/ubuntu/saucy/multiverse/n/nvidia-cuda-toolkit/install.html
Quote
Check the multiverse repository is enabled.

Inspect /etc/apt/sources.list using your favourite editor with sudo which will ensure that you have the correct permissions.

~> sudo gedit /etc/apt/sources.list

Ensure that "multiverse" is included.

After any changes you should run this command to update your system.

~> sudo apt-get update

Install nvidia-cuda-toolkit

~> sudo apt-get install nvidia-cuda-toolkit

Which will install nvidia-cuda-toolkit and any other packages on which it depends.

You can always remove nvidia-cuda-toolkit again by following the instructions at this link... http://installion.co.uk/ubuntu/saucy/multiverse/n/nvidia-cuda-toolkit/uninstall.html



Title: Re: CUDA Y.A.M.Z
Post by: Adam Majewski on January 18, 2014, 09:21:08 PM
Thx. It works now :

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Fri_Sep_21_17:28:58_PDT_2012
Cuda compilation tools, release 5.0, V0.2.1221


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on January 18, 2014, 09:31:55 PM
Great :)

just curious, which method worked?


Title: Re: CUDA Y.A.M.Z
Post by: Adam Majewski on January 18, 2014, 10:42:03 PM
Quote
Great :)

just curious, which method worked?


sudo apt-get install nvidia-cuda-toolkit



Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on January 22, 2014, 07:19:09 PM
an interesting pair found in jul-man hybrid


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on January 24, 2014, 05:37:37 PM
accessing an array on gpu like
Code:
index=y*width+x;
n = array[index];

is slower than accessing like
Code:
index=y*width;
n = array[index+x];

I think it's because the compiler optimizes "index" as an incremented register (haven't checked assembler output)

found another small speedup by reducing the iterate function to 2 muls

Code:

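    do
    {
        a  = p;
        p += q;
        p *= (a - q);   // mul 1: (a + q)(a - q) = a^2 - q^2
        q *= (a + a);   // mul 2: 2aq
        p += r;
        q += i;
        // bailout by comparison only, no multiplies;
        // note it fires only when BOTH |p| and |q| exceed 2
        if( ( p > 2.0 || p < -2.0 ) && ( q > 2.0 || q < -2.0 ) )
            return ( count );
    }
    while (count++ < maxiter );

    return 0;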


from 8.9 ms/f to 7.6 ms/f, about 14.6% less time per frame (in my dumb benchmark)

every bit helps :)


Title: Re: CUDA Y.A.M.Z
Post by: JohnVV on January 29, 2014, 10:39:40 PM
Also, one thing to keep in mind is that a lot of distros use the Nouveau driver BY DEFAULT,
and replacing it with the Nvidia .run installer (or akmod-nvidia on rpm-based distros),
or following the SuSE "the repo way" or "the hard way" instructions, is not just a few mouse clicks,


at least on the rpm systems I am used to.
So people might want to reread the Linux forums for their OS and MAKE 100% sure
that the nvidia driver is being used and NOT the nouveau driver.


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on January 29, 2014, 11:37:50 PM
a good point, thanks for mentioning it  :)

I've made the bold assumption that anyone browsing this board (and this thread) would have the inclination to install the drivers and dev stuff before looking for code to play with, but if not, I'm always willing to help get coders compiling.

it also might be worth mentioning that http://developer.download.nvidia.com/compute/cuda/repos/opensuse122/x86_64/cuda-repo-opensuse122-5.5-0.x86_64.rpm is not the dev package itself but installs a repo that contains all the right stuff for SuSE 12.2 and does work with 12.3, in fact it should work with any RH type rpm install system.





Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on January 30, 2014, 05:39:01 PM
w00t!!!

I have managed to get GARPREC compiled and working as a linkable library, simple test results show it's good up to about 10,000 decimal places, wow, that's a lot of resources for one number...

Hacked it into CudaBrot and here are some results. The first render was way off: processing arrays in one fell swoop, it was only doing one iteration, I think. A bit of hacking and voila, we have Mandelbrot in a blink. Oops, it's sideways. Then I tried a couple of zooms on Tick-Tock, one at 1e-14 and another at 1e-91, but as you can see from the image there is a bug somewhere. Not enough bits when calculating the Re/Im lookup array?


Title: Re: CUDA Y.A.M.Z
Post by: ker2x on February 03, 2014, 11:44:00 AM
nvcc fatal   : Unsupported gpu architecture 'compute_30'

Looks like I need some software upgrade  :sad1:


Title: Re: CUDA Y.A.M.Z
Post by: ker2x on February 03, 2014, 11:54:05 AM
Quote
accessing an array on gpu like
Code:
index=y*width+x;
n = array[index];

is slower than accessing like
Code:
index=y*width;
n = array[index+x];

I think it's because the compiler optimizes "index" as an incremented register (haven't checked assembler output)

I don't know where you found this in the source code (did a quick grep) but
Code:
n = array[y*width+x];
should be even faster, to avoid read-after-write latency.
And since it's a fused multiply-add:
Code:
n = array[fma(y,width, x)];

no ?
(sorry, can't try it right now)


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on February 03, 2014, 02:04:04 PM
in the CMakeLists.txt file you can adjust the "GENCODE" option from 30 to 13  :)

in "simplerender"

Code:
    // map from threadIdx/BlockIdx to pixel position
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    // offset into output buffer
    int offset = (y*width);
    int clu;

    clu = fracfunc(lookup.x[offset+x], lookup.y[offset+x], d_divergence, d_maxiter, d_maxcol);

    // set pixel to color
    pixels[offset+x] = clookup[clu];


just speculating on my part:

using fma would calculate the value and store it in the var named offset, then look it up three times
using a register would init the reg once, then for usage it would inc the reg once and read the reg three times

the speedup was very small, a few µs

Struct Of Arrays vs Array Of Structs is better too, but again not by a lot

I haven't tried explicitly specifying fma(y,width, x) and I haven't examined the ptx assembler output to see if the compiler optimizes "x*y+n" as such, I understood that it does from what I've read, but I'll give it a go the next time I'm tinkering with that code.



Title: Re: CUDA Y.A.M.Z
Post by: ker2x on February 03, 2014, 02:48:52 PM
Code:
/usr/local/cuda/include/host_config.h:82:2: error: #error -- unsupported GNU version! gcc 4.5 and up are not supported!
I still need to do some software upgrade anyway  :angry:

In my experience (which is OpenCL, but a GPU is a GPU), using a temporary variable to store an intermediate reusable result is not always a good idea (as it is on a CPU).
Memory access (any memory) is slowwwwwwwwww, while an fma is done in 1 cycle (well, 8 fma per cycle actually, in the ideal case; I don't remember the details).

I found in my code that using one long brute-force formula, instead of splitting it into small parts that could be reused, was faster.
I'm busy learning FPGA and haven't played with GPUs for a long time (that's why I need some software update  ;D ).


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on February 03, 2014, 03:35:20 PM
global mem is slowest, __constant__ is faster, __shared__ is supposed to be almost as fast as registers. I'm just taking it for granted that most operations (individually) are going to be as fast or faster than on the CPU. If your gfx card supports it you can configure __shared__ mem access as 8 bytes instead of the default 4 bytes, but this too raises some timing issues when mixing access to floats and doubles.

I'm very new to this GPU stuff and really just tinkering around, I'm sure I'll be making lots of mistakes  :embarrass:

The Mandelbrot function in this was originally written on a 16 MHz Motorola 68000 CPU and optimized to compare with the assembler language, so I'm sure that it's nowhere near as fast as more recent methods.


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on February 04, 2014, 10:13:57 PM
Quote
Code:
/usr/local/cuda/include/host_config.h:82:2: error: #error -- unsupported GNU version! gcc 4.5 and up are not supported!
I still need to do some software upgrade anyway  :angry:

gcc 4.7 works with CUDA 5.5 :)

FPGA looks really interesting, soon harddrives will be obsolete ?


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on February 11, 2014, 12:05:06 AM
Quote
should be even faster, to avoid read-after-write latency.
And since it's a fused multiply-add:
Code:
n = array[fma(y,width, x)];

no ?
(sorry, can't try it right now)


finally found some time to test this.

when I try
Code:
n = array[fma(y,width, x)]
I get
Code:
error: expression must have integral or enum type


using...
Code:
index=y*width;
n = array[index+x];

Benchmark:
        Number of frames:        1000
        Avg msec per frame:     2.307000

using...
Code:
index=fma(y,width,x);
n = array[index];

Benchmark:
        Number of frames:        1000
        Avg msec per frame:     2.315000

Based on this result it seems that in this particular case fma(x,y,z) is not faster than x*y ... +z: fma took 0.008 ms longer per frame (about 0.008 seconds over the 1000-frame run), really small but still...
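
The compile error is expected: fma is a floating-point function, so its result has to be converted back to an integer before it can be used as a subscript. A host-side C++ sketch of the three forms discussed here (the helper names are illustrative only, and this says nothing about which one is fastest on a given GPU):

```cpp
#include <cmath>

// Plain integer multiply-add; the compiler is free to pick the instruction.
int index_plain(int x, int y, int width) {
    return y * width + x;
}

// Row offset computed once, column added at the point of use.
int index_row_offset(int x, int y, int width) {
    const int offset = y * width;
    return offset + x;
}

// std::fma works on floating point: the ints are converted to double
// and the result must be cast back before it can index an array.
int index_fma(int x, int y, int width) {
    return static_cast<int>(std::fma(static_cast<double>(y),
                                     static_cast<double>(width),
                                     static_cast<double>(x)));
}
```

All three return the same index for image-sized inputs; the fma version just takes a detour through double, which is a plausible reason it was no faster in the benchmark.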


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on February 13, 2014, 09:51:43 PM
using GPU ARPREC (http://www.digilanti.org/cudabrot/garprec_1.2.1.zip) vs CPU ARPREC

Precision = 1000 digits
Iterations = 50000

CenterX=-1.6285250343823361883838262548545045410348661236584141601769446480597742613978652866884534147015438695866330324908358967896639954103891124710858983400930548513556561043206532203421245449026144184088972414350968083884667444419967905346296122369829125808715959589508004033683543856979653688355703357362596797912790215729307519005701670693821446556437233617104789888651797464159254445779960831204398799963531144151826157979487498653952347762098181463615781771842676200742084357646694586924130378944176232745311158540371210889162520250043231951

CenterY=0.0006786726672986807036534783258711032797071726013030485649555004108079612273532854099054877878854548091617523457504272839323659938218011891095623421488662732771512925574351926524067445041101561085174335116653613367712657068886759881133817710332110609908168838538010271006938137369081253790287372868878042144253940125139876166256963471716935289610917988374315494511164693787453719396359779881539576510119495792820935158658401193999362816789422637990442661756991633307202607735211837364599264587661238085446201036331218816037486284880927809

-Z=(distance from center to edge of rendered area) = 2.420740E-530

CPU = 1310.34 sec
GPU = 213.925 sec

No optimizations, estimations or perturbation, just calculate every pixel until we get a value using standard M set calculation.  Xn+1 = (Xn * Xn) + X0


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on February 20, 2014, 11:54:26 AM
Something of a milestone for me, using my old code to zoom this deep. So now that I have a gui for low bit (double) zoom render on GPU and garprec for hi precision render on GPU I am going to fiddle around with the SuperFractalThing maths and may try to build a garprec complex data type to use with those formulae.


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on March 15, 2014, 11:07:46 AM
Another installment in the YAMZee file....
I've managed to cobble together a working  ??? Qt C++ clone ??? of SuperFractalThing ???

Recap:
1: Really fast double precision GPU Mandel zoomer
2: GpuARbitraryPRECision Mandel zoomer
3: SFT Qt C++ (ported from java source uses ARPREC lib)

Now all I have to do is mash them together into a GPU friendly MandelZoomer  :D

anyone interested in the SFT C++ code just drop me a note, GCC, CMake, ARPREC, Linux required

Disclaimer: highly experimental miscellaneous hackings best described as "brutish butchery"


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on March 19, 2014, 02:50:38 AM
Well, after a lot of crashes and weird looking fractal rejects it is starting to stabilize, no more lost 25MB chunks, zoom/pan not choking and seems to be ok with minor rendering tasks, no big zooms yet but now I have some code to play with :) a C++ version of SFT :) my humblest thanks and appreciation to K.I. Martin for the SuperFractalThing.

As always, source code is available. (http://www.digilanti.org/cudabrot/SFTC.zip) It's a little rough, may crash, and probably won't compile out-of-the-box but it does work on my machine :D input is always appreciated, I have some ideas about where to apply GPU code but need to get things more stable ie: checks and balances.



Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on March 23, 2014, 05:17:24 AM
So...

I have the SFT engine ported from Java to C++, shoehorned into a Qt GUI and functional enough to get some deep images. There are still a few blobs and other issues, but it is basically a clone of the Java version. The translation to equivalent Qt GNU C++ was a breeze (it still needs fine tuning); the thread spawner was probably the hardest part. I think the 2DIndexBuffer class and gnu-threading can be replaced by QImage and QThread stuff. It currently renders to 2DIndexBuffer, then converts that into a pixmap that gets set as a GL texture. I'm doing it this way so that moving code to the GPU will be easier, as the GPU has access to texture buffers and this sets up all the GL code I need.

Speed test between the C++ version and the Java version using the attached parameters file @ 1024x768 250000 iterations zoom e-1550

3Dickulus C++ version (ARPREC)

 (warm up)   1st run   58.88 Sec
             2nd run   57.55 Sec

Java version standalone desktop app downloaded from SFT Sourceforge (http://sourceforge.net/projects/suprfractalthng/)   (BigDecimal)

 (warm up?)  1st run  767.314 Sec
             2nd run  766.096 Sec


I recall someone here saying "the math libs do all the work so the language doesn't really matter, interpreted or compiled", I beg to differ ;)

Moving this to GPU shouldn't be too hard, some stuff will work while other stuff just won't fly but in the end a LOT of stuff can be offloaded to the GPU.

Java -> C++ = about 13x faster (767 sec vs 58 sec)        I'm hoping the GPU will be another 10x faster :)



Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on March 26, 2014, 05:28:12 PM
found some interesting things in the source code, removed some bits, optimized a few do-nothing-tests/loops (with the delete key)
I think I've got a stable base to work from so now the plan is to tweak the GUI a little, add a palette twiddler and a settings dialog then start on the GPU/GL part.

this is the deepest I've gone but after spending a couple of hours zooming and panning I think I can say it runs pretty smoothly  :D

attached settings are in SFT format, latest source code is available (http://www.digilanti.org/cudabrot/SFTC.zip)


Title: Re: CUDA Y.A.M.Z
Post by: knighty on April 10, 2014, 03:44:05 PM
Thank you for the C++ port of SFT!  :)
I haven't been able to compile it yet, mostly because OpenGL extensions are not handled the same way under win32, but after reading some of the code I have a (noobish  ;D) question:
- It looks like most of the allocated memory is not freed anywhere. For example, in sftgui.cpp, whenever the user selects new, a new QPixmap is allocated on the heap. Does Qt provide a garbage collector?


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on April 10, 2014, 04:59:47 PM
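Qt doesn't have a garbage collector as such, but QObject parent/child ownership does much the same job: any QObject created with a parent (new, not malloc) is deleted when the parent is destroyed, and if needed you can add cleanup code very easily.
EDIT: if you assign a new pixmap to an existing QPixmap the old pixel data is released first (QPixmap is an implicitly shared value class rather than a QObject).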

On my system the memory consumption looks like this...

before running...
KiB Mem:   6123220 total,  2733684 used,  3389536 free,   313704 buffers
while running...
KiB Mem:   6123220 total,  2792516 used,  3330704 free,   313780 buffers
after running...
KiB Mem:   6123220 total,  2718576 used,  3404644 free,   313864 buffers

there seems to be a few k extra after a run probably due to tossing out some firefox caches or something

I have made some changes since posting the code: separated Engine from GUI, added fractional iteration count, mapped iteration count to visible-light wavelengths, i.e. 380-780 nm (just for fun), and the color map now has 1024 places.

I have been fiddling with the code a lot because I want to make sure that it's as stable as possible before trying to move it to the GPU, recently had this error when increasing zoom past E-2023

------------------------------
*** MPROUN: Exponent overflow.
*** mpabrt: execution terminated, error code =69
Segmentation fault
------------------------------

I'm in the process of tracking that down. Please be aware that this port is only a hack'n'chop job to get the engine running. The GL stuff is not required, but the intent is to have the GPU write the texture buffer; it can just as easily map the pixmap as widget contents directly. A bonus from using QGLWidget is that it exploits hardware multisampling :)




Title: Re: CUDA Y.A.M.Z
Post by: knighty on April 10, 2014, 07:03:45 PM
Thanks a lot.

Quote
recently had this error when increasing zoom past E-2023

------------------------------
*** MPROUN: Exponent overflow.
*** mpabrt: execution terminated, error code =69
Segmentation fault
------------------------------
Maybe because the number of digits is set to 2048 in main.cpp


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on April 10, 2014, 07:15:43 PM
anticipated that and set it to 3192 @ around E-1800

I suspect it's in FillInCubic()


Title: Re: CUDA Y.A.M.Z
Post by: knighty on April 10, 2014, 09:52:29 PM
Ok! ^-^
Finally, it compiled successfully using Qt5.2. I used QOpenGLFunctions for OpenGL extensions.
Nice application!


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on April 10, 2014, 11:46:15 PM
cool :)

re: FillInCubic()

extra_exponent normally ticks down to 0 and all is well; the problem is when converting from an AP number to a DOUBLE number. ARPREC's mp_real::mpmdc() function is supposed to take care of this...

 /// from mp_real to double source code
  /**
   * This procedure takes the mp_real A, and splits it into
   * a double, b, and a exponent, n.
   *
   * On exit, the following should be roughly true:
   *
   *       a ==(roughly) b*2^n
   */
/// from mp_real to double source code

for some reason "n" = -21474836481 when extra_exponent reaches around 101 ??? before this "n" fluctuates between 0 and -720

I also had used mp_real::n_digits instead of mp_real::n_precwords in the mpmdc() but that didn't seem to have any effect, should be "1" erroneous values ignored?



Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on April 12, 2014, 08:16:57 AM
Quote
cool :)

re: FillInCubic()

extra_exponent normally ticks down to 0 and all is well; the problem is when converting from an AP number to a DOUBLE number. ARPREC's mp_real::mpmdc() function is supposed to take care of this...


... it was the Repeater test in CalculationManager class   :embarrass:


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on April 16, 2014, 04:24:45 PM
I see Pauldebrot has come up with something re: glitches (http://www.fractalforums.com/announcements-and-news/pertubation-theory-glitches-improvement/) that looks really interesting. In the meantime, here are some of my results, with partial glitch solving.
(gaudy colors accentuate blobs)



Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on May 14, 2014, 09:02:04 PM
I think I might be on to something...

This is DD's "Flake" location rendered in 309.6 sec with my C++ port of SFT, with the reference point picked automagically. I have been working on getting the engine as tight as I can before I start playing with a CUDA version; that might be a while, and it may be abandoned for a whack at porting kallesfraktaler2 to Qt/linux :) I haven't posted the latest version of this code yet, but if you want to play with it let me know and I'll make some time to zip it up.


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on June 09, 2014, 12:57:04 AM
In porting SFT Java to C++ I'm reasonably sure that it's good because the C++ version reproduces all of the same glitches that the Java version shows. But before trying to implement glitch detection and correction I thought I would try a slightly different approach, that is, glitch avoidance through improved accuracy...

This code computes the magnitude of a complex number while avoiding overflow, returning a value between 0 and inf. I have used this routine to replace (x*x)+(y*y) in the Details class and the Approximation class. (In theory this shouldn't work??? because it returns sqrt(x*x+y*y) rather than x*x+y*y, but I'll let the result speak for itself.)

Code:
/// compute the magnitude of a complex number.
double Approximation::cMag(double re, double im)
{
    double r;

    re = fabs(re);
    im = fabs(im);

    // scale by the larger component so the squared ratio cannot overflow
    if (re > im) {
        r = im/re;
        return re*sqrt(1.0+r*r);
    }

    // im >= re here, so im == 0 means both are zero: avoid 0/0
    if (im == 0.0)
        return 0.0;

    r = re/im;
    return im*sqrt(1.0+r*r);
}
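
To see what the scaling buys, here is the same routine as a free function next to the naive formula. This is a sketch for experimenting outside the SFT classes; `naive_mag`, `cmag` and `escaped` are my names. The last helper shows one way around the sqrt question: a bailout limit that was written for the squared magnitude has to be square-rooted when it is compared against a magnitude.

```cpp
#include <cmath>

// Naive magnitude: the squares overflow to inf for large inputs.
double naive_mag(double re, double im) {
    return std::sqrt(re * re + im * im);
}

// Scaled magnitude, same logic as Approximation::cMag above.
double cmag(double re, double im) {
    re = std::fabs(re);
    im = std::fabs(im);
    if (re > im) {
        const double r = im / re;
        return re * std::sqrt(1.0 + r * r);
    }
    if (im == 0.0) return 0.0;
    const double r = re / im;
    return im * std::sqrt(1.0 + r * r);
}

// Bailout test: limit_squared is the usual x*x + y*y limit (e.g. 4.0).
bool escaped(double re, double im, double limit_squared) {
    return cmag(re, im) > std::sqrt(limit_squared);
}
```

With both components at 1e200 the naive version returns inf while `cmag` stays finite; the standard library's `std::hypot` does the same job if you would rather not carry your own routine.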

I also replaced (x*x)-(y*y) with (x-y)*(x+y)  because...
"The expression x*x - y*y is more accurate when rewritten as (x - y)*(x + y) because a catastrophic cancellation is replaced with a benign one."

These two changes allow rendering of "Flake" (above) and "Polished Emerald" (below) much more accurately when compared with the original SFT code. I wish I had more time and brains to dedicate to this but so far so good :)
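
The (x-y)*(x+y) rewrite can be checked in plain doubles with values chosen so the exact answer is known. The inputs and function names below are my own example, not from SFT.

```cpp
// x*x - y*y: both squares are rounded before the subtraction, so their
// rounding errors survive in the (much smaller) difference.
double diff_squares_naive(double x, double y) {
    return x * x - y * y;
}

// (x - y)*(x + y): the subtraction is done on the inputs themselves,
// where it is exact or nearly so, and only then multiplied up.
double diff_squares_factored(double x, double y) {
    return (x - y) * (x + y);
}
```

With x = 100000001 and y = 100000000 the exact difference of squares is 200000001. The factored form returns exactly that, while the naive form can be off by one because x*x is not representable in a double.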

Edit: yes I know it's not perfect but it's better than the blob that the java code rendered :)


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on August 19, 2014, 02:12:58 PM
for the latest installment in the Y.A.M.Z. file I would like to thank knighty for the win version of arprec lib (http://crd.lbl.gov/~dhbailey/mpdist/arprec-2.2.17.tar.gz) and help with getting the QtCreator project to compile and run on windows  :beer: :beer: :beer:

finally have Pauldebrot's glitch thing working, well, my version of it at least. :D

SFTC63 (http://www.digilanti.org/cudabrot/) yeah that's right, it took me 63 versions to go from the original java to Qt C++ and add auto glitch detection, still needs a few more incarnations but I'm very happy with it so far.


Title: Re: CUDA Y.A.M.Z
Post by: knighty on August 19, 2014, 09:17:29 PM
You are welcome.  :D
And thank you for the new version. I think it would be nice to provide a compiled version for those who don't want to install Qt and compile the project by themselves.


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on August 20, 2014, 12:53:32 AM
@knighty: I do understand that not everyone can just throw together a dev box to compile and run this code, so I have posted your win bin version on my website. (http://www.digilanti.org/cudabrot/) I don't usually like to distribute Windows binaries created with Qt because of the required DLLs: mine may not be the same as yours, so they must be included in the zip. I would much rather encourage everyone to figure out how to compile the Qt project so each has a version that is optimized for their machine/arch/OS. The thing is, the code changes soooo fast that there are tweaks (yes, already) that are not in the win bin version. For example, I adjusted the glitch detection range (and will probably add a menu/option for user adjustment) because, as a value bound to the screen size, it gets sloppy with small images (256x192) or large images (> 2048x1536) :)

Another reason for preferring a source only dist is that it encourages hacking  :dink:

cheers mate!  :beer:


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on August 20, 2014, 04:48:11 AM
for anyone hacking the source...

1.  in file calculationmanager.cpp line 309 the count vs screensize test is working nicely like this...
Code:
            if(cnt > (mBuffer->GetWidth() >> 8)+1 )
            {
                newXOffset=x;
                newYOffset=y;
                break;
            }

at a screen width of 256 the threshold is 2, so this triggers once more than 2 adjacent pixels were flagged in the previous pass as failed; any fewer and we get no change.
edit: better but still not quite right. Logic: a blob on a 512 screen would be twice as wide as on a 256 screen...
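
For reference, here is how that threshold scales with the buffer width. `glitch_run_threshold` is a hypothetical helper that just isolates the expression from the snippet above.

```cpp
// Number of adjacent flagged pixels that must be exceeded before a new
// reference point is chosen: one per 256 pixels of width, plus one.
int glitch_run_threshold(int buffer_width) {
    return (buffer_width >> 8) + 1;
}
```

This gives 2 at a width of 256, 3 at 512, 5 at 1024 and 9 at 2048, so the threshold grows a little more slowly than the blob width itself.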

2. in file approximation.cpp line 409 the glitch flag test is working nicely @ < 1.0E-12 and  > 1.0E6

I also changed it so that the next search for a reference starts looking where the last search found one... the possibilities are endless :)



Title: Re: CUDA Y.A.M.Z
Post by: knighty on August 20, 2014, 09:43:30 PM
Ooops! I forgot to include "imageformats" and "platforms" directories to the package. I'll send you the updated package ASAP.


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on August 20, 2014, 10:18:53 PM
Quote
Ooops! I forgot to include "imageformats" and "platforms" directories to the package. I'll send you the updated package ASAP.

...and that's why I like to encourage windows users to install Qt+MinGW, with that you can just load the project, compile it and make any adjustments you want to the GUI or source code.

a real bonus is the tool kit that comes with Qt, Creator is a good IDE (handles other languages too) and Designer is really awesome for building GUIs, hardly have to type anything :)

the "imageformats" and "platforms" that I have installed on the Win7 box probably won't work so if you are up for putting together the Windows package I'll host it on my server.

I am currently fiddling with glitch detection. Peanuts, large antlers and kidneys are easy, but tiny antlers and single pixels are a bit more involved. v63 (currently on the web site) is pretty good but needs improvement. Sometimes I find a single pixel and re-render, only to find it's still there. I think it's because the x,y coords of a pixel are too coarse to nail the reference, which may be at a sub-pixel level when working with lower screen resolutions, so maybe I need to try a couple of locations within a pixel to get the right ref. hack hack hack...


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on August 21, 2014, 11:52:59 AM
The latest is SFTC64 (http://www.digilanti.org/cudabrot/) with much improved glitch detection/correction... hack hack hack  :dink:

edit: I just finished running the comparison tests and v64 is a little faster than v63 :D

I think in the CUDA version I will be able to use double double type in place of long double as GPU only does 64 bit math and not 80 bit, but that just means it will have more range before going to higher precision functions, fortunately GArPrec has DD and QD types for running on the GPU.
GCC also has libquadmath.a that has __float128 and __complex128 types.

some values from quadmath.h...
Code:
#define FLT128_MAX 1.18973149535723176508575932662800702e4932Q
#define FLT128_MIN 3.36210314311209350626267781732175260e-4932Q
#define FLT128_EPSILON 1.92592994438723585305597794258492732e-34Q
#define FLT128_DENORM_MIN 6.475175119438025110924438958227646552e-4966Q
#define FLT128_MANT_DIG 113
#define FLT128_MIN_EXP (-16381)
#define FLT128_MAX_EXP 16384
#define FLT128_DIG 33
#define FLT128_MIN_10_EXP (-4931)
#define FLT128_MAX_10_EXP 4932
...adds the "Q" designation for using this type of number.
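
For the double double idea, a minimal sketch of how two 64 bit doubles can carry roughly twice the mantissa. This is not GArPrec's or QD's actual code: the dd struct and the two_sum / dd_add names are made up for illustration, and it assumes the compiler is not using fast-math style options that reorder floating point operations.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical double-double value: the number represented is hi + lo,
// with |lo| much smaller than |hi|.
struct dd {
    double hi;
    double lo;
};

// Knuth's error-free addition: the returned hi + lo equals a + b exactly
// (for finite inputs that do not overflow).
inline dd two_sum(double a, double b) {
    assert(std::isfinite(a) && std::isfinite(b));
    double s  = a + b;
    double bb = s - a;
    double e  = (a - (s - bb)) + (b - bb);
    return {s, e};
}

// Simple ("sloppy") double-double addition.
inline dd dd_add(dd a, dd b) {
    dd s = two_sum(a.hi, b.hi);
    double lo = s.lo + a.lo + b.lo;
    return two_sum(s.hi, lo);   // renormalise so hi carries the leading bits
}
```

Adding 1.0E-30 to 1.0 this way keeps the small term in lo, where a plain double addition would drop it.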


Title: Re: CUDA Y.A.M.Z
Post by: knighty on August 23, 2014, 04:06:50 PM
Quote
(...) with that you can just load the project, compile it and make any adjustments you want to the GUI or source code.
Unfortunately, that's not that obvious for most people.  :-\

Quote
I am currently fiddling with glitch detection, peanuts, large antlers and kidneys are easy but tiny antlers and single pixels are a bit more involved, v63 (currently on web site) is pretty good but needs improvement. Sometimes I find a single pixel and re-render, only to find it's still there. I think it's because the x,y coords of a pixel are too coarse to nail the reference that may be at a sub-pixel level when working with lower screen resolutions, maybe need to try a couple of locations within a pixel to get the right ref. hack hack hack...
Or use the "find best reference point" feature again?


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on August 23, 2014, 11:31:08 PM
I am using the rendered data by flagging pixels with a value that would never be rendered and scanning the buffer for that value, which means I can accumulate results. For single pixels I think a 3rd routine specifically for that will be required. Currently I'm using a straight line method for finding large blobs and a spiral method for finding smaller shapes, and I am thinking of a more brute force method to nail down those rogue pixels, or cheating and blending single pixels with the surrounding 8.
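
The "cheat" of blending single pixels with the surrounding 8 could look something like this. It is only a sketch, not SFTC's code: kGlitchFlag and blendSinglePixels are hypothetical names, the buffer is assumed to hold one int iteration count per pixel, and -1 is assumed to be a value the renderer never writes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flag value: an iteration count the renderer never produces.
constexpr int kGlitchFlag = -1;

// Replace isolated flagged pixels (no flagged neighbours) with the average
// of the surrounding pixels. Returns how many pixels were patched.
inline std::size_t blendSinglePixels(std::vector<int>& iters, int width, int height) {
    assert(width > 0 && height > 0);
    assert(iters.size() == static_cast<std::size_t>(width) * height);
    const std::vector<int> src = iters;  // read from a copy, write to iters
    std::size_t patched = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (src[y * width + x] != kGlitchFlag) continue;
            long sum = 0;
            int good = 0;
            bool isolated = true;
            for (int dy = -1; dy <= 1 && isolated; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dx == 0 && dy == 0) continue;
                    const int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    const int v = src[ny * width + nx];
                    if (v == kGlitchFlag) { isolated = false; break; }
                    sum += v;
                    ++good;
                }
            }
            if (isolated && good > 0) {
                iters[y * width + x] = static_cast<int>(sum / good);
                ++patched;
            }
        }
    }
    return patched;
}
```

Flagged pixels that touch another flagged pixel are left alone, so blobs still go through the reference-point passes.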


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on August 24, 2014, 06:13:20 AM
@knighty

your version of the glitch detection/correction test http://www.fractalforums.com/announcements-and-news/superfractalthing-arbitrary-precision-mandelbrot-set-rendering-in-java/msg64527/#msg64527
seems to be the winner :)

Code:
#define EPSILON 1.0E-9
#define BIGNUM (1.0/EPSILON)

           if((max(fabs(c-dx),fabs(ci-dxi))/max(fabs(dx),fabs(dxi)) < EPSILON) ||
     (max(fabs(c+dx),fabs(ci+dxi))/max(fabs(dx),fabs(dxi)) > BIGNUM))

v0.65 coming soon...

EDIT:

the above works well but failed on some so I've modified it a bit, not sure if it was a typo in the email you sent me but this seems to work a little better...
Code:
#define EPSILON 1.0E-15
#define BIGNUM (1.0/EPSILON)

           if(( min (fabs(c-dx),fabs(ci-dxi))/ max (fabs(dx),fabs(dxi)) < EPSILON) ||
    ( max (fabs(c+dx),fabs(ci+dxi))/ min (fabs(dx),fabs(dxi)) > BIGNUM))

my value of 1.0E-9 was ok for the way I had it before (v64) but not this way

still testing...


Title: Re: CUDA Y.A.M.Z
Post by: knighty on August 24, 2014, 11:40:57 PM
I did some tests and it seems that:
Code:
if(modulus<1.E-10)
Gives the same results.

It also seems like SFTC is neglecting small islands of marked pixels.


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on August 25, 2014, 12:11:16 AM
I'll have to check that, so to be clear, check modulus var against EPSILON instead of ...
Code:
if((max(fabs(c-dx),fabs(ci-dxi))/max(fabs(dx),fabs(dxi)) < EPSILON) || 
      (max(fabs(c+dx),fabs(ci+dxi))/max(fabs(dx),fabs(dxi)) > BIGNUM))
that would speed things up a bit, maybe making room for tighter test length for a given area.

cheers  :beer:


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on August 25, 2014, 01:59:07 AM
Quote
I did some tests and it seems that:
Code:
if(modulus<1.E-10)
Gives the same results.

It also seems like SFTC is neglecting small islands of marked pixels.
:D testing like that is how I found Pickover's stalks :D

I have set EPSILON at 1.0E-15 with the current test and it either catches more or makes fewer by iterating a bit more, v0.65 posted on my website has better results than v0.64,
I do think there must be a way, after approximating some iterations and before iterating the final amount, to predict for trending toward EPSILON or BIGNUM,
I will be trying the above this evening and I'll post any results.

The islands missed might be due to testing for areas larger than 20 pixels and then immediately dropping down to 1 pixel and trying to find the highest iteration count for a new reference, maybe decrementing the testLength by n pixels each pass until testLength reaches 1 pixel would help. I also found that checking for 1 pixel and making that a reference may or may not fix that pixel.

PS frequency, amplitude, ramping, offset RGB palette dialog is nearly done  :dink:


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on August 25, 2014, 07:09:43 AM
@knighty it does work but not as well as your other longer test, I haven't tested extensively but on the files in the glitches folder it leaves more unresolved and not flagged.
I also note that many of the files in the glitches folder just need a higher iteration_limit and most spots get resolved, it's not perfect but working quite well  :)

here's a preview of the palette widget I resurrected from my antique fractal prog, the hidden tab is the same but for offset control, the visible tab shows the frequency, amplitude and ramping (rotation angle) slider controls for RGB spreads, the mouse is used to draw lines or portions of lines to alter parts of the sine wave.
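
A rough sketch of what sliders like these could drive. The formula is an assumption (each channel as offset + amplitude * sin(frequency * t + phase)), not taken from FracForm or SFTC, and Wave, channel and makePalette are names invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical per-channel controls, loosely matching the sliders described.
struct Wave {
    double frequency;  // cycles across the whole palette
    double amplitude;  // 0..1
    double phase;      // "ramping", in radians
    double offset;     // 0..1, centre of the wave
};

// One channel value in 0..255 at position t in [0,1).
inline std::uint8_t channel(const Wave& w, double t) {
    const double twoPi = 6.283185307179586;
    double v = w.offset + w.amplitude * std::sin(twoPi * w.frequency * t + w.phase);
    v = std::min(1.0, std::max(0.0, v));   // clamp before scaling to 8 bits
    return static_cast<std::uint8_t>(std::lround(v * 255.0));
}

// Build a palette of packed 0xAARRGGBB entries with opaque alpha.
inline std::vector<std::uint32_t> makePalette(int size, Wave r, Wave g, Wave b) {
    assert(size > 0);
    std::vector<std::uint32_t> pal(size);
    for (int i = 0; i < size; ++i) {
        const double t = static_cast<double>(i) / size;
        pal[i] = 0xFF000000u
               | (std::uint32_t(channel(r, t)) << 16)
               | (std::uint32_t(channel(g, t)) << 8)
               |  std::uint32_t(channel(b, t));
    }
    return pal;
}
```

Drawing over part of the wave with the mouse would then just overwrite entries of pal.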


Title: Re: CUDA Y.A.M.Z
Post by: knighty on August 26, 2014, 11:50:15 PM
Wow! That palette editor looks good.

Quote
@knighty it does work but not as well as your other longer test, I haven't tested extensively but on the files in the glitches folder it leaves more unresolved and not flagged.
I think that's just a matter of epsilon.

Reason why I suggested the simple test: c is constant inside the loop. As its name suggests, it is the 'c' value of the current pixel. So the (long) test consists of checking when d is close to 0 (so we go past BIGNUM) or close to c. But whenever d is near 0, its next iterate will necessarily be near c. So, in principle, it's not necessary to do both tests... just test when d is close to 0. :)

IIRC, in Pauldelbrot's test formula:
|zn+dn|/|zn|;
zn is the reference and dn is the delta.

This is a little bit different from your test. Some of the glitches, the most difficult to spot because they give believable results, happen very soon in the iteration process.
Also, some glitches are due to the series approximation.
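
Pauldelbrot's ratio could be coded along these lines. A sketch only: zn and dn are passed as std::complex<double>, looksGlitched is a made-up name, and the 1.0E-6 tolerance is an assumption, not a value given in this thread.

```cpp
#include <cassert>
#include <complex>

// Hypothetical tolerance; the thread does not give a value for this form of the test.
constexpr double kGlitchTolerance = 1.0e-6;

// zn: reference orbit value at iteration n, dn: this pixel's delta at iteration n.
// Flags the pixel when |zn + dn| / |zn| drops below the tolerance, written with
// squared magnitudes so no square root or division is needed.
inline bool looksGlitched(std::complex<double> zn, std::complex<double> dn,
                          double tolerance = kGlitchTolerance) {
    assert(tolerance > 0.0);
    return std::norm(zn + dn) < tolerance * tolerance * std::norm(zn);
}
```

The test fires when the delta nearly cancels the reference, which is where the low bits of the result stop meaning anything.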

Quote
I also note that many of the files in the glitches folder just need a higher iteration_limit and most spots get resolved, it's not perfect but working quite well  :)
Yes. Is it possible to detect automatically such cases?


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on August 27, 2014, 05:37:21 AM
The palette editor is from my original fractal prog, FracForm, written on my Commodore Amiga using SAS-C compiler, the last incarnation was 2001, I really like it because you can get amazing palettes with a few clicks and twiddles, originally 256 colors so I'm trying to adapt it to 32 bit color.

 
Quote
Yes. Is it possible to detect automatically such cases?
:gum: :yes: :laugh: this is where my math skills prove I'm a high school drop-out :laugh: :yes: :gum:

test for insufficient iteration_limit ? I'm not sure, the only test I can think of is if exceeding then bump it up, but that could go on for ever...

I think the best I've heard is from claude..

Quote
What I do in mightymandel[1] is to compute the reference in parallel with the pixel iterations, interleaved in blocks - so I compute the next N reference values, then step all the pixels N iterations, and repeat.  This way I don't need to guess how many iterations of the reference might be needed in advance, and I don't have to store the whole reference orbit in memory at once.

[1] https://gitorious.org/maximus/mightymandel (specifically https://gitorious.org/maximus/mightymandel/source/HEAD:src/fpxx_step.c#L106 )

seems a most efficient method.
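
The block-interleaved scheme claude describes might look roughly like this. A minimal sketch only: it is not mightymandel's code, it uses plain double for the reference (a real deep zoom needs arbitrary precision there), it has no glitch test, and Pixel, renderInterleaved and blockSize are names invented for the example.

```cpp
#include <cassert>
#include <complex>
#include <vector>

using cplx = std::complex<double>;  // stand-in; a real reference needs arbitrary precision

struct Pixel {
    cplx d0;            // offset of this pixel's c from the reference c
    cplx d{0.0, 0.0};   // current delta
    int  iters = 0;
    bool escaped = false;
};

// Iterate in blocks: extend the reference by blockSize steps, then advance
// every live pixel through those same steps. Only one block is stored.
inline void renderInterleaved(cplx refC, std::vector<Pixel>& pixels,
                              int blockSize, int iterationLimit) {
    assert(blockSize > 0);
    cplx z{0.0, 0.0};                          // reference value at the block start
    std::vector<cplx> block(blockSize);
    for (int done = 0; done < iterationLimit; done += blockSize) {
        for (int i = 0; i < blockSize; ++i) {  // next N reference values
            block[i] = z;
            z = z * z + refC;
        }
        bool anyAlive = false;
        for (Pixel& p : pixels) {
            for (int i = 0; i < blockSize && !p.escaped && p.iters < iterationLimit; ++i) {
                // delta step: d(n+1) = 2*z(n)*d(n) + d(n)^2 + d0
                p.d = 2.0 * block[i] * p.d + p.d * p.d + p.d0;
                ++p.iters;
                // full value after this step is reference(n+1) + delta(n+1)
                const cplx zNext = block[i] * block[i] + refC;
                if (std::norm(zNext + p.d) > 4.0) p.escaped = true;
            }
            anyAlive = anyAlive || (!p.escaped && p.iters < iterationLimit);
        }
        if (!anyAlive) break;                  // nothing left to iterate
    }
}
```

The loop stops extending the reference as soon as every pixel has escaped or hit the limit, so there is no need to guess the reference length up front.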

this pic is a test using a combination, basically testing all variable results inside the loop to be within the range EPSILON to BIGNUM, so checking modulus and using the long test, I know it seems redundant but I think that using just one test will still miss things as each test will have a slightly different dynamic under different conditions.


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on August 28, 2014, 09:08:21 PM
The palette editor is in v0.66 (http://www.digilanti.org/cudabrot/), not loading and saving yet but seems to work.

My storage scheme for palette data is this: use an image where the width is the number of colors in the palette and concatenate additional palettes as lines of the image, so, in theory, you can use any line in any image as a palette, maybe even transition between each line/palette over n frames...

...just a thought  :dink:
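
To make the row-per-palette idea concrete, a small sketch under assumed names (PaletteImage, mixChannel and paletteColour are not SFTC's). It assumes packed 0xAARRGGBB pixels and a plain linear blend from one row to the next over a fixed number of frames.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical palette image: 'width' colours per row, one palette per row,
// stored as packed 0xAARRGGBB values.
struct PaletteImage {
    int width;
    int rows;
    std::vector<std::uint32_t> pixels;   // rows * width entries
};

// Linear blend of one 8 bit channel, t in [0,1], rounded to nearest.
inline std::uint32_t mixChannel(std::uint32_t a, std::uint32_t b, double t) {
    return static_cast<std::uint32_t>(a + (static_cast<double>(b) - a) * t + 0.5);
}

// Colour for an iteration count, blending palette row 'row' towards the next
// row (wrapping around) as 'frame' runs from 0 to framesPerRow.
inline std::uint32_t paletteColour(const PaletteImage& img, int row, int frame,
                                   int framesPerRow, int iteration) {
    assert(img.width > 0 && img.rows > 0 && framesPerRow > 0);
    assert(row >= 0 && frame >= 0 && iteration >= 0);
    const int next = (row + 1) % img.rows;
    const int col  = iteration % img.width;
    const double t = static_cast<double>(frame % framesPerRow) / framesPerRow;
    const std::uint32_t a = img.pixels[(row % img.rows) * img.width + col];
    const std::uint32_t b = img.pixels[next * img.width + col];
    std::uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        out |= mixChannel((a >> shift) & 0xFFu, (b >> shift) & 0xFFu, t) << shift;
    }
    return out;
}
```

With a two row image of black then white, frame 5 of 10 comes out mid grey.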


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on August 28, 2014, 10:01:53 PM
hmm.. seems to be some differences between the way win and lin handle window modality and GL refresh, sometimes the screen is not updated properly, try hitting Refresh again if it's blank or no change after loading a settings file or palette adjustments... hack hack hack...


Title: Re: CUDA Y.A.M.Z
Post by: knighty on August 28, 2014, 11:29:28 PM
Looks good! downloading right now...
BTW, there is a problem when trying to zoom past 1.E308 or so. It was very difficult to find where things were going wrong. It looks like the extra exponent is not handled correctly in the engine part. I could fix the zoom (and pan) problem by modifying the test here:
Code:
void Details::FillInCubic( mp_real pX, mp_real pY, int aIteration_limit, double aActual_width, int aSize_extra_exponent, double aScreen_offset_x, double aScreen_offset_y )
(...)

                    // The initial size was scaled up.
                    // Now we adjust the ABC coefficients to compensate for the scale.
                    // This will affect the accuracy test, and may allow us to continue
                    do
                    {
                        extra_exponent--;
                        A *= 0.10000000149011612D;
                        (...)
                        csq *= 9.999999747378752E-5D;

                        width *= 0.10000000149011612D;
                    } while (extra_exponent>0 && (csq >=accuracy*accuracy*asq) && csq!=0.0D);
The test was extra_exponent>=0, which introduces an extra factor of 10 in the scale. I don't think it solves everything  :-\. For example what happens if A,B and C are not scaled. Maybe it is necessary to verify this in more depth.
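
The off-by-one can be seen in isolation with a stripped down version of the loop. This is only an illustration: scaleDownSteps is a made-up helper, and the accuracy conditions on csq and asq from the real loop are left out.

```cpp
#include <cassert>

// Counts how many times the scale-down body runs for a given starting
// extra_exponent. 'inclusive' selects the original test (>= 0) or the
// modified one (> 0). The accuracy conditions of the real loop are omitted.
inline int scaleDownSteps(int extra_exponent, bool inclusive) {
    assert(extra_exponent > 0);
    int steps = 0;
    do {
        extra_exponent--;
        ++steps;            // in the engine each pass multiplies width by about 0.1
    } while (inclusive ? extra_exponent >= 0 : extra_exponent > 0);
    return steps;
}
```

Starting from 3, the > 0 test runs the body 3 times and the >= 0 test runs it 4 times, which is the extra factor of 10.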

Feature request(s)    :embarrass:
- Make it update rendering automatically when double-clicking, loading, changing resolution... etc.
- import .kfr files. (SFTC's scale factor is just 4 times the inverse of KF's scale).
 
Some (manually) converted .kfr locations attached (and Dinkydau's last location, which SFTC failed to render).


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on August 29, 2014, 12:39:50 AM
Added Auto Refresh to the options menu ;)

I have pretty much left the internals of the engine alone, a little tinkering while converting over to ArPrec from the java version but since using long double it has been reasonably well behaved.

at the end of glwidget.cpp is where the extra_exponent is calculated before starting the engine. It was originally 1.0E-280. After a little research: double has 52 mantissa bits and an absolute max exponent of 324, and smallest accurately representable number = MaxExponent - mantissaBits = 272, so I think that value should be 1.0E-272, but I had been testing at 308 and it didn't seem to make much difference until zooming past E-2200. I will make the adjustment you suggest re: >= and adjust the extraExp test back to E-280 so it's not so close to the lower limit of double.
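
The limits of double can also be read straight from the standard library instead of worked out by hand. A small sketch, assuming IEEE 754 doubles; DoubleLimits and doubleLimits are names made up for this example. Note that the 52 stored mantissa bits are binary digits (about 16 decimal digits), so full precision holds down to roughly 2.2E-308, and only the denormal range between there and about 4.9E-324 loses bits.

```cpp
#include <cassert>
#include <limits>

// Limits of IEEE 754 binary64 as reported by the standard library.
struct DoubleLimits {
    int    mantissaBits;      // 53, counting the implicit leading bit
    int    minExponent10;     // -307: smallest power of ten that is a normal double
    double smallestNormal;    // about 2.2e-308, still full precision
    double smallestDenormal;  // about 4.9e-324, only 1 bit of precision left
};

inline DoubleLimits doubleLimits() {
    using L = std::numeric_limits<double>;
    static_assert(L::is_iec559, "sketch assumes IEEE 754 doubles");
    assert(L::denorm_min() < L::min());   // denormals sit below the normal range
    return {L::digits, L::min_exponent10, L::min(), L::denorm_min()};
}
```

A threshold like 1.0E-280 sits well inside the normal range, so it leaves a comfortable margin either way.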

Quote
For example what happens if A,B and C are not scaled.
they don't need to be scaled when mSize > double MIN ? perhaps past 1.0E-272 small errors become catastrophic due to accumulation so starting the scaling there  would be a good thing ?

tnx for the zip and I'll look at kfr settings conversion.


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on August 29, 2014, 07:29:10 AM
Sircle @ 4.0E-650
you had it set at 500,000 so got black screen ? :dink: seems to render fine with my current incarnation, the other locations rendered ok too :O

this is what it looked like with some twiddling by the new palette editor, render time EDIT: 212:42.704 @ 1024x768 (finally!)

 edit: I think it's that bit in the middle 12,868,703 iterations!?!?

          it's getting there  :dink: but seems a bit noisy


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on August 29, 2014, 02:53:37 PM
Palette load and save work, it works so well I have to share, v0.67 (http://www.digilanti.org/cudabrot/) is on my website, now housed in 3 files, source, win exe and win exe plus DLLs so if you already have the DLLs or Qt5.3 installed just grab the exe_only file and copy SFTC.exe to the folder with the DLLs.

Also loads .kfr files :D edit:clarification ... loads kfr settings from the main menu under File->Load Settings

the palette dialog will load the first line of any image and use it as an indexed color lookup array and save the current palette as a single line image in the format of your choice. here is an image and the palette that goes with it. each pixel is a key value, they get spread over the color map at the size you choose. edit:clarification up to 16777216




Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on September 16, 2014, 06:24:57 PM
v0.73 (http://www.digilanti.org/cudabrot/) still need to code a final pass for detecting/correcting single pixels and very small areas... working on it  :dink:

The palette creator is working well, loads the color data from kfr files or image files, when an image is loaded as a palette it uses each line as a new palette for cycling.
Palette cycling mp4 (http://www.digilanti.org/cudabrot/palette.mp4), it shouldn't be too hard to set it up as per frame palette for animation. :D

Edit: saves color palette in .sft settings files  O0


Title: Re: CUDA Y.A.M.Z
Post by: Kalles Fraktaler on September 23, 2014, 05:49:31 PM
Hi 3dickulus

Never meant to ignore you, but this I have totally missed.
I just found this out from the "understanding perturbation" thread, will test it right away :)

(btw, I recall you were complaining about MM requiring a lot of MBs of Java... hmm... Qt=19MB, 47MB unpacked... ;) )


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on September 23, 2014, 06:21:16 PM
lol @ MB, yes Qt is bulky and perhaps if I knew java better I would like it more, not meaning to criticize MM, it's a great program, not so much the MBs, it's the virtual machine, I have found  "can't do that" more often in java than in C++ and thus my preference.
for my system JRE=54.8 MB JDK 78.5 MB

lol @ ignore, I'm pretty thick skinned and living in my very own little world, internet and coding are strictly entertainment  :dink:


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on September 23, 2014, 11:18:07 PM
O.k. for anyone watching  :dink:  I think SFTC 0.74 (http://www.digilanti.org/cudabrot/) is where I can start looking at the GPU stuff as the GUI/ and Engine/ are at a useable state. (official CPU version beta release?)
Added a final pass for detecting/correcting single pixels and very small areas, only tested at the resolutions in the Options menu with files in the glitches folder but seems to be working quite well.

My foibles and follies while merging this with cudabrot (source code on first page of this thread) will, no doubt, provide amusement and entertainment for all :D

The above link has WinExe + QtDlls + Source zips


Title: Re: CUDA Y.A.M.Z
Post by: Kalles Fraktaler on September 24, 2014, 12:13:25 AM
Nice work!

The glitch solving does not work on this location though, in resolution 1024x768:
Code:
Horizontal Size 2.265005662514156285e-72
Real position:  -1.985548413352953418245678803517026040561987431985854276276406746845711128525331973761520766949399973186380456945
Imaginary Position:  0.0000000000002743067126729694556175287646032154237263455024187599021148863659471173433723750561476242243458678403
Iteration Limit: 11256
Right side is solved correctly but not the reindeer horns at the left...
Why is the image turning grey if the resolution is changed?


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on September 24, 2014, 12:22:36 AM
I think I fixed the gray in 0.74? (just posted a few minutes ago) didn't seem to have the problem on my Win7 box.

just testing that location now....


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on September 24, 2014, 12:46:56 AM
....is this what it's supposed to look like? rendered@ 256/512/1024/2048 O.k.


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on September 24, 2014, 07:30:04 AM
...in the mean time here's a little vid recorded live from my desktop (http://www.digilanti.org/cudabrot/cudabrot.mp4) of cudabrot in action demonstrating pan/zoom with mouse and keyboard using mandelbrot, julia and a hybrid formula in realtime, surely this little bit of code can crunch doubles for the SFTC engine, hack hack hack.


Title: Re: CUDA Y.A.M.Z
Post by: Kalles Fraktaler on September 24, 2014, 08:54:50 AM
The location I am referring to is zoomed in a little.
As you can see the horns at left are solid red.
Lower resolutions solve these glitches :)


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on September 24, 2014, 09:00:24 AM
is that with 0.74 ?

edit: try adding about 1000 iterations  :dink:


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on September 24, 2014, 01:36:18 PM
oops :embarrass: if there are fewer than 5 pixels to recalculate it may get stuck in a loop re-rendering the last few pixels
FIX: add line 540 in calculationmanager.cpp right before the line with "emit renderedImage()"
Code:
   if(mRemains) mPixels_to_recalc = 0; // signal finished final pass 
patching and recompiling...


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on September 29, 2014, 11:14:24 PM
ok, so I found a few more little bugs, probably caused by getting distracted in the middle of something and then forgetting to tune it back  :embarrass:
v0.75 is up (http://www.digilanti.org/cudabrot/), less hunting and pecking, a little faster? renders all files in the glitches folder at all preset resolutions,
added some openmp stuff in IndexBuffer.cpp but that part doesn't compile for Win (yet)


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on October 11, 2014, 04:51:11 PM
ok, so I found a few more little bugs, probably caused by getting distracted in the middle of something and then forgetting to tune it back  :embarrass: didn't notice a couple of things are different when running in the QtCreator sandbox vs direct from the desktop.
v0.76 is up (http://www.digilanti.org/cudabrot/)


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on November 14, 2014, 07:48:19 AM
Finally! :D
A quick zoom generated with SequenceGen now a part of SFTC
http://vimeo.com/moogaloop.swf?clip_id=111808968
SequenceGen also morphs the palettes :D
http://vimeo.com/moogaloop.swf?clip_id=111804560
and loads kfr files :D added a command line version for batch rendering :D
and a little bit of CUDA code for converting iter data to image

Source code and Win exes here. (http://www.digilanti.org/cudabrot/)


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on December 31, 2014, 01:46:35 AM
Happy New Year All ! ! !

my gift to you v0.89 of SFTC is on line (http://www.digilanti.org/cudabrot/).

Cheers! :beer:

edit: the time displayed in the gui is X2 so divide by 2 for actual time, will be fixed in next update


Title: Re: CUDA Y.A.M.Z
Post by: 3dickulus on January 27, 2015, 02:38:01 AM
v0.91 of SFTC is up and running (http://www.digilanti.org/cudabrot/).

console stats with this version from DD's Scircle @ 1024x768 :D

Code:
########################################

Initialise Calculation...
        Details Storage:106056k
        FillInCubic:8:13.493

########################################

Reference Point 1 = 4:5.578
Reference Iterations:49998840   XOff:0.0292969  YOff:0.0898437
        ReFillInCubic: 2:4.684

Reference Point 2 = 0:0.518
Reference Iterations:50240274   XOff:-0.735264  YOff:-0.239156
        ReFillInCubic: 2:5.608

Reference Point 3 = 0:0.534
Reference Iterations:50325223   XOff:-0.707224  YOff:-0.676797
        ReFillInCubic: 2:2.643

Reference Point 4 = 0:0.712
Reference Iterations:50264717   XOff:0.915509   YOff:-0.621909
        ReFillInCubic: 2:3.214

Reference Point 5 = 0:0.516
Reference Iterations:50251112   XOff:0.490445   YOff:-0.702583
        ReFillInCubic: 2:4.884

Total Render Time:212:42.704 @ 1024x768


fixed mouse coords offset and added some more cuda code, just for image & palette processing but sloooooowly moving bits to the GPU.

Cheers! :beer:

EDIT: still need to tweak the sequence generator, a little buggy but it does work, this piece of code is like a pencil drawing, started out as a few lines, features get added, refined, bits get erased and redone, I'm having fun :) use the source! (0.92)