Welcome to Fractal Forums

Fractal Software => Programming => Topic started by: 3dickulus on October 01, 2014, 03:25:45 PM




Title: OMP vs Par4All
Post by: 3dickulus on October 01, 2014, 03:25:45 PM
3Dickulus test/demo toy

Requires: CUDA SDK + Par4All + OpenMP + GCC

Run from console only, creates a 640x480 .bmp file in the current directory.

Comparing render time for M using...

standard  :  mSec 247.018
omp       :  mSec 120.571
cuda      :  mSec   0.035

The p4a cuda version is 3,444.88 times faster than omp and 7,057.66 times faster than standard cpu code.
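
For anyone who wants to check the ratios, here is a minimal sketch of the arithmetic. The helper name `speedup` is made up for illustration; the numbers are the render times quoted above. The ratios say nothing about what each timer covered (for example, whether the CUDA figure waits for the kernel to finish), which the post does not state.

```python
def speedup(baseline_ms, accelerated_ms):
    """Return how many times faster the accelerated run is than the baseline."""
    if accelerated_ms <= 0:
        raise ValueError("accelerated time must be positive")
    return baseline_ms / accelerated_ms


# Render times in milliseconds, copied from the table above.
timings_ms = {"standard": 247.018, "omp": 120.571, "cuda": 0.035}

for name in ("omp", "standard"):
    ratio = speedup(timings_ms[name], timings_ms["cuda"])
    print("cuda vs {}: {:.2f}x".format(name, ratio))

# The same helper gives the OpenMP gain over the single-threaded build.
print("omp vs standard: {:.2f}x".format(
    speedup(timings_ms["standard"], timings_ms["omp"])))
```

By the same arithmetic, the OpenMP build is roughly twice as fast as the standard one.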

I'm curious if anyone else has played around with Par4All (http://www.par4all.org/download.html)

Here is a zip with 3 versions of C(m)andel (http://www.digilanti.org/cudabrot/candel.zip): source code, make.sh and Linux executables. The CUDA code was generated from the standard .c code with (virtually) no intervention from me. I think this might be a good way to get specific parts of SFTC crunching on the GPU.

The most interesting thing I found was that, after processing with Par4All, nvcc is not needed to compile the resulting code: it compiles with gcc. nvcc is only required to generate the c and cpp files.




Title: Re: OMP vs Par4All
Post by: 3dickulus on October 01, 2014, 11:47:20 PM
3Dickulus test/demo toy
Based on knighty's Pertubation and 3rd degree Mandelbrot evaldraw script (http://www.fractalforums.com/new-theories-and-research/pertubation-and-3rd-degree-mandelbrot/)

Same as above: p(erturbed)mandel (http://www.digilanti.org/cudabrot/pandel.zip)

Comparing render time for 3rdM using...

standard  :  mSec 156.032
omp       :  mSec  67.784
cuda      :  mSec   0.031

The p4a cuda version is 2,186 times faster than omp and 5,033 times faster than the standard CPU version compiled with -O3.

I am very impressed with Par4All  (http://www.par4all.org/download.html) :D

(I think 1 pic will do)


Title: Re: OMP vs Par4All
Post by: claude on February 19, 2017, 12:40:45 PM
par4all seems to be no longer maintained/supported; the last version, 2 years old, is archived at https://github.com/Par4All/par4all

even so, I'm trying to get it working today, which is proving painful so far (the build process insists on restarting from scratch each time it fails...)

I couldn't compile your p4a'd pandel.c, because of redefinition conflicts between your embedded /usr/include/* and my own /usr/include/x86_64/gccversionblah/* that I was too dumb to figure out so far...


Title: Re: OMP vs Par4All
Post by: 3dickulus on February 19, 2017, 08:09:17 PM
p4a_launcher.cpp and p4a_accel.cpp are generated from...
Code:
p4a -vv --c99 --cuda --nvcc-flags="-gencode arch=compute_10,code=sm_10 -gencode arch=compute_20,code=sm_20" pandel.c -o pandel-cuda

Don't forget to...
Code:
source /usr/local/par4all/etc/par4all-rc.sh
...before using p4a :)

the above needs to happen before running make.sh
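
The --nvcc-flags value in the p4a command above repeats one -gencode pair per target GPU architecture. Here is a small sketch of how that string is put together. The helper names (`gencode_flags`, `p4a_command`) are hypothetical; the p4a options are copied from the post. Which architectures a given CUDA toolkit still accepts is not something this thread settles.

```python
import shlex


def gencode_flags(capabilities):
    """Build one -gencode option per compute capability, e.g. "20" -> sm_20."""
    return " ".join(
        "-gencode arch=compute_{0},code=sm_{0}".format(cap)
        for cap in capabilities
    )


def p4a_command(source, output, capabilities):
    """Assemble the argument list for the p4a call shown in the post."""
    return [
        "p4a", "-vv", "--c99", "--cuda",
        "--nvcc-flags=" + gencode_flags(capabilities),
        source, "-o", output,
    ]


# Print a shell-quoted version of the command for copy and paste.
command = p4a_command("pandel.c", "pandel-cuda", ["10", "20"])
print(" ".join(shlex.quote(arg) for arg in command))
```

Keeping the capability list in one place makes it easier to drop or add an architecture if nvcc rejects one.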

/usr/local is the default install location for both CUDA and p4a

the make.sh script only references CUDA and p4a include folders...

/usr/local/cuda/include
/usr/local/par4all/share/p4a_accel
... and the cuda libs folder
/usr/local/cuda/targets/x86_64-linux/lib

there should be no conflicts with these includes and lib folders as there are no refs to gcc version specific folders

It would be best to get the examples working before trying this, just to familiarize yourself and make sure it works. It was some time ago that I did this and I haven't maintained the code, so I'm not sure if there have been changes to GCC or CUDA that might break p4a :-\


Title: Re: OMP vs Par4All
Post by: DarkBeam on February 19, 2017, 08:13:36 PM
Great!  :beer: 7000 times faster is really cool


Title: Re: OMP vs Par4All
Post by: 3dickulus on February 19, 2017, 08:29:06 PM
p4a is an amazing piece of work, not sure why more people (here) haven't looked into it  :-\

edit:

 just looking at p4a again...

installs in /opt/par4all

in candel make.sh -I/usr/local/par4all/share/p4a_accel is some dev headers iirc

going to give it a go and see if I can get the github version to work

I see a gcc 4.4.5 in the tree, this probably has some specific tweaks for p4a and this particular gcc 4.4.5 executable might have to be used to compile the resulting C code generated by p4a...


Title: Re: OMP vs Par4All
Post by: claude on February 21, 2017, 02:35:31 AM
Ok I got p4a compiled and installed.  The trick was this patch, which needs to be applied twice (!), once to the main source tree and once to the additional gcc that gets downloaded and unpacked during the build process.  Without the patch, I got multiple symbol definition errors, as if the inline definitions weren't really inline...

Code:
diff --git a/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h b/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h
index 2324b068f7..b396ef72e4 100644
--- a/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h
+++ b/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h
@@ -186,6 +186,7 @@ extern int floor_log2                  (unsigned HOST_WIDE_INT);
 #  define CTZ_HWI __builtin_ctz
 # endif
 
+#if 0
 extern inline int
 floor_log2 (unsigned HOST_WIDE_INT x)
 {
@@ -197,6 +198,7 @@ exact_log2 (unsigned HOST_WIDE_INT x)
 {
   return x == (x & -x) && x ? (int) CTZ_HWI (x) : -1;
 }
+#endif
 #endif /* GCC_VERSION >= 3004 */
 
 /* Functions used to get and set GCC's notion of in what directory

needs to be applied to these files:
Code:
par4all/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h
par4all/build/pips/src/Passes/fortran95/gcc-4.4.5/gcc/toplev.h

EDIT: but it doesn't work, syntax error in some pips python code...
Code:
$ p4a -vv --c99 --cuda --nvcc-flags="-gencode arch=compute_10,code=sm_10 -gencode arch=compute_20,code=sm_20" pandel.c -o pandel-cuda
Traceback (most recent call last):
  File "/home/pips/opt/p4a/bin/p4a", line 10, in <module>
    import p4a_process
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/p4a_process.py", line 15, in <module>
    import p4a_processor
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/p4a_processor.py", line 16, in <module>
    import p4a_astrad
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/p4a_astrad.py", line 14, in <module>
    import pyps
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/pyps.py", line 2, in <module>
    from pypsbase import *
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/pypsbase.py", line 3, in <module>
    import pypips
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/pypips.py", line 142
    def user_log(arg1, arg1=None, arg2=None, arg3=None, arg4=None, arg5=None, arg6=None, arg7=None, arg8=None, arg9=None, arg10=None):
SyntaxError: duplicate argument 'arg1' in function definition
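
The failure is a compile-time error: Python rejects any function signature that names the same parameter twice, so pypips.py cannot even be imported. Below is a minimal sketch that reproduces the message on a hypothetical one-line stand-in (not the real generated file), assuming Python 3. Renaming the first parameter is only one possible repair; the fix actually used is not shown in this thread.

```python
# Hypothetical stand-ins for the generated signature; not the real pypips.py.
BROKEN = "def user_log(arg1, arg1=None, arg2=None): pass\n"
FIXED = "def user_log(arg0, arg1=None, arg2=None): pass\n"


def compiles(source):
    """Return (True, None) if source compiles, else (False, error message)."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError as err:
        return False, err.msg
    return True, None


for label, source in (("broken", BROKEN), ("fixed", FIXED)):
    ok, message = compiles(source)
    print(label, "compiles" if ok else "fails: {}".format(message))
```

Because the error is raised while the module is being compiled, nothing else in the file runs, which is why the traceback stops at the `def` line.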


Title: Re: OMP vs Par4All
Post by: 3dickulus on February 21, 2017, 03:53:33 AM
using p4a_setup.py I get this error...

p4a_setup: Command 'par4all-p4a/packages/PIPS/pips/configure --prefix=par4all-p4a/refix=/usr/local/par4all lean PKG_CONFIG_PATH=par4all-p4a/refix=/usr/local/par4all/lib/pkgconfig --enable-tpips --enable-pyps --enable-hpfc --enable-fortran95' in par4all-p4a/build/pips failed with exit code 1

...note the mangled path  :-\ looks like p4a_setup.py isn't handling passing arguments to pips/configure properly
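
One possible explanation for the mangled path, not confirmed anywhere in this thread: if the option was typed with a single dash (-prefix=... instead of --prefix=...) and the script defines a short -p option, a standard Python option parser reads it as -p with the value refix=.... The sketch below assumes a hypothetical option table and assumes the value is then joined to the source directory; p4a_setup.py's real code may differ.

```python
import posixpath
from optparse import OptionParser


def parse_prefix(argv):
    """Parse argv with a hypothetical -p/--prefix option and return its value."""
    parser = OptionParser()
    parser.add_option("-p", "--prefix", dest="prefix")
    options, _args = parser.parse_args(argv)
    return options.prefix


good = parse_prefix(["--prefix=/usr/local/par4all"])
bad = parse_prefix(["-prefix=/usr/local/par4all"])

print(good)
print(bad)
# Assumed step: a relative value gets resolved against the source directory.
print(posixpath.join("par4all-p4a", bad))
```

If this is what happened, passing the option with two dashes would avoid the problem without touching the script.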

I may resort to binary install but I would feel better if I compiled it  :sad1: going to spend free time after work  this week fiddling with this...


Title: Re: OMP vs Par4All
Post by: 3dickulus on March 03, 2017, 05:16:47 AM
yay! got it to compile

@claude I only had to apply the patch (above) once to par4all/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h
   the stuff in the "build" folder gets created from the source tree

 but I had to make a couple of other changes too...

Code:
par4all-p4a/packages/PIPS/pips/src/Libs/effects-convex/utils.c, line 2609:
-      ((int (*)()) compare_region_inequalities), NULL);
+      ((int (*)()) compare_region_inequalities));

par4all-p4a/packages/PIPS/pips/src/Libs/task_parallelization/instrumentation.c, line 4 (added):
+      char *strdup(const char *s);

I really should have documented what I did to get it working the first time :angry:

going to try it out this weekend  :evil1:


Title: Re: OMP vs Par4All
Post by: 3dickulus on March 05, 2017, 02:40:54 AM
I got par4all compiled/installed and candel/make.sh works on the original files, producing executables candel candel-omp and candel-cuda but...
when I run p4a -vv --c99 --cuda candel.c -o candel-cuda it fails after generating candel.p4a.c before generating new p4a_accel.cpp and p4a_launcher.cpp
Code:
candel.p4a.c:23:19: error: redefinition of '__bswap_64'
/usr/include/bits/byteswap.h:109:1: note: previous definition of '__bswap_64' was here
...
candel.p4a.c:23:19: warning: '__bswap_64' defined but not used [-Wunused-function]

Commenting out the '__bswap_64' definition in candel.p4a.c allows it to compile, but still without generating new p4a_accel.cpp and p4a_launcher.cpp.

the first time I did this iirc I didn't make any changes to p4a python code, it was just a couple of things in c files and setting paths

hmmm... maybe a python brain can help find the bit that inserts the '__bswap_64' definition and will let the p4a.py script continue with the process ???


Title: Re: OMP vs Par4All
Post by: claude on March 05, 2017, 03:59:11 AM
Yes, I gave up after getting *loads* of duplicate definitions causing compilation failures. It seems all the include files of the p4a'd program get stuffed into the output C, which then causes problems when p4a_accel.h includes some of them again.  But I might have done something stupid when editing some python files to make them run without crashing with stack trace dumps...
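
A rough way to see which functions are defined both in the generated C and in a header that gets included again is a crude text scan. This is only a sketch: the regular expression is a heuristic, not a C parser, and the helper names and sample strings are made up for illustration.

```python
import re

# Heuristic: an identifier, a parenthesised parameter list, then an opening brace.
FUNC_DEF = re.compile(r"\b([A-Za-z_]\w*)\s*\([^;{}()]*\)\s*\{")
# Control-flow keywords also match the pattern above, so filter them out.
KEYWORDS = {"if", "for", "while", "switch"}


def defined_functions(c_source):
    """Return the set of names that look like function definitions."""
    return {name for name in FUNC_DEF.findall(c_source) if name not in KEYWORDS}


def redefined(generated_source, header_source):
    """Return names defined in both texts, sorted for stable output."""
    both = defined_functions(generated_source) & defined_functions(header_source)
    return sorted(both)


# Made-up sample text standing in for candel.p4a.c and a system header.
generated = (
    "static unsigned long __bswap_64(unsigned long x)\n{\n  return x;\n}\n"
    "int main(void)\n{\n  if (1) {\n    return 0;\n  }\n  return 1;\n}\n"
)
header = "static unsigned long __bswap_64(unsigned long x)\n{\n  return x;\n}\n"

print(redefined(generated, header))
```

In practice the two strings would be read from the generated file and from the preprocessed header, and the list would show which inlined definitions need to go.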

The syntax error I mentioned is an easy fix, at least when hackily hacked into the SWIG output (the file is generated from some spec, but I didn't dig deep to fix it properly).