3dickulus
|
|
« on: October 01, 2014, 03:25:45 PM » |
|
3Dickulus test/demo toy
Requires: CUDA SDK + Par4All + OpenMP + GCC
Run from console only; creates a 640x480 .bmp file in the current directory.

Comparing render time for M using...
standard : mSec 247.018
omp : mSec 120.571
cuda : msec 000.035

The p4a CUDA version is 3,444.88 times faster than the OpenMP version and 7,057.66 times faster than the standard CPU code.

I'm curious whether anyone else has played around with Par4All. Here is a zip with 3 versions of C(m)andel: source code, make.sh and Linux executables. The CUDA code was generated from the standard .c code with (virtually) no intervention from me. I think this might be a good way to get specific parts of SFTC crunching on the GPU.

The most interesting thing I found was that, after processing with Par4All, nvcc is not needed to compile the resulting code (it compiles with gcc), but nvcc is required to generate the .c and .cpp files.
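For anyone checking the quoted ratios: each one is just the baseline time divided by the CUDA time. A minimal sketch in C, where the function name `speedup` is made up for illustration and is not part of the posted code:

```c
#include <assert.h>

/* Speedup of an accelerated run over a baseline run.
 * Both times must be in the same unit (milliseconds here).
 * `speedup` is a hypothetical helper, not taken from the posted zip. */
double speedup(double baseline_ms, double accel_ms)
{
    assert(accel_ms > 0.0);          /* avoid division by zero */
    return baseline_ms / accel_ms;
}
```

With the timings above, `speedup(120.571, 0.035)` gives roughly 3444.9 and `speedup(247.018, 0.035)` roughly 7057.7, matching the figures in the post.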
|
claude
Fractal Bachius
Posts: 563
|
|
« Reply #2 on: February 19, 2017, 12:40:45 PM » |
|
par4all seems to be no longer maintained/supported; the last version, 2 years old, is archived at https://github.com/Par4All/par4all
Even so, I'm trying to get it working today, which is proving painful so far (the build process insists on restarting from scratch each time it fails...). I couldn't compile your p4a'd pandel.c because of redefinition conflicts between your embedded /usr/include/* and my own /usr/include/x86_64/gccversionblah/* that I was too dumb to figure out so far...
|
3dickulus
|
|
« Reply #3 on: February 19, 2017, 08:09:17 PM » |
|
p4a_launcher.cpp and p4a_accel.cpp are generated from...
p4a -vv --c99 --cuda --nvcc-flags="-gencode arch=compute_10,code=sm_10 -gencode arch=compute_20,code=sm_20" pandel.c -o pandel-cuda

Don't forget to...
source /usr/local/par4all/etc/par4all-rc.sh
...before using p4a. The above needs to happen before running make.sh.

/usr/local is the default install location for both CUDA and p4a. The make.sh script only references the CUDA and p4a include folders...
/usr/local/cuda/include
/usr/local/par4all/share/p4a_accel
...and the CUDA libs folder
/usr/local/cuda/targets/x86_64-linux/lib

There should be no conflicts with these include and lib folders, as there are no refs to gcc-version-specific folders. It would be best to get the examples working before trying this, just to familiarize yourself and make sure it works. It was some time ago that I did this and I haven't maintained the code, so I'm not sure if there have been changes to GCC or CUDA that might break p4a.
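For readers without the zip, here is a rough sketch of the kind of plain sequential C loop nest a tool like p4a is meant to take as input. This is not the author's pandel.c: the function names, the viewport constants and the buffer layout are all assumptions made for illustration.

```c
#include <assert.h>

/* Escape-time iteration count for the point c = cx + i*cy.
 * Hypothetical helper, not taken from pandel.c. */
int mandel_iter(double cx, double cy, int max_iter)
{
    double zx = 0.0, zy = 0.0;
    int n = 0;
    while (n < max_iter && zx * zx + zy * zy <= 4.0) {
        double t = zx * zx - zy * zy + cx;   /* real part of z*z + c */
        zy = 2.0 * zx * zy + cy;             /* imaginary part of z*z + c */
        zx = t;
        n++;
    }
    return n;
}

/* Fill a width*height buffer with iteration counts.
 * The viewport [-2.5, 1.0] x [-1.25, 1.25] is an arbitrary choice.
 * Every pixel is computed independently of the others, which is what
 * makes this loop nest a candidate for parallelization. */
void render(int *out, int width, int height, int max_iter)
{
    assert(out != 0 && width > 0 && height > 0);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            double cx = -2.5 + 3.5 * x / width;
            double cy = -1.25 + 2.5 * y / height;
            out[y * width + x] = mandel_iter(cx, cy, max_iter);
        }
    }
}
```

A hand-written OpenMP variant of the same sketch would put `#pragma omp parallel for` on the outer loop and be compiled with `-fopenmp`.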
|
DarkBeam
Global Moderator
Fractal Senior
Posts: 2512
Fragments of the fractal -like the tip of it
|
|
« Reply #4 on: February 19, 2017, 08:13:36 PM » |
|
Great! 7000 times faster is really cool
|
3dickulus
|
|
« Reply #5 on: February 19, 2017, 08:29:06 PM » |
|
p4a is an amazing piece of work, not sure why more people (here) haven't looked into it :-
edit:
just looking at p4a again...
installs in /opt/par4all
in candel make.sh -I/usr/local/par4all/share/p4a_accel is some dev headers iirc
going to give it a go and see if I can get the github version to work
I see a gcc 4.4.5 in the tree; this probably has some specific tweaks for p4a, and this particular gcc 4.4.5 executable might have to be used to compile the resulting C code generated by p4a...
|
|
« Last Edit: February 19, 2017, 11:49:42 PM by 3dickulus »
|
claude
Fractal Bachius
Posts: 563
|
|
« Reply #6 on: February 21, 2017, 02:35:31 AM » |
|
Ok, I got p4a compiled and installed. The trick was this patch, which needs to be applied twice (!): once to the main source tree and once to the additional gcc that gets downloaded and unpacked during the build process. Without the patch, I got multiple symbol definition errors, as if the inline definitions weren't really inline...

diff --git a/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h b/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h
index 2324b068f7..b396ef72e4 100644
--- a/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h
+++ b/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h
@@ -186,6 +186,7 @@ extern int floor_log2 (unsigned HOST_WIDE_INT);
 # define CTZ_HWI __builtin_ctz
 # endif
 
+#if 0
 extern inline int
 floor_log2 (unsigned HOST_WIDE_INT x)
 {
@@ -197,6 +198,7 @@ exact_log2 (unsigned HOST_WIDE_INT x)
 {
   return x == (x & -x) && x ? (int) CTZ_HWI (x) : -1;
 }
+#endif
 #endif /* GCC_VERSION >= 3004 */
 
 /* Functions used to get and set GCC's notion of in what directory

It needs to be applied to these files:
par4all/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h
par4all/build/pips/src/Passes/fortran95/gcc-4.4.5/gcc/toplev.h

EDIT: but it doesn't work; there is a syntax error in some pips python code...

$ p4a -vv --c99 --cuda --nvcc-flags="-gencode arch=compute_10,code=sm_10 -gencode arch=compute_20,code=sm_20" pandel.c -o pandel-cuda
Traceback (most recent call last):
  File "/home/pips/opt/p4a/bin/p4a", line 10, in <module>
    import p4a_process
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/p4a_process.py", line 15, in <module>
    import p4a_processor
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/p4a_processor.py", line 16, in <module>
    import p4a_astrad
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/p4a_astrad.py", line 14, in <module>
    import pyps
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/pyps.py", line 2, in <module>
    from pypsbase import *
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/pypsbase.py", line 3, in <module>
    import pypips
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/pypips.py", line 142
    def user_log(arg1, arg1=None, arg2=None, arg3=None, arg4=None, arg5=None, arg6=None, arg7=None, arg8=None, arg9=None, arg10=None):
SyntaxError: duplicate argument 'arg1' in function definition
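A possible explanation for the multiple-definition errors, offered as an assumption rather than something confirmed in this thread: `extern inline` means different things in GNU89 and in C99. Under GNU89 rules an `extern inline` function in a header never emits a standalone symbol. Under C99 rules, which newer GCC releases use by default, it emits an external definition in every translation unit that includes the header. The sketch below shows `static inline` versions, which behave the same in both dialects. `unsigned long` stands in for `HOST_WIDE_INT`, the body of `exact_log2` follows the patch context above, and the body of `floor_log2` is my own reconstruction using the GCC/Clang builtin `__builtin_clzl`.

```c
#include <assert.h>
#include <limits.h>

/* Position of the highest set bit, or -1 when x is 0.
 * Body is a reconstruction; it is not shown in the patch context. */
static inline int floor_log2(unsigned long x)
{
    return x ? (int)(sizeof x * CHAR_BIT) - 1 - __builtin_clzl(x) : -1;
}

/* log2(x) when x is an exact power of two, otherwise -1.
 * Mirrors the expression visible in the patch context. */
static inline int exact_log2(unsigned long x)
{
    return x == (x & -x) && x ? __builtin_ctzl(x) : -1;
}
```

Less invasive alternatives that leave the header untouched would be GCC's `-fgnu89-inline` flag or the `gnu_inline` function attribute. I have not tried either against the p4a build.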
|
|
« Last Edit: February 21, 2017, 02:43:24 AM by claude, Reason: error »
|
3dickulus
|
|
« Reply #7 on: February 21, 2017, 03:53:33 AM » |
|
using p4a_setup.py I get this error...

p4a_setup: Command 'par4all-p4a/packages/PIPS/pips/configure --prefix=par4all-p4a/refix=/usr/local/par4all lean PKG_CONFIG_PATH=par4all-p4a/refix=/usr/local/par4all/lib/pkgconfig --enable-tpips --enable-pyps --enable-hpfc --enable-fortran95' in par4all-p4a/build/pips failed with exit code 1

...note the mangled path. It looks like p4a_setup.py isn't handling passing arguments to pips/configure properly.
I may resort to a binary install, but I would feel better if I compiled it. Going to spend free time after work this week fiddling with this...
|
3dickulus
|
|
« Reply #8 on: March 03, 2017, 05:16:47 AM » |
|
yay! got it to compile
@claude I only had to apply the patch (above) once, to par4all/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h; the stuff in the "build" folder gets created from the source tree. But I had to make a couple of other changes too...

par4all-p4a/packages/PIPS/pips/src/Libs/effects-convex/utils.c
@@ -2609
((int (*)()) compare_region_inequalities), NULL);
@@ +2609
((int (*)()) compare_region_inequalities));

par4all-p4a/packages/PIPS/pips/src/Libs/task_parallelization/instrumentation.c
@@ +4
char *strdup(const char *s);

I really should have documented what I did to get it working the first time. Going to try it out this weekend.
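On the `strdup` line: a hand-written prototype is usually needed because `strdup` is POSIX rather than ISO C, so `<string.h>` hides it when the compiler runs in a strict mode such as `-std=c99`. I am assuming that is what happened in instrumentation.c; the thread does not show the actual error. An alternative to declaring it by hand is to request the POSIX declarations with a feature-test macro before the first include. A minimal sketch, where `copy_label` is a made-up function name:

```c
/* Must appear before any #include so that <string.h> exposes strdup. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of name, or NULL if name is NULL or
 * allocation fails. The caller releases the result with free().
 * `copy_label` is a hypothetical example, not a PIPS function. */
char *copy_label(const char *name)
{
    if (name == NULL)
        return NULL;
    return strdup(name);
}
```

The same effect can be had without touching the source by passing `-D_POSIX_C_SOURCE=200809L` on the compiler command line.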
|
|
« Last Edit: March 03, 2017, 05:23:45 AM by 3dickulus »
|
3dickulus
|
|
« Reply #9 on: March 05, 2017, 02:40:54 AM » |
|
I got par4all compiled/installed and candel/make.sh works on the original files, producing executables candel, candel-omp and candel-cuda, but...

when I run
p4a -vv --c99 --cuda candel.c -o candel-cuda
it fails after generating candel.p4a.c, before generating new p4a_accel.cpp and p4a_launcher.cpp:

candel.p4a.c:23:19: error: redefinition of '__bswap_64'
/usr/include/bits/byteswap.h:109:1: note: previous definition of '__bswap_64' was here
...
candel.p4a.c:23:19: warning: '__bswap_64' defined but not used [-Wunused-function]

Commenting out the '__bswap_64' definition in candel.p4a.c allows it to compile, but without generating new p4a_accel.cpp and p4a_launcher.cpp.

The first time I did this, iirc, I didn't make any changes to the p4a python code; it was just a couple of things in C files and setting paths. Hmmm... maybe a python brain can help find the bit that inserts the '__bswap_64' definition, which would let the p4a.py script continue with the process.
|
claude
Fractal Bachius
Posts: 563
|
|
« Reply #10 on: March 05, 2017, 03:59:11 AM » |
|
Yes, I gave up after getting *loads* of duplicate definitions causing compilation failures. It seems all the include files of the p4a'd program get stuffed into the output C, which then causes problems when p4a_accel.h includes some of them again. But I might have done something stupid when editing some python files to make them run without crashing with stack trace dumps...
The syntax error I mentioned is an easy fix, at least when hackily hacked into the SWIG output (the file is generated from some spec, but I didn't dig deep to fix it properly).