END OF AN ERA, FRACTALFORUMS.COM IS CONTINUED ON FRACTALFORUMS.ORG

It was a great time, but the site is no longer maintainable by c.Kleinhuis; contact him for any data retrieval.
Thanks, and see you again, perhaps in 10 years.

this forum will stay online for reference
Author Topic: OMP vs Par4All  (Read 3368 times)
Description: A test of cuda, omp and standard M-set code
3dickulus
Global Moderator
Fractal Senior
******
Posts: 1558



WWW
« on: October 01, 2014, 03:25:45 PM »

3Dickulus test/demo toy

Requires: CUDA SDK + Par4All + OpenMP + GCC

Run from console only, creates a 640x480 .bmp file in the current directory.

Comparing render time for M using...

standard  :  mSec 247.018
omp       :  mSec 120.571
cuda      :  mSec 000.035

The p4a cuda version is 3,444.88 times faster than omp and 7,057.66 times faster than standard cpu code.
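
As a quick check on those ratios, here is a minimal sketch in C that recomputes them from the timings quoted above. The function name speedup is made up for this illustration and is not part of the zip.

```c
/* Hypothetical helper (not from the posted zip): ratio of two render times. */
double speedup(double baseline_ms, double accelerated_ms)
{
    /* Both arguments are times in milliseconds; the result is how many
       times faster the accelerated run is than the baseline run. */
    return baseline_ms / accelerated_ms;
}
```

With the numbers above, speedup(247.018, 0.035) comes to roughly 7,057.66 and speedup(120.571, 0.035) to roughly 3,444.89, which agrees with the quoted figures to within rounding.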

I'm curious if anyone else has played around with Par4All

Here is a zip with 3 versions of C(m)andel: source code, make.sh and Linux executables. The CUDA code was generated from the standard .c code with (virtually) no intervention from me. I think this might be a good way to get specific parts of SFTC crunching on the GPU.

The most interesting thing I found was that, after processing with Par4All, nvcc is not needed to compile the resulting code (it compiles with gcc), but nvcc is required to generate the c and cpp files.
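
For anyone who hasn't seen the kind of loop being timed here, below is a minimal escape-time sketch in C with an OpenMP pragma on the outer loop. It is not the code from the zip: the names mandel_iter and render, the view window and the iteration handling are all assumptions made for illustration.

```c
/* Illustrative only: NOT the candel.c from the zip. */

/* Count iterations of z = z*z + c until |z| > 2 or max_iter is reached. */
int mandel_iter(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int i = 0;
    while (i < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;  /* real part of z*z + c */
        zi = 2.0 * zr * zi + ci;            /* imaginary part of z*z + c */
        zr = t;
        i++;
    }
    return i;
}

/* Fill out[py * w + px] with the iteration count for each pixel.
   The view window (-2.5..1.0 by -1.25..1.25) is an assumed default. */
void render(int w, int h, int max_iter, int *out)
{
    /* Rows are independent, so the outer loop can be shared between threads.
       Without -fopenmp the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for schedule(dynamic)
    for (int py = 0; py < h; py++) {
        for (int px = 0; px < w; px++) {
            double cr = -2.5 + 3.5 * px / w;
            double ci = -1.25 + 2.5 * py / h;
            out[py * w + px] = mandel_iter(cr, ci, max_iter);
        }
    }
}
```

Built with something like gcc -O3 -fopenmp, a call such as render(640, 480, 256, buffer) would fill a 640x480 buffer of iteration counts; the cap of 256 iterations is just an example value.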




* cpu-M.png (41.07 KB, 640x480 - viewed 388 times.)

* omp-M.png (41.28 KB, 640x480 - viewed 402 times.)

* cuda-M.png (41.5 KB, 640x480 - viewed 386 times.)

Resistance is fertile...
You will be illuminated!

                            #B^] https://en.wikibooks.org/wiki/Fractals/fragmentarium
3dickulus
« Reply #1 on: October 01, 2014, 11:47:20 PM »

3Dickulus test/demo toy
Based on knighty's Perturbation and 3rd degree Mandelbrot evaldraw script

Same as above: p(erturbed)mandel

Comparing render time for 3rdM using...

standard  :  mSec 156.032
omp       :  mSec  67.784
cuda      :  mSec   0.031

The p4a cuda version is 2,186 times faster than omp and 5,033 times faster than the standard cpu code compiled with -O3.

I am very impressed with Par4All.

(I think 1 pic will do)


* 3rdM.png (35.47 KB, 640x480 - viewed 408 times.)
« Last Edit: October 02, 2014, 12:36:32 AM by 3dickulus »
claude
Fractal Bachius
*
Posts: 563



WWW
« Reply #2 on: February 19, 2017, 12:40:45 PM »

par4all no longer seems to be maintained/supported; the last version, now 2 years old, is archived at https://github.com/Par4All/par4all

even so, I'm trying to get it working today, which is proving painful so far (the build process insists on restarting from scratch each time it fails...)

I couldn't compile your p4a'd pandel.c, because of redefinition conflicts between your embedded /usr/include/* and my own /usr/include/x86_64/gccversionblah/* that I was too dumb to figure out so far...
3dickulus
« Reply #3 on: February 19, 2017, 08:09:17 PM »

p4a_launcher.cpp and p4a_accel.cpp are generated from...
Code:
p4a -vv --c99 --cuda --nvcc-flags="-gencode arch=compute_10,code=sm_10 -gencode arch=compute_20,code=sm_20" pandel.c -o pandel-cuda

Don't forget to...
Code:
source /usr/local/par4all/etc/par4all-rc.sh
...before using p4a.

the above needs to happen before running make.sh

/usr/local is the default install location for both CUDA and p4a

the make.sh script only references CUDA and p4a include folders...

/usr/local/cuda/include
/usr/local/par4all/share/p4a_accel
... and the cuda libs folder
/usr/local/cuda/targets/x86_64-linux/lib

there should be no conflicts with these includes and lib folders as there are no refs to gcc version specific folders

It would be best to get the examples working before trying this, just to get familiar with it and make sure it works. It was some time ago that I did this and I haven't maintained the code, so I'm not sure if there have been changes to GCC or CUDA that might break p4a.
DarkBeam
Global Moderator
Fractal Senior
******
Posts: 2512


Fragments of the fractal -like the tip of it


« Reply #4 on: February 19, 2017, 08:13:36 PM »

Great! 7000 times faster is really cool

No sweat, guardian of wisdom!
3dickulus
« Reply #5 on: February 19, 2017, 08:29:06 PM »

p4a is an amazing piece of work, not sure why more people (here) haven't looked into it.

edit:

 just looking at p4a again...

installs in /opt/par4all

in candel's make.sh, -I/usr/local/par4all/share/p4a_accel points to some dev headers iirc

going to give it a go and see if I can get the github version to work

I see a gcc 4.4.5 in the tree; this probably has some specific tweaks for p4a, and this particular gcc 4.4.5 executable might have to be used to compile the resulting C code generated by p4a...
« Last Edit: February 19, 2017, 11:49:42 PM by 3dickulus »
claude
« Reply #6 on: February 21, 2017, 02:35:31 AM »

Ok I got p4a compiled and installed.  The trick was this patch, which needs to be applied twice (!), once to the main source tree and once to the additional gcc that gets downloaded and unpacked during the build process.  Without the patch, I got multiple symbol definition errors, as if the inline definitions weren't really inline...

Code:
diff --git a/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h b/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h
index 2324b068f7..b396ef72e4 100644
--- a/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h
+++ b/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h
@@ -186,6 +186,7 @@ extern int floor_log2                  (unsigned HOST_WIDE_INT);
 #  define CTZ_HWI __builtin_ctz
 # endif
 
+#if 0
 extern inline int
 floor_log2 (unsigned HOST_WIDE_INT x)
 {
@@ -197,6 +198,7 @@ exact_log2 (unsigned HOST_WIDE_INT x)
 {
   return x == (x & -x) && x ? (int) CTZ_HWI (x) : -1;
 }
+#endif
 #endif /* GCC_VERSION >= 3004 */
 
 /* Functions used to get and set GCC's notion of in what directory

needs to be applied to these files:
Code:
par4all/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h
par4all/build/pips/src/Passes/fortran95/gcc-4.4.5/gcc/toplev.h
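
A possible explanation for the duplicate symbols (a reading of the patch, not something confirmed in this thread): the bundled gcc 4.4.5 sources use extern inline with the old GNU89 meaning, where no out-of-line copy is emitted, while GCC 5 and later default to C99-style inline semantics, where an extern inline definition in a header produces an external symbol in every file that includes it. GCC's -fgnu89-inline option restores the old behaviour. The sketch below shows a header-safe alternative using static inline; the _sketch names are stand-ins and this is not the GCC source.

```c
/* Hypothetical stand-ins for the helpers in toplev.h, NOT the GCC source.
   static inline gives each translation unit its own private copy, so the
   linker never sees two definitions, whichever inline dialect is in use. */

/* Floor of log2(x), or -1 for x == 0. */
static inline int floor_log2_sketch(unsigned long x)
{
    int n = -1;
    while (x) {
        x >>= 1;
        n++;
    }
    return n;
}

/* log2(x) if x is an exact power of two, otherwise -1. */
static inline int exact_log2_sketch(unsigned long x)
{
    return (x != 0 && (x & (x - 1)) == 0) ? floor_log2_sketch(x) : -1;
}
```

Whether switching the original header to static inline (instead of the #if 0 above) would be enough for the p4a build is untested here.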

EDIT: but it doesn't work, syntax error in some pips python code...
Code:
$ p4a -vv --c99 --cuda --nvcc-flags="-gencode arch=compute_10,code=sm_10 -gencode arch=compute_20,code=sm_20" pandel.c -o pandel-cuda
Traceback (most recent call last):
  File "/home/pips/opt/p4a/bin/p4a", line 10, in <module>
    import p4a_process
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/p4a_process.py", line 15, in <module>
    import p4a_processor
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/p4a_processor.py", line 16, in <module>
    import p4a_astrad
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/p4a_astrad.py", line 14, in <module>
    import pyps
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/pyps.py", line 2, in <module>
    from pypsbase import *
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/pypsbase.py", line 3, in <module>
    import pypips
  File "/home/pips/opt/p4a/lib/python2.7/site-packages/pips/pypips.py", line 142
    def user_log(arg1, arg1=None, arg2=None, arg3=None, arg4=None, arg5=None, arg6=None, arg7=None, arg8=None, arg9=None, arg10=None):
SyntaxError: duplicate argument 'arg1' in function definition
« Last Edit: February 21, 2017, 02:43:24 AM by claude, Reason: error »
3dickulus
« Reply #7 on: February 21, 2017, 03:53:33 AM »

using p4a_setup.py I get this error...

p4a_setup: Command 'par4all-p4a/packages/PIPS/pips/configure --prefix=par4all-p4a/refix=/usr/local/par4all lean PKG_CONFIG_PATH=par4all-p4a/refix=/usr/local/par4all/lib/pkgconfig --enable-tpips --enable-pyps --enable-hpfc --enable-fortran95' in par4all-p4a/build/pips failed with exit code 1

...note the mangled path. It looks like p4a_setup.py isn't passing arguments to pips/configure properly.

I may resort to a binary install, but I would feel better if I compiled it. Going to spend free time after work this week fiddling with this...
3dickulus
« Reply #8 on: March 03, 2017, 05:16:47 AM »

yay! got it to compile

@claude I only had to apply the patch (above) once to par4all/packages/pips-gfc/gcc-4.4.5/gcc/toplev.h
   the stuff in the "build" folder gets created from the source tree

 but I had to make a couple of other changes too...

Code:
par4all-p4a/packages/PIPS/pips/src/Libs/effects-convex/utils.c
@@ -2609       ((int (*)()) compare_region_inequalities), NULL);
@@ +2609       ((int (*)()) compare_region_inequalities));

par4all-p4a/packages/PIPS/pips/src/Libs/task_parallelization/instrumentation.c
@@ +4        char *strdup(const char *s);
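
The strdup line is probably needed because strdup is a POSIX function rather than ISO C: when a file is compiled in a strict mode such as -std=c99, glibc's <string.h> hides its declaration unless a feature-test macro such as _POSIX_C_SOURCE is defined. That is a guess at the cause, not something verified against the PIPS build flags. A portable fallback that avoids the dependency could be sketched as below; dup_string is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical replacement for strdup using only ISO C library calls.
   Returns a malloc'd copy of s (caller frees), or NULL if malloc fails. */
char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;   /* include the terminating NUL */
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}
```

Adding the prototype by hand, as in the change above, also works when the C library does provide strdup; the sketch is only for the case where relying on that is not wanted.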

I really should have documented what I did to get it working the first time.

going to try it out this weekend
« Last Edit: March 03, 2017, 05:23:45 AM by 3dickulus »
3dickulus
« Reply #9 on: March 05, 2017, 02:40:54 AM »

I got par4all compiled/installed and candel/make.sh works on the original files, producing the executables candel, candel-omp and candel-cuda, but...
when I run p4a -vv --c99 --cuda candel.c -o candel-cuda it fails after generating candel.p4a.c and before generating new p4a_accel.cpp and p4a_launcher.cpp
Code:
candel.p4a.c:23:19: error: redefinition of '__bswap_64'
/usr/include/bits/byteswap.h:109:1: note: previous definition of '__bswap_64' was here
...
candel.p4a.c:23:19: warning: '__bswap_64' defined but not used [-Wunused-function]

Commenting out the '__bswap_64' definition in candel.p4a.c allows it to compile, but without generating new p4a_accel.cpp and p4a_launcher.cpp.

The first time I did this, iirc, I didn't make any changes to the p4a python code; it was just a couple of things in c files and setting paths.

hmmm... maybe a python brain can help find the bit that inserts the '__bswap_64' definition and let the p4a.py script continue with the process?
claude
« Reply #10 on: March 05, 2017, 03:59:11 AM »

yes I gave up after getting *loads* of duplicate definitions causing compilation failures, seems all the include files of the p4a'd program get stuffed into the output C, which then causes problems when the p4a_accel.h includes some of them again.  But I might have done something stupid when editing some python files to make them run without crashing with stack trace dumps...
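
If that is what is happening, the mechanism can be sketched in a few lines of C. In a normal header, an include guard makes a second #include harmless. Once a header has been preprocessed and pasted into the output file, its body is still there but the guard macro is not, so a later #include of the same header defines everything a second time. The example below uses made-up names (DEMO_BYTESWAP_H, demo_bswap64) and only imitates the situation; it is not taken from byteswap.h or from p4a's output.

```c
#include <stdint.h>

/* First copy: what a guarded header looks like when included normally. */
#ifndef DEMO_BYTESWAP_H
#define DEMO_BYTESWAP_H
static inline uint64_t demo_bswap64(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xff);  /* append the lowest remaining byte of x */
        x >>= 8;
    }
    return r;
}
#endif

/* Second inclusion of the same header: skipped, because the guard macro is
   already defined. Delete the #ifndef/#define/#endif lines around the first
   copy (which is what pasting preprocessed text amounts to) and this second
   copy becomes a redefinition error. */
#ifndef DEMO_BYTESWAP_H
#define DEMO_BYTESWAP_H
static inline uint64_t demo_bswap64(uint64_t x) { return x; }
#endif
```

As written this compiles cleanly and demo_bswap64 reverses the byte order; removing the first guard reproduces the same kind of "redefinition of" error as the '__bswap_64' one above.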

The syntax error I mentioned is an easy fix, at least when hackily hacked into the SWIG output (the file is generated from some spec, but I didn't dig deep to fix it properly).