knighty
Fractal Iambus
Posts: 819
|
|
« Reply #45 on: December 01, 2015, 02:09:34 PM » |
|
Thank you for the tests (and the switches.) I gess you tested the program with default value of period. You can change it by passing the period as a command line argument (for a period of 25 : MandelRoots 25 ... beweare that's more than 16 millions roots. Using double float you can go for 31 period which means more than 1 billion roots... if it works correctly) The numbers I gave were with a 32bit build. -Ofast gives 25 times faster results than -O3. for a 64bit buil its "only" about 3 times faster. Using long double or floats there is no big difference between -O3 and -Ofast and they are much slower than double. I don't think vectorization have something to do with this discreapancy because the evaldraw script is almost as fast as -Ofast (with doubles). Evaldraw's compiler, which is a JIT, does'nt do any vectorization optimization. It uses SSE though. @ Adam Majewski: Thank you for the tip. I didn't know about the -std=c++11 switch. @ zebastian: Thank you! I gess the problem comes from store() which is called from findRootsRecurse(). store() stores the root in a vector array. It is not thread safe at all. Maybe using multiple instances of MRootsCalc class ?
|
|
|
Logged
|
|
|
|
Adam Majewski
|
|
« Reply #46 on: December 01, 2015, 04:20:01 PM » |
|
g++ -Ofast -std=c++11 m.cpp -fopenmp -Wall a@zalman:~/cpp/mroots/m2$ time ./a.out tab size: 1024 nbrRoots (expected) = 512 Searchig for roots... *** Error in `./a.out': double free or corruption (!prev): 0x0000000000dd5080 *** Przerwane (core dumped)
real 0m0.096s user 0m0.025s sys 0m0.073s a@zalman:~/cpp/mroots/m2$ g++ m.cpp -std=c++11 -fopenmp -Wall a@zalman:~/cpp/mroots/m2$ time ./a.out tab size: 1024 nbrRoots (expected) = 512 Searchig for roots... roots found (with duplicates): 1017 Removing duplicates... roots found: 243 243 time taken : 0.069925 s Saving results to save.txt
|
|
|
Logged
|
|
|
|
quaz0r
Fractal Molossus
Posts: 652
|
|
« Reply #47 on: December 01, 2015, 08:05:32 PM » |
|
Evaldraw's compiler, which is a JIT, does'nt do any vectorization optimization. It uses SSE though. not sure what you mean? sse is a vector unit
|
|
|
Logged
|
|
|
|
|
quaz0r
Fractal Molossus
Posts: 652
|
|
« Reply #49 on: December 01, 2015, 09:07:34 PM » |
|
vectors are most definitely not thread safe. if one was to write proper threaded code you would definitely use a thread synchronization utility like a semaphore or mutex for such things. if you are just going to throw an openmp pragma around a block of code, there is also openmp stuff for specifying how to synchronize things.
|
|
|
Logged
|
|
|
|
lycium
|
|
« Reply #50 on: December 01, 2015, 10:02:28 PM » |
|
A simple way around this is to make N vectors (for N threads) and then just write into the vector indexed by omp_get_thread_num(). Combine after processing is done.
|
|
|
Logged
|
|
|
|
quaz0r
Fractal Molossus
Posts: 652
|
|
« Reply #51 on: December 01, 2015, 10:05:31 PM » |
|
my own preference is to just use a normal C array and write into that. it is thread safe as long as your threads stick to their own designated ranges of the array.
|
|
|
Logged
|
|
|
|
lycium
|
|
« Reply #52 on: December 01, 2015, 10:09:58 PM » |
|
it is thread safe as long as your threads stick to their own designated ranges of the array.
Somewhat circular argument Besides, the idea is to not have mutex locking, atomics, etc.
|
|
|
Logged
|
|
|
|
knighty
Fractal Iambus
Posts: 819
|
|
« Reply #53 on: December 01, 2015, 10:26:04 PM » |
|
Somewhat circular argument Besides, the idea is to not have mutex locking, atomics, etc. Simple! Obvious! Genius!
|
|
|
Logged
|
|
|
|
quaz0r
Fractal Molossus
Posts: 652
|
|
« Reply #54 on: December 01, 2015, 10:29:12 PM » |
|
Somewhat circular argument Besides, the idea is to not have mutex locking, atomics, etc. sorry, what? i was simply pointing out that multiple threads writing to different sections of one array is thread safe and does not require any thread synchronization techniques, like mutexes or atomics as you said. simple, fast, and does not require any other screwing around like creating multiple vectors and then combining them later.
|
|
|
Logged
|
|
|
|
lycium
|
|
« Reply #55 on: December 01, 2015, 10:33:24 PM » |
|
sorry, what?
The statement is roughly, "C arrays are thread safe as long as you use them in a thread-safe way" So my suggestion is basically a way to do that, with whatever structure you like (C array, vector, whatever), without any contention or locking. Note also that we don't know the number of roots ahead of time, so the simple C array would turn into something like std::vector anyway.
|
|
|
Logged
|
|
|
|
quaz0r
Fractal Molossus
Posts: 652
|
|
« Reply #56 on: December 01, 2015, 10:46:42 PM » |
|
Note also that we don't know the number of roots ahead of time, so the simple C array would turn into something like std::vector anyway. ah, indeed that puts a damper on using a C array The statement is roughly, "C arrays are thread safe as long as you use them in a thread-safe way" i guess im just confused as to why you feel the need to be hostile about this...lol...it wasnt a "circular argument." i was simply pointing out that writing to C arrays is not thread UNsafe in the way that writing to std containers like std::vector is. did you already know that? thats wonderful. i thought this was a discussion board where people discuss things, my bad. besides, a lot of people here tend to be mathematicians first, with the code writing often happening in a more by-gee and by-golly sort of manner, so fostering code-related discussion seems like a reasonable thing to me
|
|
|
Logged
|
|
|
|
lycium
|
|
« Reply #57 on: December 01, 2015, 10:50:23 PM » |
|
Awwww, I wasn't being hostile I think? No hostility intended - I hope you see what I mean about circular argument (argument not as in "having a heated argument", rather "point being made"). Actually I would say calling "my" approach "screwing around" is more hostile than suggesting a circular argument is circular...PS. God knows I prefer low-level arrays too, where possible But there's nothing to fear in std::vector, it's indistinguishable (compiles to just a MOV) from an array if you use resize() and only index in valid range, with range checks turned off (in MSVC).
|
|
« Last Edit: December 01, 2015, 10:58:11 PM by lycium »
|
Logged
|
|
|
|
knighty
Fractal Iambus
Posts: 819
|
|
« Reply #58 on: December 01, 2015, 10:59:27 PM » |
|
oops! wrong quote. I was actually referring to quazor's idea: getting rid of the syncronization. Doing it using multiple arrays also solves the problem. The problem is not that we don't the number of roots (because we do: 2^(m_maxiter) but it is that the number of duplicates is not obvious to estimate (potentially a root can be reported up to 8 times because of the affine arithmetics and rounding errors).
|
|
|
Logged
|
|
|
|
lycium
|
|
« Reply #59 on: December 01, 2015, 11:01:42 PM » |
|
*sigh* I'd better leave this discussion
|
|
|
Logged
|
|
|
|
|