Logo by Cyclops - Contribute your own Logo!


it was a great time but no longer maintainable by c.Kleinhuis contact him for any data retrieval,
thanks and see you perhaps in 10 years again

this forum will stay online for reference
News: Support us via Flattr FLATTR Link
Welcome, Guest. Please login or register. June 27, 2019, 03:21:13 PM

Login with username, password and session length

The All New FractalForums is now in Public Beta Testing! Visit FractalForums.org and check it out!

Pages: 1 [2]   Go Down
Share this topic on DiggShare this topic on FacebookShare this topic on GoogleShare this topic on RedditShare this topic on StumbleUponShare this topic on Twitter
Author Topic: bignum benchmark?  (Read 5555 times)
0 Members and 1 Guest are viewing this topic.
David Makin
Global Moderator
Fractal Senior
Posts: 2286

Makin' Magic Fractals
« Reply #15 on: September 25, 2011, 02:38:04 PM »

In case it helps here's a quote from Damien Jones' "Whoosh", though it's considerably deeper than 1e-100:

"The final frame of this movie needs 35 decimals, and has a minimum iteration count of more than 1.6 million (the maximum calculated was just under ten million)."

'35 decimals' is not merely ~1e-35?  huh?

Duh ! Thanks Marius - I really should concentrate more ! It looked deeper than some of the 1e-100 zooms I've seen before, but probably because they cheated, being in dull and uninteresting locations that are quick to calculate wink
Post corrected smiley

The meaning and purpose of life is to give life purpose and meaning.

"Makin' Magic Music" on Jango
Forums Freshman
Posts: 13

« Reply #16 on: September 28, 2011, 04:48:04 PM »

There is a trick to eliminate the x*y multiplication in your code: 2*x*y = (x+y)*(x+y) - x*x - y*y, that is 3 squares...

Hi! Thanks for your suggestion!

A fast revisit to my code, to find out how can I improve it with your hint, turned out that the sqr() function is also unoptimal. It should accumulate the symmetric (a5*a8=a8*a5) multiplications and then shift it left by 1, rather that accumulate and add the previously calculated sum. (This optimization was done by my little optimizer, but at least automatically)

So let's say the x*y bigmul uses 4+4 instructions in it's basic 'cell' (1 mul_lo or mul_hi, plus an add, then a carry check and finnally an add for the accumulated carry values)
And the s*s bigmul uses 4+3 instructions (the smaller '3' cell is only reusing a previous multiplication result)

Then x*x, y*y, x*y costs (4+3)*2+(4+4) = 22 (based on the limbcount^2).
Further optimizing with the trick: x*x, y*y, (x+y)^2 costs (4+3)*3 = 21
And finally making sqr() with the proper optimization (not with the dumb automatic optimizer I used cheesy) can reduce it to 4*3=12 (around 2x faster)

(note: the basic computing 'cell' in my long multiplication algo is this:
1. calculate a*b (return hi or lo 32bits)
2. add it to 'Accum'
3. compare new 'Accum' with previous Accum value (result is -1 when old>new, otherwise 0)
4. add the result of the comparison(3.) to the 'CarryAccum' register. )

Well, in theory this could be cool cheesy
Wish I have more time to play with it...
Botond Kósa
Fractal Lover
Posts: 233

« Reply #17 on: October 06, 2011, 10:00:22 AM »

1. calculate a*b (return hi or lo 32bits)
2. add it to 'Accum'
3. compare new 'Accum' with previous Accum value (result is -1 when old>new, otherwise 0)
4. add the result of the comparison(3.) to the 'CarryAccum' register. )

Another possible way of speeding up your basic computing cell is to substitute the comparison in step 3 with some bitwise operations and shifting. For example, wher using signed integers,
if(old>new) then result = -1; else result = 0;
can be replaced with
result = (new-old)>>31;
which does the same, since if new-old is negative, its two's complement representation starts with 1, otherwise with 0. Shifting it right by 31 bits preserves the sign of the number, resulting in 0 for nonnegative, -1 for negative numbers (with all bits set to 1). Although this is 2 operations, it will be faster than the IF statement. Comparisons are slow because they cause a conditional branch in the code, and despite the advanced branch predictors in modern CPUs, they can cause a pipeline flush, which degrades performance.

Check out my Mandelbrot set explorer:
Fractal Phenom
Posts: 443

« Reply #18 on: November 03, 2014, 12:17:52 PM »


myself and an associate have implemented maths with big numbers in CUDA. We use warp synchronous programming, i.e. 32 threads in combination process the maths for one number. The minimum system requirement is an nVidia GPU with Compute Capability 3.0 (Kepler and later).

Our current approach is a 1024 bit unsigned integer implementation, using limbs of 32 bit in each thread. We have implemented bit shifts, comparison, addition, multiplication (with 2048 bit result split into lo and hi parts), division. I also started working on a 2048 bit version using 64 bit integers per thread. I think going up to 4096 bits should be feasible too - afterwards it gets increasingly difficult to scale this up.

We've also done modular exponentiation with the Montgomery multiplication (for crypto related applications like e.g. dealing with large prime numbers).

It would be lovely to do some deep Mandelbrot zooms with these libraries, but I fear in its current form we can't do it unless we track the sign bit of each operation separately.
« Last Edit: November 03, 2014, 12:21:51 PM by cbuchner1 » Logged
Pages: 1 [2]   Go Down
Jump to:  

Related Topics
Subject Started by Replies Views Last post
A processor benchmark for Mandelbulb 3d Mandelbulb 3d « 1 2 3 » ant123 44 12704 Last post December 10, 2017, 09:44:52 PM
by AtomicNixon
Fractal Extreme Performance Benchmark Fractal eXtreme stardust4ever 0 6568 Last post March 21, 2012, 03:21:06 PM
by stardust4ever
Benchmark-file for Mandelbulb3D, Mandelbulber, etc. Let's collaborate on something! VanlindtMarc 3 1028 Last post August 04, 2013, 12:26:52 PM
by taurus
squaring benchmark Complex Numbers claude 13 668 Last post April 11, 2016, 09:43:51 AM
by Kalles Fraktaler

Powered by MySQL Powered by PHP Powered by SMF 1.1.21 | SMF © 2015, Simple Machines

Valid XHTML 1.0! Valid CSS! Dilber MC Theme by HarzeM
Page created in 0.107 seconds with 27 queries. (Pretty URLs adds 0.009s, 2q)