Double-double-precision is a number format that uses two double-precision numbers to represent a single number.
(Technically, this technique could use any number of floating-point numbers, even of different kinds.)
The first double of the number is called the high part, the other the low part. The high part encodes the main portion of the number, e.g. 3.141592653589793, and the low part encodes the residue (π - hi), as closely as a double can represent it:
   π = 3.14159265358979323846264338327950288...
- hi = 3.14159265358979311599796346854419
-----------------------------------------
  lo = 0.00000000000000012246467991473531
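To make the hi/lo split concrete, here is a minimal sketch that derives both parts from a decimal string using java.math.BigDecimal. The class and method names (HiLoSplit, split) are made up for this illustration and are not part of my library.

```java
import java.math.BigDecimal;

final class HiLoSplit {
    /** Splits an exact decimal value into {hi, lo}: hi is the nearest double, lo the rounded residue. */
    static double[] split(BigDecimal exact) {
        double hi = exact.doubleValue();
        // new BigDecimal(double) converts exactly, so the subtraction itself loses nothing;
        // only the final doubleValue() rounds the residue to a double.
        double lo = exact.subtract(new BigDecimal(hi)).doubleValue();
        return new double[] {hi, lo};
    }

    public static void main(String[] args) {
        BigDecimal pi = new BigDecimal("3.14159265358979323846264338327950288");
        double[] parts = split(pi);
        System.out.println("hi = " + parts[0]);
        System.out.println("lo = " + parts[1]);
    }
}
```

The hi part comes out equal to Math.PI, and the lo part is the small residue from the subtraction shown above.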
Although a double-double occupies the same 128 bits as an IEEE quad-precision floating-point number, it does not have the same precision. This is because it stores two signs, two exponents and two mantissas.
Number of bits:
Type                       | Sign | Exponent | Significand | Total |
Half precision (IEEE 754r) |    1 |        5 |          10 |    16 |
Single                     |    1 |        8 |          23 |    32 |
Double                     |    1 |       11 |          52 |    64 |
Double-double              |    2 |       22 |         104 |   128 |
Quad                       |    1 |       15 |         112 |   128 |
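The "Double" row can be checked directly in Java by pulling the three fields out of a double's bit pattern. This is only a sketch; the class and method names are invented for the example. A double-double simply carries two such layouts side by side.

```java
final class DoubleFields {
    /** The single sign bit: 0 for positive, 1 for negative. */
    static int sign(double x) {
        return (int) (Double.doubleToLongBits(x) >>> 63);
    }

    /** The 11 exponent bits, stored with a bias of 1023. */
    static int biasedExponent(double x) {
        return (int) ((Double.doubleToLongBits(x) >>> 52) & 0x7FFL);
    }

    /** The 52 explicitly stored significand bits (the leading 1 of a normal number is implicit). */
    static long storedSignificand(double x) {
        return Double.doubleToLongBits(x) & 0x000FFFFFFFFFFFFFL;
    }

    public static void main(String[] args) {
        double x = -1.5;
        System.out.println("sign        = " + sign(x));
        System.out.println("exponent    = " + biasedExponent(x));
        System.out.println("significand = " + Long.toBinaryString(storedSignificand(x)));
    }
}
```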
Even though double-double has less precision than quad precision, it is able to store numbers quad precision can't. Consider this number:
1.00000000000000000000000000000000001
which in quad precision would just be 1.0, but can be expressed in double-double precision by the tuple (1.0, 1.0 × 10^-45).
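One way to see why a pair can hold such a value while quad precision cannot is to count how many significand bits a single binary float would need to store hi + lo exactly. The sketch below assumes a low part of 1.0e-45 and ignores exponent range; the names SignificandWidth and bitsNeeded are made up for this example.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class SignificandWidth {
    // Every finite double is an integer multiple of 2^-1074, so scaling by 2^1074 yields an integer.
    private static final BigDecimal SCALE = new BigDecimal(BigInteger.ONE.shiftLeft(1074));

    /** Number of significand bits a single binary float would need to hold hi + lo exactly. */
    static int bitsNeeded(double hi, double lo) {
        BigDecimal exactSum = new BigDecimal(hi).add(new BigDecimal(lo));
        BigInteger scaled = exactSum.multiply(SCALE).toBigIntegerExact().abs();
        if (scaled.signum() == 0) {
            return 1;
        }
        // Distance from the highest set bit down to the lowest set bit, inclusive.
        return scaled.bitLength() - scaled.getLowestSetBit();
    }

    public static void main(String[] args) {
        System.out.println(1.0 + 1.0e-45 == 1.0);           // a plain double drops the low part
        System.out.println(bitsNeeded(1.0, 0x1p-60));        // 61, fits in quad's 113 bits
        System.out.println(bitsNeeded(1.0, 1.0e-45) > 113);  // too wide for quad
    }
}
```

Quad precision has 113 significand bits (112 stored plus the implicit leading bit), so any pair that needs more than that cannot be represented by a single quad value.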
Arithmetic with double-double precision is not straightforward, but it is nonetheless a lot faster than pure software emulation (e.g. arbitrary precision).
Consider these two basic functions in Java:
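    public DoubleDouble add(DoubleDouble y)
    {
        double a, b, c, d, e, f;
        e = this.hi + y.hi;   // rounded sum of the high parts
        d = this.hi - e;
        a = this.lo + y.lo;   // rounded sum of the low parts
        f = this.lo - a;
        // rounding error of the high sum, plus the low sum
        d = ((this.hi - (d + e)) + (d + y.hi)) + a;
        b = e + d;            // normalize: new high part
        // rounding error of the low sum, plus the residue of the normalization
        c = ((this.lo - (f + a)) + (f + y.lo)) + (d + (e - b));
        a = b + c;            // final normalization
        return new DoubleDouble(a, c + (b - a));
    }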

    public DoubleDouble mul(DoubleDouble y)
    {
        double a, b, c, d, e;
        // split this.hi into two halves a + b (0x08000001 = 2^27 + 1)
        a = 0x08000001 * this.hi;
        a += this.hi - a;
        b = this.hi - a;
        // split y.hi into two halves c + d
        c = 0x08000001 * y.hi;
        c += y.hi - c;
        d = y.hi - c;
        e = this.hi * y.hi;   // rounded product of the high parts
        // rounding error of e, plus the cross terms
        c = (((a * c - e) + (a * d + b * c)) + b * d) + (this.lo * y.hi + this.hi * y.lo);
        a = e + c;            // final normalization
        return new DoubleDouble(a, c + (e - a));
    }

In particular, after some (sub-)operations and at the very end, the numbers must be 'normalized' (the pairs of steps such as a = e + c followed by c + (e - a)). This is the process of storing as much of the complete result (hi+lo) as possible in the hi part, and then subtracting the hi part from the original number, storing the remainder in the lo part. This introduces a lot of odd-looking parenthesized structures in the code. When these structures are simplified according to the rules of algebra, the functions will actually break, because every floating-point operation rounds and the order of operations therefore matters.
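The building blocks behind these structures can be shown in isolation. The sketch below is not code from my library; the names (ErrorFree, twoSum, twoSumSimplified, twoProd, SPLITTER) are chosen for this illustration. It computes a rounded sum or product together with its rounding error, and includes an algebraically "simplified" sum to show how the error term disappears.

```java
final class ErrorFree {
    // 2^27 + 1, the same constant that appears as 0x08000001 in mul
    static final double SPLITTER = 0x08000001;

    /** Returns {s, err} where s is the rounded sum and s + err equals a + b exactly (barring overflow). */
    static double[] twoSum(double a, double b) {
        double s = a + b;
        double bb = s - a;
        double err = (a - (s - bb)) + (b - bb);
        return new double[] {s, err};
    }

    /** The same error formula after algebraic simplification: the error term collapses to zero. */
    static double[] twoSumSimplified(double a, double b) {
        double s = a + b;
        double err = (a + b) - s;
        return new double[] {s, err};
    }

    /** Returns {p, err} where p is the rounded product and p + err equals a * b (barring overflow/underflow). */
    static double[] twoProd(double a, double b) {
        double p = a * b;
        // split each factor into a high and a low half so the partial products are exact
        double t = SPLITTER * a;
        double aHi = t - (t - a);
        double aLo = a - aHi;
        t = SPLITTER * b;
        double bHi = t - (t - b);
        double bLo = b - bHi;
        double err = ((aHi * bHi - p) + aHi * bLo + aLo * bHi) + aLo * bLo;
        return new double[] {p, err};
    }

    public static void main(String[] args) {
        System.out.println(twoSum(1.0, 1e-20)[1]);            // the part a plain double would drop
        System.out.println(twoSumSimplified(1.0, 1e-20)[1]);  // 0.0: the information is gone
        System.out.println(twoProd(1 + 0x1p-30, 1 + 0x1p-30)[1] == 0x1p-60);
    }
}
```

For 1.0 + 1e-20 the rounded sum is 1.0 and twoSum recovers 1e-20 as the error, while the simplified version returns 0.0 and silently loses it.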
Beware of this when choosing optimization flags that allow the compiler to reorder floating-point arithmetic (for example GCC's -ffast-math).
I used the (somewhat buggy) QD library in C++ from the site http://crd.lbl.gov/~dhbailey/mpdist/index.html as a reference, learned the underlying technique, fixed many bugs and improved efficiency.
PS: I think Windows Calculator (WinXP and newer) uses double-double precision. The numbers I get for every operation I can think of have the same number of digits as my double-double implementation generates, which would be a rather odd coincidence if it used another kind of emulation.
I've attached my up-to-date Java library. For those thinking about giving the original C version a shot: it's not recommended. Just take a look at the included improvement notes I recorded as I was porting it.