An Improved Algorithm for hypot(a,b)
Carlos F. Borges

TL;DR
This paper introduces a new, faster, and more accurate algorithm for computing the hypotenuse of two floating point numbers, improving upon existing library functions in Julia and C.
Contribution
The paper presents four novel algorithms for hypot computation and compares their performance to standard library implementations.
Findings
New algorithms outperform existing functions in speed and accuracy
Simulation results demonstrate significant improvements
Enhanced hypot computation benefits scientific and engineering applications
Abstract
We develop a fast and accurate algorithm for evaluating for two floating point numbers and . Library functions that perform this computation are generally named {\tt hypot(a,b)}. We will compare four approaches that we will develop in this paper to the current resident library function that is delivered with Julia 1.1 and to the code that has been distributed with the C math library for decades. We will demonstrate the performance of our algorithms by simulation.
| Method | One ulp errors (%) | Two ulp errors (%) |
|---|---|---|
| Julia1.1_hypot() | 35.08 | 0.16 |
| clib_hypot() | 12.91 | 0 |
| Naive (Unfused) | 16.70 | 0 |
| Naive (Fused) | 13.02 | 0 |
| Corrected (Unfused) | 0.54 | 0 |
| Corrected (Fused) | 0 | 0 |
| Naive | Corrected | |||||
| N | Julia1.1_hypot() | clib_hypot() | Unfused | Fused | Unfused | Fused |
| 0 | 29.2282102 | 11.9488456 | 15.5629732 | 12.6486134 | 0.8005489 | 0 |
| 1 | 36.6616214 | 13.8082644 | 17.1661734 | 13.6513319 | 0.6212053 | 0 |
| 2 | 39.193388 | 13.1111126 | 17.087418 | 13.048634 | 0.5320131 | 0 |
| 3 | 39.9510329 | 13.0036249 | 17.2513067 | 12.9993049 | 0.2760227 | 0 |
| 4 | 40.1496371 | 12.9966526 | 17.307515 | 12.9960858 | 0.0799361 | 0 |
| 5 | 40.2052568 | 12.9972241 | 17.3246431 | 12.9970895 | 0.0212722 | 0 |
| 6 | 40.2194854 | 12.9962604 | 17.3277278 | 12.996271 | 0.005506 | 0 |
| 7 | 40.2215552 | 12.9957037 | 17.3283368 | 12.9957233 | 0.0013865 | 0 |
| 8 | 40.2199042 | 12.995122 | 17.3271957 | 12.9951272 | 0.0003476 | 0 |
| 9 | 40.2189412 | 12.9970594 | 17.3293785 | 12.997061 | 8.73E-05 | 0 |
| 10 | 40.2212047 | 12.9959983 | 17.3285051 | 12.9959978 | 2.31E-05 | 0 |
| 11 | 40.2206441 | 12.9956756 | 17.3285992 | 12.9956754 | 6.00E-06 | 0 |
| 12 | 40.1675266 | 12.9963116 | 17.3284901 | 12.9963118 | 7.00E-07 | 0 |
| 13 | 40.7464532 | 12.9970054 | 17.3285922 | 12.9970032 | 7.00E-07 | 0 |
| 14 | 44.5853178 | 12.9967234 | 17.3279259 | 12.9967244 | 0 | 0 |
| 15 | 45.3009456 | 12.997981 | 17.3296289 | 12.99798 | 0 | 0 |
| 16 | 45.3487089 | 12.9976653 | 17.3282103 | 12.9976649 | 0 | 0 |
| 17 | 45.3496228 | 12.9961767 | 17.3260454 | 12.9961782 | 0 | 0 |
| 18 | 45.351359 | 12.9986208 | 17.3302856 | 12.9986212 | 0 | 0 |
| 19 | 45.3513136 | 12.9964465 | 17.3285827 | 12.9964469 | 0 | 0 |
| 20 | 45.3510285 | 12.9965624 | 17.3302195 | 12.9965623 | 0 | 0 |
| 21 | 45.3503113 | 12.9941995 | 17.3281136 | 12.9942002 | 0 | 0 |
| 22 | 45.3454522 | 12.9965376 | 17.3279103 | 12.996537 | 0 | 0 |
| 23 | 45.3151619 | 12.9971009 | 17.3284377 | 12.9971044 | 0 | 0 |
| 24 | 45.1841774 | 13.0202852 | 17.3391219 | 13.0202826 | 0 | 0 |
| 25 | 46.6619492 | 13.2828222 | 17.261221 | 13.2828215 | 0 | 0 |
| 26 | 58.5652028 | 16.8179593 | 21.0366657 | 13.7850471 | 0 | 0 |
| 27 | 0 | 4.0541284 | 4.3904588 | 2.1950835 | 0 | 0 |
| 28 | 0 | 0 | 0 | 0 | 0 | 0 |
| 29 | 0 | 0 | 0 | 0 | 0 | 0 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNumerical Methods and Algorithms
An Improved Algorithm for hypot(a,b)
Carlos F. Borges
Department of Applied Mathematics
Naval Postgraduate School
Monterey CA 93943
(Date: April 30, 2019.)
Abstract.
We develop a fast and accurate algorithm for evaluating for two floating point numbers and . Library functions that perform this computation are generally named hypot(a,b). We will compare four approaches that we will develop in this paper to the current resident library function that is delivered with Julia 1.1 and to the code that has been distributed with the C math library for decades. We will demonstrate the performance of our algorithms by simulation.
Key words and phrases:
hypot(), floating point, fused multiply-add, IEEE 754
2000 Mathematics Subject Classification:
Primary 65Y04
Given two floating point numbers and we wish to compute quickly and accurately. Library functions that perform this computation are generally named hypot(a,b). And although the IEEE 754 standard suggests (but does not require) that computer languages provide a version that gives a correctly rounded answer, we are not aware of any current language in wide use that does so.
We note that the problem is trivial if either operand is zero and further that the magnitudes of the operands are important but the signs are not. Therefore, we shall confine our in depth analysis to the case where . Enforcing this condition will be an initialization step when we develop an actual code. This paper will be restricted to the case where all floating point calculations are done in IEEE 754 compliant arithmetic using round-to-nearest rounding, although many of the results can be extended to other formats under proper conditions.
1. Existing approaches
The need to compute is common in numerical computation and so there are a few existing standard codes for doing so. A common textbook approach is the widely known stable algorithm for this computation that is used in Julia 1.1 which we shall call Julia1.1_hypot(a,b). It essentially amounts to
Algorithm 1**.**
Julia1.1_hypot()* *
if* a == 0 then *
* h = 0 *
else
* r = b/a *
* h = asqrt(1+rr) *
end* *if
This approach avoids unecessary overflow/underflow that might occur in an interim calculation when very large/small inputs are squared. In the non-trivial case it accomplishes this by rescaling so that the quantity that must be squared is and this cannot lead to an overflow/underflow error prior to the calculation of the square root. Overflow is only possible if the true value of is beyond the range of the floating point format.
The downside of this approach is that it rescales even when rescaling is unecessary and that rescaling generates additional rounding error. We will see in our experiements that this approach sometimes give answers that are as far as two ulps from the correctly rounded answer.
The second existing approach is the code that has been delivered with the standard C math library since at least 1991. The code appears in appendix B and we shall refer to it as clib_hypot(). It deals with widely varying operands and prevents intermediate overflow/underflow using methods similar to those we will use in ours.111There is a clear flaw in the threshold used for widely varying operands and it is much broader than necessary. It works by using a variety of tricks to artificially extend the precision of the computed so that they get a correctly rounded value (or very nearly so). This is then passed to the sqrt() function. What we will observe in the testing is that all this additional work can be effectively superseded by a single fused multiply-add.
2. Mathematical Preliminaries
We begin with a few definitions. We shall denote by the set of all strictly positive floating point numbers in the normalized range of the current format. We define to be a function such that is the element of that is closest to provided lies within the normalized range of the current format. We define machine epsilon, which we will denote by , to be the difference between and the closest larger element of . For example, in IEEE 754 double precision . We define and to be, respectively, the largest and smallest elements of .
We will also need the following three lemmas. The first two both relate to floating point computations.
Lemma 1**.**
Let lie within the normalized range of the current format. Then
[TABLE]
for some satisfying
This lemma is very well known and can be found in [1]. A lesser known but related lemma is:222To see why it’s true simply consider the operation 1.0*(1.0 - eps/3.9).
Lemma 2**.**
Let be any normalized floating point number. Then
[TABLE]
for any satisfying .
And finally, this analytical lemma will be useful in what follows and is easily established by examining the Maclaurin series expansion of :
Lemma 3**.**
If then
3. Operands with widely differing magnitudes
To see what happens when have widely differing magnitudes observe that
[TABLE]
where the inequality follows by replacing with in lemma 3. Now, if
[TABLE]
then lemma 2 guarantees that . We can rewrite the inequality and accept as the correctly rounded answer whenever . Note that this form will also yield the exact answer whenever . If we test for this case first in our code then we will not have to separately check for zero operands in the initialization phase.
4. Operands without widely differing magnitudes
When the operands do not have widely differing magnitudes then our approach will be to compute with some care and use the built-in sqrt() function. There could be an unecessary floating point exception if the calculation of leads to an overflow/underflow/denormalization and we need to avoid that where possible. Clearly, there can be no overflow if and there can be no underflow/denormalization provided that . If one of these is violated333At this stage of the process it is not possible to violate both because such a pair of operands would have widely differing magnitudes. we can avoid the exception by rescaling the operands, doing the calculation, and then scaling back. When such rescaling is necessary there are two options. Some codes use the standard library functions frexp() and ldexp() to directly change the exponents and hence effect the rescaling without incurring rounding errors. The only downside of this is that it can be a bit more costly and it is not as direct. A better approach is to rescale by multiplying by a power of the base. This has the same effect as directly changing the exponents (i.e. there is no rounding error) but it leverages the hardware floating point operations which are generally much faster. For this to work the rescaling constant needs to be an appropriate integer power of the base. The precise range of appropriate values is dependent on the particular floating point format but one relatively format independent method for finding an appropriate rescaling value is to use eps(sqrt(floatmin(T))) where T is the floating point type. This will give a rescaling whose value is smaller than and whose reciprocal is larger than . This can be used to rescale and then scale back the operands with no rounding error. Moreover, the cumbersome calculation of this constant should be taken care of at compile time so there is no additional cost.
A little rounding error analysis will be very useful at this point. Let be the result of our floating point calculation of . Because this calculation will be done in floating point there must be some , whose value we do not yet know, such that
[TABLE]
Now let . Since we are assuming that the sqrt() function is correctly rounded lemma 1 implies that for some . Putting this all together and invoking lemma 3 leads to
[TABLE]
The first thing we should observe is that if we were able to compute a correctly rounded then . Plugging in above we see that the relative error in the computed could be as large as
[TABLE]
and hence the computed , after rounding, could be as much as one different from the correctly rounded true value.
In a similar fashion if then the relative error in the computed could be as large as
[TABLE]
and hence the computed , after rounding, could not be more than one different from the correctly rounded true value. Careful rounding error analysis of the computation shows that the computed value satisfies
[TABLE]
and hence our computed answer must be within one ulp of the correctly rounded one. An excellent treatment can be found in section 5.3 of [2].
4.1. Naive Approaches
At this point we propose two naive algorithms for the computation in the event that the operands are not widely separated. First is the direct calculation
Algorithm 2**.**
Naive (Unfused)
h = sqrt(aa+bb)* *
And the second uses the fused multiply-add operation which allows us to get a more accurate calculation of .
Algorithm 3**.**
Naive (Fused)
h = sqrt(fma(a,a,bb)) *
4.2. Differential Correction
As we noted at the end of section 4 the correctly rounded square root of the correctly rounded can still be off by as much as one ulp. This hints at the possibility that working harder to compute more accurately may not be the best path to a better answer. That leads us to this final approach which we shall see has a great deal of merit.
A subtle and highly underappreciated phenomenon of floating point computation is that formally invertible functions are often not one-to-one in correctly rounded floating point. The square root function is a perfect example. If we let then it is clear that sqrt() maps the finite set onto a proper subset of itself and hence is not one-to-one by the pigeonhole principle.
Why is this is important? Because it tells us that we should not be looking for a correction that
[TABLE]
but rather we should be looking for a correction such that if is the computed hypoteneuse we have
[TABLE]
If we had such an then we could use the Maclaurin series expansion
[TABLE]
truncated to first order to correct our computed hypoteneuse value.444We note that our testing indicates that there is no point in taking the Maclaurin series correction past the first order term.
Setting allows us to find that and although we cannot generally compute exactly, we do want to do so very carefully. If we let then the reader can verify that
[TABLE]
Note that because and are nearly equal numbers there is no rounding error in computing . The precise arrangement and evaluation order of this formula is critical to get the best performance over the full range of values.
The form above gives good performance over the full range of operands but we can do better if we switch approaches based on the sizes of the arguments. In particular, if then we will set and use
[TABLE]
otherwise, we will set and use
[TABLE]
in both cases is computed without rounding error. It is worth noting that there are many other ways to compute and one can break this problem into lots of tiny pieces and do better but we feel that this approach is the best tradeoff between accuracy and efficient code. We can now use this to apply a correction to the computed value of which leads to our third algorithm which uses only the standard floating point operations.
Algorithm 4**.**
Corrected (Unfused)
h = sqrt(aa+bb)* *
if* h <= 2ay* then *
* delta = h-ay *
* x = ax(2delta - ax)+(delta-2(ax-ay))delta *
else
* delta = h-ax *
* x = 2delta*(ax-2ay) + (4delta-ay)ay + deltadelta* *
end* *if
h = h - x/(2h) *
Finally, if it happens that we are on an architecture that supports the fused multiply-add then it is possible to compute the correction far more accurately. This leads to our final code:
Algorithm 5**.**
Corrected (Fused)
h = sqrt(fma(a,a,bb)) *
h_sq = hh *
a_sq = aa *
x = fma(-b,b,h_sq-a_sq) + fma(h,h,-h_sq) - fma(a,a,-a_sq)* *
h = h - x/(2h) *
5. Testing
We now test our proposed algorithms against the two existing approaches. As a baseline for testing purposes we will use the big float format in Julia to do an arbitrary precision calculation of the hypoteneuse and then cast the result to a double precision floating point number to get a correctly rounded value that we shall call . We note that in some exceedingly rare situations it may not be true that this yields the correctly rounded value (due to the effects of double rounding) but it is certainly good enough for our purposes.
Our first test will consist of creating random pairs of double precision floating point numbers where both are distributed according to a standard normal distribution. We will run all of our algorithms on each pair as well as computing . We summarize the percentage of times each algorithm differed from by exactly one ulp and exactly two ulps in table 1.
We immediately observe several things. First of all, the Julia1.1_hypot() code which is a very widely known approach to the problem performs rather horribly. The willingness to add rounding error to escape intermediate overflow/underflow problems destroys accuracy. Not only does it have the worst performance of all the approaches it is the only one that can produce two ulp errors (although this is admittedly rare). Second, the clib_hypot() and the Naive (Fused) approaches yield very comparable results. Both are basically trying to improve accuracy by computing with additional precision and so it is not surprising that they behave similarly even though the get the additional precision in different ways. Unfortunately, both suffer from the effects of iterated rounding. Finally, and most importantly, we see very clearly the superiority of the corrected approach (whether fused or unfused). When leveraging the fused multiply-add we get correctly rounded results every time. And the unfused version runs almost as well.
A further experiment will more completely display the differences in these algoritms. This time we track their accuracy over a range of relative scales of the operands. For this experiment we will use uniformly distributed operands. In particular, we will let and . In effect, our test pairs will be floating point numbers whose significands are uniformly distributed, but whose exponents differ by . We will run random pairs for each value and will compute the percentage of incorrectly rounded results for each of the six approaches. The results appear graphically in figure 1 as well as numerically in table 2. One can see the stunning difference in performance between the approaches. Note that in table 2 we observe that we are getting true correctly rounded results more than of the time for all relative scales for the unfused version of the corrected approach.
Julia code for all of the algorithms proposed in this paper appear in Appendix A.
6. Conclusions
Both the naive (unfused) and naive (fused) algorithms are clear improvements on the Julia1.1_hypot() code. Moreover, with its more accurate calculation of the naive (fused) algorithm gives results that are directly comparable to the results from clib_hypot() with far less work. In many ways, this demonstrates the limits of trying to do better by just computing more accurately. However, the hands down winner in all of this is the corrected code (both versions). It requires substantially less work than the clib_hypot() code but delivers vastly superior results.
Acknowledgments
The author would like to thank Prof. Lucas Wilcox of the Naval Postgraduate School for invaluable help with the subtleties of the Julia programming language, and also Claude-Pierre Jeannerod of INRIA for several very helpful comments on the analysis.
Appendix A: Julia Source Code for the Tested Algorithms
function MyHypot1(x::T,y::T) where T<:AbstractFloat # Naive (Unfused) #Return Inf if either or both imputs is Inf (Compliance with IEEE754) if isinf(x) || isinf(y) return convert(T,Inf) end
# Order the operands
ax,ay = abs(x), abs(y)
if ay > ax
ax,ay = ay,ax
end
# Widely varying operands
if ay < ax*sqrt(eps(T)/2) #Note: This also gets ay == 0
return ax
end
# Operands do not vary widely
scale = eps(sqrt(floatmin(T))) #Rescaling constant
if ax > sqrt(floatmax(T)/2)
ax = ax*scale
ay = ay*scale
scale = inv(scale)
elseif ay < sqrt(floatmin(T))
ax = ax/scale
ay = ay/scale
else
scale = one(scale)
end
sqrt(ax*ax+ay*ay)*scale
end
function MyHypot2(x::T,y::T) where T<:AbstractFloat # Naive (Fused) #Return Inf if either or both imputs is Inf (Compliance with IEEE754) if isinf(x) || isinf(y) return convert(T,Inf) end
# Order the operands
ax,ay = abs(x), abs(y)
if ay > ax
ax,ay = ay,ax
end
# Widely varying operands
if ay <= ax*sqrt(eps(T)/2) #Note: This also gets ay == 0
return ax
end
# Operands do not vary widely
scale = eps(sqrt(floatmin(T))) #Rescaling constant
if ax > sqrt(floatmax(T)/2)
ax = ax*scale
ay = ay*scale
scale = inv(scale)
elseif ay < sqrt(floatmin(T))
ax = ax/scale
ay = ay/scale
else
scale = one(scale)
end
sqrt(muladd(ax,ax,ay*ay))*scale
end
function MyHypot3(x::T,y::T) where T<:AbstractFloat # Corrected (Unfused) #Return Inf if either or both imputs is Inf (Compliance with IEEE754) if isinf(x) || isinf(y) return convert(T,Inf) end
# Order the operands
ax,ay = abs(x), abs(y)
if ay > ax
ax,ay = ay,ax
end
# Widely varying operands
if ay <= ax*sqrt(eps(T)/2) #Note: This also gets ay == 0
return ax
end
# Operands do not vary widely
scale = eps(sqrt(floatmin(T))) #Rescaling constant
if ax > sqrt(floatmax(T)/2)
ax = ax*scale
ay = ay*scale
scale = inv(scale)
elseif ay < sqrt(floatmin(T))
ax = ax/scale
ay = ay/scale
else
scale = one(scale)
end
h = sqrt(ax*ax+ay*ay)
# This is a well balanced single branch code
# delta = h-ax
#h -= ( delta*(2*(ax-ay)) + (2*delta-ay)*ay + delta*delta)/(2*h)
# End single branch
# This is a well balanced twopar branch code
if h <= 2*ay
delta = h-ay
h -= (ax*(2*delta - ax)+(delta-2*(ax-ay))*delta)/(2*h)
else
delta = h-ax
h -= ( 2*delta*(ax-2*ay) + (4*delta-ay)*ay + delta*delta)/(2*h)
end
# End two branch code
h*scale
end
function MyHypot4(x::T,y::T) where T<:AbstractFloat # Corrected (Fused) #Return Inf if either or both imputs is Inf (Compliance with IEEE754) if isinf(x) || isinf(y) return convert(T,Inf) end
# Order the operands
ax,ay = abs(x), abs(y)
if ay > ax
ax,ay = ay,ax
end
# Widely varying operands
if ay <= ax*sqrt(eps(T)/2) #Note: This also gets ay == 0
return ax
end
# Operands do not vary widely
scale = eps(sqrt(floatmin(T))) #Rescaling constant
if ax > sqrt(floatmax(T)/2)
ax = ax*scale
ay = ay*scale
scale = inv(scale)
elseif ay < sqrt(floatmin(T))
ax = ax/scale
ay = ay/scale
else
scale = one(scale)
end
h = sqrt(fma(ax,ax,ay*ay))
h_sq = h*h
ax_sq = ax*ax
x = fma(-ay,ay,h_sq-ax_sq) + fma(h,h,-h_sq) - fma(ax,ax,-ax_sq)
h-=x/(2*h)
h*scale
end
Appendix B: C Source Code for clib_hypot()
/* @(#)e_hypot.c 1.3 95/01/18 / /
- ====================================================
- Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- Developed at SunSoft, a Sun Microsystems, Inc. business.
- Permission to use, copy, modify, and distribute this
- software is freely granted, provided that this notice
- is preserved.
- ==================================================== */
#include "cdefs-compat.h" //__FBSDID("");
/* __ieee754_hypot(x,y) *
- Method : ΨIf (assume round-to-nearest) z=xx+y*y *Ψhas error less than sqrt(2)/2 ulp, than *Ψsqrt(z) has error less than 1 ulp (exercise).
ΨSo, compute sqrt(xx+yy) with some care as Ψfollows to get the error below 1 ulp: * ΨAssume x>y>0; Ψ(if possible, set rounding to round-to-nearest) Ψ1. if x > 2y use ΨΨx1x1+(yy+(x2(x+x1))) for xx+yy Ψwhere x1 = x with lower 32 bits cleared, x2 = x-x1; else Ψ2. if x <= 2y use ΨΨt1y1+((x-y)(x-y)+(t1y2+t2y)) *Ψwhere t1 = 2x with lower 32 bits cleared, t2 = 2x-t1, *Ψy1= y with lower 32 bits chopped, y2 = y-y1. *ΨΨ *ΨNOTE: scaling may be necessary if some argument is too *Ψ large or too tiny *
- Special cases: *Ψhypot(x,y) is INF if x or y is +INF or -INF; else *Ψhypot(x,y) is NAN if x or y is NAN.
- Accuracy:
- Ψhypot(x,y) returns sqrt(x^2+y^2) with error less
- Ψthan 1 ulps (units in the last place) */
#include <float.h> #include <openlibm_math.h>
#include "math_private.h"
OLM_DLLEXPORT double __ieee754_hypot(double x, double y) { Ψdouble a,b,t1,t2,y1,y2,w; Ψint32_t j,k,ha,hb;
ΨGET_HIGH_WORD(ha,x); Ψha &= 0x7fffffff; ΨGET_HIGH_WORD(hb,y); Ψhb &= 0x7fffffff; Ψif(hb > ha) {a=y;b=x;j=ha; ha=hb;hb=j;} else {a=x;b=y;} Ψa = fabs(a); Ψb = fabs(b); Ψif((ha-hb)>0x3c00000) {return a+b;} /* x/y > 260 / Ψk=0; Ψif(ha > 0x5f300000) {Ψ/ a>2500 / Ψ if(ha >= 0x7ff00000) {Ψ/ Inf or NaN / Ψ u_int32_t low; Ψ / Use original arg order iff result is NaN; quieten sNaNs. / Ψ w = fabs(x+0.0)-fabs(y+0.0); Ψ GET_LOW_WORD(low,a); Ψ if(((ha&0xfffff)|low)==0) w = a; Ψ GET_LOW_WORD(low,b); Ψ if(((hb^0x7ff00000)|low)==0) w = b; Ψ return w; Ψ } Ψ / scale a and b by 2**-600 / Ψ ha -= 0x25800000; hb -= 0x25800000;Ψk += 600; Ψ SET_HIGH_WORD(a,ha); Ψ SET_HIGH_WORD(b,hb); Ψ} Ψif(hb < 0x20b00000) {Ψ/ b < 2**-500 / Ψ if(hb <= 0x000fffff) {Ψ/ subnormal b or 0 / Ψ u_int32_t low; ΨΨGET_LOW_WORD(low,b); ΨΨif((hb|low)==0) return a; ΨΨt1=0; ΨΨSET_HIGH_WORD(t1,0x7fd00000);Ψ/ t1=2^1022 / ΨΨb = t1; ΨΨa = t1; ΨΨk -= 1022; Ψ } else {ΨΨ/ scale a and b by 2^600 / Ψ ha += 0x25800000; Ψ/ a = 2^600 / ΨΨhb += 0x25800000;Ψ/ b = 2^600 / ΨΨk -= 600; ΨΨSET_HIGH_WORD(a,ha); ΨΨSET_HIGH_WORD(b,hb); Ψ } Ψ} / medium size a and b / Ψw = a-b; Ψif (w>b) { Ψ t1 = 0; Ψ SET_HIGH_WORD(t1,ha); Ψ t2 = a-t1; Ψ w = sqrt(t1t1-(b(-b)-t2(a+t1))); Ψ} else { Ψ a = a+a; Ψ y1 = 0; Ψ SET_HIGH_WORD(y1,hb); Ψ y2 = b - y1; Ψ t1 = 0; Ψ SET_HIGH_WORD(t1,ha+0x00100000); Ψ t2 = a - t1; Ψ w = sqrt(t1y1-(w(-w)-(t1y2+t2b))); Ψ} Ψif(k!=0) { Ψ u_int32_t high; Ψ t1 = 1.0; Ψ GET_HIGH_WORD(high,t1); Ψ SET_HIGH_WORD(t1,high+(k<<20)); Ψ return t1*w; Ψ} else return w; }
#if LDBL_MANT_DIG == 53 __weak_reference(hypot, hypotl); #endif
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] M. L. Overton , Numerical Computing with IEEE Floating Point Arithmetic , Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2001.
- 2[2] C-P. Jeannerod and S. M. Rump , On Relative Errors of Floating-Point Operations: Optimal Bounds and Applications , Mathematics of Computation, American Mathematical Society, 2018, 87, pp. 803–819.
