GPU-accelerated generation of correctly-rounded elementary functions
Pierre Fortin (LIP6), Mourad Gouicem (LIP6), Stef Graillat (LIP6)

TL;DR
This paper presents a GPU-accelerated method for efficiently solving the Table Maker's Dilemma in double precision, significantly speeding up the generation of correctly-rounded elementary functions.
Contribution
It introduces a new parallel search algorithm and a hybrid CPU-GPU approach, achieving up to 53.4x speedup over sequential CPU execution.
Findings
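Overall speedups of up to 53.4x on one GPU over a sequential CPU execution
New parallel search algorithm that is more efficient on GPU thanks to its more regular control flow
Efficient hybrid CPU-GPU deployment of the polynomial approximation generation required by Lefèvre's algorithm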
Abstract
The IEEE 754-2008 standard recommends the correct rounding of some elementary functions. This requires solving the Table Maker's Dilemma, which implies a huge amount of CPU computation time. In this paper we consider accelerating such computations, namely Lefèvre's algorithm, on Graphics Processing Units (GPUs), which are massively parallel architectures with a partial SIMD (Single Instruction Multiple Data) execution. We first propose an analysis of the Lefèvre hard-to-round argument search using the concept of continued fractions. We then propose a new parallel search algorithm that is much more efficient on GPU thanks to its more regular control flow. We also present an efficient hybrid CPU-GPU deployment of the generation of the polynomial approximations required in Lefèvre's algorithm. In the end, we manage to obtain overall speedups up to 53.4x on one GPU over a sequential CPU execution,…
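To make the Table Maker's Dilemma mentioned in the abstract concrete, here is a minimal sketch of a brute-force hard-to-round argument search in a toy binary format. It is not Lefèvre's algorithm and not the paper's GPU method. The function exp, the toy precision p = 8, round-to-nearest, and the names midpoint_distance and hardest_argument are all assumptions chosen for illustration. An argument is hard to round when the exact value of the function lies very close to a midpoint between two consecutive floating-point numbers, so that many extra bits are needed to decide the rounding.

```python
from decimal import Decimal, getcontext

# Working precision in decimal digits, far above the toy target precision.
getcontext().prec = 60

HALF = Decimal("0.5")


def midpoint_distance(x_num, p):
    """Distance, in ulps of the p-bit result, between exp(x_num / 2**p) and
    the nearest midpoint of two consecutive p-bit floating-point numbers.
    A value close to 0 means the argument is hard to round to nearest."""
    x = Decimal(x_num) / Decimal(2 ** p)
    y = x.exp()                      # high-precision approximation of exp(x)
    while y >= 2:                    # normalise the significand into [1, 2)
        y /= 2
    scaled = y * Decimal(2 ** (p - 1))   # p-bit floats now map to integers
    frac = scaled - int(scaled)          # fractional part in [0, 1)
    return abs(frac - HALF)


def hardest_argument(p):
    """Exhaustively scan every p-bit argument in (0, 1) and return the pair
    (distance, x_num) with the smallest distance to a rounding midpoint."""
    return min((midpoint_distance(k, p), k) for k in range(1, 2 ** p))


if __name__ == "__main__":
    dist, k = hardest_argument(8)
    print(f"hardest toy argument: {k}/256, distance to midpoint: {dist:.6f} ulp")
```

The exhaustive scan costs on the order of 2^p function evaluations, which is hopeless in double precision (p = 53). That cost is why specialised searches such as Lefèvre's algorithm, and their acceleration, matter.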
