Least Squares on GPUs in Multiple Double Precision
Jan Verschelde

TL;DR
This paper shows that linear algebra in multiple double precision on GPUs can reach teraflop performance, helped by the high Compute to Global Memory Access (CGMA) ratios of multiple double arithmetic and by optimized code.
Contribution
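It applies code generated by the CAMPARY software to accelerate least squares solving on GPUs in double double, quad double, and octo double precision, identifies the matrix dimensions at which teraflop performance is attained, and analyzes the cost overhead factor incurred each time the precision is doubled.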
Findings
Teraflop performance achieved on 1024x1024 matrices in double double precision.
Back substitution reaches 1 TFLOP at large dimensions in quad double precision.
Cost overhead factors are lower than predicted, benefiting from high CGMA ratios.
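To make the CGMA point concrete, here is a minimal sketch of double double arithmetic built from error-free transformations. It is plain Python on the CPU, not the CAMPARY-generated GPU code, and the function names (`two_sum`, `quick_two_sum`, `split`, `two_prod`, `dd_add`, `dd_mul`) are illustrative assumptions. The addition is the simplified ("sloppy") variant. In this sketch one double double addition performs 11 double operations on 4 input doubles, which suggests why arithmetic of this kind has a high ratio of computation to memory traffic. The counts apply to this sketch only, not to the paper's kernels.

```python
def two_sum(a, b):
    """Knuth's error-free sum: s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def quick_two_sum(a, b):
    """Error-free sum that assumes |a| >= |b| (3 operations)."""
    s = a + b
    e = b - (s - a)
    return s, e

def split(a):
    """Veltkamp split of a double into two halves whose products are exact."""
    t = 134217729.0 * a  # 2**27 + 1
    hi = t - (t - a)
    lo = a - hi
    return hi, lo

def two_prod(a, b):
    """Dekker's error-free product without FMA: a * b == p + e exactly
    (barring overflow and underflow)."""
    p = a * b
    ahi, alo = split(a)
    bhi, blo = split(b)
    e = ((ahi * bhi - p) + ahi * blo + alo * bhi) + alo * blo
    return p, e

def dd_add(x, y):
    """Add double doubles x = (hi, lo) and y = (hi, lo); simplified variant."""
    s, e = two_sum(x[0], y[0])      # 6 operations
    e += x[1] + y[1]                # 2 operations
    return quick_two_sum(s, e)      # 3 operations

def dd_mul(x, y):
    """Multiply double doubles; the lo * lo term is dropped as negligible."""
    p, e = two_prod(x[0], y[0])
    e += x[0] * y[1] + x[1] * y[0]
    return quick_two_sum(p, e)
```

A double double value is stored as an unevaluated sum of two doubles, so `dd_add((1.0, 2.0**-60), (1.0, 2.0**-60))` keeps the small term that a plain double addition would round away.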
Abstract
This paper describes the application of the code generated by the CAMPARY software to accelerate the solving of linear systems in the least squares sense on Graphics Processing Units (GPUs), in double double, quad double, and octo double precision. The goal is to use accelerators to offset the cost overhead caused by multiple double precision arithmetic. For the blocked Householder QR and the back substitution, of interest are those dimensions at which teraflop performance is attained. The other interesting question is the cost overhead factor that appears each time the precision is doubled. Experimental results are reported on five different NVIDIA GPUs, with a particular focus on the P100 and the V100, both capable of teraflop performance. Thanks to the high Compute to Global Memory Access (CGMA) ratios of multiple double arithmetic, teraflop performance is already attained running…
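The two computational stages named in the abstract, Householder QR followed by back substitution, can be sketched as follows. This is an unblocked version in ordinary double precision on the CPU, written for illustration only. The paper's method is blocked, runs on GPUs, and uses multiple double arithmetic. The function name `householder_qr_solve` is an assumption, and the matrix is assumed to have full column rank with at least as many rows as columns.

```python
import math

def householder_qr_solve(A, b):
    """Solve min ||b - A x||_2 for an m-by-n matrix A (m >= n, full column rank)."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]   # working copy, reduced to upper triangular form
    c = b[:]                    # right-hand side, transformed along with R
    for k in range(n):
        # Householder vector v that zeroes column k below the diagonal.
        norm = math.sqrt(sum(R[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            continue            # cannot happen under the full rank assumption
        alpha = -math.copysign(norm, R[k][k])
        v = [R[i][k] for i in range(k, m)]
        v[0] -= alpha
        vtv = sum(t * t for t in v)
        # Apply H = I - 2 v v^T / (v^T v) to the remaining columns ...
        for j in range(k, n):
            s = 2.0 * sum(v[i - k] * R[i][j] for i in range(k, m)) / vtv
            for i in range(k, m):
                R[i][j] -= s * v[i - k]
        # ... and to the right-hand side.
        s = 2.0 * sum(v[i - k] * c[i] for i in range(k, m)) / vtv
        for i in range(k, m):
            c[i] -= s * v[i - k]
    # Back substitution on the leading n-by-n upper triangular block.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = c[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / R[i][i]
    return x
```

For example, fitting a line through the points (0, 1), (1, 3), (2, 5), (3, 7) with `householder_qr_solve([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]], [1.0, 3.0, 5.0, 7.0])` recovers an intercept close to 1 and a slope close to 2.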
Taxonomy
Topics: Matrix Theory and Algorithms · Numerical Methods and Algorithms · Polynomial and algebraic computation
