Fast Arbitrary Precision Floating Point on FPGA

Johannes de Fine Licht; Christopher A. Pattison; Alexandros Nikolaos Ziogas; David Simmons-Duffin; Torsten Hoefler

arXiv:2204.06256 · cs.DC · April 14, 2022 · 1 citation

TL;DR

This paper presents an FPGA-based implementation of arbitrary precision floating point multiplication using deep pipelining and Karatsuba decomposition, reporting speedups of 9.8x for 512-bit and 5.3x for 1024-bit multiplication over CPU implementations and enabling efficient acceleration of numerical codes.

Contribution

The paper introduces an FPGA architecture for arbitrary precision floating point (APFP) multiplication built on a recursive Karatsuba decomposition, providing high throughput and a flexible, open-source high-level software interface.
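To make the operation concrete, an APFP multiplication can be viewed as three steps: combine the signs, add the exponents, and multiply the mantissas, then renormalize the double-width product back to the fixed precision. The following is a minimal Python sketch of that structure only, not the paper's architecture. The `Apfp` type, the `PRECISION` constant, the choice of truncation as the rounding behavior, and the omission of zero and special values are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical fixed mantissa width; the paper fixes precision at compile time,
# but this particular constant and representation are illustrative assumptions.
PRECISION = 512


@dataclass(frozen=True)
class Apfp:
    sign: int      # 0 for positive, 1 for negative
    exponent: int  # value = (-1)**sign * mantissa * 2**exponent
    mantissa: int  # normalized: 2**(PRECISION - 1) <= mantissa < 2**PRECISION


def multiply(x: Apfp, y: Apfp) -> Apfp:
    """Multiply two normalized values, truncating the result to PRECISION bits.

    Zero, infinities, and NaN are not handled in this sketch.
    """
    # The wide mantissa product is the expensive part of the operation.
    product = x.mantissa * y.mantissa  # 2*PRECISION - 1 or 2*PRECISION bits
    # Shift the product back down so exactly PRECISION bits remain.
    shift = product.bit_length() - PRECISION
    return Apfp(
        sign=x.sign ^ y.sign,
        exponent=x.exponent + y.exponent + shift,
        mantissa=product >> shift,  # truncation (assumed rounding behavior)
    )


one = Apfp(0, -(PRECISION - 1), 1 << (PRECISION - 1))        # represents 1.0
minus_three = Apfp(1, -(PRECISION - 2), 3 << (PRECISION - 2))  # represents -3.0
result = multiply(one, minus_three)
```

In this representation the sign and exponent updates are cheap; the cost is concentrated in the `x.mantissa * y.mantissa` product, which is where a decomposition such as Karatsuba applies.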

Findings

1. 9.8x speedup for 512-bit multiplication on FPGA
2. 5.3x speedup for 1024-bit multiplication on FPGA
3. 10x speedup in matrix multiplication over CPU cluster

Abstract

Numerical codes that require arbitrary precision floating point (APFP) numbers for their core computation are dominated by elementary arithmetic operations due to the super-linear complexity of multiplication in the number of mantissa bits. APFP computations on conventional software-based architectures are made exceedingly expensive by the lack of native hardware support, requiring elementary operations to be emulated using instructions operating on machine-word-sized blocks. In this work, we show how APFP multiplication on compile-time fixed-precision operands can be implemented as deep FPGA pipelines with a recursively defined Karatsuba decomposition on top of native DSP multiplication. When comparing our design implemented on an Alveo U250 accelerator to a dual-socket 36-core Xeon node running the GNU Multiple Precision Floating-Point Reliable (MPFR) library, we achieve a 9.8x…
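The recursive Karatsuba decomposition mentioned in the abstract can be illustrated in software. The sketch below is a plain Python model of the general technique, assuming non-negative integer mantissas; it is not the paper's FPGA pipeline. `BASE_BITS` is an assumed cutoff standing in for the width of a native hardware multiplier, and the function name and signature are hypothetical.

```python
# Assumed width below which a "native" multiplication is used; illustrative only.
BASE_BITS = 18


def karatsuba(a: int, b: int, bits: int) -> int:
    """Multiply non-negative integers a and b, each fitting in `bits` bits."""
    if bits <= BASE_BITS:
        return a * b  # stands in for a single native multiplier

    # Split each operand into a low part of `half` bits and a high part.
    half = bits // 2
    high_bits = bits - half
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half

    # Three recursive products instead of the four a schoolbook split needs.
    z0 = karatsuba(a_lo, b_lo, half)
    z2 = karatsuba(a_hi, b_hi, high_bits)
    # The sums can be one bit wider than the larger of the two parts.
    z1 = karatsuba(a_lo + a_hi, b_lo + b_hi, high_bits + 1) - z0 - z2

    # Recombine: a*b = z2 * 2^(2*half) + z1 * 2^half + z0.
    return (z2 << (2 * half)) + (z1 << half) + z0


a = (1 << 512) - 1
b = (1 << 511) + 12345
product = karatsuba(a, b, 512)
```

Each level replaces one wide multiplication with three narrower ones plus additions and shifts, which is the source of the sub-quadratic cost. Because the operand width is known up front, the recursion depth is fixed, which is what allows such a decomposition to be unrolled into a static structure.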

Code & Models

Repositories

spcl/apfp (official)

Taxonomy

Topics: Numerical Methods and Algorithms · Parallel Computing and Optimization Techniques · Low-power high-performance VLSI design