Wanted: Floating-Point Add Round-off Error instruction

Marat Dukhan; Richard Vuduc; Jason Riedy

arXiv:1603.00491 · cs.NA · March 3, 2016 · 1 citation

TL;DR

This paper proposes a new floating-point instruction, FPADDRE, that computes the round-off error of a floating-point addition. Performance estimates on modern processors indicate that it would substantially speed up double-double (high-precision) arithmetic.

Contribution

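The paper proposes the FPADDRE instruction, which computes the round-off error of floating-point addition in a single instruction, and estimates that it would enable faster high-precision (double-double) arithmetic on Intel Haswell, Intel Skylake, and AMD Steamroller processors and on the Intel Knights Corner co-processor.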
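As a point of reference, the value FPADDRE would produce can already be obtained in software with Knuth's branch-free TwoSum, which needs six floating-point additions and subtractions with a critical path of five. The sketch below is a minimal model under stated assumptions: IEEE 754 binary64 with round-to-nearest (Python's `float`), finite inputs without overflow, and a hypothetical function `fpaddre(a, b)` that returns the exact error `(a + b) - fl(a + b)`. The proposed instruction's actual operand order and sign convention may differ.

```python
from fractions import Fraction


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and a + b == s + e exactly.

    Six floating-point operations, no branches, no requirement on |a| vs |b|.
    Assumes IEEE 754 binary64, round-to-nearest, and no overflow.
    """
    s = a + b
    b_virtual = s - a           # the part of s that "came from" b
    a_virtual = s - b_virtual   # the part of s that "came from" a
    b_roundoff = b - b_virtual
    a_roundoff = a - a_virtual
    return s, a_roundoff + b_roundoff


def fpaddre(a: float, b: float) -> float:
    """Hypothetical software model of FPADDRE: the round-off error of fl(a + b)."""
    return two_sum(a, b)[1]


# 2**-60 is too small to change 1.0, so all of it becomes round-off error.
assert 1.0 + 2.0**-60 == 1.0
assert fpaddre(1.0, 2.0**-60) == 2.0**-60

# The error is exact: the rational sum equals s + e with no rounding at all.
s, e = two_sum(0.1, 0.2)
assert Fraction(0.1) + Fraction(0.2) == Fraction(s) + Fraction(e)
```

With a hardware FPADDRE, the same pair could presumably be formed by two independent instructions, `s = a + b` and `e = FPADDRE(a, b)`, instead of this dependent six-operation chain.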

Findings

01

Up to 55% reduction in latency for double-double addition.

02

Up to 103% increase in throughput for double-double addition.

03

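Up to 2x speedup on three high-precision benchmarks: double-double matrix-matrix multiplication, compensated dot product, and compensated Horner polynomial evaluation.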
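The latency and throughput figures above concern double-double addition, where a value is held as an unevaluated sum `hi + lo` of two doubles. Below is a minimal sketch of one common accurate variant (two TwoSum calls followed by two Fast2Sum renormalization steps, similar to the accurate addition in libraries such as QD); this particular algorithm is an assumption for illustration and not necessarily the kernel the paper measures. The error terms of the two `two_sum` calls are exactly the quantities FPADDRE is meant to compute directly.

```python
def two_sum(a: float, b: float) -> tuple[float, float]:
    # Knuth's TwoSum: s = fl(a + b) and e = (a + b) - s exactly (6 flops).
    s = a + b
    bv = s - a
    av = s - bv
    e = (a - av) + (b - bv)
    return s, e


def fast_two_sum(a: float, b: float) -> tuple[float, float]:
    # Dekker's Fast2Sum (3 flops): exact when |a| >= |b|; used to renormalize.
    s = a + b
    e = b - (s - a)
    return s, e


def dd_add(a_hi: float, a_lo: float, b_hi: float, b_lo: float) -> tuple[float, float]:
    """Accurate double-double addition of (a_hi + a_lo) + (b_hi + b_lo).

    Returns a normalized pair (hi, lo) with hi = fl(hi + lo).
    """
    s, e = two_sum(a_hi, b_hi)   # sum of high parts and its round-off
    t, f = two_sum(a_lo, b_lo)   # sum of low parts and its round-off
    e += t
    s, e = fast_two_sum(s, e)    # renormalize
    e += f
    return fast_two_sum(s, e)    # renormalize again


# (1 + 2**-60) + (-1): plain doubles lose the tail, double-double keeps it.
print((1.0 + 2.0**-60) - 1.0)                              # 0.0
print(dd_add(1.0, 2.0**-60, -1.0, 0.0) == (2.0**-60, 0.0))  # True
```

In this variant the two `two_sum` calls sit on the critical path, which is why replacing each one with an add plus an independent FPADDRE could plausibly shorten double-double addition latency.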

Abstract

We propose a new instruction (FPADDRE) that computes the round-off error in floating-point addition. We explain how this instruction benefits high-precision arithmetic operations in applications where double precision is not sufficient. Performance estimates on Intel Haswell, Intel Skylake, and AMD Steamroller processors, as well as Intel Knights Corner co-processor, demonstrate that such an instruction would improve the latency of double-double addition by up to 55% and increase double-double addition throughput by up to 103%, with smaller, but non-negligible benefits for double-double multiplication. The new instruction delivers up to 2x speedups on three benchmarks that use high-precision floating-point arithmetic: double-double matrix-matrix multiplication, compensated dot product, and polynomial evaluation via the compensated Horner scheme.
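Two of the benchmarks, the compensated dot product and compensated Horner evaluation, are built from the same two error-free transformations: TwoSum, whose error term is what FPADDRE targets, and TwoProduct. The sketch below follows the published Dot2 algorithm of Ogita, Rump, and Oishi and the compensated Horner scheme of Graillat, Langlois, and Louvet. TwoProduct uses Dekker's splitting only so that the example runs without a fused multiply-add; optimized kernels would normally use FMA instead. Inputs are assumed to be finite binary64 values with no overflow or underflow.

```python
def two_sum(a, b):
    # Knuth's TwoSum: s = fl(a + b), e = exact round-off (the value FPADDRE targets).
    s = a + b
    bv = s - a
    e = (a - (s - bv)) + (b - bv)
    return s, e


def split(a):
    # Veltkamp splitting: a == hi + lo, each half fits in 26 significand bits.
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a, b):
    # Dekker's TwoProduct: p = fl(a * b) and a * b == p + e exactly.
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def dot2(x, y):
    # Compensated dot product (Dot2): as accurate as if computed in twice
    # the working precision and then rounded once.
    p, s = two_prod(x[0], y[0])
    for xi, yi in zip(x[1:], y[1:]):
        h, r = two_prod(xi, yi)
        p, q = two_sum(p, h)
        s += q + r  # accumulate all rounding errors in s
    return p + s


def comp_horner(coeffs, x):
    # Compensated Horner scheme; coeffs[0] is the leading coefficient.
    r = coeffs[0]
    c = 0.0  # correction polynomial, itself evaluated with plain Horner
    for a in coeffs[1:]:
        p, pi = two_prod(r, x)
        r, sigma = two_sum(p, a)
        c = c * x + (pi + sigma)
    return r + c


# Ill-conditioned cases where plain double arithmetic returns 0.0:
print(dot2([1e16, 1.0, -1e16], [1.0, 1.0, 1.0]))                        # 1.0
# (x - 1)**3 expanded, evaluated at x = 1 + 2**-20; the exact value is 2**-60.
print(comp_horner([1.0, -3.0, 3.0, -1.0], 1.0 + 2.0**-20) == 2.0**-60)  # True
```

Every `two_sum` call inside both loops is a candidate for replacement by an add plus FPADDRE, which is where the benchmark speedups would presumably come from.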

Taxonomy

Topics: Numerical Methods and Algorithms · Low-power high-performance VLSI design · Parallel Computing and Optimization Techniques