Accelerating BLAS and LAPACK via Efficient Floating Point Architecture Design
Farhad Merchant, Anupam Chattopadhyay, Soumyendu Raha, S K Nandy, Ranjani Narayan

TL;DR
This paper analyzes and optimizes the floating point unit micro-architecture to accelerate BLAS and LAPACK, achieving significant performance improvements in Gflops/W and Gflops/mm^2.
Contribution
It introduces a theoretical framework for pipeline depth optimization of floating point units and presents a simple PE design that outperforms existing implementations.
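The paper's own pipeline depth framework is not reproduced on this page. To illustrate why an optimal depth exists at all, the sketch below uses a generic textbook-style model. This is an assumption for illustration, not the authors' analysis. A unit's total logic delay t_logic is split evenly over p stages, and each stage adds a fixed latch overhead t_latch. A fraction hazard_frac of operations must wait for the previous result, which costs p cycles instead of one. Minimizing the average time per operation, T(p) = (t_logic/p + t_latch)((1 - hazard_frac) + hazard_frac·p), gives p* = sqrt(t_logic(1 - hazard_frac) / (t_latch·hazard_frac)). All parameter names and values are hypothetical.

```python
import math


def time_per_op(p, t_logic, t_latch, hazard_frac):
    """Average time per operation for a p-stage pipelined unit (generic model).

    cycle time    = t_logic / p + t_latch   (logic split evenly, fixed latch cost per stage)
    cycles per op = (1 - hazard_frac) + hazard_frac * p
                    (independent ops issue every cycle; dependent ops wait p cycles)
    """
    cycle = t_logic / p + t_latch
    cycles_per_op = (1.0 - hazard_frac) + hazard_frac * p
    return cycle * cycles_per_op


def optimal_depth(t_logic, t_latch, hazard_frac):
    """Closed-form minimiser of time_per_op over real-valued p (0 < hazard_frac < 1)."""
    return math.sqrt(t_logic * (1.0 - hazard_frac) / (t_latch * hazard_frac))


# Hypothetical values in arbitrary delay units (e.g. FO4), not data from the paper.
T_LOGIC, T_LATCH, HAZARD = 20.0, 1.0, 0.1

p_star = optimal_depth(T_LOGIC, T_LATCH, HAZARD)
best_p = min(range(1, 41), key=lambda p: time_per_op(p, T_LOGIC, T_LATCH, HAZARD))
print(f"closed form p* = {p_star:.2f}, best integer depth = {best_p}")
```

With these placeholder values the closed form gives p* ≈ 13.42 and the integer search picks 13. Deeper pipelines shorten the cycle but lengthen the stall seen by dependent operations, and the optimum balances the two.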
Findings
The proposed processing element (PE) design improves area efficiency (Gflops/mm^2) by up to 2.1x
Theoretical analysis guides optimal pipeline depth selection
Performance gains are reported in both Gflops/W (energy efficiency) and Gflops/mm^2 (area efficiency)
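The two efficiency metrics above are ratios of throughput to power and to silicon area. The following minimal sketch assumes that peak throughput is flops per cycle times clock frequency times the number of PEs, and it counts one fused multiply-add as two flops. The function names and numbers are hypothetical and are not figures from the paper.

```python
def peak_gflops(flops_per_cycle, freq_ghz, num_pes=1):
    """Peak throughput in Gflops: flops per cycle x clock (GHz) x number of PEs."""
    return flops_per_cycle * freq_ghz * num_pes


def efficiency(gflops, power_w, area_mm2):
    """Return (Gflops/W, Gflops/mm^2) for a given throughput, power and area."""
    return gflops / power_w, gflops / area_mm2


# Hypothetical PE: one fused multiply-add (2 flops) per cycle at 1 GHz,
# drawing 0.5 W and occupying 0.25 mm^2.
g = peak_gflops(flops_per_cycle=2, freq_ghz=1.0)
per_watt, per_mm2 = efficiency(g, power_w=0.5, area_mm2=0.25)
print(g, per_watt, per_mm2)  # 2.0 4.0 8.0
```

Peak figures are an upper bound. Efficiency numbers for BLAS/LAPACK routines would normally use the sustained throughput achieved on the routine, which is lower whenever the pipeline stalls.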
Abstract
Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form basic building blocks for several High Performance Computing (HPC) applications and hence dictate the performance of those applications. Performance in such tuned packages is attained by tuning several algorithmic and architectural parameters, such as the number of parallel operations in the Directed Acyclic Graph of the BLAS/LAPACK routines, the sizes of the memories in the memory hierarchy of the underlying platform, the memory bandwidth, and the structure of the compute resources in the underlying platform. In this paper, we closely investigate the impact of the Floating Point Unit (FPU) micro-architecture on performance tuning of BLAS and LAPACK. We present a theoretical analysis of the pipeline depth of different floating point operators, such as the multiplier, adder, square root, and divider, followed by…
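The abstract ties FPU pipeline depth to the parallelism available in a routine's DAG. A dot product, the core of many BLAS kernels, shows the connection in a standard way, offered here as an illustration rather than as the paper's method. With a single accumulator every addition depends on the previous one, so an adder with latency L accepts a new operand only every L cycles. With k independent partial sums, consecutive additions can overlap. The sketch below computes the interleaved result and a rough cycle estimate. The cycle model (one addition issued per cycle, a pairwise tree for the final reduction) and all names are assumptions.

```python
def interleaved_dot(x, y, k):
    """Dot product using k independent partial sums assigned round-robin."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if k < 1:
        raise ValueError("k must be at least 1")
    partial = [0.0] * k
    for i, (a, b) in enumerate(zip(x, y)):
        partial[i % k] += a * b  # consecutive terms go to different accumulators
    # Combine the k partial sums pairwise, as a reduction tree would.
    while len(partial) > 1:
        paired = [partial[j] + partial[j + 1] for j in range(0, len(partial) - 1, 2)]
        if len(partial) % 2:
            paired.append(partial[-1])  # odd one out passes to the next level
        partial = paired
    return partial[0]


def estimated_cycles(n, latency, k):
    """Rough cycle count for n additions on an adder with the given latency.

    Assumes one addition can issue per cycle and each of the k accumulator
    chains waits `latency` cycles between its own dependent additions.
    Multiplications and memory traffic are ignored.
    """
    chain_len = (n + k - 1) // k          # additions per accumulator chain
    accumulate = max(n, chain_len * latency)
    tree_levels = (k - 1).bit_length()    # ceil(log2(k)) for k >= 1
    return accumulate + tree_levels * latency


x = [float(i) for i in range(1, 9)]
y = [1.0] * 8
print(interleaved_dot(x, y, 4))           # 36.0
for k in (1, 2, 4):
    print(k, estimated_cycles(1024, 4, k))  # 1 4096, then 2 2052, then 4 1032
```

Under this model, matching the number of partial sums to the adder latency brings the accumulation close to one addition per cycle. Interleaving changes the summation order, so the result can differ from a sequential sum in the last bits.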
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Numerical Methods and Algorithms · Advanced Data Storage Technologies
