Leveraging Hardware-Aware Computation in Mixed-Precision Matrix Multiply: A Tile-Centric Approach
Qiao Zhang, Rabab Alomairy, Dali Wang, Zhuowei Gu, Qinglei Cao

TL;DR
This paper presents a hardware-aware, tile-centric mixed-precision GEMM framework that adapts precision at fine granularity, significantly improving performance and energy efficiency across diverse high-performance computing architectures.
Contribution
The paper introduces an adaptive mixed-precision GEMM framework that supports different precision formats at the tile/block level and uses the PaRSEC runtime system to balance workloads across architectures.
Findings
Scales well on the ARM CPU-based Fugaku supercomputer, the Nvidia GPU-based A100 DGX, and the AMD GPU-based Frontier supercomputer.
Improves performance and energy efficiency.
Supports fine-grained mixed-precision computation.
Abstract
General Matrix Multiplication (GEMM) is a critical operation underpinning a wide range of applications in high-performance computing (HPC) and artificial intelligence (AI). The emergence of hardware optimized for low-precision arithmetic necessitates a reevaluation of numerical algorithms to leverage mixed-precision computations, achieving improved performance and energy efficiency. This research introduces an adaptive mixed-precision GEMM framework that supports different precision formats at fine-grained tile/block levels. We utilize the PaRSEC runtime system to balance workloads across various architectures. The performance scales well on the ARM CPU-based Fugaku supercomputer, the Nvidia GPU-based A100 DGX, and the AMD GPU-based Frontier supercomputer. This research aims to enhance computational efficiency and accuracy by bridging algorithmic advancements and hardware innovations, driving…
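To make the tile-level idea concrete, the following is a minimal NumPy sketch of a tiled GEMM in which each tile is stored in its own precision. It is an illustration under stated assumptions, not the authors' implementation: the function names `tile_dtype` and `mixed_precision_gemm`, the norm-based selection rule, and the thresholds `lo` and `mid` are all hypothetical, and the sketch runs sequentially on a CPU rather than through the PaRSEC runtime.

```python
import numpy as np


def tile_dtype(tile, ref_norm, lo=1e-4, mid=1e-2):
    """Pick a storage precision for one tile.

    Hypothetical rule (an assumption, not taken from the paper): tiles whose
    Frobenius norm is small relative to the whole matrix get lower precision.
    """
    rel = np.linalg.norm(tile) / ref_norm if ref_norm > 0 else 0.0
    if rel < lo:
        return np.float16
    if rel < mid:
        return np.float32
    return np.float64


def mixed_precision_gemm(A, B, nb=64):
    """Compute C = A @ B tile by tile with per-tile storage precision."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"

    C = np.zeros((m, n), dtype=np.float64)  # accumulate in double precision
    a_norm = np.linalg.norm(A)
    b_norm = np.linalg.norm(B)

    for i in range(0, m, nb):
        for j in range(0, n, nb):
            for p in range(0, k, nb):
                a = A[i:i + nb, p:p + nb]
                b = B[p:p + nb, j:j + nb]

                # Round each tile to its own (possibly lower) storage precision.
                a_lp = a.astype(tile_dtype(a, a_norm))
                b_lp = b.astype(tile_dtype(b, b_norm))

                # Multiply in the higher of the two tile precisions.
                dt = np.promote_types(a_lp.dtype, b_lp.dtype)
                prod = a_lp.astype(dt) @ b_lp.astype(dt)

                C[i:i + nb, j:j + nb] += prod.astype(np.float64)
    return C
```

Calling `mixed_precision_gemm(A, B, nb=64)` on a matrix with one tile scaled down by several orders of magnitude would store that tile in half precision while leaving the dominant tiles in double precision, so the overall result stays close to the full-precision product. A production version would likely also rescale low-precision tiles to avoid underflow, which this sketch omits.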
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Cellular Automata and Applications
