AutoTSMM: An Auto-tuning Framework for Building High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on CPUs
Chendi Li, Haipeng Jia, Hang Cao, Jianyu Yao, Boqian Shi, Chunyang Xiang, Jinbo Sun, Pengqi Lu, Yunquan Zhang

TL;DR
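AutoTSMM is an auto-tuning framework for tall-and-skinny matrix-matrix multiplication on CPUs. It selects inner kernels at install time and generates an execution plan at runtime, outperforming conventional matrix-matrix multiplication implementations and performing competitively with state-of-the-art tall-and-skinny implementations.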
Contribution
It introduces an auto-tuning framework designed for tall-and-skinny matrix-matrix multiplication on CPUs, a case that conventional implementations handle poorly and that few prior works optimize.
Findings
AutoTSMM achieves performance competitive with state-of-the-art tall-and-skinny matrix-matrix multiplication implementations.
AutoTSMM outperforms all conventional matrix-matrix multiplication implementations evaluated.
The framework selects the optimal inner kernels at install time and generates an execution plan for the pre-packed multiplication at runtime.
Abstract
In recent years, general matrix-matrix multiplication with non-regular-shaped input matrices has been widely used in many applications such as deep learning and has drawn increasing attention. However, conventional implementations are not suited to non-regular-shaped matrix-matrix multiplications, and few works focus on optimizing tall-and-skinny matrix-matrix multiplication on CPUs. This paper proposes an auto-tuning framework, AutoTSMM, to build high-performance tall-and-skinny matrix-matrix multiplication. AutoTSMM selects the optimal inner kernels in the install-time stage and generates an execution plan for the pre-pack tall-and-skinny matrix-matrix multiplication in the runtime stage. Experiments demonstrate that AutoTSMM achieves competitive performance compared to state-of-the-art tall-and-skinny matrix-matrix multiplication. It also outperforms all conventional matrix-matrix…
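The two stages named in the abstract can be illustrated with a minimal Python sketch. This is not AutoTSMM's implementation: the names `pack_b`, `make_kernel`, `select_kernel`, `make_plan`, and `CANDIDATES` are hypothetical, the "kernels" are plain Python loops standing in for hand-tuned inner kernels, and the sketch assumes a tall matrix A (many rows, few columns) multiplied by a small matrix B. It only shows the general pattern of timing candidates once up front, then packing the small operand once and reusing it.

```python
import time


def pack_b(b):
    """Pre-pack the small matrix B (k x n) so that each column is contiguous."""
    return [list(col) for col in zip(*b)]


def make_kernel(row_block):
    """Build a hypothetical inner kernel that walks A in chunks of `row_block` rows."""
    def kernel(a, packed_b):
        c = []
        for start in range(0, len(a), row_block):
            for row in a[start:start + row_block]:
                c.append([sum(x * y for x, y in zip(row, col)) for col in packed_b])
        return c
    return kernel


# Hypothetical candidate set; a real framework would hold tuned assembly/SIMD kernels.
CANDIDATES = {rb: make_kernel(rb) for rb in (1, 4, 16)}


def select_kernel(m=256, k=8, n=8, repeats=3):
    """Install-time stage (sketch): time each candidate on a sample shape, keep the fastest."""
    a = [[float(i + j) for j in range(k)] for i in range(m)]
    packed_b = pack_b([[float(i * j + 1) for j in range(n)] for i in range(k)])
    best_rb, best_time = None, float("inf")
    for rb, kernel in CANDIDATES.items():
        t0 = time.perf_counter()
        for _ in range(repeats):
            kernel(a, packed_b)
        elapsed = time.perf_counter() - t0
        if elapsed < best_time:
            best_rb, best_time = rb, elapsed
    return best_rb


def make_plan(b, row_block):
    """Runtime stage (sketch): pack B once and return a plan reusable for many A."""
    packed_b = pack_b(b)
    kernel = CANDIDATES[row_block]
    return lambda a: kernel(a, packed_b)


best = select_kernel()
plan = make_plan([[1.0, 2.0], [3.0, 4.0]], best)
c = plan([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# c == [[1.0, 2.0], [3.0, 4.0], [4.0, 6.0]]
```

In pure Python the chunk size has little effect on speed, so the selected candidate here is mostly timing noise. The sketch is meant to show the control flow (measure once, then reuse a packed operand and a chosen kernel), not to reproduce any performance result.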
