Optimizing Irregular-Shaped Matrix-Matrix Multiplication on Multi-Core DSPs
Shangfei Yin, Qinglin Wang, Ruochen Hao, Tianyang Zhou, Songzhu Mei, Jie Liu

TL;DR
This paper introduces ftIMM, an optimized implementation for irregular-shaped GEMMs on multi-core DSPs, achieving significant performance improvements over traditional methods and CPU libraries in high-performance computing applications.
Contribution
The paper presents ftIMM, a novel, auto-tuned implementation supporting irregular GEMMs on multi-core DSPs, with automatic micro-kernel generation and parallelization strategies.
Findings
Up to 7.2x performance improvement over traditional GEMM implementations.
Up to 3.1x higher efficiency than open-source CPU libraries.
Effective support for three types of irregular-shaped GEMMs.
Abstract
General Matrix Multiplication (GEMM) has a wide range of applications in scientific simulation and artificial intelligence. Although traditional libraries can achieve high performance on large regular-shaped GEMMs, they often perform poorly on irregular-shaped GEMMs, which are common in new algorithms and applications of high-performance computing (HPC). Due to energy efficiency constraints, low-power multi-core digital signal processors (DSPs) have become an alternative architecture in HPC systems. Targeting the multi-core DSPs in FT-m7032, a prototype CPU-DSP heterogeneous processor for HPC, an efficient implementation - ftIMM - for three types of irregular-shaped GEMMs is proposed. ftIMM supports automatic generation of assembly micro-kernels, two parallelization strategies, and auto-tuning of block sizes and parallelization strategies. The experiments show that ftIMM can get…
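To make the idea of auto-tuning block sizes concrete, here is a minimal sketch in plain Python. It is not ftIMM's code and does not reflect its assembly micro-kernels or DSP memory hierarchy; the function names (blocked_gemm, autotune), the candidate block sizes, and the tall-and-skinny example shape are all assumptions chosen for illustration. It only shows the general technique: run the same blocked GEMM with several block-size candidates and keep the fastest one for a given shape.

```python
import time


def blocked_gemm(A, B, M, N, K, mb, nb, kb):
    """Compute C = A * B (A is M x K, B is K x N) using mb x nb x kb blocks.

    Matrices are lists of rows. Block sizes are hypothetical tuning knobs.
    """
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, mb):
        for k0 in range(0, K, kb):
            for j0 in range(0, N, nb):
                # "Micro-kernel": update one block of C from blocks of A and B.
                for i in range(i0, min(i0 + mb, M)):
                    row_c, row_a = C[i], A[i]
                    for k in range(k0, min(k0 + kb, K)):
                        a, row_b = row_a[k], B[k]
                        for j in range(j0, min(j0 + nb, N)):
                            row_c[j] += a * row_b[j]
    return C


def _time_once(A, B, M, N, K, cfg):
    """Time a single blocked_gemm call with block sizes cfg = (mb, nb, kb)."""
    start = time.perf_counter()
    blocked_gemm(A, B, M, N, K, *cfg)
    return time.perf_counter() - start


def autotune(A, B, M, N, K, candidates, repeats=3):
    """Return the candidate (mb, nb, kb) with the lowest best-of-repeats time."""
    best_cfg, best_time = None, float("inf")
    for cfg in candidates:
        elapsed = min(_time_once(A, B, M, N, K, cfg) for _ in range(repeats))
        if elapsed < best_time:
            best_cfg, best_time = cfg, elapsed
    return best_cfg


if __name__ == "__main__":
    # Assumed irregular shape: M much larger than N and K.
    M, N, K = 256, 4, 8
    A = [[float(i + k) for k in range(K)] for i in range(M)]
    B = [[float(k - j) for j in range(N)] for k in range(K)]
    candidates = [(8, 4, 8), (32, 4, 8), (64, 2, 4), (256, 4, 8)]
    print("selected block sizes:", autotune(A, B, M, N, K, candidates))
```

The selected configuration depends on the machine and the shape, which is the reason to tune per shape rather than fix one block size. A real tuner for a DSP would also have to respect on-chip memory capacity and choose among parallelization strategies, which this sketch leaves out.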
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Interconnection Networks and Systems
