Compiler-Level Matrix Multiplication Optimization for Deep Learning
Huaqing Zhang, Xiaolin Cheng, Hui Zang, Dae Hoon Park

TL;DR
This paper introduces two novel compiler algorithms for optimizing GEMM operations in deep learning, achieving significant performance improvements and faster search times compared to existing methods.
Contribution
It presents two new TVM-based algorithms for GEMM optimization: G-BFS, a lightweight heuristic search method, and N-A2C, a reinforcement learning method. Both outperform current state-of-the-art techniques.
Findings
Achieved 24% and 40% savings in GEMM computation time over the XGBoost and RNN baselines, respectively.
Explored only 0.1% of the search space.
Demonstrated potential for broader operator-level optimizations.
Abstract
An important linear algebra routine, GEneral Matrix Multiplication (GEMM), is a fundamental operator in deep learning. Compilers need to translate these routines into low-level code optimized for specific hardware. Compiler-level optimization of GEMM has a significant performance impact on training and executing deep learning models. However, most deep learning frameworks rely on hardware-specific operator libraries in which GEMM optimization has mostly been achieved by manual tuning, which restricts performance on different target hardware. In this paper, we propose two novel algorithms for GEMM optimization based on the TVM framework: a lightweight Greedy Best First Search (G-BFS) method based on heuristic search, and a Neighborhood Actor Advantage Critic (N-A2C) method based on reinforcement learning. Experimental results show significant performance improvement of the proposed…
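To make the heuristic-search idea in the abstract concrete, here is a minimal sketch of a greedy best-first search over GEMM tiling configurations. It is not the authors' implementation. Several things are assumptions made for illustration: a configuration is modeled as (outer, inner) tile factors per dimension, a neighbor is reached by moving one factor of 2 between tiling levels, and `toy_cost` is a hypothetical stand-in for a measured run time (a real tuner would compile and time each candidate on the target hardware). The names `neighbors`, `toy_cost`, `greedy_best_first`, `budget`, and `rho` are likewise hypothetical.

```python
import heapq
import math
import random


def neighbors(config):
    """Configs reached by moving one factor of 2 between two tiling
    levels of a single dimension (assumed neighborhood definition)."""
    result = []
    for d, factors in enumerate(config):
        for i in range(len(factors)):
            if factors[i] % 2 != 0:
                continue  # no factor of 2 available to move
            for j in range(len(factors)):
                if i == j:
                    continue
                new = list(factors)
                new[i] //= 2
                new[j] *= 2
                result.append(config[:d] + (tuple(new),) + config[d + 1:])
    return result


def toy_cost(config):
    """Hypothetical cost model standing in for a timed GEMM run:
    it is lowest when every innermost tile has size 8."""
    return sum((math.log2(factors[-1]) - 3) ** 2 for factors in config)


def greedy_best_first(start, cost_fn, budget=50, rho=6, seed=0):
    """Expand the cheapest known config first, evaluating at most `rho`
    unvisited neighbors per expansion and `budget` configs in total."""
    rng = random.Random(seed)
    visited = {start: cost_fn(start)}
    frontier = [(visited[start], start)]  # min-heap keyed on cost
    while frontier and len(visited) < budget:
        _, current = heapq.heappop(frontier)
        candidates = [c for c in neighbors(current) if c not in visited]
        rng.shuffle(candidates)
        for cand in candidates[:rho]:
            if len(visited) >= budget:
                break
            visited[cand] = cost_fn(cand)
            heapq.heappush(frontier, (visited[cand], cand))
    best = min(visited, key=visited.get)
    return best, visited[best], len(visited)


# Untiled starting point for a 64 x 64 x 64 GEMM: (outer, inner) per dimension.
start = ((64, 1), (64, 1), (64, 1))
best, best_cost, evaluated = greedy_best_first(start, toy_cost, budget=50, rho=6)
print(best, best_cost, evaluated)
```

With two tiling levels and a dimension size of 64, this toy space has 7 x 7 x 7 = 343 configurations, and the search evaluates at most 50 of them. The point of the priority queue is that measurement effort goes to the neighborhood of the best configuration found so far rather than being spread evenly over the space.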
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Numerical Methods and Algorithms · Low-power high-performance VLSI design
