MACO: Exploring GEMM Acceleration on a Loosely-Coupled Multi-core Processor
Bingcai Sui, Junzhong Shen, Caixia Sun, Junhui Wang, Zhong Zheng, and Wei Guo

TL;DR
MACO is a new multi-core architecture optimized for GEMM workloads, combining a tile-based ISA and hardware techniques to achieve high efficiency and scalability for large-scale matrix computations and deep learning tasks.
Contribution
The paper introduces MACO, a flexible, scalable multi-core architecture with novel hardware and ISA optimizations specifically designed for GEMM and deep learning workloads.
Findings
Achieves 90% efficiency across multiple cores
Reaches up to 1.1 TFLOPS in deep neural network evaluations
Demonstrates good scalability and adaptability for large-scale GEMM workloads
Abstract
General-purpose processor vendors have integrated customized accelerators in their products due to the widespread use of General Matrix-Matrix Multiplication (GEMM) kernels. However, it remains a challenge to further improve the flexibility and scalability of these GEMM-enhanced processors to cater to the emerging large-scale GEMM workloads. In this paper, we propose MACO, a novel loosely-coupled multi-core general-purpose architecture optimized for GEMM-related applications. To enhance the programmability and flexibility of MACO, the paper introduces a tile-based instruction set architecture. Additionally, the paper presents techniques such as hardware-assisted data prefetching and locking, and predictive address translation to further enhance the computational efficiency of MACO for GEMM workloads. The experimental results demonstrate that MACO exhibits good scalability, achieving an…
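As a rough illustration of what "tile-based" GEMM means in general, the sketch below computes C = A x B one small block at a time in plain Python. It is not MACO's instruction set or hardware behavior: the function name tiled_gemm, the tile parameter, and the loop order are all assumptions made for this example, and the abstract does not specify tile sizes or instruction semantics.

```python
def tiled_gemm(a, b, tile=2):
    """Multiply matrices a (m x k) and b (k x n) block by block.

    Hypothetical helper for illustration only; it mimics the general idea of
    operating on tiles, not any specific MACO instruction.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # Walk over output tiles (i0, j0) and accumulate over tiles along k (k0).
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # Multiply one tile of a by one tile of b into the c tile.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for p in range(k0, min(k0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(tiled_gemm(a, b, tile=2))
```

Working on blocks keeps each inner step confined to a small, reusable region of the operands, which is the property that tile-oriented instructions and data prefetching typically aim to exploit.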
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Distributed and Parallel Computing Systems
