Programming Bare-Metal Accelerators with Heterogeneous Threading Models: A Case Study of Matrix-3000
Jianbin Fang, Peng Zhang, Chun Huang, Tao Tang, Kai Lu, Ruibo Wang, Zheng Wang

TL;DR
This paper presents the development of a programming model, compiler, and libraries for the Matrix-3000 system, enabling effective programming of its complex heterogeneous accelerators for exascale supercomputing.
Contribution
It introduces a new software stack tailored to Matrix-3000's complex architecture, comprising a low-level programming interface and a high-level OpenCL compiler.
Findings
The software stack was successfully deployed on an exascale prototype system.
The low-level programming model provides native support for the bare-metal accelerators of Matrix-3000.
The high-level model lets programmers use the OpenCL programming standard.
Abstract
As the hardware industry moves towards using specialized heterogeneous many-cores to avoid the effects of the power wall, software developers are finding it hard to deal with the complexity of these systems. This article shares our experience when developing a programming model and its supporting compiler and libraries for Matrix-3000, which is designed for next-generation exascale supercomputers but has a complex memory hierarchy and processor organization. To assist its software development, we developed a software stack from scratch that includes a low-level programming interface and a high-level OpenCL compiler. Our low-level programming model offers native programming support for using the bare-metal accelerators of Matrix-3000, while the high-level model allows programmers to use the OpenCL programming standard. We detail our design choices and highlight the lessons learned from…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Distributed and Parallel Computing Systems
