MG3MConv: Multi-Grained Matrix-Multiplication-Mapping Convolution Algorithm toward the SW26010 Processor
Zheng Wu

TL;DR
This paper introduces MG3MConv, a high-performance, adaptable convolution algorithm optimized for the SW26010 processor. It reaches up to 84.78% hardware efficiency and outperforms the GPU-based cuDNN library in most convolution scenarios.
Contribution
Proposes MG3MConv, a novel multi-grained matrix-multiplication-mapping convolution algorithm tailored for SW26010, with architecture-specific optimizations and diversified task mapping schemes.
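The general idea of mapping a convolution onto a matrix multiplication can be illustrated with a minimal sketch. This is the generic im2col-style lowering, not MG3MConv's own mapping scheme, which this summary does not describe. The function names, tensor layouts, and the stride-1, no-padding setting are all assumptions made for illustration.

```python
def conv2d_direct(x, w):
    """Reference convolution (cross-correlation), stride 1, no padding.
    x is [C][H][W], w is [K][C][R][S]; returns [K][H-R+1][W-S+1]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    Ho, Wo = H - R + 1, W - S + 1
    out = [[[0.0] * Wo for _ in range(Ho)] for _ in range(K)]
    for k in range(K):
        for i in range(Ho):
            for j in range(Wo):
                acc = 0.0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += x[c][i + r][j + s] * w[k][c][r][s]
                out[k][i][j] = acc
    return out


def conv2d_as_matmul(x, w):
    """Same convolution expressed as one matrix multiplication (hypothetical helper)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    Ho, Wo = H - R + 1, W - S + 1
    # Filter matrix A: K rows, C*R*S columns (one flattened filter per row).
    A = [[w[k][c][r][s] for c in range(C) for r in range(R) for s in range(S)]
         for k in range(K)]
    # Patch matrix B: C*R*S rows, Ho*Wo columns (one flattened input patch per column).
    B = [[x[c][i + r][j + s] for i in range(Ho) for j in range(Wo)]
         for c in range(C) for r in range(R) for s in range(S)]
    # Plain matrix product P = A * B, of shape K x (Ho*Wo).
    P = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]
    # Reshape each row of P back into an Ho x Wo output plane.
    return [[P[k][i * Wo:(i + 1) * Wo] for i in range(Ho)] for k in range(K)]
```

Once the convolution is in this form, the work can be split into independent matrix-multiplication tiles, which is the kind of unit a task-mapping scheme can distribute at coarser or finer granularity.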
Findings
Achieves up to 84.78% hardware efficiency on SW26010.
Outperforms cuDNN in most convolution scenarios.
Reaches 67.04% efficiency on VGG, surpassing cuDNN and swDNN.
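The efficiency figures above are most naturally read as achieved throughput divided by the processor's theoretical peak. This summary does not define the metric, so the sketch below is an assumption about how such a percentage is typically computed. The function names and the operation-count convention (two floating-point operations per multiply-accumulate) are hypothetical, and no measured values from the paper are used.

```python
def conv_flops(n, c, k, ho, wo, r, s):
    """Floating-point operations for one convolution layer, counting each
    multiply-accumulate as 2 operations (assumed convention).
    n: batch size, c: input channels, k: output channels,
    ho, wo: output height and width, r, s: filter height and width."""
    return 2 * n * c * k * ho * wo * r * s


def hardware_efficiency(flops, seconds, peak_flops_per_s):
    """Achieved throughput as a percentage of theoretical peak throughput."""
    achieved = flops / seconds
    return 100.0 * achieved / peak_flops_per_s
```

For example, a layer that performs half as many operations per second as the hardware peak would report 50% efficiency under this definition.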
Abstract
Convolution is at the core of artificial intelligence applications, and research on it has become a hot topic in high-performance computing. With the rapid adoption of the emerging SW26010 processor in artificial intelligence, there is an urgent need for high-performance convolution algorithms on this processor. However, current support for convolution on SW26010 is still rudimentary. The few existing studies deliver sufficient peak runtime performance but lack adaptability to varied convolution scenarios. To improve convolution algorithms on SW26010, we propose a multi-grained matrix-multiplication-mapping convolution algorithm called MG3MConv, which targets the architectural features of SW26010. MG3MConv supports diversified mapping schemes for convolution tasks, based on the concept of the thread block proposed in this paper. All the architecture-oriented optimization methods are elaborately…
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Brain Tumor Detection and Classification
