swTVM: Towards Optimized Tensor Code Generation for Deep Learning on Sunway Many-Core Processor

Mingzhen Li; Changxi Liu; Jianjin Liao; Xuegui Zheng; Hailong Yang; Rujun Sun; Jun Xu; Lin Gan; Guangwen Yang; Zhongzhi Luan; Depei Qian

arXiv:1904.07404·cs.LG·July 12, 2022·1 cites

TL;DR

swTVM extends the TVM compiler to support Sunway many-core processors. It optimizes deep learning code generation by exploiting Sunway architecture features, and reports a 1.79x speedup over state-of-the-art frameworks on Sunway.

Contribution

This work introduces swTVM, the first compiler extension for Sunway processors that enables efficient, architecture-aware deep learning code generation with cross-compilation support.

Findings

01. Achieves a 1.79x speedup over state-of-the-art frameworks on Sunway.

02. Supports ahead-of-time compilation tailored for the Sunway architecture.

03. Demonstrates effective utilization of Sunway's core groups, DMA, and local memory.

Abstract

The proliferation of deep learning frameworks and hardware platforms demands an efficient compiler that can hide the diversity in both software and hardware in order to provide application portability. Among existing deep learning compilers, TVM is well known for its efficiency in code generation and optimization across diverse hardware devices. Meanwhile, the Sunway many-core processor is a competitive candidate owing to its attractive computational power for both scientific computing and deep learning workloads. This paper combines the trends in these two directions. Specifically, we propose swTVM, which extends the original TVM to support ahead-of-time compilation for architectures requiring cross-compilation, such as Sunway. In addition, we leverage architecture features during compilation, such as core groups for massive parallelism, DMA for high…
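
As a rough illustration of the ideas the abstract names (work spread across the cores of a core group, with data moved by DMA into a small local memory), the following pure-Python sketch tiles a matrix multiplication so that each tile fits a local buffer. It is not swTVM's generated code. The core count, the local-memory budget, the round-robin tile assignment, and the helper names `dma_get`, `dma_put`, and `tiled_matmul` are all assumptions made for the example.

```python
# Illustrative only: names, sizes, and the scheduling policy are assumptions,
# not swTVM's generated code or the Sunway programming interface.
NUM_CORES = 64          # assumed number of compute cores in one core group
LOCAL_MEM_ELEMS = 48    # assumed (deliberately tiny) local-memory budget per core


def dma_get(matrix, row0, col0, rows, cols):
    """Simulate a DMA read: copy a tile from main memory into a local buffer."""
    return [matrix[r][col0:col0 + cols] for r in range(row0, row0 + rows)]


def dma_put(matrix, tile, row0, col0):
    """Simulate a DMA write: copy a local tile back to main memory."""
    for i, row in enumerate(tile):
        matrix[row0 + i][col0:col0 + len(row)] = row


def tiled_matmul(a, b, tile=4):
    """Compute a @ b tile by tile; return the result and tiles handled per core."""
    n, k, m = len(a), len(b), len(b[0])
    # Three tile buffers (A, B, and the accumulator) must fit in local memory.
    assert 3 * tile * tile <= LOCAL_MEM_ELEMS, "tile does not fit in local memory"
    c = [[0.0] * m for _ in range(n)]
    work = [(i, j) for i in range(0, n, tile) for j in range(0, m, tile)]
    tiles_per_core = [0] * NUM_CORES
    for idx, (i, j) in enumerate(work):
        # Round-robin assignment of output tiles to cores (simulated serially).
        tiles_per_core[idx % NUM_CORES] += 1
        ti, tj = min(tile, n - i), min(tile, m - j)
        acc = [[0.0] * tj for _ in range(ti)]
        for p in range(0, k, tile):
            tp = min(tile, k - p)
            a_loc = dma_get(a, i, p, ti, tp)   # bring an A tile into local memory
            b_loc = dma_get(b, p, j, tp, tj)   # bring a B tile into local memory
            for x in range(ti):
                for y in range(tj):
                    acc[x][y] += sum(a_loc[x][z] * b_loc[z][y] for z in range(tp))
        dma_put(c, acc, i, j)                  # write the finished tile back
    return c, tiles_per_core
```

Calling `tiled_matmul(a, b, tile=4)` on two nested-list matrices returns the product together with a per-core tile count. In a real code generator the tile size would be chosen from the actual local-memory capacity and the data type width, which this sketch does not model.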

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Network Packet Processing and Optimization

Methods: 1x1 Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax · Convolution