SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile

Wei Niu; Md Musfiqur Rahman Sanim; Zhihao Shu; Jiexiong Guan; Xipeng Shen; Miao Yin; Gagan Agrawal; Bin Ren

arXiv:2404.13528 · cs.LG · April 23, 2024

TL;DR

SmartMem is a framework that reduces layout transformations in DNNs, especially transformers, enabling faster inference on mobile devices by optimizing tensor layouts and memory usage.

Contribution

SmartMem introduces a comprehensive method for eliminating layout transformations and choosing efficient memory layouts, which speeds up DNN inference on mobile devices.

Findings

01

Outperforms 5 state-of-the-art frameworks on 18 neural networks.

02

Achieves an average speedup of 2.8× over DNNFusion.

03

Attains 6.9× and 7.9× speedups over TVM and MNN, respectively.

Abstract

This work is motivated by recent developments in Deep Neural Networks, particularly the Transformer architectures underlying applications such as ChatGPT, and the need for performing inference on mobile devices. Focusing on emerging transformers (specifically the ones with computationally efficient Swin-like architectures) and large models (e.g., Stable Diffusion and LLMs) based on transformers, we observe that layout transformations between the computational operators cause a significant slowdown in these applications. This paper presents SmartMem, a comprehensive framework for eliminating most layout transformations, with the idea that multiple operators can use the same tensor layout through careful choice of layout and implementation of operations. Our approach is based on classifying the operators into four groups, and considering combinations of producer-consumer edges between the…
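To make the notion of a layout transformation concrete, here is a minimal sketch in plain Python. It is a toy illustration only, not SmartMem's implementation: the function names (`strides_for`, `transpose_copy`, `channel_sums_contiguous`, `channel_sums_strided`) and the per-channel-sum consumer are hypothetical and chosen for this example. A producer emits a tensor in (H, W, C) order; the baseline consumer expects (C, H, W) and therefore needs an explicit transposed copy, while the adapted consumer reads the producer's layout directly and skips the copy.

```python
from itertools import product


def strides_for(shape):
    """Return row-major strides (in elements) for the given shape."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def transpose_copy(buf, shape, perm):
    """Explicit layout transformation: materialize a permuted copy of buf."""
    src_strides = strides_for(shape)
    out_shape = tuple(shape[p] for p in perm)
    out = []
    for idx in product(*(range(d) for d in out_shape)):
        # Map each output index back to its offset in the source buffer.
        src = sum(i * src_strides[p] for i, p in zip(idx, perm))
        out.append(buf[src])
    return out, out_shape


def channel_sums_contiguous(buf, shape_chw):
    """Consumer written for (C, H, W): each channel is one contiguous slice."""
    c, h, w = shape_chw
    return [sum(buf[ch * h * w:(ch + 1) * h * w]) for ch in range(c)]


def channel_sums_strided(buf, shape_hwc):
    """Same consumer adapted to (H, W, C): step through buf with stride C."""
    _, _, c = shape_hwc
    return [sum(buf[ch::c]) for ch in range(c)]


# Producer output: a 2 x 3 x 4 tensor stored in (H, W, C) order.
shape_hwc = (2, 3, 4)
produced = list(range(24))

# Baseline: transform the layout first (copies all 24 elements), then consume.
chw_buf, shape_chw = transpose_copy(produced, shape_hwc, perm=(2, 0, 1))
baseline = channel_sums_contiguous(chw_buf, shape_chw)

# Transformation eliminated: consume the producer's layout as-is.
direct = channel_sums_strided(produced, shape_hwc)

print(baseline == direct)
```

Both paths compute the same per-channel sums, but the second never materializes the transposed buffer. The abstract's premise is that, in transformer-style models, copies of this kind between operators account for a significant share of inference time.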

Taxonomy

Methods: Attention Is All You Need · Sparse Evolutionary Training · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Dropout · Dense Connections · Label Smoothing · Residual Connection · Softmax