3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization, and Ultra-Low Latency Acceleration

Yao Chen; Cole Hawkins; Kaiqi Zhang; Zheng Zhang; Cong Hao

arXiv:2105.06250 · cs.LG · May 14, 2021 · 1 citation

TL;DR

This paper summarizes the authors' recent work on efficient on-device deep learning for edge AI along three fronts: ultra-low memory training, ultra-low bitwidth quantization, and ultra-low latency acceleration.

Contribution

It introduces a rank-adaptive tensorized training method, an ultra-low bitwidth quantization technique, and a co-designed accelerator for edge AI applications.
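
To make the memory claim concrete, here is a minimal sketch that counts the parameters of a weight matrix stored in tensor-train (TT) matrix format versus the equivalent dense matrix. It assumes the TT-matrix format as the tensorized representation and uses hypothetical mode sizes and fixed, hand-picked ranks; the paper's rank-adaptive model determines its ranks during training, which this sketch does not attempt.

```python
from math import prod


def dense_params(in_modes, out_modes):
    """Weights in the equivalent dense matrix of shape (prod(in_modes), prod(out_modes))."""
    return prod(in_modes) * prod(out_modes)


def tt_matrix_params(in_modes, out_modes, ranks):
    """Parameters in a TT-matrix whose k-th core has shape (r[k], m[k], n[k], r[k+1]).

    `ranks` holds d + 1 TT ranks with ranks[0] == ranks[-1] == 1. In this sketch the
    ranks are fixed by hand (hypothetical values), not adapted during training.
    """
    d = len(in_modes)
    if len(out_modes) != d or len(ranks) != d + 1:
        raise ValueError("need d input modes, d output modes and d + 1 ranks")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must equal 1")
    return sum(ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1] for k in range(d))


if __name__ == "__main__":
    # Hypothetical 1024 x 1024 layer: 1024 = 4 * 8 * 8 * 4 on both sides.
    modes = (4, 8, 8, 4)
    ranks = (1, 8, 8, 8, 1)  # hypothetical TT ranks
    dense = dense_params(modes, modes)          # 1048576
    tt = tt_matrix_params(modes, modes, ranks)  # 8448
    print(f"dense={dense} tt={tt} reduction={dense / tt:.1f}x")
```

With these illustrative ranks the layer shrinks by roughly two orders of magnitude. Because gradients and optimizer state scale with the number of trainable parameters, training directly in the compressed format is one route to lower training memory; activation memory is not counted here.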

Findings

01 Orders-of-magnitude memory reduction during training
02 State-of-the-art accuracy with ultra-low bitwidth quantization at the same compression ratio
03 An ultra-low latency DNN accelerator design

Abstract

Deep neural network (DNN)-based AI applications on the edge require both low-cost computing platforms and high-quality services. However, the limited memory, computing resources, and power budgets of edge devices constrain the effectiveness of DNN algorithms, which makes developing edge-oriented AI algorithms and implementations (e.g., accelerators) challenging. In this paper, we summarize our recent efforts toward efficient on-device AI development from three aspects, covering both training and inference. First, we present on-device training with ultra-low memory usage: we propose a novel rank-adaptive tensorized neural network model, which offers orders-of-magnitude memory reduction during training. Second, we introduce an ultra-low bitwidth quantization method for DNN model compression that achieves state-of-the-art accuracy under the same compression ratio. Third, we…
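
For orientation, the sketch below shows plain uniform symmetric quantization of a list of weights to a low bitwidth, together with the resulting storage ratio against 32-bit floats. It is a generic baseline, not the authors' ultra-low bitwidth method, whose details are not given in this summary; the function names and example weights are hypothetical.

```python
def quantize_symmetric(weights, bits):
    """Uniform symmetric per-tensor quantization to signed integers in [-qmax, qmax]."""
    if bits < 2:
        raise ValueError("a signed symmetric grid needs bits >= 2")
    qmax = 2 ** (bits - 1) - 1  # 1 for 2-bit, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest grid point and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate real-valued weights."""
    return [v * scale for v in q]


def compression_ratio(bits, float_bits=32):
    """Storage ratio versus float weights, ignoring the per-tensor scale."""
    return float_bits / bits


if __name__ == "__main__":
    w = [0.5, -0.25, 0.1, -0.8]
    q, scale = quantize_symmetric(w, bits=4)
    print(q)                     # [4, -2, 1, -7]
    print(dequantize(q, scale))
    print(compression_ratio(4))  # 8.0
```

For a quantizer like this one, fixing the bitwidth fixes the storage ratio, so methods evaluated "under the same compression ratio" can then be compared on accuracy alone; the per-tensor scale adds a small overhead that this count ignores.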

Taxonomy

Topics: Tensor decomposition and applications · Parallel Computing and Optimization Techniques · Sparse and Compressive Sensing Techniques