3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization, and Ultra-Low Latency Acceleration
Yao Chen, Cole Hawkins, Kaiqi Zhang, Zheng Zhang, Cong Hao

TL;DR
This paper presents a comprehensive approach for edge AI by combining ultra-low memory training, ultra-low bitwidth quantization, and ultra-low latency acceleration to enable efficient on-device deep learning.
Contribution
It introduces a novel tensor-based training method, a state-of-the-art quantization technique, and a co-designed accelerator for edge AI applications.
Findings
Reduces memory usage during training by orders of magnitude
Achieves state-of-the-art accuracy under the same compression ratio with ultra-low bitwidth quantization
Presents an ultra-low latency DNN accelerator
Abstract
The deep neural network (DNN) based AI applications on the edge require both low-cost computing platforms and high-quality services. However, the limited memory, computing resources, and power budget of the edge devices constrain the effectiveness of the DNN algorithms. Developing edge-oriented AI algorithms and implementations (e.g., accelerators) is challenging. In this paper, we summarize our recent efforts for efficient on-device AI development from three aspects, including both training and inference. First, we present on-device training with ultra-low memory usage. We propose a novel rank-adaptive tensor-based tensorized neural network model, which offers orders-of-magnitude memory reduction during training. Second, we introduce an ultra-low bitwidth quantization method for DNN model compression, achieving the state-of-the-art accuracy under the same compression ratio. Third, we…
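To give a rough sense of where an "orders-of-magnitude" memory reduction can come from, the following minimal sketch compares the parameter count of a dense weight matrix with that of a tensor-train (TT) matrix factorization. This is an illustration only: the TT-matrix format, the fixed ranks, and the layer dimensions are assumptions chosen for the example, and the paper's rank-adaptive method may use a different format and learns its ranks during training.

```python
from math import prod


def dense_params(in_dims, out_dims):
    """Parameter count of an unfactorized weight matrix of shape prod(in_dims) x prod(out_dims)."""
    return prod(in_dims) * prod(out_dims)


def tt_matrix_params(in_dims, out_dims, ranks):
    """Parameter count of a TT-matrix factorization (assumed format, fixed ranks).

    Core k has shape (ranks[k], in_dims[k], out_dims[k], ranks[k + 1]),
    with boundary ranks ranks[0] == ranks[-1] == 1.
    """
    if not (len(in_dims) == len(out_dims) == len(ranks) - 1):
        raise ValueError("need one rank more than the number of factor dimensions")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must be 1")
    return sum(
        ranks[k] * in_dims[k] * out_dims[k] * ranks[k + 1]
        for k in range(len(in_dims))
    )


# Hypothetical 1024 x 1024 fully connected layer, factorized as 4 * 8 * 8 * 4 on each side.
in_dims = (4, 8, 8, 4)
out_dims = (4, 8, 8, 4)
ranks = (1, 8, 8, 8, 1)  # illustrative fixed ranks, not values from the paper

dense = dense_params(in_dims, out_dims)          # 1,048,576 parameters
tt = tt_matrix_params(in_dims, out_dims, ranks)  # 8,448 parameters
ratio = dense / tt

print(f"dense: {dense}, TT: {tt}, reduction: {ratio:.1f}x")
```

With these assumed settings the factorized layer stores roughly two orders of magnitude fewer parameters than the dense one. The actual savings depend on the chosen factor dimensions and ranks, and this count covers weights only, not activations or optimizer state.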
Taxonomy
Topics: Tensor decomposition and applications · Parallel Computing and Optimization Techniques · Sparse and Compressive Sensing Techniques
