Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning
Le-Trung Nguyen, Ael Quelennec, Van-Tam Nguyen, Enzo Tartaglione

TL;DR
This paper introduces a shortcut approach for on-device learning that significantly reduces activation memory and training computational costs, enhancing efficiency under resource constraints.
Contribution
It presents a novel shortcut method as an alternative to low-rank decomposition for reducing memory and computation in on-device training.
Findings
Activation memory reduced by up to 120.09× compared to vanilla training
Training FLOPs reduced by up to 1.86×
Effective on traditional benchmarks
Abstract
On-device learning has emerged as a promising direction for AI development, particularly because of its potential to reduce latency and mitigate the privacy risks associated with device-server communication, while improving energy efficiency. Despite these advantages, significant memory and computational constraints remain major challenges for its deployment. Drawing on previous studies of low-rank decomposition methods that address activation memory bottlenecks in backpropagation, we propose a novel shortcut approach as an alternative. Our analysis and experiments demonstrate that our method can reduce activation memory usage by up to 120.09× compared to vanilla training, while also reducing overall training FLOPs by up to 1.86× when evaluated on traditional benchmarks.
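To give a feel for why low-rank decomposition eases the activation memory bottleneck mentioned above, here is a minimal sketch of the storage arithmetic. It is a generic illustration of low-rank factor storage, not the paper's shortcut method, whose details are not given in this summary. The function names and the example layer sizes are hypothetical, and the sketch assumes the activation of a layer can be treated as a single `batch × features` matrix approximated by two rank-`rank` factors.

```python
def full_activation_elements(batch: int, features: int) -> int:
    """Elements kept for the backward pass when the whole activation matrix is stored."""
    return batch * features


def low_rank_elements(batch: int, features: int, rank: int) -> int:
    """Elements kept when only two factors are stored: (batch x rank) and (rank x features)."""
    if not 1 <= rank <= min(batch, features):
        raise ValueError("rank must lie between 1 and min(batch, features)")
    return rank * (batch + features)


def compression_ratio(batch: int, features: int, rank: int) -> float:
    """How many times smaller the factored storage is than the full activation matrix."""
    return full_activation_elements(batch, features) / low_rank_elements(batch, features, rank)


# Hypothetical sizes chosen only for illustration: 128 samples, 512 features, rank 4.
example_ratio = compression_ratio(batch=128, features=512, rank=4)
```

With these illustrative sizes the full matrix holds 65,536 elements and the two factors hold 2,560, so the ratio is 25.6. The figures reported in the abstract come from the paper's own experiments and should not be inferred from this toy calculation.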
Taxonomy
Topics: IoT and Edge/Fog Computing · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques
