TrainDeeploy: Hardware-Accelerated Parameter-Efficient Fine-Tuning of Small Transformer Models at the Extreme Edge
Run Wang, Victor J.B. Jung, Philip Wiese, Francesco Conti, Alessio Burrello, Luca Benini

TL;DR
TrainDeeploy is a comprehensive framework enabling efficient on-device training of small Transformer and CNN models on ultra-low-power edge devices, supporting parameter-efficient methods like LoRA to reduce resource usage.
Contribution
It introduces the first complete on-device training pipeline for extreme-edge SoCs supporting both CNNs and Transformers with multiple training strategies.
Findings
Fine-tuning reaches up to 11 images/sec on a RISC-V-based heterogeneous SoC.
LoRA reduces memory usage by 23% and the number of trainable parameters by 15x.
Supports both CNN and Transformer models with efficient training.
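To make the LoRA parameter saving concrete, here is a minimal pure-Python sketch of the general technique, not TrainDeeploy's implementation. LoRA freezes a weight matrix W of shape d_out x d_in and trains only two low-rank factors, A (rank x d_in) and B (d_out x rank), so the layer computes y = W x + scale * B (A x). The function names, the 256 x 256 layer size, and the rank of 8 are illustrative assumptions. They are not taken from the paper and are not meant to reproduce its 15x figure.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora) trainable-parameter counts for one linear layer.

    full: training the whole d_out x d_in weight matrix.
    lora: training only A (rank x d_in) and B (d_out x rank).
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute y = W x + scale * B (A x); W stays frozen, A and B are trained."""
    base = matvec(W, x)               # frozen pretrained path
    update = matvec(B, matvec(A, x))  # low-rank trainable path
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical layer size and rank, chosen only for illustration.
full, lora = lora_param_counts(d_in=256, d_out=256, rank=8)
reduction = full / lora  # 65536 / 4096 = 16.0

# Tiny forward pass: 2x2 frozen weight, rank-1 adapter.
W = [[1, 2], [3, 4]]
A = [[1, 1]]      # rank x d_in
B = [[1], [2]]    # d_out x rank
y = lora_forward(W, A, B, [1, 1])  # [3 + 2, 7 + 4] = [5, 11]
```

The reduction factor depends only on the layer shape and the chosen rank, which is why a small rank gives an order-of-magnitude drop in trainable parameters and in the gradients that must be stored for them.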
Abstract
On-device tuning of deep neural networks enables long-term adaptation at the edge while preserving data privacy. However, the high computational and memory demands of backpropagation pose significant challenges for ultra-low-power, memory-constrained extreme-edge devices. These challenges are further amplified for attention-based models due to their architectural complexity and computational scale. We present TrainDeeploy, a framework that unifies efficient inference and on-device training on heterogeneous ultra-low-power System-on-Chips (SoCs). TrainDeeploy provides the first complete on-device training pipeline for extreme-edge SoCs supporting both Convolutional Neural Networks (CNNs) and Transformer models, together with multiple training strategies such as selective layer-wise fine-tuning and Low-Rank Adaptation (LoRA). On a RISC-V-based heterogeneous SoC, we demonstrate the first…
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices
