Dancing along Battery: Enabling Transformer with Run-time Reconfigurability on Mobile Devices
Yuhong Song, Weiwen Jiang, Bingbing Li, Panjie Qi, Qingfeng Zhuge, Edwin Hsing-Mean Sha, Sakyasingha Dasgupta, Yiyu Shi, Caiwen Ding

TL;DR
This paper introduces RT3, a framework that combines hardware and software reconfiguration techniques to enable efficient, energy-saving Transformer model execution on mobile devices with real-time switching capabilities.
Contribution
RT3 is the first to integrate hybrid block-structured and pattern pruning with run-time reconfigurability for Transformer models on mobile devices.
Findings
RT3 achieves over 4x battery life extension with less than 1% accuracy loss.
RT3 switches pattern sets within 45ms to meet real-time constraints.
The framework effectively supports dynamic hardware conditions on resource-constrained devices.
Abstract
A pruning-based AutoML framework for run-time reconfigurability, namely RT3, is proposed in this work. It enables large Transformer-based Natural Language Processing (NLP) models to be executed efficiently on resource-constrained mobile devices and reconfigured (i.e., switching models for dynamic hardware conditions) at run-time. Such reconfigurability is key to saving energy on battery-powered mobile devices, which widely use the dynamic voltage and frequency scaling (DVFS) technique for hardware reconfiguration to prolong battery life. In this work, we explore a hybrid of block-structured pruning (BP) and pattern pruning (PP) for Transformer-based models and make a first attempt to combine hardware and software reconfiguration to maximally save energy for battery-powered mobile devices. Specifically, RT3 integrates two-level optimizations: First, it utilizes an efficient BP as the…
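To make the hybrid of BP and PP more concrete, here is a minimal Python sketch of how the two masks might compose and how a pattern set could be swapped at run-time. It is an illustration under stated assumptions, not the authors' implementation: the function names (block_prune, pattern_prune, reconfigure), the 2x2 patterns, the "high_freq"/"low_freq" level names, and the magnitude-based selection rule are all hypothetical, and the abstract above does not specify how RT3 chooses blocks or patterns.

```python
# Illustrative sketch only. Block width, keep counts, patterns and the
# frequency-level names are assumed values, not taken from the paper.

def block_prune(weights, block_cols, keep_rows):
    """Simplified block-structured pruning: split the columns into blocks
    of width `block_cols`; inside each block keep only the `keep_rows`
    row segments with the largest squared L2 norm and zero the rest."""
    n_rows, n_cols = len(weights), len(weights[0])
    pruned = [row[:] for row in weights]  # leave the input untouched
    for c0 in range(0, n_cols, block_cols):
        c1 = min(c0 + block_cols, n_cols)
        norms = [sum(w * w for w in weights[r][c0:c1]) for r in range(n_rows)]
        ranked = sorted(range(n_rows), key=lambda r: norms[r], reverse=True)
        for r in ranked[keep_rows:]:
            for c in range(c0, c1):
                pruned[r][c] = 0.0
    return pruned


def pattern_prune(weights, patterns):
    """Simplified pattern pruning: tile the matrix with square tiles and
    apply to each tile the binary pattern from `patterns` that preserves
    the most weight magnitude. Matrix dimensions must be multiples of
    the pattern size."""
    size = len(patterns[0])
    pruned = [row[:] for row in weights]
    for r0 in range(0, len(weights), size):
        for c0 in range(0, len(weights[0]), size):
            def kept(p):
                return sum(abs(weights[r0 + i][c0 + j]) * p[i][j]
                           for i in range(size) for j in range(size))
            best = max(patterns, key=kept)
            for i in range(size):
                for j in range(size):
                    pruned[r0 + i][c0 + j] = weights[r0 + i][c0 + j] * best[i][j]
    return pruned


# Hypothetical pattern sets, one per DVFS level: a denser set for a high
# frequency level and a sparser set for a low one.
PATTERN_SETS = {
    "high_freq": [[[1, 1], [1, 0]], [[1, 1], [0, 1]],
                  [[1, 0], [1, 1]], [[0, 1], [1, 1]]],
    "low_freq": [[[1, 0], [0, 1]], [[0, 1], [1, 0]]],
}


def reconfigure(backbone, level):
    """Run-time switch: the block-pruned backbone stays fixed and only
    the pattern set changes with the hardware level."""
    return pattern_prune(backbone, PATTERN_SETS[level])


W = [[0.9, -0.1, 0.2, 0.8],
     [0.05, 0.7, -0.6, 0.1],
     [-0.3, 0.02, 0.01, -0.4],
     [0.6, -0.5, 0.3, 0.04]]

backbone = block_prune(W, block_cols=2, keep_rows=3)
high = reconfigure(backbone, "high_freq")
low = reconfigure(backbone, "low_freq")
```

In this sketch the backbone is computed once and only the small pattern set differs between levels, which is one plausible reading of why switching pattern sets can be fast; the actual mechanism and search procedure are described in the paper itself.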
Taxonomy
Topics: Green IT and Sustainability · Advanced Software Engineering Methodologies · Machine Learning in Materials Science
Methods: Pruning · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding · Residual Connection · Weight Decay · Multi-Head Attention
