RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models
Yuxuan Chen, Xiao Li

TL;DR
This paper introduces RLRC, a three-stage recovery method for compressed vision-language-action models that cuts memory usage by up to 8x and raises inference throughput by 2.3x while maintaining or improving task success rates, making deployment on resource-constrained robotic platforms more practical.
Contribution
The paper presents RLRC, a three-stage recovery approach for compressed VLAs that combines structured pruning, performance recovery based on SFT and RL, and quantization, motivated by an empirical study of compression techniques applied to VLAs.
Findings
RLRC achieves up to 8x memory reduction.
RLRC improves inference throughput by 2.3x.
RLRC outperforms existing compression methods.
Abstract
Vision-Language-Action (VLA) models have demonstrated remarkable capabilities and promising potential in solving complex robotic manipulation tasks. However, their substantial parameter sizes and high inference latency pose significant challenges for real-world deployment, particularly on resource-constrained robotic platforms. To address this issue, we begin by conducting an extensive empirical study to explore the effectiveness of model compression techniques when applied to VLAs. Building on the insights gained from these preliminary experiments, we propose RLRC, a three-stage recovery method for compressed VLAs, including structured pruning, performance recovery based on SFT and RL, and further quantization. RLRC achieves up to an 8x reduction in memory usage and a 2.3x improvement in inference throughput, while maintaining or even surpassing the original VLA's task success rate.…
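To make the stage ordering in the abstract concrete, here is a minimal toy sketch of structured pruning followed by quantization on a small weight matrix. It is not the authors' implementation: the function names (`prune_rows`, `quantize_int8`, `dequantize`, `memory_bytes`), the L2-norm pruning criterion, the symmetric int8 scheme, and the toy numbers are all assumptions chosen for illustration, and the SFT and RL recovery stage is only marked by a comment.

```python
# Toy sketch only (hypothetical names, not the paper's code):
# structured pruning -> (recovery placeholder) -> quantization.

def prune_rows(weights, keep_ratio):
    """Structured pruning: drop whole rows with the smallest L2 norm."""
    norms = [sum(w * w for w in row) ** 0.5 for row in weights]
    n_keep = max(1, round(len(weights) * keep_ratio))
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:n_keep])  # preserve the original row order
    return [weights[i] for i in keep]

def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(w / scale))) for w in row] for row in weights]
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to approximate float weights."""
    return [[v * scale for v in row] for row in q]

def memory_bytes(n_values, bits):
    """Storage needed for n_values numbers at the given bit width."""
    return n_values * bits // 8

# Assumed toy weights: 4 rows (e.g. neurons) with 2 inputs each.
weights = [[0.5, -1.0], [0.01, 0.02], [2.0, 0.1], [-0.3, 0.4]]

# Stage 1: structured pruning removes the weakest row entirely.
pruned = prune_rows(weights, keep_ratio=0.75)

# Stage 2 (not implemented here): SFT and RL would fine-tune `pruned`
# to recover task success before quantization.

# Stage 3: quantize the remaining weights.
q, scale = quantize_int8(pruned)
approx = dequantize(q, scale)

before = memory_bytes(len(weights) * 2, bits=32)  # float32 storage
after = memory_bytes(len(pruned) * 2, bits=8)     # int8 storage
print(f"rows kept: {len(pruned)}, bytes: {before} -> {after}")
```

The toy memory figures only show how pruning and lower bit widths compound; they say nothing about how the reported 8x reduction is obtained.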
Taxonomy
Methods: Shrink and Fine-Tune
