EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models

Yantai Yang; Yuhao Wang; Zichen Wen; Luo Zhongwei; Chang Zou; Zhipeng Zhang; Chuan Wen; Linfeng Zhang

arXiv:2506.10100 · cs.CV · June 13, 2025

Open Access

TL;DR

EfficientVLA is a holistic, training-free framework that reduces the computational cost of vision-language-action (VLA) models by combining three strategies: pruning layers from the language module, optimizing the set of visual tokens, and caching features. Together these enable faster inference with minimal accuracy loss.

Contribution

The paper presents a comprehensive approach to accelerating VLA models without retraining, addressing multiple bottlenecks across the pipeline simultaneously rather than one in isolation.

Findings

01  Achieves a 1.93× inference speedup on CogACT

02  Reduces FLOPs to 28.9% of the original model

03  Incurs only a 0.6% drop in success rate on the SIMPLER benchmark
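
The speedup and FLOPs figures reported above can be related with a little arithmetic. The sketch below is illustrative only: it assumes, purely for comparison, that latency would scale linearly with FLOPs, which this summary does not claim. The variable names are hypothetical.

```python
# Figures taken from the Findings list.
flops_fraction = 0.289    # remaining FLOPs as a fraction of the original
measured_speedup = 1.93   # reported inference speedup on CogACT

# Speedup one would see IF latency scaled linearly with FLOPs
# (an assumption made here for comparison, not a claim from the paper).
ideal_speedup = 1.0 / flops_fraction

# Fraction of that FLOPs-implied speedup realized in measured latency.
realized = measured_speedup / ideal_speedup

print(f"FLOPs reduction factor: {ideal_speedup:.2f}x")
print(f"Measured speedup:       {measured_speedup:.2f}x")
print(f"Realized fraction:      {realized:.1%}")
```

Under that assumption the FLOPs reduction corresponds to roughly a 3.46× factor, so the measured 1.93× speedup realizes a bit over half of it. This summary does not explain the gap.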

Abstract

Vision-Language-Action (VLA) models, particularly diffusion-based architectures, demonstrate transformative potential for embodied intelligence but are severely hampered by high computational and memory demands stemming from extensive inherent and inference-time redundancies. While existing acceleration efforts often target isolated inefficiencies, such piecemeal solutions typically fail to holistically address the varied computational and memory bottlenecks across the entire VLA pipeline, thereby limiting practical deployability. We introduce EfficientVLA, a structured and training-free inference acceleration framework that systematically eliminates these barriers by cohesively exploiting multifaceted redundancies. EfficientVLA synergistically integrates three targeted strategies: (1) pruning of functionally inconsequential layers from the language module, guided by an analysis of…
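
To make strategy (1) more concrete, here is a minimal sketch of training-free layer pruning. The abstract is cut off before it states the pruning criterion, so the score used here (one minus the mean cosine similarity between a layer's input and output hidden states) is an assumption chosen for illustration, not necessarily the paper's method. The function names and toy data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def layer_importance(layer_in, layer_out):
    """Assumed score: 1 - mean per-token cosine similarity of input vs. output.
    A layer that barely changes token directions scores near 0."""
    sims = [cosine(x, y) for x, y in zip(layer_in, layer_out)]
    return 1.0 - sum(sims) / len(sims)

def select_layers_to_prune(hidden_states, num_prune):
    """hidden_states[i] holds the token vectors entering layer i;
    hidden_states[i + 1] holds the vectors leaving it.
    Returns the indices of the num_prune lowest-scoring layers and all scores."""
    scores = [
        layer_importance(hidden_states[i], hidden_states[i + 1])
        for i in range(len(hidden_states) - 1)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:num_prune]), scores

# Toy example: 3 layers, 2 tokens, 2-dimensional hidden states.
toy_states = [
    [[1.0, 0.0], [0.0, 1.0]],  # input to layer 0
    [[0.0, 1.0], [1.0, 0.0]],  # layer 0 rotates every token by 90 degrees
    [[0.0, 2.0], [2.0, 0.0]],  # layer 1 only rescales (direction unchanged)
    [[1.0, 1.0], [1.0, 1.0]],  # layer 2 changes directions moderately
]
pruned, scores = select_layers_to_prune(toy_states, num_prune=1)
print("importance scores:", [round(s, 3) for s in scores])
print("layers to prune:", pruned)
```

In this toy setup layer 1 is selected because it leaves token directions unchanged. A real pipeline would collect hidden states from calibration inputs and verify task success after removing the chosen layers.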

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis

Methods: Pruning · Sparse Evolutionary Training