AC^2-VLA: Action-Context-Aware Adaptive Computation in Vision-Language-Action Models for Efficient Robotic Manipulation

Wenda Yu; Tianshi Wang; Fengling Li; Jingjing Li; Lei Zhu

arXiv:2601.19634 · cs.RO · January 28, 2026

Open Access

TL;DR

AC^2-VLA introduces an action-context-aware adaptive computation framework that reduces inference latency and computational cost in vision-language-action models for robotic manipulation (up to 1.79× speedup, with FLOPs cut to 29.4% of the dense baseline) while maintaining comparable task success rates.

Contribution

The paper proposes an adaptive computation framework conditioned on action context, enabling efficient inference in VLA models for robotics, together with a new training scheme for structured sparsification.

Findings

01. Achieves up to 1.79× speedup in inference.

02. Reduces FLOPs to 29.4% of the dense baseline.

03. Maintains task success rates comparable to the dense baseline.
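
To put the two headline numbers side by side, the short calculation below converts the FLOPs fraction into a reduction factor and the speedup into a latency fraction. It is only arithmetic on the figures listed above; both are "up to" values, and the listing does not say whether they come from the same configuration, so the comparison is indicative only. The variable names are illustrative.

```python
# Figures quoted in the findings above.
flops_fraction = 0.294     # FLOPs relative to the dense baseline
reported_speedup = 1.79    # wall-clock inference speedup

# Reduction factor implied by the FLOPs fraction.
flops_reduction = 1.0 / flops_fraction
# Fraction of the dense latency implied by the speedup.
latency_fraction = 1.0 / reported_speedup

print(f"{flops_reduction:.2f}x fewer FLOPs")           # 3.40x fewer FLOPs
print(f"{latency_fraction:.1%} of dense latency")      # 55.9% of dense latency
```

A gap between the FLOPs reduction and the measured speedup is common for sparsified models, since latency also depends on memory traffic and the overhead of the components that are not skipped.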

Abstract

Vision-Language-Action (VLA) models have demonstrated strong performance in robotic manipulation, yet their closed-loop deployment is hindered by the high latency and compute cost of repeatedly running large vision-language backbones at every timestep. We observe that VLA inference exhibits structured redundancies across temporal, spatial, and depth dimensions, and that most existing efficiency methods ignore action context, despite its central role in embodied tasks. To address this gap, we propose Action-Context-aware Adaptive Computation for VLA models (AC^2-VLA), a unified framework that conditions computation on current visual observations, language instructions, and previous action states. Based on this action-centric context, AC^2-VLA adaptively performs cognition reuse across timesteps, token pruning, and selective execution of model components within a unified mechanism. To…
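
As a rough illustration of the idea in the abstract, the sketch below shows one way a single context vector built from the observation, the instruction, and the previous action could drive three decisions: reusing cached features (temporal), pruning tokens (spatial), and skipping layers (depth). This is not the authors' implementation. The abstract does not specify the router architecture, so the function `route`, the parameter names, the thresholds, and the random weights are all hypothetical.

```python
import numpy as np

def route(obs_tokens, lang_emb, prev_action, params,
          keep_ratio=0.5, reuse_thresh=0.5):
    """Hypothetical router returning (reuse_cache, kept_token_idx, layer_mask)."""
    # Action-centric context: pooled visual tokens, language embedding, previous action.
    ctx = np.concatenate([obs_tokens.mean(axis=0), lang_emb, prev_action])

    # Temporal: sigmoid gate deciding whether cached backbone features are reused.
    reuse_score = 1.0 / (1.0 + np.exp(-(params["w_reuse"] @ ctx)))
    reuse_cache = bool(reuse_score > reuse_thresh)

    # Spatial: score every visual token against the context and keep the top-k.
    token_scores = obs_tokens @ (params["W_tok"] @ ctx)
    k = max(1, int(round(keep_ratio * obs_tokens.shape[0])))
    kept = np.sort(np.argsort(token_scores)[-k:])

    # Depth: one gate per layer; the first layer is always executed (an assumption).
    layer_mask = (params["W_layer"] @ ctx) > 0
    layer_mask[0] = True
    return reuse_cache, kept, layer_mask

# Toy dimensions, chosen only for the example.
rng = np.random.default_rng(0)
N, D, D_lang, A, L = 16, 8, 8, 7, 6   # tokens, token dim, language dim, action dim, layers
C = D + D_lang + A                    # context dimension
params = {
    "w_reuse": rng.normal(size=C),
    "W_tok": rng.normal(size=(D, C)),
    "W_layer": rng.normal(size=(L, C)),
}
reuse, kept, mask = route(rng.normal(size=(N, D)), rng.normal(size=D_lang),
                          np.zeros(A), params)
print(reuse, kept.shape, mask.shape)
```

In a trained system these gates would be learned jointly with the policy rather than drawn at random; the sketch only shows how the three kinds of decision can share one context.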

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Reinforcement Learning in Robotics