VLA-ATTC: Adaptive Test-Time Compute for VLA Models with Relative Action Critic Model
Wenhao Li, Xiu Su, Dan Niu, Yichao Cao, Hongyan Xu, Zhe Qu, Lei Fan, Shan You, Chang Xu

TL;DR
VLA-ATTC enhances vision-language-action models with adaptive test-time computation and a novel relative action critic, significantly reducing failure rates in complex embodied manipulation tasks.
Contribution
The paper introduces an uncertainty-based cognitive clutch and a relative action critic to improve decision-making in VLA models, along with automated data curation and efficiency strategies.
Findings
Reduced the failure rate of the SOTA model PI0.5 by over 50% on LIBERO-LONG.
Introduced a relative action critic that simplifies learning objectives.
Developed an automated data pipeline for preference pair generation.
Abstract
Vision-Language-Action (VLA) models have demonstrated remarkable capabilities and generalization in embodied manipulation. However, their decision-making relies on a fast, instinctive process that lacks deliberation. This strategy often leads to suboptimal or catastrophic actions in complex or ambiguous scenarios that require greater consideration. In this paper, we introduce VLA-ATTC, a framework that endows VLA models with adaptive test-time compute (TTC). VLA-ATTC employs an uncertainty-based "cognitive clutch" to dynamically transition from reflexive execution to a TTC deliberation phase when necessary. During the TTC phase, a novel Relative Action Critic (RAC) model identifies the optimal action from generated candidates via pairwise comparisons. This relative mechanism replaces unstable absolute value estimation, significantly simplifying the learning…
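The control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `act`, `select_by_pairwise_comparison`, `sample_action`, `uncertainty`, and `prefers`, the threshold value, the number of candidates, and the sequential "winner stays" comparison order are all assumptions made for illustration. The abstract states only that an uncertainty signal gates deliberation and that a relative critic compares candidates in pairs.

```python
from typing import Callable, Sequence, TypeVar

A = TypeVar("A")  # an action; its concrete type is not specified in the abstract


def select_by_pairwise_comparison(
    candidates: Sequence[A],
    prefers: Callable[[A, A], bool],
) -> A:
    """Return the candidate that survives a sequential pairwise comparison.

    `prefers(a, b)` is a hypothetical stand-in for a relative critic and
    should return True when `a` is judged better than `b`.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    best = candidates[0]
    for challenger in candidates[1:]:
        # The current best is replaced only if the critic prefers the challenger.
        if prefers(challenger, best):
            best = challenger
    return best


def act(
    sample_action: Callable[[], A],
    uncertainty: Callable[[A], float],
    prefers: Callable[[A, A], bool],
    threshold: float = 0.5,   # assumed value, not taken from the paper
    num_candidates: int = 4,  # assumed value, not taken from the paper
) -> A:
    """Execute reflexively when uncertainty is low, otherwise deliberate."""
    first = sample_action()
    if uncertainty(first) <= threshold:
        # Clutch disengaged: return the fast, single-sample action.
        return first
    # Clutch engaged: spend extra compute on more candidates and compare them.
    candidates = [first] + [sample_action() for _ in range(num_candidates - 1)]
    return select_by_pairwise_comparison(candidates, prefers)
```

In this sketch the critic never produces an absolute score; it only answers "is `a` better than `b`?", which is the relative formulation the abstract contrasts with absolute value estimation. The sequential pass uses `num_candidates - 1` comparisons; other comparison schedules would be equally consistent with the abstract.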
