SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models

Hyeonbeom Choi; Daechul Ahn; Youhan Lee; Taewook Kang; Seongwon Cho; Jonghyun Choi

arXiv:2602.04208 · cs.RO · February 5, 2026

TL;DR

SCALE is an inference strategy for vision-language-action models that adaptively modulates perception and action based on self-uncertainty, improving robustness without extra training or multiple passes.

Contribution

It introduces SCALE, a novel single-pass, training-free method that jointly adjusts perception and action using self-uncertainty, addressing perceptual ambiguity in VLAs.

Findings

01. Outperforms existing TTS methods on benchmarks

02. Enhances robustness under perceptual ambiguity

03. Maintains single-pass efficiency

Abstract

Vision-Language-Action (VLA) models have emerged as a promising paradigm for general-purpose robotic control, with test-time scaling (TTS) gaining attention as a way to enhance robustness beyond training. However, existing TTS methods for VLAs require additional training, verifiers, and multiple forward passes, making them impractical for deployment. Moreover, they intervene only at action decoding while keeping visual representations fixed, which is insufficient under perceptual ambiguity, where reconsidering how to perceive is as important as deciding what to do. To address these limitations, we propose SCALE, a simple inference strategy that jointly modulates visual perception and action based on 'self-uncertainty', inspired by uncertainty-driven exploration in Active Inference theory. It requires no additional training, no verifier, and only a single forward pass. SCALE broadens exploration in both…
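
The abstract is truncated here and does not give SCALE's actual formulas, so the following is only an illustrative sketch of the general idea of uncertainty-conditioned modulation, not the authors' method. It assumes, purely for illustration, that self-uncertainty is measured as the normalized entropy of the action distribution, and that a single temperature derived from it is applied both to action decoding and to attention over visual tokens. The function names, the linear mapping, and the `t_min`/`t_max` bounds are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(K), so the result lies in [0, 1]."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(k)


def uncertainty_to_temperature(u, t_min=0.5, t_max=1.5):
    """Hypothetical linear map: higher uncertainty gives a higher temperature."""
    return t_min + (t_max - t_min) * u


def adapt(action_logits, visual_attn_logits):
    """Reuse logits from one forward pass; no extra model calls are made.

    Returns the uncertainty estimate, the re-tempered action distribution,
    and the re-tempered attention weights over visual tokens.
    """
    u = normalized_entropy(softmax(action_logits))
    t = uncertainty_to_temperature(u)
    return u, softmax(action_logits, t), softmax(visual_attn_logits, t)
```

Under these assumptions, a confident action distribution yields a temperature below 1 and sharpens both distributions, while an ambiguous one yields a temperature above 1 and spreads probability mass more widely, which is one simple way to broaden exploration when the model is unsure.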

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning