ReconVLA: An Uncertainty-Guided and Failure-Aware Vision-Language-Action Framework for Robotic Control
Lingling Chen, Zongyao Lyu, William J. Beksi

TL;DR
ReconVLA introduces a conformal prediction-based framework for vision-language-action models, providing calibrated uncertainty estimates and failure detection to enhance robotic control reliability in real-world scenarios.
Contribution
ReconVLA applies conformal prediction to pretrained VLA models for the first time, enabling uncertainty-guided control and failure detection without retraining the original models.
Findings
Conformalized action predictions improve failure anticipation.
The method reduces catastrophic errors in robotic tasks.
Uncertainty estimates correlate with task success.
Abstract
Vision-language-action (VLA) models have emerged as generalist robotic controllers capable of mapping visual observations and natural language instructions to continuous action sequences. However, VLAs provide no calibrated measure of confidence in their action predictions, which limits their reliability in real-world settings where uncertainty and failures must be anticipated. To address this problem, we introduce ReconVLA, a reliable conformal model that produces uncertainty-guided and failure-aware control signals. Concretely, our approach applies conformal prediction directly to the action token outputs of pretrained VLA policies, yielding calibrated uncertainty estimates that correlate with execution quality and task success. Furthermore, we extend conformal prediction to the robot state space to detect outliers or unsafe states before failures occur, providing a simple yet…
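To make the idea of conformalizing action token outputs concrete, here is a minimal sketch of standard split conformal prediction over a discrete token distribution. It is not the authors' implementation: the nonconformity score (one minus the probability assigned to the true token), the toy calibration data, and every function name below are assumptions chosen for illustration, since this summary does not specify the score or calibration procedure ReconVLA uses.

```python
import math

# Illustrative sketch only: all names and data here are hypothetical,
# not taken from the ReconVLA paper or code.

def nonconformity(probs, true_token):
    """Assumed score: 1 - probability the policy gave to the true token."""
    return 1.0 - probs[true_token]

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: include every token.
        return float("inf")
    return sorted(scores)[rank - 1]

def prediction_set(probs, qhat):
    """All tokens whose score does not exceed the calibrated threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= qhat]

# Toy calibration set: (token probabilities, index of the true token).
calibration = [
    ([0.75, 0.125, 0.125], 0),
    ([0.5, 0.25, 0.25], 0),
    ([0.125, 0.75, 0.125], 1),
    ([0.25, 0.25, 0.5], 2),
    ([0.5, 0.375, 0.125], 1),
    ([0.875, 0.0625, 0.0625], 0),
    ([0.25, 0.625, 0.125], 1),
]

scores = [nonconformity(p, y) for p, y in calibration]
qhat = conformal_quantile(scores, alpha=0.25)

# A small set signals a confident step; a larger set signals uncertainty.
confident = prediction_set([0.875, 0.0625, 0.0625], qhat)
ambiguous = prediction_set([0.5, 0.5, 0.0], qhat)
print(qhat, len(confident), len(ambiguous))
```

The size of the prediction set serves as the uncertainty signal in this sketch: a single-token set suggests the policy is confident about that action token, while a multi-token set could be used to slow down, replan, or flag a likely failure. Because calibration only needs held-out predictions from the frozen policy, nothing here requires retraining the underlying model.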
