TL;DR
DreamAvoid is a framework that lets vision-language-action (VLA) models anticipate and avoid failures during the critical phases of manipulation tasks, using test-time dreaming and boundary learning.
Contribution
It introduces a test-time dreaming approach with a Dream Trigger, Action Proposer, and Dream Evaluator to improve failure avoidance in VLA models.
Findings
Significantly reduces failure rates in real-world manipulation tasks.
Improves success rates on simulation benchmarks.
Effectively anticipates failures through boundary-aware dreaming.
Abstract
Vision-Language-Action (VLA) models are often brittle in fine-grained manipulation, where minor action errors during the critical phases can rapidly escalate into irrecoverable failures. Since existing VLA models rely predominantly on successful demonstrations for training, they lack an explicit awareness of failure during these critical phases. To address this, we propose DreamAvoid, a critical-phase test-time dreaming framework that enables VLA models to anticipate and avoid failures. We also introduce an autonomous boundary learning paradigm to refine the system's understanding of the subtle boundary between success and failure. Specifically, we (1) utilize a Dream Trigger to determine whether the execution has entered a critical phase, (2) sample multiple candidate action chunks from the VLA via an Action Proposer, and (3) employ a Dream Evaluator, jointly trained on mixed data…
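The three steps named in the abstract can be pictured as a small test-time selection loop. The sketch below is illustrative only and is not the authors' implementation. The function `select_action_chunk`, its three callables, and the `num_candidates` parameter are hypothetical names. It also assumes that the Dream Evaluator yields a scalar score per candidate and that the highest-scoring chunk is executed; the truncated abstract does not state how evaluator outputs are used.

```python
from typing import Any, Callable, List

ActionChunk = List[float]  # toy stand-in for a short sequence of robot actions


def select_action_chunk(
    observation: Any,
    in_critical_phase: Callable[[Any], bool],          # stand-in for the Dream Trigger
    propose_chunk: Callable[[Any], ActionChunk],       # stand-in for sampling from the VLA
    score_chunk: Callable[[Any, ActionChunk], float],  # stand-in for the Dream Evaluator
    num_candidates: int = 8,                           # hypothetical sampling budget
) -> ActionChunk:
    """Pick the action chunk to execute at the current step."""
    # Outside a critical phase, run the policy as usual with a single sample.
    if not in_critical_phase(observation):
        return propose_chunk(observation)

    # Inside a critical phase, sample several candidates from the policy.
    candidates = [propose_chunk(observation) for _ in range(num_candidates)]

    # ASSUMPTION: higher score means the candidate is less likely to fail,
    # and the best-scoring candidate is the one that gets executed.
    return max(candidates, key=lambda chunk: score_chunk(observation, chunk))


if __name__ == "__main__":
    # Toy setup: the observation is a gripper-to-object distance in metres,
    # and the toy evaluator prefers small corrective offsets.
    samples = iter([[0.3], [-0.1], [0.02], [0.5]])
    chosen = select_action_chunk(
        observation=0.01,
        in_critical_phase=lambda dist: dist < 0.05,
        propose_chunk=lambda obs: next(samples),
        score_chunk=lambda obs, chunk: -abs(chunk[0]),
        num_candidates=4,
    )
    print(chosen)
```

Passing the trigger, proposer, and evaluator as callables keeps the loop independent of any particular model. The extra sampling and scoring cost is paid only when the trigger fires.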
