Sentinel-VLA: A Metacognitive VLA Model with Active Status Monitoring for Dynamic Reasoning and Error Recovery
Wenhao Li, Xiu Su, Yichao Cao, Hongyan Xu, Xiaobo Xia, Shan You, Yi Chen, Chang Xu

TL;DR
Sentinel-VLA is a novel vision-language-action model with active status monitoring and self-evolving learning, significantly improving robustness and success rates in embodied manipulation tasks.
Contribution
The paper introduces Sentinel-VLA with an active sentinel module, a self-evolving continual learning algorithm, and orthogonal parameter updates, advancing VLA model robustness and adaptability.
Findings
Boosts task success rate by over 30% compared to SOTA.
Automatically generates and annotates training data for 44 tasks.
Demonstrates effective self-evolving continual learning in real-world experiments.
Abstract
Vision-language-action (VLA) models have advanced the field of embodied manipulation by harnessing broad world knowledge and strong generalization. However, current VLA models still face several key challenges, including limited reasoning capability, lack of status monitoring, and difficulty in self-correction. In this paper, we introduce \textbf{Sentinel-VLA}, a metacognitive VLA model equipped with an active ``sentinel'' module to monitor real-time execution status. Only when necessary, such as during initial planning or upon detecting an error, the model triggers a dynamic reasoning or formulate error recovery solutions. This on-demand reasoning mechanism ensures robust decision-making while minimizing computational overhead. Notably, all training data (spanning 44 tasks and over 2.6 million transitions) is automatically generated and annotated through our designed pipeline. We also…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
