RationalVLA: A Rational Vision-Language-Action Model with Dual System
Wenxuan Song, Jiayi Chen, Wenxue Li, Xu He, Han Zhao, Can Cui, Pengxiang Ding, Shiyan Su, Feilong Tang, Xuelian Cheng, Donglin Wang, Zongyuan Ge, Xinhu Zheng, Zhe Liu, Hesheng Wang, Haoang Li

TL;DR
RationalVLA is a dual-system model that improves robotic manipulation by understanding, reasoning, and rejecting infeasible natural language instructions, demonstrated on a new challenging benchmark with diverse defective commands.
Contribution
The paper introduces RAMA, a new benchmark whose dataset of over 14,000 samples includes diverse defective instructions, and proposes RationalVLA, a dual-system vision-language-action model that handles both executable instructions and ambiguous, irrelevant, or infeasible ones that should be rejected.
Findings
RationalVLA achieves a 14.5% higher success rate on RAMA than state-of-the-art baselines.
It effectively rejects infeasible instructions.
It maintains competitive performance on standard tasks.
Abstract
A fundamental requirement for real-world robotic deployment is the ability to understand and respond to natural language instructions. Existing language-conditioned manipulation tasks typically assume that instructions are perfectly aligned with the environment. This assumption limits robustness and generalization in realistic scenarios where instructions may be ambiguous, irrelevant, or infeasible. To address this problem, we introduce RAtional MAnipulation (RAMA), a new benchmark that challenges models with both unseen executable instructions and defective ones that should be rejected. In RAMA, we construct a dataset with over 14,000 samples, including diverse defective instructions spanning six dimensions: visual, physical, semantic, motion, safety, and out-of-context. We further propose the Rational Vision-Language-Action model (RationalVLA). It is a dual system for robotic arms…
Taxonomy
Topics: Semantic Web and Ontologies · Cognitive Science and Mapping · Multi-Agent Systems and Negotiation
