MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots
Ting Huang, Dongjian Li, Rui Yang, Zeyu Zhang, Zida Yang, Hao Tang

TL;DR
MobileVLA-R1 introduces a unified framework for grounding natural language instructions into continuous control for quadruped robots, combining structured reasoning and reinforcement learning to improve stability and generalization in real-world tasks.
Contribution
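The paper presents MobileVLA-R1, a vision-language-action framework for quadruped robots, together with MobileVLA-CoT, a large-scale chain-of-thought reasoning dataset, and a two-stage training paradigm (supervised CoT alignment followed by GRPO reinforcement learning) that integrates explicit reasoning with continuous control.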
Findings
Achieved an approximately 5% performance improvement over baselines.
Demonstrated robust real-world deployment on quadruped robots.
Enhanced reasoning consistency and control stability in complex environments.
Abstract
Grounding natural-language instructions into continuous control for quadruped robots remains a fundamental challenge in vision-language-action. Existing methods struggle to bridge high-level semantic reasoning and low-level actuation, leading to unstable grounding and weak generalization in the real world. To address these issues, we present MobileVLA-R1, a unified vision-language-action framework that enables explicit reasoning and continuous control for quadruped robots. We construct MobileVLA-CoT, a large-scale dataset of multi-granularity chain-of-thought (CoT) for embodied trajectories, providing structured reasoning supervision for alignment. Built upon this foundation, we introduce a two-stage training paradigm that combines supervised CoT alignment with GRPO reinforcement learning to enhance reasoning consistency, control stability, and long-horizon execution. Extensive…
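The abstract names GRPO as the reinforcement learning stage but this page does not describe the reward design. As a rough illustration only, the group-relative advantage at the core of GRPO can be sketched as below: several responses are sampled for the same instruction, each receives a scalar reward, and each reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, and the reward values are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses sampled for the same prompt.

    Sketch of the GRPO-style advantage: (reward - group mean) / group std.
    `eps` is an assumed small constant that avoids division by zero when
    every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled responses to one instruction.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)

# A group with identical rewards yields zero advantage for every response.
flat_advantages = group_relative_advantages([0.3, 0.3, 0.3])
```

Responses scoring above the group mean get a positive advantage and those below get a negative one, so no separate learned value function is needed to form the baseline.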
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Social Robot Interaction and HRI
