MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots

Ting Huang; Dongjian Li; Rui Yang; Zeyu Zhang; Zida Yang; Hao Tang

arXiv:2511.17889 · cs.RO · November 25, 2025

TL;DR

MobileVLA-R1 introduces a unified framework for grounding natural language instructions into continuous control for quadruped robots, combining structured reasoning and reinforcement learning to improve stability and generalization in real-world tasks.

Contribution

The paper presents MobileVLA-R1, a novel vision-language-action framework with a large-scale reasoning dataset and a two-stage training paradigm, advancing the integration of reasoning and control in mobile robots.

Findings

01. Achieved approximately 5% performance improvement over baselines.

02. Demonstrated robust real-world deployment on quadruped robots.

03. Enhanced reasoning consistency and control stability in complex environments.

Abstract

Grounding natural-language instructions into continuous control for quadruped robots remains a fundamental challenge in vision-language-action research. Existing methods struggle to bridge high-level semantic reasoning and low-level actuation, leading to unstable grounding and weak generalization in the real world. To address these issues, we present MobileVLA-R1, a unified vision-language-action framework that enables explicit reasoning and continuous control for quadruped robots. We construct MobileVLA-CoT, a large-scale dataset of multi-granularity chain-of-thought (CoT) for embodied trajectories, providing structured reasoning supervision for alignment. Built upon this foundation, we introduce a two-stage training paradigm that combines supervised CoT alignment with GRPO reinforcement learning to enhance reasoning consistency, control stability, and long-horizon execution. Extensive…
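The abstract names GRPO as the reinforcement learning stage but, as shown here, does not spell out the reward design. As a rough illustration of the one step that is generic to GRPO (scoring each sampled rollout relative to the other rollouts drawn for the same prompt), here is a minimal sketch. The function name, the example rewards, and the use of the population standard deviation are assumptions made for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Each rollout's advantage is its reward minus the group mean,
    divided by the group standard deviation. `eps` avoids division
    by zero when every rollout in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scalar rewards for four rollouts sampled for one instruction.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)

# Rollouts above the group mean get positive advantages, those below get negative.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.3f}")
```

Because the baseline is the group mean rather than a learned value function, this step needs no critic network. How the rewards themselves are computed for reasoning and control is a separate question that this sketch does not address.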

Code & Models

Datasets

AIGeeksGroup/MobileVLA-CoT
dataset · 144 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Social Robot Interaction and HRI