Beyond Imitation: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models

Liangzhi Shi; Shuaihang Chen; Feng Gao; Yinuo Chen; Kang Chen; Tonghe Zhang; Hongzhi Zang; Weinan Zhang; Chao Yu; Yu Wang

arXiv:2602.12628 · cs.RO · March 9, 2026

TL;DR

This paper introduces an RL-based sim-real co-training framework for vision-language-action models that improves real-world performance, generalization, and data efficiency by combining supervised fine-tuning with reinforcement learning in simulation.

Contribution

The paper proposes RL-Co, a two-stage RL-based sim-real co-training method that enhances sim-to-real transfer for VLA models, surpassing traditional supervised fine-tuning approaches.

Findings

01. +24% success on OpenVLA tasks

02. +20% success on π_{0.5} tasks

03. Improved generalization and data efficiency

Abstract

Simulation offers a scalable and low-cost way to enrich vision-language-action (VLA) training, reducing reliance on expensive real-robot demonstrations. However, most sim-real co-training methods rely on supervised fine-tuning (SFT), which treats simulation as a static source of demonstrations and does not exploit large-scale closed-loop interaction. Consequently, real-world gains and generalization are often limited. In this paper, we propose an RL-based sim-real Co-training (RL-Co) framework that leverages interactive simulation while preserving real-world capabilities. Our method follows a generic two-stage design: we first warm-start the policy with SFT on a mixture of real and simulated demonstrations, then fine-tune it with reinforcement learning in simulation while adding an auxiliary supervised loss on real-world data to anchor…
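
The second stage described in the abstract (RL in simulation plus an auxiliary supervised loss on real-world data) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a discrete action space, a REINFORCE-style surrogate as the RL term, and a hypothetical weighting coefficient `beta`. The abstract does not state which RL algorithm or loss weighting RL-Co uses.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of action logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def sft_loss(logits_batch, target_actions) -> float:
    """Mean negative log-likelihood of demonstrated actions (behavior cloning)."""
    total = 0.0
    for logits, action in zip(logits_batch, target_actions):
        total -= log_softmax(logits)[action]
    return total / len(target_actions)


def policy_gradient_loss(logits_batch, sampled_actions, advantages) -> float:
    """REINFORCE-style surrogate: -mean(advantage * log pi(a | s)).

    Assumption: the paper's actual RL objective may differ.
    """
    total = 0.0
    for logits, action, adv in zip(logits_batch, sampled_actions, advantages):
        total -= adv * log_softmax(logits)[action]
    return total / len(sampled_actions)


def stage2_loss(sim_logits, sim_actions, sim_advantages,
                real_logits, real_actions, beta: float = 1.0) -> float:
    """RL term on simulated rollouts plus a supervised anchor on real demos.

    `beta` is a hypothetical weight on the real-data anchor term.
    """
    rl_term = policy_gradient_loss(sim_logits, sim_actions, sim_advantages)
    anchor_term = sft_loss(real_logits, real_actions)
    return rl_term + beta * anchor_term


if __name__ == "__main__":
    # Toy batch: two simulated steps and one real demonstration step.
    loss = stage2_loss(
        sim_logits=[[0.2, -0.1], [0.0, 0.5]],
        sim_actions=[0, 1],
        sim_advantages=[1.0, -0.5],
        real_logits=[[0.3, 0.3]],
        real_actions=[1],
        beta=0.5,
    )
    print(f"combined stage-2 loss: {loss:.4f}")
```

The intent of the sketch is only to show the shape of the objective: the RL term is computed on simulated interaction, while the supervised term is computed on real-world demonstrations, so gradient updates driven by simulation are regularized toward real-world behavior.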

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning