SOP: A Scalable Online Post-Training System for Vision-Language-Action Models

Mingjie Pan; Siyuan Feng; Qinglin Zhang; Xinchen Li; Jianheng Song; Chendi Qu; Yi Wang; Chuankang Li; Ziyu Xiong; Zhi Chen; Yi Liu; Jianlan Luo

arXiv:2601.03044 · cs.RO · January 7, 2026

Open Access

TL;DR

This paper presents SOP, a scalable online post-training system for vision-language-action (VLA) models. A fleet of robots streams on-policy experience and human intervention signals to a centralized cloud learner and asynchronously receives updated policies, which enables multi-robot, multi-task learning directly in the physical world.

Contribution

The paper introduces SOP, an online, distributed post-training framework for generalist VLA models that couples real-world execution on physical robots with scalable, multi-task learning.

Findings

01. SOP improves VLA model performance across various tasks.

02. Post-training within hours of real-world interaction is feasible.

03. Performance scales near-linearly with the number of robots.

Abstract

Vision-language-action (VLA) models achieve strong generalization through large-scale pre-training, but real-world deployment requires expert-level task proficiency in addition to broad generality. Existing post-training approaches for VLA models are typically offline, single-robot, or task-specific, limiting effective on-policy adaptation and scalable learning from real-world interaction. We introduce a Scalable Online Post-training (SOP) system that enables online, distributed, multi-task post-training of generalist VLA models directly in the physical world. SOP tightly couples execution and learning through a closed-loop architecture in which a fleet of robots continuously streams on-policy experience and human intervention signals to a centralized cloud learner, and asynchronously receives updated policies. This design supports prompt on-policy correction, scales experience…
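The closed loop described in the abstract (robots stream experience and intervention flags to a central learner and pick up new policies asynchronously) can be illustrated with a toy actor-learner sketch. This is not the authors' implementation: `Transition`, `PolicyStore`, `robot_actor`, `cloud_learner`, and `run_fleet` are hypothetical names, the "policy" is only a version counter, the human intervention signal is a placeholder, and threads on one machine stand in for a networked robot fleet.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Transition:
    robot_id: int
    policy_version: int   # version of the policy that produced this step
    intervened: bool      # True if a human corrected the robot on this step


class PolicyStore:
    """Holds the latest policy version; stands in for published model weights."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0

    def publish(self, version):
        with self._lock:
            self._version = version

    def latest(self):
        with self._lock:
            return self._version


def robot_actor(robot_id, store, experience, steps):
    """Hypothetical robot loop: pull the newest policy, act, stream the result."""
    for step in range(steps):
        version = store.latest()        # asynchronous policy refresh
        intervened = (step % 5 == 0)    # placeholder for a human correction signal
        experience.put(Transition(robot_id, version, intervened))


def cloud_learner(store, experience, total, batch_size):
    """Hypothetical learner loop: consume streamed experience, publish updates."""
    seen, batch, version = 0, [], 0
    while seen < total:
        batch.append(experience.get())  # blocks until some robot streams a step
        seen += 1
        if len(batch) == batch_size:
            version += 1                # placeholder for a gradient update
            store.publish(version)
            batch.clear()
    return version


def run_fleet(num_robots=4, steps=50, batch_size=20):
    """Run all robots concurrently against one learner; return final versions."""
    store, experience = PolicyStore(), queue.Queue()
    actors = [
        threading.Thread(target=robot_actor, args=(i, store, experience, steps))
        for i in range(num_robots)
    ]
    for t in actors:
        t.start()
    final_version = cloud_learner(store, experience, num_robots * steps, batch_size)
    for t in actors:
        t.join()
    return final_version, store.latest()
```

With the defaults, four robots produce 200 transitions, so the learner publishes 10 policy versions in batches of 20. Robots never wait for the learner: each step simply reads whatever version is current, which is the asynchronous property the sketch is meant to show.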

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning