RLinf-USER: A Unified and Extensible System for Real-World Online Policy Learning in Embodied AI

Hongzhi Zang; Shu'ang Yu; Hao Lin; Tianxing Zhou; Zefang Huang; Zhen Guo; Xin Xu; Jiakai Zhou; Yuze Sheng; Shizhe Zhang; Feng Gao; Wenhao Tang; Yufeng Yue; Quanlu Zhang; Xinlei Chen; Chao Yu; Yu Wang

arXiv:2602.07837·cs.RO·February 13, 2026

TL;DR

USER is a comprehensive system that facilitates real-world online policy learning for embodied AI by integrating hardware management, adaptive communication, and flexible learning frameworks, enabling scalable, long-horizon experiments across diverse robots and models.

Contribution

The paper introduces USER, a novel unified system that addresses hardware heterogeneity, communication challenges, and long-term training needs for real-world online policy learning in embodied AI.

Findings

01. Enables multi-robot coordination in real-world settings.

02. Supports heterogeneous manipulators and large vision-language models.

03. Facilitates long-horizon asynchronous training with robust crash recovery.

Abstract

Online policy learning directly in the physical world is a promising yet challenging direction for embodied intelligence. Unlike simulation, real-world systems cannot be arbitrarily accelerated, cheaply reset, or massively replicated, which makes scalable data collection, heterogeneous deployment, and long-horizon effective training difficult. These challenges suggest that real-world policy learning is not only an algorithmic issue but fundamentally a systems problem. We present USER, a Unified and extensible SystEm for Real-world online policy learning. USER treats physical robots as first-class hardware resources alongside GPUs through a unified hardware abstraction layer, enabling automatic discovery, management, and scheduling of heterogeneous robots. To address cloud-edge communication, USER introduces an adaptive communication plane with tunneling-based networking, distributed…

Taxonomy

Topics: Reinforcement Learning in Robotics · Software-Defined Networks and 5G · IoT and Edge/Fog Computing