JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy

Tianle Zhang; Zhihao Yuan; Dafeng Chi; Peidong Liu; Dongwei Li; Kejun Hu; Likui Zhang; Junnan Nie; Ziming Wei; Zengjue Chen; Yili Tang; Jiayi Li; Zhiyuan Xiang; Mingyang Li; Tianci Luo; Hanwen Wan; Ao Li; Linbo Zhai; Zhihao Zhan; Xiaodong Bai; Jiakun Cai; Peng Cao; Kangliang Chen; Siang Chen; Yixiang Dai; Shuai Di; Yicheng Gong; Chenguang Gui; Yucheng Guo; Peng Hao; Qingrong He; Haoyang Huang; Kunrui Huang; Zhixuan Huang; Shibo Jin; Yixiang Jin; Anson Li; Dongjiang Li; Jiawei Li; Ruodai Li; Yihang Li; Yuzhen Li; Jiaming Liang; Fangsheng Liu; Jing Long; Mingxi Luo; Xing Pan; Hui Shen; Xiaomeng Tian; Daming Wang; Song Wang; Junwu Xiong; Hang Xu; Wanting Xu; Zhengcheng Yu; He Zhang; Jiyao Zhang; Lin Zhao; Chen Zhou; Nan Duan; Yuzheng Zhuang; Liang Lin

arXiv:2604.20100 · cs.RO · April 24, 2026

TL;DR

JoyAI-RA is a foundation model designed to improve robotic manipulation and generalization across different robot embodiments using diverse multi-source data and a unified training framework.

Contribution

The paper introduces a multi-source pretraining framework that bridges embodiment gaps, improving cross-embodiment generalization in robotic manipulation tasks.

Findings

01 Outperforms state-of-the-art methods in simulation and real-world benchmarks.

02 Effectively bridges embodiment gaps between human manipulation and robotic control.

03 Enhances cross-embodiment behavior learning through heterogeneous data integration.

Abstract

Robotic autonomy in open-world environments is fundamentally limited by insufficient data diversity and poor cross-embodiment generalization. Existing robotic datasets are often limited in scale and task coverage, while large differences across robot embodiments impede the transfer of behavioral knowledge. To address these challenges, we propose JoyAI-RA, a vision-language-action (VLA) embodied foundation model tailored for generalizable robotic manipulation. JoyAI-RA is trained with a multi-source, multi-level pretraining framework that integrates web data, large-scale egocentric human manipulation videos, simulation-generated trajectories, and real-robot data. Through training on heterogeneous multi-source data with explicit action-space unification, JoyAI-RA effectively bridges embodiment gaps, particularly between human manipulation and robotic control, thereby enhancing…
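
The abstract does not say how the action-space unification is implemented. As a rough illustration of the general idea only, one common approach is to write every embodiment's actions into a single fixed-size vector and keep a mask marking which slots are valid, so that a loss is computed only over the dimensions a given data source actually provides. Everything in the sketch below is an assumption made for illustration and is not taken from the paper: the names `UNIFIED_DIM`, `SLOT_LAYOUT`, `unify_action`, and `masked_mse`, the embodiment names, and the slot assignments are all hypothetical.

```python
# Hypothetical size of the shared action vector (not from the paper).
UNIFIED_DIM = 16

# Hypothetical layouts: which slots of the shared vector each source fills.
SLOT_LAYOUT = {
    "human_hand": list(range(0, 7)),   # e.g. wrist pose (6) + grasp aperture (1)
    "single_arm": list(range(0, 7)),   # e.g. end-effector pose (6) + gripper (1)
    "dual_arm": list(range(0, 14)),    # e.g. two arms with 7 values each
}


def unify_action(source, action):
    """Place a source-specific action into the shared vector.

    Returns (unified, mask), where mask[i] is True for slots that
    carry real data and False for padding.
    """
    slots = SLOT_LAYOUT[source]
    if len(action) != len(slots):
        raise ValueError(
            f"{source} expects {len(slots)} values, got {len(action)}"
        )
    unified = [0.0] * UNIFIED_DIM
    mask = [False] * UNIFIED_DIM
    for slot, value in zip(slots, action):
        unified[slot] = float(value)
        mask[slot] = True
    return unified, mask


def masked_mse(pred, target, mask):
    """Mean squared error over the valid slots only; padding is ignored."""
    errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    return sum(errors) / len(errors)
```

Under these assumptions, a human-hand sample and a single-arm sample share the same seven slots, which is one simple way for supervision from human videos to inform robot control, while a dual-arm sample fills more of the vector and the rest stays masked out.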
