Thousand-GPU Large-Scale Training and Optimization Recipe for AI-Native Cloud Embodied Intelligence Infrastructure
Yongjian Guo, Yunxuan Ma, Haoran Sun, Zhong Guan, Shuai Di, Jing Long, Wanting Xu, Xiaodong Bai, Wen Huang, Yucheng Guo, Chen Zhou, Qiming Yang, Mingxi Luo, Tianyun Zhao, Hedan Yang, Song Wang, Xiaomeng Tian, Xiaolong Xiang, Zhen Sun, Yu Wei, Luqiao Wang, Yuzhen Li, Chenfeng Gu

TL;DR
This paper presents a cloud-based, thousand-GPU distributed training platform for embodied intelligence, built on the LeRobot framework. It reports a roughly 40-fold reduction in single-round training time for GR00T-N1.5 through optimizations across the data pipeline, training, and model layers, as a step towards AI-native cloud infrastructure for embodied intelligence.
Contribution
It introduces what the authors describe as the industry's first cloud-based, thousand-GPU training platform for embodied intelligence, together with systematic fixes for bottlenecks across the pipeline and techniques that speed up training.
Findings
Roughly 40-fold reduction in single-round training time for GR00T-N1.5 (from 15 hours to 22 minutes)
188% speed increase via variable-length FlashAttention and Data Packing
End-to-end validation on thousand-GPU clusters
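To illustrate why Data Packing pairs naturally with variable-length attention, here is a minimal sketch in plain Python. It is not the paper's implementation: the greedy first-fit-decreasing strategy, the function names, the sample lengths, and the 512-token capacity are all assumptions chosen for illustration. The idea is that short sequences are concatenated into fixed-capacity rows instead of being padded, and a list of cumulative offsets records where each sequence starts and ends so that attention can be restricted to within each sequence.

```python
from typing import List


def pack_first_fit(lengths: List[int], capacity: int) -> List[List[int]]:
    """Greedy first-fit-decreasing packing (an assumed strategy).

    Returns a list of bins; each bin is a list of sample indices whose
    total token count does not exceed `capacity`.
    """
    # Place the longest samples first; this usually leaves fewer gaps.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: List[List[int]] = []
    free: List[int] = []  # remaining token budget of each bin
    for i in order:
        n = lengths[i]
        if n > capacity:
            raise ValueError(f"sample {i} has {n} tokens, over capacity {capacity}")
        for b, room in enumerate(free):
            if n <= room:
                bins[b].append(i)
                free[b] -= n
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([i])
            free.append(capacity - n)
    return bins


def cumulative_offsets(lengths: List[int], bin_indices: List[int]) -> List[int]:
    """Sequence boundaries inside one packed bin, starting at 0."""
    offsets = [0]
    for i in bin_indices:
        offsets.append(offsets[-1] + lengths[i])
    return offsets


def padding_fraction(total_tokens: int, rows: int, row_len: int) -> float:
    """Share of a rows x row_len batch that holds padding, not data."""
    return 1.0 - total_tokens / (rows * row_len)


# Hypothetical token counts for six training samples.
lengths = [512, 128, 384, 64, 256, 192]
capacity = 512

bins = pack_first_fit(lengths, capacity)
padded_waste = padding_fraction(sum(lengths), len(lengths), capacity)
packed_waste = padding_fraction(sum(lengths), len(bins), capacity)

print(bins)                                   # [[0], [2, 1], [4, 5, 3]]
print(cumulative_offsets(lengths, bins[2]))   # [0, 256, 448, 512]
print(padded_waste, packed_waste)             # 0.5 0.0
```

In this toy case, padding every sample to 512 tokens wastes half the batch, while packing fills three rows completely. Variable-length attention kernels typically consume boundary offsets of this kind (for example, the `cu_seqlens` arguments of `flash_attn_varlen_func` in the flash-attn library) so that tokens from different packed sequences do not attend to each other.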
Abstract
Embodied intelligence is a key step towards Artificial General Intelligence (AGI), yet its development faces multiple challenges including data, frameworks, infrastructure, and evaluation systems. To address these issues, we have, for the first time in the industry, launched a cloud-based, thousand-GPU distributed training platform for embodied intelligence, built upon the widely adopted LeRobot framework, and have systematically overcome bottlenecks across the entire pipeline. At the data layer, we have restructured the data pipeline to optimize the flow of embodied training data. In terms of training, for the GR00T-N1.5 model, utilizing thousand-GPU clusters and data at the scale of hundreds of millions, the single-round training time has been reduced from 15 hours to just 22 minutes, achieving a 40-fold speedup. At the model layer, by combining variable-length FlashAttention and Data…
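The headline figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted on this page (15 hours, 22 minutes, 188%). Reading "188% speed increase" as an increase over the baseline is an assumption, since the visible text does not say how the figure was measured.

```python
# Single-round training time for GR00T-N1.5, as quoted in the abstract.
baseline_minutes = 15 * 60      # 15 hours
optimized_minutes = 22

speedup = baseline_minutes / optimized_minutes
print(f"{speedup:.1f}x")        # 40.9x, consistent with "40-fold"


def increase_to_multiplier(percent_increase: float) -> float:
    """Convert 'X% faster than baseline' into a throughput multiplier."""
    return 1.0 + percent_increase / 100.0


# Assumed reading: 188% faster than baseline, i.e. 2.88x throughput.
print(f"{increase_to_multiplier(188):.2f}x")   # 2.88x
```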
Taxonomy
Topics: Advanced Neural Network Applications · Big Data and Digital Economy · IoT and Edge/Fog Computing
