RLDX-1 Technical Report
Dongyoung Kim, Huiwon Jang, Myungkyu Koo, Suhyeok Jang, Taeyoung Kim, Beomjun Kim, Byungjun Yoon, Changsung Jang, Daewon Choi, Dongsu Han, Donguk Lee, Heeseung Kwon, Hojin Jeon, Jaehyun Kang, Jaekyoung Bae, Jihyuk Lee, Jimin Lee, John Won, Joonwoo Ahn, Junhyeong Park

TL;DR
RLDX-1 is a versatile robotic policy that integrates multiple modalities to outperform recent vision-language-action models in complex real-world manipulation tasks, including humanoid control.
Contribution
The paper introduces RLDX-1, a robotic policy for dexterous manipulation that pairs a novel multi-stream transformer architecture with system-level optimizations and surpasses existing models in both simulation and real-world tests.
Findings
RLDX-1 achieves 86.8% success in ALLEX humanoid tasks.
It outperforms recent models like π_{0.5} and GR00T N1.6 in benchmarks.
RLDX-1 demonstrates reliable control of high-DoF humanoid robots.
Abstract
While Vision-Language-Action models (VLAs) have shown remarkable progress toward human-like generalist robotic policies through the versatile intelligence (i.e., broad scene understanding and language-conditioned generalization) inherited from pre-trained Vision-Language Models, they still struggle with complex real-world tasks requiring broader functional capabilities (e.g., motion awareness, long-term memory, and physical sensing). To address this, we introduce RLDX-1, a general-purpose robotic policy for dexterous manipulation built on the Multi-Stream Action Transformer (MSAT), an architecture that unifies these capabilities by integrating heterogeneous modalities through modality-specific streams with cross-modal joint self-attention. RLDX-1 further combines this architecture with system-level design choices, including data synthesis for rare manipulation scenarios, learning…
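The abstract describes MSAT only at a high level. As a rough illustration of what "modality-specific streams with cross-modal joint self-attention" can mean, the following is a minimal sketch that assumes an MM-DiT-style design: each modality stream keeps its own query/key/value/output projections, while attention is computed over the concatenation of all streams' tokens so every token can attend across modalities. This is not the RLDX-1 implementation. The class name `JointStreamAttention`, the stream names, the single attention head, and the absence of normalization and MLP layers are all hypothetical simplifications, and the code uses plain Python lists so it runs without extra dependencies.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class JointStreamAttention:
    """Hypothetical single-head joint self-attention over modality streams.

    Weights are modality-specific (one Q/K/V/O set per stream), but the
    attention scores span the tokens of *all* streams at once.
    """

    def __init__(self, streams: List[str], dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.dim = dim
        self.weights = {
            name: {p: random_matrix(dim, dim, rng) for p in ("q", "k", "v", "o")}
            for name in streams
        }

    def __call__(self, tokens: Dict[str, Matrix]) -> Dict[str, Matrix]:
        names = list(tokens)  # fixed stream order used for concat and split
        q: Matrix = []
        k: Matrix = []
        v: Matrix = []
        lengths = []
        for name in names:
            w, x = self.weights[name], tokens[name]
            # Each stream projects its own tokens with its own weights.
            q += matmul(x, w["q"])
            k += matmul(x, w["k"])
            v += matmul(x, w["v"])
            lengths.append(len(x))
        # Joint attention: scaled dot-product scores between all tokens.
        scale = 1.0 / math.sqrt(self.dim)
        scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
        mixed = matmul(softmax_rows(scores), v)
        # Split back per stream and apply each stream's output projection.
        out, start = {}, 0
        for name, n in zip(names, lengths):
            out[name] = matmul(mixed[start:start + n], self.weights[name]["o"])
            start += n
        return out


if __name__ == "__main__":
    rng = random.Random(1)
    dim = 4
    block = JointStreamAttention(["vision", "language", "action"], dim)
    inputs = {
        "vision": random_matrix(3, dim, rng),
        "language": random_matrix(2, dim, rng),
        "action": random_matrix(5, dim, rng),
    }
    for name, x in block(inputs).items():
        print(name, len(x), "tokens x", len(x[0]), "dims")
```

Because the projections are per-stream, each modality can keep its own feature geometry, while the shared attention step is where cross-modal information flows; for example, changing only the language tokens changes the vision stream's outputs.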
Code & Models
- 🤗 RLWRLD/RLDX-1-PT (model) · 270 dl · ♡ 11
- 🤗 RLWRLD/RLDX-1-FT-ROBOCASA (model) · 87 dl · ♡ 1
- 🤗 RLWRLD/RLDX-1-MT-ALLEX (model) · 69 dl · ♡ 1
- 🤗 RLWRLD/RLDX-1-VLM (model) · 2.2k dl
- 🤗 RLWRLD/RLDX-1-FT-SIMPLER-WIDOWX (model) · 39 dl · ♡ 1
- 🤗 RLWRLD/RLDX-1-FT-SIMPLER-GOOGLE (model) · 41 dl · ♡ 1
- 🤗 RLWRLD/RLDX-1-FT-GR1 (model) · 52 dl · ♡ 1
- 🤗 RLWRLD/RLDX-1-MT-DROID (model) · 70 dl · ♡ 1
- 🤗 RLWRLD/RLDX-1-FT-RC365 (model) · 29 dl · ♡ 1
- 🤗 RLWRLD/RLDX-1-FT-LIBERO (model) · 75 dl · ♡ 1
