RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
Yao Mu, Junting Chen, Qinglong Zhang, Shoufa Chen, Qiaojun Yu, Chongjian Ge, Runjian Chen, Zhixuan Liang, Mengkang Hu, Chaofan Tao, Peize Sun, Haibao Yu, Chao Yang, Wenqi Shao, Wenhai Wang, Jifeng Dai, Yu Qiao, Mingyu Ding, Ping Luo

TL;DR
RoboCodeX introduces a tree-structured multimodal code generation framework that enhances robotic behavior synthesis by decomposing instructions and applying specialized reasoning, achieving state-of-the-art results across multiple robotic tasks.
Contribution
The paper presents RoboCodeX, a multimodal code generation framework with a hierarchical structure, along with a new reasoning dataset, both aimed at improving robotic control and generalization.
Findings
Achieves state-of-the-art performance on manipulation and navigation tasks
Demonstrates effective generalization across different robotics platforms
Outperforms existing methods in both simulation and real-world experiments
Abstract
Robotic behavior synthesis, the problem of understanding multimodal inputs and generating precise physical control for robots, is an important part of Embodied AI. Despite successes in applying multimodal large language models for high-level understanding, it remains challenging to translate these conceptual understandings into detailed robotic actions while achieving generalization across various scenarios. In this paper, we propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX. RoboCodeX decomposes high-level human instructions into multiple object-centric manipulation units consisting of physical preferences such as affordance and safety constraints, and applies code generation to introduce generalization ability across various robotics platforms. To further enhance the capability to map conceptual and perceptual…
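The decomposition described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the paper's implementation: the class names `ManipulationUnit` and `TaskNode`, their fields, the helper functions, and the example instruction are all hypothetical. It shows a high-level instruction held as a tree whose leaves are object-centric units carrying an affordance and safety constraints, with each leaf then rendered as a platform-agnostic call string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ManipulationUnit:
    """One object-centric unit (hypothetical fields, not the paper's schema)."""
    target_object: str
    action: str
    affordance: str  # e.g. which part of the object to interact with
    safety_constraints: List[str] = field(default_factory=list)


@dataclass
class TaskNode:
    """Node in the instruction tree; leaf nodes carry a manipulation unit."""
    description: str
    unit: Optional[ManipulationUnit] = None
    children: List["TaskNode"] = field(default_factory=list)


def leaf_units(node: TaskNode) -> List[ManipulationUnit]:
    """Collect manipulation units in depth-first, left-to-right order."""
    if node.unit is not None:
        return [node.unit]
    units: List[ManipulationUnit] = []
    for child in node.children:
        units.extend(leaf_units(child))
    return units


def render_call(unit: ManipulationUnit) -> str:
    """Render a unit as a call string (illustrative output format only)."""
    constraints = ", ".join(repr(c) for c in unit.safety_constraints)
    return (
        f"{unit.action}(obj={unit.target_object!r}, "
        f"affordance={unit.affordance!r}, constraints=[{constraints}])"
    )


# Hypothetical instruction, decomposed by hand for the example.
tree = TaskNode(
    "put the apple in the drawer",
    children=[
        TaskNode(
            "open the drawer",
            unit=ManipulationUnit(
                "drawer", "pull", "handle", ["avoid collision with table"]
            ),
        ),
        TaskNode(
            "place the apple",
            unit=ManipulationUnit(
                "apple", "pick_and_place", "top grasp", ["limit grip force"]
            ),
        ),
    ],
)

program = [render_call(u) for u in leaf_units(tree)]
for line in program:
    print(line)
```

In the framework itself, a multimodal model produces both the decomposition and the code; here the tree is written by hand purely to make the data flow visible.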
Taxonomy
Topics: Natural Language Processing Techniques · Reinforcement Learning in Robotics · Speech and dialogue systems
