RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World

Weixin Mao; Weiheng Zhong; Zhou Jiang; Dong Fang; Zhongyue Zhang; Zihan Lan; Haosheng Li; Fan Jia; Tiancai Wang; Haoqiang Fan; Osamu Yoshie

arXiv:2412.00171·cs.RO·March 26, 2025

TL;DR

RoboMatrix introduces a hierarchical, skill-centric framework utilizing large language models and a unified vision-language-action model to enable scalable, generalizable robot task planning and execution in open-world environments.

Contribution

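It presents RoboMatrix, a skill-centric hierarchical framework; introduces within it a unified vision-language-action (VLA) model that the authors describe as the first of its kind; and demonstrates skill composition for improved generalization in robot task execution.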

Findings

01. 50% higher success rate on unseen tasks

02. Effective skill composition enables generalization

03. Unified VLA model integrates movement and manipulation

Abstract

Existing robot policies predominantly adopt the task-centric approach, requiring end-to-end task data collection. This results in limited generalization to new tasks and difficulties in pinpointing errors within long-horizon, multi-stage tasks. To address this, we propose RoboMatrix, a skill-centric hierarchical framework designed for scalable robot task planning and execution in open-world environments. RoboMatrix extracts general meta-skills from diverse complex tasks, enabling the completion of unseen tasks through skill composition. Its architecture consists of a high-level scheduling layer that utilizes large language models (LLMs) for task decomposition, an intermediate skill layer housing meta-skill models, and a low-level hardware layer for robot control. A key innovation of our work is the introduction of the first unified vision-language-action (VLA) model capable of…
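
The three-layer split described in the abstract can be pictured with a small Python sketch. This is an illustration only, not the authors' implementation: the class names, the skill names (`move_to`, `grasp`, `release`), and the example task are all hypothetical, the scheduling layer is a fixed lookup table standing in for the LLM, and the hardware layer only records commands instead of driving a robot.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SkillCall:
    """One plan step: a meta-skill name plus its target (both hypothetical)."""
    skill: str
    target: str


class HardwareLayer:
    """Low-level layer stand-in: logs commands instead of controlling a robot."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def send(self, command: str) -> None:
        self.log.append(command)


class SkillLayer:
    """Intermediate layer: maps meta-skill names to callables.

    In the paper this layer houses learned meta-skill models; here each
    skill is a one-line stub that forwards a command string to the hardware.
    """

    def __init__(self, hardware: HardwareLayer) -> None:
        self.hardware = hardware
        self.skills: Dict[str, Callable[[str], None]] = {
            "move_to": lambda t: hardware.send(f"move_to({t})"),
            "grasp": lambda t: hardware.send(f"grasp({t})"),
            "release": lambda t: hardware.send(f"release({t})"),
        }

    def run(self, call: SkillCall) -> None:
        if call.skill not in self.skills:
            raise KeyError(f"unknown meta-skill: {call.skill}")
        self.skills[call.skill](call.target)


def schedule(task: str) -> List[SkillCall]:
    """Scheduling-layer stand-in: a fixed lookup where the paper uses an LLM."""
    plans = {
        "put the can in the drawer": [
            SkillCall("move_to", "can"),
            SkillCall("grasp", "can"),
            SkillCall("move_to", "drawer"),
            SkillCall("release", "can"),
        ],
    }
    return plans[task]


def execute(task: str, skills: SkillLayer) -> List[str]:
    """Decompose the task, run each skill in order, return the command log."""
    for call in schedule(task):
        skills.run(call)
    return skills.hardware.log


if __name__ == "__main__":
    print(execute("put the can in the drawer", SkillLayer(HardwareLayer())))
```

Under these assumptions, supporting a new task means adding a new plan over the existing skills rather than collecting end-to-end data for that task, which is the skill-composition idea the abstract describes.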

Code & Models

Repositories

waynemao/robomatrix (PyTorch, official)

Datasets

RoboMatrix/RoboMatrix (dataset, 119 downloads)

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Robotic Path Planning Algorithms

Methods: ADaptive gradient method with the OPTimal convergence rate