MetaWorld-X: Hierarchical World Modeling via VLM-Orchestrated Experts for Humanoid Loco-Manipulation

Yutong Shen; Hangxu Liu; Penghui Liu; Jiashuo Luo; Yongkang Zhang; Rex Morvley; Chen Jiang; Jianwei Zhang; Lei Zhang

arXiv:2603.08572·cs.RO·March 10, 2026

TL;DR

MetaWorld-X introduces a hierarchical control framework for humanoid robots that combines specialized experts, imitation learning, and vision-language guidance to improve naturalness, stability, and generalization in complex loco-manipulation tasks.

Contribution

The paper presents a novel hierarchical world model with expert decomposition, imitation-based training, and VLM-guided expert routing for humanoid control, addressing limitations of monolithic policies.
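As a rough illustration of what expert routing can look like, the sketch below blends the outputs of several expert policies using softmax weights over routing scores. The excerpt does not say how MetaWorld-X produces or applies its routing signal, so everything here is an assumption made for illustration: the softmax blend, the names `softmax`, `compose_action`, `walk_expert`, and `reach_expert`, and the hand-written scores that stand in for whatever the VLM would supply.

```python
import math
from typing import Callable, Dict, List, Sequence

Observation = Sequence[float]
Action = List[float]
Expert = Callable[[Observation], Action]


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw routing scores into non-negative weights that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - peak) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def compose_action(
    experts: Dict[str, Expert],
    scores: Dict[str, float],
    obs: Observation,
) -> Action:
    """Blend expert actions with softmax weights (an assumed composition rule)."""
    weights = softmax(scores)
    actions = {name: experts[name](obs) for name in weights}
    dim = len(next(iter(actions.values())))
    return [
        sum(weights[name] * actions[name][i] for name in weights)
        for i in range(dim)
    ]


# Hypothetical stand-in experts; real experts would be learned policies.
def walk_expert(obs: Observation) -> Action:
    return [1.0, 0.0]


def reach_expert(obs: Observation) -> Action:
    return [0.0, 1.0]


experts = {"walk": walk_expert, "reach": reach_expert}
# Hand-written scores standing in for a router's output (assumption).
scores = {"walk": 0.0, "reach": 0.0}
blended = compose_action(experts, scores, obs=[0.0])
```

With equal scores the two experts contribute equally; raising one score shifts the blended action toward that expert. A hard-selection router (picking only the top-scoring expert) would be an equally plausible reading of "routing".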

Findings

01. Enhanced motion naturalness and stability.

02. Improved generalization to complex tasks.

03. Effective semantic-driven expert composition.

Abstract

Learning natural, stable, and compositionally generalizable whole-body control policies for humanoid robots performing simultaneous locomotion and manipulation (loco-manipulation) remains a fundamental challenge in robotics. Existing reinforcement learning approaches typically rely on a single monolithic policy to acquire multiple skills, which often leads to cross-skill gradient interference and motion pattern conflicts in high-degree-of-freedom systems. As a result, the generated behaviors frequently exhibit unnatural movements, limited stability, and poor generalization to complex task compositions. To address these limitations, we propose MetaWorld-X, a hierarchical world model framework for humanoid control. Guided by a divide-and-conquer principle, our method decomposes complex control problems into a set of Specialized Expert Policies (SEP). Each expert…

Taxonomy

Topics: Reinforcement Learning in Robotics · Robotic Locomotion and Control · Robot Manipulation and Learning