Generalizable Hierarchical Skill Learning via Object-Centric Representation

Haibo Zhao; Yu Qi; Boce Hu; Yizhe Zhu; Ziyan Chen; Heng Tian; Xupeng Zhu; Owen Howell; Haojie Huang; Robin Walters; Dian Wang; Robert Platt

arXiv:2510.21121·cs.RO·October 27, 2025

Open Access

TL;DR

The paper introduces GSL, a hierarchical skill learning framework that leverages object-centric representations and foundation models to enhance robot manipulation generalization and sample efficiency across diverse tasks.

Contribution

GSL is a novel hierarchical policy framework that uses object-centric skills and foundation models to improve generalization and sample efficiency in robot manipulation.

Findings

01. GSL trained with 3 demonstrations outperforms baselines trained with 90 demonstrations in simulation.

02. In real-world experiments, GSL surpasses baselines trained with 10 times more data.

03. GSL significantly improves generalization to unseen objects and task variations.

Abstract

We present Generalizable Hierarchical Skill Learning (GSL), a novel framework for hierarchical policy learning that significantly improves policy generalization and sample efficiency in robot manipulation. One core idea of GSL is to use object-centric skills as an interface that bridges the high-level vision-language model and the low-level visual-motor policy. Specifically, GSL decomposes demonstrations into transferable and object-canonicalized skill primitives using foundation models, ensuring efficient low-level skill learning in the object frame. At test time, the skill-object pairs predicted by the high-level agent are fed to the low-level module, where the inferred canonical actions are mapped back to the world frame for execution. This structured yet flexible design leads to substantial improvements in sample efficiency and generalization of our method across unseen spatial…
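The frame change described in the abstract (learning actions in the object frame, then mapping them back to the world frame) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simplified planar (SE(2)) setting, and the names `Pose2D`, `compose`, `inverse`, `canonicalize`, and `to_world` are hypothetical helpers introduced only for illustration. In GSL the object pose would come from foundation models and the canonical action from the low-level policy; here both are hard-coded.

```python
import math
from typing import NamedTuple


class Pose2D(NamedTuple):
    """Planar pose (hypothetical helper): position in meters, heading in radians."""
    x: float
    y: float
    theta: float


def compose(a: Pose2D, b: Pose2D) -> Pose2D:
    """Map pose b, expressed in frame a, into a's parent frame."""
    c, s = math.cos(a.theta), math.sin(a.theta)
    return Pose2D(a.x + c * b.x - s * b.y,
                  a.y + s * b.x + c * b.y,
                  a.theta + b.theta)


def inverse(p: Pose2D) -> Pose2D:
    """Invert a rigid transform: translation becomes -R^T t, angle is negated."""
    c, s = math.cos(p.theta), math.sin(p.theta)
    return Pose2D(-(c * p.x + s * p.y), s * p.x - c * p.y, -p.theta)


def canonicalize(world_obj: Pose2D, world_action: Pose2D) -> Pose2D:
    """Express a world-frame action in the object's own frame."""
    return compose(inverse(world_obj), world_action)


def to_world(world_obj: Pose2D, obj_action: Pose2D) -> Pose2D:
    """Map an object-frame (canonical) action back to the world frame."""
    return compose(world_obj, obj_action)


# Demonstration time: object at (0.5, 0.0), grasp 0.1 m along the object's y-axis.
obj_demo = Pose2D(0.5, 0.0, 0.0)
grasp_demo = Pose2D(0.5, 0.1, 0.0)
canonical_grasp = canonicalize(obj_demo, grasp_demo)

# Test time: the same object appears elsewhere, rotated by 90 degrees.
obj_test = Pose2D(0.2, 0.3, math.pi / 2)
world_grasp_test = to_world(obj_test, canonical_grasp)
```

Because the grasp is stored relative to the object, the same canonical action yields a correctly moved and rotated world-frame grasp when the object pose changes, which is the intuition behind the spatial generalization claimed above. A full 3D version would use SE(3) transforms instead of planar poses.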

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Multimodal Machine Learning Applications