DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation

Zixuan Chen; Junhui Yin; Yangtao Chen; Jing Huo; Pinzhuo Tian; Jieqi Shi; Yiwen Hou; Yinchuan Li; Yang Gao

arXiv:2505.00527 · cs.RO · February 17, 2026

TL;DR

DeCo is a modular framework that decomposes and recomposes skills for zero-shot generalization in long-horizon 3D manipulation tasks, significantly improving success rates in novel scenarios.

Contribution

DeCo introduces a task decomposition and skill composition framework that enhances zero-shot generalization in long-horizon manipulation tasks using vision-language models.

Findings

01 DeCo improves success rates by up to 66.67% on novel tasks.

02 DeCo enables zero-shot transfer to real-world tasks with 53.33% success.

03 DeCo outperforms baseline models in compositional generalization.

Abstract

Generalizing language-conditioned multi-task imitation learning (IL) models to novel long-horizon 3D manipulation tasks is challenging. To address this, we propose DeCo (Task Decomposition and Skill Composition), a model-agnostic framework that enhances zero-shot generalization to compositional long-horizon manipulation tasks. DeCo decomposes IL demonstrations into modular atomic tasks based on gripper-object interactions, creating a dataset that enables models to learn reusable skills. At inference, DeCo uses a vision-language model (VLM) to parse high-level instructions, retrieve relevant skills, and dynamically schedule their execution. A spatially-aware skill-chaining module ensures smooth, collision-free transitions between skills. We introduce DeCoBench, a benchmark designed to evaluate compositional generalization in long-horizon manipulation tasks. DeCo improves the success rate…
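
The abstract says demonstrations are decomposed into atomic tasks based on gripper-object interactions, but does not give the exact rule. As a rough illustration only, the sketch below assumes a demonstration is available as a per-frame open/closed gripper signal and cuts it wherever that signal flips. The function name `split_on_gripper_events` and the boolean input format are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def split_on_gripper_events(gripper_open: Sequence[bool]) -> List[range]:
    """Split a demonstration into frame ranges that each end at a gripper flip.

    Hypothetical helper: an assumed stand-in for segmentation by
    gripper-object interaction, not the method used in DeCo.
    """
    segments: List[range] = []
    start = 0
    for t in range(1, len(gripper_open)):
        if gripper_open[t] != gripper_open[t - 1]:
            # The frame where the gripper state flips closes the current segment.
            segments.append(range(start, t + 1))
            start = t + 1
    if start < len(gripper_open):
        # Trailing frames after the last flip form a final segment.
        segments.append(range(start, len(gripper_open)))
    return segments


# Toy signal: open, open, close (grasp), closed, closed, open (release), open.
demo = [True, True, False, False, False, True, True]
segments = split_on_gripper_events(demo)
for seg in segments:
    print(list(seg))
```

On the toy signal this yields three segments: an approach ending at the grasp, a carry ending at the release, and a short retreat. A real pipeline would likely also need object contact information, which this sketch ignores.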
