Goal2Skill: Long-Horizon Manipulation with Adaptive Planning and Reflection

Zhen Liu; Xinyu Ning; Zhe Hu; Xinxin Xie; Weize Li; Zhipeng Tang; Chongyu Wang; Zejun Yang; Hanlin Wang; Yitong Liu; Zhongzhu Pu

arXiv:2604.13942 · cs.RO · April 16, 2026

TL;DR

Goal2Skill introduces a dual-system framework for long-horizon embodied manipulation, combining high-level planning with low-level visuomotor control to improve robustness and success rates in complex tasks.

Contribution

The paper presents a novel dual-system approach that separates semantic planning from motor execution, enabling memory-aware reasoning and adaptive recovery in long-horizon tasks.

Findings

01. Goal2Skill achieves a 32.4% success rate on RMBench tasks, versus 9.8% for the best baseline (a gain of 22.6 percentage points, roughly 3.3 times the baseline rate).

02. Structured memory and closed-loop recovery significantly improve task success.

03. The framework effectively handles partial observability, occlusions, and multi-stage dependencies.

Abstract

Recent vision-language-action (VLA) systems have demonstrated strong capabilities in embodied manipulation. However, most existing VLA policies rely on limited observation windows and end-to-end action prediction, which makes them brittle in long-horizon, memory-dependent tasks with partial observability, occlusions, and multi-stage dependencies. Such tasks require not only precise visuomotor control, but also persistent memory, adaptive task decomposition, and explicit recovery from execution failures. To address these limitations, we propose a dual-system framework for long-horizon embodied manipulation. Our framework explicitly separates high-level semantic reasoning from low-level motor execution. A high-level planner, implemented as a VLM-based agentic module, maintains structured task memory and performs goal decomposition, outcome verification, and error-driven correction. A…
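The planner loop described in the abstract (decompose the goal, execute each sub-task, verify the outcome, correct on failure) can be illustrated with a minimal sketch. This is not the authors' implementation: the names TaskMemory and run_task, the stub planner and executor functions, and the retry limit are all hypothetical, and the VLM planner and the low-level visuomotor executor are replaced by plain Python callables. Replanning is reduced here to a bounded retry that can read the logged failures.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class TaskMemory:
    """Hypothetical structured task memory: goal, finished sub-tasks, logged failures."""
    goal: str
    completed: List[str] = field(default_factory=list)
    failures: List[str] = field(default_factory=list)


def run_task(
    goal: str,
    decompose: Callable[[str, TaskMemory], List[str]],
    execute: Callable[[str, TaskMemory], bool],
    verify: Callable[[str, bool], bool],
    max_retries: int = 2,
) -> Tuple[bool, TaskMemory]:
    """Closed-loop plan/execute/verify/retry loop (illustrative only)."""
    memory = TaskMemory(goal)
    for subtask in decompose(goal, memory):
        for attempt in range(max_retries + 1):
            # The executor sees the memory, so a retry can use earlier failures.
            observation = execute(subtask, memory)
            if verify(subtask, observation):
                memory.completed.append(subtask)
                break
            memory.failures.append(f"{subtask} (attempt {attempt + 1})")
        else:
            # Every attempt failed verification: give up on the whole task.
            return False, memory
    return True, memory


# Stubs standing in for the VLM planner, the motor executor, and the verifier.
def demo_decompose(goal: str, memory: TaskMemory) -> List[str]:
    return ["open drawer", "pick block", "place block in drawer"]


def demo_execute(subtask: str, memory: TaskMemory) -> bool:
    # "pick block" fails until one failure has been logged, then succeeds.
    return not (subtask == "pick block" and not memory.failures)


def demo_verify(subtask: str, observation: bool) -> bool:
    return observation


if __name__ == "__main__":
    ok, memory = run_task("store the block", demo_decompose, demo_execute, demo_verify)
    print(ok, memory.completed, memory.failures)
```

With these stubs, the second sub-task fails once, the failure is written to memory, and the retry succeeds, so the task finishes with all three sub-tasks completed and one logged failure. Keeping the memory as an explicit object separate from the executor mirrors the separation of semantic reasoning from motor execution that the abstract describes.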
