Long-Horizon Manipulation via Trace-Conditioned VLA Planning

Isabella Liu; An-Chieh Cheng; Rui Yan; Geng Chen; Ri-Zhao Qiu; Xueyan Zou; Sha Yi; Hongxu Yin; Xiaolong Wang; Sifei Liu

arXiv:2604.21924·cs.RO·April 24, 2026

TL;DR

The paper introduces LoHo-Manip, a modular framework that scales short-horizon vision-language-action (VLA) execution to long-horizon manipulation. A task-management VLM, decoupled from the executor, replans in a receding-horizon manner and hands the executor a visual trace (a compact 2D keypoint trajectory) to follow.

Contribution

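The paper decouples planning from execution: a task-management VLM predicts a progress-aware remaining plan, and an executor VLA adapted to condition on the rendered visual trace carries it out, turning long-horizon decision-making into repeated local control.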

Findings

01. Significant improvements in long-horizon success rates.

02. Enhanced robustness and generalization in manipulation tasks.

03. Effective replanning without hand-crafted recovery logic.

Abstract

Long-horizon manipulation remains challenging for vision-language-action (VLA) policies: real tasks are multi-step, progress-dependent, and brittle to compounding execution errors. We present LoHo-Manip, a modular framework that scales short-horizon VLA execution to long-horizon instruction following via a dedicated task-management VLM. The manager is decoupled from the executor and is invoked in a receding-horizon manner: given the current observation, it predicts a progress-aware remaining plan that combines (i) a subtask sequence with an explicit done + remaining split as lightweight language memory, and (ii) a visual trace -- a compact 2D keypoint trajectory prompt specifying where to go and what to approach next. The executor VLA is adapted to condition on the rendered trace, thereby turning long-horizon decision-making into repeated local control by following the trace. Crucially,…
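
The receding-horizon interaction between manager and executor described in the abstract can be sketched as a small loop. This is a toy illustration under stated assumptions, not the authors' implementation: `Observation`, `RemainingPlan`, `toy_manager`, `ToyWorld`, and `run_receding_horizon` are hypothetical names, the observation is a 2D position rather than a camera image, the manager is a hand-written function rather than a VLM, and the subtask names and target keypoints are made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # 2D keypoint; pixel units are an assumption


@dataclass
class Observation:
    """Toy stand-in for a camera image: gripper position plus finished subtasks."""
    pos: Point
    completed: List[str]


@dataclass
class RemainingPlan:
    done: List[str]       # subtasks judged complete (lightweight language memory)
    remaining: List[str]  # subtasks still to do, in order
    trace: List[Point]    # 2D keypoint trajectory toward the next subtask


# Hypothetical two-step task with made-up target keypoints.
SUBTASKS: List[Tuple[str, Point]] = [("pick cup", (10.0, 0.0)), ("place cup", (10.0, 10.0))]


def toy_manager(obs: Observation) -> RemainingPlan:
    """Stand-in for the task-management VLM: re-derives the plan from the observation."""
    remaining = [(n, p) for n, p in SUBTASKS if n not in obs.completed]
    trace: List[Point] = []
    if remaining:
        (x0, y0), (x1, y1) = obs.pos, remaining[0][1]
        # Five evenly spaced keypoints from the current position to the next target.
        trace = [(x0 + (x1 - x0) * t / 4, y0 + (y1 - y0) * t / 4) for t in range(5)]
    return RemainingPlan(list(obs.completed), [n for n, _ in remaining], trace)


class ToyWorld:
    """Toy environment and executor; the first motion stops halfway to mimic an execution error."""

    def __init__(self) -> None:
        self.pos: Point = (0.0, 0.0)
        self.completed: List[str] = []
        self.slips_left = 1

    def observe(self) -> Observation:
        return Observation(self.pos, list(self.completed))

    def follow(self, trace: List[Point]) -> None:
        """Stand-in for the trace-conditioned executor: move along the keypoints."""
        if self.slips_left > 0:
            self.slips_left -= 1
            self.pos = trace[len(trace) // 2]  # slipped: stop at the middle keypoint
        else:
            self.pos = trace[-1]
        # Mark the next unfinished subtask done only if its target was reached.
        for name, (tx, ty) in SUBTASKS:
            if name not in self.completed:
                if abs(self.pos[0] - tx) < 1e-6 and abs(self.pos[1] - ty) < 1e-6:
                    self.completed.append(name)
                break


def run_receding_horizon(
    observe: Callable[[], Observation],
    manager: Callable[[Observation], RemainingPlan],
    follow: Callable[[List[Point]], None],
    max_calls: int = 20,
) -> List[RemainingPlan]:
    """Re-invoke the manager on each fresh observation until nothing remains."""
    history: List[RemainingPlan] = []
    for _ in range(max_calls):
        plan = manager(observe())
        history.append(plan)
        if not plan.remaining:
            break
        follow(plan.trace)  # executor only sees the trace, not the full task
    return history


world = ToyWorld()
history = run_receding_horizon(world.observe, toy_manager, world.follow)
for i, plan in enumerate(history):
    print(i, "done:", plan.done, "remaining:", plan.remaining)
```

In this toy run the first motion stops halfway, so the second manager call still lists both subtasks as remaining but starts its trace from the new position. Recovery falls out of replanning from the current observation, with no special-case error handling in the loop.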

Code & Models

Repositories

https://www.liuisabella.com/LoHoManip
github
