Act2Goal: From World Model To General Goal-conditioned Policy

Pengfei Zhou; Liliang Chen; Shengcong Chen; Di Chen; Wenzhi Zhao; Rongjun Jin; Guanghui Ren; Jianlan Luo

arXiv:2512.23541·cs.RO·December 30, 2025

TL;DR

Act2Goal introduces a goal-conditioned manipulation policy that combines a visual world model with multi-scale temporal control, enabling robust execution of long-horizon robotic tasks with strong zero-shot generalization and rapid online adaptation.

Contribution

The paper presents Act2Goal, a general goal-conditioned manipulation policy that integrates a goal-conditioned visual world model with Multi-Scale Temporal Hashing (MSTH) to improve long-horizon manipulation.

Findings

1. Raises success rates from 30% to 90% on out-of-distribution tasks.
2. Enables rapid autonomous improvement through reward-free online adaptation.
3. Demonstrates strong zero-shot generalization to new objects and environments.

Abstract

Specifying robotic manipulation tasks in a manner that is both expressive and precise remains a central challenge. While visual goals provide a compact and unambiguous task specification, existing goal-conditioned policies often struggle with long-horizon manipulation due to their reliance on single-step action prediction without explicit modeling of task progress. We propose Act2Goal, a general goal-conditioned manipulation policy that integrates a goal-conditioned visual world model with multi-scale temporal control. Given a current observation and a target visual goal, the world model generates a plausible sequence of intermediate visual states that captures long-horizon structure. To translate this visual plan into robust execution, we introduce Multi-Scale Temporal Hashing (MSTH), which decomposes the imagined trajectory into dense proximal frames for fine-grained closed-loop…
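The multi-scale idea in the abstract can be illustrated with a small sketch. The abstract is cut off after the dense proximal frames, so the sparse distal half below is an assumption, as are the function name `multi_scale_indices`, its parameters, and the even spacing of distal frames. This is not the authors' MSTH implementation; it only shows one way to pick frame indices at two temporal scales from an imagined trajectory.

```python
def multi_scale_indices(horizon: int, n_dense: int, n_sparse: int):
    """Pick frame indices from an imagined trajectory of length `horizon`.

    Hypothetical helper, not from the paper. Frames are numbered 1..horizon,
    where frame `horizon` is assumed to be the goal frame.
    Returns (dense, sparse):
      dense  - the first `n_dense` consecutive frames (proximal, every step)
      sparse - up to `n_sparse` evenly spaced later frames (distal), an
               assumption about the part of the abstract that is truncated
    """
    if not 0 < n_dense <= horizon:
        raise ValueError("n_dense must be in 1..horizon")

    # Proximal frames: every step, for fine-grained closed-loop control.
    dense = list(range(1, n_dense + 1))

    remaining = horizon - n_dense
    if n_sparse <= 0 or remaining <= 0:
        return dense, []

    # Distal frames: evenly spaced over the rest of the trajectory.
    # Integer arithmetic keeps indices strictly increasing and makes the
    # last one land exactly on the final frame.
    n = min(n_sparse, remaining)
    sparse = [n_dense + (remaining * (k + 1)) // n for k in range(n)]
    return dense, sparse


if __name__ == "__main__":
    dense, sparse = multi_scale_indices(horizon=32, n_dense=4, n_sparse=4)
    print("dense proximal frames:", dense)
    print("sparse distal frames: ", sparse)
```

With a 32-frame imagined trajectory, 4 dense frames, and 4 sparse frames, the near-term frames are kept at every step while the later ones are spread out to the final frame, so a controller could track short-term detail without losing sight of the long-horizon plan.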

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications