AGWM: Affordance-Grounded World Models for Environments with Compositional Prerequisites

Qinshi Zhang (1), Weipeng Deng (2), Zhihan Jiang (3), Jiaming Qu (4), Qianren Li (5), Weitao Xu (5), Ray LC (5) ((1) University of California, San Diego, (2) University of Hong Kong, (3) Columbia University, (4) Amazon, (5) City University of Hong Kong)

arXiv:2605.06841·cs.AI·May 11, 2026

TL;DR

This paper introduces AGWM, a world model that explicitly tracks action prerequisites as a DAG to improve multi-step predictions and generalization in environments with dynamic action affordances.

Contribution

The paper proposes AGWM, an affordance-grounded world model that represents dynamic action prerequisites as a DAG, addressing the tendency of standard world models to ignore action preconditions.

Findings

01 Lower multi-step prediction error in experiments

02 Better generalization to novel configurations

03 Enhanced interpretability of action dependencies

Abstract

In model-based learning, the agent learns behaviors by simulating trajectories based on world model predictions. Standard world models typically learn a stationary transition function that maps states and actions to next states. When an action and an outcome frequently co-occur in training data, the model tends to internalize this correlation as a general causal rule while ignoring action preconditions. In interactive environments, however, agent actions can reshape the future affordance space. At each timestep, an action may become executable only after its prerequisites are met, or non-executable when they are destroyed. We term such events structure-changing events (SC events). As a result, a conventional world model often fails to determine whether a given action is executable in the current state, especially in multi-step predictions. Each imagined step is conditioned on an…
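To make the idea of prerequisite-gated actions concrete, here is a minimal Python sketch. It is not the authors' AGWM implementation: the crafting-style actions, the fact names, and the helpers `is_executable`, `step`, and `rollout` are all hypothetical, chosen only for illustration. Each action lists the facts it requires, adds, and removes, so the "produces/requires" links between actions form a small dependency DAG. The rollout shows an action becoming executable once its prerequisites are met and non-executable again after one of them is destroyed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical symbolic action; not taken from the paper."""
    name: str
    requires: frozenset = frozenset()  # facts that must hold beforehand
    adds: frozenset = frozenset()      # facts created by the action
    removes: frozenset = frozenset()   # facts destroyed by the action


def is_executable(action, facts):
    # An action is afforded only if every prerequisite fact currently holds.
    return action.requires <= facts


def step(action, facts):
    # Non-executable actions leave the state unchanged and are flagged.
    if not is_executable(action, facts):
        return facts, False
    return (facts - action.removes) | action.adds, True


def rollout(plan, facts):
    # Imagine a multi-step trajectory, recording which steps were afforded.
    trace = []
    for action in plan:
        facts, ok = step(action, facts)
        trace.append((action.name, ok))
    return facts, trace


# Toy environment (assumed for illustration only).
collect_wood = Action("collect_wood", adds=frozenset({"wood"}))
make_table = Action("make_table", requires=frozenset({"wood"}),
                    adds=frozenset({"table"}))
make_pickaxe = Action("make_pickaxe", requires=frozenset({"wood", "table"}),
                      adds=frozenset({"pickaxe"}))
break_table = Action("break_table", requires=frozenset({"table"}),
                     removes=frozenset({"table"}))

plan = [make_pickaxe, collect_wood, make_table,
        make_pickaxe, break_table, make_pickaxe]
final_facts, trace = rollout(plan, frozenset())

for name, ok in trace:
    print(f"{name}: {'executed' if ok else 'blocked'}")
```

In this toy rollout, `make_pickaxe` is blocked at the first step, succeeds once wood and a table exist, and is blocked again after `break_table` removes a prerequisite. A model that only memorized "make_pickaxe leads to pickaxe" from co-occurrence would miss both blocked cases.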
