Current Agents Fail to Leverage World Model as Tool for Foresight

Cheng Qian; Emre Can Acikgoz; Bingxuan Li; Xiusi Chen; Yuji Zhang; Bingxiang He; Qinyu Luo; Dilek Hakkani-Tür; Gokhan Tur; Yunzhu Li; Heng Ji

arXiv:2601.03905·cs.AI·January 9, 2026

TL;DR

Current vision-language agents underutilize generative world models for foresight, often misusing or ignoring simulation capabilities, highlighting the need for better strategies to leverage these models for improved anticipatory reasoning.

Contribution

This paper empirically evaluates how current agents leverage generative world models, reveals significant underuse and misuse, and identifies the key bottlenecks in their strategic use: deciding when to simulate, how to interpret predicted outcomes, and how to integrate foresight.

Findings

01

Some agents rarely invoke simulation (fewer than 1%)

02

Agents frequently misuse predicted rollouts (approximately 15%)

03

Performance is inconsistent and can degrade by up to 5% when simulation is available or enforced

Abstract

Agents built on vision-language models increasingly face tasks that demand anticipating future states rather than relying on short-horizon reasoning. Generative world models offer a promising remedy: agents could use them as external simulators to foresee outcomes before acting. This paper empirically examines whether current agents can leverage such world models as tools to enhance their cognition. Across diverse agentic and visual question answering tasks, we observe that some agents rarely invoke simulation (fewer than 1%), frequently misuse predicted rollouts (approximately 15%), and often exhibit inconsistent or even degraded performance (up to 5%) when simulation is available or enforced. Attribution analysis further indicates that the primary bottleneck lies in the agents' capacity to decide when to simulate, how to interpret predicted outcomes, and how to integrate foresight…

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Social Robot Interaction and HRI