World Modelling Improves Language Model Agents

Shangmin Guo; Omar Darwiche Domingues; Raphaël Avalos; Aaron Courville; Florian Strub

arXiv:2506.02918·cs.AI·September 22, 2025

TL;DR

This paper introduces DyMo, a dynamics modelling approach that gives language models an internal environment model for predicting the states that result from their actions. It improves tool-use success and reliability in stateful environments without relying on repeated trials in the environment.

Contribution

The paper presents DyMo, a method that augments LLMs with an internal environment model, and integrates that model into self-verification sampling (SVS) to improve the success and reliability of tool use.

Findings

01 DyMo improves success rates on the Berkeley Function Calling Leaderboard V2.

02 DyMo reduces hallucinations in language model outputs.

03 Integration with SVS enhances reliability and allows models to refuse unreliable outputs.
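
The third finding can be made concrete with a small sketch. It is not the paper's algorithm: it only illustrates the general idea of checking candidate tool calls against a predicted outcome and refusing when none passes. The names `select_or_refuse`, `toy_predict_state`, and `looks_ok`, and the toy tool registry, are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def select_or_refuse(
    candidates: Sequence[str],
    predict_state: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate whose predicted outcome passes the check.

    Returns None (a refusal) when no candidate passes. In the setting the
    page describes, `predict_state` would be the model's own internal
    environment model; here it is an arbitrary callable (assumption).
    """
    for call in candidates:
        predicted = predict_state(call)  # no real environment call is made
        if is_acceptable(predicted):
            return call
    return None


# Toy stand-ins, purely for illustration (hypothetical).
KNOWN_TOOLS = {"get_weather", "send_email"}


def toy_predict_state(call: str) -> str:
    # Predict an error for any function name outside the toy registry.
    name = call.split("(", 1)[0]
    return "ok" if name in KNOWN_TOOLS else "error: unknown function"


def looks_ok(predicted: str) -> bool:
    return predicted == "ok"


chosen = select_or_refuse(
    ["get_wether(city='Paris')", "get_weather(city='Paris')"],
    toy_predict_state,
    looks_ok,
)
refused = select_or_refuse(["delete_all()"], toy_predict_state, looks_ok)
```

Here `chosen` is the correctly spelled call and `refused` is `None`, since the only candidate is predicted to fail. The point of the design is that the verification step uses a prediction instead of executing the call, which matters when actions in a stateful environment cannot be undone.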

Abstract

Tool use in stateful environments presents unique challenges for large language models (LLMs), where existing test-time compute strategies relying on repeated trials in the environment are impractical. We propose dynamics modelling (DyMo), a method that augments LLMs with a state prediction capability alongside function calling during post-training. This enables LLMs to predict the future states of their actions through an internal environment model. On the Berkeley Function Calling Leaderboard V2, DyMo improves success rates and significantly reduces hallucinations. We further integrate the internal environment model into self-verification sampling (SVS), and show that this substantially improves pass^k over number of trials k, and allows the model to refuse unreliable outputs. Together, DyMo and SVS greatly enhance the effectiveness and reliability of LLMs for tool use. We believe…
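
The abstract reports pass^k without defining it. A minimal sketch follows, assuming the common definition: the probability that all k independent trials on a task succeed, estimated from n recorded trials with c successes as C(c, k) / C(n, k). The function name `pass_hat_k` and the trial counts are illustrative, not taken from the paper.

```python
from math import comb


def pass_hat_k(num_trials: int, num_successes: int, k: int) -> float:
    """Estimate pass^k for one task from recorded trial outcomes.

    Assumed definition: the chance that k trials drawn without
    replacement from the recorded ones are all successes.
    """
    if not 1 <= k <= num_trials:
        raise ValueError("k must satisfy 1 <= k <= num_trials")
    if not 0 <= num_successes <= num_trials:
        raise ValueError("num_successes must be between 0 and num_trials")
    # comb(c, k) is 0 when c < k, so the estimate is 0 in that case.
    return comb(num_successes, k) / comb(num_trials, k)


# Illustrative numbers only: 8 trials on one task, 6 of them successful.
single = pass_hat_k(8, 6, 1)  # plain success rate, 6/8
double = pass_hat_k(8, 6, 2)  # both of two sampled trials succeed, 15/28
```

Under this definition pass^k can only stay flat or fall as k grows, so it rewards consistency across repeated attempts and not just a single lucky success.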
