Language Agents Meet Causality -- Bridging LLMs and Causal World Models
John Gkountouras, Matthias Lindemann, Phillip Lippe, Efstratios Gavves, Ivan Titov

TL;DR
This paper introduces a framework combining causal representation learning with large language models to improve causally-aware reasoning and planning, especially for complex and long-horizon tasks.
Contribution
It presents a novel integration of causal world models with LLMs, enabling flexible, causally-informed reasoning and planning in natural language.
Findings
Outperforms LLMs in causal inference tasks
Enhances planning accuracy over longer horizons
Demonstrates effectiveness across diverse environments
Abstract
Large Language Models (LLMs) have recently shown great promise in planning and reasoning applications. These tasks demand robust systems, which arguably require a causal understanding of the environment. While LLMs can acquire and reflect common sense causal knowledge from their pretraining data, this information is often incomplete, incorrect, or inapplicable to a specific environment. In contrast, causal representation learning (CRL) focuses on identifying the underlying causal structure within a given environment. We propose a framework that integrates CRLs with LLMs to enable causally-aware reasoning and planning. This framework learns a causal world model, with causal variables linked to natural language expressions. This mapping provides LLMs with a flexible interface to process and generate descriptions of actions and states in text form. Effectively, the causal world model acts…
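The interface the abstract describes (a world model whose causal variables can be read and written as text, and which a planner can query as a simulator) can be illustrated with a toy sketch. Everything below is hypothetical: the class `ToyCausalWorldModel`, the `toggle <variable>` action grammar, and the hand-written switch-to-light mechanism are stand-ins for components the paper learns from data, and the breadth-first search is a simple substitute for the LLM-guided search the paper builds on, not the authors' method.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# A state is a hashable, sorted tuple of (causal variable, value) pairs.
State = Tuple[Tuple[str, int], ...]


class ToyCausalWorldModel:
    """Hypothetical hand-written stand-in for a learned causal world model."""

    def __init__(self, variables: List[str], children: Dict[str, List[str]]):
        self.variables = variables
        # children[cause] lists the variables that copy the cause's value.
        self.children = children

    def encode(self, values: Dict[str, int]) -> State:
        return tuple(sorted(values.items()))

    def actions(self) -> List[str]:
        # Text actions, one per causal variable (assumed grammar).
        return [f"toggle {name}" for name in self.variables]

    def step(self, state: State, action_text: str) -> State:
        """Apply a text action; unknown actions leave the state unchanged."""
        values = dict(state)
        verb, _, target = action_text.partition(" ")
        if verb == "toggle" and target in values:
            values[target] = 1 - values[target]
            # Propagate the intervention to direct causal children.
            for child in self.children.get(target, []):
                values[child] = values[target]
        return self.encode(values)

    def describe(self, state: State) -> str:
        """Render the causal variables as a natural-language description."""
        return ", ".join(
            f"{name} is {'on' if value else 'off'}" for name, value in state
        )


def plan(
    model: ToyCausalWorldModel, start: State, goal: State, max_depth: int = 5
) -> Optional[List[str]]:
    """Breadth-first search that uses the world model as a simulator."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        if len(path) >= max_depth:
            continue
        for action in model.actions():
            nxt = model.step(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None


model = ToyCausalWorldModel(["light", "switch"], {"switch": ["light"]})
start = model.encode({"light": 0, "switch": 0})
goal = model.encode({"light": 1, "switch": 1})
print(model.describe(start))
print(plan(model, start, goal))
```

In this sketch the planner only ever sees text actions and text descriptions, while the transition logic lives in the world model; that separation is the point of the illustration, not the specific search procedure.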
Peer Reviews
Decision: ICLR 2025 Poster
- The integration of CRL and LLM planning is novel and interesting. It is straightforward to integrate with other LLM-based search algorithms, not only RAP.
- The paper investigates the form of action representation in the causal world model and provides detailed results on it.
- Reasoning via Planning (RAP) is not the state-of-the-art method. [1] combines an LLM as a world model and an LLM as a policy model with MCTS, and should perform better than RAP.
- Though not discussed in RAP, other literature [2] [3] suggests that increasing the number of LLM calls substantially improves search algorithm results. How do the overhead and accuracy of LLM+CRL planning compare with increasing LLM calls through tree search or iterative prompting?
- I'm concerned about the ability of the system to work on more complex visua
- The paper proposed a method to learn a causal world model from a sequence of image states and text action descriptions, and demonstrated superior performance in the accuracy of the learned world model.
- Presentation of the paper is clear and easy to follow.
1. Some components of the framework would not be available in more realistic environments, e.g. 1) a set of annotated images with ground-truth causal variables is used in training, which is likely unavailable because we may not know the causal variables in more realistic environments; 2) a rule-based state description generator may not be available for complex environments where we don't know what the true causal variables are.
2. Given the simplicity of the environments, and that the proposed met
The exploration of text-based action representations is particularly interesting, as it shows potential advantages in low-data regimes. I particularly liked the connection to the RL decision making problem. The results indicate that the proposed framework can potentially improve performance in causal inference and planning tasks, which is valuable for the broader ICLR community.
Please see the questions
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multi-Agent Systems and Negotiation
Methods: Causal inference
