ADAM: An Embodied Causal Agent in Open-World Environments
Shu Yu, Chaochao Lu

TL;DR
ADAM is an embodied causal agent designed for open-world environments like Minecraft, capable of autonomous navigation, causal knowledge learning, and complex task execution with high interpretability and robustness.
Contribution
This paper introduces ADAM, a novel embodied agent that constructs causal graphs from scratch, enhancing interpretability and generalization in open-world tasks.
Findings
ADAM constructs nearly perfect causal graphs from scratch.
ADAM maintains performance without prior knowledge.
ADAM demonstrates robustness and generalization in open-world environments.
Abstract
In open-world environments like Minecraft, existing agents face challenges in continuously learning structured knowledge, particularly causality. These challenges stem from the opacity inherent in black-box models and an excessive reliance on prior knowledge during training, which impair their interpretability and generalization capability. To this end, we introduce ADAM, An emboDied causal Agent in Minecraft, that can autonomously navigate the open world, perceive multimodal contexts, learn causal world knowledge, and tackle complex tasks through lifelong learning. ADAM is empowered by four key components: 1) an interaction module, enabling the agent to execute actions while documenting the interaction processes; 2) a causal model module, tasked with constructing an ever-growing causal graph from scratch, which enhances interpretability and diminishes reliance on prior knowledge; 3) a…
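The idea of a causal graph that grows from scratch through interaction can be illustrated with a toy sketch. This is not the authors' implementation: the hidden rule table, the `try_obtain` environment step, the `CausalGraph` class, and the single-item removal test are all hypothetical stand-ins. The sketch assumes items are never consumed and that every cause is individually necessary. It only shows how an intervention (remove one held item, then retry) can confirm a cause-to-effect edge without prior knowledge of the rules.

```python
from collections import defaultdict

# Hypothetical toy world: each item needs a set of items to be held first.
# The learner never reads this table directly; it only calls try_obtain().
# This is an assumption of the sketch, not the real Minecraft tech tree.
HIDDEN_RULES = {
    "log": set(),
    "planks": {"log"},
    "stick": {"planks"},
    "wooden_pickaxe": {"planks", "stick"},
}


def try_obtain(item, inventory):
    """Toy environment step: succeeds only if every hidden prerequisite is held."""
    return HIDDEN_RULES[item] <= inventory


class CausalGraph:
    """Directed graph that grows as cause -> effect edges are confirmed."""

    def __init__(self):
        self.parents = defaultdict(set)

    def add_edge(self, cause, effect):
        self.parents[effect].add(cause)

    def edges(self):
        return sorted((c, e) for e, cs in self.parents.items() for c in cs)


def discover_causes(item, inventory, graph):
    """If the item is obtainable, remove one held item at a time and retry.

    A held item is recorded as a cause when removing it makes the attempt fail.
    Returns True when the item could be obtained with the full inventory.
    """
    if not try_obtain(item, inventory):
        return False
    for candidate in sorted(inventory):
        if not try_obtain(item, inventory - {candidate}):
            graph.add_edge(candidate, item)
    return True


def explore(items):
    """Keep attempting unseen items until a full pass yields nothing new."""
    graph, inventory = CausalGraph(), set()
    progress = True
    while progress:
        progress = False
        for item in items:
            if item not in inventory and discover_causes(item, inventory, graph):
                inventory.add(item)
                progress = True
    return graph, inventory


if __name__ == "__main__":
    graph, inventory = explore(list(HIDDEN_RULES))
    for cause, effect in graph.edges():
        print(f"{cause} -> {effect}")
```

Because each edge is added only after an intervention changes the outcome, items that merely happen to be in the inventory (here, `log` when obtaining `stick`) are not recorded as causes.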
Peer Reviews
Decision: ICLR 2025 Poster
1. The figures in the paper are well-done and enhance clarity, making the content easier to understand. 2. The experiments conducted in Minecraft show a higher success rate than those achieved by Voyager.
1. The paper appears to be hastily prepared, as it contains numerous typos and minor errors, such as inconsistencies between “Fig.” and “Figure” references and improper usage of quotation marks in Table 4’s caption. I recommend that the authors carefully review and correct these issues. 2. The major issue lies in the extensive use of pretrained language models that already incorporate substantial knowledge of Minecraft. Since language models may internally form a comprehensive causal graph of th
- The method seems to be the first approach that combines causal inference with LLM agents in code-based action spaces, which is a potentially very important direction for future research. - The method performs quite well at inferring causal graphs in Minecraft, and it seems to provide a way for agents to take advantage of those causal graphs.
- The presentation of the method is quite high-level and does not help the reader understand how the method actually works in practice. When the different "modules" are introduced, it is not clear a priori what they actually are. Are they just prompts and specifications to a GPT4 model? If so, it could be beneficial to show one of the prompts earlier, to guide the understanding of the rest of the paper. - The comparisons in the paper are unclear, due to the choice of a specific action space that
1. The incorporation of causal discovery methods in a modular framework is novel in LLM-based embodied exploration and, unlike prior work, it does not rely on privileged information. 2. The paper demonstrates strong empirical results with well-designed experiments, which led to significantly faster discovery of skills in Minecraft. Performance did not degrade much in modified environments where prior knowledge is invalid, which demonstrates that causal learning is indeed effective. Method also
1. One concern is whether ADAM scales to more complex worlds and causal graphs for intervention-based causal discovery (CD). 2. Interestingly, the paper proposes a multimodal agentic framework, but all the baselines it is compared against are text-based frameworks. It would be good to have at least one multimodal baseline, e.g. [1], as this is also cited by the authors. [1] Wang, Z., Cai, S., Liu, A., Jin, Y., Hou, J., Zhang, B., ... & Liang, Y. (2023). Jarvis-1: Open-world multi-task agents with memory-aug
Taxonomy
Topics: Reinforcement Learning in Robotics · Multi-Agent Systems and Negotiation
Methods: Adam
