Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent
Karolis Jucys, George Adamopoulos, Mehrab Hamidi, Stephanie Milani, Mohammad Reza Samsami, Artem Zholus, Sonia Joseph, Blake Richards, Irina Rish, Özgür Şimşek

TL;DR
This paper explores the decision-making mechanisms of the VPT Minecraft agent using interpretability techniques, revealing how it maintains coherence over long tasks and uncovering a goal misgeneralization issue.
Contribution
It provides the first detailed interpretability analysis of a large vision-based Minecraft agent, highlighting attention patterns and identifying a critical goal misgeneralization problem.
Findings
Agent attends to recent and key frames for coherence
Identifies a case of goal misgeneralization in which villagers are mistaken for trees
Reveals attention mechanisms that support long-term task performance
Abstract
Understanding the mechanisms behind decisions taken by large foundation models in sequential decision-making tasks is critical to ensuring that such systems operate transparently and safely. In this work, we perform exploratory analysis on the Video PreTraining (VPT) Minecraft-playing agent, one of the largest open-source vision-based agents. We aim to illuminate its reasoning mechanisms by applying various interpretability techniques. First, we analyze the attention mechanism while the agent solves its training task: crafting a diamond pickaxe. The agent pays attention to the last four frames and to several key frames further back in its six-second memory. This is a possible mechanism for maintaining coherence in a task that takes 3-10 minutes, despite the short memory span. Second, we perform various interventions, which help us uncover a worrying case of goal misgeneralization: VPT…
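To make the attention finding concrete, here is a minimal single-head sketch, assuming each frame in memory is represented by an embedding vector and the current frame's query is scored against those past frames with scaled dot-product attention followed by a softmax. The helper names (`attention_weights`, `summarize_attention`) and the heuristic of treating the highest-weighted older frames as "key frames" are hypothetical illustrations, not VPT's code; the real agent's attention layers are considerably more complex.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: Sequence[float],
                      keys: Sequence[Sequence[float]]) -> List[float]:
    """Scaled dot-product attention weights of one query over a memory of keys.

    Hypothetical single-head setup: one key vector per remembered frame,
    ordered from oldest (index 0) to most recent (last index).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


def summarize_attention(weights: Sequence[float],
                        recent: int = 4,
                        top_k: int = 3) -> Tuple[float, List[int]]:
    """Split attention into the most recent frames vs. older "key frames".

    Returns the total weight on the last `recent` frames and the indices of
    the `top_k` highest-weighted frames outside that recent window
    (an illustrative heuristic, assumed here rather than taken from the paper).
    """
    n = len(weights)
    recent_mass = sum(weights[n - recent:])
    older = sorted(((w, i) for i, w in enumerate(weights[:n - recent])), reverse=True)
    key_frames = sorted(i for _, i in older[:top_k])
    return recent_mass, key_frames


if __name__ == "__main__":
    # Synthetic memory of 12 frames: frames 2 and 5 align strongly with the
    # query, the last four frames align moderately, the rest not at all.
    query = [1.0, 0.0]
    keys = [[0.0, 0.0] for _ in range(12)]
    keys[2] = [5.0, 0.0]
    keys[5] = [4.0, 0.0]
    for i in range(8, 12):
        keys[i] = [3.0, 0.0]

    w = attention_weights(query, keys)
    mass, frames = summarize_attention(w, recent=4, top_k=2)
    print(f"weight on last 4 frames: {mass:.2f}; older key frames: {frames}")
```

In this toy setup the weight splits between the four most recent frames and two older frames (indices 2 and 5), loosely mirroring the "recent frames plus key frames" pattern the abstract describes; in a real analysis the weights would be read from the agent's attention layers rather than computed from synthetic keys.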
Taxonomy
Topics: Multi-Agent Systems and Negotiation
Methods: Softmax · Attention Is All You Need
