Probing Emergent Semantics in Predictive Agents via Question Answering
Abhishek Das, Federico Carnevale, Hamza Merzic, Laura Rimell, Rosalia Schneider, Josh Abramson, Alden Hung, Arun Ahuja, Stephen Clark, Gregory Wayne, Felix Hill

TL;DR
This paper introduces a question-answering probing method to interpret the internal representations of predictive agents trained in complex environments, revealing their learned factual and relational knowledge.
Contribution
It presents a model-agnostic, human-interpretable probing approach using synthetic questions to analyze what predictive agents learn about their environment.
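Because the questions are synthetic, their ground-truth answers can be read directly off the environment state, with no human labeling. The paper's actual question templates are not reproduced here. The sketch below assumes a toy scene description (a hypothetical `SceneObject` with `name`, `color`, and `x` fields) and two illustrative templates, one attribute question and one spatial-relation question.

```python
from dataclasses import dataclass


# Hypothetical scene record: the field names are assumptions for illustration,
# not the environment's real interface.
@dataclass(frozen=True)
class SceneObject:
    name: str
    color: str
    x: float  # horizontal position in the room (assumed coordinate)


def make_questions(objects):
    """Generate (question, answer) pairs from two hypothetical templates."""
    qa = []
    # Attribute template: answer comes straight from the object's color field.
    for obj in objects:
        qa.append((f"what is the color of the {obj.name}?", obj.color))
    # Spatial-relation template: answer is computed from relative positions.
    for a in objects:
        for b in objects:
            if a.name != b.name:
                answer = "yes" if a.x < b.x else "no"
                qa.append((f"is the {a.name} to the left of the {b.name}?", answer))
    return qa


scene = [SceneObject("pencil", "red", 0.0), SceneObject("cushion", "blue", 1.0)]
for question, answer in make_questions(scene):
    print(question, "->", answer)
```

Since every answer is a deterministic function of the scene, a probe that answers correctly must be extracting that information from the agent's representation rather than from the question alone (assuming answers are balanced across questions).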
Findings
Agents encode factual and compositional information about objects and properties.
Probing reveals learned spatial relations and object attributes.
The method applies across different predictive modeling approaches.
Abstract
Recent work has shown how predictive modeling can endow agents with rich knowledge of their surroundings, improving their ability to act in complex environments. We propose question-answering as a general paradigm to decode and understand the representations that such agents develop, applying our method to two recent approaches to predictive modeling: action-conditional CPC (Guo et al., 2018) and SimCore (Gregor et al., 2019). After training agents with these predictive objectives in a visually rich 3D environment with an assortment of objects, colors, shapes, and spatial configurations, we probe their internal state representations with synthetic (English) questions, without backpropagating gradients from the question-answering decoder into the agent. The performance of different agents when probed this way reveals that they learn to encode factual, and seemingly compositional,…
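The key constraint is that the question-answering decoder is trained on frozen agent states: its loss updates only the decoder's parameters, never the agent's. The following is a minimal NumPy sketch of that idea, not the paper's decoder. It assumes a single fixed question type (so the probe is a linear softmax classifier rather than a question-conditioned network) and uses hypothetical synthetic "agent states" in place of a real pretrained agent. A control run on states that carry no answer information shows what chance-level probing looks like.

```python
import numpy as np


def fake_agent_states(answers, n_answers, dim, informative, rng):
    """Hypothetical stand-in for a pretrained agent's hidden states."""
    noise = rng.normal(size=(len(answers), dim))
    if informative:
        # By construction, the answer is linearly encoded in the state.
        centers = rng.normal(size=(n_answers, dim))
        states = centers[answers] + 0.5 * noise
    else:
        # Control: the state carries no information about the answer.
        states = noise
    states.setflags(write=False)  # frozen: the probe can read but never modify them
    return states


def train_probe(states, answers, n_answers, lr=0.1, steps=1000):
    """Fit a softmax decoder by gradient descent; only W and b are updated."""
    n, d = states.shape
    W, b = np.zeros((d, n_answers)), np.zeros(n_answers)
    onehot = np.eye(n_answers)[answers]
    for _ in range(steps):
        logits = states @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # d(mean cross-entropy)/d(logits)
        W -= lr * states.T @ grad
        b -= lr * grad.sum(axis=0)
        # No gradient is computed with respect to `states`: the agent stays fixed.
    return W, b


def probe_accuracy(states, answers, W, b):
    return float(np.mean(np.argmax(states @ W + b, axis=1) == answers))


def run_probe(informative, seed=0, n_answers=4, dim=16):
    rng = np.random.default_rng(seed)
    answers = rng.integers(0, n_answers, size=400)
    states = fake_agent_states(answers, n_answers, dim, informative, rng)
    # Train on the first 300 episodes, evaluate on 100 held-out ones.
    W, b = train_probe(states[:300], answers[:300], n_answers)
    return probe_accuracy(states[300:], answers[300:], W, b)


informative_acc = run_probe(informative=True)
control_acc = run_probe(informative=False)
print(f"held-out accuracy: informative={informative_acc:.2f}, control={control_acc:.2f}")
```

Keeping the decoder separate from the agent's training is what makes the probe diagnostic: high held-out accuracy indicates that the information was already present in the agent's representation, rather than something the agent was pushed to encode by the question-answering loss.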
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
