Modelling Agent Policies with Interpretable Imitation Learning

Tom Bewley; Jonathan Lawry; Arthur Richards

arXiv:2006.11309 · cs.AI · June 23, 2020

TL;DR

This paper introduces an imitation learning approach that distils black-box agent policies into interpretable decision tree models, helping to understand the internal mechanisms and latent state representations of agents deployed in safety-critical domains.

Contribution

It presents a method for reverse-engineering black-box policies into decision trees, and explicitly models each agent's latent state representation by selecting features from a large space of candidates constructed from the Markov state.
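To make "candidate features constructed from the Markov state" concrete, here is a minimal sketch assuming a traffic state given as named scalar variables. The variable names (`ego_pos`, `lead_speed` and so on) and the rule of adding pairwise differences between variables of the same quantity are illustrative assumptions, not the paper's actual feature set. A latent state is then simply the tuple of whichever candidates end up selected.

```python
from itertools import combinations


def candidate_features(state: dict[str, float]) -> dict[str, float]:
    """Build a hypothetical candidate feature set from a Markov state.

    Candidates are every raw state variable plus the pairwise difference of
    any two variables measuring the same quantity (same name suffix, such as
    two positions), e.g. the gap between two vehicles.
    """
    features = dict(state)  # raw variables are candidates too
    for a, b in combinations(sorted(state), 2):
        if a.rsplit("_", 1)[-1] == b.rsplit("_", 1)[-1]:  # e.g. both "_pos"
            features[f"{a}-{b}"] = state[a] - state[b]
    return features


def latent_state(features: dict[str, float], selected: list[str]) -> tuple[float, ...]:
    """A latent state is the tuple of whichever candidate features were selected."""
    return tuple(features[name] for name in selected)


state = {"ego_pos": 12.0, "lead_pos": 20.5, "ego_speed": 8.0, "lead_speed": 6.5}
feats = candidate_features(state)
print(len(feats))  # 4 raw variables + 2 same-quantity differences = 6
print(latent_state(feats, ["ego_pos-lead_pos", "ego_speed-lead_speed"]))  # (-8.5, 1.5)
```

Restricting differences to like-for-like quantities keeps the candidate space from filling up with meaningless features such as position minus speed; how the paper actually bounds its candidate space is not stated here.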

Findings

1. Initial results in a multi-agent traffic environment are promising.
2. The approach produces simplified, interpretable models of complex policies.
3. Explicit latent state modeling enhances understanding of agent behavior.

Abstract

As we deploy autonomous agents in safety-critical domains, it becomes important to develop an understanding of their internal mechanisms and representations. We outline an approach to imitation learning for reverse-engineering black box agent policies in MDP environments, yielding simplified, interpretable models in the form of decision trees. As part of this process, we explicitly model and learn agents' latent state representations by selecting from a large space of candidate features constructed from the Markov state. We present initial promising results from an implementation in a multi-agent traffic environment.
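The abstract describes the pipeline only at a high level. As a rough illustration, here is a sketch under loud assumptions: a hypothetical two-variable traffic state (gap to the vehicle ahead, relative speed), a hypothetical black-box policy that can only be queried, a hand-picked list of candidate features, and a plain greedy CART-style tree learner using Gini impurity. It is not the authors' algorithm; it only shows how querying an agent, labelling states with its actions and fitting a shallow tree over candidate features can recover a readable rule, measured by fidelity (agreement with the agent on held-out states).

```python
import random


# Hypothetical stand-in for the black-box agent: we may only query it.
def black_box_policy(gap: float, rel_speed: float) -> str:
    return "brake" if gap - 2.0 * rel_speed < 5.0 else "cruise"


# Hypothetical candidate features built from the Markov state (gap, rel_speed).
FEATURE_NAMES = ["gap", "rel_speed", "gap-2*rel_speed", "gap+rel_speed"]


def features(gap: float, rel_speed: float) -> list[float]:
    return [gap, rel_speed, gap - 2.0 * rel_speed, gap + rel_speed]


def gini(labels: list[str]) -> float:
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def majority(labels: list[str]) -> str:
    return max(sorted(set(labels)), key=labels.count)


def fit_tree(X, y, depth):
    """Greedy CART-style induction: choose the (feature, threshold) pair with the
    lowest weighted Gini impurity, recursing until pure or out of depth."""
    if depth == 0 or len(set(y)) == 1:
        return majority(y)
    best = None  # (weighted impurity, feature index, threshold)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # every feature is constant here: no split possible
        return majority(y)
    _, j, t = best
    left_rows = [(r, lab) for r, lab in zip(X, y) if r[j] <= t]
    right_rows = [(r, lab) for r, lab in zip(X, y) if r[j] > t]
    return (
        j,
        t,
        fit_tree([r for r, _ in left_rows], [lab for _, lab in left_rows], depth - 1),
        fit_tree([r for r, _ in right_rows], [lab for _, lab in right_rows], depth - 1),
    )


def predict(tree, row):
    while isinstance(tree, tuple):  # internal node: (feature, threshold, left, right)
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


rng = random.Random(0)


def sample_states(n):
    return [(rng.uniform(0.0, 30.0), rng.uniform(-5.0, 5.0)) for _ in range(n)]


train, test = sample_states(300), sample_states(200)
X = [features(*s) for s in train]
y = [black_box_policy(*s) for s in train]  # imitation: label states with the agent's actions
tree = fit_tree(X, y, depth=3)

fidelity = sum(predict(tree, features(*s)) == black_box_policy(*s) for s in test) / len(test)
print("root split on:", FEATURE_NAMES[tree[0]])  # gap-2*rel_speed
print(f"fidelity on held-out states: {fidelity:.2f}")
```

Because one candidate feature happens to match the hidden decision boundary, the learner selects it at the root and produces a single readable rule; with only raw `gap` and `rel_speed` available, an axis-aligned tree would need many more splits to approximate the same oblique boundary. This is the intuition behind searching a rich candidate feature space rather than splitting on raw state variables alone.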
