A Bayesian Approach to Policy Recognition and State Representation Learning
Adrian Šošić, Abdelhak M. Zoubir, Heinz Koeppl

TL;DR
This paper introduces a Bayesian framework for learning from demonstration that models the full posterior distribution over expert policies and infers the complexity of the state representation from the data, without assuming that the expert behaves optimally or deterministically.
Contribution
It presents a Bayesian approach to policy recognition that handles stochastic expert behaviors and infers task-relevant state space partitionings in a nonparametric manner.
Findings
Successfully models the posterior distribution of expert policies.
Infers the complexity of state representations from demonstration data.
Learns task-specific state space partitionings.
Abstract
Learning from demonstration (LfD) is the process of building behavioral models of a task from demonstrations provided by an expert. These models can be used e.g. for system control by generalizing the expert demonstrations to previously unencountered situations. Most LfD methods, however, make strong assumptions about the expert behavior, e.g. they assume the existence of a deterministic optimal ground truth policy or require direct monitoring of the expert's controls, which limits their practical use as part of a general system identification framework. In this work, we consider the LfD problem in a more general setting where we allow for arbitrary stochastic expert policies, without reasoning about the optimality of the demonstrations. Following a Bayesian methodology, we model the full posterior distribution of possible expert controllers that explain the provided demonstration data.…
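To make the idea of a posterior over stochastic policies concrete, here is a minimal sketch and not the authors' method. It assumes a small discrete state and action space, that the expert's actions are directly observed (a simplification the paper explicitly aims to avoid), and an independent symmetric Dirichlet prior on each state's action distribution. The names `policy_posterior`, `posterior_mean`, `sample_policy` and the toy `demos` data are hypothetical.

```python
import random

def policy_posterior(demos, n_actions, alpha=1.0):
    """Dirichlet posterior parameters per state from (state, action) pairs.

    Assumption: symmetric Dirichlet(alpha) prior, independent across states.
    """
    counts = {}
    for state, action in demos:
        counts.setdefault(state, [0] * n_actions)[action] += 1
    # Conjugate update: posterior parameter = prior parameter + action count.
    return {s: [alpha + c for c in cs] for s, cs in counts.items()}

def posterior_mean(params):
    """Posterior mean action probabilities for one state."""
    total = sum(params)
    return [p / total for p in params]

def sample_policy(params, rng):
    """Draw one action distribution from Dirichlet(params) via gamma draws."""
    draws = [rng.gammavariate(p, 1.0) for p in params]
    total = sum(draws)
    return [d / total for d in draws]

# Toy demonstration data (hypothetical): (state, action) pairs.
demos = [(0, 1), (0, 1), (0, 0), (1, 2)]
post = policy_posterior(demos, n_actions=3)
mean_s0 = posterior_mean(post[0])
sample_s0 = sample_policy(post[0], random.Random(0))
```

Each call to `sample_policy` yields one plausible stochastic policy for a state, so repeated draws give a picture of the remaining uncertainty rather than a single point estimate.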
