Inverse Q-Learning Done Right: Offline Imitation Learning in $Q^\pi$-Realizable MDPs
Antoine Moulin, Gergely Neu, Luca Viano

TL;DR
This paper introduces a new offline imitation learning algorithm, SPOIL, for linear and nonlinear $Q^\pi$-realizable MDPs, with theoretical guarantees and empirical success on standard benchmarks.
Contribution
It proposes SPOIL, a novel saddle-point-based algorithm for offline imitation learning in $Q^\pi$-realizable MDPs, with provable performance guarantees and a practical neural-network implementation.
Findings
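SPOIL matches the expert's performance up to an additive error in linear $Q^\pi$-realizable MDPs, with an explicit sample-complexity guarantee.
The guarantee extends to possibly nonlinear $Q^\pi$-realizable MDPs, at the cost of a worse sample complexity.
The neural-network implementation of SPOIL outperforms behavior cloning and is competitive with state-of-the-art algorithms.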
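Behavior cloning, the baseline named in these findings, reduces offline imitation to supervised learning on the expert's state-action pairs. The sketch below is a minimal tabular version, assuming discrete, hashable states and actions; it illustrates the baseline only, not the paper's saddle-point method, and the function names and toy dataset are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

State = Hashable
Action = Hashable


def behavior_cloning(
    dataset: Iterable[Tuple[State, Action]],
    actions: List[Action],
) -> Dict[State, Dict[Action, float]]:
    """Tabular behavior cloning: estimate pi(a | s) by the empirical
    frequency of each action among the expert's visits to state s."""
    counts: Dict[State, Counter] = defaultdict(Counter)
    for s, a in dataset:
        counts[s][a] += 1
    policy: Dict[State, Dict[Action, float]] = {}
    for s, c in counts.items():
        total = sum(c.values())
        # Counter returns 0 for actions the expert never took in s
        policy[s] = {a: c[a] / total for a in actions}
    return policy


def act_probs(
    policy: Dict[State, Dict[Action, float]],
    state: State,
    actions: List[Action],
) -> Dict[Action, float]:
    """Action distribution at `state`; uniform if the state is absent from the data."""
    if state in policy:
        return policy[state]
    return {a: 1.0 / len(actions) for a in actions}


# Usage with a tiny hypothetical expert dataset
data = [("s0", "left"), ("s0", "left"), ("s0", "right"), ("s1", "right")]
acts = ["left", "right"]
pi = behavior_cloning(data, acts)
print(act_probs(pi, "s0", acts))  # {'left': 0.6666666666666666, 'right': 0.3333333333333333}
```

The uniform fallback for unseen states highlights the usual weakness of behavior cloning: it has no information about states outside the expert's data, which is where structural assumptions on the environment can help.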
Abstract
We study the problem of offline imitation learning in Markov decision processes (MDPs), where the goal is to learn a well-performing policy given a dataset of state-action pairs generated by an expert policy. Complementing a recent line of work on this topic that assumes the expert belongs to a tractable class of known policies, we approach this problem from a new angle and leverage a different type of structural assumption about the environment. Specifically, for the class of linear $Q^\pi$-realizable MDPs, we introduce a new algorithm called saddle-point offline imitation learning (SPOIL), which is guaranteed to match the performance of any expert up to an additive error with access to samples. Moreover, we extend this result to possibly nonlinear $Q^\pi$-realizable MDPs at the cost of a worse sample complexity of order…
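For orientation, linear $Q^\pi$-realizability is commonly stated as below. This is an assumed standard formulation rather than the paper's exact statement, which may add conditions such as norm bounds on the parameters; the symbols $\varphi$, $\theta_\pi$, $J$, $\hat\pi$, $\pi_E$, and $\varepsilon$ are notation introduced here for illustration.

```latex
% Assumed: a known feature map \varphi : \mathcal{S}\times\mathcal{A} \to \mathbb{R}^d
\forall \pi \;\; \exists\, \theta_\pi \in \mathbb{R}^d : \quad
Q^\pi(s,a) = \langle \varphi(s,a), \theta_\pi \rangle
\quad \text{for all } (s,a) \in \mathcal{S}\times\mathcal{A}.

% Imitation goal: the learned policy \hat\pi is within an additive error
% \varepsilon of the expert \pi_E, where J(\pi) is the expected return of \pi
J(\pi_E) - J(\hat\pi) \le \varepsilon .
```

Note that the assumption constrains the action-value functions of all policies, not the expert's policy class, which is the contrast with prior work drawn in the abstract.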
Taxonomy
Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Neural Networks and Applications
