Inverse Q-Learning Done Right: Offline Imitation Learning in $Q^\pi$-Realizable MDPs

Antoine Moulin; Gergely Neu; Luca Viano

arXiv:2505.19946 · cs.LG · January 9, 2026

TL;DR

This paper introduces SPOIL (saddle-point offline imitation learning), an offline imitation learning algorithm for linear and nonlinear $Q^\pi$-realizable MDPs, with theoretical guarantees and empirical success on benchmarks.

Contribution

It proposes SPOIL, a saddle-point-based algorithm for offline imitation learning in $Q^\pi$-realizable MDPs, with proven performance guarantees and a practical neural network implementation.

Findings

01

In linear $Q^\pi$-realizable MDPs, SPOIL matches the performance of any expert up to an additive error $\varepsilon$ using $O(\varepsilon^{-2})$ samples.

02

The method extends to possibly nonlinear $Q^\pi$-realizable MDPs at the cost of a worse sample complexity.

03

A neural network implementation of SPOIL outperforms behavior cloning and rivals state-of-the-art algorithms.

Abstract

We study the problem of offline imitation learning in Markov decision processes (MDPs), where the goal is to learn a well-performing policy given a dataset of state-action pairs generated by an expert policy. Complementing a recent line of work on this topic that assumes the expert belongs to a tractable class of known policies, we approach this problem from a new angle and leverage a different type of structural assumption about the environment. Specifically, for the class of linear $Q^\pi$-realizable MDPs, we introduce a new algorithm called saddle-point offline imitation learning (SPOIL), which is guaranteed to match the performance of any expert up to an additive error $\varepsilon$ with access to $O(\varepsilon^{-2})$ samples. Moreover, we extend this result to possibly nonlinear $Q^\pi$-realizable MDPs at the cost of a worse sample complexity of order…
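
For orientation, the two notions the abstract relies on can be written out as follows. This is only a sketch in assumed notation, not the paper's own statement: the feature map $\varphi$, dimension $d$, parameter vector $\theta^\pi$, value $V^\pi$, expert policy $\pi_E$, output policy $\pi_{\mathrm{out}}$ and sample count $n$ are all labels introduced here for illustration. The first line is the usual definition of linear $Q^\pi$-realizability; the second restates the abstract's guarantee for the linear case. The abstract does not say whether the bound holds in expectation or with high probability, nor how it depends on other problem parameters, so those are left unspecified.

```latex
% Assumed notation (not from the source): \varphi, d, \theta^\pi, V^\pi, \pi_E, \pi_{out}, n.

% Linear Q^pi-realizability: every policy's action-value function is linear in known features.
\[
  \forall \pi \;\; \exists\, \theta^\pi \in \mathbb{R}^d :
  \qquad
  Q^\pi(s, a) = \langle \varphi(s, a), \theta^\pi \rangle
  \quad \text{for all state-action pairs } (s, a).
\]

% Guarantee stated in the abstract for the linear case:
% the learned policy is epsilon-competitive with the expert given n expert samples.
\[
  V^{\pi_E} - V^{\pi_{\mathrm{out}}} \le \varepsilon
  \qquad \text{when} \qquad
  n = O\!\left(\varepsilon^{-2}\right).
\]
```

Read this way, halving the target error $\varepsilon$ multiplies the required number of expert samples by roughly four in the linear case.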

Code & Models

Repositories

antoine-moulin/spoil
PyTorch · Official

Taxonomy

Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Neural Networks and Applications