Fast Rates for Inverse Reinforcement Learning
Andreas Schlaginhaufen, Maryam Kamgarpour

TL;DR
This paper proves fast O(n^{-1}) convergence rates for entropy-regularized min-max inverse reinforcement learning with linear reward classes in finite-horizon MDPs, and shows that this formulation is equivalent to maximum likelihood estimation at the population level.
Contribution
It establishes the equivalence of MLE and Min-Max-IRL at the population level and derives fast convergence rates for parameter estimation in IRL.
Findings
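Trajectory-level KL divergence and squared parameter error (in the Hessian norm) decay at rate O(n^{-1}), where n is the number of expert trajectories
Guarantees hold under misspecification and require no exploration assumptions
Reward-identifiability results are extended to general Borel spaces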
Abstract
We establish novel structural and statistical results for entropy-regularized min-max inverse reinforcement learning (Min-Max-IRL) with linear reward classes in finite-horizon MDPs with Borel state and action spaces. On the structural side, we show that maximum likelihood estimation (MLE) and Min-Max-IRL are equivalent at the population level, and at the empirical level under deterministic dynamics. On the statistical side, exploiting pseudo-self-concordance of the Min-Max-IRL loss, we prove that both the trajectory-level KL divergence and the squared parameter error in the Hessian norm decay at the fast rate O(n^{-1}), where n is the number of expert trajectories. Our guarantees apply under misspecification and require no exploration assumptions. We further extend reward-identifiability results to general Borel spaces and derive novel results on the derivatives of the…
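To make the objects in the abstract concrete, the following is a minimal sketch of an entropy-regularized min-max IRL loss with a linear reward in a small tabular finite-horizon MDP. It is not taken from the paper: the toy MDP (`phi`, `P`, `mu`, `H`, `theta_true`), the function names, and the particular form of the loss (soft optimal value minus expert feature return, as is standard in the maximum-entropy IRL literature) are all assumptions for illustration. It also uses exact expert feature expectations rather than averages over n sampled trajectories, and it does not reflect the general Borel setting.

```python
import math

def soft_value_iteration(theta, phi, P, H):
    """Soft (entropy-regularized) backward recursion for r(s,a) = <theta, phi[s][a]>."""
    S, A = len(phi), len(phi[0])
    V = [0.0] * S                      # terminal soft value is zero
    policy = [None] * H
    for h in reversed(range(H)):
        Q = [[sum(t * f for t, f in zip(theta, phi[s][a]))
              + sum(P[s][a][s2] * V[s2] for s2 in range(S))
              for a in range(A)] for s in range(S)]
        new_V = []
        for s in range(S):
            m = max(Q[s])              # stabilized log-sum-exp over actions
            new_V.append(m + math.log(sum(math.exp(q - m) for q in Q[s])))
        V = new_V
        policy[h] = [[math.exp(Q[s][a] - V[s]) for a in range(A)]
                     for s in range(S)]
    return V, policy

def feature_expectation(policy, phi, P, mu, H):
    """Expected cumulative features under `policy`, starting from distribution mu."""
    S, A, d = len(phi), len(phi[0]), len(phi[0][0])
    occ, feat = list(mu), [0.0] * d
    for h in range(H):
        nxt = [0.0] * S
        for s in range(S):
            for a in range(A):
                w = occ[s] * policy[h][s][a]
                for k in range(d):
                    feat[k] += w * phi[s][a][k]
                for s2 in range(S):
                    nxt[s2] += w * P[s][a][s2]
        occ = nxt
    return feat

def minmax_loss_and_grad(theta, expert_feat, phi, P, mu, H):
    """Assumed loss: soft optimal value minus expert return; gradient is a feature gap."""
    V0, policy = soft_value_iteration(theta, phi, P, H)
    loss = (sum(m * v for m, v in zip(mu, V0))
            - sum(t * f for t, f in zip(theta, expert_feat)))
    model_feat = feature_expectation(policy, phi, P, mu, H)
    grad = [mf - ef for mf, ef in zip(model_feat, expert_feat)]
    return loss, grad

# Hypothetical toy instance: 2 states, 2 actions, 2 features, horizon 3.
phi = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]]]
P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.0, 1.0]]]
mu = [1.0, 0.0]
H = 3
theta_true = [0.7, -0.3]

# Well-specified expert: soft-optimal for theta_true.
_, expert_policy = soft_value_iteration(theta_true, phi, P, H)
expert_feat = feature_expectation(expert_policy, phi, P, mu, H)
loss, grad = minmax_loss_and_grad(theta_true, expert_feat, phi, P, mu, H)
print(loss, grad)
```

In this well-specified toy case the gradient vanishes at `theta_true`, because the soft-optimal policy reproduces the expert's feature expectations. With finitely many expert trajectories, `expert_feat` would be replaced by an empirical average, which is the regime the O(n^{-1}) rate concerns.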
