Statistical analysis of Inverse Entropy-regularized Reinforcement Learning
Denis Belomestny, Alexey Naumov, Sergey Samsonov

TL;DR
This paper introduces a statistical framework for inverse entropy-regularized reinforcement learning that combines entropy regularization with a least-squares reconstruction of the reward from the soft Bellman residual, yielding a unique reward consistent with the expert policy, along with high-probability error bounds and minimax optimal convergence rates.
Contribution
The paper develops a statistical approach that resolves reward ambiguity in inverse reinforcement learning (IRL) using entropy regularization, and establishes minimax optimal convergence rates.
Findings
Provides high-probability bounds for policy estimation error.
Derives non-asymptotic minimax optimal rates for reward recovery.
Bridges IRL with modern statistical learning theory.
Abstract
Inverse reinforcement learning (IRL) aims to infer the reward function that explains expert behavior observed through trajectories of state-action pairs. A long-standing difficulty in classical IRL is the non-uniqueness of the recovered reward: many reward functions can induce the same optimal policy, rendering the inverse problem ill-posed. In this paper, we develop a statistical framework for Inverse Entropy-regularized Reinforcement Learning that resolves this ambiguity by combining entropy regularization with a least-squares reconstruction of the reward from the soft Bellman residual. This combination yields a unique and well-defined so-called least-squares reward consistent with the expert policy. We model the expert demonstrations as a Markov chain with the invariant distribution defined by an unknown expert policy and estimate the policy by a penalized maximum-likelihood…
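The ambiguity and its least-squares resolution can be illustrated on a small tabular problem. The sketch below is not the paper's estimator: it assumes a known transition kernel `P`, a known policy `pi`, a discount `gamma`, a temperature `lam`, and an unweighted squared norm, all of which are illustrative choices made here. It relies only on the standard soft Bellman identity r(s,a) = lam * log pi(a|s) + V(s) - gamma * E[V(s') | s,a], under which every choice of V gives a reward with the same soft-optimal policy, and then picks the V that minimizes the squared norm of the resulting reward.

```python
import numpy as np

def reward_from_policy(pi, P, V, gamma, lam):
    """Soft Bellman identity: r = lam*log(pi) + V(s) - gamma * E[V(s') | s, a]."""
    return lam * np.log(pi) + V[:, None] - gamma * (P @ V)

def min_norm_reward(pi, P, gamma, lam):
    """Choose V minimizing the unweighted squared norm of r (illustrative assumption)."""
    S, A = pi.shape
    b = (lam * np.log(pi)).reshape(S * A)            # reward obtained with V = 0
    # Row (s, a) of M is e_s - gamma * P[s, a, :], so that r = b + M @ V.
    M = np.repeat(np.eye(S), A, axis=0) - gamma * P.reshape(S * A, S)
    V, *_ = np.linalg.lstsq(M, -b, rcond=None)
    return (b + M @ V).reshape(S, A), V

def soft_optimal_policy(r, P, gamma, lam, iters=2000):
    """Soft value iteration; returns pi(a|s) = exp((Q(s,a) - V(s)) / lam)."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        Q = r + gamma * (P @ V)
        m = Q.max(axis=1)                            # stabilised log-sum-exp
        V = m + lam * np.log(np.exp((Q - m[:, None]) / lam).sum(axis=1))
    Q = r + gamma * (P @ V)
    return np.exp((Q - V[:, None]) / lam)

# Hypothetical toy problem: 4 states, 3 actions, random kernel and policy.
rng = np.random.default_rng(0)
S, A, gamma, lam = 4, 3, 0.9, 0.5
P = rng.dirichlet(np.ones(S), size=(S, A))           # shape (S, A, S)
pi = rng.dirichlet(np.ones(A), size=S)               # shape (S, A)

r_zero = reward_from_policy(pi, P, np.zeros(S), gamma, lam)
r_rand = reward_from_policy(pi, P, rng.normal(size=S), gamma, lam)
r_ls, V_ls = min_norm_reward(pi, P, gamma, lam)

# Three different rewards, one soft-optimal policy.
for r in (r_zero, r_rand, r_ls):
    print(np.allclose(soft_optimal_policy(r, P, gamma, lam), pi, atol=1e-6))
```

In this toy setting the three rewards differ entry by entry yet induce the same policy, which is the ill-posedness the abstract describes; the least-squares choice singles out one of them. The weighting of the norm used in the paper may differ from the unweighted one assumed here.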
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Robot Manipulation and Learning
