Stochastic convex optimization for provably efficient apprenticeship learning
Angeliki Kamoutsi, Goran Banjac, and John Lygeros

TL;DR
This paper introduces a computationally efficient stochastic convex optimization method for apprenticeship learning in large-scale MDPs, providing non-asymptotic performance guarantees without explicitly learning the cost function.
Contribution
It proposes a convex optimization approach that learns a policy directly from expert demonstrations, bypassing the intermediate step of explicitly learning the cost function that inverse reinforcement learning requires.
Findings
Developed a scalable algorithm with high-confidence regret bounds.
Achieved theoretical guarantees for policy quality in apprenticeship learning.
Demonstrated effectiveness on benchmark tasks.
Abstract
We consider large-scale Markov decision processes (MDPs) with an unknown cost function and employ stochastic convex optimization tools to address the problem of imitation learning, which consists of learning a policy from a finite set of expert demonstrations. We adopt the apprenticeship learning formalism, which carries the assumption that the true cost function can be represented as a linear combination of some known features. Existing inverse reinforcement learning algorithms come with strong theoretical guarantees, but are computationally expensive because they use reinforcement learning or planning algorithms as a subroutine. On the other hand, state-of-the-art policy-gradient-based algorithms (like IM-REINFORCE, IM-TRPO, and GAIL) achieve significant empirical success in challenging benchmark tasks, but are not well understood in terms of theory. With an emphasis on…
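The linear-cost assumption in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm; it only shows why matching feature expectations is enough when the cost is linear in known features. Everything in it is an assumption made for illustration: a small random tabular MDP, a discounted setting with a normalized occupancy measure, and the hypothetical helper names `occupancy_measure` and `feature_expectations`.

```python
import numpy as np


def occupancy_measure(P, policy, nu0, gamma):
    """Normalized discounted state-action occupancy measure of `policy`.

    P:      (S, A, S) transition tensor, P[s, a, t] = Pr(t | s, a)
    policy: (S, A) row-stochastic matrix
    nu0:    (S,) initial state distribution
    """
    S = P.shape[0]
    # State-to-state transition matrix induced by the policy.
    P_pi = np.einsum("sa,sat->st", policy, P)
    # State occupancy d solves d = (1 - gamma) * nu0 + gamma * P_pi^T d.
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * nu0)
    return d[:, None] * policy  # shape (S, A), entries sum to 1


def feature_expectations(mu, features):
    """Expected feature vector under occupancy measure mu; features is (S, A, k)."""
    return np.einsum("sa,sak->k", mu, features)


# Toy instance (all sizes and values are arbitrary illustrative choices).
rng = np.random.default_rng(0)
S, A, k, gamma = 5, 3, 4, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))       # (S, A, S)
features = rng.uniform(size=(S, A, k))           # known features
w = rng.uniform(-1.0, 1.0, size=k)               # unknown weights
cost = features @ w                              # linear cost, shape (S, A)
nu0 = np.full(S, 1.0 / S)

expert = rng.dirichlet(np.ones(A), size=S)       # stand-in expert policy
learner = rng.dirichlet(np.ones(A), size=S)      # stand-in learned policy

mu_E = occupancy_measure(P, expert, nu0, gamma)
mu_L = occupancy_measure(P, learner, nu0, gamma)
fe_E = feature_expectations(mu_E, features)
fe_L = feature_expectations(mu_L, features)

# With a linear cost, the cost gap is an inner product with the feature gap,
# so Hoelder's inequality bounds it without knowing the cost beyond ||w||_1.
gap = (mu_L * cost).sum() - (mu_E * cost).sum()
bound = np.abs(w).sum() * np.abs(fe_L - fe_E).max()
print(f"cost gap = {gap:.4f}, bound = {bound:.4f}")
```

Because the expected cost of a policy equals the inner product of the weights with its feature expectations, a learner whose feature expectations are close to the expert's is guaranteed to be nearly as good under every cost in the linear class, which is what lets a method avoid recovering the cost itself.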
