TL;DR
This paper introduces a primal-dual framework for reinforcement learning from demonstrations that is computationally efficient and model-free, with complexity guarantees that do not depend on the number of states, bridging theory and practice.
Contribution
It proposes a novel bilinear saddle-point approach using Lagrangian duality for policy learning from demonstrations, with provable efficiency and practical advantages.
Findings
The algorithm is model-free and computationally efficient.
Sample complexity is independent of the number of states.
Provides an online-learning interpretation.
Abstract
We consider large-scale Markov decision processes with an unknown cost function and address the problem of learning a policy from a finite set of expert demonstrations. We assume that the learner is not allowed to interact with the expert and has no access to a reinforcement signal of any kind. Existing inverse reinforcement learning methods come with strong theoretical guarantees, but are computationally expensive, while state-of-the-art policy optimization algorithms achieve significant empirical success, but are hampered by limited theoretical understanding. To bridge the gap between theory and practice, we introduce a novel bilinear saddle-point framework using Lagrangian duality. The proposed primal-dual viewpoint allows us to develop a model-free provably efficient algorithm through the lens of stochastic convex optimization. The method enjoys the advantages of simplicity of…
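To make the phrase "bilinear saddle-point" concrete, here is a minimal sketch of a generic primal-dual method on a toy problem. It is not the paper's algorithm: it solves min over x, max over y of x^T A y on two probability simplices using multiplicative-weights (Hedge) updates for both players, and returns the averaged iterates. The function names, the step size `eta`, and the 2x2 matrix used in the usage note are all illustrative assumptions.

```python
import math


def _normalize(w):
    """Rescale non-negative weights so they sum to one."""
    total = sum(w)
    return [v / total for v in w]


def hedge_saddle_point(A, steps=20000, eta=0.01):
    """Approximate min_x max_y x^T A y over probability simplices.

    Generic illustration only (hypothetical helper, not from the paper).
    The min player descends and the max player ascends, both with
    multiplicative-weights updates; the averaged iterates are returned.
    """
    n, m = len(A), len(A[0])
    x = [1.0 / n] * n
    y = [1.0 / m] * m
    x_sum = [0.0] * n
    y_sum = [0.0] * m
    for _ in range(steps):
        # Accumulate the iterates that are played in this round.
        x_sum = [s + v for s, v in zip(x_sum, x)]
        y_sum = [s + v for s, v in zip(y_sum, y)]
        # Gradients of the bilinear objective: A y for x, A^T x for y.
        gx = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        gy = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        x = _normalize([x[i] * math.exp(-eta * gx[i]) for i in range(n)])
        y = _normalize([y[j] * math.exp(eta * gy[j]) for j in range(m)])
    return [s / steps for s in x_sum], [s / steps for s in y_sum]


def duality_gap(A, x, y):
    """Best-response value against x minus best-response value against y."""
    n, m = len(A), len(A[0])
    worst_for_x = max(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))
    best_for_x = min(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    return worst_for_x - best_for_x
```

For example, calling `hedge_saddle_point([[2.0, -1.0], [-1.0, 1.0]])` and passing the result to `duality_gap` gives a small non-negative gap, which indicates that the averaged pair is close to a saddle point. Each player here runs a no-regret online-learning rule, which is one common way to read primal-dual methods of this kind.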