Online Apprenticeship Learning

Lior Shani; Tom Zahavy; Shie Mannor

arXiv:2102.06924 · cs.LG · December 30, 2021

TL;DR

This paper introduces an online apprenticeship learning algorithm that learns policies from expert trajectories without solving an MDP at each step, achieving $O(\sqrt{K})$ regret in the number of interactions $K$ and performing well on high-dimensional control tasks.

Contribution

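We propose an online apprenticeship learning method that combines two mirror-descent-based no-regret algorithms, one for policy optimization and one for learning the worst-case cost, and therefore avoids solving an MDP at each iteration. We demonstrate its effectiveness with a deep variant similar to GAIL.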

Findings

01. Achieves $O(\sqrt{K})$ regret with optimistic exploration.

02. Avoids solving an MDP at each iteration, improving practicality.

03. Performs well in high-dimensional control environments.

Abstract

In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function. Instead, we observe trajectories sampled by an expert that acts according to some policy. The goal is to find a policy that matches the expert's performance on some predefined set of cost functions. We introduce an online variant of AL (Online Apprenticeship Learning; OAL), where the agent is expected to perform comparably to the expert while interacting with the environment. We show that the OAL problem can be effectively solved by combining two mirror-descent-based no-regret algorithms: one for policy optimization and another for learning the worst-case cost. By employing optimistic exploration, we derive a convergent algorithm with $O(\sqrt{K})$ regret, where $K$ is the number of interactions with the MDP, and an additional linear error term that depends on the amount of…
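
The two mirror descent players described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a toy single-state, horizon-one problem (so the occupancy measure equals the policy and the cost-to-go equals the cost), a cost set given by the box $[0, 1]$ per action, and it omits optimistic exploration entirely. The function names, step sizes, and the `expert` distribution are hypothetical and chosen only for illustration.

```python
import math


def policy_mirror_descent_step(policy, q_values, step_size):
    """One KL-regularized mirror descent step for a single state.

    Lower cost is better, so probability mass moves toward actions
    with a low estimated cost-to-go (exponentiated-gradient update).
    """
    weights = [p * math.exp(-step_size * q) for p, q in zip(policy, q_values)]
    total = sum(weights)
    return [w / total for w in weights]


def cost_mirror_descent_step(cost, agent_visits, expert_visits, step_size):
    """One Euclidean mirror ascent (projected gradient) step for the cost player.

    The cost rises where the agent visits more often than the expert,
    then is clipped back to the assumed feasible set [0, 1].
    """
    updated = [
        c + step_size * (d_agent - d_expert)
        for c, d_agent, d_expert in zip(cost, agent_visits, expert_visits)
    ]
    return [min(1.0, max(0.0, c)) for c in updated]


# Toy setup (assumption): one state, three actions, known expert action frequencies.
expert = [0.7, 0.2, 0.1]
policy = [1.0 / 3.0] * 3
cost = [0.5, 0.5, 0.5]

for _ in range(200):
    # Both players update from the same iterate; no MDP is solved in the loop.
    next_cost = cost_mirror_descent_step(cost, policy, expert, step_size=0.1)
    policy = policy_mirror_descent_step(policy, cost, step_size=0.1)
    cost = next_cost
```

The point of the sketch is the structure of the loop: each iteration performs one cheap update per player rather than computing an optimal policy for the current cost. In a real MDP the policy step would use estimated Q-values per state and the cost step would use estimated occupancy measures.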

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Optimization and Search Problems

Methods: Generative Adversarial Imitation Learning