Apprenticeship Learning via Frank-Wolfe
Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour

TL;DR
This paper applies the Frank-Wolfe algorithm to Apprenticeship Learning, giving a convex optimization perspective on the problem, tighter convergence bounds, and the first demonstration of linear convergence rates.
Contribution
It reformulates Apprenticeship Learning as a convex projection problem and introduces a Frank-Wolfe-based method with proven linear convergence, along with stochastic variants.
Findings
A Frank-Wolfe-based algorithm achieves linear convergence in Apprenticeship Learning (AL).
A stochastic FW variant reduces the need for precise feature expectation estimates.
Experimental results show the proposed method outperforms baseline algorithms.
Abstract
We consider the applications of the Frank-Wolfe (FW) algorithm for Apprenticeship Learning (AL). In this setting, we are given a Markov Decision Process (MDP) without an explicit reward function. Instead, we observe an expert that acts according to some policy, and the goal is to find a policy whose feature expectations are closest to those of the expert policy. We formulate this problem as finding the projection of the feature expectations of the expert on the feature expectations polytope -- the convex hull of the feature expectations of all the deterministic policies in the MDP. We show that this formulation is equivalent to the AL objective and that solving this problem using the FW algorithm is equivalent to the well-known Projection method of Abbeel and Ng (2004). This insight allows us to analyze AL with tools from convex optimization literature and derive tighter convergence bounds on…
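The projection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the polytope is given as an explicit, small array of vertices (hypothetical stand-ins for the feature expectations of deterministic policies), so the linear minimization step is a plain argmin over rows. In an actual AL setting that step would instead be an MDP solve under a reward built from the current gradient. The function name, its parameters, and the exact line-search step size are illustrative choices.

```python
import numpy as np

def frank_wolfe_projection(vertices, target, num_iters=200, tol=1e-10):
    """Approximately project `target` onto conv(vertices) with Frank-Wolfe.

    `vertices` is a hypothetical explicit list of polytope vertices; it stands
    in for the feature expectations of deterministic policies.
    """
    vertices = np.asarray(vertices, dtype=float)
    target = np.asarray(target, dtype=float)
    x = vertices[0].copy()  # start at an arbitrary vertex
    for _ in range(num_iters):
        grad = x - target  # gradient of 0.5 * ||x - target||^2
        # Linear minimization oracle: the vertex minimizing <grad, v>.
        # Assumption: in AL this would be an MDP solve, not an argmin.
        s = vertices[np.argmin(vertices @ grad)]
        d = s - x
        gap = -(grad @ d)  # Frank-Wolfe duality gap, always >= 0
        if gap <= tol:
            break
        # Exact line search for the quadratic objective, clipped to [0, 1].
        gamma = min(1.0, gap / (d @ d))
        x = x + gamma * d
    return x
```

For example, with the unit square as the polytope, `frank_wolfe_projection([[0, 0], [1, 0], [0, 1], [1, 1]], [2.0, 0.5])` moves to the nearest point on the right-hand edge, approximately `[1.0, 0.5]`.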
