Provably Efficient Generative Adversarial Imitation Learning for Online and Offline Setting with Linear Function Approximation

Zhihan Liu, Yufeng Zhang, Zuyue Fu, Zhuoran Yang, and Zhaoran Wang

arXiv:2108.08765 · cs.LG · August 20, 2021 · 1 citation

TL;DR

This paper introduces provably efficient algorithms for generative adversarial imitation learning (GAIL) in both online and offline settings with linear function approximation, providing theoretical guarantees on regret and optimality gap.

Contribution

The paper proposes OGAP (optimistic, for online GAIL) and PGAP (pessimistic, for offline GAIL) under linear function approximation, and proves regret and optimality-gap bounds for both.

Findings

01

OGAP achieves $\tilde{O}(H^{2} d^{3/2} K^{1/2} + K H^{3/2} d N_{1}^{-1/2})$ regret.

02

For an arbitrary additional dataset, PGAP's optimality gap attains the minimax lower bound in its utilization of that dataset.

03

Assuming sufficient coverage of the additional dataset, PGAP achieves a $\tilde{O}(H^{2} d K^{-1/2} + H^{2} d^{3/2} N_{2}^{-1/2} + H^{3/2} d N_{1}^{-1/2})$ optimality gap.
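To see how the two terms of the OGAP regret bound trade off, the sketch below evaluates them with constants and logarithmic factors dropped (the $\tilde{O}$ hides these, so only the scaling is meaningful). It assumes $K$ is the number of online episodes and $H$ the horizon; the abstract's definitions are cut off, so both readings are assumptions, and the function names are hypothetical.

```python
import math


def ogap_regret_terms(H: int, d: int, K: int, N1: int) -> tuple[float, float]:
    """Two terms of the OGAP regret bound, constants and log factors dropped."""
    # Cost of learning the environment online: grows like sqrt(K).
    online_term = H**2 * d**1.5 * math.sqrt(K)
    # Cost of a finite expert demonstration (N1 trajectories): grows linearly in K.
    expert_term = K * H**1.5 * d / math.sqrt(N1)
    return online_term, expert_term


def crossover_episodes(H: int, d: int, N1: int) -> int:
    """K at which the two terms are equal.

    H^2 d^{3/2} sqrt(K) = K H^{3/2} d / sqrt(N1)  =>  sqrt(K) = sqrt(H d N1).
    """
    return H * d * N1


H, d, N1 = 10, 5, 100
K_star = crossover_episodes(H, d, N1)
online, expert = ogap_regret_terms(H, d, K_star, N1)
print(K_star, math.isclose(online, expert))
```

Dividing by $K$, the first term contributes $H^{2} d^{3/2} K^{-1/2}$ per episode and vanishes as $K$ grows, while the second stays at $H^{3/2} d N_{1}^{-1/2}$. Beyond roughly $K = H d N_{1}$ episodes, the size of the expert demonstration, not further interaction, dominates the bound.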

Abstract

In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration so that its performance cannot be discriminated from the expert policy on a certain predefined reward set. In this paper, we study GAIL in both online and offline settings with linear function approximation, where both the transition and reward function are linear in the feature maps. Besides the expert demonstration, in the online setting the agent can interact with the environment, while in the offline setting the agent only accesses an additional dataset collected by a prior. For online GAIL, we propose an optimistic generative adversarial policy optimization algorithm (OGAP) and prove that OGAP achieves $\tilde{O}(H^{2} d^{3/2} K^{1/2} + K H^{3/2} d N_{1}^{-1/2})$ regret. Here $N_{1}$ represents the number of trajectories of the expert demonstration, $d$ is the…
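The "cannot be discriminated on a predefined reward set" criterion becomes concrete when rewards are linear in the features. As a hypothetical choice (not necessarily the paper's), take the reward set $\{r_h(s,a) = \phi(s,a)^\top \theta_h : \|\theta_h\|_2 \le 1\}$. The expected return of a policy $\pi$ is then $\sum_h \theta_h^\top \psi_h^{\pi}$ with $\psi_h^{\pi} = \mathbb{E}_\pi[\phi(s_h, a_h)]$, so by Cauchy-Schwarz the worst-case gap to the expert is $\sum_h \|\psi_h^{E} - \psi_h^{\pi}\|_2$. The sketch below computes this from sampled feature trajectories; it illustrates the objective only and is not OGAP or PGAP, and all names are hypothetical.

```python
import numpy as np


def feature_expectations(features: np.ndarray) -> np.ndarray:
    """features: (num_trajectories, H, d) array of phi(s_h, a_h). Returns (H, d) per-step means."""
    return features.mean(axis=0)


def discrimination_gap(expert_feats: np.ndarray, policy_feats: np.ndarray) -> tuple[float, np.ndarray]:
    """Worst-case return gap over the hypothetical unit-ball linear reward set.

    The max over theta decouples across steps h; each step contributes the
    Euclidean norm of the difference in feature expectations.
    """
    diff = feature_expectations(expert_feats) - feature_expectations(policy_feats)  # (H, d)
    norms = np.linalg.norm(diff, axis=1)  # (H,)
    # Maximizing weights: unit vectors along each step's difference (zero if identical).
    safe = np.where(norms > 0, norms, 1.0)
    theta = diff / safe[:, None]
    return float(norms.sum()), theta


rng = np.random.default_rng(0)
expert = rng.normal(size=(100, 3, 4))            # N1 = 100 trajectories, H = 3, d = 4
policy = rng.normal(loc=0.5, size=(200, 3, 4))   # trajectories from the learner's policy
gap, theta = discrimination_gap(expert, policy)
# The returned theta attains the maximum.
achieved = float(np.sum(theta * (feature_expectations(expert) - feature_expectations(policy))))
print(np.isclose(gap, achieved))
```

With only $N_{1}$ expert trajectories, the empirical $\psi_h^{E}$ carries sampling error that shrinks like $N_{1}^{-1/2}$, which is one informal way to read the $N_{1}^{-1/2}$ terms in the bounds above.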

Taxonomy

Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks

Methods: Generative Adversarial Imitation Learning