GAN-Based Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback

Jie Huang; Rongshun Juan; Randy Gomez; Keisuke Nakamura; Qixin Sha; Bo He; Guangliang Li

arXiv:2104.06600·cs.LG·April 15, 2021

TL;DR

This paper introduces GAIRL, a reinforcement learning method that combines expert demonstrations with human evaluative feedback to improve policy learning. It outperforms GAIL across six physics-based control tasks.

Contribution

The paper proposes GAIRL, which integrates GAIL with interactive reinforcement learning so that learned policies can exceed the performance of the demonstrations and remain more stable in complex tasks.

Findings

01

GAIRL outperforms GAIL in all tested tasks.

02

GAIRL learns more stable and near-optimal policies.

03

Combining demonstrations with human feedback is effective.

Abstract

Deep reinforcement learning (DRL) has achieved great success in many simulated tasks. However, its sample inefficiency makes applying traditional DRL methods to real-world robots a great challenge. Generative Adversarial Imitation Learning (GAIL), a general model-free imitation learning method, allows robots to learn policies directly from expert trajectories in large environments. However, GAIL shares a limitation of other imitation learning methods: it can seldom surpass the performance of the demonstrations. In this paper, to address this limitation of GAIL, we propose GAN-Based Interactive Reinforcement Learning (GAIRL) from demonstration and human evaluative feedback, which combines the advantages of GAIL and interactive reinforcement learning. We tested our proposed method in six physics-based control tasks, ranging from simple low-dimensional control tasks -- Cart Pole and Mountain…
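The abstract does not say how the GAIL signal and the human evaluative feedback are combined, so the following is only an illustrative sketch and not the paper's method. It assumes one common GAIL convention in which a discriminator output D(s, a) is the estimated probability that a state-action pair came from the expert, giving a surrogate reward of -log(1 - D(s, a)). The function names, the scalar `human_feedback` signal, and the mixing `weight` are all hypothetical.

```python
import math


def gail_reward(d_prob: float, eps: float = 1e-8) -> float:
    """Surrogate reward -log(1 - D(s, a)) under the assumed convention
    that D(s, a) is the probability the pair came from the expert."""
    # Clamp to avoid log(0) when the discriminator is fully confident.
    return -math.log(max(1.0 - d_prob, eps))


def combined_reward(d_prob: float, human_feedback: float, weight: float = 0.5) -> float:
    """Hypothetical linear mix of the imitation reward and a scalar human
    evaluative signal (for example +1 for approval, -1 for disapproval).
    The mixing rule and `weight` are assumptions, not taken from the paper."""
    return (1.0 - weight) * gail_reward(d_prob) + weight * human_feedback


# A more expert-like pair earns a larger imitation reward.
r_low = gail_reward(0.1)
r_high = gail_reward(0.9)

# Positive human feedback raises the combined reward for the same pair.
r_approved = combined_reward(0.5, human_feedback=1.0)
r_disapproved = combined_reward(0.5, human_feedback=-1.0)
```

With `weight=0.0` the sketch reduces to the plain GAIL surrogate reward, which makes it easy to see what the human signal adds: it can push the policy toward behavior the trainer prefers even where the demonstrations themselves are suboptimal.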

Taxonomy

Methods: Generative Adversarial Imitation Learning