A Joint Imitation-Reinforcement Learning Framework for Reduced Baseline Regret

Sheelabhadra Dey; Sumedh Pendurkar; Guni Sharon; Josiah P. Hanna

arXiv:2209.09446·cs.LG·September 21, 2022

TL;DR

This paper introduces JIRL, a joint imitation-reinforcement learning framework that minimizes baseline regret during training while eventually surpassing baseline performance in control tasks.

Contribution

JIRL is a novel framework that combines imitation of a baseline policy with reinforcement learning to reduce regret and improve control performance.

Findings

01. JIRL reduces baseline regret by a factor of up to 21 compared to existing methods.

02. JIRL achieves final performance comparable to state-of-the-art algorithms.

03. JIRL effectively transitions control from the baseline policy to the RL agent in continuous action domains.

Abstract

In various control task domains, existing controllers provide a baseline level of performance that, though possibly suboptimal, should be maintained. Reinforcement learning (RL) algorithms that rely on extensive exploration of the state and action space can be used to optimize a control policy. However, fully exploratory RL algorithms may decrease performance below the baseline level during training. In this paper, we address the issue of online optimization of a control policy while minimizing regret with respect to a baseline policy's performance. We present a joint imitation-reinforcement learning framework, denoted JIRL. The learning process in JIRL assumes the availability of a baseline policy and is designed with two objectives in mind: (a) leveraging the baseline's online demonstrations to minimize the regret with respect to the baseline policy during training, and (b) eventually…
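To make "regret with respect to a baseline policy" concrete, here is a minimal Python sketch of one way such a quantity could be tallied from training logs. It rests on assumptions not stated in the abstract: that regret is accumulated per training episode, that only shortfalls below the baseline's expected return are counted, and that the baseline's expected return is known. The function name `cumulative_baseline_regret` and its parameters are hypothetical, and the paper's exact definition may differ.

```python
from typing import Sequence


def cumulative_baseline_regret(
    episode_returns: Sequence[float],
    baseline_return: float,
) -> float:
    """Sum the per-episode shortfall of the learner relative to the baseline.

    Assumption (not taken from the paper): an episode contributes
    max(0, baseline_return - episode_return), so episodes that beat the
    baseline add no regret and do not offset earlier shortfalls.
    """
    return sum(max(0.0, baseline_return - r) for r in episode_returns)


# Hypothetical training log: the learner dips below the baseline once,
# matches it once, and exceeds it once.
returns = [8.0, 10.0, 12.0]
regret = cumulative_baseline_regret(returns, baseline_return=10.0)
print(regret)  # 2.0
```

Under this reading, a fully exploratory learner accumulates regret early in training, whereas a learner that imitates the baseline at the start keeps the sum small until it is ready to outperform the baseline.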

Code & Models

Repositories

pi-star-lab/jirl (official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Smart Grid Energy Management