A Joint Imitation-Reinforcement Learning Framework for Reduced Baseline Regret
Sheelabhadra Dey, Sumedh Pendurkar, Guni Sharon, Josiah P. Hanna

TL;DR
This paper introduces JIRL, a joint imitation-reinforcement learning framework that minimizes baseline regret during training while eventually surpassing baseline performance in control tasks.
Contribution
JIRL is a novel framework that combines imitation of a baseline policy with reinforcement learning to reduce regret and improve control performance.
Findings
JIRL reduces baseline regret by up to a factor of 21 relative to existing methods.
JIRL achieves comparable final performance to state-of-the-art algorithms.
JIRL effectively transitions control from baseline to RL in continuous action domains.
Abstract
In various control task domains, existing controllers provide a baseline level of performance that, though possibly suboptimal, should be maintained. Reinforcement learning (RL) algorithms that rely on extensive exploration of the state and action space can be used to optimize a control policy. However, fully exploratory RL algorithms may decrease performance below a baseline level during training. In this paper, we address the issue of online optimization of a control policy while minimizing regret with respect to the performance of a baseline policy. We present a joint imitation-reinforcement learning framework, denoted JIRL. The learning process in JIRL assumes the availability of a baseline policy and is designed with two objectives in mind: (a) leveraging the baseline's online demonstrations to minimize the regret with respect to the baseline policy during training, and (b) eventually…
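To make the notion of regret with respect to a baseline concrete, the following is a minimal sketch, not the paper's exact definition. It assumes baseline regret is the cumulative shortfall of the learner's per-episode return relative to a fixed baseline return; the function name `baseline_regret` and the sample numbers are hypothetical.

```python
from typing import Sequence


def baseline_regret(baseline_return: float, episode_returns: Sequence[float]) -> float:
    """Sum of (baseline return - learner return) over training episodes.

    Assumption: the baseline's expected return is a known constant, and
    episodes where the learner beats the baseline reduce the total.
    """
    return sum(baseline_return - r for r in episode_returns)


# Hypothetical training run: the learner starts below the baseline
# and eventually surpasses it.
returns = [4.0, 8.0, 12.0]
print(baseline_regret(10.0, returns))
```

Under this assumed definition, a learner that stays close to the baseline early in training accumulates little regret, which is the quantity objective (a) aims to keep small.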
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Smart Grid Energy Management
