Optimistic Reinforcement Learning by Forward Kullback-Leibler Divergence Optimization

Taisuke Kobayashi

arXiv:2105.12991 · cs.LG · April 25, 2022

TL;DR

This paper introduces an optimistic reinforcement learning approach based on forward KL divergence. By exploiting the asymmetry of KL divergence and combining the resulting learning rules with experience replay and eligibility traces, it accelerates learning and improves performance.

Contribution

The paper formulates RL optimization problems using forward KL divergence in place of the reverse KL divergence implicit in traditional methods, which yields optimistic learning rules that improve learning efficiency and final performance.
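To make the contrast concrete, the two directions of KL divergence can be written for a single state s. This is a generic sketch, not the paper's exact formulation: it assumes a Boltzmann target policy π*(a|s) = exp(Q(s,a)/α)/Z(s) with an illustrative temperature α, in the spirit of Levine's control-as-inference view, and places the learned policy π first in the reverse form.

```latex
\begin{aligned}
\text{reverse:}\quad D_{\mathrm{KL}}(\pi \,\|\, \pi^\ast)
  &= \mathbb{E}_{a\sim\pi}\!\left[\ln \pi(a\mid s) - \frac{Q(s,a)}{\alpha}\right] + \ln Z(s),\\
\text{forward:}\quad D_{\mathrm{KL}}(\pi^\ast \,\|\, \pi)
  &= -\mathcal{H}(\pi^\ast) - \mathbb{E}_{a\sim\pi^\ast}\!\left[\ln \pi(a\mid s)\right].
\end{aligned}
```

Under this assumption, minimizing the reverse form over π is equivalent to maximizing the expected Q value plus an entropy bonus α H(π), whereas in the forward form the entropy H(π*) does not depend on π, so minimization reduces to maximizing the log-likelihood of π under actions drawn from π*.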

Findings

1. Moderate optimism accelerates learning.
2. The method outperforms state-of-the-art RL algorithms in robotic simulations.
3. Integration with prioritized experience replay and eligibility traces further accelerates learning.
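The third finding (and the taxonomy) refers to prioritized experience replay. This summary does not describe how the paper integrates it, so the following is only a generic sketch of proportional prioritization in the style of Schaul et al., not the author's implementation. The class name `ProportionalReplay`, the O(N) sampling without a sum-tree, and the default `alpha`/`beta` values are illustrative assumptions.

```python
import numpy as np


class ProportionalReplay:
    """Minimal proportional prioritized replay (hypothetical helper, no sum-tree).

    Sampling is O(N), which is fine for illustration but not for large buffers.
    """

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha, self.beta, self.eps = alpha, beta, eps
        self.data = []
        self.priorities = np.zeros(capacity)
        self.pos = 0
        self.rng = np.random.default_rng(seed)

    def add(self, transition):
        # New transitions get the current maximum priority so they are replayed soon.
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition  # overwrite the oldest slot (ring buffer)
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        n = len(self.data)
        scaled = self.priorities[:n] ** self.alpha  # p_i^alpha
        probs = scaled / scaled.sum()               # P(i) = p_i^alpha / sum_k p_k^alpha
        idx = self.rng.choice(n, size=batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling;
        # normalizing by the batch maximum is a common simplification.
        weights = (n * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update(self, idx, td_errors):
        # Priority is the absolute TD error plus a small constant so nothing has zero chance.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

A typical loop would call `add` after each environment step, `sample` to build a minibatch, and `update` with the minibatch's TD errors after the learning step.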

Abstract

This paper reinterprets the traditional optimization methods in reinforcement learning (RL) as optimization problems using reverse Kullback-Leibler (KL) divergence, and derives a new optimization method that uses forward KL divergence in place of reverse KL divergence. Although RL originally aims to maximize return indirectly through optimization of the policy, recent work by Levine proposed a different derivation that explicitly treats optimality as a stochastic variable. This paper follows that concept and formulates the traditional learning laws for both the value function and the policy as optimization problems with reverse KL divergence that include optimality. Focusing on the asymmetry of KL divergence, it then derives new optimization problems with forward KL divergence. Remarkably, such new optimization problems can be…
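The asymmetry of KL divergence that the abstract relies on can be seen numerically. The sketch below fits a single Gaussian to a hypothetical bimodal target by grid search under each direction of KL divergence. The discretized grid, the target, and the helper names `gaussian`, `kl`, and `best_fit` are illustrative assumptions; this does not reproduce the paper's derivation or its optimism mechanism.

```python
import numpy as np

# Discretized 1-D grid standing in for a continuous action space (assumption).
x = np.linspace(-6.0, 6.0, 1201)
dx = x[1] - x[0]


def gaussian(mu, sigma):
    """Gaussian density evaluated on the grid and renormalized to integrate to 1."""
    p = np.exp(-0.5 * ((x - mu) / sigma) ** 2)
    return p / (p.sum() * dx)


def kl(p, q):
    """KL(p || q) approximated by a Riemann sum; eps guards against log(0)."""
    eps = 1e-300
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))) * dx)


# Hypothetical bimodal target, e.g. a policy with two equally good actions.
target = 0.5 * gaussian(-2.0, 0.5) + 0.5 * gaussian(2.0, 0.5)


def best_fit(direction):
    """Grid-search the Gaussian (mu, sigma) minimizing the chosen KL direction."""
    best = None
    for mu in np.linspace(-3.0, 3.0, 61):
        for sigma in np.linspace(0.3, 3.0, 28):
            q = gaussian(mu, sigma)
            # reverse: learned q in the first argument; forward: target first.
            d = kl(q, target) if direction == "reverse" else kl(target, q)
            if best is None or d < best[0]:
                best = (d, mu, sigma)
    return best


for direction in ("forward", "reverse"):
    d, mu, sigma = best_fit(direction)
    print(f"{direction}: mu={mu:+.1f}, sigma={sigma:.1f}, KL={d:.3f}")
```

The forward fit should settle near mu = 0 with a wide sigma that covers both modes (mass-covering), while the reverse fit should lock onto a single mode near mu = ±2 with sigma close to 0.5 (mode-seeking).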

Taxonomy

Methods: Prioritized Experience Replay · Experience Replay