The $f$-Divergence Reinforcement Learning Framework

Chen Gong; Qiang He; Yunpeng Bai; Zhou Yang; Xiaoyu Chen; Xinwen Hou; Xianjie Zhang; Yu Liu; Guoliang Fan

arXiv:2109.11867·cs.LG·December 15, 2021

Open Access

TL;DR

This paper introduces the $f$-Divergence Reinforcement Learning (FRL) framework, which simultaneously performs policy evaluation and improvement by minimizing $f$-divergence, leading to convergence to optimal policies and improved performance in Atari games.

Contribution

The paper proposes a novel DRL framework using $f$-divergence minimization, enabling simultaneous policy evaluation and improvement, and alleviating overestimation issues.

Findings

1. FRL converges to optimal policies theoretically.

2. Agents trained with FRL outperform baseline algorithms on Atari games.

3. The framework naturally reduces value function overestimation.

Abstract

The framework of deep reinforcement learning (DRL) provides a powerful and widely applicable mathematical formalization for sequential decision-making. This paper presents a novel DRL framework, termed $f$-Divergence Reinforcement Learning (FRL). In FRL, the policy evaluation and policy improvement phases are performed simultaneously by minimizing the $f$-divergence between the learning policy and the sampling policy, which is distinct from conventional DRL algorithms that aim to maximize the expected cumulative rewards. We theoretically prove that minimizing this $f$-divergence makes the learning policy converge to the optimal policy. In addition, we convert the process of training agents in the FRL framework into a saddle-point optimization problem with a specific $f$ function through the Fenchel conjugate, which yields new methods for policy evaluation and policy improvement. Through…
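For readers unfamiliar with the tools named in the abstract, the following is a minimal sketch of the general relationship between an $f$-divergence and its Fenchel-conjugate (variational) form, which is the standard route by which minimizing a divergence becomes a saddle-point problem. It is not the paper's objective: the abstract does not state which $f$ is used, so the KL generator $f(t) = t \log t$, the toy distributions `p` and `q`, and all function names here are assumptions chosen for illustration.

```python
import math

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i * f(p_i / q_i) for discrete P, Q with q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def kl_f(t):
    """Generator of the KL divergence: f(t) = t * log(t), with f(0) = 0."""
    return t * math.log(t) if t > 0 else 0.0

def kl_conjugate(y):
    """Fenchel conjugate of f(t) = t * log(t): f*(y) = exp(y - 1)."""
    return math.exp(y - 1.0)

def variational_bound(p, q, g, f_star):
    """Lower bound E_P[g] - E_Q[f*(g)] <= D_f(P || Q), valid for any g."""
    expect_p = sum(pi * gi for pi, gi in zip(p, g))
    expect_q = sum(qi * f_star(gi) for qi, gi in zip(q, g))
    return expect_p - expect_q

# Toy distributions (hypothetical values, not from the paper).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

exact = f_divergence(p, q, kl_f)

# The bound is tight at g = f'(p/q) = log(p/q) + 1.
g_opt = [math.log(pi / qi) + 1.0 for pi, qi in zip(p, q)]
tight = variational_bound(p, q, g_opt, kl_conjugate)

# Any other g gives a value that is no larger than the exact divergence.
loose = variational_bound(p, q, [1.0, 1.0, 1.0], kl_conjugate)

print(exact, tight, loose)
```

Maximizing the bound over `g` recovers the divergence, so minimizing the divergence over one of the distributions becomes a min-max problem over that distribution and `g`. How the paper instantiates the two distributions and the function class is not visible in the truncated abstract.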

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications