The $f$-Divergence Reinforcement Learning Framework
Chen Gong, Qiang He, Yunpeng Bai, Zhou Yang, Xiaoyu Chen, Xinwen Hou, Xianjie Zhang, Yu Liu, Guoliang Fan

TL;DR
This paper introduces the $f$-Divergence Reinforcement Learning (FRL) framework, which simultaneously performs policy evaluation and improvement by minimizing $f$-divergence, leading to convergence to optimal policies and improved performance in Atari games.
Contribution
The paper proposes a novel DRL framework using $f$-divergence minimization, enabling simultaneous policy evaluation and improvement, and alleviating overestimation issues.
Findings
The authors prove that minimizing the $f$-divergence makes the learning policy converge to the optimal policy.
Agents trained with FRL outperform baseline algorithms on Atari games.
The framework naturally reduces value function overestimation.
Abstract
The framework of deep reinforcement learning (DRL) provides a powerful and widely applicable mathematical formalization for sequential decision-making. This paper presents a novel DRL framework, termed \emph{$f$-Divergence Reinforcement Learning (FRL)}. In FRL, the policy evaluation and policy improvement phases are performed simultaneously by minimizing the $f$-divergence between the learning policy and the sampling policy, which is distinct from conventional DRL algorithms that aim to maximize the expected cumulative rewards. We theoretically prove that minimizing such $f$-divergence can make the learning policy converge to the optimal policy. Besides, we convert the process of training agents in the FRL framework to a saddle-point optimization problem with a specific $f$ function through the Fenchel conjugate, which forms new methods for policy evaluation and policy improvement. Through…
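As a rough illustration of the Fenchel-conjugate step mentioned in the abstract, the sketch below checks the standard variational form of an $f$-divergence, $D_f(P \| Q) = \sup_T \, \mathbb{E}_P[T] - \mathbb{E}_Q[f^*(T)]$, on two small discrete distributions. This is not the paper's training objective, which this page does not state. The choice $f(u) = u \log u$ (KL divergence), the toy distributions `p` and `q`, and all function names are assumptions made for illustration only.

```python
import math

# Assumed generator for illustration: f(u) = u * log(u), which yields the KL divergence.
def f_kl(u):
    return u * math.log(u)

# Fenchel conjugate of f(u) = u * log(u): f*(t) = exp(t - 1).
def f_kl_conjugate(t):
    return math.exp(t - 1.0)

def f_divergence(p, q, f):
    """Direct definition: D_f(P || Q) = sum_x q(x) * f(p(x) / q(x))."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def variational_bound(p, q, t, f_conj):
    """Lower bound E_P[T] - E_Q[f*(T)] for a candidate critic T (one value per outcome)."""
    expect_p = sum(pi * ti for pi, ti in zip(p, t))
    expect_q = sum(qi * f_conj(ti) for qi, ti in zip(q, t))
    return expect_p - expect_q

# Toy distributions (hypothetical stand-ins for a learning and a sampling policy).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

exact = f_divergence(p, q, f_kl)

# For KL, the maximizing critic is T*(x) = 1 + log(p(x) / q(x)).
t_opt = [1.0 + math.log(pi / qi) for pi, qi in zip(p, q)]
tight = variational_bound(p, q, t_opt, f_kl_conjugate)

# Any other critic, for example T = 0 everywhere, gives a looser bound.
loose = variational_bound(p, q, [0.0, 0.0, 0.0], f_kl_conjugate)

print(f"exact={exact:.4f} tight={tight:.4f} loose={loose:.4f}")
```

The bound is tight only at the optimal critic, so minimizing the divergence over one distribution while maximizing over the critic gives a min-max problem, which is the general shape of a saddle-point formulation.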
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications
