Reinforcement Learning via Fenchel-Rockafellar Duality

Ofir Nachum; Bo Dai

arXiv:2001.01866 · cs.LG · January 13, 2020 · 25 cites

TL;DR

This paper reviews Fenchel-Rockafellar convex duality and shows how it applies to a range of reinforcement learning problems, giving a unified framework for policy evaluation and policy optimization in both online and offline settings.

Contribution

It offers a unified perspective on applying convex duality to RL, connecting existing results and enabling new methods for offline and online learning.

Findings

01. Unified treatment of convex duality in RL

02. Methods for policy evaluation and policy gradient estimation from behavior-agnostic offline data

03. Framework for policy learning via max-likelihood optimization

Abstract

We review basic concepts of convex duality, focusing on the very general and supremely useful Fenchel-Rockafellar duality. We summarize how this duality may be applied to a variety of reinforcement learning (RL) settings, including policy evaluation or optimization, online or offline learning, and discounted or undiscounted rewards. The derivations yield a number of intriguing results, including the ability to perform policy evaluation and on-policy policy gradient with behavior-agnostic offline data and methods to learn a policy via max-likelihood optimization. Although many of these results have appeared previously in various forms, we provide a unified treatment and perspective on these results, which we hope will enable researchers to better use and apply the tools of convex duality to make further progress in RL.
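
One concrete instance of the duality mentioned in the abstract is tabular policy evaluation: the value of a policy can be computed either from its Q-values or from its discounted state-action visitation distribution, and these two formulations are linear-programming duals of each other. The following is a minimal NumPy sketch under stated assumptions: the MDP is small and randomly generated, and every name and size (`n_states`, `P_pi`, `init`, and so on) is illustrative rather than taken from the paper or from dice_rl.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9  # hypothetical sizes and discount

# Hypothetical random MDP: transitions P[s, a, s'], rewards r[s, a].
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=-1, keepdims=True)
r = rng.random((n_states, n_actions))

# Uniform initial state distribution and a random stochastic policy pi[s, a].
mu0 = np.full(n_states, 1.0 / n_states)
pi = rng.random((n_states, n_actions))
pi /= pi.sum(axis=-1, keepdims=True)

n_sa = n_states * n_actions
# State-action transition matrix under pi:
# P_pi[(s, a), (s', a')] = P(s' | s, a) * pi(a' | s').
P_pi = (P[:, :, :, None] * pi[None, None, :, :]).reshape(n_sa, n_sa)
r_vec = r.reshape(n_sa)
init = (mu0[:, None] * pi).reshape(n_sa)  # initial state-action distribution

# Primal view: solve the Bellman equation Q = r + gamma * P_pi Q,
# then average Q over the initial state-action distribution.
Q = np.linalg.solve(np.eye(n_sa) - gamma * P_pi, r_vec)
value_from_q = (1 - gamma) * init @ Q

# Dual view: solve the flow equation d = (1 - gamma) * init + gamma * P_pi^T d,
# then average the reward under the visitation distribution d.
d = np.linalg.solve(np.eye(n_sa) - gamma * P_pi.T, (1 - gamma) * init)
value_from_d = d @ r_vec

print(value_from_q, value_from_d)
```

Both quantities estimate the same normalized policy value, so they agree up to floating-point error. The dual form is the one that lends itself to offline estimation, since it only needs rewards weighted by a distribution over state-action pairs.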

Code & Models

Repositories

google-research/dice_rl (TensorFlow)

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Neural dynamics and brain function