Taylor Expansion of Discount Factors

Yunhao Tang; Mark Rowland; Rémi Munos; Michal Valko

arXiv:2106.06170·cs.LG·June 16, 2021

TL;DR

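This paper studies what happens in reinforcement learning when the discount factor used to estimate value functions differs from the one that defines the evaluation objective. It derives a family of objectives that interpolate between the value functions of the two discount factors, and uses them to improve value estimation and policy optimization.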

Contribution

The paper introduces a framework for interpolating between the value functions of two distinct discount factors. The framework yields new methods for value estimation and policy optimization updates, which show empirical performance gains.

Findings

01 Empirical performance gains in RL tasks using the proposed interpolation framework.

02 Insights into the effects of discount factor discrepancies on learning.

03 New methods for value function estimation and policy optimization.

Abstract

In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective. In this work, we study the effect that this discrepancy of discount factors has during learning, and discover a family of objectives that interpolate value functions of two distinct discount factors. Our analysis suggests new ways for estimating value functions and performing policy optimization updates, which demonstrate empirical performance gains. This framework also leads to new insights on commonly-used deep RL heuristic modifications to policy optimization algorithms.
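
As a rough illustration of what interpolating between two discount factors can look like, the sketch below assumes a small tabular Markov reward process under a fixed policy, with a transition matrix P and reward vector r that are invented for this example. It relies on the identity V_{γ'} = Σ_{k≥0} ((γ' − γ)(I − γP)^{-1} P)^k V_γ, which follows from factoring I − γ'P and converges when γ < γ' < 1. Truncating the sum at order K gives a vector that lies between V_γ and V_{γ'}. The helper names (`matvec`, `evaluate`, `taylor_value`) are hypothetical, and this is not claimed to be the authors' implementation.

```python
def matvec(P, x):
    """Multiply a row-stochastic matrix P (list of rows) by a vector x."""
    return [sum(p * v for p, v in zip(row, x)) for row in P]


def evaluate(P, r, gamma, iters=2000):
    """Approximate v = (I - gamma P)^{-1} r by iterating v <- r + gamma P v."""
    v = [0.0] * len(r)
    for _ in range(iters):
        Pv = matvec(P, v)
        v = [ri + gamma * pvi for ri, pvi in zip(r, Pv)]
    return v


def taylor_value(P, r, gamma, gamma_prime, order):
    """Sum the terms k = 0..order of the expansion of V_{gamma'} around V_gamma."""
    term = evaluate(P, r, gamma)  # k = 0 term is V_gamma itself
    total = list(term)
    for _ in range(order):
        # Next term: (gamma' - gamma) (I - gamma P)^{-1} P applied to the previous term.
        shifted = [(gamma_prime - gamma) * x for x in matvec(P, term)]
        term = evaluate(P, shifted, gamma)
        total = [t + s for t, s in zip(total, term)]
    return total


# Toy 3-state chain (assumed values, for illustration only).
P = [
    [0.5, 0.5, 0.0],
    [0.1, 0.6, 0.3],
    [0.2, 0.0, 0.8],
]
r = [1.0, 0.0, 0.5]
gamma, gamma_prime = 0.9, 0.95

target = evaluate(P, r, gamma_prime)  # value function under the larger discount

# Largest per-state gap to the target for truncation orders 0..5.
errors = [
    max(abs(a - b) for a, b in zip(taylor_value(P, r, gamma, gamma_prime, k), target))
    for k in range(6)
]
for k, err in enumerate(errors):
    print(f"order {k}: max error vs V_gamma' = {err:.6f}")
```

Order 0 reproduces V_γ, and each additional order moves the estimate closer to V_{γ'}. In this toy setting the gap shrinks roughly by a factor of (γ' − γ)/(1 − γ) per order.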

Code & Models

Datasets

misovalko/my-research-papers (dataset, 21 downloads)

Videos

Taylor Expansion of Discount Factors · SlidesLive

Taxonomy

Topics: Energy Efficiency and Management · Reinforcement Learning in Robotics