Correcting Momentum in Temporal Difference Learning

Emmanuel Bengio; Joelle Pineau; Doina Precup

arXiv:2106.03955 · cs.LG · June 9, 2021 · 1 citation

TL;DR

This paper shows that momentum in TD learning accumulates doubly stale gradients, proposes a first-order correction that improves sample efficiency in policy evaluation, and argues that deep RL is not always best served by importing techniques directly from supervised learning.

Contribution

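The paper introduces a first-order correction term for momentum in TD learning. The term compensates for the staleness of accumulated gradients, including the drift of the bootstrapped target, and improves sample efficiency in policy evaluation.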
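One way to read "first-order correction" is as a Taylor expansion of each stored gradient around the parameters at which it was computed. The notation below ($\mu_t$, $\beta$, $g_i$, $J_i$, $\theta_i$) is assumed for illustration and is not taken from the paper; treat it as a sketch of the idea rather than the authors' exact formulation.

```latex
\begin{align}
  % Standard momentum: each gradient is frozen at the parameters it was computed with.
  \mu_t &= \beta\,\mu_{t-1} + g_t(\theta_t)
         = \sum_{i \le t} \beta^{\,t-i}\, g_i(\theta_i) \\
  % What one would accumulate if every past gradient were recomputed at the current parameters.
  \mu_t^{\star} &= \sum_{i \le t} \beta^{\,t-i}\, g_i(\theta_t) \\
  % First-order approximation of a recomputed gradient.
  g_i(\theta_t) &\approx g_i(\theta_i) + J_i\,(\theta_t - \theta_i),
  \qquad J_i = \left.\frac{\partial g_i}{\partial \theta}\right|_{\theta = \theta_i}
\end{align}
```

In TD learning, $g_i$ depends on $\theta$ both through the value estimate and through the bootstrapped target, so under this reading $J_i$ would include the derivative through the target as well.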

Findings

01. A first-order correction to momentum improves sample efficiency in policy evaluation.

02. Momentum in TD learning accumulates doubly stale gradients.

03. Deep RL is not always best served by techniques imported unchanged from supervised learning.

Abstract

A common optimization tool used in deep reinforcement learning is momentum, which consists of accumulating and discounting past gradients and reapplying them at each iteration. We argue that, unlike in supervised learning, momentum in Temporal Difference (TD) learning accumulates gradients that become doubly stale: not only does the gradient of the loss change due to parameter updates, but the loss itself changes due to bootstrapping. We first show that this phenomenon exists, and then propose a first-order correction term to momentum. We show that this correction term improves sample efficiency in policy evaluation by correcting target value drift. An important insight of this work is that deep RL methods are not always best served by directly importing techniques from the supervised setting.
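The staleness described in the abstract can be illustrated with a minimal sketch, assuming linear value estimation with TD(0) on synthetic random transitions. This is not the authors' implementation: the function names, the running sums `A` and `b`, and all hyperparameters are hypothetical choices made for this example. Because the TD(0) update direction is linear in the parameters when the value function is linear, a first-order correction reproduces the recomputed momentum exactly in this setting; with a neural network it would only be an approximation.

```python
import numpy as np


def td_gradient(theta, phi, phi_next, reward, gamma):
    """Semi-gradient TD(0) direction for a linear value estimate V(s) = phi @ theta."""
    td_error = reward + gamma * (phi_next @ theta) - phi @ theta
    return -td_error * phi


def td_jacobian(phi, phi_next, gamma):
    """Derivative of td_gradient with respect to theta.

    The gamma * phi_next term captures how the bootstrapped target moves with
    theta, which is the second source of staleness.
    """
    return np.outer(phi, phi - gamma * phi_next)


def run_demo(steps=30, dim=4, beta=0.9, lr=0.01, gamma=0.9, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)
    mu = np.zeros(dim)        # plain momentum: discounted sum of stored gradients
    A = np.zeros((dim, dim))  # discounted sum of Jacobians J_i (hypothetical bookkeeping)
    b = np.zeros(dim)         # discounted sum of J_i @ theta_i (hypothetical bookkeeping)
    history = []              # synthetic transitions, kept only for the brute-force check

    for _ in range(steps):
        phi, phi_next = rng.normal(size=dim), rng.normal(size=dim)
        reward = rng.normal()
        history.append((phi, phi_next, reward))

        J = td_jacobian(phi, phi_next, gamma)
        mu = beta * mu + td_gradient(theta, phi, phi_next, reward, gamma)
        A = beta * A + J
        b = beta * b + J @ theta

        # First-order corrected momentum, evaluated at the current theta.
        corrected = mu + A @ theta - b
        theta = theta - lr * corrected

    # Compare three quantities at the final parameters.
    stale = mu
    corrected = mu + A @ theta - b
    ideal = sum(
        beta ** (steps - 1 - i) * td_gradient(theta, *transition, gamma)
        for i, transition in enumerate(history)
    )
    return stale, corrected, ideal


if __name__ == "__main__":
    stale, corrected, ideal = run_demo()
    print("max |stale - ideal|     =", np.abs(stale - ideal).max())
    print("max |corrected - ideal| =", np.abs(corrected - ideal).max())
```

Here `ideal` recomputes every past gradient at the final parameters, which is affordable only because the example is tiny. The corrected quantity reaches the same value from running sums alone, while the plain momentum buffer does not.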

Code & Models

Repositories

bengioe/staleness-corrected-momentum (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition