Provably Robust Temporal Difference Learning for Heavy-Tailed Rewards

Semih Cayci; Atilla Eryilmaz

arXiv:2306.11455 · cs.LG · June 21, 2023 · 2 cites

TL;DR

This paper introduces a robust temporal difference learning method with dynamic gradient clipping to handle heavy-tailed reward distributions in reinforcement learning, providing provable guarantees and improved sample complexity.

Contribution

It develops a provably robust TD learning algorithm with dynamic gradient clipping for heavy-tailed rewards, improving theoretical guarantees over existing methods.

Findings

01

Achieves sample complexity of order O(ε^{-1/p}) under heavy-tailed rewards with finite moments of order 1+p for some p ∈ (0, 1].

02

Provides high-probability bounds for robust TD learning.

03

Numerical experiments validate the theoretical results.

Abstract

In a broad class of reinforcement learning applications, stochastic rewards have heavy-tailed distributions, which lead to infinite second-order moments for stochastic (semi)gradients in policy evaluation and direct policy optimization. In such instances, the existing RL methods may fail miserably due to frequent statistical outliers. In this work, we establish that temporal difference (TD) learning with a dynamic gradient clipping mechanism, and correspondingly operated natural actor-critic (NAC), can be provably robustified against heavy-tailed reward distributions. It is shown in the framework of linear function approximation that a favorable tradeoff between bias and variability of the stochastic gradients can be achieved with this dynamic gradient clipping mechanism. In particular, we prove that robust versions of TD learning achieve sample complexities of order…
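As a rough illustration of the mechanism the abstract describes, the sketch below runs TD(0) with linear function approximation and clips each stochastic semi-gradient to a threshold that grows over time. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function names `clip_by_norm` and `robust_td0`, the constant step size, and the threshold schedule `b0 * t ** (1 / (1 + p))` are all hypothetical choices made for this example, and `p` is used here only as a tuning parameter for that schedule.

```python
import numpy as np


def clip_by_norm(g, threshold):
    """Rescale g so that its Euclidean norm is at most threshold."""
    norm = np.linalg.norm(g)
    if norm > threshold:
        return g * (threshold / norm)
    return g


def robust_td0(features, next_features, rewards,
               gamma=0.9, step=0.05, b0=1.0, p=0.5):
    """TD(0) with linear function approximation and a growing clip threshold.

    features[t] and next_features[t] are the feature vectors of the current
    and next state at step t; rewards[t] is the observed (possibly
    heavy-tailed) reward. The schedule for b_t is an assumption of this
    sketch, not taken from the paper.
    """
    theta = np.zeros(features.shape[1])
    steps = zip(features, next_features, rewards)
    for t, (phi, phi_next, r) in enumerate(steps, start=1):
        # Temporal-difference error under the current linear value estimate.
        td_error = r + gamma * (phi_next @ theta) - (phi @ theta)
        # Stochastic semi-gradient direction for the TD(0) update.
        semi_grad = td_error * phi
        # Assumed schedule: the threshold grows with t, so the bias from
        # clipping shrinks while early outliers are still bounded.
        b_t = b0 * t ** (1.0 / (1.0 + p))
        theta = theta + step * clip_by_norm(semi_grad, b_t)
    return theta
```

With this construction, a single extreme reward can move the parameter vector by at most `step * b_t` in norm at step `t`, which is the bias-versus-variability tradeoff the abstract refers to: a small threshold suppresses outliers but distorts the update, while a large one does the opposite.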

Videos

Provably Robust Temporal Difference Learning for Heavy-Tailed Rewards · SlidesLive

Taxonomy

Topics: Age of Information Optimization · Reinforcement Learning in Robotics

Methods: Gradient Clipping