Robust Losses for Learning Value Functions

Andrew Patterson; Victor Liao; Martha White

arXiv:2205.08464 · cs.LG · April 19, 2023

TL;DR

This paper introduces two robust alternatives to the mean squared Bellman error, the Huber Bellman error and the Absolute Bellman error, and reformulates each as a saddlepoint problem. The resulting gradient-based algorithms for value function estimation are more stable than squared-error methods and less sensitive to meta-parameters.

Contribution

The paper formalizes robust Bellman errors as saddlepoint problems and derives gradient-based algorithms that are more stable and more robust than methods based on the squared Bellman error.
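As a rough illustration of what a saddlepoint form of these losses can look like, the block below writes each scalar loss as a maximization over an auxiliary variable using standard convex conjugate identities. The symbols are assumptions introduced here rather than taken from the paper: \delta is the TD error, S the state, h an auxiliary function of the state, and \tau the Huber threshold. The paper's own scaling and notation may differ.

```latex
% Standard conjugate identities for a scalar x (assumed notation):
\begin{align*}
  x^2       &= \max_{h \in \mathbb{R}} \; 2xh - h^2, \\
  p_\tau(x) &= \max_{|h| \le \tau} \; xh - \tfrac{1}{2}h^2
             = \begin{cases}
                 \tfrac{1}{2}x^2 & |x| \le \tau, \\
                 \tau |x| - \tfrac{1}{2}\tau^2 & |x| > \tau,
               \end{cases} \\
  |x|       &= \max_{|h| \le 1} \; xh.
\end{align*}
% Applying the first identity with x = E[\delta | S] gives a saddlepoint
% form of the mean squared Bellman error; the inner maximization is over
% functions h of the state:
\begin{equation*}
  \mathbb{E}\!\left[ \mathbb{E}[\delta \mid S]^2 \right]
  = \max_{h} \; \mathbb{E}\!\left[ 2\,\delta\, h(S) - h(S)^2 \right].
\end{equation*}
```

In this view, the Huber and absolute variants differ from the squared case only in the constraint placed on h (and, for the absolute case, the missing quadratic term), which is what bounds the influence of any single large error.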

Findings

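01. Gradient-based algorithms for the robust losses are more stable and less sensitive to meta-parameters than those based on the mean squared Bellman error.

02. The proposed algorithms improve performance in both prediction and control tasks.

03. The solutions of the robust losses are better suited to outlier-prone environments than the solution of the mean squared Bellman error.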
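To make the point about outliers concrete, here is a minimal sketch that is deliberately much simpler than the paper's setting: instead of a Bellman error, it fits a single constant to a handful of targets, one of which is an outlier. The data, the threshold `tau`, and the helper `huber_minimizer` are all hypothetical and chosen only for illustration.

```python
import statistics


def huber_minimizer(targets, tau, iters=200):
    """Constant v minimizing the summed Huber loss, found by bisection.

    The derivative of the summed Huber loss with respect to v is the sum
    of residuals (v - y) clipped to [-tau, tau], which is nondecreasing
    in v, so bisection on its sign locates the minimizer.
    """
    def grad(v):
        return sum(max(-tau, min(tau, v - y)) for y in targets)

    lo, hi = min(targets), max(targets)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if grad(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical targets: four values near 1 and a single large outlier.
targets = [1.0, 1.1, 0.9, 1.0, 50.0]

squared_solution = statistics.mean(targets)      # minimizer of squared error
absolute_solution = statistics.median(targets)   # minimizer of absolute error
huber_solution = huber_minimizer(targets, tau=1.0)

print(squared_solution, absolute_solution, huber_solution)
```

On this toy data the squared-error solution is pulled to roughly 10.8 by the single outlier, while the absolute-error solution stays at 1.0 and the Huber solution lands near 1.25.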

Abstract

Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error. However, squared errors are known to be sensitive to outliers, both skewing the solution of the objective and resulting in high-magnitude and high-variance gradients. To control these high-magnitude updates, typical strategies in RL involve clipping gradients, clipping rewards, rescaling rewards, or clipping errors. While these strategies appear to be related to robust losses -- like the Huber loss -- they are built on semi-gradient update rules which do not minimize a known loss. In this work, we build on recent insights reformulating squared Bellman errors as a saddlepoint optimization problem and propose a saddlepoint reformulation for a Huber Bellman error and Absolute Bellman error. We start from a formalization of robust losses, then derive sound…
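The link between robust losses and error clipping mentioned in the abstract can be seen at the level of a single scalar error. The sketch below treats the loss purely as a function of an error `delta` and compares the derivatives of the squared, Huber, and absolute losses. It is not the paper's algorithm: it ignores how the error depends on the value function, which is exactly where semi-gradient clipping heuristics and true loss minimization part ways. The function names and the threshold `tau` are assumptions made for this example.

```python
def squared_grad(delta):
    # Derivative of 0.5 * delta**2: grows without bound with the error.
    return delta


def huber_loss(delta, tau=1.0):
    # Quadratic for |delta| <= tau, linear beyond it.
    a = abs(delta)
    return 0.5 * delta * delta if a <= tau else tau * (a - 0.5 * tau)


def huber_grad(delta, tau=1.0):
    # Derivative of the Huber loss: the error clipped to [-tau, tau].
    return max(-tau, min(tau, delta))


def absolute_grad(delta):
    # Subgradient of |delta|: the sign of the error (0 at delta == 0).
    return (delta > 0) - (delta < 0)


# Hypothetical errors, including one large outlier.
errors = [0.1, -0.5, 2.0, -100.0]
table = [(d, squared_grad(d), huber_grad(d), absolute_grad(d)) for d in errors]

for row in table:
    print(row)
```

For the outlier error of -100, the squared-loss derivative is -100 while the Huber and absolute derivatives are both bounded at magnitude 1, which is the sense in which these losses limit high-magnitude updates.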

Taxonomy

Topics: Reinforcement Learning in Robotics · Fuel Cells and Related Materials

Methods: Huber loss