Robust Losses for Learning Value Functions
Andrew Patterson, Victor Liao, Martha White

TL;DR
This paper introduces robust Bellman error losses, such as the Huber and Absolute Bellman errors, and reformulates them as saddlepoint problems. The result is reinforcement learning algorithms for value function estimation that are more stable and less sensitive to meta-parameters.
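To make the losses concrete, here is a minimal Python sketch, not the paper's code, of the squared, Huber, and absolute losses applied to a one-step TD error delta = r + gamma * v(s') - v(s), together with their derivatives with respect to delta. The threshold name `tau` and all helper names are assumptions for illustration. The sketch shows that the squared loss's derivative grows with the error, while the Huber and absolute derivatives stay bounded.

```python
def td_error(r: float, gamma: float, v_s: float, v_next: float) -> float:
    """One-step TD error: delta = r + gamma * v(s') - v(s)."""
    return r + gamma * v_next - v_s


def squared_loss(delta: float) -> float:
    return 0.5 * delta * delta


def huber_loss(delta: float, tau: float = 1.0) -> float:
    """Quadratic for |delta| <= tau, linear with slope tau beyond it."""
    a = abs(delta)
    return 0.5 * a * a if a <= tau else tau * (a - 0.5 * tau)


def absolute_loss(delta: float) -> float:
    return abs(delta)


# Derivatives with respect to delta.
def d_squared(delta: float) -> float:
    return delta  # unbounded: grows linearly with the error


def d_huber(delta: float, tau: float = 1.0) -> float:
    return max(-tau, min(tau, delta))  # the error clipped to [-tau, tau]


def d_absolute(delta: float) -> float:
    # Subgradient of |delta|; 0.0 is chosen at delta == 0.
    return float((delta > 0) - (delta < 0))


# A hypothetical outlier reward produces a large TD error.
delta = td_error(r=100.0, gamma=0.9, v_s=0.0, v_next=0.0)
print(d_squared(delta), d_huber(delta), d_absolute(delta))  # 100.0 1.0 1.0
```

The Huber derivative is exactly the clipped error, which is why error clipping looks related to the Huber loss.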
Contribution
The paper formalizes robust Bellman errors as saddlepoint problems and develops gradient-based algorithms that are more stable and robust than traditional methods based on the squared Bellman error.
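The saddlepoint idea rests on writing each loss as a maximum over an auxiliary variable h. Under the standard convex-conjugate identities assumed here (the paper's exact formulation may differ): 0.5 * delta^2 = max over all h of (h * delta - 0.5 * h^2), Huber_tau(delta) = max over |h| <= tau of the same expression, and |delta| = max over |h| <= 1 of h * delta. The sketch below, with hypothetical helper names, evaluates each maximum at its closed-form maximizer and cross-checks it with a grid search.

```python
import math


def huber_loss(delta: float, tau: float) -> float:
    a = abs(delta)
    return 0.5 * a * a if a <= tau else tau * (a - 0.5 * tau)


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def huber_saddle(delta: float, tau: float) -> float:
    """max over |h| <= tau of (h * delta - 0.5 * h**2).

    The objective is concave in h with unconstrained maximizer h = delta,
    so the constrained maximizer is delta clipped to [-tau, tau].
    """
    h = clip(delta, -tau, tau)
    return h * delta - 0.5 * h * h


def absolute_saddle(delta: float) -> float:
    """max over |h| <= 1 of h * delta, attained at h = sign(delta)."""
    h = 1.0 if delta >= 0 else -1.0
    return h * delta


def brute_force(delta: float, bound: float, quadratic: bool, steps: int = 20_000) -> float:
    """Grid search over h in [-bound, bound] as an independent check."""
    grid = (-bound + 2.0 * bound * i / steps for i in range(steps + 1))
    penalty = 0.5 if quadratic else 0.0
    return max(h * delta - penalty * h * h for h in grid)


for delta in (-5.0, -0.3, 0.0, 0.7, 4.0):
    assert abs(huber_saddle(delta, 1.0) - huber_loss(delta, 1.0)) < 1e-12
    assert abs(brute_force(delta, 1.0, quadratic=True) - huber_loss(delta, 1.0)) < 1e-6
    assert abs(absolute_saddle(delta) - abs(delta)) < 1e-12
    assert abs(brute_force(delta, 1.0, quadratic=False) - abs(delta)) < 1e-9
    # With tau = inf the constraint disappears and the squared loss is recovered.
    assert abs(huber_saddle(delta, math.inf) - 0.5 * delta * delta) < 1e-12
```

In this form the only difference between the squared and Huber objectives is the box constraint on h, which is what makes a projected (clipped) update on h natural.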
Findings
Algorithms based on robust losses are more stable and less sensitive to meta-parameters than those based on the mean squared Bellman error.
Proposed algorithms show improved performance in both prediction and control tasks.
Solutions of robust losses are better suited for outlier-prone environments.
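The outlier finding can be illustrated with a toy that is not taken from the paper: fit one constant value v to sampled targets for a single state, one of which is an outlier. Minimizing the squared loss gives the mean, which the outlier drags far from the bulk of the data. The absolute loss gives a median, and the Huber loss gives a value that stays close to the inliers. The targets, the `tau` value, and the bisection helper `huber_location` are assumptions for illustration.

```python
import statistics


def huber_location(targets, tau=1.0, iters=200):
    """Minimize sum_i Huber_tau(t_i - v) over a constant v by bisection.

    The optimality condition sum_i clip(t_i - v, -tau, tau) = 0 has a
    left-hand side that is non-increasing in v, so bisection on
    [min(targets), max(targets)] finds a root.
    """
    def score(v):
        return sum(max(-tau, min(tau, t - v)) for t in targets)

    lo, hi = min(targets), max(targets)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid  # clipped residuals still sum positive: the root is above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical sampled targets for one state; the last one is an outlier.
targets = [1.0, 1.2, 0.8, 1.1, 0.9, 50.0]

v_squared = statistics.mean(targets)     # minimizer of the squared loss
v_absolute = statistics.median(targets)  # a minimizer of the absolute loss
v_huber = huber_location(targets, tau=1.0)

print(f"{v_squared:.3f} {v_absolute:.3f} {v_huber:.3f}")  # 9.167 1.050 1.200
```

Here the single outlier moves the squared-loss solution to about 9.17, while the Huber solution (1.2) only feels the outlier through its clipped residual of tau.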
Abstract
Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error. However, squared errors are known to be sensitive to outliers, both skewing the solution of the objective and resulting in high-magnitude and high-variance gradients. To control these high-magnitude updates, typical strategies in RL involve clipping gradients, clipping rewards, rescaling rewards, or clipping errors. While these strategies appear to be related to robust losses -- like the Huber loss -- they are built on semi-gradient update rules which do not minimize a known loss. In this work, we build on recent insights reformulating squared Bellman errors as a saddlepoint optimization problem and propose a saddlepoint reformulation for a Huber Bellman error and Absolute Bellman error. We start from a formalization of robust losses, then derive sound…
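The abstract contrasts clipped-error semi-gradient updates with minimizing a known loss through a saddlepoint form. Below is a hypothetical tabular sketch of the second route, assuming the Huber Bellman error is the state-weighted average of Huber_tau applied to each state's expected TD error, and using the identity Huber_tau(x) = max over |h| <= tau of (h * x - 0.5 * h^2). An auxiliary table `h` estimates the expected TD error per state and is kept in [-tau, tau] by a projected (clipped) ascent step, while `v` takes a gradient descent step on h(s) * delta, which moves both v(s) and v(s'). This is gradient descent-ascent in the style of GTD2, written for illustration; the function name, step sizes, and the two-state chain are assumptions, not the paper's algorithm or experiments.

```python
def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def huber_saddle_td(transitions, n_states, gamma=0.9, alpha=0.05, beta=0.05,
                    tau=1.0, n_updates=40_000):
    """Tabular descent-ascent on min_v max_{|h| <= tau} E[h(S) * delta - 0.5 * h(S)**2].

    transitions: list of (s, r, s_next), with s_next=None for termination.
    Transitions are replayed in a fixed cyclic order for reproducibility.
    """
    v = [0.0] * n_states
    h = [0.0] * n_states
    for t in range(n_updates):
        s, r, s_next = transitions[t % len(transitions)]
        v_next = 0.0 if s_next is None else v[s_next]
        delta = r + gamma * v_next - v[s]
        h_s = h[s]
        # Descent on v: d(h_s * delta)/dv[s] = -h_s and d(h_s * delta)/dv[s_next] = gamma * h_s.
        v[s] += alpha * h_s
        if s_next is not None:
            v[s_next] -= alpha * gamma * h_s
        # Projected ascent on h: gradient delta - h_s, then clip to [-tau, tau].
        h[s] = clip(h_s + beta * (delta - h_s), -tau, tau)
    return v, h


# Two-state chain: 0 -> 1 with reward 0, then 1 -> terminal with reward 1.
chain = [(0, 0.0, 1), (1, 1.0, None)]
v, h = huber_saddle_td(chain, n_states=2)
print([round(x, 4) for x in v])  # approaches the true values [0.9, 1.0] for gamma = 0.9
```

For comparison, the clipped-error semi-gradient rule mentioned in the abstract would be v[s] += alpha * clip(delta, -tau, tau), which changes only v(s) and is not the gradient of any fixed objective. In the sketch, the v update uses h rather than a second sample of delta, so an unbiased update is available from single transitions even when they are stochastic.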
Taxonomy
Topics: Reinforcement Learning in Robotics · Fuel Cells and Related Materials
Methods: Huber loss
