Distributional Reinforcement Learning via the Cramér Distance
Vanya Aziz, Ivo Nowak, E.M.T. Hendrix

TL;DR
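This paper introduces C-DSAC, a distributional reinforcement learning algorithm built on Soft Actor-Critic that learns the state-action value distribution by minimizing the squared Cramér distance. It outperforms baseline SAC and other distributional methods on robotic benchmarks, which the authors attribute partly to confidence-driven Q-value updates.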
Contribution
It presents a novel distributional RL method using the Cramér distance within SAC, with empirical validation showing improved results over existing methods.
Findings
C-DSAC outperforms baseline SAC and other distributional methods in robotic benchmarks.
High-variance target distributions lead to more conservative updates, reducing overestimation.
The approach enhances understanding of distributional RL convergence and value estimation.
Abstract
This paper explores the application of the Soft Actor-Critic (SAC) algorithm within a Distributional Reinforcement Learning setting and introduces an implementation of this algorithm named Cramér-based Distributional Soft Actor-Critic (C-DSAC). The novel approach employs distributional reinforcement learning to represent state-action values, and minimizes the squared Cramér distance to learn the distribution. Empirical results across various robotic benchmarks indicate that our algorithm surpasses the performance of baseline SAC and contemporary distributional methods, with the performance advantage becoming increasingly pronounced in high-complexity environments. To explain the efficiency of the new approach, we conduct an analysis showing that its superior performance is partly due to confidence-driven Q-value updates: High-variance target distributions (low…
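For readers unfamiliar with the quantity being minimized, the squared Cramér distance between two distributions is the integral of the squared difference of their cumulative distribution functions. The following is a minimal sketch of that definition, not the authors' implementation: it assumes both distributions are categorical over a shared, sorted grid of atoms, a representation this summary does not confirm the paper uses. The function name `squared_cramer_distance` and its parameters are hypothetical.

```python
import numpy as np


def squared_cramer_distance(p, q, atoms):
    """Squared Cramér distance between two categorical distributions.

    Assumption (not from the paper): `p` and `q` are probability vectors
    over the same sorted support `atoms`.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    atoms = np.asarray(atoms, dtype=float)

    # The CDFs are step functions; their difference is constant on each
    # interval [atoms[i], atoms[i + 1]) and zero outside the support.
    cdf_diff = np.cumsum(p) - np.cumsum(q)
    widths = np.diff(atoms)

    # Integrate the squared CDF difference interval by interval.
    return float(np.sum(cdf_diff[:-1] ** 2 * widths))


# Two point masses at 0 and 2: the CDFs differ by 1 on [0, 2).
far_apart = squared_cramer_distance([1, 0, 0], [0, 0, 1], [0.0, 1.0, 2.0])
identical = squared_cramer_distance([0.2, 0.8], [0.2, 0.8], [0.0, 1.0])
```

In a learning setting, a quantity of this form would be computed between the predicted and target return distributions and used as the critic loss; how C-DSAC parameterizes those distributions is not stated here.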
