Intrinsic Benefits of Categorical Distributional Loss: Uncertainty-aware Regularized Exploration in Reinforcement Learning

Ke Sun; Yingnan Zhao; Enze Shi; Yafei Wang; Xiaodong Yan; Bei Jiang; Linglong Kong

arXiv:2110.03155 · cs.LG · December 25, 2025 · 1 citation

TL;DR

This paper reveals that the success of distributional reinforcement learning stems from an intrinsic entropy regularization that captures return distribution uncertainty, leading to improved exploration and policy performance.

Contribution

It uncovers the theoretical basis of distributional RL's benefits as an entropy regularization derived from categorical distributional loss, enhancing understanding of its exploration advantages.

Findings

01. Distributional RL's superiority is linked to a distribution-matching entropy regularization.

02. The derived entropy regularization implicitly guides exploration by modeling environmental uncertainty.

03. Experiments demonstrate the effectiveness of uncertainty-aware regularization in improving RL performance.

Abstract

The remarkable empirical performance of distributional reinforcement learning (RL) has drawn increasing attention to its theoretical advantages over classical RL. By decomposing the categorical distributional loss commonly employed in distributional RL, we find that the potential superiority of distributional RL can be attributed to a derived distribution-matching entropy regularization. This less-studied entropy regularization captures knowledge of the return distribution beyond its expectation alone, contributing an augmented reward signal in policy optimization. In contrast to the vanilla entropy regularization in MaxEnt RL, which explicitly encourages exploration by promoting diverse actions, the novel entropy regularization derived from categorical distributional loss implicitly updates policies to align the learned policy with (estimated)…
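To make the idea of "decomposing the categorical distributional loss" concrete, here is a minimal sketch under stated assumptions. It is not taken from the paper: the mixture form of the target, the helper names `cross_entropy` and `mixture_target`, and the toy numbers are all hypothetical. It only illustrates a general fact: cross-entropy is linear in its first argument, so if a categorical target places weight `1 - eps` on the atom holding the expected return and weight `eps` on a residual distribution `mu`, the loss splits into an expectation-matching term plus a distribution-matching cross-entropy term.

```python
import math


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log(q_i) over a shared set of atoms."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def mixture_target(mean_bin, mu, eps):
    """Hypothetical target: (1 - eps) on the expected-return atom, eps spread by mu."""
    return [
        (1 - eps) * (1.0 if i == mean_bin else 0.0) + eps * m
        for i, m in enumerate(mu)
    ]


# Toy numbers (assumptions for illustration only).
q = [0.1, 0.2, 0.4, 0.2, 0.1]    # predicted categorical return distribution
mu = [0.3, 0.2, 0.0, 0.2, 0.3]   # residual distribution away from the mean atom
eps = 0.25                        # weight of the residual part
mean_bin = 2                      # atom assumed to contain the expected return

p = mixture_target(mean_bin, mu, eps)

full_loss = cross_entropy(p, q)                  # categorical (cross-entropy) loss
mean_term = -(1 - eps) * math.log(q[mean_bin])   # fits the expectation only
reg_term = eps * cross_entropy(mu, q)            # matches the rest of the distribution

print(full_loss, mean_term + reg_term)
```

In this sketch, `mean_term` plays the role of an expectation-only objective, while `reg_term` acts as the additional distribution-matching regularizer weighted by `eps`. How the paper defines and weights these terms may differ from this toy construction.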

Videos

Intrinsic Benefits of Categorical Distributional Loss: Uncertainty-aware Regularized Exploration in Reinforcement Learning · SlidesLive

Taxonomy

Topics: Evolutionary Algorithms and Applications · Innovation Diffusion and Forecasting

Methods: Entropy Regularization