Choquet regularization for reinforcement learning

Xia Han; Ruodu Wang; Xun Yu Zhou

arXiv:2208.08497 · stat.ML · August 19, 2022

TL;DR

This paper introduces Choquet regularizers to measure and manage exploration in reinforcement learning. It reformulates the continuous-time entropy-regularized RL problem with a Choquet regularizer in place of differential entropy, solves the linear-quadratic case explicitly, and identifies the regularizers that generate common exploratory samplers.

Contribution

The paper proposes Choquet regularizers as a framework for controlling exploration in RL, gives explicit solutions in the linear-quadratic case, and connects specific regularizers to standard exploration methods.

Findings

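01 Explicit optimal distributions are derived for several specific Choquet regularizers in the linear-quadratic setting.

02 Conversely, Choquet regularizers are identified that generate common exploratory samplers such as ε-greedy, exponential, uniform and Gaussian.

03 The continuous-time entropy-regularized RL problem is reformulated with a Choquet regularizer in place of differential entropy.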

Abstract

We propose \emph{Choquet regularizers} to measure and manage the level of exploration for reinforcement learning (RL), and reformulate the continuous-time entropy-regularized RL problem of Wang et al. (2020, JMLR, 21(198)) in which we replace the differential entropy used for regularization with a Choquet regularizer. We derive the Hamilton--Jacobi--Bellman equation of the problem, and solve it explicitly in the linear--quadratic (LQ) case via maximizing statically a mean--variance constrained Choquet regularizer. Under the LQ setting, we derive explicit optimal distributions for several specific Choquet regularizers, and conversely identify the Choquet regularizers that generate a number of broadly used exploratory samplers such as $\epsilon$-greedy, exponential, uniform and Gaussian.
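
As a rough illustration of what such a regularizer measures, the sketch below assumes the signed Choquet integral form $\Phi_h(\Pi) = \int h(\Pi([x, \infty)))\,dx$ with a concave distortion function $h$ satisfying $h(0) = h(1) = 0$. The abstract does not spell out this form, so treat it as an assumption. The function names `choquet_regularizer` and `h_quadratic`, and the choice $h(p) = p - p^2$, are hypothetical and used only for illustration. The code evaluates the integral exactly for the empirical distribution of a sample; it is not the paper's algorithm.

```python
import numpy as np


def choquet_regularizer(samples, h):
    """Evaluate the (assumed) Choquet integral of h over the empirical
    distribution of `samples`: the integral of h(P(X >= x)) dx.

    Between consecutive order statistics x_(i) and x_(i+1) the survival
    probability is constant at (n - i) / n, so the integral reduces to a
    finite sum over the gaps between sorted samples.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    gaps = np.diff(x)                    # x_(i+1) - x_(i) for i = 1..n-1
    surv = (n - np.arange(1, n)) / n     # survival probability on gap i
    return float(np.sum(h(surv) * gaps))


def h_quadratic(p):
    """Illustrative concave distortion with h(0) = h(1) = 0 (an assumption)."""
    return p - p**2


# A spread-out policy scores higher than a deterministic one.
spread = np.linspace(0.0, 1.0, 10001)    # evenly spaced stand-in for Uniform[0, 1]
point_mass = np.full(5, 2.0)             # all probability on a single action

phi_spread = choquet_regularizer(spread, h_quadratic)
phi_point = choquet_regularizer(point_mass, h_quadratic)
```

Under these assumptions the evenly spaced grid on [0, 1] gives a value close to 1/6 (the integral of $p(1-p)$ over [0, 1]), while the point mass gives exactly 0. Shifting the samples leaves the value unchanged and rescaling them scales it proportionally, which is why a quantity of this kind can serve as a measure of how much a policy randomizes.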

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research