Choquet regularization for reinforcement learning
Xia Han, Ruodu Wang, Xun Yu Zhou

TL;DR
This paper introduces Choquet regularizers to measure and control exploration in reinforcement learning. It reformulates the continuous-time entropy-regularized RL problem with a Choquet regularizer in place of differential entropy, derives explicit solutions in the linear-quadratic case, and links specific regularizers to common exploration strategies.
Contribution
It proposes a novel Choquet regularizer framework for RL exploration, providing explicit solutions and connecting regularizers to standard exploration methods.
Findings
Explicit optimal distributions for specific Choquet regularizers.
Choquet regularizers can generate common exploration strategies.
Reformulation of entropy-regularized RL with Choquet regularizers.
Abstract
We propose Choquet regularizers to measure and manage the level of exploration for reinforcement learning (RL), and reformulate the continuous-time entropy-regularized RL problem of Wang et al. (2020, JMLR, 21(198)) in which we replace the differential entropy used for regularization with a Choquet regularizer. We derive the Hamilton-Jacobi-Bellman equation of the problem, and solve it explicitly in the linear-quadratic (LQ) case via maximizing statically a mean-variance constrained Choquet regularizer. Under the LQ setting, we derive explicit optimal distributions for several specific Choquet regularizers, and conversely identify the Choquet regularizers that generate a number of broadly used exploratory samplers such as ε-greedy, exponential, uniform and Gaussian.
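To make the idea of "measuring the level of exploration" concrete, here is a minimal numerical sketch. It assumes the definition of a Choquet regularizer as recalled from the paper, which this page does not state: Phi_h(Pi) is the integral over x of h(Pi([x, inf))), where h is concave on [0, 1] with h(0) = h(1) = 0. The function names and the choice h(p) = p(1 - p) are illustrative assumptions, not the authors' code.

```python
import numpy as np


def choquet_regularizer(samples, h):
    """Evaluate Phi_h for the empirical distribution of `samples`.

    Assumed definition (not stated on this page):
        Phi_h(Pi) = integral over x of h(Pi([x, inf))) dx,
    with h concave and h(0) = h(1) = 0.
    """
    xs = np.sort(np.asarray(samples, dtype=float))
    n = xs.size
    # On the interval (x_(i), x_(i+1)] the empirical survival
    # probability P(X >= x) is constant and equals (n - i) / n.
    survival = (n - np.arange(1, n)) / n
    gaps = np.diff(xs)
    # Outside [x_(1), x_(n)] the survival probability is 1 or 0,
    # where h vanishes, so only the gaps between samples contribute.
    return float(np.sum(h(survival) * gaps))


def h_quadratic(p):
    """Illustrative distortion function: concave, zero at 0 and 1."""
    return p * (1.0 - p)


# Deterministic, evenly spaced points standing in for Uniform[0, 1].
n = 1000
uniform_points = (np.arange(n) + 0.5) / n

phi_narrow = choquet_regularizer(uniform_points, h_quadratic)
phi_wide = choquet_regularizer(2.0 * uniform_points, h_quadratic)
phi_point = choquet_regularizer(np.zeros(10), h_quadratic)

print(phi_narrow, phi_wide, phi_point)
```

Under these assumptions the value for the uniform points is close to 1/6, doubling the spread of the action distribution doubles the regularizer, and a point mass (no exploration) scores zero. This is the sense in which the regularizer rewards more dispersed action distributions.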
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research
