Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning

Rémy Hosseinkhan-Boucher (1, 2), Onofrio Semeraro (1, 2), Lionel Mathelin (1, 2) ((1) Université Paris-Saclay, (2) CNRS)

arXiv:2501.17115 · cs.LG · February 25, 2026

Open Access

TL;DR

This paper investigates how maximum-entropy reinforcement learning policies exhibit robustness and generalisation properties in noisy, chaotic systems, linking these properties to complexity measures from statistical learning theory.

Contribution

It provides new insights into the robustness of entropy-regularised policies and connects these properties to complexity measures, advancing understanding of their generalisation capabilities.

Findings

01 Entropy-regularised policies are robust to observation noise.

02 Complexity measures predict robustness levels.

03 Relationship established between entropy regularisation and noise robustness.

Abstract

The generalisation and robustness properties of policies learnt through Maximum-Entropy Reinforcement Learning are investigated on chaotic dynamical systems with Gaussian noise on the observable. First, entropy-regularised policies are observed to be robust when the agent's observations are contaminated by noise. Second, notions from statistical learning theory, such as complexity measures on the learnt model, are borrowed to explain and predict the phenomenon. Results show the existence of a relationship between entropy-regularised policy optimisation and robustness to noise, which can be described by the chosen complexity measures.
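To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the textbook form of the maximum-entropy objective (a discounted sum of reward plus a temperature `alpha` times policy entropy) and additive zero-mean Gaussian noise on the observation. The function names, the diagonal Gaussian policy, and the parameters `sigma`, `alpha`, and `gamma` are all hypothetical choices made for illustration; the page does not specify the paper's algorithm or noise parametrisation.

```python
import math
import random


def noisy_observation(obs, sigma, rng):
    """Add independent zero-mean Gaussian noise (std sigma) to each component.

    Assumption: additive noise on the observable; the paper's exact noise
    model is not given on this page.
    """
    return [x + rng.gauss(0.0, sigma) for x in obs]


def gaussian_entropy(stds):
    """Differential entropy of a diagonal Gaussian with the given std devs."""
    return sum(0.5 * math.log(2.0 * math.pi * math.e * s * s) for s in stds)


def entropy_regularised_return(rewards, entropies, alpha, gamma=0.99):
    """Discounted sum of r_t + alpha * H_t along one trajectory.

    Assumption: this is the standard maximum-entropy RL objective; alpha = 0
    recovers the ordinary discounted return.
    """
    return sum(
        (gamma ** t) * (r + alpha * h)
        for t, (r, h) in enumerate(zip(rewards, entropies))
    )


if __name__ == "__main__":
    rng = random.Random(0)
    obs = [1.0, -0.5, 0.25]
    print(noisy_observation(obs, sigma=0.1, rng=rng))
    h = gaussian_entropy([0.5, 0.5])  # entropy of a hypothetical 2-D policy
    print(entropy_regularised_return([1.0, 1.0, 1.0], [h, h, h], alpha=0.2))
```

In this sketch, robustness to observation noise would be probed by feeding `noisy_observation(...)` outputs to a trained policy for increasing `sigma` and comparing the returns obtained with different values of `alpha`.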

Taxonomy

Topics: Adaptive Dynamic Programming Control · Neural Networks and Applications · Machine Learning and ELM