Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Rémy Hosseinkhan-Boucher (1, 2), Onofrio Semeraro (1, 2), Lionel Mathelin (1, 2) ((1) Université Paris-Saclay, (2) CNRS)

TL;DR
This paper investigates how maximum-entropy reinforcement learning policies exhibit robustness and generalisation properties in noisy, chaotic systems, linking these properties to complexity measures from statistical learning theory.
Contribution
It provides new insights into the robustness of entropy-regularised policies and connects these properties to complexity measures, advancing understanding of their generalisation capabilities.
Findings
Entropy-regularised policies are robust to observation noise.
Complexity measures predict robustness levels.
Relationship established between entropy regularisation and noise robustness.
Abstract
The generalisation and robustness properties of policies learnt through Maximum-Entropy Reinforcement Learning are investigated on chaotic dynamical systems with Gaussian noise on the observable. First, entropy-regularised policies are observed to be robust to noise contamination of the agent's observations. Second, notions from statistical learning theory, such as complexity measures on the learnt model, are borrowed to explain and predict the phenomenon. Results show the existence of a relationship between entropy-regularised policy optimisation and robustness to noise, which can be described by the chosen complexity measures.
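The two ingredients named in the abstract, an entropy-regularised objective and Gaussian noise on the observation, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the one-dimensional Gaussian policy, the temperature `alpha` and the discount `gamma` are all assumptions made for illustration, and the paper's actual environments and complexity measures are not reproduced here.

```python
import math
import random


def gaussian_entropy(std):
    """Differential entropy of a 1-D Gaussian with standard deviation `std`."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std ** 2)


def noisy_observation(state, sigma, rng):
    """Contaminate each state component with zero-mean Gaussian noise (hypothetical helper)."""
    return [x + rng.gauss(0.0, sigma) for x in state]


def entropy_regularised_return(rewards, policy_stds, alpha, gamma=0.99):
    """Discounted sum of r_t + alpha * H(pi(.|s_t)) along one trajectory.

    `policy_stds[t]` is the standard deviation of the (assumed Gaussian)
    policy at step t; `alpha` weights the entropy bonus.
    """
    total = 0.0
    for t, (reward, std) in enumerate(zip(rewards, policy_stds)):
        total += gamma ** t * (reward + alpha * gaussian_entropy(std))
    return total


# Usage: with alpha = 0 the objective reduces to the ordinary discounted return.
rng = random.Random(0)
obs = noisy_observation([1.0, -2.0, 0.5], sigma=0.1, rng=rng)
plain = entropy_regularised_return([1.0, 1.0], [1.0, 1.0], alpha=0.0, gamma=0.5)
regularised = entropy_regularised_return([1.0, 1.0], [1.0, 1.0], alpha=0.2, gamma=0.5)
```

Setting `sigma` to zero recovers the noise-free observation, so the same evaluation loop can compare a policy's return with and without observation noise.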
Taxonomy
Topics: Adaptive Dynamic Programming Control · Neural Networks and Applications · Machine Learning and ELM
