Model-Free Risk-Sensitive Reinforcement Learning
Grégoire Delétang, Jordi Grau-Moya, Markus Kunesch, Tim Genewein, Rob Brekelmans, Shane Legg, Pedro A. Ortega

TL;DR
This paper introduces a risk-sensitive reinforcement learning algorithm that extends TD learning to estimate the free energy, so that model-free value estimates account for the variance of outcomes as well as their mean.
Contribution
It develops a novel stochastic approximation rule for estimating Gaussian free energy, integrating risk sensitivity into model-free reinforcement learning.
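For reference, the free energy mentioned here is commonly defined as a scaled log-moment-generating function. The sketch below states that definition and its closed form for a Gaussian. The symbols \beta (risk parameter), \mu and \sigma^2 (mean and variance), and the sign convention in which \beta > 0 rewards variance and \beta < 0 penalizes it, are assumptions introduced for illustration and are not given in this summary.

```latex
F_\beta(X) = \frac{1}{\beta} \log \mathbb{E}\left[ e^{\beta X} \right],
\qquad
X \sim \mathcal{N}(\mu, \sigma^2)
\;\Longrightarrow\;
F_\beta(X) = \mu + \frac{\beta}{2}\,\sigma^2 .
```

As \beta approaches 0 the free energy reduces to the mean \mu, which is the risk-neutral case targeted by ordinary TD learning.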
Findings
Provides a new risk-sensitive RL algorithm based on TD learning.
Estimates the free energy from i.i.d. samples of a Gaussian distribution whose mean and variance are unknown.
Applicable to risk-sensitive decision-making scenarios.
Abstract
We extend temporal-difference (TD) learning in order to obtain risk-sensitive, model-free reinforcement learning algorithms. This extension can be regarded as a modification of the Rescorla-Wagner rule, where the (sigmoidal) stimulus is taken to be either the event of over- or underestimating the TD target. As a result, one obtains a stochastic approximation rule for estimating the free energy from i.i.d. samples generated by a Gaussian distribution with unknown mean and variance. Since the Gaussian free energy is known to be a certainty-equivalent sensitive to the mean and the variance, the learning rule has applications in risk-sensitive decision-making.
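The abstract does not spell out the update itself, so the following is a minimal sketch under an assumed form: the usual Rescorla-Wagner step `v += lr * delta` is rescaled by `2 * sigmoid(beta * delta)`, where `delta = x - v` is the error against the sample. With `beta = 0` the factor equals 1 and the plain rule is recovered; a positive `beta` weights underestimates more heavily and a negative `beta` weights overestimates more heavily. The function names, the `1 / (t + 1)` step-size schedule, and the test distribution are all hypothetical choices, not taken from the paper.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def risk_sensitive_estimate(samples, beta: float, v0: float = 0.0) -> float:
    """Run the ASSUMED update v <- v + 2*lr*sigmoid(beta*delta)*delta.

    delta = x - v is the prediction error; beta = 0 gives the plain
    Rescorla-Wagner / running-mean rule. The decaying step size
    lr = 1 / (t + 1) is an illustrative choice.
    """
    v = v0
    for t, x in enumerate(samples, start=1):
        lr = 1.0 / (t + 1)
        delta = x - v
        v += 2.0 * lr * sigmoid(beta * delta) * delta
    return v


def gaussian_free_energy(mu: float, sigma: float, beta: float) -> float:
    """Closed-form free energy of N(mu, sigma^2): mu + beta/2 * sigma^2."""
    return mu + 0.5 * beta * sigma ** 2


# i.i.d. Gaussian samples; the learner never sees mu or sigma directly.
rng = random.Random(0)
mu, sigma = 1.0, 2.0
samples = [rng.gauss(mu, sigma) for _ in range(100_000)]

v_neutral = risk_sensitive_estimate(samples, beta=0.0)
v_seeking = risk_sensitive_estimate(samples, beta=0.2)
v_averse = risk_sensitive_estimate(samples, beta=-0.2)

for name, beta, v in [("neutral", 0.0, v_neutral),
                      ("seeking", 0.2, v_seeking),
                      ("averse", -0.2, v_averse)]:
    target = gaussian_free_energy(mu, sigma, beta)
    print(f"{name:8s} beta={beta:+.1f} estimate={v:.3f} free energy={target:.3f}")
```

Under this assumed rule, a small-`beta` expansion of the fixed-point condition gives `mu + beta / 2 * sigma**2`, so the printed estimates should sit close to the closed-form values when `beta * sigma` is small. For larger `beta` the match is not guaranteed by this sketch.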
Taxonomy
Topics: Gene Regulatory Network Analysis · Advanced Multi-Objective Optimization Algorithms · Evolutionary Algorithms and Applications
