Risk-Averse Reinforcement Learning with Itakura-Saito Loss
Igor Udovichenko, Olivier Croissant, Anita Toleutaeva, Evgeny Burnaev, Alexander Korotin

TL;DR
This paper introduces a risk-averse reinforcement learning approach that uses the Itakura-Saito divergence as the loss function for learning value functions under exponential utility. The authors report that it outperforms established alternatives in several experimental scenarios.
Contribution
It presents a numerically stable, mathematically sound Itakura-Saito divergence-based loss function for risk-averse RL with exponential utility, along with theoretical and empirical validation.
Findings
Itakura-Saito loss outperforms alternatives in multiple scenarios.
The loss is numerically stable and compatible with existing RL algorithms.
The approach effectively incorporates risk aversion via exponential utility.
Abstract
Risk-averse reinforcement learning finds application in various high-stakes fields. Unlike classical reinforcement learning, which aims to maximize expected returns, risk-averse agents choose policies that minimize risk, occasionally sacrificing expected value. These preferences can be framed through utility theory. We focus on the specific case of the exponential utility function, where one can derive the Bellman equations and employ various reinforcement learning algorithms with few modifications. To address this, we introduce to the broad machine learning community a numerically stable and mathematically sound loss function based on the Itakura-Saito divergence for learning state-value and action-value functions. We evaluate the Itakura-Saito loss function against established alternatives, both theoretically and empirically. In the experimental section, we explore multiple scenarios,…
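To make the idea in the abstract concrete, here is a minimal sketch of an Itakura-Saito-style loss for a single value estimate under exponential utility. It is an illustration under stated assumptions, not necessarily the authors' exact formulation: the risk parameter `alpha`, its sign convention (negative means risk-averse), and the function names are hypothetical. The sketch uses the standard definition D_IS(p, q) = p/q - log(p/q) - 1 with p = exp(alpha * r) and q = exp(alpha * v), which reduces to exp(alpha * (r - v)) - alpha * (r - v) - 1, so only the difference r - v is exponentiated.

```python
import numpy as np

def itakura_saito_loss(v, r, alpha):
    """Mean Itakura-Saito divergence between exp(alpha * r) and exp(alpha * v).

    v: value estimate (scalar or array), r: sampled returns, alpha: assumed
    risk parameter (hypothetical name; negative = risk-averse here).
    """
    d = alpha * (np.asarray(r, dtype=float) - np.asarray(v, dtype=float))
    # exp(d) - d - 1, written with expm1 for accuracy when d is near zero.
    return float(np.mean(np.expm1(d) - d))

def certainty_equivalent(r, alpha):
    """(1/alpha) * log E[exp(alpha * r)], using a max-shift to avoid overflow."""
    z = alpha * np.asarray(r, dtype=float)
    m = z.max()
    return float((m + np.log(np.mean(np.exp(z - m)))) / alpha)

# Synthetic returns, for illustration only.
rng = np.random.default_rng(0)
returns = rng.normal(loc=1.0, scale=0.5, size=10_000)
alpha = -2.0  # assumed sign convention: negative alpha penalizes variability

v_star = certainty_equivalent(returns, alpha)
print("certainty equivalent:", v_star)
print("loss at v_star:", itakura_saito_loss(v_star, returns, alpha))
print("loss at sample mean:", itakura_saito_loss(returns.mean(), returns, alpha))
```

Setting the derivative of the loss with respect to v to zero gives exp(alpha * v) = E[exp(alpha * r)], so in this sketch the minimizer is the exponential-utility certainty equivalent rather than the plain expected return. The loss is convex in v, and for negative `alpha` the minimizer lies below the sample mean.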
Taxonomy
Topics: Reinforcement Learning in Robotics · Risk and Portfolio Optimization · Explainable Artificial Intelligence (XAI)
