Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs

Gugan Thoppe, L. A. Prashanth, Ankur Naskar, and Sanjay Bhat

arXiv:2605.08053 · cs.LG · May 11, 2026

TL;DR

This paper develops and analyzes new reinforcement learning algorithms for exponential-utility optimization in discounted MDPs, establishing their convergence and the optimality of the induced greedy policy.

Contribution

It introduces two Q-value-style Bellman operators for exponential utility in discounted MDPs and two model-free algorithms built on them, with convergence guarantees, addressing the lack of principled value-based RL methods for this objective.

Findings

1. Proved that the two Q-value-style operators are contractions in the $L_{\infty}$ and sup-log/Thompson metrics, respectively.
2. Established almost-sure convergence of the two-timescale Q-learning-style algorithm.
3. Provided finite-time convergence rates via timescale separation and analyzed the challenges posed by the sublinear operator.
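The sup-log (Thompson) metric named in the abstract has a simple form on strictly positive vectors with the componentwise order: d(x, y) = max_i |log x_i - log y_i|. The sketch below illustrates that standard definition and is not code from the paper. It computes the distance and checks one property that makes this metric a natural fit for multiplicative, exponential-utility recursions: raising both vectors componentwise to a power γ in (0, 1) shrinks their distance by exactly the factor γ. The helper names `thompson_distance` and `power_map` are hypothetical.

```python
import math
import random

def thompson_distance(x, y):
    """Thompson (sup-log) distance between two strictly positive vectors.

    With the componentwise order on the positive orthant, this equals
    max_i |log x_i - log y_i|.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v <= 0 for v in x) or any(v <= 0 for v in y):
        raise ValueError("the Thompson metric needs strictly positive entries")
    return max(abs(math.log(a) - math.log(b)) for a, b in zip(x, y))

def power_map(x, gamma):
    """Hypothetical helper: componentwise x_i ** gamma.

    Since log(x_i ** gamma) = gamma * log(x_i), this scales the Thompson
    distance between any two positive vectors by exactly gamma.
    """
    return [v ** gamma for v in x]

if __name__ == "__main__":
    random.seed(0)
    gamma = 0.9  # assumed discount factor for the illustration
    x = [random.uniform(0.1, 5.0) for _ in range(5)]
    y = [random.uniform(0.1, 5.0) for _ in range(5)]
    d = thompson_distance(x, y)
    d_pow = thompson_distance(power_map(x, gamma), power_map(y, gamma))
    # The ratio should match gamma up to floating-point rounding.
    print(f"d(x, y) = {d:.4f}, d(x^g, y^g) = {d_pow:.4f}, ratio = {d_pow / d:.4f}")
```

The metric is also invariant to multiplying both vectors by the same positive constant, which is why it suits operators that act multiplicatively rather than additively.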

Abstract

Reinforcement learning (RL) for exponential-utility optimization in discounted Markov decision processes (MDPs) lacks principled value-based algorithms. We address this gap in the fixed risk-aversion setting. Building on the Bellman-type equation for exponential utility studied in Porteus (1975), we derive two Q-value-style extensions and show that the associated operators are contractions in the $L_{\infty}$ and sup-log/Thompson metrics, respectively. We characterize their fixed points and prove that the induced greedy stationary policy is optimal for the exponential-utility objective among stationary policies. These structural results lead to two model-free algorithms: a two-timescale Q-learning-style algorithm, for which we establish almost-sure convergence and provide finite-time convergence rates via timescale separation, and a one-timescale algorithm governed by a…
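To make the contraction and greedy-policy statements concrete, here is a toy sketch on a random tabular MDP. It assumes a hypothetical cost-based operator (TQ)(s, a) = exp(β c(s, a)) Σ_{s'} P(s' | s, a) (min_{a'} Q(s', a'))^γ, chosen only because it is monotone and positively homogeneous of degree γ. This is not the operator derived in the paper, whose exact form is not given here. The names `random_mdp`, `apply_operator`, and `greedy_policy`, and the values β = 0.5 and γ = 0.9, are illustrative assumptions.

```python
import math
import random

def thompson_distance(q1, q2):
    """Sup-log (Thompson) distance between two strictly positive Q-tables."""
    return max(
        abs(math.log(a) - math.log(b))
        for row1, row2 in zip(q1, q2)
        for a, b in zip(row1, row2)
    )

def random_mdp(n_states, n_actions, seed=0):
    """Hypothetical random MDP: P[s][a] is a next-state distribution, c[s][a] a cost in [0, 1)."""
    rng = random.Random(seed)
    P, c = [], []
    for _ in range(n_states):
        P_s, c_s = [], []
        for _ in range(n_actions):
            w = [rng.random() + 1e-3 for _ in range(n_states)]
            total = sum(w)
            P_s.append([v / total for v in w])
            c_s.append(rng.random())
        P.append(P_s)
        c.append(c_s)
    return P, c

def apply_operator(Q, P, c, beta, gamma):
    """Hypothetical multiplicative Bellman-type operator (NOT the paper's):
    (TQ)(s, a) = exp(beta * c(s, a)) * sum_{s'} P(s'|s, a) * (min_{a'} Q(s', a')) ** gamma.
    Monotone and positively homogeneous of degree gamma in Q.
    """
    v_next = [min(row) ** gamma for row in Q]  # min over actions: costs are minimized
    return [
        [
            math.exp(beta * c[s][a]) * sum(p * v for p, v in zip(P[s][a], v_next))
            for a in range(len(Q[s]))
        ]
        for s in range(len(Q))
    ]

def greedy_policy(Q):
    """Greedy stationary policy: the action with the smallest Q-value in each state."""
    return [min(range(len(row)), key=row.__getitem__) for row in Q]

if __name__ == "__main__":
    beta, gamma = 0.5, 0.9  # assumed risk-aversion and discount parameters
    P, c = random_mdp(n_states=4, n_actions=2, seed=1)
    rng = random.Random(2)
    Q_a = [[rng.uniform(0.5, 2.0) for _ in range(2)] for _ in range(4)]
    Q_b = [[rng.uniform(0.5, 2.0) for _ in range(2)] for _ in range(4)]

    # Empirical contraction check: d(TQ_a, TQ_b) <= gamma * d(Q_a, Q_b).
    d_before = thompson_distance(Q_a, Q_b)
    d_after = thompson_distance(apply_operator(Q_a, P, c, beta, gamma),
                                apply_operator(Q_b, P, c, beta, gamma))
    print(f"contraction ratio: {d_after / d_before:.3f} (bound: {gamma})")

    # Fixed-point iteration, then greedy policy extraction.
    Q = Q_a
    for _ in range(500):
        Q = apply_operator(Q, P, c, beta, gamma)
    print("greedy policy:", greedy_policy(Q))
```

The contraction follows from a standard argument: if e^{-d} Q' ≤ Q ≤ e^{d} Q' componentwise, monotonicity and degree-γ homogeneity give e^{-γd} TQ' ≤ TQ ≤ e^{γd} TQ'. The Thompson distance therefore shrinks by at least the factor γ on every application, so repeated iteration converges to a unique positive fixed point.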
