Efficient Algorithms for Mitigating Uncertainty and Risk in Reinforcement Learning

Xihong Su

arXiv:2510.17690·cs.LG·October 21, 2025

TL;DR

This paper introduces new algorithms and theoretical insights for risk-averse reinforcement learning, including policy optimization, convergence conditions, and model-free Q-learning methods for uncertain environments.

Contribution

It presents a novel connection between policy gradient and dynamic programming, establishes conditions for contraction of ERM Bellman operators, and develops risk-averse Q-learning algorithms.

Findings

01. CADP guarantees monotone policy improvements.

02. Stationary deterministic optimal policies are proven to exist for ERM-TRC and EVaR-TRC.

03. Q-learning algorithms converge to risk-averse optimal policies.

Abstract

This dissertation makes three main contributions. First, we identify a new connection between policy gradient and dynamic programming in MMDPs and propose the Coordinate Ascent Dynamic Programming (CADP) algorithm to compute a Markov policy that maximizes the discounted return averaged over the uncertain models. CADP adjusts model weights iteratively to guarantee monotone policy improvements to a local maximum. Second, we establish necessary and sufficient conditions for the exponential ERM Bellman operator to be a contraction and prove the existence of stationary deterministic optimal policies for ERM-TRC and EVaR-TRC. We also propose exponential value iteration, policy iteration, and linear programming algorithms for computing optimal stationary policies for ERM-TRC and EVaR-TRC. Third, we propose model-free Q-learning algorithms for computing policies with risk-averse objectives:…
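To give a feel for the risk-averse objectives the abstract names, here is a minimal sketch of the entropic risk measure of a discrete return distribution. It assumes that ERM stands for the entropic risk measure in its standard reward-maximizing form, ERM_beta[X] = -(1/beta) log E[exp(-beta X)]; the excerpt above does not spell out this definition. The function name `entropic_risk`, its parameters, and the toy returns are hypothetical, and the code illustrates the risk measure only, not any of the dissertation's algorithms.

```python
import math


def entropic_risk(returns, probs, beta):
    """Entropic risk of a discrete return X (assumed standard definition).

    ERM_beta[X] = -(1/beta) * log E[exp(-beta * X)], with beta > 0.
    Larger beta means more risk aversion; as beta -> 0 the value
    approaches the plain expectation E[X].
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    # Work in log space (log-sum-exp) so large |beta * x| does not overflow.
    exponents = [math.log(p) - beta * x for x, p in zip(returns, probs) if p > 0]
    m = max(exponents)
    log_mean = m + math.log(sum(math.exp(e - m) for e in exponents))
    return -log_mean / beta


# Hypothetical toy example: a 50/50 gamble between returns 0 and 10.
returns = [0.0, 10.0]
probs = [0.5, 0.5]
mean_return = sum(p * x for x, p in zip(returns, probs))
risk_averse_value = entropic_risk(returns, probs, beta=1.0)
```

Under this definition the risk-averse value of the gamble lies below its mean return, and a deterministic return is valued at exactly its own level, which is the behavior a risk-averse objective is meant to capture.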

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Game Theory and Applications