R2L: Reliable Reinforcement Learning: Guaranteed Return & Reliable Policies in Reinforcement Learning

Nadir Farhi

arXiv:2510.18074 · cs.LG · October 22, 2025

TL;DR

This paper introduces a reliable reinforcement learning framework whose objective is to maximize the probability that the cumulative return exceeds a prescribed threshold. It reformulates this problem as a standard RL problem via state augmentation and demonstrates the approach on routing and safety-critical tasks.

Contribution

The paper proposes a novel formulation of reliable RL, in which the policy maximizes the probability that the return exceeds a threshold, and shows how existing RL algorithms can be applied to this setting through state augmentation.

Findings

1. Reliable policies can be derived using standard RL algorithms with state augmentation.

2. The approach effectively balances efficiency and reliability in stochastic environments.

3. Numerical experiments validate the practical benefits of the proposed method.
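
To make the state-augmentation idea concrete, here is a minimal sketch on a toy problem that is not taken from the paper. The action set, rewards, horizon, and the function name `success_probability` are all hypothetical. It assumes a finite horizon with no discounting, tracks the accumulated reward as an extra state component, and pays a reward of 1 at the end only if the total reaches the threshold. Exact backward recursion stands in for a learning algorithm so that the result is deterministic.

```python
from functools import lru_cache

# Hypothetical toy problem (an assumption, not from the paper): at each of
# HORIZON steps the agent picks one action, and each action yields a random
# reward drawn from a list of (probability, reward) outcomes.
ACTIONS = {
    "safe": [(1.0, 1)],              # always pays 1
    "risky": [(0.5, 3), (0.5, 0)],   # pays 3 or 0 with equal probability
}
HORIZON = 2


def success_probability(threshold):
    """Return (max probability that total reward >= threshold, best first action).

    The augmented state is (t, accumulated): the time step plus the reward
    collected so far. The only reward in the augmented problem is a terminal
    indicator, so its expected return equals the success probability.
    """

    @lru_cache(maxsize=None)
    def value(t, accumulated):
        if t == HORIZON:
            # Terminal indicator reward of the augmented problem.
            return (1.0 if accumulated >= threshold else 0.0), None
        best = (-1.0, None)
        for name, outcomes in ACTIONS.items():
            # Expected value of the augmented problem after taking `name`.
            q = sum(p * value(t + 1, accumulated + r)[0] for p, r in outcomes)
            if q > best[0]:
                best = (q, name)
        return best

    return value(0, 0)


if __name__ == "__main__":
    for tau in (2, 3):
        prob, action = success_probability(tau)
        print(f"threshold={tau}: success probability={prob}, first action={action}")
```

In this toy problem the "risky" action has the higher expected reward (1.5 versus 1 per step), yet with a threshold of 2 the threshold-maximizing policy starts with "safe" because two safe steps reach the threshold with certainty. With a threshold of 3 the preferred first action switches to "risky". In practice the recursion could be replaced by any standard RL algorithm run on the augmented state.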

Abstract

In this work, we address the problem of determining reliable policies in reinforcement learning (RL), with a focus on optimization under uncertainty and the need for performance guarantees. While classical RL algorithms aim at maximizing the expected return, many real-world applications - such as routing, resource allocation, or sequential decision-making under risk - require strategies that ensure not only high average performance but also a guaranteed probability of success. To this end, we propose a novel formulation in which the objective is to maximize the probability that the cumulative return exceeds a prescribed threshold. We demonstrate that this reliable RL problem can be reformulated, via a state-augmented representation, into a standard RL problem, thereby allowing the use of existing RL and deep RL algorithms without the need for entirely new algorithmic frameworks.…
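
The objective and the state-augmented reformulation described in the abstract can be written out as follows. This is a sketch under assumed notation, not the paper's own: a finite horizon $T$, per-step reward $r_t$, threshold $\tau$, and an auxiliary variable $z_t$ for the accumulated reward are all introduced here for illustration, and the paper's exact construction (for example its handling of discounting, or a strict versus non-strict inequality) may differ.

```latex
% Reliable RL objective: maximize the probability that the return reaches \tau.
\[
  \max_{\pi} \; \mathbb{P}^{\pi}\!\left( \sum_{t=0}^{T-1} r_t \ge \tau \right)
\]

% Augmented state: the original state plus the reward accumulated so far.
\[
  \tilde{s}_t = (s_t, z_t), \qquad z_0 = 0, \qquad z_{t+1} = z_t + r_t
\]

% Augmented reward: zero until the final step, then a success indicator.
\[
  \tilde{r}_t =
  \begin{cases}
    0, & t < T-1, \\
    \mathbf{1}\{ z_T \ge \tau \}, & t = T-1
  \end{cases}
\]

% The expected return of the augmented problem equals the success probability.
\[
  \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{T-1} \tilde{r}_t \right]
  = \mathbb{E}^{\pi}\!\left[ \mathbf{1}\{ z_T \ge \tau \} \right]
  = \mathbb{P}^{\pi}\!\left( \sum_{t=0}^{T-1} r_t \ge \tau \right)
\]
```

Under these assumptions, maximizing the expected return of the augmented problem is the same as maximizing the success probability, which is why an algorithm that maximizes expected return can be reused unchanged.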

Taxonomy

Topics: Reinforcement Learning in Robotics · Vehicle Routing Optimization Methods · Traffic control and management