R2L: Reliable Reinforcement Learning: Guaranteed Return & Reliable Policies in Reinforcement Learning
Nadir Farhi

TL;DR
This paper introduces a reliable reinforcement learning framework that maximizes the probability of the return meeting a performance threshold, reformulates this problem as standard RL, and demonstrates its effectiveness on routing and safety-critical tasks.
Contribution
It proposes a novel formulation of reliable RL, in which the objective is the probability that the cumulative return exceeds a prescribed threshold, and shows how existing RL algorithms can be adapted to this setting.
Findings
Reliable policies can be derived using standard RL algorithms with state augmentation.
The approach effectively balances efficiency and reliability in stochastic environments.
Numerical experiments validate the practical benefits of the proposed method.
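State augmentation can be illustrated with a small environment wrapper. This is a minimal sketch and not the paper's exact construction. It assumes an undiscounted episodic task and a hypothetical `step(state, action) -> (next_state, reward, done)` interface, and it augments the state with the return still needed to reach the threshold. The new reward is 1 only when an episode ends with the threshold met, so the expected augmented return equals the success probability. Any expected-return algorithm can therefore be run on the wrapped environment unchanged.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Tuple

# Hypothetical base-environment interface (an assumption, not from the paper):
# step(state, action) -> (next_state, reward, done)
StepFn = Callable[[Hashable, Hashable], Tuple[Hashable, float, bool]]


@dataclass
class ReliableWrapper:
    """Recasts 'maximize P(return >= threshold)' as an expected-return problem.

    Augmented state: (s, g), where g is the return still needed to reach the
    threshold. Augmented reward: 0 on every step, and on the terminal step 1 if
    the original return met the threshold (g <= 0), else 0.
    """
    base_step: StepFn
    threshold: float

    def reset(self, s0: Hashable) -> Tuple[Hashable, float]:
        # At the start, the full threshold remains to be collected.
        return (s0, self.threshold)

    def step(self, aug_state, action):
        s, g = aug_state
        s_next, r, done = self.base_step(s, action)
        g_next = g - r  # update the remaining requirement with the observed reward
        reward = float(done and g_next <= 0)  # success indicator at termination
        return (s_next, g_next), reward, done
```

For example, with a two-step base task whose reward equals the chosen action and a threshold of 3, playing actions 1 then 2 ends with `g = 0.0` and an augmented reward of 1.0.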
Abstract
In this work, we address the problem of determining reliable policies in reinforcement learning (RL), with a focus on optimization under uncertainty and the need for performance guarantees. While classical RL algorithms aim to maximize the expected return, many real-world applications (such as routing, resource allocation, or sequential decision-making under risk) require strategies that ensure not only high average performance but also a guaranteed probability of success. To this end, we propose a novel formulation in which the objective is to maximize the probability that the cumulative return exceeds a prescribed threshold. We demonstrate that this reliable RL problem can be reformulated, via a state-augmented representation, into a standard RL problem, thereby allowing the use of existing RL and deep RL algorithms without the need for entirely new algorithmic frameworks.…
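The following toy routing example sketches how the reformulation separates reliable from expectation-optimal decisions. It is an illustration under stated assumptions, not the paper's algorithm or data. It assumes integer travel times and takes the reward to be minus the travel time, so "return exceeds a threshold" becomes "arrive within a deadline". The augmented state pairs the current node with the remaining time budget, and the problem is solved by exact dynamic programming (memoized recursion) as a stand-in for the RL or deep RL algorithms the abstract mentions. The network and all function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical toy network (not from the paper):
# node -> {action: [(probability, travel_time, next_node), ...]}
NETWORK = {
    "O": {"to_M": [(1.0, 2, "M")], "direct": [(1.0, 7, "D")]},
    "M": {"fast": [(0.8, 2, "D"), (0.2, 10, "D")], "safe": [(1.0, 4, "D")]},
}
DEST = "D"


def q_reliability(node: str, action: str, budget: int) -> float:
    """P(on-time arrival) if `action` is taken now and the reliable policy is followed afterwards."""
    return sum(p * reliability(nxt, budget - t) for p, t, nxt in NETWORK[node][action])


@lru_cache(maxsize=None)
def reliability(node: str, budget: int) -> float:
    """Optimal value on the augmented state (node, remaining time budget)."""
    if budget < 0:
        return 0.0  # deadline already missed; travel times are positive
    if node == DEST:
        return 1.0  # arrived with a non-negative budget: success
    return max(q_reliability(node, a, budget) for a in NETWORK[node])


def reliable_action(node: str, budget: int) -> str:
    """Action maximizing the probability of meeting the deadline."""
    return max(NETWORK[node], key=lambda a: q_reliability(node, a, budget))


@lru_cache(maxsize=None)
def min_expected_time(node: str) -> float:
    """Classical objective for comparison: minimize expected travel time."""
    if node == DEST:
        return 0.0
    return min(
        sum(p * (t + min_expected_time(nxt)) for p, t, nxt in outcomes)
        for outcomes in NETWORK[node].values()
    )


def fastest_on_average(node: str) -> str:
    """Action minimizing expected travel time (ignores the deadline)."""
    return min(
        NETWORK[node],
        key=lambda a: sum(p * (t + min_expected_time(nxt)) for p, t, nxt in NETWORK[node][a]),
    )


if __name__ == "__main__":
    print(fastest_on_average("M"))                       # fast
    print(reliable_action("M", 4), reliability("O", 6))  # safe 1.0
    print(reliable_action("M", 3), reliability("O", 5))  # fast 0.8
```

At node M the expectation-optimal choice is always "fast" (3.6 versus 4.0 expected time units), whereas the reliable choice depends on the remaining budget: "safe" with 4 units left and "fast" with 3. Because the best action changes with the budget, the budget has to be part of the state. This is the role of the state augmentation.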
Taxonomy
Topics: Reinforcement Learning in Robotics · Vehicle Routing Optimization Methods · Traffic control and management
