Stochastic Minimum-Cost Reach-Avoid Reinforcement Learning

Jingduo Pan; Taoran Wu; Yiling Xue; Bai Xue

arXiv:2605.11975 · cs.LG · May 19, 2026

TL;DR

This paper introduces a reinforcement learning framework that minimizes expected cumulative cost in stochastic environments while requiring a reach-avoid specification to hold with probability at least $p$. It relies on reach-avoid probability certificates (RAPCs) and a contraction-based Bellman formulation.

Contribution

The paper proposes reach-avoid probability certificates and a contraction-based Bellman approach to jointly optimize costs and satisfy probabilistic constraints in stochastic RL.

Findings

01. Algorithms converge to locally optimal policies.

02. Experiments show higher reach-avoid satisfaction rates.

03. Cost performance is improved in MuJoCo simulations.

Abstract

We study stochastic minimum-cost reach-avoid reinforcement learning, where an agent must satisfy a reach-avoid specification with probability at least $p$ while minimizing expected cumulative costs in stochastic environments. Existing safe and constrained reinforcement learning methods typically fail to jointly enforce probabilistic reach-avoid constraints and optimize cost in the learning setting in stochastic environments. To address this challenge, we introduce reach-avoid probability certificates (RAPCs), which identify states from which stochastic reach-avoid constraints are satisfiable. Building on RAPCs, we develop a contraction-based Bellman formulation that serves as a principled surrogate for integrating reach-avoid considerations into reinforcement learning, enabling cost optimization under probabilistic constraints. We establish almost sure convergence of the proposed…
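To make the probabilistic constraint concrete, the following minimal sketch computes, for a fixed policy on a small finite Markov chain, the probability of reaching a target set before entering an unsafe set, and then lists the states from which that probability is at least $p$. It is not the paper's RAPC construction or its Bellman formulation. The function name `reach_avoid_probability`, the five-state chain, and the threshold `p = 0.5` are all illustrative assumptions.

```python
import numpy as np


def reach_avoid_probability(P, target, unsafe, tol=1e-12, max_iters=10_000):
    """Probability of reaching `target` before entering `unsafe`, per state.

    P       : (n, n) row-stochastic transition matrix under a fixed policy.
    target  : boolean mask of length n marking target states.
    unsafe  : boolean mask of length n marking unsafe states.
    """
    P = np.asarray(P, dtype=float)
    target = np.asarray(target, dtype=bool)
    unsafe = np.asarray(unsafe, dtype=bool)

    # Start from the indicator of the target set and iterate to a fixed point.
    v = target.astype(float)
    for _ in range(max_iters):
        v_next = P @ v          # one-step expectation of the current estimate
        v_next[target] = 1.0    # the specification is already satisfied here
        v_next[unsafe] = 0.0    # the specification is already violated here
        if np.max(np.abs(v_next - v)) < tol:
            return v_next
        v = v_next
    return v


# Hypothetical 5-state chain: state 0 is unsafe, state 4 is the target.
# Interior states move right with probability 0.6 and left with probability 0.4.
n = 5
P = np.zeros((n, n))
P[0, 0] = 1.0
P[4, 4] = 1.0
for s in (1, 2, 3):
    P[s, s + 1] = 0.6
    P[s, s - 1] = 0.4

target = np.array([False, False, False, False, True])
unsafe = np.array([True, False, False, False, False])

v = reach_avoid_probability(P, target, unsafe)

# Closed-form check (gambler's ruin): v[2] = 9/13.
p = 0.5  # assumed probability threshold
feasible = np.flatnonzero(v >= p).tolist()  # states meeting the constraint
print(v, feasible)
```

In this toy setting the transition matrix is known, so the probabilities can be computed exactly. The learning setting described in the abstract does not assume this, which is where certificates learned from data come in.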
