A learning-based approach to stochastic optimal control under reach-avoid constraint

Tingting Ni; Maryam Kamgarpour

arXiv:2412.16561 · math.OC · September 30, 2025

TL;DR

This paper introduces a model-free, learning-based method for stochastic reach-avoid control, transforming the problem into a constrained MDP on an augmented state space and using policy gradients to find optimal policies from data.

Contribution

The paper reformulates the reach-avoid control problem as a constrained MDP (CMDP) on an augmented state space and develops a log-barrier policy gradient method that learns the policy from trajectory data alone, without a system model.
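As a rough illustration of what a log-barrier formulation generally looks like, the constrained problem can be replaced by an unconstrained surrogate whose gradient is estimated from sampled trajectories. This is a generic sketch and not necessarily the paper's exact objective. All symbols are assumptions introduced here: \theta denotes the policy parameters, J(\theta) the expected cumulative reward, P^{\mathrm{RA}}_\theta the probability of satisfying the reach-avoid constraint, p the required probability level, and \eta > 0 the barrier weight.

```latex
% Generic log-barrier surrogate (illustrative; symbols are assumptions, not the paper's notation)
\max_{\theta} \; B_\eta(\theta) = J(\theta) + \eta \log\bigl(P^{\mathrm{RA}}_\theta - p\bigr),
\qquad
\nabla_\theta B_\eta(\theta) = \nabla_\theta J(\theta)
  + \frac{\eta \, \nabla_\theta P^{\mathrm{RA}}_\theta}{P^{\mathrm{RA}}_\theta - p}.
```

The barrier term tends to minus infinity as P^{\mathrm{RA}}_\theta approaches p from above, so gradient ascent with a suitably small step size stays in the strictly feasible region provided the initial policy is strictly feasible.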

Findings

1. The approach guarantees high-probability satisfaction of reach-avoid constraints.
2. Convergence of policy parameters to optimal solutions is proven under certain assumptions.
3. The method effectively learns optimal policies from trajectory data without system models.

Abstract

We develop a model-free approach to optimally control stochastic, Markovian systems subject to a reach-avoid constraint. Specifically, the state trajectory must remain within a safe set while reaching a target set within a finite time horizon. Due to the time-dependent nature of these constraints, we show that, in general, the optimal policy for this constrained stochastic control problem is non-Markovian, which increases the computational complexity. To address this challenge, we apply the state-augmentation technique from arXiv:2402.19360, reformulating the problem as a constrained Markov decision process (CMDP) on an extended state space. This transformation allows us to search for a Markovian policy, avoiding the complexity of non-Markovian policies. To learn the optimal policy without a system model, and using only trajectory data, we develop a log-barrier policy gradient approach.…
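The state-augmentation idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the construction from arXiv:2402.19360: the flag values `ACTIVE`, `REACHED`, `FAILED` and the helpers `step_flag` and `reach_avoid_flag` are hypothetical names, the target set is assumed to lie inside the safe set, and safety is assumed to matter only until the target is first reached. Once the flag is carried alongside the system state, satisfying the constraint depends only on the final augmented state, which is what allows a Markovian policy on the augmented space.

```python
from typing import Callable, Iterable

# Hypothetical flag values for the augmented state component.
ACTIVE, REACHED, FAILED = 0, 1, 2


def step_flag(flag: int, state: float,
              is_safe: Callable[[float], bool],
              is_target: Callable[[float], bool]) -> int:
    """Update the flag after observing `state`; REACHED and FAILED are absorbing."""
    if flag != ACTIVE:
        return flag
    if is_target(state):      # target reached while still safe so far
        return REACHED
    if not is_safe(state):    # left the safe set before reaching the target
        return FAILED
    return ACTIVE


def reach_avoid_flag(trajectory: Iterable[float],
                     is_safe: Callable[[float], bool],
                     is_target: Callable[[float], bool]) -> int:
    """Run the flag along a finite-horizon trajectory and return its final value."""
    flag = ACTIVE
    for state in trajectory:
        flag = step_flag(flag, state, is_safe, is_target)
    return flag


# Toy 1-D example (assumed sets): safe set [0, 10], target set [8, 10].
def is_safe(x: float) -> bool:
    return 0 <= x <= 10


def is_target(x: float) -> bool:
    return 8 <= x <= 10


success = reach_avoid_flag([1, 3, 8, 12], is_safe, is_target)  # reaches the target at 8
failure = reach_avoid_flag([1, 11, 9], is_safe, is_target)     # leaves the safe set at 11
timeout = reach_avoid_flag([1, 2, 3], is_safe, is_target)      # never reaches the target
```

In this sketch, averaging the indicator `flag == REACHED` over many sampled trajectories gives a Monte Carlo estimate of the reach-avoid probability, which is the kind of quantity a model-free method can estimate from trajectory data.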

Taxonomy

Topics: Advanced Control Systems Optimization · Reservoir Engineering and Simulation Methods

Methods: Sparse Evolutionary Training