A learning-based approach to stochastic optimal control under reach-avoid constraint
Tingting Ni, Maryam Kamgarpour

TL;DR
This paper introduces a model-free, learning-based method for stochastic reach-avoid control, transforming the problem into a constrained MDP on an augmented state space and using policy gradients to find optimal policies from data.
Contribution
It reformulates the reach-avoid control problem as a constrained MDP on an extended state space and develops a log-barrier policy gradient method for model-free learning.
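To illustrate what an extended state space can look like, here is a minimal sketch that assumes the augmentation is a single status flag recording whether the trajectory is still active, has reached the target, or has left the safe set. This flag construction, and the names `augment_step`, `rollout_flag`, and `estimate_reach_avoid`, are assumptions made for illustration; the paper's construction (taken from arXiv:2402.19360) may differ.

```python
import random

# Hypothetical status flags for the augmented state (x, flag).
ACTIVE, REACHED, FAILED = 0, 1, 2


def augment_step(flag, next_state, in_safe, in_target):
    """Update the status flag after the system moves to next_state."""
    if flag != ACTIVE:
        return flag  # REACHED and FAILED are absorbing
    if in_target(next_state):
        return REACHED
    if not in_safe(next_state):
        return FAILED
    return ACTIVE


def rollout_flag(policy, step, x0, horizon, in_safe, in_target, rng):
    """Simulate one trajectory and return the flag at the end of the horizon."""
    x = x0
    flag = augment_step(ACTIVE, x0, in_safe, in_target)
    for t in range(horizon):
        u = policy(x, flag, t)  # the policy sees the augmented state
        x = step(x, u, rng)     # sample from the unknown dynamics
        flag = augment_step(flag, x, in_safe, in_target)
    return flag


def estimate_reach_avoid(policy, step, x0, horizon, in_safe, in_target,
                         n_rollouts=1000, seed=0):
    """Monte Carlo estimate of the reach-avoid probability from rollouts."""
    rng = random.Random(seed)
    hits = sum(
        rollout_flag(policy, step, x0, horizon, in_safe, in_target, rng) == REACHED
        for _ in range(n_rollouts)
    )
    return hits / n_rollouts
```

With a flag of this kind, the reach-avoid event depends only on the final augmented state, so its probability can be written as an expected terminal reward and estimated from trajectory data alone.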
Findings
The approach guarantees high-probability satisfaction of reach-avoid constraints.
Convergence of policy parameters to optimal solutions is proven under certain assumptions.
The method effectively learns optimal policies from trajectory data without system models.
Abstract
We develop a model-free approach to optimally control stochastic, Markovian systems subject to a reach-avoid constraint. Specifically, the state trajectory must remain within a safe set while reaching a target set within a finite time horizon. Due to the time-dependent nature of these constraints, we show that, in general, the optimal policy for this constrained stochastic control problem is non-Markovian, which increases the computational complexity. To address this challenge, we apply the state-augmentation technique from arXiv:2402.19360, reformulating the problem as a constrained Markov decision process (CMDP) on an extended state space. This transformation allows us to search for a Markovian policy, avoiding the complexity of non-Markovian policies. To learn the optimal policy without a system model, and using only trajectory data, we develop a log-barrier policy gradient approach.…
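The log-barrier idea can be sketched in a few lines, under the assumption that the constraint has the form V_c(theta) >= threshold and that the surrogate objective is V_r(theta) + eta * log(V_c(theta) - threshold). This is the generic log-barrier form, not necessarily the exact objective used in the paper; the function names, the barrier weight `eta`, and the plain gradient-ascent update are illustrative. In a model-free setting, the gradients and the constraint value passed in would themselves be estimates computed from sampled trajectories.

```python
def log_barrier_gradient(grad_reward, grad_constraint, constraint_value,
                         threshold, eta):
    """Gradient of V_r + eta * log(V_c - threshold) with respect to theta.

    grad_reward and grad_constraint are (estimated) policy gradients of the
    reward value V_r and constraint value V_c, given as lists of floats.
    """
    slack = constraint_value - threshold
    if slack <= 0:
        # The log barrier is only defined for strictly feasible iterates.
        raise ValueError("iterate is not strictly feasible")
    return [gr + eta * gc / slack for gr, gc in zip(grad_reward, grad_constraint)]


def ascent_step(theta, grad, step_size):
    """One plain gradient-ascent update of the policy parameters."""
    return [p + step_size * g for p, g in zip(theta, grad)]
```

The barrier term grows as the slack shrinks, which pushes iterates away from the constraint boundary. A practical implementation would also need a step-size rule that keeps each iterate strictly feasible, which this sketch omits.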
Taxonomy
Topics: Advanced Control Systems Optimization · Reservoir Engineering and Simulation Methods
Methods: Sparse Evolutionary Training
