TL;DR
This paper introduces a novel algorithm for identifying near-optimal policies in robust constrained MDPs (RCMDPs). It overcomes the limitations of conventional policy gradient methods by combining an epigraph form with a bisection search.
Contribution
It presents the first algorithm guaranteed to identify a near-optimal policy in RCMDPs, using an epigraph reformulation and a bisection search.
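The epigraph reformulation with bisection search can be sketched on a toy problem. One way to write it is as follows; this notation is an assumption and not necessarily the paper's. Let $J_0$ be the robust objective and require each robust constraint to satisfy $J_i \le 0$. The epigraph form then asks for the smallest threshold $b$ such that $\Delta(b) = \min_\pi \max\{J_0(\pi) - b, \max_i J_i(\pi)\} \le 0$. Because $\Delta$ is nonincreasing in $b$, bisection on $b$ applies. The sketch simplifies heavily: it uses a finite set of candidate policies whose worst-case values are already known, whereas the paper's algorithm works with policy gradients on the inner problem. The names `delta`, `bisection`, `robust_obj`, and `robust_cons` are hypothetical.

```python
from typing import Sequence


def delta(b: float, robust_obj: Sequence[float],
          robust_cons: Sequence[Sequence[float]]) -> float:
    """Epigraph gap: min over candidate policies of max(J_0 - b, max_i J_i).

    Non-positive exactly when some policy has robust objective <= b and
    satisfies every robust constraint J_i <= 0.
    """
    return min(max(j0 - b, max(cons)) for j0, cons in zip(robust_obj, robust_cons))


def bisection(robust_obj: Sequence[float], robust_cons: Sequence[Sequence[float]],
              lo: float, hi: float, tol: float = 1e-6) -> float:
    """Approximate the smallest threshold b with delta(b) <= 0.

    Assumes delta(lo) > 0 and delta(hi) <= 0; delta is nonincreasing in b.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delta(mid, robust_obj, robust_cons) <= 0:
            hi = mid  # some feasible policy reaches objective value mid
        else:
            lo = mid  # no feasible policy reaches mid, so raise the lower bound
    return hi


# Three hypothetical candidate policies with one constraint each; policy 0 is infeasible.
robust_obj = [1.0, 2.0, 3.0]
robust_cons = [[0.5], [-0.1], [-1.0]]
b_star = bisection(robust_obj, robust_cons, lo=0.0, hi=10.0)
print(b_star)  # close to 2.0, the best feasible robust objective
```

The bisection never inspects gradients. It only asks whether the threshold is achievable, which is why it pairs naturally with an inner solver for $\Delta(b)$.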
Findings
The proposed algorithm is guaranteed to identify a near-optimal policy within $\tilde{O}(1/\epsilon^{4})$ policy evaluations.
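Conventional policy gradient methods applied to the Lagrangian max-min formulation can become trapped in suboptimal solutions when the objective and constraint gradients conflict.
The epigraph form resolves these gradient conflicts by following a single gradient, from either the objective or a constraint, rather than their sum.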
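The gradient conflict, and how the epigraph form avoids it, can be seen on a hypothetical one-dimensional problem. This is a toy, not an RCMDP. The goal is to minimize $f(\theta) = -\theta$ subject to $g(\theta) = \theta - 1 \le 0$, whose constrained optimum is $\theta = 1$. With multiplier $\lambda = 1$, the Lagrangian gradient $\nabla f + \lambda \nabla g$ is exactly zero, so descent started at $\theta = 0$ never moves. The subgradient of $\max\{f(\theta) - b, g(\theta)\}$ instead follows whichever term is active. Here the threshold is set by hand to the optimal value $b = -1$ rather than found by bisection, and the step size and iteration count are arbitrary choices.

```python
def grad_f(theta: float) -> float:
    return -1.0  # objective f(theta) = -theta; descent on f increases theta


def grad_g(theta: float) -> float:
    return 1.0  # constraint g(theta) = theta - 1 <= 0; descent on g decreases theta


def lagrangian_grad(theta: float, lam: float) -> float:
    # Sum of conflicting gradients: cancels exactly when lam == 1.
    return grad_f(theta) + lam * grad_g(theta)


def epigraph_grad(theta: float, b: float) -> float:
    # Subgradient of max(f(theta) - b, g(theta)): use only the active term.
    f_minus_b = -theta - b
    g = theta - 1.0
    return grad_f(theta) if f_minus_b >= g else grad_g(theta)


def descend(grad_fn, theta0: float = 0.0, step: float = 0.1, iters: int = 50) -> float:
    # Plain fixed-step gradient descent from theta0.
    theta = theta0
    for _ in range(iters):
        theta -= step * grad_fn(theta)
    return theta


theta_lag = descend(lambda t: lagrangian_grad(t, lam=1.0))
theta_epi = descend(lambda t: epigraph_grad(t, b=-1.0))
print(theta_lag)  # 0.0: the summed gradient is zero, so theta never moves
print(theta_epi)  # near 1.0, the constrained optimum
```

Near the optimum the active term switches each time $\theta$ crosses 1, so the epigraph iterate oscillates within one step of $\theta = 1$ instead of stalling.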
Abstract
Designing a safe policy for uncertain environments is crucial in real-world control systems. However, this challenge remains inadequately addressed within the Markov decision process (MDP) framework. This paper presents the first algorithm guaranteed to identify a near-optimal policy in a robust constrained MDP (RCMDP), where an optimal policy minimizes cumulative cost while satisfying constraints in the worst-case scenario across a set of environments. We first prove that the conventional policy gradient approach to the Lagrangian max-min formulation can become trapped in suboptimal solutions. This occurs when its inner minimization encounters a sum of conflicting gradients from the objective and constraint functions. To address this, we leverage the epigraph form of the RCMDP problem, which resolves the conflict by selecting a single gradient from either the objective or the…
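The abstract defines the RCMDP objective as a worst case over a set of environments. The minimal sketch of robust policy evaluation below assumes a finite uncertainty set of tabular transition kernels and a known cost table. The function names and the finite-set assumption are illustrative and are not necessarily the uncertainty set the paper considers.

```python
import numpy as np


def policy_value(P, cost, policy, gamma):
    """Exact discounted cumulative cost of a stochastic policy in one tabular MDP.

    P: (S, A, S) transition probabilities, cost: (S, A), policy: (S, A).
    Solves the Bellman equation V = c_pi + gamma * P_pi @ V.
    """
    c_pi = np.einsum("sa,sa->s", policy, cost)  # expected one-step cost per state
    P_pi = np.einsum("sa,sat->st", policy, P)   # state-to-state transitions under the policy
    n_states = P.shape[0]
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, c_pi)


def robust_value(envs, cost, policy, gamma, init):
    """Worst-case (largest) expected discounted cost over a finite set of environments."""
    return max(float(init @ policy_value(P, cost, policy, gamma)) for P in envs)


# Toy example: 2 states, 1 action; only state 0 incurs cost 1 per step.
cost = np.array([[1.0], [0.0]])
policy = np.array([[1.0], [1.0]])
env_stay = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])   # state 0 is absorbing
env_leave = np.array([[[0.0, 1.0]], [[0.0, 1.0]]])  # state 0 moves to state 1
init = np.array([1.0, 0.0])

worst = robust_value([env_stay, env_leave], cost, policy, gamma=0.9, init=init)
print(worst)  # about 10.0 = 1 / (1 - 0.9), attained in env_stay
```

Passing a constraint cost table in place of `cost` gives the corresponding worst-case constraint value in the same way.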