Learning Constrained Markov Decision Processes With Non-stationary Rewards and Constraints
Francesco Emanuele Stradi, Anna Lunghi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

TL;DR
This paper develops algorithms for constrained Markov decision processes with non-stationary rewards and constraints, achieving near-optimal regret and constraint violation bounds that adapt to the environment's non-stationarity.
Contribution
It introduces algorithms that handle non-stationarity in CMDPs, with performance guarantees that degrade gracefully as the environment changes, easing a known impossibility result for adversarial CMDPs.
Findings
Algorithms attain $\tilde{\mathcal{O}}(\sqrt{T} + C)$ regret and positive constraint violation, where $C$ is a corruption value measuring the environment's non-stationarity.
Performance degrades smoothly as non-stationarity increases, consistent in the worst case with the impossibility result for adversarial CMDPs.
A meta-procedure is proposed for unknown non-stationarity levels, applicable to broader online learning settings.
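The "positive constraint violation" in the findings is commonly understood to accumulate only the positive part of each episode's violation, so that strictly feasible episodes cannot cancel out violations incurred elsewhere. The following is a minimal sketch under that assumption; the function names and the per-episode numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def signed_cumulative_violation(violations: Sequence[float]) -> float:
    """Sum of signed per-episode violations; negative terms can cancel positive ones."""
    return sum(violations)


def positive_cumulative_violation(violations: Sequence[float]) -> float:
    """Sum of the positive parts only; feasible episodes contribute zero."""
    return sum(max(v, 0.0) for v in violations)


# Hypothetical per-episode violations (constraint cost minus threshold).
# Positive means the constraint was violated in that episode.
violations = [0.5, -0.5, 0.25, -0.25]

signed = signed_cumulative_violation(violations)
positive = positive_cumulative_violation(violations)

print(f"signed cumulative violation:   {signed}")
print(f"positive cumulative violation: {positive}")
```

In this toy sequence the signed total hides the two violating episodes, while the positive total still counts them, which is why a bound on the positive quantity is the stronger guarantee.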
Abstract
In constrained Markov decision processes (CMDPs) with adversarial rewards and constraints, a well-known impossibility result prevents any algorithm from attaining both sublinear regret and sublinear constraint violation when competing against a best-in-hindsight policy that satisfies the constraints on average. In this paper, we show that this negative result can be eased in CMDPs with non-stationary rewards and constraints, by providing algorithms whose performance degrades smoothly as non-stationarity increases. Specifically, we propose algorithms attaining $\tilde{\mathcal{O}}(\sqrt{T} + C)$ regret and positive constraint violation under bandit feedback, where $C$ is a corruption value measuring the environment non-stationarity. This can be $\Theta(T)$ in the worst case, coherently with the impossibility result for adversarial CMDPs. First, we design an algorithm with the desired…
Taxonomy
Topics: Bayesian Modeling and Causal Inference
