Learning Constrained Markov Decision Processes With Non-stationary Rewards and Constraints

Francesco Emanuele Stradi; Anna Lunghi; Matteo Castiglioni; Alberto Marchesi; Nicola Gatti

arXiv:2405.14372·cs.LG·September 27, 2024

TL;DR

This paper develops algorithms for constrained Markov decision processes with non-stationary rewards and constraints, achieving near-optimal regret and constraint violation bounds that adapt to the environment's non-stationarity.

Contribution

It introduces algorithms that handle non-stationary rewards and constraints in CMDPs, with performance guarantees that degrade gracefully as the environment changes more, easing a known impossibility result for adversarial CMDPs.

Findings

01

Algorithms attain $\tilde{O}(\sqrt{T} + C)$ regret and positive constraint violation.

02

Performance degrades smoothly as non-stationarity increases; in the worst case $C = \Theta(T)$, which is consistent with the impossibility result for adversarial CMDPs.

03

A meta-procedure is proposed for unknown non-stationarity levels, applicable to broader online learning settings.

Abstract

In constrained Markov decision processes (CMDPs) with adversarial rewards and constraints, a well-known impossibility result prevents any algorithm from attaining both sublinear regret and sublinear constraint violation, when competing against a best-in-hindsight policy that satisfies constraints on average. In this paper, we show that this negative result can be eased in CMDPs with non-stationary rewards and constraints, by providing algorithms whose performances smoothly degrade as non-stationarity increases. Specifically, we propose algorithms attaining $\tilde{O}(\sqrt{T} + C)$ regret and positive constraint violation under bandit feedback, where $C$ is a corruption value measuring the environment non-stationarity. This can be $\Theta(T)$ in the worst case, coherently with the impossibility result for adversarial CMDPs. First, we design an algorithm with the desired…
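The abstract's two quantities, positive constraint violation and the corruption value $C$, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the per-round cost representation, and the L1-drift form of the corruption measure are hypothetical stand-ins, not the paper's formal definitions.

```python
import numpy as np

def cumulative_violation(costs, threshold=0.0):
    """Signed sum of per-round violations: satisfied rounds cancel violated ones."""
    return float(np.sum(np.asarray(costs, dtype=float) - threshold))

def positive_cumulative_violation(costs, threshold=0.0):
    """Sum of positive parts only: satisfied rounds cannot offset violations."""
    excess = np.asarray(costs, dtype=float) - threshold
    return float(np.sum(np.maximum(excess, 0.0)))

def corruption(mean_rewards, reference):
    """Hypothetical non-stationarity measure (an assumption, not the paper's
    definition): total L1 distance of each round's mean reward vector from
    one fixed reference vector."""
    mean_rewards = np.asarray(mean_rewards, dtype=float)  # shape (T, d)
    reference = np.asarray(reference, dtype=float)        # shape (d,)
    return float(np.sum(np.abs(mean_rewards - reference)))

# Four rounds that alternate between violating and over-satisfying a constraint.
costs = [0.5, -0.5, 0.25, -0.25]
signed = cumulative_violation(costs)
positive = positive_cumulative_violation(costs)

# Three rounds of two-dimensional mean rewards; only the last round drifts.
drift = corruption([[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]], [0.5, 0.5])
```

In this toy run the signed violation cancels out while the positive violation does not, which is why a bound on positive violation is the stronger guarantee. A stationary environment would give a drift of zero under this illustrative measure, and the measure can grow linearly in the number of rounds when every round differs from the reference.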

Taxonomy

Topics: Bayesian Modeling and Causal Inference