Dynamic Regret in Time-varying MDPs with Intermittent Information
Negin Musavi, Melkior Ornik

TL;DR
This paper investigates how limited update rates affect decision-making in time-varying Markov decision processes, proposing a framework that quantifies performance degradation due to intermittent information updates.
Contribution
It introduces a skip-update learning and planning framework with a dynamic regret analysis that explicitly captures the impact of intermittent updates on performance.
Findings
The dynamic regret bound quantifies how update frequency and temporal variation affect performance.
Regret decomposes into contributions from update times and skip intervals.
Regret grows linearly with interval length and variation rate; mixing properties mitigate this growth.
Abstract
We study sequential decision-making in time-varying Markov decision processes (TVMDPs) under limited update rates, where the decision-maker observes the system and updates its model only intermittently. Such settings arise in applications with sensing, communication, or computational constraints that preclude continuous adaptation. Our goal is to understand how the performance of an agent, which learns and plans using receding-horizon control under these information constraints, degrades as a function of the update rate. We propose a skip-update learning and planning framework that combines likelihood-based estimation of time-varying transition kernels with finite-horizon planning and executes policies between updates using stale information. We analyze its performance via dynamic regret relative to an oracle policy with full knowledge of the dynamics and continuous observations. Our…
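The skip-update loop described in the abstract can be sketched on a toy problem. This is an illustrative sketch under stated assumptions, not the authors' algorithm: the drifting kernel, the batch of sampled transitions available at each update, the smoothed count-based maximum-likelihood estimate (standing in for the paper's likelihood-based estimator), the planning horizon equal to the skip interval, and every function and parameter name below are hypothetical. Between updates the agent acts on the last observed state and the last estimated kernel, which is one simple reading of "stale information".

```python
import numpy as np


def drifting_kernel(base, t, rate):
    """Hypothetical time-varying kernel: blend `base` toward a shifted copy as t grows."""
    w = min(1.0, rate * t)
    # A convex combination of two stochastic kernels is again stochastic.
    return (1.0 - w) * base + w * np.roll(base, 1, axis=2)


def mle_kernel(samples, n_states, n_actions, smoothing=1.0):
    """Count-based maximum-likelihood estimate with additive smoothing (an assumption)."""
    counts = np.full((n_states, n_actions, n_states), smoothing)
    for s, a, s_next in samples:
        counts[s, a, s_next] += 1.0
    return counts / counts.sum(axis=2, keepdims=True)


def plan(kernel, reward, horizon):
    """Finite-horizon backward induction on a frozen kernel; returns policy[h, s]."""
    n_states = kernel.shape[0]
    value = np.zeros(n_states)
    policy = np.zeros((horizon, n_states), dtype=int)
    for h in reversed(range(horizon)):
        q = reward + kernel @ value  # shape (n_states, n_actions)
        policy[h] = q.argmax(axis=1)
        value = q.max(axis=1)
    return policy


def run_skip_update(skip, steps=60, n_states=4, n_actions=2,
                    rate=0.01, batch=50, seed=0):
    """Run the toy loop; returns (total reward, number of updates)."""
    rng = np.random.default_rng(seed)
    base = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    reward = rng.random((n_states, n_actions))
    state, total, n_updates = 0, 0.0, 0
    for t in range(steps):
        true_kernel = drifting_kernel(base, t, rate)
        if t % skip == 0:
            # Update time: sample transitions from the current kernel,
            # re-estimate the model, re-plan, and record the observed state.
            samples = [(s, a, rng.choice(n_states, p=true_kernel[s, a]))
                       for s in range(n_states)
                       for a in range(n_actions)
                       for _ in range(batch)]
            estimate = mle_kernel(samples, n_states, n_actions)
            policy = plan(estimate, reward, horizon=skip)
            observed = state
            n_updates += 1
        # Skip interval: act on the stale plan and the last observed state.
        action = policy[t % skip, observed]
        total += reward[state, action]
        state = rng.choice(n_states, p=true_kernel[state, action])
    return total, n_updates
```

Running `run_skip_update` for several values of `skip` gives a rough, noisy feel for how collected reward changes with the update rate on this toy problem; it does not reproduce the paper's regret bound.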
