Concentration of Cumulative Reward in Markov Decision Processes

Borna Sayedana; Peter E. Caines; Aditya Mahajan

arXiv:2411.18551 · cs.LG · December 4, 2025

Open Access

TL;DR

This paper studies how the total reward in Markov Decision Processes concentrates around its expected value, providing unified asymptotic and non-asymptotic bounds applicable to various settings, with implications for policy comparison and learning.

Contribution

It introduces a unified framework for reward concentration in MDPs, covering both asymptotic and non-asymptotic regimes, and explores implications for policy evaluation and regret definitions.
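As a rough illustration of what reward concentration means, the sketch below simulates a toy two-state chain of the kind a fixed stationary policy would induce, and compares the time-averaged cumulative reward with the long-run average reward. The transition probabilities, the rewards, and the function names are all hypothetical choices made for this example; none of them come from the paper.

```python
import random

# Hypothetical two-state chain induced by a fixed stationary policy.
P01 = 0.1  # probability of moving from state 0 to state 1
P10 = 0.2  # probability of moving from state 1 to state 0
REWARD = (1.0, 0.0)  # per-step reward collected in state 0 and in state 1


def stationary_average_reward(p01=P01, p10=P10, reward=REWARD):
    """Long-run average reward J of the two-state chain."""
    pi0 = p10 / (p01 + p10)  # stationary probability of state 0
    return pi0 * reward[0] + (1.0 - pi0) * reward[1]


def cumulative_reward(horizon, seed=0, p01=P01, p10=P10, reward=REWARD):
    """Simulate one sample path from state 0 and return the total reward R_T."""
    rng = random.Random(seed)
    state, total = 0, 0.0
    for _ in range(horizon):
        total += reward[state]
        switch_prob = p01 if state == 0 else p10
        if rng.random() < switch_prob:
            state = 1 - state
    return total


if __name__ == "__main__":
    J = stationary_average_reward()
    for T in (100, 1_000, 10_000, 100_000):
        gap = abs(cumulative_reward(T) / T - J)
        print(f"T={T:>7}  |R_T/T - J| = {gap:.4f}")
```

Running the script prints the gap between R_T/T and J for growing horizons. On a typical sample path the gap shrinks as T grows, which is the qualitative behaviour a law of large numbers describes.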

Findings

01 Established law of large numbers and CLT for MDP rewards

02 Derived Azuma-Hoeffding-type inequalities for finite-horizon rewards

03 Showed rate-equivalence of different regret definitions
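For orientation, the classical Azuma-Hoeffding inequality that gives this family of bounds its name can be stated as follows. This is the textbook form for a martingale with bounded increments, not the paper's specific bound; the symbols $X_k$, $c_k$, and $\varepsilon$ are generic notation assumed here.

```latex
% Classical Azuma-Hoeffding inequality (textbook form, not the paper's result).
% Assumption: (X_k)_{k \ge 0} is a martingale with |X_k - X_{k-1}| \le c_k almost surely.
\[
  \mathbb{P}\bigl( |X_n - X_0| \ge \varepsilon \bigr)
  \;\le\;
  2 \exp\!\left( - \frac{\varepsilon^2}{2 \sum_{k=1}^{n} c_k^2} \right),
  \qquad \varepsilon > 0 .
\]
```

With uniformly bounded increments $c_k \le c$, the right-hand side becomes $2\exp(-\varepsilon^2 / (2 n c^2))$, so deviations of order $\sqrt{n}$ are the typical scale.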

Abstract

In this paper, we investigate the concentration properties of cumulative reward in Markov Decision Processes (MDPs), focusing on both asymptotic and non-asymptotic settings. We introduce a unified approach to characterize reward concentration in MDPs, covering both infinite-horizon settings (i.e., average and discounted reward frameworks) and the finite-horizon setting. Our asymptotic results include the law of large numbers, the central limit theorem, and the law of the iterated logarithm, while our non-asymptotic bounds include Azuma-Hoeffding-type inequalities and a non-asymptotic version of the law of the iterated logarithm. Additionally, we explore two key implications of our results. First, we analyze the sample path behavior of the difference in rewards between any two stationary policies. Second, we show that two alternative definitions of regret for learning policies proposed in the…

Taxonomy

Topics: Advanced Research in Systems and Signal Processing