Comparing discounted and average-cost Markov Decision Processes: a statistical significance perspective

Dylan Solms

arXiv:2112.00684·math.OC·December 2, 2021

TL;DR

This paper compares discounted and average-cost Markov Decision Processes by evaluating whether their optimal policies differ significantly in system-wide performance using statistical tests.

Contribution

It introduces a framework for assessing system-based optimality metrics and applies statistical significance testing to compare policies in a queuing control problem.

Findings

01. Statistically significant differences identified between policies

02. System-based metrics can reveal performance advantages

03. Method applicable to various MDP problems

Abstract

Optimal Markov Decision Process policies for problems with finite state and action spaces are identified through a partial ordering, obtained by comparing the value function across states. This is referred to as state-based optimality. This paper identifies when such optimality guarantees some form of system-based optimality as measured by a scalar. Four such system-based metrics are introduced. Univariate empirical distributions of these metrics are obtained through simulation to assess whether theoretically optimal policies provide a statistically significant advantage. The assessment uses a Student's $t$-test, Welch's $t$-test and a Mann-Whitney $U$-test. The proposed method is applied to a common problem in queuing theory: admission control.
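The three tests named in the abstract can be run on two samples of a scalar metric roughly as sketched below. This is a minimal illustration, not the paper's code: the function name `compare_policies`, the significance level `alpha=0.05`, and the normally distributed placeholder samples are all assumptions made for the example. In the paper's setting the samples would instead come from simulating each policy and recording one of the system-based metrics per run.

```python
import numpy as np
from scipy import stats


def compare_policies(metric_a, metric_b, alpha=0.05):
    """Apply Student's t, Welch's t and Mann-Whitney U to two metric samples.

    Returns {test_name: (p_value, is_significant)}.
    `alpha` is an assumed significance level, not taken from the paper.
    """
    results = {
        # Student's t-test assumes equal variances in both groups.
        "student_t": stats.ttest_ind(metric_a, metric_b, equal_var=True),
        # Welch's t-test drops the equal-variance assumption.
        "welch_t": stats.ttest_ind(metric_a, metric_b, equal_var=False),
        # Mann-Whitney U is rank-based and does not assume normality.
        "mann_whitney_u": stats.mannwhitneyu(
            metric_a, metric_b, alternative="two-sided"
        ),
    }
    return {
        name: (float(res.pvalue), bool(res.pvalue < alpha))
        for name, res in results.items()
    }


# Placeholder data only (NOT results from the paper): stand-ins for the
# simulated metric under a discounted-optimal and an average-cost-optimal policy.
rng = np.random.default_rng(0)
sample_discounted = rng.normal(loc=10.0, scale=2.0, size=200)
sample_average = rng.normal(loc=10.5, scale=2.5, size=200)

report = compare_policies(sample_discounted, sample_average)
for name, (p_value, significant) in report.items():
    print(f"{name}: p = {p_value:.4f}, significant = {significant}")
```

Running all three side by side is useful because they rest on different assumptions: agreement between them suggests the conclusion does not hinge on equal variances or normality of the simulated metric.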

Taxonomy

Topics: Simulation Techniques and Applications · Reinforcement Learning in Robotics · Formal Methods in Verification