Comparing discounted and average-cost Markov Decision Processes: a statistical significance perspective
Dylan Solms

TL;DR
This paper compares discounted and average-cost Markov Decision Processes by evaluating whether their optimal policies differ significantly in system-wide performance using statistical tests.
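To make the distinction between the two criteria concrete, the sketch below solves a toy admission-control queue both ways. It uses value iteration for the discounted-cost criterion and relative value iteration for the average-cost criterion. The model and every parameter are hypothetical illustrations, not the paper's setup. The queue is a uniformized single-server queue with capacity `N`, arrival probability `P_ARR`, service probability `Q_SRV`, linear holding cost `H` and rejection cost `R`. The helper names (`discounted_policy`, `average_cost_policy`, `greedy`) are also assumptions.

```python
from typing import List, Tuple

# Hypothetical toy parameters (illustrative only, not taken from the paper).
N = 10            # buffer capacity: states are 0..N customers in system
P_ARR = 0.4       # probability of an arrival in one time step
Q_SRV = 0.5       # probability of a service completion in one time step
H = 1.0           # holding cost per customer per step
R = 8.0           # cost incurred for each rejected or lost arrival
ACTIONS = (0, 1)  # 0 = reject the next arrival, 1 = admit it


def transitions(s: int, a: int) -> List[Tuple[float, int]]:
    """(probability, next_state) pairs; at most one event happens per step."""
    up = s + 1 if (a == 1 and s < N) else s  # arrival admitted only if there is room
    down = s - 1 if s > 0 else s             # departure only if the queue is non-empty
    return [(P_ARR, up), (Q_SRV, down), (1.0 - P_ARR - Q_SRV, s)]


def cost(s: int, a: int) -> float:
    """Expected one-step cost: holding cost plus expected rejection cost."""
    lost = a == 0 or s == N
    return H * s + (R * P_ARR if lost else 0.0)


def q_value(s: int, a: int, v: List[float], gamma: float = 1.0) -> float:
    return cost(s, a) + gamma * sum(p * v[t] for p, t in transitions(s, a))


def greedy(v: List[float], gamma: float = 1.0) -> List[int]:
    # min() returns the first minimiser, so exact ties are broken towards rejecting.
    return [min(ACTIONS, key=lambda a: q_value(s, a, v, gamma)) for s in range(N + 1)]


def discounted_policy(gamma: float, tol: float = 1e-10) -> List[int]:
    """Value iteration for the discounted-cost criterion (0 <= gamma < 1)."""
    v = [0.0] * (N + 1)
    while True:
        new_v = [min(q_value(s, a, v, gamma) for a in ACTIONS) for s in range(N + 1)]
        done = max(abs(x - y) for x, y in zip(new_v, v)) < tol
        v = new_v
        if done:
            return greedy(v, gamma)


def average_cost_policy(tol: float = 1e-10, max_iter: int = 100_000) -> Tuple[float, List[int]]:
    """Relative value iteration; state 0 is the reference state."""
    h = [0.0] * (N + 1)
    g = 0.0
    for _ in range(max_iter):
        t = [min(q_value(s, a, h) for a in ACTIONS) for s in range(N + 1)]
        g = t[0]                      # current estimate of the optimal average cost
        new_h = [x - g for x in t]    # renormalise so that h[0] == 0
        done = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if done:
            break
    return g, greedy(h)


if __name__ == "__main__":
    g, avg_pol = average_cost_policy()
    print("average-cost policy:", avg_pol, "gain:", round(g, 4))
    for gamma in (0.5, 0.9, 0.99):
        print(f"discounted policy (gamma={gamma}):", discounted_policy(gamma))
```

A classical result for finite MDPs (Blackwell optimality) is that, above some discount factor close to 1, the discounted-optimal policy is also average-cost optimal. Differences between the two criteria are therefore expected mainly at smaller discount factors.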
Contribution
It introduces a framework for assessing system-based optimality metrics and applies statistical significance testing to compare policies in a queuing control problem.
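State-based optimality compares policies state by state, which yields only a partial order. A system-based metric instead collapses performance into a single scalar. The sketch below uses a cost convention (lower is better). It checks state-wise dominance between two hypothetical value vectors and computes one illustrative scalar: the expected cost under a chosen initial-state distribution. This weighting is an assumption for illustration and is not claimed to be one of the paper's four metrics. It also shows one elementary case where state-based dominance does carry over. If A's cost is no larger than B's in every state, A's weighted cost is no larger than B's for any non-negative weights.

```python
from typing import Sequence


def dominates(v_a: Sequence[float], v_b: Sequence[float]) -> bool:
    """True if policy A's cost is no larger than policy B's in every state."""
    if len(v_a) != len(v_b):
        raise ValueError("value vectors must cover the same states")
    return all(a <= b for a, b in zip(v_a, v_b))


def compare(v_a: Sequence[float], v_b: Sequence[float]) -> str:
    """Classify two value vectors under the state-wise partial order."""
    a_le_b, b_le_a = dominates(v_a, v_b), dominates(v_b, v_a)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "A dominates"
    if b_le_a:
        return "B dominates"
    return "incomparable"  # the partial order gives no verdict


def weighted_cost(v: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical scalar metric: expected cost under an initial-state distribution."""
    if len(v) != len(weights) or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative, not all zero, and match the states")
    return sum(w * x for w, x in zip(weights, v)) / sum(weights)


if __name__ == "__main__":
    v_a, v_b = [1.0, 3.0], [2.0, 2.0]  # hypothetical value vectors
    print(compare(v_a, v_b))            # incomparable
    # A scalar metric always produces a verdict, which may depend on the weights.
    print(weighted_cost(v_a, [0.8, 0.2]), weighted_cost(v_b, [0.8, 0.2]))
```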
Findings
Statistically significant differences identified between policies
System-based metrics can reveal performance advantages
Method applicable to various MDP problems
Abstract
Optimal Markov Decision Process policies for problems with finite state and action spaces are identified through a partial ordering obtained by comparing the value function across states. This is referred to as state-based optimality. This paper identifies when such optimality guarantees some form of system-based optimality, as measured by a scalar. Four such system-based metrics are introduced. Univariate empirical distributions of these metrics are obtained through simulation so as to assess whether theoretically optimal policies provide a statistically significant advantage. This was assessed using Student's t-test, Welch's t-test and the Mann-Whitney U-test. The proposed method is applied to a common problem in queuing theory: admission control.
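Student's t-test, Welch's t-test and the Mann-Whitney U-test each compare the simulated metric samples of two policies. The sketch below uses only the standard library. Its function names are hypothetical, and `x` and `y` stand for per-replication metric values obtained by simulating each policy. It computes the Student's and Welch's t statistics with their degrees of freedom. It also computes the Mann-Whitney U statistic, using average ranks for ties, together with a normal-approximation p-value that has no tie or continuity correction. Converting the t statistics into p-values requires the t distribution. In practice, `scipy.stats.ttest_ind(x, y, equal_var=True)` (Student), `equal_var=False` (Welch) and `scipy.stats.mannwhitneyu(x, y)` compute the p-values directly. Their Mann-Whitney p-values may differ slightly from this sketch because SciPy can apply corrections that the sketch omits.

```python
import math
from statistics import NormalDist, mean, variance


def student_t(x, y):
    """Two-sample t statistic with pooled variance (assumes equal variances)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def welch_t(x, y):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


def mann_whitney_u(x, y):
    """U statistic for sample x and a two-sided normal-approximation p-value."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the average 1-based rank
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    nx, ny = len(x), len(y)
    r_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = r_x - nx * (nx + 1) / 2
    mu = nx * ny / 2
    sigma = math.sqrt(nx * ny * (nx + ny + 1) / 12)  # no tie correction
    z = (u_x - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, p


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    # Hypothetical samples: one metric value per simulation replication.
    x = [rng.gauss(10.0, 1.0) for _ in range(30)]
    y = [rng.gauss(10.5, 2.0) for _ in range(30)]
    print("Student:", student_t(x, y))
    print("Welch:", welch_t(x, y))
    print("Mann-Whitney:", mann_whitney_u(x, y))
```

Welch's version is the safer default when two policies produce metric distributions with visibly different spreads. The Mann-Whitney test drops the normality assumption altogether.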
Taxonomy
Topics: Simulation Techniques and Applications · Reinforcement Learning in Robotics · Formal Methods in Verification
