The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes

Pedro P. Santos; Alberto Sardinha; Francisco S. Melo

arXiv:2409.15128 · cs.LG · July 2, 2025

TL;DR

This paper investigates how the number of sampled trajectories affects policy evaluation in infinite-horizon general-utility Markov decision processes, revealing that the number of trials significantly influences performance estimates.

Contribution

It provides the first analysis of the impact of trial count in infinite-horizon GUMDPs, including bounds and empirical insights on how trials affect policy evaluation accuracy.

Findings

1. The number of trials influences policy performance estimates in GUMDPs.

2. Lower and upper bounds are established on the mismatch between finite and infinite trials in policy evaluation under discounted GUMDPs.

3. Empirical results show that the structure of the GUMDP affects the impact of the trial count.

Abstract

The general-utility Markov decision process (GUMDP) framework generalizes the MDP framework by considering objective functions that depend on the frequency of visitation of state-action pairs induced by a given policy. In this work, we contribute the first analysis of the impact of the number of trials, i.e., the number of randomly sampled trajectories, in infinite-horizon GUMDPs. We show that, as opposed to standard MDPs, the number of trials plays a key role in infinite-horizon GUMDPs and the expected performance of a given policy depends, in general, on the number of trials. We consider both discounted and average GUMDPs, where the objective function depends, respectively, on discounted and average frequencies of visitation of state-action pairs. First, we study policy evaluation under discounted GUMDPs, proving lower and upper bounds on the mismatch between the finite and…
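
To make the dependence on the number of trials concrete, the following is a minimal sketch on a toy problem that is not taken from the paper. It assumes a chain with two absorbing states, where the initial state is 0 with probability `p` and 1 otherwise, so every trajectory's discounted visitation frequency is a one-hot vector regardless of the discount factor. The function names, the linear objective, and the squared-norm objective are all hypothetical choices made for illustration. The sketch computes the exact expected objective value when the empirical visitation frequency is averaged over `K` trials.

```python
from math import comb


def expected_finite_trial_value(f, K, p=0.5):
    """Exact E[f(d_hat_K)] for a toy chain with two absorbing states.

    Assumption (not from the paper): the initial state is 0 with
    probability p and 1 otherwise, and both states are absorbing, so each
    trajectory's discounted occupancy is a one-hot vector.
    """
    total = 0.0
    for n in range(K + 1):  # n = number of trials that start in state 0
        prob = comb(K, n) * p**n * (1 - p) ** (K - n)
        d_hat = (n / K, 1 - n / K)  # empirical occupancy averaged over K trials
        total += prob * f(d_hat)
    return total


def linear(d):
    # Standard MDP-style objective: linear in the occupancy (hypothetical weights).
    return 1.0 * d[0] + 3.0 * d[1]


def squared_norm(d):
    # Hypothetical convex general-utility objective.
    return d[0] ** 2 + d[1] ** 2


if __name__ == "__main__":
    for K in (1, 2, 10, 100):
        print(K,
              expected_finite_trial_value(linear, K),
              expected_finite_trial_value(squared_norm, K))
```

In this toy case the linear objective has the same expected value for every `K`, while the squared-norm objective works out to 0.5 + 0.5/K, which only approaches the value at the expected occupancy (0.5) as `K` grows. This is just Jensen's inequality applied to a nonlinear function of an empirical average, and it is meant as intuition for the claim in the abstract rather than a reproduction of the paper's analysis.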

Videos

The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes (SlidesLive)

Taxonomy

Topics: Simulation Techniques and Applications · Bayesian Modeling and Causal Inference

Methods: Sparse Evolutionary Training