Measuring the Unmeasurable: Markov Chain Reliability for LLM Agents

Phat T. Tran-Truong; Xuan-Bach Le

arXiv:2604.24579 · cs.SE · April 28, 2026

TL;DR

This paper introduces TraceToChain, a pipeline that models LLM agent traces as Markov chains, enabling detailed reliability analysis, diagnostics, and uncertainty quantification beyond traditional scalar metrics.

Contribution

It presents a reproducible method for fitting agent execution traces to Markov chains with diagnostics, uncertainty estimates, and a unified success-time distribution framework.

Findings

01

TraceToChain accurately fits agent traces with high goodness-of-fit.

02

The approach unifies various reliability metrics into a single success-time distribution.

03

Empirical tests show close alignment between fitted models and observed data.
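The success-time distribution named in finding 02 can be illustrated on a toy absorbing chain. The sketch below is not the paper's implementation: the two transient states, their transition probabilities, and the function name `success_time_cdf` are all hypothetical. It propagates probability mass forward one step at a time and accumulates the mass absorbed into the success state, giving F(k) = P(success within k steps).

```python
def success_time_cdf(Q, r_success, start, horizon):
    """Return [F(1), ..., F(horizon)], where F(k) = P(success within k steps).

    Q[i][j]      : probability of moving from transient state i to transient state j
    r_success[i] : probability of moving from transient state i straight to success
    """
    n = len(Q)
    mass = [0.0] * n          # probability of sitting in each transient state
    mass[start] = 1.0
    reached = 0.0             # cumulative probability of having succeeded
    cdf = []
    for _ in range(horizon):
        # Mass absorbed into the success state on this step.
        reached += sum(mass[i] * r_success[i] for i in range(n))
        # Mass that stays among the transient states.
        mass = [sum(mass[i] * Q[i][j] for i in range(n)) for j in range(n)]
        cdf.append(reached)
    return cdf


# Hypothetical toy chain: state 0 = "plan", state 1 = "act".
# Whatever probability is missing from a row goes to the failure state.
Q = [[0.0, 0.9],
     [0.3, 0.0]]
r_success = [0.0, 0.6]

cdf = success_time_cdf(Q, r_success, start=0, horizon=50)
print(cdf[:4])
```

In this toy chain the curve flattens at the eventual success probability, which is below 1 because some mass is absorbed by failure. How the paper maps pass@k, pass^k, and the RDC onto this distribution is not stated in this summary, so the sketch stops at the distribution itself.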

Abstract

Large language model (LLM) agents increasingly operate as sequential software systems, but their reliability is often summarized by scalar benchmark metrics. Metrics such as pass@$k$, pass$^k$, and the reliability decay curve (RDC) are useful summaries, but they do not identify the success-time distribution being estimated, test whether traces support that distribution, or quantify finite-trace uncertainty. We present TraceToChain, a reproducible pipeline that fits agent execution traces to an absorbing discrete-time Markov chain (DTMC), $\hat{M} = (\hat{Q}, \hat{R}_{\oplus}, \hat{R}_{\ominus})$, with explicit diagnostics and uncertainty. The pipeline builds an automatic cluster taxonomy, estimates transitions with Laplace-smoothed maximum-likelihood estimation (MLE), checks fit with a composite Akaike information criterion (AIC) and Kolmogorov-Smirnov (KS) goodness-of-fit certificate,…
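As a rough illustration of the Laplace-smoothed MLE step mentioned in the abstract, the sketch below counts observed transitions in a few traces and adds a pseudo-count to every cell before normalizing each row. It is an assumption-laden toy, not the TraceToChain code: the state labels, the traces, the smoothing constant `alpha=1.0`, and the function name `fit_absorbing_chain` are hypothetical, and the states are given by hand rather than derived from a cluster taxonomy.

```python
from collections import Counter


def fit_absorbing_chain(traces, transient, absorbing, alpha=1.0):
    """Laplace-smoothed MLE of one-step transition probabilities.

    traces    : list of state-label sequences; each is assumed to end in an
                absorbing state and to contain no absorbing state before that
    transient : labels of the transient states
    absorbing : labels of the absorbing states (e.g. success and failure)
    Returns {source: {target: probability}} for every transient source.
    """
    targets = list(transient) + list(absorbing)
    counts = {s: Counter() for s in transient}
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            counts[src][dst] += 1
    chain = {}
    for s in transient:
        # Add alpha to every cell so unseen transitions keep nonzero probability.
        total = sum(counts[s].values()) + alpha * len(targets)
        chain[s] = {t: (counts[s][t] + alpha) / total for t in targets}
    return chain


# Hypothetical traces over two transient states and two absorbing outcomes.
traces = [
    ["plan", "act", "success"],
    ["plan", "act", "plan", "act", "success"],
    ["plan", "act", "failure"],
]
chain = fit_absorbing_chain(traces, ["plan", "act"], ["success", "failure"])
for src, row in chain.items():
    print(src, row)
```

Reading the abstract's notation, the columns of each row that point to transient states would presumably form $\hat{Q}$, and the columns pointing to the success and failure states would form $\hat{R}_{\oplus}$ and $\hat{R}_{\ominus}$.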
