From Stochasticity to Signal: A Bayesian Latent State Model for Reliable Measurement with LLMs
Yichi Zhang, Ignacio Martinez

TL;DR
This paper introduces a Bayesian latent state model to quantify and improve the reliability of LLM-based classifications, addressing stochasticity-induced measurement errors in business and scientific contexts.
Contribution
It presents a formal Bayesian framework that jointly estimates error rates, true outcome probabilities, and intervention effects, applicable in semi-supervised and unsupervised settings.
Findings
The model accurately recovers the true parameters in simulations.
It outperforms existing methods in estimating population-level metrics.
It provides reliable insights from LLM outputs in a real-world case study.
Abstract
Large Language Models (LLMs) are increasingly used to automate classification tasks in business, such as analyzing customer satisfaction from text. However, the inherent stochasticity of LLMs can create measurement error when the outcome is considered deterministic. In practice, this problem is often ignored by relying on a single round of output, or addressed with ad hoc methods like majority voting. Such naive approaches fail to quantify uncertainty and can produce biased estimates of population-level metrics. In this paper, we propose a formal statistical solution: a Bayesian latent state model. Our model treats the true classification as a latent variable and the multiple LLM ratings as noisy measurements of this latent state. This framework jointly estimates LLM error rates, population-level outcome rates, individual-level probabilities of the…
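To make the latent-state idea concrete, here is a minimal sketch of how repeated LLM ratings can be combined into a posterior probability for one item. It is not the paper's model: it assumes a binary outcome, conditionally independent ratings, and error rates that are already known, whereas the abstract describes estimating those quantities jointly. The function names and the parameters `prior`, `fpr`, and `fnr` are hypothetical, chosen only for illustration.

```python
def posterior_positive(ratings, prior, fpr, fnr):
    """P(true label = 1 | ratings) for one item.

    Assumptions (illustrative, not taken from the paper):
      - ratings is a list of 0/1 labels from repeated LLM calls
      - ratings are independent given the true label
      - prior is the population rate of the positive class
      - fpr = P(rating = 1 | true = 0), fnr = P(rating = 0 | true = 1)
    """
    pos = sum(ratings)
    neg = len(ratings) - pos
    # Likelihood of the observed ratings under each latent state.
    like_if_one = (1 - fnr) ** pos * fnr ** neg
    like_if_zero = fpr ** pos * (1 - fpr) ** neg
    numerator = prior * like_if_one
    return numerator / (numerator + (1 - prior) * like_if_zero)


def majority_vote(ratings):
    """Ad hoc baseline: 1 if strictly more than half of the ratings are 1."""
    return int(2 * sum(ratings) > len(ratings))


if __name__ == "__main__":
    ratings = [1, 1, 0]
    print("majority vote:", majority_vote(ratings))
    print("posterior:", posterior_positive(ratings, prior=0.1, fpr=0.2, fnr=0.2))
```

With these assumed numbers, majority voting labels the item positive, while the posterior probability of a positive label is roughly 0.31, because the positive class is rare and two positive ratings out of three are only weak evidence. That gap is the kind of uncertainty a single label cannot express.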
