Correcting Selection Bias in Sparse User Feedback for Large Language Model Quality Estimation: A Multi-Agent Hierarchical Bayesian Approach

Andrea Morandi; Mahesh Viswanathan

arXiv:2605.12177·cs.CL·May 13, 2026

TL;DR

This paper introduces a hierarchical Bayesian method to correct selection bias in user feedback for large language model quality estimation, improving accuracy without ground-truth labels.

Contribution

It proposes a three-agent hierarchical Bayesian pipeline that models topic- and sentiment-stratified selection bias, enabling more accurate estimation of system quality from biased user feedback.

Findings

01

Hierarchical Bayesian approach reduces bias by 4-13 percentage points.

02

Using priors on feedback channels improves bias correction accuracy.

03

Method achieves credible intervals that reliably cover true system quality.

Abstract

[Abridged] Production LLM deployments receive feedback from a non-random fraction of users: thumbs sit mostly in the tails of the satisfaction distribution, and a naive average over them can land 40-50 percentage points away from true system quality. We treat this as a topic- and sentiment-stratified selection-bias problem and propose a three-agent hierarchical Bayesian pipeline that does not require ground-truth labels on individual interactions. A Topic Clustering Agent partitions the stream via UMAP + HDBSCAN over text embeddings; a Bias Modeling Agent fits a two-stage hierarchical Beta-Binomial under NUTS, inferring per-topic selection rates $s_{c}$ and quality $q_{c}$ with partial pooling; a Synthesis Agent reweights $q_{c}$ by true topic prevalence $\hat{\pi}_{c} = n_{c} / N$ to report a bias-corrected aggregate posterior $\bar{Q} = \sum_{c} \hat{\pi}_{c} q_{c}$ with credible interval, plus drift…
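
The reweighting step in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the topic counts are hypothetical, the function names `naive_quality` and `corrected_quality` are invented for illustration, and the posterior draws of $q_{c}$ come from independent Beta(1 + up, 1 + down) posteriors rather than from the hierarchical Beta-Binomial fitted with NUTS. In particular, the stand-in has no partial pooling and does not model sentiment-dependent selection within a topic. It shows only how weighting per-topic quality by $\hat{\pi}_{c} = n_{c} / N$ differs from averaging over the thumbs that happened to be submitted.

```python
import random


def naive_quality(topics):
    """Share of thumbs-up among all submitted thumbs, ignoring topic prevalence."""
    up = sum(u for _, u, _ in topics)
    down = sum(d for _, _, d in topics)
    return up / (up + down)


def corrected_quality(topics, draws=4000, seed=0):
    """Prevalence-weighted aggregate Q_bar = sum_c pi_hat_c * q_c.

    topics: list of (n_c, thumbs_up, thumbs_down), one tuple per topic cluster.
    Returns the posterior mean of Q_bar and an approximate 95% credible interval.
    """
    rng = random.Random(seed)
    total = sum(n for n, _, _ in topics)
    weights = [n / total for n, _, _ in topics]  # pi_hat_c = n_c / N

    samples = []
    for _ in range(draws):
        # Assumption: independent Beta posteriors stand in for the paper's
        # hierarchical model; they ignore pooling and within-topic selection.
        q = [rng.betavariate(1 + up, 1 + down) for _, up, down in topics]
        samples.append(sum(w * qc for w, qc in zip(weights, q)))

    samples.sort()
    mean = sum(samples) / draws
    lower = samples[int(0.025 * draws)]
    upper = samples[int(0.975 * draws) - 1]
    return mean, (lower, upper)


# Hypothetical stream: (interactions n_c, thumbs up, thumbs down) per topic.
# The large topic is mostly satisfied but rarely votes; small topics vote often.
topics = [(8000, 40, 10), (1500, 5, 45), (500, 2, 48)]

naive = naive_quality(topics)
mean, (lower, upper) = corrected_quality(topics)
print(f"naive={naive:.3f} corrected={mean:.3f} interval=({lower:.3f}, {upper:.3f})")
```

With these made-up counts the naive average is 47/150, about 0.31, because the two small, dissatisfied topics contribute two thirds of the thumbs. Weighting by prevalence moves the estimate to roughly 0.65, since the large topic accounts for 80% of interactions.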
