Designing Service Systems from Textual Evidence

Ruicheng Ao; Hongyu Chen; Siyang Gao; Hanwei Li; David Simchi-Levi

arXiv:2603.10400 · cs.LG · March 12, 2026

Open Access

TL;DR

This paper develops a method for identifying the best service system configuration from biased automated (LLM) evaluations plus a small number of human audits. A new estimator and a sequential decision algorithm reduce audit costs while keeping high confidence in the selected configuration.

Contribution

The paper introduces a new estimator and the PP-LUCB algorithm for sequentially selecting among service configurations; together they combine biased LLM scores with selectively targeted human audits.
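The summary does not spell out how the estimator or PP-LUCB work, so the following is only a minimal sketch of the general idea under stated assumptions. Reading "PP" as "prediction-powered" is our assumption. The estimator here follows the generic prediction-powered-inference pattern: the mean LLM score, corrected by the mean human-minus-LLM residual on a separate random audit sample. The selection loop is an LUCB-style rule that spends new audits only on the current leader and its strongest challenger. All names (`debiased_mean`, `lucb_select`, `make_arm`), the fixed multiplier `z`, and the simulated score distributions are hypothetical and not taken from the paper.

```python
import math
import random
from statistics import mean, variance


def debiased_mean(llm_scores, audited):
    """Prediction-powered-style estimate of an arm's mean human-rated quality.

    llm_scores: cheap, possibly biased LLM judge scores on unaudited instances.
    audited: (llm_score, human_score) pairs for a separate random sample of
             instances that a human expert also reviewed.
    Returns (estimate, approximate standard error).
    """
    rectifier = [h - s for s, h in audited]  # observed LLM bias on audited items
    estimate = mean(llm_scores) + mean(rectifier)
    # The two samples are disjoint, so the variances of the two means add.
    se = math.sqrt(variance(llm_scores) / len(llm_scores)
                   + variance(rectifier) / len(rectifier))
    return estimate, se


def lucb_select(arms, z=3.0, n_llm=200, init_audits=5, max_audits=2000):
    """LUCB-style best-arm identification that audits only ambiguous arms.

    arms: callables; arms[k]() returns one (llm_score, human_score) draw.
    z: fixed confidence multiplier (a simplification; LUCB proper uses a
       radius that grows with log(time / delta)).
    Returns (index of the selected arm, total human audits used).
    """
    llm = [[arm()[0] for _ in range(n_llm)] for arm in arms]  # cheap LLM scores only
    audits = [[arm() for _ in range(init_audits)] for arm in arms]
    used = init_audits * len(arms)
    while True:
        stats = [debiased_mean(llm[k], audits[k]) for k in range(len(arms))]
        best = max(range(len(arms)), key=lambda k: stats[k][0])
        others = [k for k in range(len(arms)) if k != best]
        challenger = max(others, key=lambda k: stats[k][0] + z * stats[k][1])
        lcb_best = stats[best][0] - z * stats[best][1]
        ucb_challenger = stats[challenger][0] + z * stats[challenger][1]
        if lcb_best > ucb_challenger or used >= max_audits:
            return best, used
        for k in (best, challenger):  # spend audits only on the ambiguous pair
            audits[k].append(arms[k]())
            used += 1


def make_arm(true_mean, llm_bias, rng):
    """Hypothetical simulator: human score ~ N(true_mean, 0.5^2); the LLM
    judge adds an arm-specific bias plus N(0, 0.3^2) noise."""
    def draw():
        human = rng.gauss(true_mean, 0.5)
        return human + llm_bias + rng.gauss(0.0, 0.3), human
    return draw


rng = random.Random(0)
arms = [make_arm(3.0, 0.8, rng),  # looks best to the LLM judge, is not
        make_arm(3.5, 0.0, rng),  # truly best
        make_arm(2.5, 0.5, rng)]
best, audits_used = lucb_select(arms)
print(f"selected arm {best} after {audits_used} human audits")
```

With these made-up numbers, the raw LLM means favor arm 0 (about 3.8 vs 3.5), while the bias-corrected estimates should single out arm 1. Restricting new audits to the leader and its highest-upper-bound challenger is the LUCB design choice that keeps the audit count low.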

Findings

01. Successfully identified the best model in all trials
02. Achieved a 90% reduction in audit costs
03. Proved theoretical guarantees for the estimator and the algorithm

Abstract

Designing service systems requires selecting among alternative configurations -- choosing the best chatbot variant, the optimal routing policy, or the most effective quality control procedure. In many service systems, the primary evidence of performance quality is textual -- customer support transcripts, complaint narratives, compliance review reports -- rather than the scalar measurements assumed by classical optimization methods. Large language models (LLMs) can read such textual evidence and produce standardized quality scores, but these automated judges exhibit systematic biases that vary across alternatives and evaluation instances. Human expert review remains accurate but costly. We study how to identify the best service configuration with high confidence while minimizing expensive human audits, given that automated evaluation is cheap but biased. We formalize this as a sequential…
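One generic way to write down the problem described in the abstract is shown below. The notation is ours and not the paper's. For configuration k, H_k is a human expert's quality score on a random instance, S_k is the LLM judge's score on the same instance, μ_k is the mean human score, β_k is the average LLM bias, δ is the allowed error probability, τ is the stopping time, and A_τ is the number of human audits spent by then. The middle line shows why averaging LLM scores alone can pick the wrong configuration.

```latex
\begin{aligned}
&\mu_k = \mathbb{E}[H_k], \qquad \beta_k = \mathbb{E}[S_k - H_k], \qquad k^\star = \arg\max_{k \in \{1,\dots,K\}} \mu_k,\\
&\mathbb{E}[\bar S_k] = \mu_k + \beta_k \;\Longrightarrow\; \arg\max_k \mathbb{E}[\bar S_k] \neq k^\star \text{ is possible when } \beta_k \text{ differs across } k,\\
&\text{goal: choose } \tau \text{ and } \hat k_\tau \text{ with } \Pr(\hat k_\tau = k^\star) \ge 1 - \delta \text{ while minimizing } \mathbb{E}[A_\tau].
\end{aligned}
```

For instance, if μ_1 = 3.0 with β_1 = 0.8 and μ_2 = 3.5 with β_2 = 0, the LLM ranks configuration 1 first (3.8 vs 3.5) even though configuration 2 is better.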

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · AI in Service Interactions