Real-Time Trustworthiness Scoring for LLM Structured Outputs and Data Extraction

Hui Wen Goh; Jonas Mueller

arXiv:2603.18014 · cs.CL · April 1, 2026

TL;DR

CONSTRUCT is a real-time trustworthiness scoring system for LLM structured outputs that identifies errors and helps prioritize human review without requiring labeled data or model modifications.

Contribution

The paper introduces a model-agnostic uncertainty estimator for structured outputs. It supports complex schemas (heterogeneous fields, nested JSON) and assigns a trust score to each field of an output as well as to the output as a whole.

Findings

01. CONSTRUCT outperforms existing techniques in error detection precision and recall.

02. It is applicable to black-box LLM APIs without logprobs or retraining.

03. The paper introduces one of the first public benchmarks for LLM structured output quality.
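
To make the first finding concrete, the sketch below shows how error detection precision and recall can be computed when outputs are flagged because their trust score falls below a threshold. It uses the standard definitions of the two metrics; the paper's exact evaluation protocol is not given in this summary, and the function name, threshold, scores, and error labels are all illustrative assumptions rather than results.

```python
def precision_recall(scores, has_error, threshold):
    """Precision and recall of flagging outputs whose trust score is below `threshold`.

    scores:    trust scores in [0, 1], one per output (toy values here)
    has_error: ground-truth booleans, True if the output contains an error
    """
    flagged = [s < threshold for s in scores]
    tp = sum(f and e for f, e in zip(flagged, has_error))          # flagged, truly wrong
    fp = sum(f and not e for f, e in zip(flagged, has_error))      # flagged, actually fine
    fn = sum((not f) and e for f, e in zip(flagged, has_error))    # missed errors
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data for illustration only, not taken from the paper's benchmark.
scores = [0.9, 0.2, 0.4, 0.8, 0.3]
has_error = [False, True, False, False, True]
precision, recall = precision_recall(scores, has_error, threshold=0.5)
```

A better scoring method pushes erroneous outputs toward low scores, which raises precision at a given recall level.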

Abstract

Structured Outputs from current LLMs exhibit sporadic errors, hindering enterprise AI deployment. We present CONSTRUCT, a real-time uncertainty estimator that scores the trustworthiness of LLM Structured Outputs. Lower-scoring outputs are more likely to contain errors, enabling automatic prioritization of limited human review bandwidth. CONSTRUCT additionally scores the trustworthiness of each field within a Structured Output, helping reviewers quickly identify which parts of the output are incorrect. Our method is suitable for any LLM (including black-box LLM APIs without logprobs), does not require labeled training data or custom model deployment, and supports complex Structured Outputs with heterogeneous fields and nested JSON schemas. We also introduce one of the first public LLM Structured Output benchmarks with reliable ground-truth values. Over this four-dataset benchmark,…
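
The review workflow the abstract describes can be sketched as follows: sort outputs by overall trust score so that a limited review budget goes to the least trusted ones, then use per-field scores to point reviewers at the suspect fields. This is a minimal sketch, not CONSTRUCT itself. The `ScoredOutput` type, the function names, the 0.5 threshold, and all score values are hypothetical, and the scores are assumed to have been produced upstream by some scorer.

```python
from dataclasses import dataclass


@dataclass
class ScoredOutput:
    doc_id: str
    fields: dict          # the extracted structured output
    field_scores: dict    # hypothetical per-field trust scores in [0, 1]
    overall_score: float  # hypothetical overall trust score in [0, 1]


def review_queue(outputs, budget):
    """Return the `budget` outputs with the lowest overall trust score."""
    return sorted(outputs, key=lambda o: o.overall_score)[:budget]


def suspect_fields(output, threshold=0.5):
    """List fields scoring below `threshold`, least trusted first."""
    low = [(name, s) for name, s in output.field_scores.items() if s < threshold]
    return [name for name, _ in sorted(low, key=lambda pair: pair[1])]


# Made-up extraction results with made-up scores, for illustration only.
outputs = [
    ScoredOutput("inv-1", {"total": 120.0, "vendor": "Acme"},
                 {"total": 0.95, "vendor": 0.90}, 0.92),
    ScoredOutput("inv-2", {"total": 75.5, "vendor": "Globex"},
                 {"total": 0.30, "vendor": 0.85}, 0.35),
    ScoredOutput("inv-3", {"total": 10.0, "vendor": "Initech"},
                 {"total": 0.80, "vendor": 0.40}, 0.45),
]

queue = review_queue(outputs, budget=2)
queue_ids = [o.doc_id for o in queue]
flagged = {o.doc_id: suspect_fields(o) for o in queue}
```

With a budget of two, the reviewer sees `inv-2` and `inv-3` first, with `total` and `vendor` highlighted respectively, while the high-scoring `inv-1` is left unreviewed.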
