Stage-Audit: Auditable Source-Frontier Discovery for Cross-Wiki Tables

Chen Shen

arXiv:2605.20478 · cs.CL · May 21, 2026

TL;DR

Stage-Audit separates curator and auditor write rights, gates every row on a source citation, and applies a 12-check audit taxonomy to improve the accuracy and traceability of source citations in LLM-curated cross-Wiki tables.

Contribution

It presents an audit mechanism that raises source-frontier precision from 0.356 to 0.505 and F1 from 0.334 to 0.451 over a vanilla LLM curator on a 51-instance Seed2Frontier evaluation set.

Findings

01 Source-frontier precision improved from 0.356 to 0.505 (+42% relative)

02 F1 score increased from 0.334 to 0.451 (+35% relative)

03 Explicit per-row source traceability is maintained
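
The percentages above are relative gains over the baseline, not absolute point differences. A minimal sketch of that arithmetic follows; the helper name `relative_gain` is illustrative and not taken from the paper.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Return the improvement as a fraction of the baseline value."""
    return (improved - baseline) / baseline


# Scores reported in the findings: vanilla LLM curator vs. Stage-Audit.
precision_gain = relative_gain(0.356, 0.505)
f1_gain = relative_gain(0.334, 0.451)

print(f"precision: {precision_gain:.1%}")  # precision: 41.9%
print(f"F1:        {f1_gain:.1%}")         # F1:        35.0%
```

The absolute differences are 0.149 for precision and 0.117 for F1; dividing by the baselines gives the rounded +42% and +35% figures.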

Abstract

LLM-curated tables can appear source-grounded while containing unsupported rows: the curator may recall entries from parametric memory and retroactively attach page-level citations that are not the actual source. We study this hazard in Seed2Frontier discovery: the task of finding complement Wikipedia pages from a seed page to assemble a structured table. Stage-Audit addresses it with disjoint curator-auditor write rights, a row-level source-citation gate, and a 12-check audit taxonomy over keys, schema, source roles, cardinality, and scope. On a curated 51-instance Seed2Frontier evaluation set spanning 15 top-level domains, Stage-Audit improves source-frontier precision over a vanilla LLM curator from 0.356 to 0.505 (+42% relative) and F1 from 0.334 to 0.451 (+35%), while maintaining explicit per-row source traceability. The vanilla-LLM-vs-Stage-Audit comparison isolates the policy…
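
To make the abstract's two central ideas concrete, here is a rough sketch of a row-level citation gate and a set-based source-frontier score. Everything in it is an assumption for illustration: the abstract does not define the gate's rule or the exact metric, so the sketch assumes a row passes only if it cites a page that was actually fetched, and that precision, recall, and F1 are computed over sets of page titles. The names `Row`, `citation_gate`, and `frontier_scores` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    key: str
    source_page: Optional[str]  # page title the curator cites for this row


def citation_gate(rows, fetched_pages):
    """Split rows into (accepted, rejected).

    Assumed rule, not the paper's: a row is accepted only if it cites
    a page that is in the set of pages actually fetched.
    """
    accepted, rejected = [], []
    for row in rows:
        if row.source_page is not None and row.source_page in fetched_pages:
            accepted.append(row)
        else:
            rejected.append(row)
    return accepted, rejected


def frontier_scores(predicted, gold):
    """Set-based precision, recall, and F1 over page titles (assumed metric)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: row "c" has no citation, row "d" cites a page never fetched.
rows = [
    Row("a", "Page A"),
    Row("b", "Page B"),
    Row("c", None),
    Row("d", "Page Z"),
]
fetched = {"Page A", "Page B", "Page C"}

accepted, rejected = citation_gate(rows, fetched)
predicted_frontier = {row.source_page for row in accepted}
gold_frontier = {"Page A", "Page C"}

precision, recall, f1 = frontier_scores(predicted_frontier, gold_frontier)
print([row.key for row in accepted])  # ['a', 'b']
print(precision, recall, f1)          # 0.5 0.5 0.5
```

In this toy case the gate drops the uncited row and the row citing an unfetched page, which is the failure mode the abstract describes: entries recalled from parametric memory with citations attached after the fact.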
