Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement
Nicholas S. Kersting, Vittorio Castelli, Chieh Ting Yeh, Xinzhu Wang, Saad Taame

TL;DR
This paper introduces Concept Fields, a method for analyzing text corpora by estimating local drift fields in sentence-embedding space to detect hallucination and novelty, evaluated on two large-scale settings.
Contribution
The paper presents Concept Fields and a Vector Sequence Database, enabling black-box, corpus-attributable, and interpretable detection of groundedness and novelty in text.
Findings
Effective hallucination detection on regulatory texts.
Strong performance in novelty detection over literary corpora.
Cross-domain stability of the deviation score across datasets.
Abstract
We introduce the Concept Field of a text corpus: a local drift field with pointwise uncertainty, estimated in sentence-embedding space from the deltas between consecutive sentences. Given a candidate sentence transition, we score its agreement with the field by the mean absolute z-distance between the observed delta and the field's local Gaussian estimate. The score is black-box (no model internals), corpus-attributable (every score traces to nearby corpus sentences), and admits a probabilistically motivated interpretation under a local Gaussian approximation. We support the computation by introducing a Vector Sequence Database (VSDB) that stores embeddings together with sequence-position and next-delta metadata. We evaluate this approach on two large-scale settings: hallucination-style groundedness detection over the U.S. Code of Federal…
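The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the local field at a sentence is estimated from the next-deltas of its k nearest corpus sentences (Euclidean distance), with a per-dimension (diagonal) Gaussian. The names `local_field` and `deviation_score`, the parameters `k` and `eps`, and the toy 2-D "embeddings" are all hypothetical.

```python
import math
from statistics import fmean, pstdev

def local_field(query, corpus, k=3):
    """Estimate the local drift (mean, std per dimension) near `query`.

    Assumption: the field is estimated from the next-deltas of the k
    corpus sentences nearest to `query`. `corpus` is a list of
    embeddings in reading order.
    """
    anchors = corpus[:-1]  # the last sentence has no next-delta
    deltas = [[b - a for a, b in zip(corpus[i], corpus[i + 1])]
              for i in range(len(anchors))]
    nearest = sorted(range(len(anchors)),
                     key=lambda i: math.dist(query, anchors[i]))[:k]
    dims = range(len(query))
    mu = [fmean(deltas[i][d] for i in nearest) for d in dims]
    sigma = [pstdev([deltas[i][d] for i in nearest]) for d in dims]
    return mu, sigma, nearest  # `nearest` keeps the score corpus-attributable

def deviation_score(prev_vec, next_vec, corpus, k=3, eps=1e-6):
    """Mean absolute z-distance between the observed delta and the local field."""
    mu, sigma, _ = local_field(prev_vec, corpus, k)
    observed = [b - a for a, b in zip(prev_vec, next_vec)]
    # eps (hypothetical) guards against zero variance in a dimension
    return fmean(abs(o - m) / (s + eps)
                 for o, m, s in zip(observed, mu, sigma))

# Toy corpus: sentences drift roughly one unit along the first axis.
corpus = [[0.0, 0.0], [1.0, 0.1], [2.1, 0.0], [3.0, 0.1], [4.1, 0.0]]

grounded = deviation_score([2.0, 0.0], [3.0, 0.0], corpus)   # follows the drift
off_field = deviation_score([2.0, 0.0], [2.0, 3.0], corpus)  # jumps sideways
print(grounded, off_field)
```

In this toy setting the transition that follows the corpus drift scores lower than the one that departs from it, and the returned neighbor indices show which corpus sentences produced the estimate.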
