Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement

Nicholas S. Kersting; Vittorio Castelli; Chieh Ting Yeh; Xinzhu Wang; Saad Taame

arXiv:2605.05103 · cs.CL · May 12, 2026

TL;DR

This paper introduces Concept Fields, a method that estimates a local drift field in sentence-embedding space for a text corpus and uses it to detect hallucination and novelty. The approach is evaluated in two large-scale settings.

Contribution

The paper presents Concept Fields and a supporting Vector Sequence Database (VSDB), which together enable black-box, corpus-attributable, and interpretable scoring of groundedness and novelty in text.

Findings

01. Effective hallucination detection on regulatory texts.

02. Strong performance in novelty detection over literary corpora.

03. Cross-domain stability of the deviation score across datasets.

Abstract

We introduce the Concept Field of a text corpus: a local drift field with pointwise uncertainty, estimated in sentence-embedding space from the deltas between consecutive sentences. Given a candidate sentence transition, we score its agreement with the field by $\zeta$, the mean absolute z-distance between the observed delta and the field's local Gaussian estimate. The score is black-box (no model internals), corpus-attributable (every score traces to nearby corpus sentences), and admits a probabilistically motivated interpretation under a local Gaussian approximation. We support the computation with the introduction of a Vector Sequence Database (VSDB) that stores embeddings together with sequence-position and next-delta metadata. We evaluate this approach on two large-scale settings: hallucination-style groundedness detection over the U.S. Code of Federal…
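
The scoring idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the k-nearest-neighbour lookup, the Euclidean distance, the diagonal (per-dimension) Gaussian, and the names `build_records`, `zeta`, `k`, and `eps` are all assumptions made here for illustration, and random vectors stand in for real sentence embeddings. The sketch stores each corpus embedding with its delta to the next sentence (loosely mirroring the VSDB description), estimates a local mean and standard deviation of deltas near a query point, and returns the mean absolute z-distance of an observed delta.

```python
import numpy as np

def build_records(embeddings):
    """Pair each embedding with the delta to its successor (hypothetical VSDB-like rows).

    Row i holds the embedding at sequence position i and its next-delta;
    the final sentence has no successor and is left out.
    """
    E = np.asarray(embeddings, dtype=float)
    next_deltas = E[1:] - E[:-1]
    return E[:-1], next_deltas

def zeta(points, next_deltas, x, observed_delta, k=5, eps=1e-8):
    """Mean absolute z-distance of observed_delta from a local Gaussian estimate.

    Assumptions (not from the paper): the local field at x is estimated from
    the k nearest stored points by Euclidean distance, with an independent
    Gaussian per embedding dimension.
    """
    dist = np.linalg.norm(points - x, axis=1)
    idx = np.argsort(dist)[:k]                # nearest corpus positions
    mu = next_deltas[idx].mean(axis=0)        # local drift estimate
    sigma = next_deltas[idx].std(axis=0) + eps  # local uncertainty
    z = np.abs(observed_delta - mu) / sigma
    return float(z.mean()), idx               # idx gives corpus attribution

# Toy corpus: a noisy walk with constant drift in 4 dimensions.
rng = np.random.default_rng(0)
drift = np.ones(4)
steps = drift + 0.1 * rng.standard_normal((200, 4))
corpus = np.cumsum(steps, axis=0)

points, next_deltas = build_records(corpus)
query = points[100]

zeta_on, neighbors = zeta(points, next_deltas, query, drift)    # follows the field
zeta_off, _ = zeta(points, next_deltas, query, -drift)          # opposes the field
print(zeta_on < zeta_off)
```

In this toy setting a transition that follows the local drift scores lower than one that opposes it, and the returned neighbour indices show which corpus positions produced the estimate, which is the sense in which such a score can be traced back to nearby corpus sentences.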
