DataDignity: Training Data Attribution for Large Language Models

Xiaomin Li; Andrzej Banburski-Fahey; Jaron Lanier

arXiv:2605.05687·cs.AI·May 8, 2026

TL;DR

This paper introduces FakeWiki, a controlled benchmark for evaluating provenance attribution in language models, and proposes two methods: SteerFuse, which is training-free, and ScoringModel, a supervised contrastive ranker. ScoringModel raises mean Recall@10 from 35.0 to 52.2 across models.

Contribution

It presents FakeWiki, a new controlled benchmark for provenance attribution, and develops ScoringModel, a supervised contrastive ranker that outperforms baselines in identifying source documents.

Findings

01

ScoringModel improves mean Recall@10 from 35.0 to 52.2 across models.

02

SteerFuse, a training-free activation-steering retrieval-fusion method, performs competitively and complements retrieval.

03

ScoringModel enhances performance on jailbreak-inspired queries by 15.7 points.
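
The findings above are reported in Recall@10. As a minimal sketch of how such a metric is commonly computed, the snippet below assumes each query has exactly one ground-truth source document and reports the percentage of queries whose source appears among the top k ranked candidates. The function name `recall_at_k` and the single-source assumption are illustrative choices, not details taken from the paper.

```python
from typing import Sequence


def recall_at_k(
    rankings: Sequence[Sequence[str]],
    gold_ids: Sequence[str],
    k: int = 10,
) -> float:
    """Percentage of queries whose gold document is in the top-k ranking.

    Assumption (not from the paper): one gold source document per query.
    """
    if len(rankings) != len(gold_ids):
        raise ValueError("rankings and gold_ids must have the same length")
    if not gold_ids:
        return 0.0
    # A query counts as a hit when its gold id is among the first k ranked ids.
    hits = sum(1 for ranked, gold in zip(rankings, gold_ids) if gold in ranked[:k])
    return 100.0 * hits / len(gold_ids)


# Toy example with two queries: the first gold id is ranked second,
# the second gold id is missing from its ranking.
example_rankings = [["doc_a", "doc_b"], ["doc_c", "doc_d"]]
example_gold = ["doc_b", "doc_x"]
example_score = recall_at_k(example_rankings, example_gold, k=2)
```

Under this reading, a value such as 52.2 would mean that the correct source document appears in the top 10 for roughly half of the queries.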

Abstract

Auditing language-model outputs often requires more than judging correctness: an auditor may need to identify which source document most likely supports the knowledge expressed in a response. We study this as pinpoint provenance: given a prompt, a target-model response, and a candidate corpus, rank the documents that best support the response. We introduce FakeWiki, a controlled benchmark of 3,537 fabricated Wikipedia-style articles designed to preserve ground-truth provenance while weakening lexical shortcuts. FakeWiki includes QA probes, source-preserving paraphrases, retro-generated variants, hard anti-documents that remain topically similar while removing answer-critical facts, and five query conditions: clean prompting plus four jailbreak-inspired transformations. We evaluate seven retrieval baselines, a training-free activation-steering retrieval-fusion method, SteerFuse, and a…
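
To make the task interface concrete, here is a hedged sketch of pinpoint provenance as a ranking problem: a function takes a prompt, a target-model response, and a candidate corpus, and returns document ids ordered by a support score. The scoring used here is a plain bag-of-words cosine similarity chosen only for illustration. It is not one of the paper's baselines, SteerFuse, or ScoringModel, and since FakeWiki is designed to weaken lexical shortcuts, a scorer like this would be expected to perform poorly on it. The names `rank_documents`, `_tokens`, and `_cosine` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def _tokens(text: str) -> Counter:
    """Lowercased alphanumeric token counts (illustrative tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(prompt: str, response: str, corpus: Dict[str, str]) -> List[str]:
    """Rank candidate document ids by lexical similarity to prompt + response.

    Stand-in scorer only: a real system would replace _cosine with a
    stronger retrieval or learned scoring function.
    """
    query = _tokens(prompt + " " + response)
    scores = {doc_id: _cosine(query, _tokens(text)) for doc_id, text in corpus.items()}
    # Sort by descending score; break ties by id for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy corpus with made-up content, for illustration only.
toy_corpus = {
    "d1": "The river Velloran flows through Kest",
    "d2": "Cooking pasta requires boiling water",
}
toy_ranking = rank_documents("Where does the Velloran flow?", "Through Kest.", toy_corpus)
```

The returned list can be scored directly with a top-k metric such as Recall@10 against the known source document of each query.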
