If It's Nice, Do It Twice: We Should Try Iterative Corpus Curation

Robin Young

arXiv:2501.15280·cs.AI·February 4, 2026

Open Access

TL;DR

This paper proposes iterative corpus curation: a model trained on filtered data filters the corpus again, and repeating the cycle yields progressively safer datasets and models. The proposal is supported by a theoretical convergence analysis and a discussion of practical implications.

Contribution

The paper introduces an iterative corpus-filtering framework, gives a theoretical analysis showing convergence to a self-consistent, safer training corpus, and argues that the framework supports scalable oversight and interpretability research.

Findings

01

Iterative filtering reduces harmful content in training data.

02

The process converges to a self-consistent corpus under certain conditions.

03

Even a single iteration yields large-scale, human-readable preference annotations over documents.

Abstract

Recent work demonstrates that filtering harmful content from pretraining data improves model safety without degrading capabilities. We propose a natural extension: do it again. A model trained on filtered data can filter the corpus further; training on this cleaner corpus produces an even cleaner model. We provide theoretical analysis showing this process converges to a self-consistent corpus where the model trained on it approves of its own training data. Even under the weak assumption of constant filter quality, iteration yields decay in harmful content. We argue this framework offers a novel form of scalable oversight. While model internals are opaque, the resulting corpus is human-auditable. Even a single iteration produces large-scale preference annotations over documents, potentially valuable for interpretability research. We derive bounds on capability-safety tradeoffs and…
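
The train-filter-retrain loop described in the abstract can be sketched as a fixed-point iteration. This is a toy illustration under stated assumptions, not the paper's method: a document is represented only by an integer harm score, and the `train` and `approves` functions, the `margin` value, and the name `iterate_curation` are all hypothetical stand-ins for real training and filtering. The loop stops when the model trained on the current corpus rejects none of it, which is the self-consistency condition the abstract describes.

```python
from typing import Callable, List, Tuple

Doc = int  # toy stand-in: a document is just its harm score, 0..99 (assumption)


def train(corpus: List[Doc]) -> int:
    """Hypothetical 'training': the model's tolerance is the corpus mean plus a margin."""
    margin = 20  # assumed constant, chosen only for illustration
    return sum(corpus) // len(corpus) + margin


def approves(model: int, doc: Doc) -> bool:
    """Hypothetical filter: approve documents at or below the model's tolerance."""
    return doc <= model


def iterate_curation(
    corpus: List[Doc],
    train_fn: Callable[[List[Doc]], int],
    approve_fn: Callable[[int, Doc], bool],
    max_rounds: int = 50,
) -> Tuple[List[Doc], List[int]]:
    """Repeat train -> filter until the model approves its whole training corpus."""
    history = [len(corpus)]
    for _ in range(max_rounds):
        if not corpus:  # nothing left to train on
            break
        model = train_fn(corpus)
        filtered = [d for d in corpus if approve_fn(model, d)]
        if len(filtered) == len(corpus):  # nothing rejected: self-consistent corpus
            break
        corpus = filtered
        history.append(len(corpus))
    return corpus, history


final_corpus, sizes = iterate_curation(list(range(100)), train, approves)
print(sizes)  # [100, 70, 55, 48, 44, 42, 41]
```

In this toy setting each round removes fewer documents than the last, and the loop halts once the model trained on the remaining corpus approves every document in it. The per-round corpus sizes in `sizes` are also the kind of human-auditable record the abstract points to; whether a real model and filter behave this way is exactly what the paper's analysis addresses.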

Taxonomy

Topics: Game Theory and Applications · Economic theories and models · Computability, Logic, AI Algorithms