Dynamic Context Evolution for Scalable Synthetic Data Generation

Ryan Lingo; Rajeev Chhajer

arXiv:2604.07147·cs.CL·April 9, 2026

TL;DR

This paper introduces Dynamic Context Evolution (DCE), a framework that mitigates output repetition in large language models during batch prompting by using self-assessment, semantic memory, and adaptive prompt updates.

Contribution

DCE provides a principled approach combining filtering, memory, and prompt adaptation to enhance diversity and conceptual richness in synthetic data generation.

Findings

1. DCE reduces the mode collapse rate to 0%, compared with 5.6% for naive prompting.

2. DCE output spans more clusters, and does so more consistently (17-18), than output from naive methods (2-17).

3. Results are validated across multiple domains, models, and sensitivity settings.

Abstract

Large language models produce repetitive output when prompted independently across many batches, a phenomenon we term cross-batch mode collapse: the progressive loss of output diversity when a language model is prompted repeatedly without access to its prior generations. Practitioners have long mitigated this with ad hoc deduplication and seed rotation, but no principled framework exists. We introduce Dynamic Context Evolution (DCE), comprising three mechanisms: (1) verbalized tail sampling (the model labels each idea with a guess about how obvious it is, and obvious ideas are discarded), which filters high-probability candidates via model self-assessment; (2) semantic memory, which maintains a persistent embedding index to reject near-duplicates across batches; and (3) adaptive prompt evolution, which reconstructs the generation prompt each batch using memory state and rotating…
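The three mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Every name here (`embed`, `SemanticMemory`, `filter_tail`, `build_prompt`, `run_dce`, `stub_generate`) and both thresholds are hypothetical. The bag-of-words `embed` stands in for a real embedding model, and `stub_generate` stands in for an LLM call that returns each idea with a self-rated obviousness score. The rotating component of prompt evolution is left out because the abstract text is truncated at that point.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: a real system would use an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticMemory:
    """Persistent index of accepted ideas, used to reject near-duplicates across batches."""

    def __init__(self, threshold=0.8):  # hypothetical similarity cutoff
        self.threshold = threshold
        self.items = []  # list of (text, vector) pairs

    def is_novel(self, text):
        vec = embed(text)
        return all(cosine(vec, stored) < self.threshold for _, stored in self.items)

    def add(self, text):
        self.items.append((text, embed(text)))


def filter_tail(candidates, max_obviousness=0.5):
    """Keep only ideas the model itself rated as not obvious (hypothetical 0-1 scale)."""
    return [text for text, obviousness in candidates if obviousness <= max_obviousness]


def build_prompt(base_task, memory, max_examples=5):
    """Rebuild the prompt each batch from the current memory state."""
    recent = [text for text, _ in memory.items[-max_examples:]]
    if not recent:
        return base_task
    avoid = "\n".join(f"- {text}" for text in recent)
    return f"{base_task}\nAvoid ideas similar to these earlier ones:\n{avoid}"


def run_dce(base_task, generate, n_batches, memory=None):
    """Run the filter -> memory check -> prompt rebuild loop for n_batches."""
    if memory is None:
        memory = SemanticMemory()
    for batch_idx in range(n_batches):
        prompt = build_prompt(base_task, memory)
        candidates = generate(prompt, batch_idx)  # list of (idea, obviousness)
        for text in filter_tail(candidates):
            if memory.is_novel(text):
                memory.add(text)
    return [text for text, _ in memory.items]


def stub_generate(prompt, batch_idx):
    """Stand-in for an LLM call; repeats one idea every batch to mimic collapse."""
    topics = ["tide pool surveys", "clock repair manuals", "alpine beekeeping logs"]
    return [
        ("a quiz app for learning vocabulary", 0.9),       # self-rated obvious
        ("a quiz app for learning vocabulary words", 0.3),  # repeated every batch
        (f"an archive of {topics[batch_idx % len(topics)]}", 0.2),
    ]
```

Calling `run_dce("List unusual product ideas.", stub_generate, n_batches=3)` drives the loop with the stub. In a real pipeline, `generate` would wrap a model call, and the similarity cutoff would need tuning against the chosen embedding model.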
