Dynamic Context Evolution for Scalable Synthetic Data Generation
Ryan Lingo, Rajeev Chhajer

TL;DR
This paper introduces Dynamic Context Evolution (DCE), a framework that mitigates output repetition in large language models during batch prompting by using self-assessment, semantic memory, and adaptive prompt updates.
Contribution
DCE provides a principled approach combining filtering, memory, and prompt adaptation to enhance diversity and conceptual richness in synthetic data generation.
Findings
DCE reduces the mode-collapse rate to 0%, compared with 5.6% for naive prompting.
DCE consistently yields more diverse output, spanning 17-18 semantic clusters versus 2-17 for naive prompting.
Results are validated across multiple domains, models, and sensitivity settings.
Abstract
Large language models produce repetitive output when prompted independently across many batches, a phenomenon we term cross-batch mode collapse: the progressive loss of output diversity when a language model is prompted repeatedly without access to its prior generations. Practitioners have long mitigated this with ad hoc deduplication and seed rotation, but no principled framework exists. We introduce Dynamic Context Evolution (DCE), comprising three mechanisms: (1) verbalized tail sampling, which filters high-probability candidates via model self-assessment (the model labels each idea with a guess about how obvious it is, and obvious ideas are discarded); (2) semantic memory, which maintains a persistent embedding index to reject near-duplicates across batches; and (3) adaptive prompt evolution, which reconstructs the generation prompt each batch using memory state and rotating…
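A minimal Python sketch of how the three mechanisms might compose into one per-batch loop. This is an illustration under stated assumptions, not the authors' implementation. The bag-of-words `embed` stands in for a real sentence-embedding model. The 0-to-1 obviousness score and both thresholds are hypothetical. `fake_generate` replaces an actual LLM call. The abstract is cut off before saying what the prompt rotates, so this sketch evolves the prompt from memory state only.

```python
import math
import re
from collections import Counter

# Hypothetical toy embedding: bag-of-words term counts. A real system would
# likely use a sentence-embedding model; this keeps the sketch dependency-free.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticMemory:
    """Persistent store of accepted ideas; rejects near-duplicates across batches."""
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # assumed similarity cutoff, not from the paper
        self.items: list[tuple[str, Counter]] = []

    def is_novel(self, text: str) -> bool:
        v = embed(text)
        return all(cosine(v, u) < self.threshold for _, u in self.items)

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

def tail_filter(candidates, max_obviousness: float = 0.5):
    """Verbalized tail sampling (assumed form): drop ideas the model rated as obvious."""
    return [idea for idea, obviousness in candidates if obviousness <= max_obviousness]

def build_prompt(task: str, memory: SemanticMemory, k: int = 5) -> str:
    """Adaptive prompt: rebuilt every batch from the k most recent accepted ideas."""
    recent = [text for text, _ in memory.items[-k:]]
    avoid = "\n".join(f"- {t}" for t in recent) or "- (none yet)"
    return (f"{task}\nAvoid repeating these earlier ideas:\n{avoid}\n"
            "For each new idea, also state how obvious it is (0 to 1).")

def run_dce(task, generate, n_batches, memory=None):
    if memory is None:
        memory = SemanticMemory()
    for batch in range(n_batches):
        prompt = build_prompt(task, memory)
        for idea in tail_filter(generate(prompt, batch)):
            if memory.is_novel(idea):  # checked against all earlier batches
                memory.add(idea)
    return [text for text, _ in memory.items]

# Hypothetical stand-in for an LLM: returns (idea, self-rated obviousness) pairs.
BATCHES = [
    [("solar panels on rooftops", 0.9), ("algae biofuel from wastewater", 0.3)],
    [("biofuel from wastewater algae", 0.2), ("kites that harvest high-altitude wind", 0.4)],
]

def fake_generate(prompt, batch):
    return BATCHES[batch]

ideas = run_dce("Propose renewable energy ideas.", fake_generate, n_batches=2)
print(ideas)  # → ['algae biofuel from wastewater', 'kites that harvest high-altitude wind']
```

Here the obvious rooftop-solar idea is removed by the self-assessment filter. The reworded algae idea in batch 2 is rejected by memory even though it never appears twice within a single batch. In practice the linear scan over memory would probably be replaced by an approximate nearest-neighbor index.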
