The Necessity of Imperfection: Reversing Model Collapse via Simulating Cognitive Boundedness
Zhongjie Jiang

TL;DR
This paper introduces a cognitive simulation framework for text generation that deliberately incorporates human-like imperfections, improving model robustness and functional performance compared with training on statistically smoothed synthetic data.
Contribution
It proposes the Prompt-driven Cognitive Computing Framework (PMCSF) that models human cognitive processes to generate more realistic and effective synthetic data for language models.
Findings
Cognitive text closely matches human text with low divergence and high cognitive profile alignment.
Models trained on cognitively perturbed data show a significant reduction in risk in financial stress tests.
The approach offers a promising remedy for model collapse by prioritizing cognitive realism over statistical smoothness.
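The findings report "low divergence" between cognitive and human text without naming the measure. As an illustration only, and assuming a unigram Jensen-Shannon divergence (our choice, not necessarily the paper's metric), such a comparison could be computed as below. The helpers `token_distribution` and `js_divergence` are hypothetical names introduced here.

```python
from collections import Counter
import math


def token_distribution(text: str) -> Counter:
    # Assumption: lowercased whitespace tokenization stands in for a real tokenizer.
    return Counter(text.lower().split())


def js_divergence(p_counts: Counter, q_counts: Counter) -> float:
    """Jensen-Shannon divergence (base-2 logs, so bounded in [0, 1]) between
    two unigram distributions given as token counts."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        raise ValueError("both distributions must contain at least one token")

    js = 0.0
    for tok in set(p_counts) | set(q_counts):
        # Counter returns 0 for tokens absent from one side.
        p = p_counts[tok] / p_total
        q = q_counts[tok] / q_total
        m = 0.5 * (p + q)  # mixture distribution
        # Terms with zero probability contribute nothing (0 * log 0 := 0).
        if p > 0:
            js += 0.5 * p * math.log2(p / m)
        if q > 0:
            js += 0.5 * q * math.log2(q / m)
    return js


if __name__ == "__main__":
    human = token_distribution("well I think the market might fall, or maybe not")
    generated = token_distribution("I think the market might fall")
    print(round(js_divergence(human, generated), 3))
```

A value near 0 means the two token distributions nearly coincide; 1 is the maximum, reached when the vocabularies are disjoint. A real evaluation would use a proper tokenizer, much larger samples, and whatever divergence the paper actually specifies.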
Abstract
Although synthetic data is widely promoted as a remedy, its prevailing production paradigm, which optimizes for statistical smoothness, systematically removes the long-tail, cognitively grounded irregularities that characterize human text. Prolonged training on such statistically optimal but cognitively impoverished data accelerates model collapse. This paper proposes a paradigm shift: instead of imitating the surface properties of data, we simulate the cognitive processes that generate human text. We introduce the Prompt-driven Cognitive Computing Framework (PMCSF), whose core consists of a Cognitive State Decoder (CSD) that reverse-engineers unstructured text into structured cognitive vectors, and a Cognitive Text Encoder (CTE) that re-materializes these states into text enriched with human-typical imperfections via mathematically defined Cognitive Perturbation Operators. The…
