Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse
Jingwei Chen

TL;DR
This paper introduces ERBP, an information-geometric framework that unifies the understanding of model collapse in self-referential learning and proposes a stabilizing entropy reservoir to prevent it.
Contribution
ERBP provides a theoretical foundation and practical guidelines for stabilizing self-referential models by controlling entropy flux, unifying various heuristics under a single principle.
Findings
Entropy decay leads to collapse in self-referential models.
Introducing an entropy reservoir stabilizes the models.
Experimental results confirm theoretical predictions across multiple domains.
Abstract
Self-referential learning, training a model on data it generated itself, promises boundless scalability but chronically suffers from model collapse: language models degenerate into repetitive text, GANs drop modes, and reinforcement-learning policies over-exploit. Although practitioners employ ad hoc fixes such as real-data mixing, entropy bonuses, knowledge distillation, or retrieval-augmented generation, a single principle that explains both the failure mode and the success of these fixes has remained elusive. We present Entropy-Reservoir Bregman Projection (ERBP), an information-geometric framework that unifies these phenomena. We model the closed loop as a stochastic Bregman projection sequence in distribution space. Without external coupling, finite-sample noise forces the system to project onto an ever-shrinking empirical support, causing exponential entropy decay and eventual…
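The closed-loop mechanism described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's experimental setup: the model is a plain categorical distribution, refitting is maximum likelihood (for an unrestricted categorical family this is the KL projection onto the empirical distribution of the samples), and the reservoir is assumed to be a fixed uniform distribution mixed in with weight `lam`. The function name `self_train` and all parameter values are hypothetical.

```python
import math
import random


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


def self_train(k=20, n=50, steps=500, lam=0.0, seed=0):
    """Toy self-referential loop on a k-category distribution.

    Each step draws n samples from the current model, refits by maximum
    likelihood (the empirical frequencies), then mixes in a fixed uniform
    reservoir with weight lam. The uniform reservoir is an assumption made
    for illustration only. Returns the entropy after every step.
    """
    rng = random.Random(seed)
    p = [1.0 / k] * k
    reservoir = [1.0 / k] * k  # assumed high-entropy external source
    history = [entropy(p)]
    for _ in range(steps):
        # Sample a finite dataset from the current model.
        samples = rng.choices(range(k), weights=p, k=n)
        counts = [0] * k
        for s in samples:
            counts[s] += 1
        empirical = [c / n for c in counts]
        # lam = 0: pure closed loop; lam > 0: coupled to the reservoir.
        p = [(1 - lam) * e + lam * r for e, r in zip(empirical, reservoir)]
        history.append(entropy(p))
    return history


closed = self_train(lam=0.0)   # no external coupling
coupled = self_train(lam=0.1)  # reservoir-coupled
print(f"closed loop: start={closed[0]:.3f} end={closed[-1]:.3f}")
print(f"coupled:     start={coupled[0]:.3f} end={coupled[-1]:.3f}")
```

In the closed loop a category that receives zero samples can never return, so the support only shrinks and the entropy drifts toward zero. With any `lam > 0`, every category keeps probability at least `lam / k`, so the entropy is bounded away from zero in this toy setting.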
Peer Reviews
Decision: Submitted to ICLR 2026
The paper has a few strengths: 1. The core idea of modeling self-referential learning as a Bregman projection dynamical system is elegant and provides a powerful new language for analyzing these systems. 2. The "Entropy Reservoir" concept is insightful, successfully connecting disparate, seemingly ad hoc techniques (like data mixing, RLHF, and label smoothing) under a single, coherent mathematical principle. 3. The paper provides theoretical proofs for its claims, formalizing the conditions for
The paper's primary, and critical, weakness is a failure to substantiate its broad claims with empirical evidence. The experimental section is critically incomplete: 1. The abstract explicitly claims validation across large-language-model self-training, Soft Actor-Critic in reinforcement learning, and GAN optimisation. However, Section 6 directly contradicts this, stating that the work on "LLM fine-tuning and reinforcement learning" is "planned future work". The GAN experiment is never mentione
1) The paper provides a unifying perspective that cleanly ties together disparate "folk remedies" via the $\lambda$-coupled reservoir. 2) The proposed method provides simple, quantitative conditions that are easy to reason about and potentially monitor during training. 3) The results are demonstrated with a breadth across various divergences beyond just KL.
1) The use of terminology is a little bit confusing. As a researcher from the generative models community, the term that I'm more familiar with is "mode collapse" instead of "model collapse". I originally thought the authors wanted to propose a new definition that describes a different class of model failure case, but according to the paper it seems like the authors are just describing "mode collapse". Please correct me if I'm wrong. 2) While the proposed framework can be very promising, and in
1. The novelty is ok. Recasting self-training dynamics as Bregman projection processes is novel and potentially unifying for understanding entropy decay across domains. 2. This paper is clearly organized, with some tables summarizing conceptual mappings.
1. Mathematical inconsistency / over-claim in theoretical findings, e.g., Theorem 1. 2. Lack of proof detail. For example, in the proof of Theorem 1, what does "martingale convergence plus the support argument of the main text finishes the proof." mean?
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
