Experimental evidence of progressive ChatGPT models self-convergence
Konstantinos F. Xylogiannopoulos, Petros Xanthopoulos, Panagiotis Karampelas, Georgios A. Bakamitsos

TL;DR
This study provides empirical evidence that successive ChatGPT models produce increasingly similar outputs over time. This points to a self-convergence phenomenon that is likely caused by synthetic data entering their training datasets.
Contribution
It is the first longitudinal investigation showing how the output diversity of ChatGPT models diminishes across successive versions as a result of self-convergence.
Findings
Recent ChatGPT models show reduced output diversity over time.
Synthetic data infiltration in training datasets influences model convergence.
Output similarity increases among different ChatGPT versions.
Abstract
Large Language Models (LLMs) that undergo recursive training on synthetically generated data are susceptible to model collapse, a phenomenon marked by the generation of meaningless output. Existing research has examined this issue from either a theoretical or an empirical perspective, often focusing on a single model trained recursively on its own outputs. While prior studies have cautioned that LLM output quality may degrade under such conditions, no longitudinal investigation has yet assessed this effect over time. In this study, we employ a text similarity metric to evaluate the capacity of different ChatGPT models to generate diverse textual outputs. Our findings indicate a measurable decline in the ability of recent ChatGPT releases to produce varied text, even when explicitly prompted to do so by setting the temperature parameter to one. The observed reduction…
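The abstract does not name the specific text similarity metric in this excerpt, so the following is only a hedged illustration of the general idea: generate several outputs for the same prompt (for example, sampled at temperature 1), then score how much they overlap. This sketch assumes word-bigram Jaccard similarity averaged over all pairs of outputs; the metric choice and the function names `shingles`, `jaccard`, and `mean_pairwise_similarity` are hypothetical and are not the authors' method. A higher mean similarity corresponds to lower output diversity.

```python
from itertools import combinations


def shingles(text: str, n: int = 2) -> set[tuple[str, ...]]:
    """Return the set of word n-grams (shingles) in a lowercased text.

    Assumption: whitespace tokenization is good enough for illustration.
    """
    words = text.lower().split()
    if len(words) < n:
        # Very short texts are treated as a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_similarity(outputs: list[str], n: int = 2) -> float:
    """Average Jaccard similarity over all unordered pairs of outputs.

    Hypothetical diversity score: values near 1.0 mean the outputs are nearly
    identical (low diversity); values near 0.0 mean they share few n-grams.
    """
    sets = [shingles(t, n) for t in outputs]
    pairs = list(combinations(sets, 2))
    if not pairs:
        raise ValueError("need at least two outputs to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Stand-in strings; in a real study these would be model responses.
    identical = ["the cat sat on the mat"] * 3
    varied = [
        "the cat sat on the mat",
        "a dog ran in the park",
        "birds sing at dawn",
    ]
    print(mean_pairwise_similarity(identical))  # 1.0
    print(mean_pairwise_similarity(varied))     # 0.0
```

Under this sketch, comparing the score for outputs from an older model against a newer one on the same prompts would show convergence as a rising mean similarity. Other metrics (embedding cosine similarity, edit distance) could be substituted without changing the overall procedure.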
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Computational and Text Analysis Methods · Topic Modeling
