The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding?
Yutao Sun, Mingshuai Chen, Tiancheng Zhao, Ruochen Xu, Zilun Zhang, Jianwei Yin

TL;DR
This paper introduces Crescent, a fully autonomous framework in which a large language model improves itself by generating synthetic question-answer data without external supervision. Fine-tuning on this data improves the model's reasoning capabilities, and the same data also supports knowledge distillation.
Contribution
Crescent demonstrates that LLMs can self-generate high-quality training data for reasoning tasks without external signals, advancing autonomous model improvement methods.
Findings
Crescent improves LLM reasoning performance without external supervision.
Synthetic data from Crescent enhances knowledge distillation to smaller models.
The framework maintains general performance while boosting specific reasoning skills.
Abstract
Self-improvement of large language models (LLMs), i.e., improving the performance of an LLM by fine-tuning it on synthetic data generated by the model itself, is a promising way to advance the capabilities of LLMs while avoiding extensive supervision. Existing approaches to self-improvement often rely on external supervision signals in the form of seed data and/or assistance from third-party models. This paper presents Crescent, a simple yet effective framework for generating high-quality synthetic question-answer data in a fully autonomous manner. Crescent first elicits raw questions from the LLM via a bait prompt, then diversifies these questions through rejection-sampling-based self-deduplication, and finally feeds the questions back to the LLM and collects the corresponding answers by means of majority voting. We show that Crescent sheds light on the potential of true…
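The three stages named in the abstract (bait prompt, self-deduplication, majority voting) can be illustrated with a minimal sketch. This is not the authors' implementation: the `generate` callable, the function names, the token-level Jaccard similarity used as a stand-in for the paper's LLM-driven deduplication, and both thresholds are assumptions made only for illustration.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two strings (assumed similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def deduplicate(questions: List[str], threshold: float = 0.7) -> List[str]:
    """Keep a question only if it is not too similar to any already kept one.

    Rejecting near-duplicates one candidate at a time mimics the
    rejection-sampling flavour of the step; the threshold is hypothetical.
    """
    kept: List[str] = []
    for q in questions:
        if all(jaccard(q, k) < threshold for k in kept):
            kept.append(q)
    return kept


def majority_vote(answers: List[str], min_share: float = 0.5) -> Optional[str]:
    """Return the most frequent answer if it holds a strict majority, else None."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / len(answers) > min_share else None


def build_dataset(
    generate: Callable[[str], str],  # hypothetical wrapper around the LLM
    bait_prompt: str,
    n_questions: int,
    n_samples: int,
) -> List[Tuple[str, str]]:
    """Bait prompt -> deduplicated questions -> majority-voted answers."""
    # Stage 1: sample raw questions from the model using the bait prompt.
    raw = [generate(bait_prompt) for _ in range(n_questions)]
    dataset: List[Tuple[str, str]] = []
    # Stage 2: drop near-duplicate questions.
    for question in deduplicate(raw):
        # Stage 3: sample several answers and keep the consensus, if any.
        answers = [generate(question) for _ in range(n_samples)]
        voted = majority_vote(answers)
        if voted is not None:
            dataset.append((question, voted))
    return dataset
```

In this sketch, questions without a strict-majority answer are dropped; whether Crescent discards or keeps such questions is not stated in the abstract.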
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
