The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding?

Yutao Sun; Mingshuai Chen; Tiancheng Zhao; Ruochen Xu; Zilun Zhang; Jianwei Yin

arXiv:2502.13441·cs.CL·February 20, 2025

TL;DR

This paper introduces Crescent, a fully autonomous framework in which a large language model generates its own synthetic question-answer data, without external supervision, and is then fine-tuned on that data. The authors report improved reasoning capabilities, and the same synthetic data also helps distill knowledge into smaller models.

Contribution

Crescent demonstrates that LLMs can self-generate high-quality training data for reasoning tasks without external signals, advancing autonomous model improvement methods.

Findings

01. Crescent improves LLM reasoning performance without external supervision.

02. Synthetic data from Crescent enhances knowledge distillation to smaller models.

03. The framework maintains general performance while boosting specific reasoning skills.

Abstract

Self-improving large language models (LLMs) -- i.e., improving the performance of an LLM by fine-tuning it with synthetic data generated by itself -- is a promising way to advance the capabilities of LLMs while avoiding extensive supervision. Existing approaches to self-improvement often rely on external supervision signals in the form of seed data and/or assistance from third-party models. This paper presents Crescent -- a simple yet effective framework for generating high-quality synthetic question-answer data in a fully autonomous manner. Crescent first prompts the LLM to generate raw questions via a bait prompt, then diversifies these questions using rejection sampling-based self-deduplication, and finally feeds the questions back to the LLM and collects the corresponding answers by majority voting. We show that Crescent sheds light on the potential of true…
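The three stages named in the abstract (bait prompt, self-deduplication, majority voting) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `generate` callable stands in for any LLM sampling call, the token-level Jaccard similarity and the 0.7 threshold are assumptions chosen to keep the example self-contained, and the function names (`self_deduplicate`, `majority_vote`, `crescent_sketch`) are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-set overlap; an assumed stand-in for whatever similarity the paper uses."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def self_deduplicate(questions: List[str], threshold: float = 0.7) -> List[str]:
    """Greedy rejection: keep a question only if it is not too similar to any kept one."""
    kept: List[str] = []
    for q in questions:
        if all(jaccard(q, k) < threshold for k in kept):
            kept.append(q)
    return kept


def majority_vote(answers: List[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the most common answer if it has a strict majority, else None."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / len(answers) > min_agreement else None


def crescent_sketch(
    generate: Callable[[str], str],  # hypothetical LLM sampling function
    bait_prompt: str,
    n_questions: int,
    n_samples: int,
) -> List[Tuple[str, str]]:
    # Stage 1: elicit raw questions from the model with a bait prompt.
    raw = [generate(bait_prompt) for _ in range(n_questions)]
    # Stage 2: diversify by rejecting near-duplicate questions.
    questions = self_deduplicate(raw)
    # Stage 3: sample several answers per question and keep the majority answer.
    pairs: List[Tuple[str, str]] = []
    for q in questions:
        voted = majority_vote([generate(q) for _ in range(n_samples)])
        if voted is not None:
            pairs.append((q, voted))
    return pairs
```

In this sketch, questions without a strict-majority answer are dropped rather than kept with a low-confidence label; whether the paper does the same is not stated in the abstract. The resulting question-answer pairs would then serve as fine-tuning data for the same model.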

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling