Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models

Yuda Song; Hanlin Zhang; Carson Eisenach; Sham Kakade; Dean Foster; Udaya Ghai

arXiv:2412.02674·cs.CL·February 26, 2025·3 cites

Open Access

TL;DR

This paper investigates the mechanisms and limits of self-improvement in large language models, introducing a formal framework and analyzing how model scaling affects self-improvement capabilities.

Contribution

It provides a mathematical formulation of LLM self-improvement and reveals a scaling phenomenon related to the generation-verification gap across different models.

Findings

01

A variant of the generation-verification gap scales monotonically with model pre-training FLOPs.

02

A formal framework for understanding self-improvement in LLMs.

03

Insights into when and how iterative self-improvement is feasible.

Abstract

Self-improvement is a mechanism in Large Language Model (LLM) pre-training, post-training and test-time inference. We explore a framework where the model verifies its own outputs, filters or reweights data based on this verification, and distills the filtered data. Despite several empirical successes, a fundamental understanding is still lacking. In this work, we initiate a comprehensive, modular and controlled study on LLM self-improvement. We provide a mathematical formulation for self-improvement, which is largely governed by a quantity which we formalize as the generation-verification gap. Through experiments with various model families and tasks, we discover a scaling phenomenon of self-improvement -- a variant of the generation-verification gap scales monotonically with the model pre-training flops. We also examine when self-improvement is possible, an iterative self-improvement…
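The verify, filter, and distill loop described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the paper's method: `generate`, `verify`, and `self_improvement_round` are hypothetical names, the "model" is a noisy adder, the verifier is a noisy stand-in score, and the "gap" is simply the accuracy of the filtered samples minus the accuracy of the raw samples, which is only a loose proxy for the generation-verification gap the paper formalizes.

```python
import random


def generate(prompt, n, rng):
    """Hypothetical generator: n candidate answers to a + b, each right about half the time."""
    a, b = prompt
    truth = a + b
    candidates = []
    for _ in range(n):
        if rng.random() < 0.5:
            candidates.append(truth)
        else:
            candidates.append(truth + rng.choice([-2, -1, 1, 2]))
    return candidates


def verify(prompt, answer, rng):
    """Hypothetical self-verifier: a noisy score that tends to be higher for correct answers.

    A real verifier would not see the ground truth; using it here is a toy stand-in.
    """
    a, b = prompt
    signal = 1.0 if answer == a + b else 0.0
    return signal + rng.gauss(0.0, 0.5)


def is_correct(prompt, answer):
    return answer == prompt[0] + prompt[1]


def self_improvement_round(prompts, n_samples=8, keep=2, seed=0):
    """One round: sample candidates, score them, keep the top `keep` per prompt."""
    rng = random.Random(seed)
    all_pairs, kept_pairs = [], []
    for p in prompts:
        candidates = generate(p, n_samples, rng)
        scored = [(verify(p, y, rng), y) for y in candidates]
        scored.sort(key=lambda t: t[0], reverse=True)
        all_pairs += [(p, y) for y in candidates]
        kept_pairs += [(p, y) for _, y in scored[:keep]]  # data one would distill on
    base_acc = sum(is_correct(p, y) for p, y in all_pairs) / len(all_pairs)
    filtered_acc = sum(is_correct(p, y) for p, y in kept_pairs) / len(kept_pairs)
    return {
        "base_acc": base_acc,
        "filtered_acc": filtered_acc,
        "gap": filtered_acc - base_acc,  # toy proxy, not the paper's formal definition
        "distill_set": kept_pairs,
    }


prompts = [(i, i + 1) for i in range(200)]
result = self_improvement_round(prompts)
print(f"base={result['base_acc']:.3f} "
      f"filtered={result['filtered_acc']:.3f} gap={result['gap']:.3f}")
```

In this sketch a positive gap means the filtered set is more accurate than the raw samples, so distilling on it could plausibly help; a gap near zero means the verifier adds little beyond generation.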

Taxonomy

Topics: Topic Modeling