The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains

Scott Geng; Hamish Ivison; Chun-Liang Li; Maarten Sap; Jerry Li; Ranjay Krishna; Pang Wei Koh

arXiv:2507.06187 · cs.AI · July 9, 2025

TL;DR

This paper demonstrates that preference tuning on weak data pairs can significantly improve language model performance, matching state-of-the-art models while relying on weaker supervision sources.

Contribution

The paper introduces the delta learning hypothesis: the relative quality difference between two individually weak data points is enough to drive learning through preference tuning, even when supervised finetuning on the same weak data hurts.

Findings

01. Preference data from weak models can improve stronger models.

02. Delta learning matches state-of-the-art performance with weaker supervision.

03. Theoretical proof supports the effectiveness of relative quality signals.

Abstract

Improvements in language models are often driven by improving the quality of the data we train them on, which can be limiting when strong supervision is scarce. In this work, we show that paired preference data consisting of individually weak data points can enable gains beyond the strength of each individual data point. We formulate the delta learning hypothesis to explain this phenomenon, positing that the relative quality delta between points suffices to drive learning via preference tuning--even when supervised finetuning on the weak data hurts. We validate our hypothesis in controlled experiments and at scale, where we post-train 8B models on preference data generated by pairing a small 3B model's responses with outputs from an even smaller 1.5B model to create a meaningful delta. Strikingly, on a standard 11-benchmark evaluation suite (MATH, MMLU, etc.), our simple recipe matches…
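The pairing recipe described in the abstract can be sketched in a few lines. This is a minimal illustration and not the authors' code: `PreferencePair`, `build_pairs`, and the two generator callables are hypothetical names, and the loss shown is a DPO-style objective chosen as one common form of preference tuning, which the abstract itself does not specify. The point is that the preference label comes only from which model produced the response, so neither response needs to be strong on its own.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # response from the relatively stronger (but still weak) model
    rejected: str  # response from the even weaker model


def build_pairs(prompts, stronger_generate, weaker_generate):
    """Pair two weak responses per prompt; the label comes only from the source model."""
    return [
        PreferencePair(p, stronger_generate(p), weaker_generate(p))
        for p in prompts
    ]


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style loss for one pair, computed from sequence log-probabilities.

    Assumption: a DPO-style objective; beta=0.1 is an illustrative default.
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) written as softplus(-margin) for numerical stability
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

In use, `stronger_generate` and `weaker_generate` would wrap sampling from the 3B and 1.5B models respectively, and the log-probabilities would come from the 8B policy being trained and a frozen reference copy of it. The loss depends only on the gap between chosen and rejected, which is the "delta" the hypothesis refers to.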

Code & Models

Repositories

scottgeng00/delta_learning (Official)

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification

Methods: Balanced Selection · Logistic Regression