Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision

Yaowen Ye; Cassidy Laidlaw; Jacob Steinhardt

arXiv:2501.07886 · cs.LG · January 15, 2025

TL;DR

When supervision is unreliable, iterative label refinement (ILR), which uses comparison feedback to improve the supervised fine-tuning data, can outperform reinforcement learning from human feedback (RLHF), which uses comparison feedback for preference optimization.

Contribution

The paper introduces ILR, a method that uses comparison feedback to decide whether demonstrations in the supervised fine-tuning data should be replaced by model-generated alternatives. It outperforms RLHF when supervision is unreliable.

Findings

01

ILR improves model performance on math, coding, and safety tasks.

02

DPO, a common RLHF algorithm, fails to improve models beyond supervised fine-tuning under unreliable supervision.

03

Data refinement is more effective than preference optimization in complex tasks with noisy supervision.

Abstract

Language model (LM) post-training relies on two stages of human supervision: task demonstrations for supervised finetuning (SFT), followed by preference comparisons for reinforcement learning from human feedback (RLHF). As LMs become more capable, the tasks they are given become harder to supervise. Will post-training remain effective under unreliable supervision? To test this, we simulate unreliable demonstrations and comparison feedback using small LMs and time-constrained humans. We find that in the presence of unreliable supervision, SFT still retains some effectiveness, but DPO (a common RLHF algorithm) fails to improve the model beyond SFT. To address this, we propose iterative label refinement (ILR) as an alternative to RLHF. ILR improves the SFT data by using comparison feedback to decide whether human demonstrations should be replaced by model-generated alternatives, then…
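
The replacement step described in the abstract can be illustrated with a minimal Python sketch. It covers only the part stated above: for each example, a model-generated alternative is compared against the current label, and the label is replaced if the comparison favors the alternative. The names `refine_labels`, `propose`, and `prefers_proposal` are hypothetical and are not taken from the paper or its repository. The toy judge below is perfectly reliable, which is a simplification; in the paper's setting the comparison feedback is unreliable.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: the SFT dataset is a list of (prompt, label) pairs.
Example = Tuple[str, str]


def refine_labels(
    dataset: List[Example],
    propose: Callable[[str], str],
    prefers_proposal: Callable[[str, str, str], bool],
) -> Tuple[List[Example], int]:
    """Run one refinement round over the dataset.

    propose(prompt) returns a model-generated alternative label.
    prefers_proposal(prompt, current, candidate) stands in for comparison
    feedback and returns True if the candidate is judged better.
    Returns the refined dataset and the number of replaced labels.
    """
    refined: List[Example] = []
    replaced = 0
    for prompt, label in dataset:
        candidate = propose(prompt)
        if prefers_proposal(prompt, label, candidate):
            refined.append((prompt, candidate))  # swap in the alternative
            replaced += 1
        else:
            refined.append((prompt, label))  # keep the existing label
    return refined, replaced


# Toy stand-ins (assumptions for illustration only): prompts are "a+b" sums.
def toy_propose(prompt: str) -> str:
    a, b = prompt.split("+")
    return str(int(a) + int(b))


def toy_judge(prompt: str, current: str, candidate: str) -> bool:
    a, b = prompt.split("+")
    target = int(a) + int(b)
    # Prefer the candidate only if it is strictly closer to the true sum.
    return abs(int(candidate) - target) < abs(int(current) - target)


data = [("1+1", "3"), ("2+2", "4"), ("3+4", "8")]
refined, n_replaced = refine_labels(data, toy_propose, toy_judge)
print(refined, n_replaced)
```

Because the function returns a dataset of the same shape it takes in, it can be called repeatedly, which is one way to read the "iterative" in the method's name. What happens between rounds is not specified in the truncated abstract and is left out of this sketch.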

Code & Models

Repositories

helloelwin/iterative-label-refinement
PyTorch · Official

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms

Methods: Direct Preference Optimization · Supervised Fine-Tuning