Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision
Yaowen Ye, Cassidy Laidlaw, Jacob Steinhardt

TL;DR
When supervision is unreliable, iterative label refinement (ILR), which uses comparison feedback to improve the supervised fine-tuning data, can outperform reinforcement learning from human feedback (RLHF), which uses that feedback for preference optimization.
Contribution
The paper introduces ILR, a method that uses comparison feedback to improve the supervised fine-tuning data, and reports that it outperforms RLHF when supervision is unreliable.
Findings
ILR improves model performance on math, coding, and safety tasks.
DPO, the RLHF algorithm tested, fails to improve models beyond supervised fine-tuning under unreliable supervision.
Data refinement is more effective than preference optimization in complex tasks with noisy supervision.
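The preference-optimization baseline behind these findings is DPO, which the abstract names as the RLHF algorithm tested. As a reminder of how that objective consumes a comparison label, here is a minimal sketch of the standard per-example DPO loss. The function name, argument names, and the beta default are illustrative assumptions, not taken from the paper.

```python
import math


def dpo_loss(policy_logp_chosen: float,
             policy_logp_rejected: float,
             ref_logp_chosen: float,
             ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard per-example DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an assumed default, not a value from the paper.
    """
    # How much more the policy favors "chosen" over "rejected",
    # relative to the reference model.
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    z = beta * margin
    # -log(sigmoid(z)) = log(1 + exp(-z)), written to avoid overflow.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Same response pair, scored with the label as given and with it flipped.
loss_as_labeled = dpo_loss(-1.0, -2.0, -1.5, -1.5)
loss_if_flipped = dpo_loss(-2.0, -1.0, -1.5, -1.5)
```

The loss depends on which response the annotator marked as chosen, so swapping the label on the same pair changes the loss and reverses the direction of the update. This is only an illustration of the objective's sensitivity to the label, not an analysis from the paper.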
Abstract
Language model (LM) post-training relies on two stages of human supervision: task demonstrations for supervised finetuning (SFT), followed by preference comparisons for reinforcement learning from human feedback (RLHF). As LMs become more capable, the tasks they are given become harder to supervise. Will post-training remain effective under unreliable supervision? To test this, we simulate unreliable demonstrations and comparison feedback using small LMs and time-constrained humans. We find that in the presence of unreliable supervision, SFT still retains some effectiveness, but DPO (a common RLHF algorithm) fails to improve the model beyond SFT. To address this, we propose iterative label refinement (ILR) as an alternative to RLHF. ILR improves the SFT data by using comparison feedback to decide whether human demonstrations should be replaced by model-generated alternatives, then…
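The replacement decision described in the abstract can be sketched as a single pass over the SFT data. This is a toy illustration under stated assumptions, not the authors' implementation: refine_labels, propose, and prefers_proposal are hypothetical names, the comparator here is a perfectly reliable stand-in (the paper studies unreliable feedback), and whatever follows the replacement step is truncated in the abstract above and is not sketched.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (prompt, demonstration)


def refine_labels(dataset: List[Example],
                  propose: Callable[[str], str],
                  prefers_proposal: Callable[[str, str, str], bool]) -> List[Example]:
    """One refinement pass: keep or replace each demonstration.

    propose(prompt) returns a model-generated alternative (hypothetical).
    prefers_proposal(prompt, current, candidate) stands in for comparison
    feedback and returns True when the candidate is judged better.
    """
    refined = []
    for prompt, current in dataset:
        candidate = propose(prompt)
        if prefers_proposal(prompt, current, candidate):
            refined.append((prompt, candidate))  # replace the demonstration
        else:
            refined.append((prompt, current))    # keep the original
    return refined


# Toy data: one wrong demonstration, one correct demonstration.
sft_data = [("2+2", "5"), ("3+3", "6")]
model_answers = {"2+2": "4", "3+3": "7"}  # assumed model outputs
ground_truth = {"2+2": "4", "3+3": "6"}   # used only by the toy comparator


def toy_comparator(prompt: str, current: str, candidate: str) -> bool:
    # Prefer the candidate only if it is right and the current label is not.
    return candidate == ground_truth[prompt] and current != ground_truth[prompt]


refined_data = refine_labels(sft_data, model_answers.__getitem__, toy_comparator)
print(refined_data)
```

In this toy run the wrong demonstration for "2+2" is replaced by the model's answer, while the correct demonstration for "3+3" is kept even though the model proposed something else. The comparison feedback acts on the training data itself rather than on a preference loss.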
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms
Methods: Direct Preference Optimization · Supervised Fine-Tuning
