Balancing the Budget: Understanding Trade-offs Between Supervised and Preference-Based Finetuning
Mohit Raghavendra, Junmo Kang, Alan Ritter

TL;DR
This paper investigates how best to allocate a limited annotation budget between supervised and preference-based finetuning of large language models. It finds that supervised finetuning alone works best in low-data regimes, while a hybrid of the two works best at larger budgets.
Contribution
It provides a comprehensive analysis of budget allocation strategies between supervised and preference finetuning, offering practical guidelines for optimizing model performance under data constraints.
Findings
Supervised finetuning on base models is best in low-data regimes (<1,000 examples).
At larger budgets, combining SFT and PFT performs best, often with a growing share of the budget allocated to preference data.
Skipping SFT and directly applying PFT causes a cold start problem due to distribution shift.
Abstract
Post-training of Large Language Models often involves a pipeline of Supervised Finetuning (SFT) followed by Preference Finetuning (PFT) using methods like Direct Preference Optimization. Both stages require annotated data that differ substantially in structure and cost. We study how to optimally allocate a fixed training data budget between the two stages, through extensive experiments spanning four diverse tasks, multiple model sizes and various data annotation costs. Our findings reveal that just SFT on the base model dominates performance in low-data regimes (<1,000 annotated examples). With larger data budgets, we observe that a combination of SFT and PFT, often with increasing portions allocated towards preference data, yields optimal performance. However, completely eliminating SFT and running PFT directly on the base model yields suboptimal performance, described as the cold…
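To make the allocation question concrete, the following is a minimal sketch of how a fixed budget could be split between the two stages. It is not the authors' code: the function `enumerate_allocations`, the `Allocation` record, the per-example costs, and the evenly spaced fractions are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    sft_fraction: float  # share of the budget spent on SFT annotations
    n_sft: int           # number of SFT demonstrations that share buys
    n_pft: int           # number of preference pairs the remainder buys


def enumerate_allocations(budget, sft_cost=1.0, pft_cost=1.0, steps=4):
    """List SFT/PFT splits of a fixed budget at evenly spaced fractions.

    All arguments are illustrative assumptions: `budget` is in arbitrary
    cost units, and `sft_cost` / `pft_cost` are the assumed costs of one
    SFT demonstration and one preference pair.
    """
    allocations = []
    for i in range(steps + 1):
        fraction = i / steps
        sft_budget = budget * fraction
        # Whole examples only: round down what each share can afford.
        n_sft = int(sft_budget // sft_cost)
        n_pft = int((budget - sft_budget) // pft_cost)
        allocations.append(Allocation(fraction, n_sft, n_pft))
    return allocations


if __name__ == "__main__":
    # Assumed scenario: an SFT demonstration costs twice a preference pair.
    for a in enumerate_allocations(1000, sft_cost=2.0, pft_cost=1.0):
        print(f"SFT share {a.sft_fraction:.2f}: {a.n_sft} SFT, {a.n_pft} PFT")
```

In this sketch, a fraction of 1.0 corresponds to the SFT-only setting and a fraction of 0.0 to running PFT directly on the base model. Each intermediate split would then be trained and evaluated to compare allocations at the same total cost.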
Taxonomy
Topics: Topic Modeling · Machine Learning and Data Classification · Natural Language Processing Techniques
Methods: Direct Preference Optimization · Shrink and Fine-Tune · Balanced Selection
