Beyond Sample-Level Feedback: Using Reference-Level Feedback to Guide Data Synthesis
Shuhaib Mehri, Xiusi Chen, Heng Ji, Dilek Hakkani-Tür

TL;DR
This paper introduces Reference-Level Feedback, a new method for guiding synthetic data generation for instruction tuning that yields higher-quality datasets and improved model performance.
Contribution
The paper proposes Reference-Level Feedback, which improves synthetic data quality by extracting desirable characteristics from curated reference samples, and reports that it outperforms traditional sample-level feedback methods.
Findings
Synthesized REFED, a dataset of 10K instruction-response pairs.
Models fine-tuned on REFED achieved state-of-the-art performance among similarly sized models, including a 43.96% length-controlled win rate on AlpacaEval 2.0.
Reference-Level Feedback outperforms traditional sample-level feedback and generalizes across models.
Abstract
High-quality instruction-tuning data is crucial for developing Large Language Models (LLMs) that can effectively navigate real-world tasks and follow human instructions. While synthetic data generation offers a scalable approach for creating such datasets, it imposes a quality ceiling: models trained on the data cannot outperform the LLM that generated it. To overcome this limitation, we introduce Reference-Level Feedback, a paradigm that extracts desirable characteristics from carefully curated reference samples to guide the synthesis of higher-quality instruction-response pairs. Using this approach, we synthesize REFED, a dataset of 10K instruction-response pairs. Fine-tuning Llama-3.1-8B-Instruct and Mistral-7B-Instruct on REFED demonstrates state-of-the-art performance among similarly sized models, notably reaching a 43.96% length-controlled win rate on AlpacaEval 2.0. Extensive…
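The abstract describes the mechanism only at a high level, so the following is a minimal sketch of the general idea under stated assumptions, not the authors' pipeline. `LLM`, `Pair`, `extract_characteristics`, and `synthesize_with_reference_feedback` are hypothetical names, the prompt wording is invented, and any prompt-to-text callable stands in for the generating model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for any text-generation model: prompt in, text out.
LLM = Callable[[str], str]


@dataclass
class Pair:
    instruction: str
    response: str


def extract_characteristics(llm: LLM, reference: Pair) -> str:
    """Ask the model what makes a curated reference pair desirable (assumed prompt)."""
    prompt = (
        "Describe the desirable characteristics of this instruction-response pair.\n"
        f"Instruction: {reference.instruction}\nResponse: {reference.response}"
    )
    return llm(prompt)


def synthesize_with_reference_feedback(llm: LLM, reference: Pair, n: int) -> List[Pair]:
    """Extract feedback once from the reference, then reuse it to guide n new pairs."""
    feedback = extract_characteristics(llm, reference)
    pairs: List[Pair] = []
    for i in range(n):
        # Generate a new instruction conditioned on the reference-level feedback.
        instruction = llm(
            f"Write a new instruction (#{i + 1}) that exhibits these characteristics:\n{feedback}"
        )
        # Generate its response, again conditioned on the same feedback.
        response = llm(
            "Respond to the instruction below, following these characteristics:\n"
            f"{feedback}\nInstruction: {instruction}"
        )
        pairs.append(Pair(instruction, response))
    return pairs
```

The contrast with sample-level feedback is where the guiding signal comes from: here the characteristics are extracted once from a curated reference and reused for every new pair, whereas a sample-level approach would critique each synthesized pair individually.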
