Refine-n-Judge: Curating High-Quality Preference Chains for LLM-Fine-Tuning
Derin Cayir, Renjie Tao, Rashi Rungta, Kai Sun, Sean Chen, Haidar Khan, Minseok Kim, Julia Reinspach, Yue Liu

TL;DR
Refine-n-Judge is an automated, iterative method in which a single LLM acts as both refiner and judge to improve the quality of fine-tuning datasets. It requires neither human feedback nor a separate reward model, and models fine-tuned on the curated data perform better.
Contribution
It introduces a novel LLM-based iterative refinement and evaluation process that enhances dataset quality without human annotation or additional reward models.
Findings
Models fine-tuned on Refine-n-Judge datasets outperform models fine-tuned on the original datasets in preference tests.
Reported gains: +5% on AlpacaEval and +19% on MT-Bench.
Over 74% of comparisons favored models trained on Refine-n-Judge data.
Abstract
Large Language Models (LLMs) have demonstrated remarkable progress through preference-based fine-tuning, which critically depends on the quality of the underlying training data. While human feedback is essential for improving data quality, it is costly and does not scale well. In this paper, we introduce Refine-n-Judge, an automated iterative approach that leverages a single LLM as both a refiner and a judge to enhance dataset quality. Unlike existing iterative refinement methods, Refine-n-Judge employs an LLM to both generate refinements and explicitly evaluate each improvement, ensuring that every iteration meaningfully enhances the dataset without requiring additional human annotation or a separate reward model. At each step, the LLM refines a response and judges whether the refinement is an improvement over the previous answer. This process continues until the LLM prefers the…
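The refine-then-judge loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The names `refine_n_judge`, `toy_refine`, `toy_judge`, and `max_iters` are hypothetical. The two toy functions stand in for calls to the same LLM acting as refiner and as judge. Because the abstract is truncated here, the exact stopping rule is not visible. The sketch assumes the loop ends when the judge no longer prefers the refinement, and it adds an iteration cap as a safety measure. Turning consecutive chain entries into (chosen, rejected) pairs is likewise an assumption about how a preference chain might be used.

```python
from typing import Callable, List, Tuple


def refine_n_judge(
    prompt: str,
    initial_response: str,
    refine: Callable[[str, str], str],
    judge: Callable[[str, str, str], bool],
    max_iters: int = 5,  # assumed safety cap, not taken from the paper
) -> List[str]:
    """Build a chain of responses in which each entry was judged better than the last."""
    chain = [initial_response]
    for _ in range(max_iters):
        current = chain[-1]
        candidate = refine(prompt, current)
        # Assumed stopping rule: stop once the judge does not prefer the refinement.
        if not judge(prompt, current, candidate):
            break
        chain.append(candidate)
    return chain


def toy_refine(prompt: str, response: str) -> str:
    """Hypothetical stand-in for the LLM refiner: appends one character."""
    return response + "!"


def toy_judge(prompt: str, previous: str, candidate: str) -> bool:
    """Hypothetical stand-in for the LLM judge: prefers longer answers up to 5 characters."""
    return len(previous) < len(candidate) <= 5


chain = refine_n_judge("Say hi", "hi", toy_refine, toy_judge)

# Assumed pairing: each later entry is "chosen" over the entry before it.
pairs: List[Tuple[str, str]] = [
    (chain[i + 1], chain[i]) for i in range(len(chain) - 1)
]

print(chain)
print(pairs)
```

In practice both callables would wrap prompts to one model, so the judge's preference is the only signal that decides whether a refinement enters the chain.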
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
