R.I.P.: Better Models by Survival of the Fittest Prompts

Ping Yu; Weizhe Yuan; Olga Golovneva; Tianhao Wu; Sainbayar Sukhbaatar; Jason Weston; Jing Xu

arXiv:2501.18578 · cs.CL · February 27, 2025

TL;DR

This paper introduces Rejecting Instruction Preferences (RIP), a method that evaluates and filters training prompts using two signals, the quality of the rejected response and the reward gap between the chosen and rejected responses, and yields large benchmark gains over training on unfiltered data.

Contribution

RIP is a data filtering technique that can prune prompts from existing training sets or build high-quality synthetic datasets, using rejected response quality and the chosen-rejected reward gap as its signals.

Findings

01

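With Llama 3.1-8B-Instruct, RIP improves AlpacaEval2 LC Win Rate by 9.4%, Arena-Hard by 8.7%, and WildBench by 9.9% compared to unfiltered data.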

02

Filtering with RIP enhances model robustness.

03

Gains are observed across multiple Llama models; with Llama 3.3-70B-Instruct, Arena-Hard improves from 67.5 to 82.9.

Abstract

Training data quality is one of the most important drivers of final model quality. In this work, we introduce a method for evaluating data integrity based on the assumption that low-quality input prompts result in high-variance, low-quality responses. This is achieved by measuring the rejected response quality and the reward gap between the chosen and rejected preference pair. Our method, Rejecting Instruction Preferences (RIP), can be used to filter prompts from existing training sets, or to make high-quality synthetic datasets, yielding large performance gains across various benchmarks compared to unfiltered data. Using Llama 3.1-8B-Instruct, RIP improves AlpacaEval2 LC Win Rate by 9.4%, Arena-Hard by 8.7%, and WildBench by 9.9%. Using Llama 3.3-70B-Instruct, RIP improves Arena-Hard from 67.5 to 82.9, which moves it from 18th place to 6th overall on the leaderboard.
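The filtering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each prompt already has several sampled responses scored by some reward model, that the "chosen" response is the highest-scored one and the "rejected" response is the lowest-scored one, and that a prompt is kept when its rejected reward is high enough and its reward gap is small enough. The names `score_prompt` and `rip_filter`, the thresholds, and the toy reward values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PromptStats:
    prompt: str
    rejected_reward: float  # reward of the lowest-scored response
    reward_gap: float       # chosen reward minus rejected reward


def score_prompt(prompt: str, rewards: List[float]) -> PromptStats:
    """Summarize one prompt from the rewards of its sampled responses.

    Assumption: 'chosen' is the highest-reward response and
    'rejected' is the lowest-reward response.
    """
    if len(rewards) < 2:
        raise ValueError("need at least two scored responses per prompt")
    chosen, rejected = max(rewards), min(rewards)
    return PromptStats(prompt, rejected, chosen - rejected)


def rip_filter(
    rewards_by_prompt: Dict[str, List[float]],
    min_rejected_reward: float,  # hypothetical threshold
    max_reward_gap: float,       # hypothetical threshold
) -> List[str]:
    """Keep prompts whose worst response is still good and whose gap is small."""
    kept = []
    for prompt, rewards in rewards_by_prompt.items():
        stats = score_prompt(prompt, rewards)
        if (stats.rejected_reward >= min_rejected_reward
                and stats.reward_gap <= max_reward_gap):
            kept.append(prompt)
    return kept


# Made-up reward values, for illustration only.
toy_rewards = {
    "Explain binary search.": [0.9, 0.8, 0.7],     # good worst case, small gap
    "asdf ??": [0.6, 0.1, 0.2],                    # poor worst case, large gap
    "Write a haiku about rain.": [0.3, 0.2, 0.25], # small gap, poor worst case
}
kept = rip_filter(toy_rewards, min_rejected_reward=0.5, max_reward_gap=0.3)
print(kept)  # ['Explain binary search.']
```

In this sketch a prompt must pass both checks: a small gap alone is not enough if every response is poor, which is why the third toy prompt is dropped. In practice the thresholds would need to be tuned on the data at hand.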

Videos

R.I.P.: Better Models by Survival of the Fittest Prompts · SlidesLive

Taxonomy

Topics: Simulation Techniques and Applications

Methods: LLaMA