Efficient Response Generation Strategy Selection for Fine-Tuning Large Language Models Through Self-Aligned Perplexity
Xuan Ren, Qi Chen, Lingqiao Liu

TL;DR
This paper introduces self-aligned perplexity, a new metric to select the best data generation strategy for fine-tuning large language models, leading to improved performance on reasoning benchmarks.
Contribution
The paper proposes a scalable method that uses self-aligned perplexity to identify the best data generation strategy for fine-tuning an LLM, scoring a small subset of generated data instead of training and evaluating on every candidate.
Findings
Self-aligned perplexity better captures model familiarity than traditional perplexity.
Selecting data generation strategies with this metric improves fine-tuned model performance.
The method is effective across diverse reasoning benchmarks.
Abstract
Fine-tuning large language models (LLMs) typically relies on producing large sets of input-output pairs. Yet for a given question, there can be many valid outputs. In practice, these outputs are often derived by distilling knowledge from teacher models, and they can vary depending on the specific teacher model or prompting strategy employed. Recent findings show that how these training outputs are generated can significantly affect the performance of the fine-tuned model, raising an important question: how do we pick the best data generation method from among numerous possibilities? Rather than exhaustively training and evaluating on each candidate, this paper proposes a scalable approximate method that assesses a small subset of generated data to estimate its suitability for a specific target LLM. Our central idea is that effective outputs should be familiar to the target LLM. While…
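The selection idea described above (score a small subset of outputs from each generation strategy by how familiar they are to the target LLM) can be sketched with traditional perplexity, the baseline the findings compare against. This is only an illustration under stated assumptions: it does not implement the paper's self-aligned perplexity, whose definition is not given in the text above. The function names, the strategy labels, and the log-probability values are hypothetical, and the per-token log-probabilities are assumed to have already been obtained from the target model.

```python
import math
from statistics import mean


def perplexity(token_logprobs):
    """Traditional perplexity: exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-mean(token_logprobs))


def rank_strategies(logprobs_by_strategy):
    """Return (strategy, mean perplexity) pairs, lowest perplexity first.

    Lower perplexity is read here as "more familiar to the target model".
    """
    scores = {
        name: mean(perplexity(lp) for lp in responses)
        for name, responses in logprobs_by_strategy.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical data: natural-log token probabilities that a target LLM
# assigns to two sampled responses from each candidate generation strategy.
sample = {
    "teacher_a_direct": [[-0.2, -0.4, -0.3], [-0.5, -0.1]],
    "teacher_b_step_by_step": [[-1.2, -0.9, -1.5], [-0.8, -1.1]],
}

ranking = rank_strategies(sample)
best_strategy = ranking[0][0]
print(best_strategy)
```

Averaging per-response perplexity over a small sample keeps the cost at a handful of forward passes per strategy, which is the kind of saving over full fine-tuning runs that the abstract points to.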
Taxonomy
Topics: Text and Document Classification Technologies · Topic Modeling · Speech Recognition and Synthesis
