Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning

Hao Zhao; Maksym Andriushchenko; Francesco Croce; Nicolas Flammarion

arXiv:2402.04833·cs.CL·June 5, 2024·1 cites

TL;DR

Selecting the 1,000 instructions with the longest responses from a standard dataset is a simple yet highly effective baseline for instruction fine-tuning of large language models: it outperforms more complex selection methods while using minimal data.

Contribution

Demonstrates that a straightforward baseline of choosing the longest responses surpasses sophisticated selection methods in instruction fine-tuning, with additional lightweight refinement further enhancing performance.

Findings

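01 Selecting the examples with the longest responses outperforms state-of-the-art selection methods (LIMA, AlpaGasus).

02 A lightweight refinement of the selected long instructions further improves the fine-tuned models.

03 The approach is effective with only 1,000 examples and no extra preference data.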

Abstract

There is a consensus that instruction fine-tuning of LLMs requires high-quality data, but what counts as high-quality data? LIMA (NeurIPS 2023) and AlpaGasus (ICLR 2024) are state-of-the-art methods for selecting such high-quality examples, either via manual curation or using GPT-3.5-Turbo as a quality scorer. We show that the extremely simple baseline of selecting the 1,000 instructions with the longest responses (which intuitively contain more learnable information and are harder to overfit) from standard datasets can consistently outperform these sophisticated methods according to GPT-4 and PaLM-2 as judges, while remaining competitive on the Open LLM benchmarks that test factual knowledge. We demonstrate this for several LLMs (Llama-2-7B, Llama-2-13B, Mistral-7B-v0.1) and datasets (Alpaca-52k, Evol-Instruct-70k). In addition, a lightweight refinement of such long instructions can further improve the…
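
The selection baseline described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes Alpaca-style records with an `output` field, and it uses a whitespace word count as a stand-in for response length. The names `whitespace_length` and `select_longest_responses` and the toy data are hypothetical.

```python
import heapq
from typing import Callable, Dict, List

Example = Dict[str, str]


def whitespace_length(text: str) -> int:
    """Assumed length measure: number of whitespace-separated words."""
    return len(text.split())


def select_longest_responses(
    examples: List[Example],
    k: int = 1000,
    response_key: str = "output",  # assumed field name (Alpaca-style records)
    length_fn: Callable[[str], int] = whitespace_length,
) -> List[Example]:
    """Return the k examples with the longest responses, longest first."""
    return heapq.nlargest(k, examples, key=lambda ex: length_fn(ex[response_key]))


# Toy data for illustration only; a real run would load a full instruction dataset.
toy = [
    {"instruction": "Say hi.", "output": "Hi!"},
    {
        "instruction": "Explain overfitting.",
        "output": "Overfitting happens when a model memorizes noise in its training data.",
    },
    {"instruction": "Name a color.", "output": "Blue is a color."},
]

subset = select_longest_responses(toy, k=2)
for ex in subset:
    print(whitespace_length(ex["output"]), ex["instruction"])
```

Passing a tokenizer-based function as `length_fn` would measure length in tokens instead of words; the selected subset would then be used as the fine-tuning set in place of the full dataset.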

Code & Models

Repositories

tml-epfl/long-is-more-for-alignment
jax · Official

Taxonomy

Topics: Experimental Learning in Engineering

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Label Smoothing · Cosine Annealing · Absolute Position Encodings · Linear Layer · Byte Pair Encoding