LIMA: Less Is More for Alignment

Chunting Zhou; Pengfei Liu; Puxin Xu; Srini Iyer; Jiao Sun; Yuning Mao; Xuezhe Ma; Avia Efrat; Ping Yu; Lili Yu; Susan Zhang; Gargi Ghosh; Mike Lewis; Luke Zettlemoyer; Omer Levy

arXiv:2305.11206 · cs.CL · May 22, 2023 · 127 cites

Open Access · 5 Repos · 10 Models · 5 Datasets · 1 Video

TL;DR

LIMA shows that a large language model can achieve high performance and generalization with minimal instruction tuning, relying mainly on pretraining, and can even outperform some models trained with reinforcement learning.

Contribution

This paper demonstrates that extensive instruction tuning and reinforcement learning are not strictly necessary for high-quality language model performance, emphasizing the importance of pretraining.

Findings

01

LIMA performs well with only 1,000 curated prompts and responses.

02

LIMA generalizes to unseen tasks not in training data.

03

In human evaluations, LIMA responses are often judged equivalent to or preferred over GPT-4 responses.

Abstract

Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. LIMA demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples in the training data, including complex queries that range from planning trip itineraries to speculating about alternate history. Moreover, the model tends to generalize well to unseen tasks that did not appear in the training data. In a…
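The "standard supervised loss" mentioned in the abstract is usually understood as next-token cross-entropy over the training sequences. The sketch below illustrates that objective in plain Python. It is not the authors' training code: the function names `log_softmax` and `sft_loss` are hypothetical, and the `loss_mask` argument (which lets the caller score only response tokens and skip prompt tokens) is an assumption that this page does not confirm for LIMA.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sft_loss(step_logits, target_ids, loss_mask):
    """Mean negative log-likelihood of the target tokens.

    step_logits: one list of vocabulary logits per sequence position.
    target_ids:  the correct next-token id at each position.
    loss_mask:   1 to score a position, 0 to skip it (assumption: used
                 here to ignore prompt tokens; pass all 1s to score
                 every position).
    """
    total, count = 0.0, 0
    for logits, target, keep in zip(step_logits, target_ids, loss_mask):
        if keep:
            total -= log_softmax(logits)[target]
            count += 1
    if count == 0:
        raise ValueError("loss_mask selects no positions")
    return total / count


# Toy example: vocabulary of 3 tokens, two positions, only the second scored.
uniform = sft_loss([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [1, 2], [0, 1])
confident = sft_loss([[0.0, 0.0, 0.0], [0.0, 0.0, 5.0]], [1, 2], [0, 1])
```

With uniform logits the loss equals the log of the vocabulary size, and it falls as the model puts more probability on the correct token. Fine-tuning on the 1,000 curated examples would minimize this quantity averaged over the dataset.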

Code & Models

Videos

LIMA: Less Is More for Alignment · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer · Absolute Position Encodings