ALMA: Alignment with Minimal Annotation

Michihiro Yasunaga; Leonid Shamis; Chunting Zhou; Andrew Cohen; Jason Weston; Luke Zettlemoyer; Marjan Ghazvininejad

arXiv:2412.04305 · cs.CL · December 6, 2024

TL;DR

ALMA shows that effective large language model alignment is achievable with far fewer human annotations (9,000 labeled examples) by generating synthetic alignment data and self-bootstrapping.

Contribution

ALMA introduces a minimal-annotation approach to LLM alignment that uses synthetic data generation and iterative self-bootstrapping to reach performance close to Llama3-Instruct.

Findings

01. Achieves near state-of-the-art alignment with only 9,000 labeled examples.

02. Multi-round self-bootstrapping improves alignment quality over 10 iterations.

03. Synthetic data generation exposes existing knowledge in base models.

Abstract

Recent approaches to large language model (LLM) alignment typically require millions of human annotations or rely on external aligned models for synthetic data generation. This paper introduces ALMA: Alignment with Minimal Annotation, demonstrating that effective alignment can be achieved using only 9,000 labeled examples -- less than 1% of conventional approaches. ALMA generates large amounts of high-quality synthetic alignment data through new techniques: diverse prompt synthesis via few-shot learning, diverse response generation with multiple model checkpoints, and judge (reward model) enhancement through score aggregation and self-distillation. Using only a pretrained Llama3 base model, 5,000 SFT examples, and 4,000 judge annotations, ALMA achieves performance close to Llama3-Instruct across diverse alignment benchmarks (e.g., 0.1% difference on AlpacaEval 2.0 score). These results…
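The abstract names judge score aggregation as one technique but gives no procedure here. The following Python sketch illustrates one plausible reading under stated assumptions: the judge is called several times per response, the scores are averaged, and the highest- and lowest-scoring responses form a preference pair. The function names, the use of a plain mean, and the best-versus-worst pairing are all hypothetical choices made for illustration, not the paper's confirmed method.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

# Hypothetical judge signature: (prompt, response) -> scalar score.
Judge = Callable[[str, str], float]


def aggregate_score(judge: Judge, prompt: str, response: str, n_samples: int = 4) -> float:
    """Average n_samples judge calls to reduce the noise of a single score.

    Assumption: aggregation is a simple mean; the paper may use another rule.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return mean(judge(prompt, response) for _ in range(n_samples))


def build_preference_pair(
    judge: Judge, prompt: str, responses: Sequence[str], n_samples: int = 4
) -> Tuple[str, str]:
    """Return (chosen, rejected): the best- and worst-scoring responses."""
    if len(responses) < 2:
        raise ValueError("need at least two candidate responses")
    scored = [(aggregate_score(judge, prompt, r, n_samples), r) for r in responses]
    chosen = max(scored, key=lambda pair: pair[0])[1]
    rejected = min(scored, key=lambda pair: pair[0])[1]
    return chosen, rejected
```

In practice the `judge` callable would wrap a sampled reward-model call, so repeated calls on the same input can return different scores; averaging is what makes the aggregate more stable than any single call.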

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Semantic Web and Ontologies

Methods: Shrink and Fine-Tune · Balanced Selection