ALMA: Alignment with Minimal Annotation
Michihiro Yasunaga, Leonid Shamis, Chunting Zhou, Andrew Cohen, Jason Weston, Luke Zettlemoyer, Marjan Ghazvininejad

TL;DR
ALMA demonstrates that effective large language model alignment can be achieved with only about 9,000 human-labeled examples by combining synthetic data generation with self-bootstrapping.
Contribution
ALMA introduces a minimal-annotation approach to LLM alignment that uses synthetic data generation and iterative self-bootstrapping to reach performance close to Llama3-Instruct.
Findings
Achieves near state-of-the-art alignment with only 9,000 labeled examples.
Multi-round self-bootstrapping improves alignment quality over 10 iterations.
Synthetic data generation exposes existing knowledge in base models.
Abstract
Recent approaches to large language model (LLM) alignment typically require millions of human annotations or rely on external aligned models for synthetic data generation. This paper introduces ALMA: Alignment with Minimal Annotation, demonstrating that effective alignment can be achieved using only 9,000 labeled examples -- less than 1% of conventional approaches. ALMA generates large amounts of high-quality synthetic alignment data through new techniques: diverse prompt synthesis via few-shot learning, diverse response generation with multiple model checkpoints, and judge (reward model) enhancement through score aggregation and self-distillation. Using only a pretrained Llama3 base model, 5,000 SFT examples, and 4,000 judge annotations, ALMA achieves performance close to Llama3-Instruct across diverse alignment benchmarks (e.g., 0.1% difference on AlpacaEval 2.0 score). These results…
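The abstract names judge score aggregation without giving details in this excerpt. A minimal sketch of one plausible reading follows: score each response several times with a noisy judge, average the scores, and keep the best and worst responses as a preference pair. The function names, the `judge` callable, and the use of a simple mean are assumptions for illustration, not the paper's confirmed procedure.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

# Hypothetical judge signature: (prompt, response) -> scalar score.
Judge = Callable[[str, str], float]


def aggregate_score(judge: Judge, prompt: str, response: str, n_samples: int = 4) -> float:
    """Average several judge calls to reduce the variance of a single score (assumed scheme)."""
    return mean(judge(prompt, response) for _ in range(n_samples))


def pick_preference_pair(
    judge: Judge, prompt: str, responses: Sequence[str], n_samples: int = 4
) -> Tuple[str, str]:
    """Return (chosen, rejected): the highest- and lowest-scoring responses."""
    if len(responses) < 2:
        raise ValueError("need at least two responses to form a pair")
    ranked = sorted(responses, key=lambda r: aggregate_score(judge, prompt, r, n_samples))
    return ranked[-1], ranked[0]
```

With a real reward model, `judge` would wrap a sampled model call; the averaged scores could then also serve as distillation targets, although how ALMA performs self-distillation is not specified in this excerpt.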
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Semantic Web and Ontologies
Methods: Shrink and Fine-Tune · Balanced Selection
