Evolutionary Contrastive Distillation for Language Model Alignment
Julian Katz-Samuels, Zheng Li, Hyokun Yun, Priyanka Nigam, Yi Xu, Vaclav Petricek, Bing Yin, Trishul Chilimbi

TL;DR
This paper introduces Evolutionary Contrastive Distillation (ECD), a method that improves large language models' ability to follow complex instructions by generating synthetic contrastive preference data and training on it. The resulting 7B model outperforms current state-of-the-art 7B models at instruction following and is competitive with open-source 70B models.
Contribution
It proposes a novel data generation technique that creates challenging contrastive examples for instruction tuning, improving model alignment with complex tasks.
Findings
A 7B model trained with ECD surpasses current SOTA 7B models at instruction following
The method achieves performance competitive with open-source 70B models
Contrastive data improves adherence to complex instructions
Abstract
The ability of large language models (LLMs) to execute complex instructions is essential for their real-world applications. However, several recent studies indicate that LLMs struggle with challenging instructions. In this paper, we propose Evolutionary Contrastive Distillation (ECD), a novel method for generating high-quality synthetic preference data designed to enhance the complex instruction-following capability of language models. ECD generates data that specifically illustrates the difference between a response that successfully follows a set of complex instructions and a response that is high-quality, but nevertheless makes some subtle mistakes. This is done by prompting LLMs to progressively evolve simple instructions to more complex instructions. When the complexity of an instruction is increased, the original successful response to the original instruction becomes a "hard…
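The pairing idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `build_pairs`, `PreferencePair`, `toy_evolve`, and `toy_respond` are hypothetical names, the two callables stand in for LLM prompts, and using the earlier response as the rejected side of each pair is an assumption drawn from the abstract's description of a high-quality response that no longer satisfies the more complex instruction.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PreferencePair:
    prompt: str    # the evolved, more complex instruction
    chosen: str    # response written for the evolved instruction
    rejected: str  # response written for the previous, simpler instruction


def build_pairs(
    instruction: str,
    evolve: Callable[[str], str],
    respond: Callable[[str], str],
    rounds: int = 2,
) -> List[PreferencePair]:
    """Evolve an instruction step by step and pair each new response
    with the response to the previous, simpler instruction."""
    pairs: List[PreferencePair] = []
    current_instruction = instruction
    current_response = respond(current_instruction)
    for _ in range(rounds):
        harder = evolve(current_instruction)  # add complexity to the instruction
        better = respond(harder)              # answer the harder instruction
        # The earlier response was fine for the simpler instruction but
        # misses the added requirement, so it serves as the contrast.
        pairs.append(PreferencePair(prompt=harder, chosen=better, rejected=current_response))
        current_instruction, current_response = harder, better
    return pairs


# Toy stand-ins for the LLM calls (assumptions, for illustration only).
def toy_evolve(instruction: str) -> str:
    return instruction + " Use at most ten words."


def toy_respond(instruction: str) -> str:
    return f"<response to: {instruction}>"


pairs = build_pairs("Describe the ocean.", toy_evolve, toy_respond, rounds=2)
for pair in pairs:
    print(pair.prompt, "|", pair.chosen, "|", pair.rejected)
```

Pairs in this prompt/chosen/rejected shape are the usual input to preference-optimization methods such as Direct Preference Optimization, which the taxonomy below lists among the methods.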
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Direct Preference Optimization · Sparse Evolutionary Training · Contrastive Learning
