Advantage-Guided Distillation for Preference Alignment in Small Language Models

Shiping Gao; Fanqi Wan; Jiajian Guo; Xiaojun Quan; Qifan Wang

arXiv:2502.17927·cs.CL·March 6, 2025

TL;DR

This paper improves the alignment of Small Language Models (SLMs) by using a well-aligned teacher model to guide them through advantage-guided distillation, significantly narrowing the performance gap with larger models.

Contribution

The paper proposes Advantage-Guided Distillation for Preference Alignment (ADPA), a method that transfers knowledge of human preferences from a large teacher LLM to small student models to improve their alignment.

Findings

01 ADPA outperforms existing methods in aligning SLMs with human preferences.

02 Combining ADPA with Dual-Constrained Knowledge Distillation (DCKD) yields even better alignment results.

03 Both approaches significantly reduce the performance gap between small and large language models.

Abstract

Alignment techniques enable Large Language Models (LLMs) to generate outputs that align with human preferences and play a crucial role in their effectiveness. However, their impact often diminishes when applied to Small Language Models (SLMs), likely due to the limited capacity of these models. Instead of directly applying existing alignment techniques to SLMs, we propose to utilize a well-aligned teacher LLM to guide the alignment process for these models, thereby facilitating the transfer of the teacher's knowledge of human preferences to the student model. To achieve this, we first explore a straightforward approach, Dual-Constrained Knowledge Distillation (DCKD), that employs knowledge distillation with two KL-divergence constraints from the aligned teacher to the unaligned student. To further enhance the student's ability to distinguish between preferred and dispreferred responses,…
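The DCKD idea in the abstract (knowledge distillation with two KL-divergence constraints from the aligned teacher to the unaligned student) can be illustrated with a minimal sketch. The abstract does not say what the two constraints are applied to, so this sketch assumes one constraint on a preferred response and one on a dispreferred response. The function names, the forward KL direction (teacher || student), and the weights `alpha` and `beta` are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math

def softmax(logits):
    """Convert one token position's logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def sequence_kl(teacher_logits, student_logits):
    """Mean per-token KL(teacher || student) over one response."""
    per_token = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_token) / len(per_token)

def dual_constrained_kd_loss(teacher_pref, student_pref,
                             teacher_disp, student_disp,
                             alpha=1.0, beta=1.0):
    """Weighted sum of two KL terms (assumed split: preferred / dispreferred).

    Each argument is a list of per-token logit vectors for one response.
    alpha and beta are hypothetical weights, not values from the paper.
    """
    kl_pref = sequence_kl(teacher_pref, student_pref)
    kl_disp = sequence_kl(teacher_disp, student_disp)
    return alpha * kl_pref + beta * kl_disp

# Toy example: two token positions over a three-word vocabulary.
teacher_pref = [[2.0, 0.0, -1.0], [0.5, 1.5, 0.0]]
student_pref = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
teacher_disp = [[-1.0, 2.0, 0.0], [1.0, 0.0, 0.0]]
student_disp = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

loss = dual_constrained_kd_loss(teacher_pref, student_pref,
                                teacher_disp, student_disp)
print(f"toy DCKD-style loss: {loss:.4f}")
```

In this sketch the loss is zero when the student's token distributions match the teacher's on both responses, and it grows as they diverge. A real training setup would compute the same quantity on model logits in a tensor library.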

Code & Models

Repositories

slit-ai/adpa (PyTorch, official)

Videos

Advantage-Guided Distillation for Preference Alignment in Small Language Models · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Knowledge Distillation · ALIGN