Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding

Feifan Song; Shaohang Wei; Wen Luo; Yuxuan Fan; Tianyu Liu; Guoyin Wang; Houfeng Wang

arXiv:2506.07434 · cs.CL · June 10, 2025

TL;DR

This paper introduces Weak-to-Strong Decoding, a novel framework that improves low-resource preference alignment in large language models by guiding the decoding process with a small aligned model, resulting in better aligned content without sacrificing downstream performance.

Contribution

The paper proposes the Weak-to-Strong Decoding (WSD) framework, in which a small aligned draft model guides a large base model, to improve low-resource preference alignment in LLMs. It also contributes a new dataset, GenerAlign, used to fine-tune the draft model Pilot-3B.

Findings

1. WSD outperforms baseline methods in alignment quality.
2. WSD maintains downstream task performance, avoiding alignment tax.
3. The approach improves alignment efficiency and effectiveness.

Abstract

Large Language Models (LLMs) require alignment with human preferences to avoid generating offensive, false, or meaningless content. Recently, low-resource methods for LLM alignment have been popular, while still facing challenges in obtaining both high-quality and aligned content. Motivated by the observation that the difficulty of generating aligned responses is concentrated at the beginning of decoding, we propose a novel framework, Weak-to-Strong Decoding (WSD), to enhance the alignment ability of base models by the guidance of a small aligned model. The small model first drafts well-aligned beginnings, followed by the large base model to continue the rest, controlled by a well-designed auto-switch mechanism. We also collect a new dataset, GenerAlign, to fine-tune a small-sized Pilot-3B as the draft model, which effectively enhances different base models under the WSD framework to…
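
The draft-then-continue idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the auto-switch decides, so the rule used here (hand over once the base model's mean probability on the last few drafted tokens reaches a threshold) is an assumption, as are all names (`weak_to_strong_decode`, `draft_next`, `base_score`, `base_next`, `threshold`, `window`, `max_draft`) and the toy models.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces: a "model" is any callable over a token prefix.
NextTokenFn = Callable[[List[str]], str]      # prefix -> next token
ScoreFn = Callable[[List[str], str], float]   # (prefix, token) -> probability


def weak_to_strong_decode(
    prompt: List[str],
    draft_next: NextTokenFn,   # small aligned model (drafts the beginning)
    base_score: ScoreFn,       # large base model scoring a drafted token
    base_next: NextTokenFn,    # large base model (continues the rest)
    threshold: float = 0.5,    # assumed switch threshold
    window: int = 2,           # assumed number of recent tokens to average
    max_draft: int = 8,
    max_new_tokens: int = 16,
    eos: str = "<eos>",
) -> Tuple[List[str], Optional[int]]:
    """Return (tokens, switch_at); switch_at is None if the draft hit EOS."""
    tokens = list(prompt)
    recent: List[float] = []

    # Phase 1: the small model drafts while the base model only scores.
    for _ in range(max_draft):
        tok = draft_next(tokens)
        recent.append(base_score(tokens, tok))
        tokens.append(tok)
        if tok == eos:
            return tokens, None
        recent = recent[-window:]
        # Assumed switch rule: base model is confident on the recent draft.
        if len(recent) == window and sum(recent) / window >= threshold:
            break
    switch_at = len(tokens) - len(prompt)

    # Phase 2: the large base model continues from the drafted beginning.
    while len(tokens) - len(prompt) < max_new_tokens:
        tok = base_next(tokens)
        tokens.append(tok)
        if tok == eos:
            break
    return tokens, switch_at


# Toy stand-ins for real models (purely illustrative, single-token prompt).
def toy_draft_next(tokens: List[str]) -> str:
    script = ["Sure", ",", "here", "is", "how", ":"]
    return script[(len(tokens) - 1) % len(script)]


def toy_base_score(tokens: List[str], tok: str) -> float:
    # Base model is unsure about the first two drafted tokens, then confident.
    return 0.2 if len(tokens) < 3 else 0.9


def toy_base_next(tokens: List[str]) -> str:
    return {"step": "one", "one": "<eos>"}.get(tokens[-1], "step")


if __name__ == "__main__":
    out, switch_at = weak_to_strong_decode(
        ["Q"], toy_draft_next, toy_base_score, toy_base_next,
        threshold=0.8, window=2,
    )
    print(switch_at, out)
```

With these toy models the draft is kept for four tokens before the base model takes over. In a real setting the callables would wrap the draft model and the base model, and the switch rule would follow whatever criterion the paper specifies.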

Code & Models

Repositories

F2-Song/Weak-to-Strong-Decoding
PyTorch · Official

Models

songff/Pilot-3B
model · 14 dl · ♡ 3

Datasets

songff/GenerAlign
dataset · 16 dl

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Computational and Text Analysis Methods

Methods: Balanced Selection