Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding
Feifan Song, Shaohang Wei, Wen Luo, Yuxuan Fan, Tianyu Liu, Guoyin Wang, Houfeng Wang

TL;DR
This paper introduces Weak-to-Strong Decoding (WSD), a framework for low-resource preference alignment of large language models. A small aligned model drafts the beginning of each response and the large base model continues from there, which yields better-aligned content without sacrificing downstream performance.
Contribution
The paper proposes the Weak-to-Strong Decoding (WSD) framework, in which a small aligned draft model guides a large base model, and a new dataset, GenerAlign, used to fine-tune the small draft model Pilot-3B.
Findings
WSD outperforms baseline methods in alignment quality.
WSD maintains downstream task performance, avoiding alignment tax.
The approach improves alignment efficiency and effectiveness.
Abstract
Large Language Models (LLMs) require alignment with human preferences to avoid generating offensive, false, or meaningless content. Recently, low-resource methods for LLM alignment have been popular, while still facing challenges in obtaining both high-quality and aligned content. Motivated by the observation that the difficulty of generating aligned responses is concentrated at the beginning of decoding, we propose a novel framework, Weak-to-Strong Decoding (WSD), to enhance the alignment ability of base models by the guidance of a small aligned model. The small model first drafts well-aligned beginnings, followed by the large base model to continue the rest, controlled by a well-designed auto-switch mechanism. We also collect a new dataset, GenerAlign, to fine-tune a small-sized Pilot-3B as the draft model, which effectively enhances different base models under the WSD framework to…
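The draft-then-continue procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the auto-switch rule, so the criterion used here (switch once the base model's mean probability for the most recent drafted tokens reaches a threshold) is an assumption for illustration only, as are the names `weak_to_strong_decode`, `window`, and `threshold` and the two toy models. It is not the authors' implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A "model" here is any callable mapping a token prefix to a next-token distribution.
NextTokenDist = Dict[str, float]
Model = Callable[[List[str]], NextTokenDist]


def greedy(dist: NextTokenDist) -> str:
    """Pick the most probable token."""
    return max(dist, key=dist.get)


def weak_to_strong_decode(
    prompt: List[str],
    draft: Model,
    base: Model,
    window: int = 2,         # hypothetical: number of drafted tokens to average over
    threshold: float = 0.5,  # hypothetical: confidence needed to hand over to the base model
    max_new_tokens: int = 32,
    eos: str = "<eos>",
) -> Tuple[List[str], Optional[int]]:
    """Let the small draft model start the response, then switch to the base model.

    ASSUMED switch rule (not taken from the paper): hand over once the base model's
    mean probability for the last `window` drafted tokens is at least `threshold`.
    Returns the generated tokens and the number of drafted tokens (None if no switch).
    """
    tokens = list(prompt)
    recent: List[float] = []
    switch_at: Optional[int] = None
    using_draft = True
    for _ in range(max_new_tokens):
        if using_draft:
            tok = greedy(draft(tokens))
            # Score the drafted token under the base model before appending it.
            recent = (recent + [base(tokens).get(tok, 0.0)])[-window:]
            tokens.append(tok)
            if len(recent) == window and sum(recent) / window >= threshold:
                using_draft = False
                switch_at = len(tokens) - len(prompt)
        else:
            tok = greedy(base(tokens))
            tokens.append(tok)
        if tok == eos:
            break
    return tokens[len(prompt):], switch_at


# Toy models, invented for this example.
NEXT = {"?": "Sure", "Sure": ",", ",": "here", "here": "is", "is": "a", "a": "tip"}
BASE_CONF = {"?": 0.1, "Sure": 0.4, ",": 0.7}  # base is unsure only at the start


def toy_draft(prefix: List[str]) -> NextTokenDist:
    return {NEXT.get(prefix[-1], "<eos>"): 1.0}


def toy_base(prefix: List[str]) -> NextTokenDist:
    p = BASE_CONF.get(prefix[-1], 0.9)
    return {NEXT.get(prefix[-1], "<eos>"): p, "um": 1.0 - p}


prompt = ["help", "?"]
generated, switch_at = weak_to_strong_decode(prompt, toy_draft, toy_base)
print(generated, switch_at)
```

In this toy setup the base model alone would open with the filler token "um", while the drafted opening steers it onto the aligned continuation. The handover happens after three drafted tokens, once the base model's averaged confidence in the draft crosses the threshold.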
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Computational and Text Analysis Methods
Methods: Balanced Selection
