Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models

Zhanhui Zhou; Zhixuan Liu; Jie Liu; Zhichen Dong; Chao Yang; Yu Qiao

arXiv:2405.19262·cs.CL·November 20, 2024

TL;DR

This paper introduces weak-to-strong search, a test-time method that aligns large language models with human preferences by leveraging small models, improving performance without additional training.

Contribution

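The paper proposes a test-time greedy search that samples from a frozen large model and maximizes the log-probability difference between small tuned and untuned models, improving large model alignment without tuning the large model directly.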

Findings

01. Improves large model alignment in controlled-sentiment generation and summarization tasks.
02. Enhances instruction-following performance using off-the-shelf small models.
03. Achieves better win rates against GPT-4 Turbo without additional training.

Abstract

Large language models are usually fine-tuned to align with human preferences. However, fine-tuning a large language model can be challenging. In this work, we introduce weak-to-strong search, framing the alignment of a large language model as a test-time greedy search to maximize the log-probability difference between small tuned and untuned models while sampling from the frozen large model. This method serves both as (1) a compute-efficient model up-scaling strategy that avoids directly tuning the large model and as (2) an instance of weak-to-strong generalization that enhances a strong model with weak test-time guidance. Empirically, we demonstrate the flexibility of weak-to-strong search across different tasks. In controlled-sentiment generation and summarization, we use tuned and untuned gpt2s to improve the alignment of large models without additional…
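The search described in the abstract can be illustrated with a minimal sketch. It assumes toy unigram distributions in place of real language models, and every name in it (`LARGE`, `SMALL_TUNED`, `SMALL_UNTUNED`, `weak_to_strong_greedy`, the chunk sizes) is hypothetical rather than taken from the authors' repository. It shows only the core idea: the frozen large model proposes candidate continuations, and the log-probability difference between the small tuned and untuned models picks among them. The paper's actual search procedure may differ in its details.

```python
import math
import random

# Toy stand-ins for real language models (hypothetical): each maps a token to
# a probability and ignores context. Real models would condition on the prefix.
VOCAB = ["good", "bad", "okay", "."]
LARGE = {"good": 0.25, "bad": 0.25, "okay": 0.25, ".": 0.25}          # frozen large model
SMALL_TUNED = {"good": 0.70, "bad": 0.05, "okay": 0.15, ".": 0.10}    # small model after tuning
SMALL_UNTUNED = {"good": 0.25, "bad": 0.25, "okay": 0.25, ".": 0.25}  # same small model before tuning


def sample_chunk(model, length, rng):
    """Sample `length` tokens from a toy unigram model."""
    weights = [model[t] for t in VOCAB]
    return rng.choices(VOCAB, weights=weights, k=length)


def log_ratio(chunk):
    """Sum over tokens of log p_tuned - log p_untuned (the search objective)."""
    return sum(math.log(SMALL_TUNED[t]) - math.log(SMALL_UNTUNED[t]) for t in chunk)


def weak_to_strong_greedy(num_chunks=4, chunk_len=2, num_candidates=8, seed=0):
    """Greedy chunk-wise search: propose with the large model, score with the small pair."""
    rng = random.Random(seed)
    response, score = [], 0.0
    for _ in range(num_chunks):
        # The large model is only sampled from; it is never updated.
        candidates = [sample_chunk(LARGE, chunk_len, rng) for _ in range(num_candidates)]
        # Keep the candidate the tuned small model prefers most over the untuned one.
        best = max(candidates, key=log_ratio)
        response.extend(best)
        score += log_ratio(best)
    return response, score


if __name__ == "__main__":
    tokens, total = weak_to_strong_greedy()
    print(" ".join(tokens), round(total, 3))
```

In this toy setup the small tuned model favors the token "good", so the selected chunks drift toward it even though the large model samples uniformly. Swapping the dictionaries for real model log-probabilities would leave the control flow unchanged.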

Code & Models

Repositories

zhziszz/weak-to-strong-search
PyTorch · Official

Datasets

ZHZisZZ/imdb_preference
dataset · 1.0k dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: ALIGN