Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
Zhanhui Zhou, Zhixuan Liu, Jie Liu, Zhichen Dong, Chao Yang, Yu Qiao

TL;DR
This paper introduces weak-to-strong search, a test-time method that aligns large language models with human preferences by leveraging small models, improving performance without additional training.
Contribution
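It proposes a novel test-time greedy search that samples from a frozen large model and steers it toward outputs that maximize the log-probability difference between a small tuned model and its untuned counterpart, so the large model itself is never fine-tuned.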
Findings
Improves large model alignment in controlled-sentiment generation and summarization tasks.
Enhances instruction-following performance using off-the-shelf small models.
Achieves better win rates against GPT-4 Turbo without additional training.
Abstract
Large language models are usually fine-tuned to align with human preferences. However, fine-tuning a large language model can be challenging. In this work, we introduce weak-to-strong search, framing the alignment of a large language model as a test-time greedy search to maximize the log-probability difference between small tuned and untuned models while sampling from the frozen large model. This method serves both as (1) a compute-efficient model up-scaling strategy that avoids directly tuning the large model and as (2) an instance of weak-to-strong generalization that enhances a strong model with weak test-time guidance. Empirically, we demonstrate the flexibility of weak-to-strong search across different tasks. In controlled-sentiment generation and summarization, we use tuned and untuned small models to improve the alignment of large models without additional…
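The search described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the three "models" are hypothetical stand-in functions over a two-word vocabulary, and the chunk length, candidate count, and all function names are assumptions made for illustration. At each step the sketch samples several candidate chunks from the frozen large model and keeps the one with the highest tuned-minus-untuned log-probability.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for a language model: maps a token prefix to a
# next-token probability distribution. Real models would replace these.
Model = Callable[[Sequence[str]], Dict[str, float]]


def sample_chunk(model: Model, prefix: Sequence[str], length: int,
                 rng: random.Random) -> List[str]:
    """Sample `length` tokens from `model`, continuing `prefix`."""
    chunk: List[str] = []
    for _ in range(length):
        dist = model(list(prefix) + chunk)
        tokens = list(dist)
        weights = [dist[t] for t in tokens]
        chunk.append(rng.choices(tokens, weights=weights, k=1)[0])
    return chunk


def chunk_log_prob(model: Model, prefix: Sequence[str],
                   chunk: Sequence[str]) -> float:
    """Log-probability that `model` assigns to `chunk` given `prefix`."""
    context = list(prefix)
    total = 0.0
    for tok in chunk:
        total += math.log(model(context)[tok])
        context.append(tok)
    return total


def weak_to_strong_greedy(large: Model, tuned: Model, untuned: Model,
                          prompt: Sequence[str], n_chunks: int = 4,
                          chunk_len: int = 2, n_candidates: int = 8,
                          seed: int = 0) -> List[str]:
    """Greedy chunk-wise search: propose with the large model, score with
    the small models' log-probability difference, keep the best chunk."""
    rng = random.Random(seed)
    output = list(prompt)
    for _ in range(n_chunks):
        candidates = [sample_chunk(large, output, chunk_len, rng)
                      for _ in range(n_candidates)]
        best = max(
            candidates,
            key=lambda c: chunk_log_prob(tuned, output, c)
            - chunk_log_prob(untuned, output, c),
        )
        output.extend(best)
    return output[len(prompt):]


# Toy models (assumptions): the large and untuned models are indifferent,
# while the tuned small model prefers the token "good".
def toy_large(prefix: Sequence[str]) -> Dict[str, float]:
    return {"good": 0.5, "bad": 0.5}


def toy_untuned(prefix: Sequence[str]) -> Dict[str, float]:
    return {"good": 0.5, "bad": 0.5}


def toy_tuned(prefix: Sequence[str]) -> Dict[str, float]:
    return {"good": 0.9, "bad": 0.1}


result = weak_to_strong_greedy(toy_large, toy_tuned, toy_untuned, ["review:"])
print(result)
```

Only the small models are ever scored for preference, and the large model is used purely as a proposal distribution, which is why no gradient update to the large model is needed in this setup.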
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: ALIGN
