Reward-Shifted Speculative Sampling Is An Efficient Test-Time Weak-to-Strong Aligner
Bolian Li, Yanran Wu, Xinyu Luo, Ruqi Zhang

TL;DR
This paper introduces reward-shifted speculative sampling, a method that efficiently aligns large language models with human preferences during inference by leveraging a draft model, reducing costs while maintaining high reward scores.
Contribution
The paper proposes a novel reward-shifted speculative sampling algorithm that improves test-time alignment efficiency without sacrificing alignment quality.
Findings
Achieves higher reward scores with lower inference costs.
Effectively exploits distributional shifts between draft and target models.
Validates efficiency and effectiveness through experiments.
Abstract
Aligning large language models (LLMs) with human preferences has become a critical step in their development. Recent research has increasingly focused on test-time alignment, where additional compute is allocated during inference to enhance LLM safety and reasoning capabilities. However, these test-time alignment techniques often incur substantial inference costs, limiting their practical application. We are inspired by the speculative sampling acceleration, which leverages a small draft model to efficiently predict future tokens, to address the efficiency bottleneck of test-time alignment. We introduce the reward-shifted speculative sampling (SSS) algorithm, in which the draft model is aligned with human preferences, while the target model remains unchanged. We theoretically demonstrate that the distributional shift between the aligned draft model and the unaligned target model can be…
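For readers unfamiliar with the speculative sampling that the abstract builds on, the following is a minimal sketch of one accept/reject step of standard speculative sampling. It is not the paper's reward-shifted variant, whose modified acceptance rule is not given in the text above. The function name `speculative_step` and its arguments are hypothetical, and the two distributions are plain Python lists rather than model outputs.

```python
import random

def speculative_step(p_target, q_draft, rng):
    """One token of STANDARD speculative sampling (illustrative only).

    p_target, q_draft: probability lists over the same vocabulary.
    Returns (token_index, accepted_flag).
    """
    vocab = range(len(q_draft))
    # The small draft model proposes a token.
    x = rng.choices(vocab, weights=q_draft, k=1)[0]
    # The target model accepts it with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x, True
    # On rejection, resample from the residual max(0, p - q);
    # rng.choices normalizes the weights internally.
    residual = [max(0.0, p - q) for p, q in zip(p_target, q_draft)]
    x = rng.choices(vocab, weights=residual, k=1)[0]
    return x, False

if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.5, 0.3, 0.2]   # assumed target distribution
    q = [0.2, 0.2, 0.6]   # assumed draft distribution
    n = 20000
    counts = [0, 0, 0]
    for _ in range(n):
        tok, _ = speculative_step(p, q, rng)
        counts[tok] += 1
    print([round(c / n, 3) for c in counts])  # should be close to p
```

In this standard form, the emitted tokens follow the target distribution exactly, so the draft model only affects speed. The abstract describes a different setup, in which the draft model is aligned and the target model is not, so the acceptance rule has to be changed for the output to reflect the aligned behavior.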
Taxonomy
Topics: Topic Modeling · Mobile Crowdsensing and Crowdsourcing · Explainable Artificial Intelligence (XAI)
