Transfer Q Star: Principled Decoding for LLM Alignment

Souradip Chakraborty; Soumya Suvra Ghosal; Ming Yin; Dinesh Manocha; Mengdi Wang; Amrit Singh Bedi; Furong Huang

arXiv:2405.20495·cs.CL·June 3, 2024

TL;DR

Transfer Q* offers a principled decoding approach for aligning large language models by implicitly estimating the optimal value function, reducing sub-optimality, and improving response quality without extensive fine-tuning.

Contribution

This work introduces Transfer Q*, a novel method that estimates the optimal value function for alignment, providing theoretical guarantees and superior empirical performance over prior methods.

Findings

01. Reduces sub-optimality gap compared to previous methods

02. Achieves higher coherence, diversity, and quality in responses

03. Demonstrates strong empirical results on synthetic and real datasets

Abstract

Aligning foundation models is essential for their safe and trustworthy deployment. However, traditional fine-tuning methods are computationally intensive and require updating billions of model parameters. A promising alternative, alignment via decoding, adjusts the response distribution directly without model updates to maximize a target reward $r$, thus providing a lightweight and adaptable framework for alignment. However, principled decoding methods rely on oracle access to an optimal Q-function ($Q^{*}$), which is often unavailable in practice. Hence, prior SoTA methods either approximate this $Q^{*}$ using $Q^{\pi_{\text{sft}}}$ (derived from the reference SFT model) or rely on short-term rewards, resulting in sub-optimal decoding performance. In this work, we propose Transfer $Q^{*}$, which implicitly estimates the optimal value function for a target reward $r$ through a…
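
To make "alignment via decoding" concrete, here is a minimal sketch of one decoding step. It assumes the common KL-regularized form in which the next-token distribution is proportional to the reference policy times exp(Q / alpha). The function name `reweight_next_token`, the temperature-like parameter `alpha`, and all numbers are hypothetical and chosen for illustration. The sketch takes the Q-values as given and does not show how Transfer Q* estimates them.

```python
import math

def reweight_next_token(ref_logprobs, q_values, alpha=1.0):
    """Return probabilities proportional to pi_ref(z) * exp(Q(z) / alpha).

    ref_logprobs: token -> log-probability under the frozen reference (SFT) model.
    q_values:     token -> estimated Q-value for the target reward (assumed given).
    alpha:        hypothetical trade-off parameter; a large alpha stays close to
                  the reference model, a small alpha follows Q more greedily.
    """
    # Combine in log space: log pi_ref(z) + Q(z) / alpha.
    scores = {tok: lp + q_values[tok] / alpha for tok, lp in ref_logprobs.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores.values())
    weights = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}

# Toy vocabulary with made-up reference probabilities and Q-values.
ref_logprobs = {"yes": math.log(0.6), "no": math.log(0.3), "maybe": math.log(0.1)}
q_values = {"yes": 0.0, "no": 1.0, "maybe": 2.0}

probs = reweight_next_token(ref_logprobs, q_values, alpha=1.0)
best_token = max(probs, key=probs.get)
print(best_token, probs)
```

No model parameters are updated here: only the per-step distribution changes, which is what makes the approach lightweight. The quality of the result depends entirely on how good the Q-value estimates are, which is the gap the abstract says prior methods leave open.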

Taxonomy

Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing