Transfer Q Star: Principled Decoding for LLM Alignment
Souradip Chakraborty, Soumya Suvra Ghosal, Ming Yin, Dinesh Manocha, Mengdi Wang, Amrit Singh Bedi, and Furong Huang

TL;DR
Transfer Q* offers a principled decoding approach for aligning large language models by implicitly estimating the optimal value function, reducing sub-optimality, and improving response quality without extensive fine-tuning.
Contribution
This work introduces Transfer Q*, a novel method that estimates the optimal value function for alignment, providing theoretical guarantees and superior empirical performance over prior methods.
Findings
Reduces sub-optimality gap compared to previous methods
Achieves higher coherence, diversity, and quality in responses
Demonstrates strong empirical results on synthetic and real datasets
Abstract
Aligning foundation models is essential for their safe and trustworthy deployment. However, traditional fine-tuning methods are computationally intensive and require updating billions of model parameters. A promising alternative, alignment via decoding, adjusts the response distribution directly without model updates to maximize a target reward r, thus providing a lightweight and adaptable framework for alignment. However, principled decoding methods rely on oracle access to an optimal Q-function (Q*), which is often unavailable in practice. Hence, prior SoTA methods either approximate this Q* using a Q-function derived from the reference model or rely on short-term rewards, resulting in sub-optimal decoding performance. In this work, we propose Transfer Q*, which implicitly estimates the optimal value function for a target reward through a…
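To make "adjusts the response distribution directly without model updates" concrete, here is a minimal sketch of one decoding step. It assumes the standard KL-regularized form, in which each candidate token z is reweighted in proportion to ref_probs[z] * exp(Q(z) / alpha). The function name reweight_next_token, the temperature alpha, and the toy Q-values are illustrative assumptions, not the paper's implementation. How Transfer Q* actually estimates those Q-values is not covered by the excerpt above and is not attempted here.

```python
import math


def reweight_next_token(ref_probs, q_values, alpha=1.0):
    """Tilt a reference next-token distribution by per-token value estimates.

    Returns probabilities proportional to ref_probs[z] * exp(q_values[z] / alpha).
    Both arguments are plain dicts keyed by token (hypothetical inputs).
    """
    # Work in log space; tokens with zero reference probability stay excluded.
    scores = {
        z: math.log(p) + q_values[z] / alpha
        for z, p in ref_probs.items()
        if p > 0.0
    }
    # Subtract the maximum score before exponentiating, for numerical stability.
    top = max(scores.values())
    weights = {z: math.exp(s - top) for z, s in scores.items()}
    total = sum(weights.values())
    return {z: w / total for z, w in weights.items()}


# Toy example: placeholder Q-values that favour "maybe" over "yes".
ref_probs = {"yes": 0.5, "no": 0.3, "maybe": 0.2}
q_values = {"yes": 0.0, "no": 1.0, "maybe": 2.0}
aligned = reweight_next_token(ref_probs, q_values, alpha=1.0)
```

A larger alpha keeps the result closer to the reference distribution, while a smaller alpha concentrates probability on the highest-valued tokens. The model weights are never touched, which is what makes this family of methods lightweight compared with fine-tuning.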
Taxonomy
Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing
