PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training

Sarat Chandra Bobbili; Ujwal Dinesha; Dheeraj Narasimha; Srinivas Shakkottai

arXiv:2507.20067·cs.AI·November 14, 2025

TL;DR

PITA is an inference-time alignment method for LLMs that integrates preference feedback directly into token generation, removing the need for a pre-trained reward model and reducing computational cost.

Contribution

The paper presents a framework that learns a small preference-based guidance policy. The policy modifies the LLM's token probabilities at inference time, so the LLM itself is not fine-tuned and no pre-trained reward model is required.

Findings

01 Aligns LLM outputs with user preferences across tasks

02 Eliminates the need for a pre-trained reward model

03 Demonstrates improved alignment in mathematical reasoning and sentiment classification

Abstract

Inference-time alignment enables large language models (LLMs) to generate outputs aligned with end-user preferences without further training. Recent post-training methods achieve this by using small guidance models to modify token generation during inference. These methods typically optimize a reward function KL-regularized by the original LLM taken as the reference policy. A critical limitation, however, is their dependence on a pre-trained reward model, which requires fitting to human preference feedback--a potentially unstable process. In contrast, we introduce PITA, a novel framework that integrates preference feedback directly into the LLM's token generation, eliminating the need for a reward model. PITA learns a small preference-based guidance policy to modify token probabilities at inference time without LLM fine-tuning, reducing computational cost and bypassing the pre-trained…
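To make the idea of "modifying token probabilities at inference time" concrete, here is a minimal sketch of guided decoding under a KL-regularized objective. It relies on the standard closed-form result that the KL-regularized optimum reweights the reference distribution by an exponentiated score, pi(a) ∝ pi_ref(a) * exp(score(a) / beta). This is an illustration of the general technique, not PITA's actual algorithm: the function name `guided_token_probs`, the `guidance_scores` input, and the temperature `beta` are all hypothetical, and how PITA derives its guidance signal from preference feedback is not described in the text above.

```python
import math


def guided_token_probs(ref_logits, guidance_scores, beta=1.0):
    """Reweight a reference next-token distribution by exp(score / beta).

    ref_logits:       unnormalized log-probabilities from the frozen LLM.
    guidance_scores:  per-token scores from a small guidance model
                      (hypothetical input; assumed, not taken from the paper).
    beta:             KL-regularization strength; larger values keep the
                      result closer to the reference distribution.
    """
    if len(ref_logits) != len(guidance_scores):
        raise ValueError("ref_logits and guidance_scores must have equal length")
    if beta <= 0:
        raise ValueError("beta must be positive")

    # Adding score / beta in logit space is equivalent to multiplying the
    # reference probabilities by exp(score / beta).
    combined = [l + g / beta for l, g in zip(ref_logits, guidance_scores)]

    # Numerically stable softmax over the combined logits.
    peak = max(combined)
    exps = [math.exp(c - peak) for c in combined]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: the guidance model favors the third token, so its probability rises
# relative to the unguided reference distribution.
ref_logits = [2.0, 1.0, 0.0]
unguided = guided_token_probs(ref_logits, [0.0, 0.0, 0.0])
guided = guided_token_probs(ref_logits, [0.0, 0.0, 3.0], beta=1.0)
print(unguided, guided)
```

The reference model is only queried for logits and never updated, which is the sense in which this style of alignment avoids fine-tuning the LLM.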
