InfAlign: Inference-aware language model alignment

Ananth Balashankar; Ziteng Sun; Jonathan Berant; Jacob Eisenstein; Michael Collins; Adrian Hutter; Jong Lee; Chirag Nagpal; Flavien Prost; Aradhana Sinha; Ananda Theertha Suresh; Ahmad Beirami

arXiv:2412.19792·cs.LG·August 22, 2025

TL;DR

InfAlign introduces an inference-aware framework for language model alignment that optimizes inference-time win rates, addressing train/test mismatch issues in standard RLHF and improving decoding performance.

Contribution

The paper proposes a novel inference-aware alignment framework and algorithms that optimize inference-time win rates, with specific transformations for better decoding outcomes.
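As a rough guide to what "transformation" means here, the abstract's claim that the optimal aligned policy solves the standard RLHF problem with a transformed reward can be written as below. The notation is assumed for illustration and does not appear on this page: \pi_{\mathrm{ref}} is the base model, \beta the KL regularization strength, and R a transformed reward that depends on the inference-time procedure. The second line is the well-known closed form of a KL-regularized objective.

```latex
\pi^\star \;=\; \arg\max_{\pi}\;
  \mathbb{E}_{x}\Big[\,
    \mathbb{E}_{y \sim \pi(\cdot \mid x)}\big[ R(x, y) \big]
    \;-\; \beta\, \mathrm{KL}\big( \pi(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big)
  \Big]

\pi^\star(y \mid x) \;\propto\; \pi_{\mathrm{ref}}(y \mid x)\, \exp\!\big( R(x, y) / \beta \big)
```

Under this reading, standard RLHF is the special case where R is the raw reward; which R is appropriate for a given decoding procedure is the subject of the paper and is not reproduced here.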

Findings

01. Up to 8% improvement in inference-time win rates for best-of-N sampling.

02. The proposed reward calibration method outperforms standard win rate optimization.

03. The framework generalizes RLHF to inference-time decoding procedures.
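The findings above do not say how reward calibration is defined. One plausible reading, assumed for this sketch, is to replace a raw reward with its quantile among the rewards of base-policy samples for the same prompt, so that calibrated rewards lie in [0, 1] regardless of the reward model's scale. The function name `calibrate` and the sample values are hypothetical.

```python
from bisect import bisect_right

def calibrate(raw_reward, base_rewards_sorted):
    """Empirical CDF: fraction of base-policy rewards <= raw_reward.

    Assumption: base_rewards_sorted holds rewards of responses sampled
    from the base model for one prompt, sorted in ascending order.
    """
    return bisect_right(base_rewards_sorted, raw_reward) / len(base_rewards_sorted)

# Hypothetical rewards of five base-model samples for a single prompt.
base_rewards = sorted([0.1, 0.4, 0.4, 0.9, 1.3])

# Calibrated values for four raw rewards.
calibrated = [calibrate(r, base_rewards) for r in (0.0, 0.4, 1.0, 2.0)]
print(calibrated)
```

A quantile of this kind depends only on how a response ranks against base-model samples, so it is unchanged by any strictly increasing rescaling of the reward model.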

Abstract

Language model alignment is a critical step in training modern generative language models. Alignment aims to improve the win rate of a sample from the aligned model against the base model. Today, inference-time algorithms (e.g., Best-of-N, controlled decoding, tree search) are increasingly used to decode from language models in place of standard sampling. We show that this train/test mismatch makes the standard RLHF framework suboptimal for such inference-time methods. To this end, we propose a framework for inference-aware alignment (InfAlign), which aims to optimize the inference-time win rate of the aligned policy against the base model. We prove that for any inference-time decoding procedure, the optimal aligned policy is the solution to the standard RLHF problem with a transformation of the reward. This motivates us to provide the calibrate-and-transform RL (InfAlign-CTRL)…
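To make the train/test mismatch concrete, here is a minimal toy sketch of Best-of-N decoding and a Monte Carlo win-rate estimate. Everything in it is an illustrative assumption rather than the paper's setup: a "response" is a single Gaussian number, the reward is the identity, and the helper names `best_of_n` and `win_rate` are hypothetical.

```python
import random

def best_of_n(sample, reward, n, rng):
    """Draw n candidates from `sample` and return the highest-reward one."""
    candidates = [sample(rng) for _ in range(n)]
    return max(candidates, key=reward)

def win_rate(sample_a, sample_b, reward, trials, rng):
    """Estimate P[reward(a) > reward(b)], counting ties as half a win."""
    wins = 0.0
    for _ in range(trials):
        ra, rb = reward(sample_a(rng)), reward(sample_b(rng))
        wins += 1.0 if ra > rb else 0.5 if ra == rb else 0.0
    return wins / trials

# Toy stand-ins (assumptions): the "base policy" emits a Gaussian number
# and the reward model scores a response by its value.
def base_policy(rng):
    return rng.gauss(0.0, 1.0)

def reward(y):
    return y

rng = random.Random(0)

# Standard sampling against itself: close to 0.5 by symmetry.
standard = win_rate(base_policy, base_policy, reward, 20000, rng)

# Best-of-4 decoding against a single base sample: close to 4/5, since the
# best of four continuous i.i.d. draws beats a fifth with probability 4/5.
bon4 = win_rate(lambda r: best_of_n(base_policy, reward, 4, r),
                base_policy, reward, 20000, rng)
print(round(standard, 2), round(bon4, 2))
```

The same policy yields quite different win rates depending on how it is decoded, which is why a policy tuned for standard sampling need not be the best choice once Best-of-N is applied at inference time.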

Videos

InfAlign: Inference-aware language model alignment (SlidesLive)

Taxonomy

Topics: Topic Modeling

Methods: Balanced Selection