InfoPO: On Mutual Information Maximization for Large Language Model Alignment

Teng Xiao; Zhen Ge; Sujay Sanghavi; Tian Wang; Julian Katz-Samuels; Marc Versage; Qingjun Cui; Trishul Chilimbi

arXiv:2505.08507 · cs.LG · May 14, 2025

TL;DR

This paper introduces InfoPO, a new preference fine-tuning method for large language models that improves alignment with human preferences, especially in reasoning tasks, by avoiding overfitting and reliance on the Bradley-Terry model.

Contribution

The paper presents InfoPO, a preference fine-tuning algorithm that removes the reliance on the Bradley-Terry (BT) model and is reported to outperform established baselines, particularly on reasoning-heavy tasks.

Findings

01  InfoPO outperforms baselines on open benchmarks.

02  It improves reasoning task performance.

03  It prevents likelihood decrease during training.

Abstract

We study the post-training of large language models (LLMs) with human preference data. Recently, direct preference optimization and its variants have shown considerable promise in aligning language models, eliminating the need for reward models and online sampling. Despite these benefits, these methods rely on explicit assumptions about the Bradley-Terry (BT) model, which makes them prone to overfitting and results in suboptimal performance, particularly on reasoning-heavy tasks. To address these challenges, we propose a principled preference fine-tuning algorithm called InfoPO, which effectively and efficiently aligns large language models using preference data. InfoPO eliminates the reliance on the BT model and prevents the likelihood of the chosen response from decreasing. Extensive experiments confirm that InfoPO consistently outperforms established baselines on widely used open…
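For readers unfamiliar with the Bradley-Terry assumption the abstract refers to, the following is a minimal sketch of the standard DPO objective that InfoPO is contrasted against. It is not the InfoPO loss, which this page does not spell out. The function names, the scalar log-probability inputs, and the value `beta=0.1` are illustrative assumptions.

```python
import math


def bt_preference_prob(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the chosen response is preferred.

    Under the BT model, the preference probability is a sigmoid of the
    reward difference between the two responses.
    """
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # illustrative value, not taken from the paper
) -> float:
    """Standard DPO loss for one preference pair (hypothetical helper).

    The implicit reward of a response is beta times the log-ratio between
    the policy and the reference model; the loss is the negative
    log-likelihood of the observed preference under the BT model.
    """
    reward_chosen = beta * (logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (logp_rejected - ref_logp_rejected)
    return -math.log(bt_preference_prob(reward_chosen, reward_rejected))


# Policy identical to the reference: zero margin, loss equals log(2).
loss_at_init = dpo_loss(-10.0, -10.0, -10.0, -10.0)

# The chosen log-probability drops from -10 to -12, yet the loss still
# falls because the rejected log-probability drops even further.
loss_after = dpo_loss(-12.0, -20.0, -10.0, -10.0)
```

Because this loss depends only on the margin between the two implicit rewards, it can decrease while the log-probability of the chosen response also decreases, as the second call shows. That is the likelihood-decrease behavior the abstract says InfoPO is designed to prevent.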
