Intrinsic Mutual Information as a Modulator for Preference Optimization

Peng Liao; Peijia Zheng; Lingbo Li; Shangsong Liang; Lin Chen

arXiv:2604.24804 · cs.LG · April 29, 2026

TL;DR

This paper introduces RMiPO, a new framework for offline preference optimization that uses intrinsic mutual information to improve performance and reduce hyperparameter tuning overhead in aligning LLMs with human values.

Contribution

RMiPO is a lightweight, efficient method that dynamically modulates preference contributions using intrinsic mutual information, outperforming existing approaches.

Findings

01 RMiPO achieves consistently superior performance over existing methods.

02 RMiPO reduces training overhead by more than 15%.

03 RMiPO leverages intrinsic response-level mutual information for preference optimization.

Abstract

Offline preference optimization methods, such as Direct Preference Optimization (DPO), offer significant advantages in aligning Large Language Models (LLMs) with human values. However, achieving optimal performance with these methods typically involves additional hyperparameter tuning, resulting in substantial time overhead. Although prior work has proposed a range of improvements, these methods remain limited in effectiveness and have not fully eliminated reliance on hyperparameter tuning. In this work, we propose RMiPO, a lightweight and efficient framework for offline preference optimization. RMiPO leverages intrinsic Response-level Mutual information for Preference Optimization with hyperparameter modulation, dynamically decoupling preference contributions at negligible additional computational cost. Extensive experimental results demonstrate that RMiPO achieves consistently…
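
For context on the hyperparameter the abstract refers to, here is a minimal sketch of the standard per-example DPO loss, not of RMiPO itself. The excerpt above does not state the RMiPO objective or how it modulates the hyperparameter, so nothing below should be read as the authors' method. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response under
    the policy or the frozen reference model. `beta` is the fixed
    hyperparameter that usually has to be tuned by hand.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # -log(sigmoid(z)) written as a numerically stable softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # Same pair, two values of beta: the loss changes with beta alone,
    # which is why beta normally needs a search per model and dataset.
    for beta in (0.1, 0.5):
        print(beta, dpo_loss(-1.0, -3.0, -2.0, -2.0, beta=beta))
```

In this formulation `beta` is a single constant shared by every preference pair; the abstract describes RMiPO as replacing that manual choice with a modulation driven by response-level mutual information.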

Code & Models

Repositories

liavonpenn/rmipo (GitHub)
