Back to Blackwell: Closing the Loop on Intransitivity in Multi-Objective Preference Fine-Tuning
Jiahao Zhang, Lujing Zhang, Keltin Grimes, Zhuohao Yu, Gokul Swamy, Zhiwei Steven Wu

TL;DR
This paper introduces MaxEntBW, a game-theoretic solution concept for multi-objective intransitive preferences, and PROSPER, an efficient algorithm for preference fine-tuning of large language models that handles multiple objectives without scalarizing them.
Contribution
It proposes MaxEntBW as a new solution concept that remains well-defined under intransitive preferences, and develops PROSPER, a scalable preference fine-tuning (PFT) algorithm that handles multiple objectives directly.
Findings
PROSPER outperforms baselines on instruction following and chat benchmarks.
The authors train 7B- and 3B-parameter models with PROSPER.
Addresses intransitivity in multi-objective preference fine-tuning.
Abstract
A recurring challenge in preference fine-tuning (PFT) is handling intransitive (i.e., cyclic) preferences. Intransitive preferences often stem from either inconsistent rankings along a single objective or scalarizing multiple objectives into a single metric. Regardless of their source, the downstream implication of intransitive preferences is the same: there is no well-defined optimal policy, breaking a core assumption of the standard PFT pipeline. In response, we propose a novel, game-theoretic solution concept, the Maximum Entropy Blackwell Winner (MaxEntBW), that is well-defined under multi-objective intransitive preferences. To enable computing MaxEntBWs at scale, we derive PROSPER: a provably efficient PFT algorithm. Unlike prior self-play techniques, PROSPER directly handles multiple objectives…
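To make the abstract's point about scalarization concrete, here is a minimal sketch of how collapsing several objectives into one pairwise preference can produce a cycle. The three responses, their per-objective scores, and the majority-vote scalarization are all hypothetical illustrations chosen for this example; they are not taken from the paper and are not part of MaxEntBW or PROSPER.

```python
# Hypothetical per-objective scores (higher is better) for three responses.
# Each objective on its own gives a perfectly consistent ranking.
scores = {
    "A": (3, 1, 2),
    "B": (2, 3, 1),
    "C": (1, 2, 3),
}


def prefers(x, y):
    """Return True if x beats y on a strict majority of objectives.

    Majority vote is one assumed way to scalarize several objectives
    into a single pairwise preference.
    """
    wins = sum(sx > sy for sx, sy in zip(scores[x], scores[y]))
    return wins > len(scores[x]) / 2


def dominant_responses():
    """Return every response that is preferred to all other responses."""
    return [
        x for x in scores
        if all(prefers(x, y) for y in scores if y != x)
    ]


# The scalarized preference is cyclic: A beats B, B beats C, C beats A.
has_cycle = prefers("A", "B") and prefers("B", "C") and prefers("C", "A")
print("cycle:", has_cycle)
print("responses preferred to all others:", dominant_responses())
```

In this toy setup no single response is preferred to every other one, which is the sense in which an optimal choice is not well-defined once the aggregated preferences are intransitive.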
