Back to Blackwell: Closing the Loop on Intransitivity in Multi-Objective Preference Fine-Tuning

Jiahao Zhang; Lujing Zhang; Keltin Grimes; Zhuohao Yu; Gokul Swamy; Zhiwei Steven Wu

arXiv:2602.19041·cs.LG·May 7, 2026

TL;DR

This paper introduces MaxEntBW (Maximum Entropy Blackwell Winner), a game-theoretic solution concept for multi-objective intransitive preferences, and PROSPER, a provably efficient algorithm for fine-tuning large language models without scalarizing the objectives.

Contribution

It proposes MaxEntBW as a new solution concept that remains well-defined under intransitive preferences, and develops PROSPER, a scalable preference fine-tuning (PFT) algorithm that handles multiple objectives directly.

Findings

01

PROSPER outperforms baselines on instruction following and chat benchmarks.

02

Trains 7B- and 3B-parameter models with PROSPER.

03

Addresses intransitivity in multi-objective preference fine-tuning.

Abstract

A recurring challenge in preference fine-tuning (PFT) is handling intransitive (i.e., cyclic) preferences. Intransitive preferences often stem from either (i) inconsistent rankings along a single objective or (ii) scalarizing multiple objectives into a single metric. Regardless of their source, the downstream implication of intransitive preferences is the same: there is no well-defined optimal policy, breaking a core assumption of the standard PFT pipeline. In response, we propose a novel, game-theoretic solution concept, the Maximum Entropy Blackwell Winner (MaxEntBW), that is well-defined under multi-objective intransitive preferences. To enable computing MaxEntBWs at scale, we derive PROSPER: a provably efficient PFT algorithm. Unlike prior self-play techniques, PROSPER directly handles multiple objectives…
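To make point (ii) concrete, here is a minimal sketch assuming three hypothetical responses (A, B, C) and three hypothetical objectives, each of which ranks the responses transitively on its own. Averaging the per-objective pairwise preferences (an equal-weight scalarization) produces the cycle A > B > C > A, so no response beats every other one and no deterministic choice is optimal. The standard single-objective fallback shown here is a mixed policy, the von Neumann (minimax) winner, approximated with Hedge (multiplicative-weights) self-play. All names (`preference_matrix`, `condorcet_winner`, `best_response_margin`, `von_neumann_winner`), the step count, and the starting point are illustrative assumptions, not from the paper. This is not MaxEntBW or PROSPER: the entropy and multi-objective (Blackwell) components suggested by the name are not modeled.

```python
import math

# Hypothetical setup (not from the paper): three candidate responses and
# three objectives. Each objective's ranking (best first) is transitive.
RESPONSES = ["A", "B", "C"]
RANKINGS = [
    ["A", "B", "C"],  # objective 1: A > B > C
    ["B", "C", "A"],  # objective 2: B > C > A
    ["C", "A", "B"],  # objective 3: C > A > B
]


def preference_matrix(rankings):
    """Equal-weight scalarization: P[i][j] is the fraction of objectives
    that rank response i above response j (0.5 on the diagonal)."""
    n = len(RESPONSES)
    P = [[0.5] * n for _ in range(n)]
    for i, a in enumerate(RESPONSES):
        for j, b in enumerate(RESPONSES):
            if i != j:
                wins = sum(r.index(a) < r.index(b) for r in rankings)
                P[i][j] = wins / len(rankings)
    return P


def condorcet_winner(P):
    """Return the response that beats every other one head-to-head, or None."""
    n = len(P)
    for i in range(n):
        if all(P[i][j] > 0.5 for j in range(n) if j != i):
            return RESPONSES[i]
    return None


def best_response_margin(P, pi):
    """Largest margin above a 50% win rate that any single response achieves
    against the mixed policy pi (0 means no response beats pi on average)."""
    n = len(P)
    return max(sum(P[i][j] * pi[j] for j in range(n)) - 0.5 for i in range(n))


def von_neumann_winner(P, steps=5000, start=(0.6, 0.3, 0.1)):
    """Approximate the minimax (von Neumann) winner of the symmetric zero-sum
    game with skew-symmetric payoff M = P - 0.5, using Hedge self-play and
    returning the average of the iterates."""
    n = len(P)
    M = [[P[i][j] - 0.5 for j in range(n)] for i in range(n)]
    # Payoff range r sets the standard Hedge step size sqrt(8 ln n / (T r^2)).
    r = max(max(row) for row in M) - min(min(row) for row in M)
    r = max(r, 1e-12)  # guard against an all-ties matrix
    eta = math.sqrt(8 * math.log(n) / (steps * r ** 2))
    p = list(start)
    avg = [0.0] * n
    for _ in range(steps):
        avg = [a + x / steps for a, x in zip(avg, p)]
        # Expected payoff of each pure response against the current policy.
        gains = [sum(M[i][j] * p[j] for j in range(n)) for i in range(n)]
        w = [x * math.exp(eta * g) for x, g in zip(p, gains)]
        z = sum(w)
        p = [x / z for x in w]
    return avg


if __name__ == "__main__":
    P = preference_matrix(RANKINGS)
    print(condorcet_winner(P))
    pi = von_neumann_winner(P)
    print([round(x, 3) for x in pi], round(best_response_margin(P, pi), 4))
```

In this toy case each response wins 2/3 of the objectives against the next one in the cycle, so `condorcet_winner` returns `None`, while the averaged Hedge iterate settles near the uniform policy, against which no single response wins more than about half the time. Averaging the iterates, rather than keeping the last one, is what the standard no-regret argument guarantees to converge in zero-sum games.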
