Distributional Alignment Games for Answer-Level Fine-Tuning
Mehryar Mohri, Jon Schneider, Yifan Wu

TL;DR
This paper introduces a game-theoretical framework for answer-level fine-tuning of language models, recasting the answer-level objective as a distributional alignment game that can be optimized efficiently.
Contribution
It formulates answer-level fine-tuning as a two-player game, unifying diverse approaches and providing algorithms with improved efficiency for reasoning tasks.
Findings
The Nash Equilibrium of the game aligns with the original optimization goal.
The framework unifies diversity and self-improvement methods.
Algorithms like Coherence-GRPO achieve significant complexity reductions.
Abstract
We focus on the problem of Answer-Level Fine-Tuning (ALFT), where the goal is to optimize a language model based on the correctness or properties of its final answers, rather than the specific reasoning traces used to produce them. Directly optimizing answer-level objectives is computationally intractable due to the need to marginalize over the vast space of latent reasoning paths. To overcome this, we propose a general game-theoretical framework that lifts the problem to a Distributional Alignment Game. We formulate ALFT as a two-player game between a Policy (the generator) and a Target (an auxiliary distribution). We prove that the Nash Equilibrium of this game corresponds exactly to the solution of the original answer-level optimization problem. This variational perspective transforms the intractable marginalization problem into a tractable projection problem. We…
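To make the marginalization issue concrete, here is a minimal toy sketch. It is not the paper's algorithm. It assumes a hypothetical model in which a reasoning trace is a sequence of binary steps, each equal to 1 with probability `p_one`, and in which the final answer is the parity of the trace. All names (`trace_probability`, `answer_of`, `answer_marginal`) are illustrative. The answer-level probability is obtained by summing over every trace that yields the same answer, and the number of traces doubles with each additional step.

```python
from itertools import product


def trace_probability(trace, p_one=0.75):
    """Probability of one trace under the assumed toy model (independent binary steps)."""
    prob = 1.0
    for step in trace:
        prob *= p_one if step == 1 else 1.0 - p_one
    return prob


def answer_of(trace):
    """Hypothetical answer extractor: the parity of the trace."""
    return sum(trace) % 2


def answer_marginal(num_steps, p_one=0.75):
    """Brute-force answer distribution: sum trace probabilities per final answer."""
    marginal = {}
    num_traces = 0
    for trace in product((0, 1), repeat=num_steps):
        answer = answer_of(trace)
        marginal[answer] = marginal.get(answer, 0.0) + trace_probability(trace, p_one)
        num_traces += 1
    return marginal, num_traces


marginal, num_traces = answer_marginal(num_steps=3)
print(num_traces, marginal)
```

With `num_steps=3` the loop visits 8 traces; with 30 steps it would visit over a billion. This exponential growth is why exact enumeration does not scale to real reasoning traces.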
