On Monotonicity in AI Alignment
Gilles Bareilles, Julien Fageot, Lê-Nguyên Hoang, Peva Blanchard, Wassim Bouaziz, Sébastien Rouault, El-Mahdi El-Mhamdi

TL;DR
This paper investigates the causes of non-monotonic behavior in comparison-based preference learning methods for AI alignment, providing theoretical insights and conditions to evaluate and improve their trustworthiness.
Contribution
It offers a formal analysis of monotonicity in preference learning frameworks, identifying conditions for monotonicity guarantees and clarifying limitations of current methods.
Findings
Models satisfy local pairwise monotonicity under mild assumptions.
Provides formalizations and conditions for monotonicity guarantees.
Clarifies limitations and guides development of trustworthy preference learning algorithms.
Abstract
Comparison-based preference learning has become central to the alignment of AI models with human preferences. However, these methods may behave counterintuitively. After empirically observing that, when accounting for a preference for one response over another, the model may actually decrease the probability (and reward) of generating the preferred response (an observation also made by others), this paper investigates the root causes of (non-)monotonicity, for a general comparison-based preference learning framework that subsumes Direct Preference Optimization (DPO), Generalized Preference Optimization (GPO) and Generalized Bradley-Terry (GBT). Under mild assumptions, we prove that such methods still satisfy what we call local pairwise monotonicity. We also provide a bouquet of formalizations of monotonicity, and identify sufficient conditions for their guarantee, thereby providing a toolbox to evaluate how…
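The counterintuitive behavior described in the abstract can be reproduced in a toy setting. The sketch below is not the paper's experiment: it assumes a hypothetical softmax policy over three candidate responses ("y", "z", "w") with made-up scalar features and a single shared parameter, and applies one gradient step of the standard DPO loss for the preference "y over z". Because the parameter is shared, the step that widens the margin between "y" and "z" also boosts the third response "w", and the log-probability of the preferred response "y" goes down. Under DPO's implicit reward, beta * (log pi - log pi_ref), its reward goes down as well.

```python
import math

# Hypothetical toy setup (not from the paper): three candidate responses,
# each described by one scalar feature; logit(r) = theta * FEATURES[r].
FEATURES = {"y": 1.0, "z": 0.0, "w": 3.0}


def log_probs(theta):
    """Log-probabilities of a softmax policy with a single shared parameter."""
    logits = {r: theta * f for r, f in FEATURES.items()}
    m = max(logits.values())
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return {r: v - log_norm for r, v in logits.items()}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dpo_step(theta, theta_ref, preferred, rejected, beta=1.0, lr=0.1):
    """One gradient-descent step on the DPO loss -log sigmoid(beta * margin)."""
    lp, lp_ref = log_probs(theta), log_probs(theta_ref)
    margin = (lp[preferred] - lp[rejected]) - (lp_ref[preferred] - lp_ref[rejected])
    # For this policy, d(log pi(a) - log pi(b)) / d theta = f_a - f_b.
    d_margin = FEATURES[preferred] - FEATURES[rejected]
    grad = -beta * (1.0 - sigmoid(beta * margin)) * d_margin
    return theta - lr * grad


theta_ref = 0.0  # the reference policy is uniform over the three responses
theta_new = dpo_step(theta_ref, theta_ref, preferred="y", rejected="z")

before, after = log_probs(theta_ref), log_probs(theta_new)
print("log pi(y) before:", before["y"], "after:", after["y"])
print("margin log pi(y) - log pi(z) before:", before["y"] - before["z"],
      "after:", after["y"] - after["z"])
```

In this construction the margin between "y" and "z" increases, as the pairwise objective intends, while the probability of "y" itself falls. With one free logit per response (a tabular policy) the same step would raise the probability of "y", so the effect here comes from parameter sharing.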
Peer Reviews
Decision·Submitted to ICLR 2026
* The math is well done; I did not do a deep dive, but I was able to follow it and did not catch any errors or inconsistencies. A substantial part of this paper is dedicated to theory, so this is fairly significant. * The paper is well motivated; the main novelties (the monotonicity taxonomy and the local pairwise guarantee) are clear, and the potential impact on future work is clear. * Overall, the paper is well written. The structure is clear and the notation is consistent.
* The motivating example is good, but the figure is small. I would like to see a zero line, larger fonts, and clearer panel labels. * The claim of a "toolbox" (as written in the abstract) feels somewhat strong to me. The authors do formalize several forms of monotonicity, which is appreciated, but there is not even a minimal empirical example or guidance on how to use and interpret the results. I recognize the space constraints, but a clearer link between the sections would be appreciated.
< Strength > - The paper addresses a practically significant issue in AI alignment. The empirical observation that preferred responses can have decreasing scores during training is a widely known reliability concern. The motivating example in Figure 1 effectively demonstrates this counterintuitive behavior across multiple Llama models. - The general formulation in Section 3 successfully unifies multiple existing methods (BT, DPO, GPO, GBT) under a common loss structure and thereby enables
< Weakness > - All 6 models tested are from the Llama family (3.1 8B, 3.2 3B, 3.2 1B with base/instruct variants), which may raise concerns about generalizability. Testing on other architectures (e.g., Qwen) would strengthen confidence that the findings aren't specific to Llama's particular parameterization. - Although this is a theoretical analysis paper, it lacks practical interpretation or experiments. It would be great if the paper mentioned what the theoretical guarantees mean for
The authors apparently know what they want to study.
The problem is that we have no idea what the implications of their findings are for either SFT or inference. There are no numerical or experimental results.
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Bayesian Modeling and Causal Inference · Constraint Satisfaction and Optimization
