Data Selection for LLM Alignment Using Fine-Grained Preferences
Jia Zhang, Yao Liu, Chen-Xi Zhang, Yi Liu, Yi-Xuan Jin, Lan-Zhe Guo, Yu-Feng Li

TL;DR
This paper introduces a data selection method based on preference divergence to improve LLM alignment with fine-grained human preferences, achieving better results with less data.
Contribution
It formulates preference conflicts as divergence and proposes a data selection strategy that enhances alignment efficiency and effectiveness.
Findings
Achieves better alignment using only 30% of the data than full-data methods.
Theoretically guarantees near-optimal data selection based on preference divergence.
Empirically demonstrates consistent improvements across various datasets.
Abstract
Large language model (LLM) alignment aims to ensure that the behavior of LLMs meets human preferences. While collecting data from multiple fine-grained, aspect-specific preferences is becoming increasingly feasible, existing alignment methods typically work on a single preference and thus struggle with the conflicts inherent in such aggregated datasets. As an early attempt, we propose a data-centric approach to align LLMs through the effective use of fine-grained preferences. Specifically, we formulate the problem as a direct fine-grained preference optimization and introduce preference divergence (PD), which quantifies inter-aspect preference conflicts. Instead of directly tackling the resulting complicated optimization, we recast it as a data selection problem and propose a simple yet effective strategy, which identifies a subset of data corresponding to the most negative PD…
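The selection step described in the abstract can be illustrated with a minimal sketch. It assumes a PD score has already been computed for every preference pair; how PD is computed is not given in this summary, so the scores below are made-up placeholders. The function name `select_most_negative_pd` and the `fraction` parameter are hypothetical, and the 30% default only mirrors the budget quoted in the findings. This is not the authors' implementation.

```python
from typing import List, Sequence


def select_most_negative_pd(pd_scores: Sequence[float], fraction: float = 0.3) -> List[int]:
    """Return the indices of the `fraction` of samples with the most negative PD.

    Assumption: `pd_scores[i]` is a precomputed preference divergence for sample i.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n = len(pd_scores)
    if n == 0:
        return []
    # Keep at least one sample so a small dataset never yields an empty subset.
    n_keep = max(1, round(n * fraction))
    # Sort indices by PD in ascending order: the most negative values come first.
    order = sorted(range(n), key=lambda i: pd_scores[i])
    # Return the kept indices in their original dataset order.
    return sorted(order[:n_keep])


# Hypothetical PD scores for ten preference pairs (illustrative values only).
pd_scores = [0.4, -1.2, 0.1, -0.3, 0.9, -2.0, 0.0, 0.7, -0.8, 0.2]
selected = select_most_negative_pd(pd_scores, fraction=0.3)
print(selected)  # [1, 5, 8]
```

The selected indices would then define the subset passed to a standard preference-optimization pipeline such as DPO, which is the pipeline the reviews below mention.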
Peer Reviews
Decision: ICLR 2026 Poster
A balanced sampling strategy is applied to mitigate the intrinsic bias towards longer responses that are favored regardless of quality. A penalty term is introduced into the reward model to discourage length bias as well. The paper is well written and easy to follow.
See the questions below.
1. The authors clearly demonstrate motivation for the problem, where aggregating fine-grained preferences introduces conflicts, redundancy, and noise that degrade LLM alignment.
2. The development of loss bounds and the selection optimality result underpin the proposed data selection strategy with rigorous analysis, providing compelling mathematical justification for selecting samples by most-negative PD.
3. Extensive evaluation: The method is thoroughly evaluated against full-data and alternati
1. I am not familiar with this research scope, but the current evaluation focuses on UltraFeedback and HelpSteer, and their derived conflict settings are limited. The authors should conduct experiments on more advanced benchmarks to demonstrate the method's effectiveness clearly.
2. The empirical studies do not report in depth on the sensitivity of the method to hyperparameters (e.g., $\lambda$, quantile level $\gamma$, length penalty $\rho$, sampling ratio $p_r$), aside from the generic selec
1. The detailed problem formulation and the novel recasting of the problem as data selection, rather than algorithm development, are interesting.
2. Empirical coverage is thorough.
1. With ever-larger models and compute, scaling laws may simply “wash out” moderate preference noise; the urgency of the problem is not demonstrated.
2. The method operates on a fixed dataset and is demonstrated only with the now “classical” DPO pipeline. Readers working with on-policy RL extensions are unlikely to see an immediate hook. Extending DFPO to iterative regimes like iterative DPO would greatly widen its appeal.
3. If a dataset contains several conflicting preferences, DFPO appear
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques
