TL;DR
ActiveDPO introduces a theoretically grounded, LLM-aware active data selection method for preference-based alignment, significantly reducing data collection costs and improving alignment quality.
Contribution
It presents a novel active data selection algorithm that accounts for the LLM's influence, outperforming existing methods in sample-efficient alignment.
Findings
ActiveDPO outperforms existing methods across multiple models.
It effectively reduces the amount of human preference data needed.
The method demonstrates superior performance on real-world datasets.
Abstract
The recent success in using human preferences to align large language models (LLMs) has significantly improved their performance in various downstream tasks, such as question answering, mathematical reasoning, and code generation. However, achieving effective LLM alignment depends on high-quality datasets of human preferences. Collecting these datasets requires human preference annotation, which is costly and resource-intensive, necessitating efficient active data selection methods. Existing methods either lack a strong theoretical foundation or depend on restrictive assumptions about the reward function, such as linear latent reward functions. To this end, we propose an algorithm, ActiveDPO, that uses a theoretically grounded data selection criterion for non-linear reward functions while directly leveraging the LLM itself to parameterize the reward model used for active data selection.…
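The abstract describes the selection criterion only at a high level. As a rough illustration of this family of criteria, the sketch below scores each candidate preference pair by the squared norm of a vector z under the inverse of a design matrix V, picks the highest-scoring pair, and adds that pair's outer product to V. This is a generic D-optimal-style greedy selection. Here z stands in for the gradient of the implicit reward difference between the two responses, or for the last-layer feature difference under the log-linear assumption discussed in the reviews. The function name `select_pairs`, the regularizer `lam`, and the random candidates are hypothetical. This is not the authors' released implementation.

```python
import numpy as np


def select_pairs(z, k, lam=1.0):
    """Greedily pick up to k candidate pairs (rows of z) to send for annotation.

    Each row z[i] is a hypothetical stand-in for the gradient of the
    implicit reward difference r(x, y_1) - r(x, y_2) of candidate pair i.
    """
    n, d = z.shape
    V = lam * np.eye(d)                   # regularized design matrix V_0 = lam * I
    picked = np.zeros(n, dtype=bool)      # pairs already selected
    chosen = []
    for _ in range(min(k, n)):
        # Uncertainty score z_i^T V^{-1} z_i for every candidate at once
        V_inv_z = np.linalg.solve(V, z.T)             # shape (d, n)
        scores = np.einsum("nd,dn->n", z, V_inv_z)
        scores[picked] = -np.inf                       # never pick a pair twice
        i = int(np.argmax(scores))
        chosen.append(i)
        picked[i] = True
        V += np.outer(z[i], z[i])                      # V_{t+1} = V_t + z_i z_i^T
    return chosen


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    candidates = rng.normal(size=(100, 16))
    print(select_pairs(candidates, k=5))
```

Adding z z^T after each pick shrinks the score of any remaining candidate that points in a similar direction. As a result, the greedy loop favours pairs that cover directions the labelled set has not yet explored.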
Peer Reviews
Decision: ICLR 2026 Poster
- First theoretical and algorithmic formulation of active learning for DPO.
- Provides algorithms for both the online and offline settings, enabling flexible application.
- Validates the theoretical foundation with reasonable empirical results.
- Reliance on a log-linear policy approximation may lead to shallow alignment, easy reward hacking, and neglect of task complexity.
- The paper assumes (Assumption 1) that *all policies* are log-linear in the last-layer features. This assumption is what enables the D-optimal design analysis, but it also restricts the model's expressive capacity.
- In realistic settings, such as aligning LLMs, the relation between prompt/response pairs and human judgements is likely…
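The log-linear assumption raised in this review can be written out as follows. This is a standard reconstruction under that assumption, not an equation quoted from the paper. Here $\phi(x, y)$ is the last-layer feature of prompt $x$ and response $y$, $\theta$ the final-layer parameters, $\beta$ the DPO temperature, and the reference policy is assumed to be log-linear in the same features with parameters $\theta_{\mathrm{ref}}$. Because both normalizers depend only on $x$, they cancel in the reward difference. The difference is therefore linear in $\theta$, which is what makes a D-optimal design analysis applicable.

```latex
\begin{align*}
\pi_\theta(y \mid x) &= \frac{\exp\big(\theta^\top \phi(x, y)\big)}{\sum_{y'} \exp\big(\theta^\top \phi(x, y')\big)} \\
r_\theta(x, y) &= \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\theta_{\mathrm{ref}}}(y \mid x)} \\
r_\theta(x, y_1) - r_\theta(x, y_2) &= \beta\,(\theta - \theta_{\mathrm{ref}})^\top \big(\phi(x, y_1) - \phi(x, y_2)\big) \\
P(y_1 \succ y_2 \mid x) &= \sigma\big(r_\theta(x, y_1) - r_\theta(x, y_2)\big)
\end{align*}
```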
The active preference learning research topic is very important. I like the method, which resembles influence functions, for conducting active learning. The writing is clear, and the scale of the experiments is acceptable.
**1. Discussion on Comparison with Active Preference Learning for Large Language Models**
A detailed comparison with Active Preference Learning for Large Language Models (arXiv:2402.08114) would strengthen the paper. In particular, the APL paper focuses on the reward difference, which reflects only the model's state before an update, whereas the current work's use of the gradient difference captures the potential improvement after an update. This distinction highlights a more forward-looking and theoreticall…
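The reviewer's distinction can be made concrete with a toy log-linear example in which all numbers are hypothetical. Two candidate pairs have the same current reward margin, so a score built only from the reward difference cannot separate them. A gradient-based score $g^\top V^{-1} g$ instead prefers the pair whose feature difference points in a direction the already-labelled data has not covered. Under the log-linear assumption, the gradient of the reward difference with respect to θ is β·z, which is why z is used directly. This sketch does not reproduce the exact scoring rule of either paper.

```python
import numpy as np

beta = 0.1
theta = np.array([1.0, 1.0])   # hypothetical current (theta - theta_ref)
V = np.diag([50.0, 1.0])       # hypothetical design matrix: axis 0 already well covered

# Feature differences z = phi(x, y_1) - phi(x, y_2) for two candidate pairs
z_a = np.array([2.0, 0.0])     # points along the well-explored direction
z_b = np.array([0.0, 2.0])     # points along an unexplored direction


def reward_margin(z):
    """Implicit DPO reward difference under the log-linear assumption."""
    return beta * theta @ z


def grad_uncertainty(z):
    """Squared norm of the reward-difference gradient (beta * z) under V^{-1}."""
    g = beta * z
    return g @ np.linalg.solve(V, g)


if __name__ == "__main__":
    for name, z in [("a", z_a), ("b", z_b)]:
        print(name, reward_margin(z), grad_uncertainty(z))
```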
+ The motivation of the ActiveDPO method is theoretically grounded and principled.
+ The implementation details are quite clever and dramatically improve the method's tractability.
+ The ablations are quite in-depth: they show effectively which parts of the algorithm are important and that the design is robust across different models and datasets.
+ The ActiveDPO method appears to outperform all other active DPO approaches in the…
+ The description of the algorithm is slightly unclear at times, particularly around the matrix V_t. This is presumably an outer product of the gradients, but the authors do not comment on the fact that it is obviously intractable to store for modern LLMs, let alone invert. I appreciate that the matrix becomes tractable when projected to 8192 dimensions with the approximations made later on, but the authors should highlight this difficulty earlier.
+ The computational requireme…
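The reviewer's tractability point can be put in numbers. A dense V_t = λI + Σ g gᵀ over d parameters needs d² entries, which for a hypothetical model with 7 × 10⁹ parameters is about 4.9 × 10¹⁹ entries. After projecting gradients to k = 8192 dimensions, a dense float32 V_t needs 8192² × 4 bytes = 256 MiB. The sketch below assumes a Gaussian random projection and keeps V_t⁻¹ current with the Sherman-Morrison rank-one update rather than re-inverting. The projection type, the class name `ProjectedDesign`, and the small dimensions in the test are illustrative assumptions and may differ from what the paper actually uses. Note also that the projection matrix P is itself k × d, so a practical version would need a structured or regenerated-on-the-fly projection.

```python
import numpy as np


def dense_bytes(d, bytes_per_entry=4):
    """Memory needed for a dense d x d matrix (float32 by default)."""
    return d * d * bytes_per_entry


class ProjectedDesign:
    """Tracks V_t^{-1} for V_t = lam*I + sum_s u_s u_s^T, with u_s = P g_s of dimension k."""

    def __init__(self, d, k, lam=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Gaussian projection with N(0, 1/k) entries, so E[||P g||^2] = ||g||^2
        self.P = rng.normal(scale=1.0 / np.sqrt(k), size=(k, d))
        self.V_inv = np.eye(k) / lam

    def project(self, grad):
        return self.P @ grad

    def score(self, grad):
        """Uncertainty u^T V_t^{-1} u of a (full-dimensional) gradient."""
        u = self.project(grad)
        return float(u @ self.V_inv @ u)

    def update(self, grad):
        """Sherman-Morrison: (V + u u^T)^{-1} = V^{-1} - V^{-1} u u^T V^{-1} / (1 + u^T V^{-1} u)."""
        u = self.project(grad)
        Vu = self.V_inv @ u  # V_inv is symmetric, so V^{-1} u u^T V^{-1} = outer(Vu, Vu)
        self.V_inv -= np.outer(Vu, Vu) / (1.0 + u @ Vu)


if __name__ == "__main__":
    print(dense_bytes(8192) / 2**20, "MiB")  # 256.0 MiB for the projected V_t
```

Each update costs O(k²) instead of the O(k³) of a fresh inversion. Re-scoring a gradient that was just added returns s / (1 + s) of its previous score s, so the same direction is not selected repeatedly.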
Taxonomy
Topics: Advanced Data Compression Techniques · Advanced Image and Video Retrieval Techniques · Algorithms and Data Compression
Methods: ALIGN
