Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs

Giyeong Oh; Junghyun Lee; Jaehyun Park; Youngjae Yu; Wonho Bae; Junhyug Noh

arXiv:2604.02766 · cs.LG · April 6, 2026

TL;DR

In the context of strong pre-trained language models, active preference learning offers minimal benefits over simple random sampling for data selection, with negligible improvements in proxy win-rates and no mitigation of capability degradation.

Contribution

This paper critically evaluates active preference learning for large language models and shows that, in strong-prior regimes, it offers little or no advantage over simple random sampling.

Findings

01. Active preference learning yields negligible improvements over random sampling.
02. Win-rate improves even as overall model capability degrades.
03. Active preference learning does not significantly reduce variance or mitigate capability collapse.

Abstract

Modern LLMs inherit strong priors from web-scale pretraining, which can limit the headroom of post-training data-selection strategies. While Active Preference Learning (APL) seeks to optimize query efficiency in online Direct Preference Optimization (DPO), the inherent richness of on-policy candidate pools often renders simple Random sampling a surprisingly formidable baseline. We evaluate uncertainty-based APL against Random across harmlessness, helpfulness, and instruction-following settings, utilizing both reward models and LLM-as-a-judge proxies. We find that APL yields negligible improvements in proxy win-rates compared to Random. Crucially, we observe a dissociation where win-rate improves even as general capability -- measured by standard benchmarks -- degrades. APL fails to mitigate this capability collapse or reduce variance significantly better than random sampling. Our…
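To make the comparison in the abstract concrete, here is a minimal Python sketch of the two selection strategies in an online DPO round. It is illustrative only and not the authors' implementation. The acquisition rule (pick the pairs whose implicit DPO reward margin is closest to zero) is an assumption, one common uncertainty heuristic; the abstract does not specify which uncertainty score the paper uses. The function names, the `margin` field, and the default `beta=0.1` are likewise hypothetical.

```python
import math
import random


def implicit_margin(logp_a, logp_b, ref_logp_a, ref_logp_b, beta=0.1):
    """DPO implicit reward margin between two on-policy responses a and b.

    Each argument is a sequence log-probability under the policy or the
    frozen reference model. beta=0.1 is an assumed default, not a value
    taken from the paper.
    """
    return beta * ((logp_a - ref_logp_a) - (logp_b - ref_logp_b))


def dpo_loss(margin, a_preferred):
    """Per-pair DPO loss, -log(sigmoid(signed margin)), in a stable form."""
    signed = margin if a_preferred else -margin
    # -log(sigmoid(x)) equals softplus(-x); branch to avoid overflow in exp.
    if signed >= 0:
        return math.log1p(math.exp(-signed))
    return -signed + math.log1p(math.exp(signed))


def select_random(pool, k, rng):
    """Random baseline: draw k candidate pairs uniformly without replacement."""
    return rng.sample(pool, k)


def select_uncertain(pool, k):
    """Assumed uncertainty rule: keep the k pairs with the smallest |margin|,
    i.e. the pairs on which the current policy is least decided."""
    return sorted(pool, key=lambda pair: abs(pair["margin"]))[:k]


# Hypothetical candidate pool: one dict per (prompt, response a, response b).
pool = [
    {"id": 0, "margin": 0.90},
    {"id": 1, "margin": -0.05},
    {"id": 2, "margin": 0.30},
    {"id": 3, "margin": -1.20},
]
to_label_active = select_uncertain(pool, k=2)
to_label_random = select_random(pool, k=2, rng=random.Random(0))
```

In both cases the selected pairs would then be sent to the preference oracle (a reward model or an LLM judge, as in the abstract) and used for a DPO update via `dpo_loss`. The only difference between the two arms is which pairs are spent from the labeling budget.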

Code & Models

Repositories

BootsofLagrangian/random-vs-apl (GitHub)