Response Time Enhances Alignment with Heterogeneous Preferences
Federico Echenique, Alireza Fallah, Baihe Huang, Michael I. Jordan

TL;DR
This paper introduces a method that uses response time data, modeled with Drift-Diffusion Models, to accurately identify population preferences in LLM alignment, overcoming limitations of choice-only data.
Contribution
The paper augments preference datasets with response times and models each decision with a DDM, which restores the identifiability of the population-average preference and corrects the biases that arise when labelers are heterogeneous.
Findings
Response time data improves preference estimation accuracy.
The proposed estimator converges to true preferences even with minimal labeler data.
In experiments, the proposed estimator outperforms standard choice-only baselines.
Abstract
Aligning large language models (LLMs) to human preferences typically relies on aggregating pooled feedback into a single reward model. However, this standard approach assumes that all labelers share the same underlying preferences, ignoring the fact that real-world labelers are highly heterogeneous and usually anonymous. Consequently, relying solely on binary choice data fundamentally distorts the learned policy, making the true population-average preference unidentifiable. To overcome this critical limitation, we demonstrate that augmenting preference datasets with a simple, secondary signal -- the user's response time -- can restore the identifiability of the population's average preference. By modeling each decision as a Drift-Diffusion Model (DDM), we introduce a novel, consistent estimator of heterogeneous preferences that successfully corrects the distortions of standard…
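To make the distortion concrete, the following minimal sketch uses the textbook closed forms for a symmetric DDM (unit noise, start at 0, barriers at ±a): the probability of choosing the first option is 1 / (1 + exp(-2av)), the expected signed choice is tanh(av), and the expected response time is (a / v) tanh(av), so their ratio is v / a. It is an illustration under stated assumptions, not the paper's estimator: the function names, the two-labeler population, and the barrier value are hypothetical, and the ratio is computed per labeler from exact expectations, which the anonymous-labeler setting described in the abstract would not provide directly.

```python
import math


def choice_prob(drift, barrier=1.0):
    """P(upper barrier is hit first) for a symmetric DDM with unit noise."""
    return 1.0 / (1.0 + math.exp(-2.0 * barrier * drift))


def mean_signed_choice(drift, barrier=1.0):
    """Expected choice coded as +1 / -1; equals 2 * choice_prob - 1."""
    return math.tanh(barrier * drift)


def mean_response_time(drift, barrier=1.0):
    """Expected decision time; the zero-drift limit is barrier ** 2."""
    if abs(drift) < 1e-9:
        return barrier ** 2
    return (barrier / drift) * math.tanh(barrier * drift)


def pooled_choice_only_estimate(drifts, barrier=1.0):
    """Fit one logistic model to pooled choices and invert it for a drift.

    This treats the population as if it were a single labeler.
    """
    p = sum(choice_prob(v, barrier) for v in drifts) / len(drifts)
    return math.log(p / (1.0 - p)) / (2.0 * barrier)


def per_labeler_ratio_estimate(drifts, barrier=1.0):
    """Average of barrier * E[choice] / E[time], which equals each drift.

    Assumption: per-labeler expectations are available (illustrative only).
    """
    ratios = [
        barrier * mean_signed_choice(v, barrier) / mean_response_time(v, barrier)
        for v in drifts
    ]
    return sum(ratios) / len(ratios)


# Hypothetical population: one nearly indifferent labeler, one decisive labeler.
drifts = [0.2, 3.0]
true_mean = sum(drifts) / len(drifts)
choice_only = pooled_choice_only_estimate(drifts)
with_times = per_labeler_ratio_estimate(drifts)
print(f"true mean drift:       {true_mean:.3f}")
print(f"pooled choice-only:    {choice_only:.3f}")
print(f"choice + time ratio:   {with_times:.3f}")
```

In this toy population the pooled choice-only estimate falls well below the true mean drift, because choice probabilities saturate for the decisive labeler, while the ratio of expected choice to expected response time recovers each labeler's drift exactly.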
