Why Expert Alignment Is Hard: Evidence from Subjective Evaluation
Tzu-Mi Lin, Wataru Hirota, Tatsuya Ishigaki, Lung-Hao Lee, Chung-Chi Chen

TL;DR
Aligning large language models with expert judgment in subjective tasks is hard because experts differ from one another, rely on tacit criteria, and change their judgments over time. The paper documents these difficulties through patterns observed in expert evaluations and follow-up questionnaires.
Contribution
This paper provides empirical evidence on why expert alignment is hard, showing that subjective evaluation varies across experts and depends on criteria that are partly tacit.
Findings
Alignment difficulty varies across experts.
Explicit criteria do not always improve alignment.
Alignment is easier on content-based dimensions.
Abstract
Aligning large language models with expert judgment is especially difficult in subjective evaluation tasks, where experts may disagree, rely on tacit criteria, and change their judgments over time. In this paper, we study expert alignment as a way to understand this difficulty. Using expert evaluations and follow-up questionnaires, we examine how different forms of expert information affect alignment and what this reveals about subjective judgment. Our findings show four consistent patterns. First, alignment difficulty varies substantially across experts, suggesting that expert evaluation styles differ widely in their distance from a model's prior behavior. Second, explicit criteria and reasoning do not always improve alignment, indicating that expert judgment is not fully captured by verbalized rules. Third, editing is sensitive to both the number and the identity of examples, with…
