A Unified Evaluation Framework for Multi-Annotator Tendency Learning
Liyun Zhang, Fengkai Liu, Xuanmeng Sha, Bowen Wang, Hong Liu, Zheng Lian

TL;DR
This paper introduces a comprehensive evaluation framework for multi-annotator tendency learning, focusing on measuring how well models capture individual annotator behaviors and provide meaningful explanations.
Contribution
It proposes the first unified evaluation framework with two novel metrics, DIC and BAE, to assess annotator tendency modeling and explanation relevance in multi-annotator learning.
Findings
The framework effectively distinguishes models that accurately capture annotator tendencies.
DIC correlates with model performance in representing annotator similarity.
BAE measures how well model explanations align with true annotator decision patterns.
Abstract
Recent work in multi-annotator learning has shifted focus from Consensus-oriented Learning (CoL), which aggregates multiple annotations into a single ground-truth prediction, to Individual Tendency Learning (ITL), which models annotator-specific labeling behavior patterns (i.e., tendency) to provide explanation analysis for understanding annotator decisions. However, no evaluation framework currently exists to assess whether ITL methods truly capture individual tendencies and provide meaningful behavioral explanations. To address this gap, we propose the first unified evaluation framework with two novel metrics: (1) Difference of Inter-annotator Consistency (DIC) quantifies how well models capture annotator tendencies by comparing predicted inter-annotator similarity structures with the ground truth; (2) Behavior Alignment Explainability (BAE) evaluates how well model…
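The abstract describes DIC only at a high level: compare the inter-annotator similarity structure implied by a model's per-annotator predictions with the one observed in the real annotations. The sketch below illustrates that idea under assumptions that are not confirmed by the text above. Inter-annotator consistency is taken to be pairwise label agreement, and the "difference" is taken to be the mean absolute gap over annotator pairs. The paper's actual formulation may use a different similarity measure or distance, and the function names `inter_annotator_consistency` and `dic` are hypothetical.

```python
import numpy as np


def inter_annotator_consistency(labels: np.ndarray) -> np.ndarray:
    """Pairwise agreement matrix for an (n_samples, n_annotators) label array.

    Entry (i, j) is the fraction of samples on which annotators i and j
    assign the same label. Using raw agreement here is an assumption.
    """
    n_annotators = labels.shape[1]
    if n_annotators < 2:
        raise ValueError("need at least two annotators")
    consistency = np.ones((n_annotators, n_annotators))
    for i in range(n_annotators):
        for j in range(i + 1, n_annotators):
            agree = np.mean(labels[:, i] == labels[:, j])
            consistency[i, j] = consistency[j, i] = agree
    return consistency


def dic(pred_labels, true_labels) -> float:
    """Hypothetical DIC: mean absolute difference between the predicted and
    ground-truth consistency matrices over off-diagonal annotator pairs.
    Lower values mean the model better preserves the similarity structure."""
    c_pred = inter_annotator_consistency(np.asarray(pred_labels))
    c_true = inter_annotator_consistency(np.asarray(true_labels))
    off_diag = ~np.eye(c_true.shape[0], dtype=bool)
    return float(np.mean(np.abs(c_pred - c_true)[off_diag]))


# Toy data: 4 samples (rows) labeled by 3 annotators (columns).
true_labels = [[0, 0, 1], [1, 1, 1], [0, 1, 0], [1, 0, 0]]
# A consensus-style model that predicts the same label for every annotator.
consensus_pred = [[0, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1]]

print(dic(true_labels, true_labels))     # 0.0
print(dic(consensus_pred, true_labels))  # 0.5
```

In this toy case every pair of real annotators agrees on half the samples, while the consensus-style predictions make all annotators agree everywhere. The gap of 0.5 per pair is what a tendency-aware metric like DIC is meant to expose, whereas a plain accuracy score against an aggregated label would not.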
