A Unified Evaluation Framework for Multi-Annotator Tendency Learning

Liyun Zhang; Fengkai Liu; Xuanmeng Sha; Bowen Wang; Hong Liu; Zheng Lian

arXiv:2508.10393 · cs.LG · February 2, 2026

TL;DR

This paper introduces a comprehensive evaluation framework for multi-annotator tendency learning, focusing on measuring how well models capture individual annotator behaviors and provide meaningful explanations.

Contribution

It proposes the first unified evaluation framework with two novel metrics, DIC and BAE, to assess annotator tendency modeling and explanation relevance in multi-annotator learning.

Findings

01 The framework effectively distinguishes models that accurately capture annotator tendencies.

02 DIC correlates with model performance in representing annotator similarity.

03 BAE aligns model explanations with true annotator decision patterns.

Abstract

Recent work in multi-annotator learning has shifted focus from Consensus-oriented Learning (CoL), which aggregates multiple annotations into a single ground-truth prediction, to Individual Tendency Learning (ITL), which models annotator-specific labeling behavior patterns (i.e., tendencies) to support explanation analysis of annotator decisions. However, no evaluation framework currently exists to assess whether ITL methods truly capture individual tendencies and provide meaningful behavioral explanations. To address this gap, we propose the first unified evaluation framework with two novel metrics: (1) Difference of Inter-annotator Consistency (DIC) quantifies how well models capture annotator tendencies by comparing predicted inter-annotator similarity structures with the ground truth; (2) Behavior Alignment Explainability (BAE) evaluates how well model…
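
To make the idea behind DIC concrete, the following is a minimal sketch of comparing a predicted inter-annotator similarity structure with the one observed in the real annotations. It is not the paper's definition: this summary does not give the exact formula, so the raw agreement rate as the similarity measure, the mean absolute gap as the comparison, and the names `agreement_rate`, `pairwise_agreement`, and `similarity_gap` are all assumptions made for illustration.

```python
from itertools import combinations


def agreement_rate(a, b):
    """Fraction of samples on which two label sequences agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pairwise_agreement(labels):
    """Map each annotator pair (i, j), i < j, to its agreement rate.

    labels: one label sequence per annotator, all over the same samples.
    """
    return {
        (i, j): agreement_rate(labels[i], labels[j])
        for i, j in combinations(range(len(labels)), 2)
    }


def similarity_gap(true_labels, pred_labels):
    """Mean absolute gap between true and predicted pairwise agreement.

    Assumption: a stand-in for a DIC-style score, where 0.0 means the
    predicted similarity structure matches the observed one exactly.
    """
    if len(true_labels) != len(pred_labels) or len(true_labels) < 2:
        raise ValueError("need the same set of at least two annotators")
    true_sim = pairwise_agreement(true_labels)
    pred_sim = pairwise_agreement(pred_labels)
    gaps = [abs(true_sim[pair] - pred_sim[pair]) for pair in true_sim]
    return sum(gaps) / len(gaps)


# Toy data (hypothetical): 3 annotators, 4 samples, binary labels.
true_labels = [[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 1, 0]]
# Model predictions of what each annotator would have labeled.
pred_labels = [[0, 1, 1, 0], [0, 1, 1, 0], [1, 1, 1, 0]]

gap = similarity_gap(true_labels, pred_labels)
```

In this toy case the model makes annotators 0 and 1 look more alike than they really are, so the gap is nonzero. Under the assumptions above, a lower value would indicate that the model preserves who agrees with whom, which is the property the abstract attributes to DIC.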
