TL;DR
This paper uses Large Language Models to identify and correct annotation errors in subjective NLP tasks. It relies on Label-in-a-Haystack prompts, in which the query and its label(s) appear among the in-context demonstrations, to improve label quality.
Contribution
The paper proposes the Label-in-a-Haystack setting and the LiaHR method for correcting labels in subjective NLP tasks, with the aim of improving annotation accuracy.
Findings
LiaHR effectively identifies annotation errors in subjective tasks.
Human evaluations confirm the accuracy of label corrections.
The method improves data quality for NLP models.
Abstract
Modeling complex subjective tasks in Natural Language Processing, such as recognizing emotion and morality, is considerably challenging due to significant variation in human annotations. This variation often reflects reasonable differences in semantic interpretations rather than mere noise, necessitating methods to distinguish between legitimate subjectivity and error. We address this challenge by exploring label verification in these contexts using Large Language Models (LLMs). First, we propose a simple In-Context Learning binary filtering baseline that estimates the reasonableness of a document-label pair. We then introduce the Label-in-a-Haystack setting: the query and its label(s) are included in the demonstrations shown to LLMs, which are prompted to predict the label(s) again, while receiving task-specific instructions (e.g., emotion recognition) rather than label copying. We…
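The Label-in-a-Haystack setting described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the prompt layout, the helper names (`format_example`, `build_haystack_prompt`, `parse_labels`, `flag_label`), and the rule of flagging a pair whenever the model's prediction differs from the given label(s) are all assumptions made for illustration. The `llm` argument is a hypothetical stand-in for any function that maps a prompt string to a completion string.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Example = Tuple[str, Sequence[str]]  # (document, labels)


def format_example(text: str, labels: Optional[Sequence[str]] = None) -> str:
    """Render one example; leave the label slot empty for the final query."""
    label_str = ", ".join(labels) if labels is not None else ""
    return f"Text: {text}\nLabels: {label_str}".rstrip()


def build_haystack_prompt(
    instruction: str,
    demos: Sequence[Example],
    query_text: str,
    query_labels: Sequence[str],
) -> str:
    """Place the query and its label(s) among the demonstrations, then
    ask for the query's labels again under a task instruction.
    Assumed layout: the labelled query is appended as the last demo."""
    haystack = list(demos) + [(query_text, query_labels)]
    blocks = [instruction]
    blocks += [format_example(text, labels) for text, labels in haystack]
    blocks.append(format_example(query_text))  # unlabeled query to complete
    return "\n\n".join(blocks)


def parse_labels(raw: str) -> List[str]:
    """Split a comma-separated completion into normalized label strings."""
    return [part.strip().lower() for part in raw.split(",") if part.strip()]


def flag_label(
    llm: Callable[[str], str],
    instruction: str,
    demos: Sequence[Example],
    query_text: str,
    query_labels: Sequence[str],
) -> Tuple[bool, List[str]]:
    """Return (flagged, predicted). Assumed rule: flag the pair when the
    model does not reproduce the label(s) it was shown in the haystack."""
    prompt = build_haystack_prompt(instruction, demos, query_text, query_labels)
    predicted = parse_labels(llm(prompt))
    given = {label.lower() for label in query_labels}
    return set(predicted) != given, predicted


# Toy data and stub models, used only to exercise the functions above.
INSTRUCTION = "Classify the emotion(s) expressed in each text."
DEMOS: List[Example] = [
    ("What a wonderful surprise!", ["joy"]),
    ("I miss her so much.", ["sadness"]),
]
QUERY = "I can't believe they did that."


def copying_llm(prompt: str) -> str:
    """Stub that reproduces the label shown in the haystack."""
    return "anger"


def disagreeing_llm(prompt: str) -> str:
    """Stub that predicts a different label from the one shown."""
    return "surprise"
```

With the stubs above, `flag_label(copying_llm, INSTRUCTION, DEMOS, QUERY, ["anger"])` leaves the pair unflagged, while the same call with `disagreeing_llm` flags it and returns the alternative prediction as a candidate correction. Whether and how such a prediction should replace the original label is a design decision this sketch does not settle.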