TL;DR
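This paper compares two data annotation paradigms for subjective NLP tasks: the descriptive paradigm, which encourages annotator subjectivity so that diverse beliefs can be captured, and the prescriptive paradigm, which discourages it so that labels consistently apply one belief. It discusses the benefits and challenges of each and their implications for dataset use.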
Contribution
It introduces and contrasts two annotation paradigms for subjective NLP data, providing a framework for dataset creators to align annotation strategies with their intended application.
Findings
Descriptive annotation captures diverse beliefs about data labels.
Prescriptive annotation enforces consistent labeling according to a single belief.
An experiment with hate speech data illustrates the practical differences between the paradigms.
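The contrast between the two paradigms can be made concrete with a small sketch of how each might treat the same set of annotations. This is an illustration only, not the paper's method: the toy data, the function names `descriptive_view` and `prescriptive_view`, and the use of majority voting for the prescriptive case are all assumptions introduced here.

```python
from collections import Counter

# Hypothetical toy data: item id -> labels given by three annotators.
annotations = {
    "post_1": ["hateful", "hateful", "not_hateful"],
    "post_2": ["not_hateful", "not_hateful", "not_hateful"],
}


def descriptive_view(labels):
    """Keep the full label distribution as a record of annotator beliefs."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def prescriptive_view(labels):
    """Collapse to a single label and report the share of annotators who chose it.

    Majority voting is an assumption made for this sketch.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


for item_id, labels in annotations.items():
    print(item_id, descriptive_view(labels), prescriptive_view(labels))
```

In the descriptive view, the split on `post_1` is retained as information about differing beliefs. In the prescriptive view, the same split is reduced to one label, and the agreement share could serve as a rough check on how consistently that one belief was applied.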
Abstract
Labelled data is the foundation of most natural language processing tasks. However, labelling data is difficult and there often are diverse valid beliefs about what the correct data labels should be. So far, dataset creators have acknowledged annotator subjectivity, but rarely actively managed it in the annotation process. This has led to partly-subjective datasets that fail to serve a clear downstream use. To address this issue, we propose two contrasting paradigms for data annotation. The descriptive paradigm encourages annotator subjectivity, whereas the prescriptive paradigm discourages it. Descriptive annotation allows for the surveying and modelling of different beliefs, whereas prescriptive annotation enables the training of models that consistently apply one belief. We discuss benefits and challenges in implementing both paradigms, and argue that dataset creators should…
