Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors
Georgios Chochlakis, Alexandros Potamianos, Kristina Lerman, Shrikanth Narayanan

TL;DR
This paper investigates how aggregation artifacts in datasets affect the performance of Large Language Models on subjective tasks, and finds that modeling individual annotators can improve alignment and help reveal which annotators the model's predictions are biased towards.
Contribution
The study demonstrates that dataset aggregation introduces noise affecting LLMs' handling of subjective tasks and advocates for modeling individual annotator perspectives instead.
Findings
Aggregation artifacts introduce noise in subjective task datasets.
Modeling individual annotators improves LLM alignment with diverse perspectives.
Aggregation does not fully account for the gap between ICL and state-of-the-art performance.
Abstract
In-context Learning (ICL) has become the primary method for performing natural language tasks with Large Language Models (LLMs). The knowledge acquired during pre-training is crucial for this few-shot capability, providing the model with task priors. However, recent studies have shown that ICL predominantly relies on retrieving task priors rather than "learning" to perform tasks. This limitation is particularly evident in complex subjective domains such as emotion and morality, where priors significantly influence posterior predictions. In this work, we examine whether this is the result of the aggregation used in corresponding datasets, where trying to combine low-agreement, disparate annotations might lead to annotation artifacts that create detrimental noise in the prompt. Moreover, we evaluate the posterior bias towards certain annotators by grounding our study in appropriate,…
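To make the notion of an aggregation artifact concrete, here is a minimal toy sketch. It is not the paper's method: the per-label majority-vote rule, the 0.5 threshold, the function name `aggregate_majority`, and the three hypothetical annotators with their emotion labels are all assumptions chosen for illustration. It shows how combining low-agreement multi-label annotations can yield an aggregated label set that no individual annotator actually chose.

```python
from collections import Counter


def aggregate_majority(annotations, threshold=0.5):
    """Per-label majority vote (an assumed aggregation rule).

    Keeps a label if strictly more than `threshold` of the annotators chose it.
    """
    n = len(annotations)
    # Count how many annotators selected each label.
    counts = Counter(label for labels in annotations for label in set(labels))
    return {label for label, c in counts.items() if c / n > threshold}


# Hypothetical annotations from three annotators for a single text.
annotations = [
    {"joy", "surprise"},
    {"surprise", "anger"},
    {"joy", "anger"},
]

aggregated = aggregate_majority(annotations)

# Check whether the aggregated label set equals any one annotator's label set.
matches_any = any(aggregated == labels for labels in annotations)

print(sorted(aggregated))  # ['anger', 'joy', 'surprise']
print(matches_any)         # False
```

Each label is chosen by two of the three annotators, so all three labels survive the vote, yet no single annotator assigned all three. An aggregated example like this, placed in a prompt, reflects no one's actual perspective, which is the kind of noise the abstract describes.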
Taxonomy
Topics: Natural Language Processing Techniques
Methods: ALIGN
