Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors

Georgios Chochlakis; Alexandros Potamianos; Kristina Lerman; Shrikanth Narayanan

arXiv:2410.13776 · cs.CL · September 15, 2025

TL;DR

This paper investigates whether aggregation artifacts in subjective-task datasets degrade Large Language Models' in-context performance, and finds that modeling individual annotators instead can improve alignment and expose model biases toward particular annotators.

Contribution

The study demonstrates that dataset aggregation introduces noise affecting LLMs' handling of subjective tasks and advocates for modeling individual annotator perspectives instead.

Findings

01 Aggregation artifacts introduce noise in subjective task datasets.

02 Modeling individual annotators improves LLM alignment with diverse perspectives.

03 Aggregation does not fully account for the gap between ICL and state-of-the-art performance.
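
To make the notion of an aggregation artifact concrete, here is a minimal sketch in Python. It assumes majority-vote aggregation over multi-label annotations, which is a common scheme but is not confirmed here as the one used by the paper's datasets. The annotator names, labels, and the `majority_vote` helper are all hypothetical. The point it illustrates is that, under low agreement, the aggregate label set can be one that no individual annotator actually produced.

```python
from collections import Counter


def majority_vote(annotations, threshold=0.5):
    """Keep every label chosen by more than `threshold` of the annotators.

    `annotations` maps an annotator id to that annotator's set of labels.
    This is an assumed aggregation rule, used for illustration only.
    """
    n = len(annotations)
    counts = Counter(
        label for labels in annotations.values() for label in set(labels)
    )
    return {label for label, count in counts.items() if count / n > threshold}


# Hypothetical emotion annotations for a single text (low agreement).
annotations = {
    "ann_a": {"anger", "sadness"},
    "ann_b": {"sadness", "fear"},
    "ann_c": {"anger", "fear"},
}

aggregate = majority_vote(annotations)

# Does the aggregated label set match what any single annotator said?
matches_any = any(aggregate == labels for labels in annotations.values())

print(sorted(aggregate))  # ['anger', 'fear', 'sadness']
print(matches_any)        # False
```

Each label is chosen by two of three annotators, so all three survive the vote, yet no annotator assigned all three. Keeping the per-annotator label sets, rather than only `aggregate`, is the kind of alternative the findings above point toward.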

Abstract

In-context Learning (ICL) has become the primary method for performing natural language tasks with Large Language Models (LLMs). The knowledge acquired during pre-training is crucial for this few-shot capability, providing the model with task priors. However, recent studies have shown that ICL predominantly relies on retrieving task priors rather than "learning" to perform tasks. This limitation is particularly evident in complex subjective domains such as emotion and morality, where priors significantly influence posterior predictions. In this work, we examine whether this is the result of the aggregation used in corresponding datasets, where trying to combine low-agreement, disparate annotations might lead to annotation artifacts that create detrimental noise in the prompt. Moreover, we evaluate the posterior bias towards certain annotators by grounding our study in appropriate,…

Code & Models

Repositories

gchochla/aggregation-artifacts-llms
PyTorch · Official

Videos

Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors · Underline

Taxonomy

Topics: Natural Language Processing Techniques

Methods: ALIGN