Just Put a Human in the Loop? Investigating LLM-Assisted Annotation for Subjective Tasks
Hope Schroeder, Deb Roy, Jad Kabbara

TL;DR
This pre-registered study tests how LLM-generated suggestions affect crowdworkers' labels in subjective annotation tasks. Suggestions did not make annotators faster, but they raised self-reported confidence and significantly shifted the label distribution, which affects both the evaluation of LLM performance and downstream social science analysis.
Contribution
It provides empirical evidence, from an experiment with 410 unique annotators and over 7,000 annotations across two models and two datasets, on how LLM assistance in subjective annotation changes label distributions and evaluation outcomes.
Findings
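LLM suggestions increase annotators' self-reported confidence but do not make them faster.
LLM assistance significantly alters the label distribution relative to unassisted annotation.
Model performance metrics look better when computed against LLM-assisted labels, since those labels have shifted toward the LLM's suggestions.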
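To make the last two findings concrete, here is a minimal sketch of how a label-distribution shift and its effect on measured LLM agreement could be quantified. The labels, the toy data, and the choice of total variation distance are all illustrative assumptions, not the paper's data or analysis method.

```python
from collections import Counter


def label_distribution(labels):
    """Return each label's share of the annotations."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(p, q):
    """Half the L1 distance between two distributions (0 = identical, 1 = disjoint)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in support)


def agreement_rate(labels, llm_labels):
    """Fraction of items where the human label matches the LLM label."""
    return sum(a == b for a, b in zip(labels, llm_labels)) / len(labels)


# Hypothetical toy data: one label per item, same item order in every list.
llm = ["A", "A", "B", "A", "C", "A", "B", "A"]       # LLM suggestions
control = ["A", "B", "B", "C", "C", "A", "C", "B"]   # annotators without suggestions
assisted = ["A", "A", "B", "A", "C", "A", "B", "B"]  # annotators shown suggestions

# How far the assisted label distribution moved away from the control one.
shift = total_variation_distance(
    label_distribution(control), label_distribution(assisted)
)

# The same LLM outputs scored against the two sets of human labels.
control_agreement = agreement_rate(control, llm)
assisted_agreement = agreement_rate(assisted, llm)

print(f"distribution shift (TVD): {shift:.3f}")
print(f"LLM agreement vs control labels:  {control_agreement:.3f}")
print(f"LLM agreement vs assisted labels: {assisted_agreement:.3f}")
```

In this toy setup the LLM's outputs are identical in both comparisons; only the reference labels differ, which is why agreement measured against assisted labels says little about the model itself.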
Abstract
LLM use in annotation is becoming widespread, and given LLMs' overall promising performance and speed, simply "reviewing" LLM annotations in interpretive tasks can be tempting. In subjective annotation tasks with multiple plausible answers, reviewing LLM outputs can change the label distribution, impacting both the evaluation of LLM performance and analysis using these labels in a social science task downstream. We conducted a pre-registered experiment with 410 unique annotators and over 7,000 annotations testing three AI assistance conditions against controls, using two models and two datasets. We find that presenting crowdworkers with LLM-generated annotation suggestions did not make them faster, but did improve their self-reported confidence in the task. More importantly, annotators strongly took the LLM suggestions, significantly changing the label distribution compared to the…
