The Effectiveness of LLMs as Annotators: A Comparative Overview and Empirical Analysis of Direct Representation
Maja Pavlovic, Massimo Poesio

TL;DR
This paper reviews and empirically evaluates the use of Large Language Models for data annotation, highlighting their benefits and limitations and emphasizing the importance of considering diverse perspectives.
Contribution
It provides a comparative overview of existing studies on LLMs as annotators and introduces an empirical analysis of how well GPT-generated opinion distributions align with human ones.
Findings
LLMs offer cost and time savings in data annotation
Significant limitations include bias and sensitivity to prompts
Empirical analysis shows partial alignment between GPT and human opinions
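
To make "alignment between opinion distributions" concrete, here is a minimal sketch of one way two label distributions could be compared. The summary does not say which measure the paper uses, so Jensen-Shannon divergence is an assumption chosen for illustration. The function names, label set, and example values are hypothetical and are not taken from the paper or its datasets.

```python
import math
from collections import Counter


def label_distribution(labels, categories):
    """Turn a list of annotator labels into a probability vector over categories."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in categories]


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2), bounded in [0, 1].

    0 means the two distributions are identical; 1 means they do not overlap.
    This metric is an illustrative assumption, not the paper's stated measure.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical example: four human annotators versus a model-reported distribution.
categories = ["offensive", "not_offensive"]
human = label_distribution(
    ["offensive", "not_offensive", "not_offensive", "offensive"], categories
)
model = [0.7, 0.3]  # made-up distribution, as if obtained directly from the model
score = js_divergence(human, model)
print(f"human={human} model={model} JS divergence={score:.3f}")
```

A lower score indicates closer agreement between the human and model distributions. Other measures, such as total variation distance or cross-entropy, would fit the same structure.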
Abstract
Large Language Models (LLMs) have emerged as powerful support tools across various natural language tasks and a range of application domains. Recent studies focus on exploring their capabilities for data annotation. This paper provides a comparative overview of twelve studies investigating the potential of LLMs in labelling data. While the models demonstrate promising cost and time-saving benefits, there exist considerable limitations, such as representativeness, bias, sensitivity to prompt variations and English language preference. Leveraging insights from these studies, our empirical analysis further examines the alignment between human and GPT-generated opinion distributions across four subjective datasets. In contrast to the studies examining representation, our methodology directly obtains the opinion distribution from GPT. Our analysis thereby supports the minority of studies…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Discriminative Fine-Tuning · Adam · Layer Normalization · Multi-Head Attention · Cosine Annealing · Dense Connections · Weight Decay · Linear Warmup With Cosine Annealing
