Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing
Mao Li, Frederick Conrad

TL;DR
This study evaluates the effectiveness of eight large language models in annotating stance in social media posts, comparing their performance to human crowd-sourced judgments and identifying conditions affecting their accuracy.
Contribution
It provides a comprehensive benchmark of LLMs against human annotations for stance detection and analyzes factors influencing LLMs' agreement with humans.
Findings
LLMs perform well when stance expressions are explicit.
Disagreements often occur when humans also struggle to reach consensus.
The explicitness of a stance strongly affects how closely LLM judgments match human judgments.
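As a rough illustration of the kind of comparison these findings describe, the sketch below takes the crowd majority vote as the reference label and reports LLM agreement separately for posts with high and low human consensus. It is not the authors' procedure: the label set (favor, against, neutral), the 0.8 consensus threshold, the function names, and the toy data are all assumptions made for this example.

```python
from collections import Counter


def majority_label(votes):
    """Return (label, share) for the most common crowd vote on one post."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


def agreement_by_consensus(crowd_votes, llm_labels, threshold=0.8):
    """Compute LLM agreement with the crowd majority, split by human consensus.

    threshold is a hypothetical cut-off: posts whose majority share is at
    least this value count as "high" consensus, the rest as "low".
    """
    hits = {"high": 0, "low": 0}
    totals = {"high": 0, "low": 0}
    for votes, llm in zip(crowd_votes, llm_labels):
        label, share = majority_label(votes)
        group = "high" if share >= threshold else "low"
        totals[group] += 1
        hits[group] += int(llm == label)
    # None marks a group that contains no posts.
    return {g: (hits[g] / totals[g] if totals[g] else None) for g in totals}


# Toy data, invented for illustration only (five crowd votes per post).
crowd_votes = [
    ["favor", "favor", "favor", "favor", "favor"],
    ["against", "against", "against", "against", "favor"],
    ["against", "against", "against", "favor", "neutral"],
    ["neutral", "neutral", "neutral", "against", "favor"],
]
llm_labels = ["favor", "against", "neutral", "neutral"]

result = agreement_by_consensus(crowd_votes, llm_labels)
print(result)
```

A gap between the two groups in real data would be in line with the finding that disagreements cluster where humans also fail to reach consensus. A chance-corrected statistic such as Cohen's kappa could replace raw agreement in the same structure.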
Abstract
In the rapidly evolving landscape of Natural Language Processing (NLP), the use of Large Language Models (LLMs) for automated text annotation in social media posts has garnered significant interest. Despite the impressive innovations in developing LLMs like ChatGPT, their efficacy and accuracy as annotation tools are not well understood. In this paper, we analyze the performance of eight open-source and proprietary LLMs for annotating the stance expressed in social media posts, benchmarking their performance against human annotators' (i.e., crowd-sourced) judgments. Additionally, we investigate the conditions under which LLMs are likely to disagree with human judgment. A significant finding of our study is that the explicitness of text expressing a stance plays a critical role in how faithfully LLMs' stance judgments match humans'. We argue that LLMs perform well when human annotators…
Taxonomy
Topics: Digital Communication and Language · Digital Marketing and Social Media · Sentiment Analysis and Opinion Mining
