Benchmarking LLMs in Political Content Text-Annotation: Proof-of-Concept with Toxicity and Incivility Data
Bastián González-Bustamante

TL;DR
This study benchmarks various large language models, including GPTs and open-source models, on political content annotation tasks related to toxicity and incivility, highlighting performance and reproducibility differences.
Contribution
It introduces a novel protest event dataset and compares the annotation capabilities of multiple LLMs, emphasizing open-source models' reproducibility and efficiency.
Findings
Perspective API (with a laxer threshold), GPT-4o, and Nous Hermes 2 Mixtral outperform the other models in zero-shot classification.
Open-source LLMs like Nous Hermes 2 achieve high performance with fewer parameters.
Open-source models ensure full reproducibility, unlike proprietary APIs.
Abstract
This article benchmarked the ability of OpenAI's GPTs and a number of open-source LLMs to perform annotation tasks on political content. We used a novel protest event dataset comprising more than three million digital interactions and created a gold standard that includes ground-truth labels annotated by human coders about toxicity and incivility on social media. We included in our benchmark Google's Perspective algorithm, which, along with GPTs, was employed through their respective APIs, while the open-source LLMs were deployed locally. The findings show that Perspective API using a laxer threshold, GPT-4o, and Nous Hermes 2 Mixtral outperform the other LLMs in zero-shot classification annotation. In addition, Nous Hermes 2 and Mistral OpenOrca, with a smaller number of parameters, are able to perform the task with high performance, being attractive options that could offer good…
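The abstract's point about a "laxer threshold" can be illustrated with a minimal sketch of how a continuous toxicity score is turned into a binary label and compared against human gold-standard labels. Everything here is an assumption for illustration: the scores, the gold labels, the two thresholds (0.8 and 0.5), and the choice of F1 as the metric are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def binarize(scores: Sequence[float], threshold: float) -> List[int]:
    """Label an item as toxic (1) when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def f1_score(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 for the positive (toxic) class, computed from raw counts."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: human gold labels and model toxicity scores.
gold = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]

strict_pred = binarize(scores, threshold=0.8)  # hypothetical strict cutoff
lax_pred = binarize(scores, threshold=0.5)     # hypothetical laxer cutoff

print("strict F1:", round(f1_score(gold, strict_pred), 3))
print("lax F1:   ", round(f1_score(gold, lax_pred), 3))
```

On this toy data the laxer cutoff recovers more of the toxic items at the cost of one false positive, which is the kind of trade-off a threshold choice controls. Whether a laxer threshold helps on real data depends on the score distribution and the gold labels.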
Taxonomy
Topics: Artificial Intelligence in Law · Natural Language Processing Techniques
