Benchmarking LLMs in Political Content Text-Annotation: Proof-of-Concept with Toxicity and Incivility Data

Bastián González-Bustamante

arXiv:2409.09741 · cs.CL · September 17, 2024 · 2 cites

TL;DR

This study benchmarks various large language models, including GPTs and open-source models, on political content annotation tasks related to toxicity and incivility, highlighting performance and reproducibility differences.

Contribution

The study introduces a novel protest event dataset and compares the annotation capabilities of multiple LLMs, emphasizing the reproducibility and efficiency of open-source models.

Findings

1. Perspective API (with a laxer threshold), GPT-4o, and Nous Hermes 2 Mixtral outperform the other models in zero-shot classification.

2. Open-source LLMs such as Nous Hermes 2 and Mistral OpenOrca achieve high performance with fewer parameters.

3. Open-source models ensure full reproducibility, unlike proprietary APIs.

Abstract

This article benchmarked the ability of OpenAI's GPTs and a number of open-source LLMs to perform annotation tasks on political content. We used a novel protest event dataset comprising more than three million digital interactions and created a gold standard that includes ground-truth labels annotated by human coders about toxicity and incivility on social media. We included in our benchmark Google's Perspective algorithm, which, along with the GPTs, was accessed through its API, while the open-source LLMs were deployed locally. The findings show that Perspective API using a laxer threshold, GPT-4o, and Nous Hermes 2 Mixtral outperform the other LLMs' zero-shot classification annotations. In addition, Nous Hermes 2 and Mistral OpenOrca, with a smaller number of parameters, are able to perform the task with high performance, being attractive options that could offer good…
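The role of the threshold mentioned in the abstract can be illustrated with a minimal sketch. It assumes binary gold labels (1 = toxic) and a continuous score between 0 and 1 of the kind a scoring API returns. The toy data, the two cut-offs (0.7 and 0.55), and the helper names `binarize` and `classification_report` are hypothetical and are not taken from the paper, whose exact metrics and thresholds are not stated in this excerpt.

```python
from typing import Dict, List, Sequence


def binarize(scores: Sequence[float], threshold: float) -> List[int]:
    """Turn continuous scores into 0/1 labels: 1 when score >= threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def classification_report(gold: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for the positive class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(gold),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy illustration data (hypothetical, not results from the paper).
gold = [1, 0, 1, 1, 0, 0]                      # human-coded gold standard
scores = [0.91, 0.62, 0.58, 0.40, 0.10, 0.75]  # model or API toxicity scores

strict = classification_report(gold, binarize(scores, 0.7))   # stricter cut-off
lax = classification_report(gold, binarize(scores, 0.55))     # laxer cut-off

print("strict:", strict)
print("lax:   ", lax)
```

On this toy data, lowering the cut-off labels more items as toxic, which raises recall while precision stays the same. That is the kind of trade-off a laxer threshold controls; whether it helps on real data depends on the score distribution and the gold standard.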

Taxonomy

Topics: Artificial Intelligence in Law · Natural Language Processing Techniques