Hybrid Annotation for Propaganda Detection: Integrating LLM Pre-Annotations with Human Intelligence

Ariana Sahitaj; Premtim Sahitaj; Veronika Solopova; Jiaao Li; Sebastian Möller; Vera Schmitt

arXiv:2507.18343·cs.CL·July 25, 2025

TL;DR

This paper presents a hybrid annotation framework combining human expertise and Large Language Model assistance to improve propaganda detection, achieving higher consistency and efficiency in labeling social media data.

Contribution

The paper introduces a hierarchical taxonomy, an LLM-assisted pre-annotation pipeline, and a knowledge distillation approach, with the aim of making propaganda annotation more scalable.
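
To make the shape of such a pre-annotation concrete, here is a minimal sketch of how a structured record might be represented and sanity-checked before it is shown to a human verifier. Everything in it is an assumption for illustration: the category and technique names are placeholders (the actual taxonomy is not listed on this page), and `SpanAnnotation`, `PreAnnotation`, and `validation_errors` are hypothetical names, not the authors' schema.

```python
from dataclasses import dataclass, field

# Placeholder taxonomy: broad category -> fine-grained techniques.
# Hypothetical names only; the paper's real taxonomy has 14 techniques
# in three categories, which are not reproduced here.
TAXONOMY = {
    "category_a": ["technique_1", "technique_2"],
    "category_b": ["technique_3"],
    "category_c": ["technique_4", "technique_5"],
}

# Reverse lookup from a fine-grained technique to its broad category.
TECHNIQUE_TO_CATEGORY = {
    technique: category
    for category, techniques in TAXONOMY.items()
    for technique in techniques
}


@dataclass
class SpanAnnotation:
    span: str          # text fragment flagged as propagandistic
    technique: str     # local (fine-grained) label
    explanation: str   # concise rationale produced by the LLM

    @property
    def category(self) -> str:
        # Broad category implied by the fine-grained label.
        return TECHNIQUE_TO_CATEGORY[self.technique]


@dataclass
class PreAnnotation:
    text: str
    global_label: str  # post-level label; the value set is assumed
    spans: list[SpanAnnotation] = field(default_factory=list)


def validation_errors(pre: PreAnnotation) -> list[str]:
    """Return problems a verifier tool could flag before human review."""
    errors = []
    for s in pre.spans:
        if s.span not in pre.text:
            errors.append(f"span not found in text: {s.span!r}")
        if s.technique not in TECHNIQUE_TO_CATEGORY:
            errors.append(f"unknown technique: {s.technique!r}")
    return errors
```

Checking that each span occurs verbatim in the post and that each local label belongs to the taxonomy is one cheap way to catch malformed LLM output early; whether the authors apply such checks is not stated here.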

Findings

01. Improved inter-annotator agreement with LLM assistance

02. Significant reduction in annotation time

03. Effective training of smaller models on high-quality LLM-generated data

Abstract

Propaganda detection on social media remains challenging due to task complexity and limited high-quality labeled data. This paper introduces a novel framework that combines human expertise with Large Language Model (LLM) assistance to improve both annotation consistency and scalability. We propose a hierarchical taxonomy that organizes 14 fine-grained propaganda techniques into three broader categories, conduct a human annotation study on the HQP dataset that reveals low inter-annotator agreement for fine-grained labels, and implement an LLM-assisted pre-annotation pipeline that extracts propagandistic spans, generates concise explanations, and assigns local labels as well as a global label. A secondary human verification study shows significant improvements in both agreement and time-efficiency. Building on this, we fine-tune smaller language models (SLMs) to perform structured…
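
The abstract reports low inter-annotator agreement for fine-grained labels and groups those labels into broader categories. The sketch below illustrates why agreement can differ between the two levels, using Cohen's kappa for two annotators. The choice of metric is an assumption (this page does not say which agreement measure the authors use), and the labels and the `TO_COARSE` mapping are toy placeholders, not data or results from the paper.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and equal in length")
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Toy mapping from fine-grained techniques to broad categories (hypothetical).
TO_COARSE = {"t1": "A", "t2": "A", "t3": "B", "t4": "C"}

# Toy labels from two annotators for six items (illustrative only).
annotator_1 = ["t1", "t2", "t3", "t4", "t1", "t3"]
annotator_2 = ["t2", "t2", "t3", "t4", "t2", "t4"]

fine_kappa = cohens_kappa(annotator_1, annotator_2)
coarse_kappa = cohens_kappa(
    [TO_COARSE[t] for t in annotator_1],
    [TO_COARSE[t] for t in annotator_2],
)
```

In this toy case, disagreements between two techniques in the same category disappear once labels are collapsed, so the coarse-level kappa is higher than the fine-level one. This only illustrates the mechanism; it says nothing about the magnitudes reported in the paper.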

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research