Hybrid Annotation for Propaganda Detection: Integrating LLM Pre-Annotations with Human Intelligence
Ariana Sahitaj, Premtim Sahitaj, Veronika Solopova, Jiaao Li, Sebastian Möller, Vera Schmitt

TL;DR
This paper presents a hybrid annotation framework combining human expertise and Large Language Model assistance to improve propaganda detection, achieving higher consistency and efficiency in labeling social media data.
Contribution
It introduces a hierarchical taxonomy, an LLM-assisted pre-annotation pipeline, and a knowledge distillation approach to enhance scalable propaganda annotation.
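As a rough illustration of what a structured pre-annotation record might look like, the sketch below models the elements the abstract mentions: propagandistic spans, a short explanation per span, a local label per span, and one global label per post. The class names, field names, label strings, and the tiny `TAXONOMY` mapping are all hypothetical placeholders. The paper's actual 14 techniques and three categories are not reproduced here, and this is not the authors' schema.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder taxonomy: broad category -> fine-grained techniques.
# The real taxonomy has 14 techniques in three categories; these names are invented.
TAXONOMY = {
    "category_a": ["technique_1", "technique_2"],
    "category_b": ["technique_3"],
}


def category_of(technique: str) -> str:
    """Map a fine-grained technique to its broader category."""
    for category, techniques in TAXONOMY.items():
        if technique in techniques:
            return category
    raise KeyError(f"unknown technique: {technique}")


@dataclass
class SpanAnnotation:
    start: int          # character offset where the span begins
    end: int            # character offset one past the span's last character
    technique: str      # local (span-level) label
    explanation: str    # concise rationale for the label


@dataclass
class PreAnnotation:
    text: str                       # the social media post
    global_label: str               # post-level label (label strings are assumed)
    spans: list = field(default_factory=list)

    def validate(self) -> list:
        """Return a list of problems a human verifier should look at."""
        problems = []
        for s in self.spans:
            if not (0 <= s.start < s.end <= len(self.text)):
                problems.append(f"span {s.start}-{s.end} is out of bounds")
            if not any(s.technique in t for t in TAXONOMY.values()):
                problems.append(f"unknown technique: {s.technique}")
        return problems


post = "They always lie to you."
record = PreAnnotation(
    text=post,
    global_label="propaganda",
    spans=[SpanAnnotation(0, 15, "technique_1", "Sweeping generalisation.")],
)
```

Keeping character offsets rather than copied span text lets a verifier check each span against the original post, and `category_of` lets agreement be computed at either level of the hierarchy.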
Findings
Improved inter-annotator agreement with LLM assistance
Significant reduction in annotation time
Effective training of smaller models on high-quality LLM-generated data
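To make the agreement finding concrete, here is a minimal sketch of one common inter-annotator agreement statistic, Cohen's kappa, for two annotators. This summary does not say which agreement metric the paper reports, so the choice of kappa, the function name, and the toy labels are assumptions for illustration only.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels, invented for illustration (not data from the paper).
annotator_1 = ["prop", "prop", "none", "none"]
annotator_2 = ["prop", "none", "none", "none"]
kappa = cohens_kappa(annotator_1, annotator_2)  # 0.5
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label dominates. With more than two annotators, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the usual substitute.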
Abstract
Propaganda detection on social media remains challenging due to task complexity and limited high-quality labeled data. This paper introduces a novel framework that combines human expertise with Large Language Model (LLM) assistance to improve both annotation consistency and scalability. We propose a hierarchical taxonomy that organizes 14 fine-grained propaganda techniques into three broader categories, conduct a human annotation study on the HQP dataset that reveals low inter-annotator agreement for fine-grained labels, and implement an LLM-assisted pre-annotation pipeline that extracts propagandistic spans, generates concise explanations, and assigns local labels as well as a global label. A secondary human verification study shows significant improvements in both agreement and time-efficiency. Building on this, we fine-tune smaller language models (SLMs) to perform structured…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research
