Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda Spans in News Articles
Maram Hasanain, Fatema Ahmed, Firoj Alam

TL;DR
This paper introduces ArPro, the largest annotated dataset for propaganda spans in news articles, and evaluates GPT-4's ability to detect and classify propaganda techniques across multiple languages, revealing current limitations.
Contribution
It provides ArPro, the largest propaganda dataset to date with span-level annotations (8K newspaper paragraphs, 23 techniques), and offers the first assessment of GPT-4's performance on fine-grained propaganda detection tasks.
Findings
GPT-4's performance decreases with task complexity
GPT-4 underperforms fine-tuned models
Multilingual span detection remains challenging for GPT-4
Abstract
The use of propaganda has spiked on mainstream and social media, aiming to manipulate or mislead users. While efforts to automatically detect propaganda techniques in textual, visual, or multimodal content have increased, most of them primarily focus on English content. The majority of the recent initiatives targeting medium to low-resource languages produced relatively small annotated datasets, with a skewed distribution, posing challenges for the development of sophisticated propaganda detection models. To address this challenge, we carefully develop the largest propaganda dataset to date, ArPro, comprised of 8K paragraphs from newspaper articles, labeled at the text span level following a taxonomy of 23 propagandistic techniques. Furthermore, our work offers the first attempt to understand the performance of large language models (LLMs), using GPT-4, for fine-grained propaganda…
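To make "labeled at the text span level" concrete, the following is a minimal sketch of how one span annotation might be represented and compared against a predicted span. It is an illustration only, not the ArPro file format or the paper's evaluation metric: the `Span` class, the helper functions, the example sentence, and the label name `Loaded_Language` are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One annotated span (hypothetical schema, not the ArPro format)."""
    start: int      # inclusive character offset into the paragraph
    end: int        # exclusive character offset into the paragraph
    technique: str  # technique label; the name used below is a placeholder


def span_text(paragraph: str, span: Span) -> str:
    """Return the substring of the paragraph covered by the span."""
    return paragraph[span.start:span.end]


def char_overlap(a: Span, b: Span) -> int:
    """Count characters shared by two spans; 0 if their labels differ."""
    if a.technique != b.technique:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


# Invented example paragraph, not taken from the dataset.
paragraph = "The so-called reform is a total disaster for every family."

# Locate offsets programmatically rather than hard-coding them.
gold_start = paragraph.find("total disaster")
gold = Span(gold_start, gold_start + len("total disaster"), "Loaded_Language")

pred_start = paragraph.find("a total disaster")
pred = Span(pred_start, pred_start + len("a total disaster"), "Loaded_Language")

print(span_text(paragraph, gold))
print(char_overlap(gold, pred))
```

A span-level task requires both the boundaries and the technique label to be predicted, which is why the overlap helper returns 0 when the labels disagree.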
Taxonomy
Topics: Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Byte Pair Encoding · Multi-Head Attention · Dense Connections · Label Smoothing · Adam · Softmax
