Large Language Models for Propaganda Span Annotation
Maram Hasanain, Fatema Ahmad, Firoj Alam

TL;DR
This paper explores using GPT-4 to extract propaganda spans and generate annotated datasets, demonstrating improved performance and cost-effectiveness, especially for low-resource languages like Arabic.
Contribution
The paper proposes using LLMs for propaganda span annotation and dataset creation. The authors report that GPT-4 can outperform human annotators and that its labels enable better training of smaller models.
Findings
GPT-4 with enhanced prompts improves span extraction accuracy.
GPT-4 labels align more closely with expert annotations.
State-of-the-art results are achieved on Arabic propaganda detection.
Abstract
The use of propagandistic techniques in online content has increased in recent years, aiming to manipulate online audiences. Fine-grained propaganda detection and extraction of the textual spans where propaganda techniques are used are essential for more informed content consumption. Automatic systems targeting the task over lower-resourced languages are limited, usually obstructed by the lack of large-scale training datasets. Our study investigates whether Large Language Models (LLMs), such as GPT-4, can effectively extract propagandistic spans. We further study the potential of employing the model to collect more cost-effective annotations. Finally, we examine the effectiveness of labels provided by GPT-4 in training smaller language models for the task. The experiments are performed over a large-scale in-house manually annotated dataset. The results suggest that providing more annotation…
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Complex Network Analysis Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Adam · Softmax · Dense Connections · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Residual Connection
