Large Language Models for Propaganda Span Annotation

Maram Hasanain; Fatema Ahmad; Firoj Alam

arXiv:2311.09812 · cs.CL · October 8, 2024 · 2 cites

TL;DR

This paper explores using GPT-4 to extract propaganda spans and generate annotated datasets, demonstrating improved performance and cost-effectiveness, especially for low-resource languages like Arabic.

Contribution

It introduces a novel approach of leveraging LLMs for propaganda span annotation and dataset creation, outperforming human annotators and enabling better training of smaller models.

Findings

01. GPT-4 with enhanced prompts improves span extraction accuracy.

02. GPT-4 labels align more closely with expert annotations.

03. State-of-the-art results achieved on Arabic propaganda detection.
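
The findings above are stated in terms of span extraction and agreement with expert labels. To make that concrete, here is a minimal sketch of how a propaganda span annotation might be represented and compared. It assumes spans are character offsets paired with a technique label and scores predictions with a simple partial-overlap F1. The names `Span`, `overlap_chars`, `partial_match_f1`, and the example label `Loaded_Language` are illustrative assumptions, not taken from the paper or its repository, and this scoring scheme is not necessarily the metric the authors used.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Span:
    """Hypothetical span record: character offsets plus a technique label."""
    start: int      # inclusive character offset
    end: int        # exclusive character offset
    technique: str


def overlap_chars(a: Span, b: Span) -> int:
    """Number of shared characters; 0 if the technique labels differ."""
    if a.technique != b.technique:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def _coverage(spans: List[Span], others: List[Span]) -> float:
    """Average fraction of each span covered by its best match in `others`."""
    if not spans:
        return 0.0
    total = 0.0
    for s in spans:
        best = max((overlap_chars(s, o) for o in others), default=0)
        total += best / (s.end - s.start)
    return total / len(spans)


def partial_match_f1(pred: List[Span], gold: List[Span]) -> Tuple[float, float, float]:
    """Simplified partial-overlap precision, recall, and F1 (an assumption)."""
    precision = _coverage(pred, gold)
    recall = _coverage(gold, pred)
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Illustrative example: the prediction covers the gold span plus extra text.
text = "They are destroying our great nation"
gold = [Span(9, 19, "Loaded_Language")]   # "destroying"
pred = [Span(9, 23, "Loaded_Language")]   # "destroying our"
precision, recall, f1 = partial_match_f1(pred, gold)
```

In this example the prediction fully contains the gold span, so recall is perfect while precision is penalized for the extra characters. An exact-match scheme would instead count the prediction as wrong, which is why partial-overlap scoring is a common choice for span tasks.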

Abstract

The use of propagandistic techniques in online content has increased in recent years, aiming to manipulate online audiences. Fine-grained propaganda detection and extraction of textual spans where propaganda techniques are used are essential for more informed content consumption. Automatic systems targeting the task over lower-resourced languages are limited, usually obstructed by a lack of large-scale training datasets. Our study investigates whether Large Language Models (LLMs), such as GPT-4, can effectively extract propagandistic spans. We further study the potential of employing the model to collect more cost-effective annotations. Finally, we examine the effectiveness of labels provided by GPT-4 in training smaller language models for the task. The experiments are performed over a large-scale in-house manually annotated dataset. The results suggest that providing more annotation…

Code & Models

Repositories

MaramHasanain/llm_prop_annot · Official

Videos

Large Language Models for Propaganda Span Annotation · Underline

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Complex Network Analysis Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Adam · Softmax · Dense Connections · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Residual Connection