PromptAug: Fine-grained Conflict Classification Using Data Augmentation
Oliver Warke, Joemon M. Jose, Faegheh Hasibi, and Jan Breitsohl

TL;DR
PromptAug is an LLM-based data augmentation technique that improves conflict detection models for social media by addressing data scarcity and generating diverse, high-quality training data, yielding statistically significant gains of 2% in both accuracy and F1-score.
Contribution
This paper introduces PromptAug, a new data augmentation method that uses LLMs to improve conflict classification, particularly where labelled data is scarce and LLM guardrails prevent the generation of offensive content.
Findings
PromptAug achieves statistically significant improvements of 2% in both accuracy and F1-score on conflict datasets.
It effectively generates diverse and high-quality augmented data.
Thematic analysis reveals four key challenges in augmented text quality.
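The accuracy and F1-score gains above are comparisons between a classifier trained on the original data and one trained with augmented data. A minimal sketch of how such a comparison can be computed is shown here. It assumes macro-averaged F1 (the summary does not say which averaging the paper uses), and the label names and predictions are hypothetical toy values, not results from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging, see lead-in)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # gold class gains a false negative
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: gold labels and predictions from two models.
gold = ["class_a", "class_a", "class_b", "class_b"]
baseline_pred = ["class_a", "class_b", "class_b", "class_b"]
augmented_pred = ["class_a", "class_a", "class_b", "class_b"]

acc_gain = accuracy(gold, augmented_pred) - accuracy(gold, baseline_pred)
f1_gain = macro_f1(gold, augmented_pred) - macro_f1(gold, baseline_pred)
print(f"accuracy gain: {acc_gain:.3f}, macro-F1 gain: {f1_gain:.3f}")
```

Showing that a gain is statistically significant additionally requires a significance test over repeated runs or paired predictions, which this sketch does not cover.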
Abstract
Given the rise of conflicts on social media, effective classification models to detect harmful behaviours are essential. Following the garbage-in-garbage-out maxim, machine learning performance depends heavily on training data quality. However, high-quality labelled data, especially for nuanced tasks like identifying conflict behaviours, is limited, expensive, and difficult to obtain. Additionally, as social media platforms increasingly restrict access to research data, text data augmentation is gaining attention as an alternative to generate training data. Augmenting conflict-related data poses unique challenges due to Large Language Model (LLM) guardrails that prevent generation of offensive content. This paper introduces PromptAug, an innovative LLM-based data augmentation method. PromptAug achieves statistically significant improvements of 2% in both accuracy and F1-score on…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Misinformation and Its Impacts
