Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation
Dimosthenis Antypas, Indira Sen, Carla Perez-Almendros, Jose Camacho-Collados, Francesco Barbieri

TL;DR
This paper introduces a comprehensive dataset for detecting six types of sensitive social media content and shows that fine-tuning large language models on this dataset significantly improves detection accuracy over existing models and APIs.
Contribution
The paper presents a new unified dataset for multiple sensitive content categories and demonstrates improved detection performance through fine-tuning large language models.
Findings
Fine-tuned LLMs outperform off-the-shelf models by 10-15% in overall detection performance.
Existing moderation APIs underperform on sensitive content detection.
The dataset covers six diverse sensitive categories.
Abstract
The detection of sensitive content in large datasets is crucial for ensuring that shared and analysed data is free from harmful material. However, current moderation tools, such as external APIs, suffer from limitations in customisation, accuracy across diverse sensitive categories, and privacy concerns. Additionally, existing datasets and open-source models focus predominantly on toxic language, leaving gaps in detecting other sensitive categories such as substance abuse or self-harm. In this paper, we put forward a unified dataset tailored for social media content moderation across six sensitive categories: conflictual language, profanity, sexually explicit material, drug-related content, self-harm, and spam. By collecting and annotating data with consistent retrieval strategies and guidelines, we address the shortcomings of previous focalised research. Our analysis demonstrates that…
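To make the six-category setup concrete, here is a minimal sketch of how predictions over these categories could be scored. It assumes a multi-label format (a post may carry zero or more categories) and uses per-category and macro-averaged F1. The label identifiers, the function names, and the choice of metric are illustrative assumptions, not details confirmed by the text above.

```python
from typing import Dict, List, Set

# Category names follow the abstract; the short identifiers are hypothetical.
CATEGORIES = ["conflictual", "profanity", "sex", "drugs", "selfharm", "spam"]


def f1_per_category(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Compute binary F1 for each category, treating each post as multi-label."""
    scores: Dict[str, float] = {}
    for cat in CATEGORIES:
        tp = sum(1 for g, p in zip(gold, pred) if cat in g and cat in p)
        fp = sum(1 for g, p in zip(gold, pred) if cat not in g and cat in p)
        fn = sum(1 for g, p in zip(gold, pred) if cat in g and cat not in p)
        denom = 2 * tp + fp + fn
        # A category with no gold and no predicted instances scores 0.0 here;
        # this is one convention among several.
        scores[cat] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Unweighted mean of the per-category F1 scores."""
    scores = f1_per_category(gold, pred)
    return sum(scores.values()) / len(scores)
```

Running the same function over the outputs of a fine-tuned model and an off-the-shelf model or moderation API would give per-category numbers that can be compared directly, which is the kind of comparison the findings summarise.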
Taxonomy
Topics: Sentiment Analysis and Opinion Mining
Methods: LLaMA · Focus
