Towards Safer Social Media Platforms: Scalable and Performant Few-Shot Harmful Content Moderation Using Large Language Models

Akash Bonagiri; Lucen Li; Rajvardhan Oak; Zeerak Babar; Magdalena Wojcieszak; Anshuman Chhabra

arXiv:2501.13976·cs.CL·January 27, 2025

Open Access

TL;DR

This paper uses large language models with few-shot in-context learning for scalable, dynamic harmful content moderation on social media. The approach outperforms proprietary baselines (Perspective and OpenAI Moderation) and prior few-shot methods, and the authors also incorporate visual information in the form of video thumbnails.

Contribution

The paper introduces a few-shot content moderation approach based on in-context learning with LLMs, shows that it outperforms existing proprietary baselines and prior few-shot learning methods, and explores adding visual information (video thumbnails) to the text input.

Findings

01. Few-shot LLM approaches outperform proprietary baselines.

02. Incorporating visual data improves moderation accuracy.

03. Scalable methods effectively adapt to evolving harmful content.

Abstract

The prevalence of harmful content on social media platforms poses significant risks to users and society, necessitating more effective and scalable content moderation strategies. Current approaches rely on human moderators, supervised classifiers, and large volumes of training data, and often struggle with scalability, subjectivity, and the dynamic nature of harmful content (e.g., violent content and dangerous challenge trends). To bridge these gaps, we utilize Large Language Models (LLMs) to undertake few-shot dynamic content moderation via in-context learning. Through extensive experiments on multiple LLMs, we demonstrate that our few-shot approaches can outperform existing proprietary baselines (Perspective and OpenAI Moderation) as well as prior state-of-the-art few-shot learning methods, in identifying harm. We also incorporate visual information (video thumbnails) and assess if…
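As an illustration of what few-shot moderation via in-context learning can look like in practice, the sketch below assembles a prompt from a handful of labeled demonstrations and parses a model's completion into a label. It is a minimal sketch under stated assumptions, not the authors' implementation: the prompt wording, the label names, and the helpers `Example`, `build_few_shot_prompt`, and `parse_label` are all hypothetical, and the call to an actual LLM is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """A labeled demonstration (hypothetical structure, not from the paper)."""
    text: str
    label: str  # assumed label set: "harmful" or "harmless"


def build_few_shot_prompt(examples: List[Example], query: str) -> str:
    """Concatenate an instruction, the demonstrations, and the unlabeled query."""
    lines = ["Classify each post as 'harmful' or 'harmless'.", ""]
    for ex in examples:
        lines.append(f"Post: {ex.text}")
        lines.append(f"Label: {ex.label}")
        lines.append("")
    # The query is left unlabeled so the model completes the final line.
    lines.append(f"Post: {query}")
    lines.append("Label:")
    return "\n".join(lines)


def parse_label(completion: str) -> str:
    """Map a raw completion to a label; return 'unknown' if it matches neither."""
    cleaned = completion.strip().lower()
    if cleaned.startswith("harmful"):
        return "harmful"
    if cleaned.startswith("harmless"):
        return "harmless"
    return "unknown"


if __name__ == "__main__":
    demos = [
        Example("Try this choking challenge with your friends", "harmful"),
        Example("Here is my favorite pasta recipe", "harmless"),
    ]
    print(build_few_shot_prompt(demos, "New video: weekend hiking tips"))
```

Because the demonstrations are ordinary data rather than training examples, adapting to a new harm category in this setup only means swapping the `demos` list, which is the property that makes the few-shot setting attractive for fast-changing content.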

Taxonomy

Topics: Hate Speech and Cyberbullying Detection