Advancing Content Moderation: Evaluating Large Language Models for Detecting Sensitive Content Across Text, Images, and Videos

Nouar AlDahoul; Myles Joshua Toledo Tan; Harishwar Reddy Kasireddy; Yasir Zaki

arXiv:2411.17123 · cs.CV · November 27, 2024 · 2 cites

Open Access

TL;DR

This paper evaluates large language models for detecting sensitive content across text, images, and videos, demonstrating their superior accuracy over traditional methods and highlighting their potential for scalable content moderation.

Contribution

It provides a comprehensive assessment of LLM-based content moderation tools across multiple media types, showcasing their effectiveness and potential for deployment.

Findings

01. LLMs outperform traditional detection techniques in accuracy.

02. LLMs achieve lower false positive and false negative rates.

03. Evaluation across diverse datasets confirms LLMs' robustness.

Abstract

The widespread dissemination of hate speech, harassment, harmful and sexual content, and violence across websites and media platforms presents substantial challenges and provokes widespread concern among different sectors of society. Governments, educators, and parents are often at odds with media platforms about how to regulate, control, and limit the spread of such content. Technologies for detecting and censoring such media content are key to addressing these challenges. Techniques from natural language processing and computer vision have been widely used to automatically identify and filter out sensitive content such as offensive language, violence, nudity, and addiction in text, images, and videos, enabling platforms to enforce content policies at scale. However, existing methods still have limitations in achieving high detection accuracy with fewer false positives…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection

Methods: Dropout · Discriminative Fine-Tuning · Linear Layer · Cosine Annealing · Attention Dropout · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection