Advancing Content Moderation: Evaluating Large Language Models for Detecting Sensitive Content Across Text, Images, and Videos
Nouar AlDahoul, Myles Joshua Toledo Tan, Harishwar Reddy Kasireddy, Yasir Zaki

TL;DR
This paper evaluates large language models for detecting sensitive content across text, images, and videos, demonstrating their superior accuracy over traditional methods and highlighting their potential for scalable content moderation.
Contribution
It provides a comprehensive assessment of LLM-based content moderation tools across multiple media types, showcasing their effectiveness and potential for deployment.
Findings
LLMs outperform traditional detection techniques in accuracy.
LLMs achieve lower false positive and false negative rates.
Evaluation across diverse datasets confirms LLMs' robustness.
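The findings above are stated in terms of accuracy, false positive rate, and false negative rate. As a minimal sketch of how those three quantities relate for a binary "sensitive vs. not sensitive" decision, the snippet below computes them from ground-truth and predicted labels. The function name `error_rates`, the 1/0 label encoding, and the sample labels are illustrative assumptions, not taken from the paper or its evaluation code.

```python
def error_rates(y_true, y_pred):
    """Compute accuracy, false positive rate, and false negative rate.

    Assumption (not from the paper): labels are binary, 1 = sensitive, 0 = benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sensitive, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sensitive, missed

    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of benign items that were wrongly flagged.
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of sensitive items that were missed.
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
rates = error_rates(y_true, y_pred)
```

A moderation system can score well on accuracy while still having a high false negative rate when sensitive items are rare, which is why the two error rates are reported separately from accuracy.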
Abstract
The widespread dissemination of hate speech, harassment, harmful and sexual content, and violence across websites and media platforms presents substantial challenges and provokes widespread concern among different sectors of society. Governments, educators, and parents are often at odds with media platforms about how to regulate, control, and limit the spread of such content. Technologies for detecting and censoring media content are a key solution to addressing these challenges. Techniques from natural language processing and computer vision have been used widely to automatically identify and filter out sensitive content such as offensive language, violence, nudity, and addiction in text, images, and videos, enabling platforms to enforce content policies at scale. However, existing methods still have limitations in achieving high detection accuracy with fewer false positives…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Dropout · Discriminative Fine-Tuning · Linear Layer · Cosine Annealing · Attention Dropout · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection
