Watch Your Language: Investigating Content Moderation with Large Language Models
Deepak Kumar, Yousef AbuHashem, Zakir Durumeric

TL;DR
This paper evaluates large language models' effectiveness in content moderation tasks, demonstrating their strengths in rule-based moderation and toxicity detection, while highlighting a performance plateau with increasing model size.
Contribution
It provides a comprehensive assessment of commodity LLMs on content moderation, revealing their capabilities and limitations in real-world moderation scenarios.
Findings
GPT-3.5 achieves a median accuracy of 64% and a median precision of 83% on rule-based moderation across 95 Reddit subcommunities
LLMs outperform existing toxicity classifiers
Model size increases yield marginal improvements in toxicity detection
Abstract
Large language models (LLMs) have exploded in popularity due to their ability to perform a wide array of natural language tasks. Text-based content moderation is one LLM use case that has received recent enthusiasm; however, there is little research investigating how LLMs perform in content moderation settings. In this work, we evaluate a suite of commodity LLMs on two common content moderation tasks: rule-based community moderation and toxic content detection. For rule-based community moderation, we instantiate 95 subcommunity-specific LLMs by prompting GPT-3.5 with rules from 95 Reddit subcommunities. We find that GPT-3.5 is effective at rule-based moderation for many communities, achieving a median accuracy of 64% and a median precision of 83%. For toxicity detection, we evaluate a suite of commodity LLMs (GPT-3, GPT-3.5, GPT-4, Gemini Pro, LLAMA 2) and show that LLMs significantly…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Software Engineering Research
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Transformer · GPT-4 · Linear Layer · Attention Dropout
