Watch Your Language: Investigating Content Moderation with Large Language Models

Deepak Kumar; Yousef AbuHashem; Zakir Durumeric

arXiv:2309.14517 · cs.HC · January 18, 2024 · 2 cites

TL;DR

This paper evaluates large language models' effectiveness in content moderation tasks, demonstrating their strengths in rule-based moderation and toxicity detection, while highlighting a performance plateau with increasing model size.

Contribution

It provides a comprehensive assessment of commodity LLMs on content moderation, revealing their capabilities and limitations in real-world moderation scenarios.

Findings

01 GPT-3.5 achieves a median accuracy of 64% and a median precision of 83% on rule-based moderation across 95 Reddit subcommunities

02 LLMs outperform existing toxicity classifiers

03 Increasing model size yields only marginal improvements in toxicity detection

Abstract

Large language models (LLMs) have exploded in popularity due to their ability to perform a wide array of natural language tasks. Text-based content moderation is one LLM use case that has received recent enthusiasm; however, there is little research investigating how LLMs perform in content moderation settings. In this work, we evaluate a suite of commodity LLMs on two common content moderation tasks: rule-based community moderation and toxic content detection. For rule-based community moderation, we instantiate 95 subcommunity-specific LLMs by prompting GPT-3.5 with rules from 95 Reddit subcommunities. We find that GPT-3.5 is effective at rule-based moderation for many communities, achieving a median accuracy of 64% and a median precision of 83%. For toxicity detection, we evaluate a suite of commodity LLMs (GPT-3, GPT-3.5, GPT-4, Gemini Pro, LLAMA 2) and show that LLMs significantly…
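As a rough illustration of the rule-based setup the abstract describes, the sketch below builds a per-community prompt from a list of rules and then summarizes per-community accuracy and precision by their medians. The prompt wording, the function names (`build_rule_prompt`, `accuracy_and_precision`, `median_metrics`), and the toy data are all assumptions made for illustration; they are not the paper's actual prompt, code, or results, and no model is called here.

```python
from statistics import median


def build_rule_prompt(community, rules, comment):
    """Assemble a hypothetical moderation prompt (illustrative wording only)."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, 1))
    return (
        f"You are a moderator for the community {community}.\n"
        f"Rules:\n{numbered}\n\n"
        f"Comment: {comment}\n"
        "Does this comment violate any rule? Answer yes or no."
    )


def accuracy_and_precision(labels, preds):
    """Return (accuracy, precision) for boolean labels and predictions.

    True means "violates a rule". Precision is None when the model
    never predicts a violation, since it is undefined in that case.
    """
    pairs = list(zip(labels, preds))
    correct = sum(1 for y, p in pairs if y == p)
    true_pos = sum(1 for y, p in pairs if y and p)
    false_pos = sum(1 for y, p in pairs if not y and p)
    accuracy = correct / len(pairs)
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else None
    return accuracy, precision


def median_metrics(per_community):
    """Median accuracy and precision over communities.

    per_community maps a community name to (labels, preds).
    Communities with undefined precision are skipped for the precision median.
    """
    accuracies, precisions = [], []
    for labels, preds in per_community.values():
        acc, prec = accuracy_and_precision(labels, preds)
        accuracies.append(acc)
        if prec is not None:
            precisions.append(prec)
    return median(accuracies), median(precisions)


# Toy, made-up predictions for three hypothetical communities.
toy = {
    "r/example_a": ([True, True, False, False], [True, False, False, False]),
    "r/example_b": ([True, False], [True, True]),
    "r/example_c": ([True, False, True, False], [True, False, True, False]),
}
print(median_metrics(toy))
```

Reporting medians rather than means keeps a few very easy or very hard communities from dominating the summary, which is one plausible reason to summarize 95 communities this way.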

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Software Engineering Research

Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Transformer · GPT-4 · Linear Layer · Attention Dropout