SLM-Mod: Small Language Models Surpass LLMs at Content Moderation
Xianyang Zhan, Agam Goyal, Yilun Chen, Eshwar Chandrasekharan, Koustuv, Saha

TL;DR
This paper demonstrates that small open-source language models, when fine-tuned, can outperform larger models in community-specific content moderation tasks, offering a cost-effective and adaptable alternative.
Contribution
The study shows that small language models (<15B parameters) can surpass larger models in content moderation accuracy and recall, especially in community-specific contexts.
Findings
SLMs outperform zero-shot LLMs in accuracy and recall
Few-shot LLMs show marginal performance gains
Cross-community moderation is promising for new platforms
Abstract
Large language models (LLMs) have shown promise in many natural language understanding tasks, including content moderation. However, these models can be expensive to query in real-time and do not allow for a community-specific approach to content moderation. To address these challenges, we explore the use of open-source small language models (SLMs) for community-specific content moderation tasks. We fine-tune and evaluate SLMs (less than 15B parameters) by comparing their performance against much larger open- and closed-sourced models in both a zero-shot and few-shot setting. Using 150K comments from 15 popular Reddit communities, we find that SLMs outperform zero-shot LLMs at content moderation -- 11.5% higher accuracy and 25.7% higher recall on average across all communities. Moreover, few-shot in-context learning leads to only a marginal increase in the performance of LLMs, still…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection
