STAND-Guard: A Small Task-Adaptive Content Moderation Model
Minjia Wang, Pingping Lin, Siqi Cai, Shengnan An, Shengjie Ma, Zeqi Lin, Congrui Huang, Bixiong Xu

TL;DR
STAND-GUARD is a small, adaptable content moderation model that, through instruct tuning, performs competitively with large models like GPT-3.5 and GPT-4 on diverse and unseen moderation tasks, enabling efficient customization.
Contribution
The paper introduces a small, task-adaptive content moderation model that effectively generalizes to new tasks via instruct tuning, reducing reliance on large models.
Findings
STAND-GUARD matches GPT-3.5-Turbo on 40+ datasets.
It nearly matches GPT-4-Turbo on unseen tasks.
Training task selection and model size impact cross-task performance.
Abstract
Content moderation, the process of reviewing and monitoring the safety of generated content, is important for the development of welcoming online platforms and responsible large language models. Content moderation comprises various tasks, each with unique requirements tailored to specific scenarios. Therefore, it is crucial to develop a model that can be adapted easily and accurately to novel or customized content moderation tasks without extensive model tuning. This paper presents STAND-GUARD, a Small Task-Adaptive coNtent moDeration model. The basic motivation is that by performing instruct tuning on various content moderation tasks, we can unleash the power of small language models (SLMs) on unseen (out-of-distribution) content moderation tasks. We also carefully study the effects of training tasks and model size on the efficacy of the cross-task fine-tuning mechanism. Experiments demonstrate…
Taxonomy
Topics: Spam and Phishing Detection · Hate Speech and Cyberbullying Detection · Network Security and Intrusion Detection
Methods: Attention Is All You Need · Cosine Annealing · Layer Normalization · Adam · Attention Dropout · Linear Layer · Weight Decay · Multi-Head Attention
