Beyond One-Size-Fits-All: Personalized Harmful Content Detection with In-Context Learning
Rufan Zhang, Lin Zhang, Xianghang Mi

TL;DR
This paper introduces a personalized content moderation framework using in-context learning with foundation models, enabling adaptable, privacy-preserving detection of harmful online content across various categories with minimal user input.
Contribution
It presents a novel ICL-based approach for unified, personalized moderation that requires no retraining and adapts quickly with few examples or definitions.
Findings
Foundation models generalize well across tasks.
Personalization is effective with a single example.
Prompt augmentation improves robustness.
Abstract
The proliferation of harmful online content--e.g., toxicity, spam, and negative sentiment--demands robust and adaptable moderation systems. However, prevailing moderation systems are centralized and task-specific, offering limited transparency and neglecting diverse user preferences--an approach ill-suited for privacy-sensitive or decentralized environments. We propose a novel framework that leverages in-context learning (ICL) with foundation models to unify the detection of toxicity, spam, and negative sentiment across binary, multi-class, and multi-label settings. Crucially, our approach enables lightweight personalization, allowing users to easily block new categories, unblock existing ones, or extend detection to semantic variations through simple prompt-based interventions--all without model retraining. Extensive experiments on public benchmarks (TextDetox, UCI SMS, SST2) and a…
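The prompt-based interventions described in the abstract (blocking a new category, unblocking an existing one, adding an example) can be pictured as edits to the text sent to the model. The sketch below is an illustration only, not the authors' implementation: `ModerationProfile`, `build_prompt`, the prompt wording, and the category definitions are all hypothetical, and the call to a foundation model is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ModerationProfile:
    """Hypothetical per-user settings; not a structure taken from the paper."""
    blocked: Dict[str, str] = field(default_factory=dict)          # category -> definition
    examples: List[Tuple[str, str]] = field(default_factory=list)  # (text, label)

    def block(self, category: str, definition: str) -> None:
        # Blocking a category only adds its definition to the prompt.
        self.blocked[category] = definition

    def unblock(self, category: str) -> None:
        # Unblocking removes the definition and any demonstrations labeled with it.
        self.blocked.pop(category, None)
        self.examples = [(t, l) for t, l in self.examples if l != category]

    def add_example(self, text: str, label: str) -> None:
        # A single labeled message serves as an in-context demonstration.
        self.examples.append((text, label))


def build_prompt(profile: ModerationProfile, message: str) -> str:
    """Assemble an in-context prompt; the wording here is an assumption."""
    lines = [
        "Label the message with every blocked category that applies, or 'none'.",
        "",
        "Blocked categories:",
    ]
    for name, definition in profile.blocked.items():
        lines.append(f"- {name}: {definition}")
    if profile.examples:
        lines += ["", "Examples:"]
        for text, label in profile.examples:
            lines.append(f"Message: {text}")
            lines.append(f"Label: {label}")
    lines += ["", f"Message: {message}", "Label:"]
    return "\n".join(lines)


profile = ModerationProfile()
profile.block("toxicity", "insults, threats, or demeaning language")
profile.block("spam", "unsolicited promotional or scam messages")
profile.add_example("WIN a FREE prize, reply now!", "spam")
profile.unblock("toxicity")  # the user no longer wants toxicity filtered
prompt = build_prompt(profile, "Claim your reward today")
print(prompt)
```

Because the profile lives entirely in the prompt, changing a user's preferences in this sketch touches no model weights, which is the property the abstract attributes to the ICL approach.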
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Spam and Phishing Detection · Misinformation and Its Impacts
