Towards Agile Text Classifiers for Everyone
Maximilian Mozes, Jessica Hoffmann, Katrin Tomanek, Muhamed Kouate, Nithum Thain, Ann Yuan, Tolga Bolukbasi, Lucas Dixon

TL;DR
This paper presents methods for agile text classifiers that can be quickly trained with small datasets, enabling rapid adaptation to evolving safety policies in online content moderation and AI safety.
Contribution
It demonstrates that prompt-tuning large language models with minimal data can achieve state-of-the-art safety classification performance, facilitating rapid and tailored safety classifier development.
Findings
Prompt-tuning PaLM 62B with as few as 80 labeled examples can achieve state-of-the-art performance.
Small datasets enable rapid, tailored safety classifier creation.
Agile classifiers support safer online discourse and quick policy updates.
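
To make the idea of prompt-tuning concrete, here is a minimal toy sketch in pure Python. It is not the authors' setup and does not involve PaLM: the vocabulary, the embeddings, the weights `W` and `B`, and the function names are all hypothetical values chosen for illustration. The frozen "model" mean-pools the input vectors and applies a fixed logistic classifier; the only trainable parameters are the soft prompt vectors prepended to the input. In this simplified model the learned prompt acts as an offset on the pooled representation, whereas in a real transformer it would interact with the input through attention.

```python
import math

# Frozen "pretrained" parameters (hypothetical toy values, never updated).
EMBED = {
    "you": [0.0, 0.0],
    "they": [0.0, 0.0],
    "are": [0.0, 0.0],
    "rude": [2.0, 0.0],
    "kind": [-2.0, 0.0],
}
W = [1.0, 1.0]  # frozen classifier weights
B = -2.0        # frozen bias: the base model almost never flags text

# A tiny labeled dataset for one policy (1 = violates, 0 = does not).
DATA = [
    (["you", "are", "rude"], 1),
    (["you", "are", "kind"], 0),
    (["they", "are", "rude"], 1),
    (["they", "are", "kind"], 0),
]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(prompt, tokens):
    """Prepend soft prompt vectors to token embeddings, mean-pool, classify."""
    vectors = prompt + [EMBED[t] for t in tokens]
    pooled = [sum(v[i] for v in vectors) / len(vectors) for i in range(len(W))]
    logit = sum(w * x for w, x in zip(W, pooled)) + B
    return sigmoid(logit), len(vectors)

def mean_loss(prompt, data):
    """Average binary cross-entropy over the dataset."""
    total = 0.0
    for tokens, label in data:
        p, _ = forward(prompt, tokens)
        total -= math.log(p) if label == 1 else math.log(1.0 - p)
    return total / len(data)

def tune_prompt(data, num_prompt_vectors=1, steps=500, lr=1.0):
    """Gradient descent on the soft prompt only; EMBED, W and B stay frozen."""
    prompt = [[0.0] * len(W) for _ in range(num_prompt_vectors)]
    for _ in range(steps):
        grad = [0.0] * len(W)
        for tokens, label in data:
            p, n = forward(prompt, tokens)
            for i in range(len(W)):
                # d(loss)/d(logit) = p - label; d(logit)/d(prompt[j][i]) = W[i] / n
                grad[i] += (p - label) * W[i] / n / len(data)
        # Every prompt vector enters the mean identically, so all share one gradient.
        prompt = [[x - lr * g for x, g in zip(vec, grad)] for vec in prompt]
    return prompt
```

Calling `tune_prompt(DATA)` returns the learned prompt, which can then be passed to `forward` together with new tokens. Because only the prompt is trained, each policy needs only a few small vectors stored alongside one shared frozen model, which is what makes training from a small labeled set practical.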
Abstract
Text-based safety classifiers are widely used for content moderation and increasingly to tune generative language model behavior - a topic of growing concern for the safety of digital assistants and chatbots. However, different policies require different classifiers, and safety policies themselves improve from iteration and adaptation. This paper introduces and evaluates methods for agile text classification, whereby classifiers are trained using small, targeted datasets that can be quickly developed for a particular policy. Experimenting with 7 datasets from three safety-related domains, comprising 15 annotation schemes, led to our key finding: prompt-tuning large language models, like PaLM 62B, with a labeled dataset of as few as 80 examples can achieve state-of-the-art performance. We argue that this enables a paradigm shift for text classification, especially for models supporting…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Text Readability and Simplification · Topic Modeling
Methods: Pathways Language Model
