Like a Good Nearest Neighbor: Practical Content Moderation and Text Classification
Luke Bates, Iryna Gurevych

TL;DR
This paper introduces LaGoNN, a parameter-free modification to SetFit that enhances text classification and content moderation by incorporating nearest neighbor information, improving performance especially in multilingual and domain drift scenarios.
Contribution
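LaGoNN is a novel method that introduces no learnable parameters. It modifies each input text with information from its nearest neighbor in the training data (for example, the neighbor's label and text) to improve text classification and content moderation.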
Findings
LaGoNN improves SetFit's performance in content moderation tasks.
LaGoNN effectively flags undesirable content across multiple languages.
The method enhances classification robustness under domain drift.
Abstract
Few-shot text classification systems have impressive capabilities but are infeasible to deploy and use reliably due to their dependence on prompting and billion-parameter language models. SetFit (Tunstall et al., 2022) is a recent, practical approach that fine-tunes a Sentence Transformer under a contrastive learning paradigm and achieves similar results to more unwieldy systems. Inexpensive text classification is important for addressing the problem of domain drift in all classification tasks, and especially in detecting harmful content, which plagues social media platforms. Here, we propose Like a Good Nearest Neighbor (LaGoNN), a modification to SetFit that introduces no learnable parameters but alters input text with information from its nearest neighbor, for example, the label and text, in the training data, making novel data appear similar to an instance on which the model was…
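The input modification described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a fine-tuned Sentence Transformer, and the `[SEP]`-joined output format, the function names, and the example labels are all assumptions made for this sketch.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a Sentence Transformer embedding (assumption):
    # a bag-of-words count vector over lowercased whitespace tokens.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor(text, train_texts):
    # Index of the training text most similar to `text`.
    query = embed(text)
    scores = [cosine(query, embed(t)) for t in train_texts]
    return max(range(len(train_texts)), key=scores.__getitem__)


def augment(text, train_texts, train_labels, sep=" [SEP] "):
    # Append the nearest neighbor's label and text to the input.
    # The separator and field order are hypothetical choices.
    i = nearest_neighbor(text, train_texts)
    return f"{text}{sep}{train_labels[i]}{sep}{train_texts[i]}"


# Tiny hypothetical training set for illustration only.
train_texts = ["buy cheap pills now", "thanks for the helpful answer"]
train_labels = ["undesirable", "acceptable"]

augmented = augment("cheap pills for sale now", train_texts, train_labels)
print(augmented)
# cheap pills for sale now [SEP] undesirable [SEP] buy cheap pills now
```

In a realistic setup the embeddings would presumably come from the Sentence Transformer that SetFit fine-tunes, and the classifier would then see the augmented string in place of the raw input. No learnable parameters are added in this sketch; only the input text changes.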
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Dense Connections · Absolute Position Encodings · Adam · Softmax · Position-Wise Feed-Forward Layer · Residual Connection
