Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker

TL;DR
Goodtriever is a retrieval-augmented method for toxicity mitigation in language models that adapts to evolving language, reduces latency, and improves computational efficiency during toxicity-controlled text generation.
Contribution
It introduces a flexible retrieval-based approach for toxicity mitigation that adapts to language evolution and reduces inference latency.
Findings
Achieves a 43% relative latency reduction during inference
Matches state-of-the-art toxicity mitigation performance
Supports toxicity-controlled text generation
Abstract
Considerable effort has been dedicated to mitigating toxicity, but existing methods often require drastic modifications to model parameters or the use of computationally intensive auxiliary models. Furthermore, previous approaches have often neglected the crucial factor of language's evolving nature over time. In this work, we present a comprehensive perspective on toxicity mitigation that takes into account its changing nature. We introduce Goodtriever, a flexible methodology that matches the current state-of-the-art toxicity mitigation while achieving 43% relative latency reduction during inference and being more computationally efficient. By incorporating a retrieval-based approach at decoding time, Goodtriever enables toxicity-controlled text generation. Our research advocates for an increased focus on adaptable mitigation techniques, which better reflect the data drift models face…
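The abstract says only that retrieval is incorporated at decoding time; it does not say how retrieved information changes the next-token distribution. As a rough illustration, the sketch below assumes one common pattern: a nearest-neighbour lookup over two small datastores (one of non-toxic text, one of toxic text), with the two retrieved distributions contrasted against the base model's prediction. Every name here (`knn_next_token_probs`, `steer`, `alpha`, the toy vectors and vocabulary) is hypothetical and not taken from the paper or its code.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def knn_next_token_probs(query, datastore, vocab_size, k=2,
                         temperature=1.0, smoothing=1e-3):
    """Turn the k nearest (key_vector, next_token_id) entries into a
    next-token distribution. Closer neighbours get more weight."""
    nearest = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token in datastore
    )[:k]
    weights = softmax([-dist / temperature for dist, _ in nearest])
    probs = [smoothing] * vocab_size  # smoothing keeps log() finite later
    for weight, (_, token) in zip(weights, nearest):
        probs[token] += weight
    total = sum(probs)
    return [p / total for p in probs]

def steer(base_probs, good_probs, bad_probs, alpha=1.0):
    """Assumed combination rule: boost tokens favoured by the non-toxic
    datastore and penalise tokens favoured by the toxic one."""
    scores = [
        math.log(b) + alpha * (math.log(g) - math.log(t))
        for b, g, t in zip(base_probs, good_probs, bad_probs)
    ]
    return softmax(scores)

# Toy setup: 3-token vocabulary and 2-dimensional context vectors.
vocab = ["kind", "rude", "the"]
base_probs = [0.3, 0.5, 0.2]   # hypothetical base-model prediction
query = [1.0, 0.0]             # hypothetical context representation

good_store = [([1.0, 0.1], 0), ([0.9, 0.0], 0), ([0.0, 1.0], 2)]
bad_store = [([1.0, 0.0], 1), ([0.8, 0.1], 1), ([0.0, 1.0], 2)]

good_probs = knn_next_token_probs(query, good_store, len(vocab))
bad_probs = knn_next_token_probs(query, bad_store, len(vocab))
steered = steer(base_probs, good_probs, bad_probs, alpha=1.0)

for token, before, after in zip(vocab, base_probs, steered):
    print(f"{token}: base={before:.3f} steered={after:.3f}")
```

In a design like this, adapting to new language would mean adding entries to a datastore rather than retraining the model, which is consistent with the adaptability the abstract emphasizes; whether the paper works exactly this way is not stated here.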
Taxonomy
Topics: Machine Learning and Data Classification · Natural Language Processing Techniques · Topic Modeling
Methods: Focus
