ReSeTOX: Re-learning attention weights for toxicity mitigation in machine translation

Javier García Gilabert, Carlos Escolano, Marta R. Costa-Jussà

arXiv:2305.11761 · cs.CL · May 22, 2023 · 1 citation

TL;DR

ReSeTOX is an inference-time method for neural machine translation (NMT): when a translation introduces toxic words that are absent from the source, it adjusts the key-value self-attention weights and re-runs the search, reducing added toxicity by 57% without retraining the model.

Contribution

It introduces an inference-time technique that mitigates added toxicity in NMT by re-learning attention weights during decoding, avoiding the need for retraining.

Findings

01. 57% reduction in added toxicity
02. Retains, on average, 99.5% of baseline translation quality
03. Evaluated across 164 languages
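Both headline numbers are relative figures. As a minimal sketch of how such figures could be computed, the snippet below assumes sentence-level, wordlist-based detection of added toxicity (the translation matches a toxic-term list while the source does not) and derives the relative changes from aggregate scores. The wordlists, helper names, and example sentences are hypothetical; the paper's actual detector and quality metric are not reproduced here.

```python
import re

# Hypothetical toy wordlists; real evaluations rely on curated per-language lists.
SOURCE_TOXIC = {"idiot"}
TARGET_TOXIC = {"idiota"}


def tokens(text: str) -> set[str]:
    """Lowercased word tokens (a crude tokenizer; real lists need language-aware matching)."""
    return set(re.findall(r"\w+", text.lower()))


def has_added_toxicity(source: str, translation: str) -> bool:
    """True if the translation hits the target list while the source hits nothing."""
    src_toxic = bool(tokens(source) & SOURCE_TOXIC)
    tgt_toxic = bool(tokens(translation) & TARGET_TOXIC)
    return tgt_toxic and not src_toxic


def added_toxicity_rate(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (source, translation) pairs that exhibit added toxicity."""
    if not pairs:
        return 0.0
    return sum(has_added_toxicity(s, t) for s, t in pairs) / len(pairs)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative drop in a rate; 0.57 would correspond to a 57% reduction."""
    if baseline == 0:
        raise ValueError("baseline rate must be non-zero")
    return (baseline - new) / baseline


def relative_quality(new_score: float, baseline_score: float) -> float:
    """Share of baseline quality retained; 0.995 would correspond to 99.5%."""
    return new_score / baseline_score


# Illustrative sentences only, not data from the paper.
baseline_out = [("you are kind", "eres un idiota"), ("hello", "hola")]
mitigated_out = [("you are kind", "eres amable"), ("hello", "hola")]
print(relative_reduction(added_toxicity_rate(baseline_out),
                         added_toxicity_rate(mitigated_out)))
```

Note that a toxic source word translated faithfully is not counted as added toxicity, which matches the framing of the problem as toxic words not present in the input.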

Abstract

Our proposed method, ReSeTOX (REdo SEarch if TOXic), addresses the problem of Neural Machine Translation (NMT) systems producing translations that contain toxic words not present in the input. The objective is to mitigate this added toxicity without re-training the model. When added toxicity is detected during inference, ReSeTOX dynamically adjusts the key-value self-attention weights and re-evaluates the beam search hypotheses. Experimental results show that ReSeTOX reduces added toxicity by 57% while maintaining an average translation quality of 99.5% across 164 languages.
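The core step, adjusting cached attention states once added toxicity is detected, can be illustrated with a toy gradient update in the style of plug-and-play methods such as PPLM. This is a sketch under stated assumptions, not the ReSeTOX implementation: a single attention head with fixed attention weights a, cached values V, an output projection W, a toxicity signal reduced to the probability of one designated token, and an L2 penalty standing in for whatever fidelity term the method actually uses. All function names are hypothetical.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_probs(attn: list[float], values: list[list[float]],
                     w_out: list[list[float]]) -> list[float]:
    """Attention output o = sum_i a_i * V_i, logits z = W o, then softmax."""
    d = len(values[0])
    o = [sum(a * v[k] for a, v in zip(attn, values)) for k in range(d)]
    logits = [sum(row[k] * o[k] for k in range(d)) for row in w_out]
    return softmax(logits)


def update_values(attn: list[float], values: list[list[float]],
                  w_out: list[list[float]], toxic_id: int,
                  lr: float = 0.2, lam: float = 0.1,
                  steps: int = 5) -> list[list[float]]:
    """Gradient descent on L(V) = -log(1 - p_toxic) + lam * ||V - V0||^2.

    Hypothetical toy: only the cached values V change; the attention weights
    `attn` stay fixed (updating keys as well would also change them).
    """
    v0 = [row[:] for row in values]
    v = [row[:] for row in values]
    d = len(v[0])
    for _ in range(steps):
        p = next_token_probs(attn, v, w_out)
        pt = p[toxic_id]
        # dL/dz_j = p_t * (1[j == toxic] - p_j) / (1 - p_t)
        g = [pt * ((1.0 if j == toxic_id else 0.0) - pj) / (1.0 - pt)
             for j, pj in enumerate(p)]
        # dL/do_k = sum_j W[j][k] * g_j, i.e. W^T g
        grad_o = [sum(w_out[j][k] * g[j] for j in range(len(w_out)))
                  for k in range(d)]
        # dL/dV_ik = a_i * dL/do_k + 2 * lam * (V_ik - V0_ik)
        for i, a in enumerate(attn):
            for k in range(d):
                v[i][k] -= lr * (a * grad_o[k] + 2 * lam * (v[i][k] - v0[i][k]))
    return v


# Toy setup: two cached positions, 2-d states, 3-token vocabulary (token 2 = "toxic").
attn = softmax([0.0, 0.0])
values = [[1.0, 0.0], [0.0, 1.0]]
w_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
before = next_token_probs(attn, values, w_out)[2]
after = next_token_probs(attn, update_values(attn, values, w_out, toxic_id=2), w_out)[2]
print(f"p(toxic) before={before:.3f} after={after:.3f}")
```

The abstract says both keys and values are adjusted; this toy updates only the values so the gradient stays in closed form. In a full decoder, the updated distribution would then be used to re-score the beam search hypotheses.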

Code & Models

Repositories

mt-upc/resetox
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Adversarial Robustness in Machine Learning