Russian Texts Detoxification with Levenshtein Editing
Ilya Gusev

TL;DR
This paper presents a two-step tagging-based model that detoxifies Russian texts by rewriting them as neutral versions, achieving the best style transfer accuracy among all models in the RUSSE Detox shared task.
Contribution
The paper introduces a tagging-based detoxification approach built on text editing that surpasses larger sequence-to-sequence models in style transfer accuracy on Russian texts.
Findings
Achieved the highest style transfer accuracy in the RUSSE Detox shared task.
Surpassed larger sequence-to-sequence models in style transfer accuracy.
Demonstrated the effectiveness of a two-step tagging approach for text detoxification.
Abstract
Text detoxification is a style transfer task of creating neutral versions of toxic texts. In this paper, we use the concept of text editing to build a two-step tagging-based detoxification model using a parallel corpus of Russian texts. With this model, we achieved the best style transfer accuracy among all models in the RUSSE Detox shared task, surpassing larger sequence-to-sequence models.
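The abstract frames detoxification as text editing: instead of generating the whole output, a tagger labels each source token with an edit operation. The following is a minimal sketch, not the author's implementation, of how token-level Levenshtein alignment can turn a parallel (toxic, neutral) pair into such labels and how the labels rebuild the neutral text. The tag set (`KEEP`, `DELETE`, `REPLACE`, with insertions attached to the preceding source token), the helper names `levenshtein_ops`, `ops_to_tags` and `apply_tags`, and whitespace tokenization are all hypothetical; the English example pair stands in for a Russian one.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# One alignment step: (operation, source token, target token).
Op = Tuple[str, str, str]


def levenshtein_ops(src: List[str], tgt: List[str]) -> List[Op]:
    """Align two token sequences with token-level Levenshtein DP plus backtrace."""
    n, m = len(src), len(tgt)
    # dp[i][j] = minimum number of edits turning src[:i] into tgt[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if src[i - 1] == tgt[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete src[i-1]
                           dp[i][j - 1] + 1,        # insert tgt[j-1]
                           dp[i - 1][j - 1] + sub)  # keep or replace
    # Walk back from dp[n][m], preferring keep/replace, then delete, then insert.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if src[i - 1] == tgt[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + sub:
                ops.append(("KEEP" if sub == 0 else "REPLACE", src[i - 1], tgt[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("DELETE", src[i - 1], ""))
            i -= 1
        else:
            ops.append(("INSERT", "", tgt[j - 1]))
            j -= 1
    ops.reverse()
    return ops


@dataclass
class Tag:
    """Hypothetical per-token label: one Tag for every source token."""
    action: str                                         # "KEEP", "DELETE" or "REPLACE"
    replacement: str = ""                               # new token when action == "REPLACE"
    inserted: List[str] = field(default_factory=list)   # tokens inserted after this one


def ops_to_tags(ops: List[Op]) -> Tuple[List[str], List[Tag]]:
    """Collapse an alignment into (tokens inserted before the first source token, tags)."""
    prefix: List[str] = []
    tags: List[Tag] = []
    for op, _, t in ops:
        if op == "INSERT":
            # Attach the insertion to the previous source token, or to the prefix.
            (tags[-1].inserted if tags else prefix).append(t)
        elif op == "REPLACE":
            tags.append(Tag("REPLACE", t))
        else:
            tags.append(Tag(op))
    return prefix, tags


def apply_tags(src: List[str], prefix: List[str], tags: List[Tag]) -> List[str]:
    """Rebuild the edited sequence from the source tokens and their tags."""
    out = list(prefix)
    for tok, tag in zip(src, tags):
        if tag.action == "KEEP":
            out.append(tok)
        elif tag.action == "REPLACE":
            out.append(tag.replacement)
        out.extend(tag.inserted)  # DELETE contributes nothing but may carry insertions
    return out


if __name__ == "__main__":
    src = "you are a damn idiot".split()
    tgt = "you are wrong".split()
    prefix, tags = ops_to_tags(levenshtein_ops(src, tgt))
    print([t.action for t in tags])       # ['KEEP', 'KEEP', 'DELETE', 'DELETE', 'REPLACE']
    print(apply_tags(src, prefix, tags))  # ['you', 'are', 'wrong']
```

Attaching insertions to the preceding source token keeps exactly one label per input token, which is the shape a token-classification model expects. One plausible reading of a "two-step" tagger is that the first step predicts the operation and the second fills in replacement or inserted tokens, but the paper's actual split may differ from this sketch.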
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
