Towards A Friendly Online Community: An Unsupervised Style Transfer Framework for Profanity Redaction
Minh Tran, Yipeng Zhang, Mohammad Soleymani

TL;DR
This paper introduces an unsupervised style transfer framework that redacts offensive language in social media comments while maintaining fluency and preserving the original content. It outperforms previous style transfer models in human evaluations.
Contribution
The authors propose a novel unsupervised pipeline combining retrieval, generation, and editing to redact offensive language while preserving original content and fluency.
Findings
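The method outperforms previous style transfer models in human evaluations.
It is the only compared approach that performs consistently well on all automatic evaluation metrics.
It redacts offensive language in a word-restricted manner while preserving the content and fluency of the original text.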
Abstract
Offensive and abusive language is a pressing problem on social media platforms. In this work, we propose a method for transforming offensive comments, statements containing profanity or offensive language, into non-offensive ones. We design a RETRIEVE, GENERATE and EDIT unsupervised style transfer pipeline to redact the offensive comments in a word-restricted manner while maintaining a high level of fluency and preserving the content of the original text. We extensively evaluate our method's performance and compare it to previous style transfer models using both automatic metrics and human evaluations. Experimental results show that our method outperforms other models on human evaluations and is the only approach that consistently performs well on all automatic evaluation metrics.
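To make the three-stage idea concrete, here is a minimal toy sketch of a retrieve, generate, and edit flow. It is not the authors' implementation: the abstract does not specify how each stage works, so the lexicon `OFFENSIVE`, the substitution map `NEUTRAL`, the corpus `CLEAN_CORPUS`, and the Jaccard-overlap scoring are all assumptions chosen for illustration. The published system presumably relies on learned models rather than word lists.

```python
import re

# Hypothetical toy lexicon and neutral substitutes (assumptions, not from the paper).
OFFENSIVE = {"stupid", "idiot", "dumb"}
NEUTRAL = {"stupid": "silly", "idiot": "person", "dumb": "unwise"}

# Hypothetical tiny corpus of non-offensive comments used for retrieval.
CLEAN_CORPUS = [
    "that idea is not going to work",
    "i disagree with this post",
    "this movie was boring",
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def jaccard(a, b):
    """Jaccard overlap between two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(tokens, corpus=CLEAN_CORPUS):
    """RETRIEVE: pick the clean sentence closest to the non-offensive content."""
    content = [t for t in tokens if t not in OFFENSIVE]
    return max(corpus, key=lambda s: jaccard(content, tokenize(s)))


def generate(tokens):
    """GENERATE: word-restricted candidates that only touch offensive tokens."""
    deleted = [t for t in tokens if t not in OFFENSIVE]
    replaced = [NEUTRAL.get(t, t) for t in tokens]
    return [deleted, replaced]


def edit(tokens):
    """EDIT: final pass that drops any lexicon word that survived generation."""
    return " ".join(t for t in tokens if t not in OFFENSIVE)


def redact(comment):
    """Run the toy pipeline and return the redacted comment."""
    tokens = tokenize(comment)
    reference = tokenize(retrieve(tokens))
    # Keep the candidate that stays closest to the retrieved clean reference.
    best = max(generate(tokens), key=lambda c: jaccard(c, reference))
    return edit(best)


print(redact("This stupid movie was boring"))
```

In this sketch only tokens found in the lexicon are ever deleted or replaced, which is one simple reading of "word-restricted". Ranking candidates against a retrieved clean sentence stands in for the fluency and content-preservation criteria that the paper evaluates with automatic metrics and human judgments.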
