Abusive text transformation using LLMs
Rohitash Chandra, Jiyong Choi

TL;DR
This paper explores using large language models to transform abusive text into non-abusive versions while preserving original intent, evaluating multiple models' effectiveness in maintaining sentiment and semantics.
Contribution
It introduces a novel application of LLMs for abusive text transformation and compares the performance of several state-of-the-art models in this task.
Findings
Groq produces transformation results that differ significantly from those of the other models.
GPT-4o and DeepSeek-V3 show similar performance.
Transformed texts retain sentiment and semantics effectively.
Abstract
Although Large Language Models (LLMs) have demonstrated significant advancements in natural language processing tasks, their effectiveness in the classification and transformation of abusive text into non-abusive versions remains an area for exploration. In this study, we aim to use LLMs to transform abusive text (tweets and reviews) containing hate speech and swear words into non-abusive text, while retaining the intent of the text. We evaluate the performance of several state-of-the-art LLMs, such as Gemini, GPT-4o, DeepSeek and Groq, on their ability to identify abusive text. We use them to transform the text and obtain a version that is free of abusive and inappropriate content but maintains a similar level of sentiment and semantics, i.e. the transformed text needs to maintain its message. Afterwards, we evaluate the raw and transformed datasets with sentiment analysis and semantic analysis. Our…
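The evaluation step described in the abstract, comparing a raw text with its transformed version, can be illustrated with a minimal sketch. This is not the authors' method: the excerpt does not say which sentiment or semantic analysis models the paper uses, so a bag-of-words cosine similarity stands in here as a simple proxy for semantic overlap. The function names and the example sentence pair are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts.

    Assumption: this is only a stand-in for the semantic analysis
    mentioned in the abstract, not the paper's actual metric.
    """
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


# Hypothetical example pair (not taken from the paper's datasets).
raw = "this movie was damn awful and the acting was terrible"
transformed = "this movie was very awful and the acting was terrible"

score = cosine_similarity(raw, transformed)
print(f"semantic overlap: {score:.3f}")
```

A score close to 1 suggests the transformed text kept most of the original wording. A word-overlap measure cannot tell whether a paraphrase preserves meaning, which is presumably why an embedding-based or model-based semantic analysis would be preferred in practice.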
Taxonomy
Topics: Web Application Security Vulnerabilities · Software Engineering Research · Natural Language Processing Techniques
