ToxiShield: Promoting Inclusive Developer Communication through Real-Time Toxicity Filtering
MD Awsaf Alam Anindya, Showvik Biswas, Anindya Iqbal, Jaydeb Sarker, Amiangshu Bosu

TL;DR
ToxiShield is a real-time browser extension for GitHub that detects, explains, and reframes toxic comments in code reviews to promote inclusive and constructive developer communication.
Contribution
It introduces a multi-module system combining deep learning and large language models for toxicity detection, explanation, and rewriting in software engineering collaboration.
Findings
The BERT-based toxicity detector achieved 98% accuracy and 97% F1-score.
Claude 3.5 Sonnet classifies toxicity with a 42% F1-score.
Llama 3.2 generated 95.27% accurate, fluent, and content-preserving reframed comments.
Abstract
Toxic interactions during code reviews can undermine teamwork and hinder productivity in software engineering (SE) teams. While prior studies explore toxicity detection and empirical investigation, they do not offer real-time detoxification tools to support the SE community. To address this gap, we present ToxiShield, a browser extension for GitHub pull requests that is built using three modules: i) Toxicity Filter -- to identify whether a text is toxic, ii) Communication Coach -- to facilitate just-in-time fine-grained toxicity categorization with explanations, and iii) The Reframer -- to generate a revised, constructive alternative to a toxic text. For each module, we trained and evaluated multiple deep learning models and Large Language Models (LLMs) to identify the best choice. A BERT-based binary detection model, trained on 38,761 code review samples, achieves 98% accuracy and an F1-score of…
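The three-module flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `ShieldResult`, `run_pipeline`, and the `toy_*` functions are hypothetical, and the toy functions are keyword-based stand-ins for the trained models (the BERT-based filter and the LLM-based coach and reframer).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ShieldResult:
    """Outcome of passing one review comment through the sketch pipeline."""
    toxic: bool
    explanation: Optional[str] = None  # filled in by the coach stage
    reframed: Optional[str] = None     # filled in by the reframer stage


def run_pipeline(
    comment: str,
    is_toxic: Callable[[str], bool],
    explain: Callable[[str], str],
    reframe: Callable[[str], str],
) -> ShieldResult:
    """Run filter -> coach -> reframer; the later stages run only on toxic text."""
    if not is_toxic(comment):
        return ShieldResult(toxic=False)
    return ShieldResult(
        toxic=True,
        explanation=explain(comment),
        reframed=reframe(comment),
    )


# Hypothetical stand-ins for the real models, for illustration only.
def toy_filter(text: str) -> bool:
    return "stupid" in text.lower()


def toy_coach(text: str) -> str:
    return "The comment contains an insult."


def toy_reframer(text: str) -> str:
    return "This approach may cause problems; could we revisit it?"


result = run_pipeline("This is a stupid change.", toy_filter, toy_coach, toy_reframer)
clean = run_pipeline("Looks good to me.", toy_filter, toy_coach, toy_reframer)
```

Under this assumed design, running the inexpensive binary filter first means the explanation and reframing stages are invoked only for comments flagged as toxic.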
