ToxiSpanSE: An Explainable Toxicity Detection in Code Review Comments

Jaydeb Sarker; Sayma Sultana; Steven R. Wilson; Amiangshu Bosu

arXiv:2307.03386·cs.SE·July 10, 2023

TL;DR

ToxiSpanSE is an explainable toxicity detection tool for code review comments that highlights toxic spans, aiding moderators in understanding and managing toxic conversations in software engineering.

Contribution

The paper introduces the first toxic span detection model tailored to the SE domain, using transformer-based models to improve explainability and accuracy.

Findings

01  Best model achieved 0.88 F1 score for toxic span detection

02  Fine-tuned RoBERTa outperformed other models in evaluation

03  Provides an explainable tool to help mitigate toxicity in SE communities
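
To make the reported F1 figure concrete, the sketch below computes precision, recall, and F1 over the toxic class at the token level. This page does not give the paper's exact evaluation protocol, so the token-level framing, the 0/1 label encoding, and the function name `token_f1` are assumptions for illustration only.

```python
from typing import Sequence, Tuple


def token_f1(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) for the toxic class.

    Assumption: each token carries a binary label, 1 = toxic, 0 = non-toxic,
    and gold/pred are aligned token by token.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Made-up labels for one comment, not data from the paper.
gold_labels = [0, 0, 1, 1, 0, 1]
pred_labels = [0, 1, 1, 1, 0, 0]
precision, recall, f1 = token_f1(gold_labels, pred_labels)
```

Here two toxic tokens are found, one is missed, and one non-toxic token is flagged, so precision, recall, and F1 all come out to 2/3.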

Abstract

Background: The existence of toxic conversations in open-source platforms can degrade relationships among software developers and may negatively impact software product quality. To help mitigate this, some initial work has been done to detect toxic comments in the Software Engineering (SE) domain. Aims: Since automatically classifying an entire text as toxic or non-toxic does not help human moderators understand the specific reason(s) for toxicity, we worked to develop an explainable toxicity detector for the SE domain. Method: Our explainable toxicity detector can detect specific spans of toxic content from SE texts, which can help human moderators by automatically highlighting those spans. This toxic span detection model, ToxiSpanSE, is trained on 19,651 code review (CR) comments with labeled toxic spans. Our annotators labeled the toxic spans within 3,757 toxic CR samples.…
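
The highlighting step described in the abstract can be illustrated with a minimal sketch. It assumes a span detector has already produced one binary label per token (1 = toxic); the whitespace tokenizer, the `[[ ]]` markers, and all function names are hypothetical stand-ins, not the ToxiSpanSE implementation or its API.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def whitespace_offsets(text: str) -> List[Span]:
    """Character offsets of whitespace-separated tokens (stand-in tokenizer)."""
    offsets, pos = [], 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        offsets.append((start, end))
        pos = end
    return offsets


def merge_toxic_spans(offsets: Sequence[Span], labels: Sequence[int]) -> List[Span]:
    """Merge runs of consecutive toxic tokens into single character spans."""
    spans: List[Span] = []
    prev_toxic = False
    for (start, end), label in zip(offsets, labels):
        if label == 1:
            if prev_toxic:
                spans[-1] = (spans[-1][0], end)  # extend the current run
            else:
                spans.append((start, end))       # start a new run
        prev_toxic = label == 1
    return spans


def highlight(text: str, spans: Sequence[Span],
              open_mark: str = "[[", close_mark: str = "]]") -> str:
    """Wrap each span in markers so a moderator can see what was flagged."""
    out, cursor = [], 0
    for start, end in spans:
        out.append(text[cursor:start])
        out.append(open_mark + text[start:end] + close_mark)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


# Made-up comment and labels; a real model would supply the labels.
comment = "this patch is utter garbage please fix"
token_labels = [0, 0, 0, 1, 1, 0, 0]
spans = merge_toxic_spans(whitespace_offsets(comment), token_labels)
marked = highlight(comment, spans)
# marked == "this patch is [[utter garbage]] please fix"
```

Merging adjacent toxic tokens keeps a multi-word insult as one highlighted region rather than several fragments, which is closer to how a human moderator would read it.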

Code & Models

Repositories

wsu-seal/toxispanse (Official)

Taxonomy

Topics: Software Engineering Research · Software Reliability and Analysis Research