Analyzing Toxicity in Open Source Software Communications Using Psycholinguistics and Moral Foundations Theory
Ramtin Ehsani, Rezvaneh Rezapour, Preetha Chatterjee

TL;DR
This paper explores a machine learning approach that uses psycholinguistic and moral foundation features to improve toxicity detection in open source software communications, addressing the unique language styles of these channels.
Contribution
It introduces a novel toxicity detection method leveraging moral and psycholinguistic features, achieving up to 7% better F1 scores over existing detectors in OSS contexts.
Findings
Moral features outperform linguistic cues in toxicity detection.
The approach achieves F1 scores of up to 67.50% on code reviews and 64.83% on issue comments.
Context-specific models are crucial for effective toxicity detection in OSS.
Abstract
Studies have shown that toxic behavior can cause contributors to leave and can hinder the participation of newcomers (especially those from underrepresented communities) in Open Source Software (OSS) projects. Thus, detection of toxic language plays a crucial role in OSS collaboration and inclusivity. Off-the-shelf toxicity detectors are ineffective when applied to OSS communications, due to the distinct nature of toxicity observed in these channels (e.g., entitlement and arrogance are more frequently observed on GitHub than on Reddit or Twitter). In this paper, we investigate a machine learning-based approach for the automatic detection of toxic communications in OSS. We leverage psycholinguistic lexicons and Moral Foundations Theory to analyze toxicity in two types of OSS communication channels: issue comments and code reviews. Our evaluation indicates that our approach can achieve a significant…
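To make the idea of lexicon-based features concrete, here is a minimal sketch of how a comment could be turned into per-category frequencies that a classifier might then consume. It is an illustration only, not the authors' pipeline: the `TOY_LEXICON` categories and word lists and the `lexicon_features` helper are hypothetical stand-ins, and the actual psycholinguistic and moral foundations lexicons used in the paper are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (category -> words). These are illustrative
# stand-ins, NOT the lexicons used in the paper.
TOY_LEXICON = {
    "care_vice": {"hurt", "harm", "stupid"},
    "fairness_vice": {"unfair", "cheat"},
    "second_person": {"you", "your"},
}


def lexicon_features(text, lexicon=TOY_LEXICON):
    """Return, for each category, the share of tokens found in that category."""
    # Lowercase and keep alphabetic tokens (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid division by zero on empty input
    counts = Counter(tokens)
    return {
        category: sum(counts[word] for word in words) / total
        for category, words in lexicon.items()
    }


if __name__ == "__main__":
    comment = "You are stupid and your patch is unfair"
    print(lexicon_features(comment))
```

Feature vectors of this kind could be fed to any standard classifier. Normalizing by token count keeps long and short comments comparable, which matters when issue comments and code reviews differ in typical length.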
Taxonomy
Topics: Software Engineering Techniques and Practices · Software Engineering Research · Information and Cyber Security
