Beyond Binary Moderation: Identifying Fine-Grained Sexist and Misogynistic Behavior on GitHub with Large Language Models
Tanni Dev, Sayma Sultana, Amiangshu Bosu

TL;DR
This paper presents a multi-class classification framework that uses instruction-tuned Large Language Models to detect twelve nuanced categories of sexist and misogynistic comments on GitHub, supporting finer-grained and more accurate moderation than binary tools.
Contribution
The paper introduces a fine-grained, multi-class approach refined through systematic prompt iteration, moving beyond binary moderation tools to detect nuanced harm.
Findings
The optimized GPT-4 model achieved an MCC of 0.501
Prompt design significantly improved detection accuracy
Few false positives, but the model struggled with context-dependent nuance
Abstract
Background: Sexist and misogynistic behavior significantly hinders inclusion in technical communities like GitHub, driving developers, especially minorities, to leave because of subtle biases and microaggressions. Current moderation tools rely primarily on keyword filtering or binary classifiers, which limits their ability to detect nuanced harm. Aims: This study introduces a fine-grained, multi-class classification framework that leverages instruction-tuned Large Language Models (LLMs) to identify twelve distinct categories of sexist and misogynistic comments on GitHub. Method: We used an instruction-tuned, LLM-based framework with systematic prompt refinement across 20 iterations, evaluated on 1,440 labeled GitHub comments spanning twelve sexism/misogyny categories. Model performance was rigorously compared using precision, recall, F1-score, and the Matthews Correlation…
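Because the task has twelve categories, the Matthews Correlation Coefficient must be computed in its multi-class form rather than the familiar binary TP/TN/FP/FN version. The sketch below assumes Gorodkin's R_K generalization, which scikit-learn's `matthews_corrcoef` also uses for multi-class input. The truncated abstract does not say which variant the authors used. The function name `multiclass_mcc` and the example labels are hypothetical. With s samples, c correct predictions, t_k true occurrences of class k, and p_k predictions of class k, the score is (c·s − Σ p_k·t_k) / sqrt((s² − Σ p_k²)(s² − Σ t_k²)).

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multi-class MCC (Gorodkin's R_K); hypothetical helper, not the paper's code."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                   # total number of samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correctly classified samples
    t_counts = Counter(y_true)                        # t_k: true occurrences per class
    p_counts = Counter(y_pred)                        # p_k: predictions per class
    labels = set(t_counts) | set(p_counts)            # Counter returns 0 for absent keys
    numerator = c * s - sum(p_counts[k] * t_counts[k] for k in labels)
    pred_term = s * s - sum(p_counts[k] ** 2 for k in labels)
    true_term = s * s - sum(t_counts[k] ** 2 for k in labels)
    denom = sqrt(pred_term * true_term)
    # Convention: return 0.0 when undefined, e.g. if every prediction is one class.
    return 0.0 if denom == 0 else numerator / denom


if __name__ == "__main__":
    # Hypothetical labels; the paper's twelve category names are not reproduced here.
    gold = ["none", "cat_a", "cat_b", "none", "cat_a"]
    pred = ["none", "cat_a", "none", "none", "cat_b"]
    print(round(multiclass_mcc(gold, pred), 3))
```

With two classes this reduces to the standard binary MCC. Unlike accuracy, it stays near 0 for a classifier that mostly predicts the majority class. That property matters on imbalanced moderation data, where most comments are not harmful.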
