Explaining Matters: Leveraging Definitions and Semantic Expansion for Sexism Detection

Sahrish Khan; Arshad Jhumka; Gabriele Pergola

arXiv:2506.06238 · cs.CL · June 9, 2025

TL;DR

This paper introduces prompt-based data augmentation techniques and ensemble strategies for detecting sexism in online content. The methods target two challenges, data sparsity and the nuanced language of sexist posts, and achieve state-of-the-art results.

Contribution

The paper proposes two new augmentation methods, one grounded in category definitions (Definition-based Data Augmentation, DDA) and one based on semantic expansion, together with an ensemble approach, to improve fine-grained sexism classification.
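The ensemble approach is not detailed on this page. As a point of reference only, here is a minimal plurality-vote sketch, not the authors' strategy: each model (for example, one trained on each augmented dataset) contributes a single fine-grained label, and ties are broken by a fixed priority order. The labels are shortened names of the four EDOS category-level classes; the priority order and the names `plurality_vote` and `PRIORITY` are hypothetical.

```python
from collections import Counter
from typing import Sequence

# Shortened names of the four EDOS category-level labels. The ORDER is a
# hypothetical tie-breaking priority chosen for illustration only.
PRIORITY = ["threats", "derogation", "animosity", "prejudiced discussions"]


def plurality_vote(predictions: Sequence[str], priority: Sequence[str] = PRIORITY) -> str:
    """Return the label predicted by the most models, breaking ties by priority."""
    if not predictions:
        raise ValueError("need at least one model prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    tied = [label for label, c in counts.items() if c == best]
    # Labels missing from the priority list rank after every listed label.
    rank = {label: i for i, label in enumerate(priority)}
    return min(tied, key=lambda label: rank.get(label, len(rank)))


# Three hypothetical models vote on one post.
print(plurality_vote(["animosity", "derogation", "animosity"]))  # animosity
# A 1-1 tie resolves to the label listed earlier in PRIORITY.
print(plurality_vote(["animosity", "derogation"]))  # derogation
```

A fixed priority keeps the ensemble deterministic; a real system might instead break ties with model confidence scores or a separate adjudicating model.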

Findings

01

Achieved a 1.5-point macro F1 improvement in binary sexism detection.

02

Achieved a 4.1-point macro F1 improvement in fine-grained classification.

03

Demonstrated the effectiveness of the augmentation and ensemble methods on the EDOS dataset.
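Macro F1 averages the per-class F1 scores without weighting by class frequency, so a rare class counts as much as the majority class. That makes it a natural metric under heavy class imbalance. The from-scratch sketch below illustrates the computation (the function name is illustrative); in practice, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over labels seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Per-class F1 = 2TP / (2TP + FP + FN); guard against an empty denominator.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    # Every class contributes equally, regardless of how many examples it has.
    return sum(scores) / len(scores)


# Toy imbalanced split: a majority-class predictor gets 80% accuracy.
y_true = ["not sexist"] * 8 + ["sexist"] * 2
y_pred = ["not sexist"] * 10
print(round(macro_f1(y_true, y_pred), 3))  # 0.444
```

On this toy split, always predicting "not sexist" scores 80% accuracy but only about 0.444 macro F1, because the "sexist" class has an F1 of 0. Gains on the minority classes therefore move macro F1 much more than accuracy.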

Abstract

The detection of sexism in online content remains an open problem, as harmful language disproportionately affects women and marginalized groups. While automated systems for sexism detection have been developed, they still face two key challenges: data sparsity and the nuanced nature of sexist language. Even in large, well-curated datasets like the Explainable Detection of Online Sexism (EDOS), severe class imbalance hinders model generalization. Additionally, the overlapping and ambiguous boundaries of fine-grained categories introduce substantial annotator disagreement, reflecting the difficulty of interpreting nuanced expressions of sexism. To address these challenges, we propose two prompt-based data augmentation techniques: Definition-based Data Augmentation (DDA), which leverages category-specific definitions to generate semantically aligned synthetic examples, and Contextual…
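The abstract describes DDA as using category-specific definitions to generate semantically aligned synthetic examples. The sketch below illustrates that idea under stated assumptions. `augmentation_budget` computes how many synthetic posts each observed class would need to match the largest class, which is one simple way to target the imbalance; the paper's actual sampling ratios are not given here. `build_dda_prompt` assembles a generation prompt from a category definition and a few seed posts. All names, the prompt wording, and the placeholder definitions are hypothetical, and the LLM call itself is omitted.

```python
from collections import Counter
from typing import Dict, Sequence

# Hypothetical placeholder definitions; the real method would use the
# category definitions from the dataset's annotation guidelines.
DEFINITIONS: Dict[str, str] = {
    "threats": "<definition of threats, plans to harm and incitement>",
    "derogation": "<definition of derogation>",
    "animosity": "<definition of animosity>",
    "prejudiced discussions": "<definition of prejudiced discussions>",
}


def augmentation_budget(labels: Sequence[str]) -> Dict[str, int]:
    """Number of synthetic examples each observed class needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


def build_dda_prompt(category: str, seeds: Sequence[str], n: int) -> str:
    """Assemble a definition-grounded generation prompt for one category."""
    seed_block = "\n".join(f"- {s}" for s in seeds)
    return (
        f"Category: {category}\n"
        f"Definition: {DEFINITIONS[category]}\n"
        f"Examples of this category:\n{seed_block}\n"
        f"Write {n} new social-media posts that match the definition above "
        "and differ in wording from the examples. Output one post per line."
    )


train_labels = ["derogation"] * 5 + ["animosity"] * 2 + ["threats"]
budget = augmentation_budget(train_labels)
print(budget)  # {'derogation': 0, 'animosity': 3, 'threats': 4}
prompt = build_dda_prompt("animosity", ["<seed post 1>", "<seed post 2>"], budget["animosity"])
# The prompt would then be sent to an LLM (call omitted here).
```

Grounding each prompt in an explicit definition is meant to keep generated examples inside the intended category boundary, which matters when, as the abstract notes, fine-grained categories overlap and annotators disagree.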

Videos

Explaining Matters: Leveraging Definitions and Semantic Expansion for Sexism Detection (Underline)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling · Topic Modeling