Sexism Detection on a Data Diet

Rabiraj Bandyopadhyay; Dennis Assenmacher; Jose M. Alonso Moral; Claudia Wagner

arXiv:2406.04892·cs.CL·June 10, 2024

Open Access

TL;DR

This paper explores data pruning strategies for sexism detection on social media. It shows that many data points can be removed while maintaining performance, but that pruning also risks amplifying class imbalance and, with some strategies, removing the hateful examples the model needs to detect.

Contribution

It introduces influence-based data pruning for sexism detection and evaluates its effectiveness and pitfalls compared to prior approaches in NLP.

Findings

1. Large data subsets can be pruned without performance loss.

2. Pruning strategies may worsen class imbalance in harmful content detection.

3. Some pruning methods eliminate all hateful instances, reducing model effectiveness.

Abstract

The proliferation of online hate has increased alongside the rise in social media usage. In response, there has been significant progress in automated tools for identifying harmful text content, using approaches grounded in Natural Language Processing and Deep Learning. Although training Deep Learning models is known to require a substantial amount of annotated data, a recent line of work suggests that models trained on specific subsets of the data can retain performance comparable to a model trained on the full dataset. In this work, we show how influence scores can be leveraged to estimate the importance of a data point during training and to design a pruning strategy, applied to the case of sexism detection. We evaluate the performance of models trained on data pruned with different pruning strategies on three…
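To make the idea of score-based pruning concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names `prune_by_score` and `positive_rate`, the toy scores, and the toy labels are all hypothetical, and the scores merely stand in for whatever per-example importance estimate (such as an influence score) is available. The sketch only shows how keeping one end of the score ranking can shift the class balance of the retained data.

```python
def prune_by_score(scores, keep_fraction, keep="highest"):
    """Return the sorted indices of examples retained after pruning.

    `scores` is any per-example importance estimate (hypothetical here);
    `keep` selects which end of the ranking survives.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    n_keep = round(keep_fraction * len(scores))
    # Indices ordered from lowest to highest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    if keep == "highest":
        kept = order[len(order) - n_keep:]
    elif keep == "lowest":
        kept = order[:n_keep]
    else:
        raise ValueError("keep must be 'highest' or 'lowest'")
    return sorted(kept)


def positive_rate(labels, indices):
    """Share of retained examples carrying the positive (sexist) label."""
    if not indices:
        return 0.0
    return sum(labels[i] for i in indices) / len(indices)


# Toy data (made up for illustration): 3 positive examples out of 10.
toy_scores = [0.9, 0.8, 0.7, 0.1, 0.2, 0.15, 0.3, 0.25, 0.05, 0.35]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

full_rate = positive_rate(toy_labels, list(range(len(toy_labels))))
kept_low = prune_by_score(toy_scores, keep_fraction=0.5, keep="lowest")
kept_high = prune_by_score(toy_scores, keep_fraction=0.5, keep="highest")

print("positive rate, full data:   ", full_rate)
print("positive rate, keep lowest: ", positive_rate(toy_labels, kept_low))
print("positive rate, keep highest:", positive_rate(toy_labels, kept_high))
```

In this toy setup the positive examples happen to have the highest scores, so keeping the lowest-scoring half leaves no positive examples at all, while keeping the highest-scoring half doubles their share. Checking the label distribution of the retained subset before training is a cheap safeguard against the imbalance pitfalls listed under Findings.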

Taxonomy

Topics: Culinary Culture and Tourism

Methods: Pruning