Online Anti-sexist Speech: Identifying Resistance to Gender Bias in Political Discourse
Aditi Dutta, Susan Banducci

TL;DR
This paper shows that large language models often misclassify anti-sexist political speech online as harmful, which risks silencing voices of resistance during sensitive, high-stakes events.
Contribution
It reveals the limitations of current LLMs in recognizing anti-sexist speech and proposes sociotechnical solutions for improved moderation that respects resistance to sexism.
Findings
Models often misclassify anti-sexist speech as harmful.
Misclassification increases during politically charged events.
Recommendations include human review and training data adjustments.
Abstract
Anti-sexist speech, i.e., public expressions that challenge or resist gendered abuse and sexism, plays a vital role in shaping democratic debate online. Yet automated content moderation systems, increasingly powered by large language models (LLMs), may struggle to distinguish such resistance from the sexism it opposes. This study examines how five LLMs classify sexist, anti-sexist, and neutral political tweets from the UK, focusing on high-salience trigger events involving female Members of Parliament in 2022. Our analysis shows that models frequently misclassify anti-sexist speech as harmful, particularly during politically charged events where rhetorical styles of harm and resistance converge. These errors risk silencing those who challenge sexism, with disproportionate consequences for marginalised voices. We argue that moderation design must move beyond binary…
