Detecting Sexism in German Online Newspaper Comments with Open-Source Text Embeddings (Team GDA, GermEval2024 Shared Task 1: GerMS-Detect, Subtasks 1 and 2, Closed Track)

Florian Bremm; Patrick Gustav Blaneck; Tobias Bornheim; Niklas Grieger; Stephan Bialonski

arXiv:2409.10341·cs.CL·October 3, 2024

TL;DR

This paper presents a method using open-source text embeddings to detect sexism in German online comments, achieving competitive results in a shared task and demonstrating potential for scalable, multilingual applications.

Contribution

It introduces a classifier trained on text embeddings that closely mimics human judgments of sexism, showing robust performance in a competitive benchmark.

Findings

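01 Achieved an average macro F1 score of 0.597 in sexism detection (GerMS-Detect Subtask 1, 4th place as reported on Codabench)

02 Predicted the distribution of human annotations with an average Jensen-Shannon distance of 0.301 (GerMS-Detect Subtask 2, 2nd place)

03 Computational efficiency of the approach suggests potential for scalable applications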

Abstract

Sexism in online media comments is a pervasive challenge that often manifests subtly, complicating moderation efforts as interpretations of what constitutes sexism can vary among individuals. We study monolingual and multilingual open-source text embeddings to reliably detect sexism and misogyny in German-language online comments from an Austrian newspaper. We observed classifiers trained on text embeddings to mimic closely the individual judgements of human annotators. Our method showed robust performance in the GermEval 2024 GerMS-Detect Subtask 1 challenge, achieving an average macro F1 score of 0.597 (4th place, as reported on Codabench). It also accurately predicted the distribution of human annotations in GerMS-Detect Subtask 2, with an average Jensen-Shannon distance of 0.301 (2nd place). The computational efficiency of our approach suggests potential for scalable applications…
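The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation or the official shared-task scorer; the function names, the toy labels, and the choice of a base-2 logarithm (which bounds the distance to the range 0 to 1) are assumptions made for illustration.

```python
import math


def js_distance(p, q, base=2.0):
    """Jensen-Shannon distance (square root of the JS divergence).

    p and q are non-negative weights over the same label set; they are
    normalised to sum to 1. The log base is an assumption of this sketch.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    sum_p, sum_q = sum(p), sum(q)
    p = [x / sum_p for x in p]
    q = [x / sum_q for x in q]
    # Mixture distribution used by the JS divergence.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i == 0 contribute 0.
        return sum(x * math.log(x / y, base) for x, y in zip(a, b) if x > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    return math.sqrt(max(0.0, divergence))


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical example: share of annotators choosing each label
    # versus a model's predicted distribution for one comment.
    annotated = [0.6, 0.4]
    predicted = [0.7, 0.3]
    print("JS distance:", js_distance(annotated, predicted))

    # Hypothetical binary labels for four comments.
    print("macro F1:", macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))
```

A lower Jensen-Shannon distance means the predicted distribution is closer to the annotators' distribution, while a higher macro F1 means better classification averaged equally over classes, so the two reported scores are read in opposite directions.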

Code & Models

Repositories

dslaborg/germeval2024 (Official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Gender Studies in Language · Wikis in Education and Collaboration