Label-aware Hard Negative Sampling Strategies with Momentum Contrastive Learning for Implicit Hate Speech Detection

Jaehoon Kim; Seungwan Jin; Sohyun Park; Someen Park; Kyungsik Han

arXiv:2406.07886·cs.CL·June 13, 2024

TL;DR

This paper introduces Label-aware Hard Negative sampling strategies (LAHN), combined with momentum contrastive learning, so that the model learns from hard negatives rather than from randomly batched ones. LAHN outperforms existing models for implicit hate speech detection.

Contribution

It proposes LAHN, a novel hard negative sampling method with momentum contrastive learning, enhancing the detection of implicit hate speech beyond prior contrastive learning approaches.
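As a rough illustration of what "momentum contrastive learning" usually involves, the sketch below assumes the common MoCo-style formulation: a key encoder whose parameters track the query encoder through an exponential moving average, plus a fixed-size queue of past key embeddings. The names `momentum_update`, `KeyQueue`, and the coefficient `m` are hypothetical and are not taken from the LAHN repository; parameters are modeled as plain lists of floats to keep the example dependency-free.

```python
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Exponential moving average: key <- m * key + (1 - m) * query."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class KeyQueue:
    """Fixed-size FIFO of (embedding, label) pairs from the key encoder.

    Storing the label next to each embedding is an assumption made here so
    that a later sampling step can be label-aware.
    """

    def __init__(self, max_size):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.items = deque(maxlen=max_size)

    def enqueue(self, embeddings, labels):
        for emb, label in zip(embeddings, labels):
            self.items.append((list(emb), label))

    def __len__(self):
        return len(self.items)
```

With `m` close to 1 the key encoder changes slowly, which keeps embeddings already stored in the queue roughly consistent with newly computed ones.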

Findings

01 LAHN outperforms existing models in both in-dataset and cross-dataset evaluation.

02 Hard negative sampling improves feature learning for implicit hate speech.

03 Momentum contrastive learning enhances model robustness.

Abstract

Detecting implicit hate speech that is not directly hateful remains a challenge. Recent research has attempted to detect implicit hate speech by applying contrastive learning to pre-trained language models such as BERT and RoBERTa, but the proposed models still do not have a significant advantage over cross-entropy loss-based learning. We found that contrastive learning based on randomly sampled batch data does not encourage the model to learn hard negative samples. In this work, we propose Label-aware Hard Negative sampling strategies (LAHN) that encourage the model to learn detailed features from hard negative samples, instead of naive negative samples in a random batch, using momentum-integrated contrastive learning. LAHN outperforms the existing models for implicit hate speech detection in both in-dataset and cross-dataset evaluations. The code is available at https://github.com/Hanyang-HCC-Lab/LAHN
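To make the idea of label-aware hard negatives concrete, here is a minimal sketch under one plausible reading of the abstract: a hard negative is a stored sample whose label differs from the anchor's but whose embedding is highly similar to it. The functions `select_hard_negatives` and `contrastive_loss`, the parameter `k`, and the temperature value are assumptions for illustration; the loss is a generic InfoNCE-style objective, not necessarily the exact one used in the paper or its repository.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_hard_negatives(anchor, anchor_label, queue, k):
    """Pick the k entries with a different label that are most similar to the anchor.

    `queue` is an iterable of (embedding, label) pairs (assumed layout).
    """
    scored = [
        (cosine(anchor, emb), emb)
        for emb, label in queue
        if label != anchor_label  # label-aware: same-label entries are never negatives
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [emb for _, emb in scored[:k]]


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: -log(e^{s+/t} / (e^{s+/t} + sum_j e^{s_j/t}))."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))
```

Under this formulation, a negative that lies close to the anchor contributes a much larger term to the denominator than a dissimilar one, so the loss (and the training signal) is dominated by the hard cases rather than by easy negatives that happen to be in the batch.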

Code & Models

Repositories

hanyang-hcc-lab/lahn · PyTorch · Official

Videos

Label-aware Hard Negative Sampling Strategies with Momentum Contrastive Learning for Implicit Hate Speech Detection · Underline

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · Adam · Attention Dropout · Weight Decay · Linear Layer · Multi-Head Attention · Dropout