KTCR: Improving Implicit Hate Detection with Knowledge Transfer driven Concept Refinement

Samarth Garg; Vivek Hruday Kavuri; Gargi Shroff; Rahul Mishra

arXiv:2410.15314·cs.CL·April 2, 2025

Open Access

TL;DR

This paper introduces a Knowledge Transfer-driven Concept Refinement method that enhances implicit hate detection by refining hate-related concepts and selectively augmenting data, leading to improved performance and better generalization across datasets.

Contribution

It proposes a novel concept refinement approach using prototype alignment and concept losses to improve implicit hate detection beyond existing data augmentation methods.

Findings

01 Enhanced detection accuracy on implicit hate datasets

02 Improved cross-dataset generalization capabilities

03 Outperformed baseline models with refined concept techniques

Abstract

The constant shifts in social and political contexts, driven by emerging social movements and political events, lead to new forms of hate content and previously unrecognized hate patterns that machine learning models may not have captured. Some recent literature proposes data augmentation-based techniques to enrich existing hate datasets by incorporating samples that reveal new implicit hate patterns. This approach aims to improve the model's performance on out-of-domain implicit hate instances. However, it is observed that adding further augmentation samples decreases the model's performance. In this work, we propose a Knowledge Transfer-driven Concept Refinement method that distills and refines the concepts related to implicit hate samples through novel prototype alignment and concept losses, alongside data augmentation based on concept activation vectors.…
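As a rough illustration of the concept activation vector idea the abstract mentions, the sketch below derives a concept direction from hidden-layer activations and uses it to rank candidate samples for augmentation. It is not the authors' implementation: the difference-of-means direction is a simplification swapped in for a trained linear probe, the activations are synthetic, and every function name (`concept_activation_vector`, `concept_scores`, `select_for_augmentation`) is hypothetical.

```python
import numpy as np


def concept_activation_vector(concept_acts, random_acts):
    """Unit vector pointing from the mean of random-example activations
    toward the mean of concept-example activations.

    Assumption: a difference-of-means direction stands in for the normal
    of a trained linear classifier.
    """
    direction = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    norm = np.linalg.norm(direction)
    if norm == 0:
        raise ValueError("concept and random activations have identical means")
    return direction / norm


def concept_scores(acts, cav):
    """Project each activation row onto the concept direction."""
    return acts @ cav


def select_for_augmentation(acts, cav, top_k):
    """Indices of the top_k candidates most aligned with the concept."""
    scores = concept_scores(acts, cav)
    return np.argsort(scores)[::-1][:top_k]


# Synthetic stand-ins for hidden-layer activations (hypothetical data).
rng = np.random.default_rng(0)
concept = rng.normal(loc=1.0, size=(20, 8))   # examples expressing the concept
random_ = rng.normal(loc=0.0, size=(20, 8))   # unrelated examples
cav = concept_activation_vector(concept, random_)

candidates = np.vstack([
    rng.normal(loc=1.0, size=(5, 8)),
    rng.normal(loc=0.0, size=(5, 8)),
])
picked = select_for_augmentation(candidates, cav, top_k=3)
print(picked)
```

Ranking candidates by their projection onto the concept direction is one plausible way to augment selectively rather than adding every available sample; the prototype alignment and concept losses named in the abstract are not modeled here.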

Taxonomy

Topics: Hate Speech and Cyberbullying Detection