Multimodal Zero-Shot Framework for Deepfake Hate Speech Detection in Low-Resource Languages
Rishabh Ranjan, Likhith Ayinala, Mayank Vatsa, Richa Singh

TL;DR
This paper presents a multimodal deepfake hate speech detection framework that leverages contrastive learning to align audio and text representations, excelling in low-resource and zero-shot multilingual scenarios.
Contribution
It introduces the first benchmark dataset for multilingual deepfake hate speech detection and proposes a shared semantic embedding model that improves cross-lingual and cross-modal classification.
Findings
Outperforms baseline models on two multilingual test sets, with accuracies of 0.819 and 0.701.
Generalizes effectively to unseen languages.
Demonstrates the effectiveness of multimodal contrastive learning in low-resource settings.
Abstract
This paper introduces a novel multimodal framework for hate speech detection in deepfake audio, excelling even in zero-shot scenarios. Unlike previous approaches, our method uses contrastive learning to jointly align audio and text representations across languages. We present the first benchmark dataset with 127,290 paired text and synthesized speech samples in six languages: English and five low-resource Indian languages (Hindi, Bengali, Marathi, Tamil, Telugu). Our model learns a shared semantic embedding space, enabling robust cross-lingual and cross-modal classification. Experiments on two multilingual test sets show our approach outperforms baselines, achieving accuracies of 0.819 and 0.701, and generalizes well to unseen languages. This demonstrates the advantage of combining modalities for hate speech detection in synthetic media, especially in low-resource settings where…
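The abstract does not spell out the contrastive objective. As a rough illustration of how paired audio and text embeddings can be pulled into a shared space, the sketch below implements a symmetric InfoNCE-style loss in NumPy. The loss form, the temperature of 0.07, and the function names (`l2_normalize`, `log_softmax`, `symmetric_contrastive_loss`) are assumptions made for illustration, not details taken from the paper, and random vectors stand in for real encoder outputs.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss (assumed form, not the paper's stated loss).

    Row i of audio_emb and row i of text_emb are treated as a matching pair;
    every other row in the batch acts as a negative.
    """
    a = l2_normalize(audio_emb)
    t = l2_normalize(text_emb)
    logits = a @ t.T / temperature            # (batch, batch) similarity matrix
    idx = np.arange(len(a))                   # matching pairs sit on the diagonal
    loss_audio_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_audio = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_audio_to_text + loss_text_to_audio)


# Toy usage: random vectors stand in for encoder outputs (hypothetical data).
rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))
audio_aligned = text + 0.01 * rng.normal(size=(8, 16))  # near-identical pairs
audio_shuffled = audio_aligned[::-1]                     # pairs deliberately mismatched

print("aligned pairs: ", symmetric_contrastive_loss(audio_aligned, text))
print("shuffled pairs:", symmetric_contrastive_loss(audio_shuffled, text))
```

Under these assumptions, correctly paired embeddings yield a much lower loss than mismatched ones, which is the signal that would drive the audio and text encoders toward a common semantic space. A classifier trained on that space could then, in principle, be applied to either modality or to a language not seen during training.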
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Emotion and Mood Recognition · Adversarial Robustness in Machine Learning
