Multimodal Zero-Shot Framework for Deepfake Hate Speech Detection in Low-Resource Languages

Rishabh Ranjan; Likhith Ayinala; Mayank Vatsa; Richa Singh

arXiv:2506.08372 · cs.SD · June 11, 2025

Open Access

TL;DR

This paper presents a multimodal deepfake hate speech detection framework that leverages contrastive learning to align audio and text representations, excelling in low-resource and zero-shot multilingual scenarios.

Contribution

It introduces the first benchmark dataset for multilingual deepfake hate speech detection and proposes a shared semantic embedding model that improves cross-lingual and cross-modal classification.

Findings

01

Outperforms baseline models, reaching accuracies of 0.819 and 0.701 on two multilingual test sets.

02

Generalizes effectively to unseen languages.

03

Demonstrates the effectiveness of multimodal contrastive learning in low-resource settings.

Abstract

This paper introduces a novel multimodal framework for hate speech detection in deepfake audio, excelling even in zero-shot scenarios. Unlike previous approaches, our method uses contrastive learning to jointly align audio and text representations across languages. We present the first benchmark dataset with 127,290 paired text and synthesized speech samples in six languages: English and five low-resource Indian languages (Hindi, Bengali, Marathi, Tamil, Telugu). Our model learns a shared semantic embedding space, enabling robust cross-lingual and cross-modal classification. Experiments on two multilingual test sets show our approach outperforms baselines, achieving accuracies of 0.819 and 0.701, and generalizes well to unseen languages. This demonstrates the advantage of combining modalities for hate speech detection in synthetic media, especially in low-resource settings where…
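The abstract does not spell out the training objective, so the following is only a minimal sketch of one common way to align paired audio and text embeddings in a shared space: a symmetric, CLIP-style InfoNCE loss. The function names, the temperature value of 0.07, and the use of plain Python lists are all assumptions made for illustration, not details taken from the paper.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(audio_embs, text_embs, temperature=0.07):
    """Hypothetical CLIP-style loss over a batch of paired embeddings.

    audio_embs[i] and text_embs[i] are assumed to come from the same
    utterance; every other pairing in the batch acts as a negative.
    """
    audio = [l2_normalize(a) for a in audio_embs]
    text = [l2_normalize(t) for t in text_embs]
    # Cosine similarity between every audio/text pair, scaled by temperature.
    logits = [
        [sum(x * y for x, y in zip(a, t)) / temperature for t in text]
        for a in audio
    ]
    transposed = [list(col) for col in zip(*logits)]
    audio_to_text = cross_entropy_diagonal(logits)
    text_to_audio = cross_entropy_diagonal(transposed)
    return 0.5 * (audio_to_text + text_to_audio)
```

Under this sketch, a batch whose audio and text embeddings are correctly paired yields a lower loss than the same batch with the pairs shuffled, which is the signal that pulls matching audio and text together in the shared space.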

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Emotion and Mood Recognition · Adversarial Robustness in Machine Learning