Few-Shot Contrastive Adaptation for Audio Abuse Detection in Low-Resource Indic Languages

Aditya Narayan Sankaran; Reza Farahbakhsh; Noel Crespi

arXiv:2604.09094 · cs.SD · April 13, 2026

TL;DR

This paper investigates whether Contrastive Language-Audio Pre-training (CLAP) can detect abusive speech directly from audio in low-resource, multilingual settings. It reports strong cross-lingual audio representations and competitive results from lightweight few-shot adaptation.

Contribution

The paper shows that CLAP-based audio representations can support cross-lingual abuse detection with little labeled data, which points to the potential of contrastive audio-text models for low-resource languages.

Findings

01. CLAP yields strong cross-lingual audio representations across ten Indic languages.

02. Lightweight projection-only adaptation performs competitively with fully supervised systems.

03. Few-shot adaptation benefits are language-dependent and not strictly increasing with more data.

Abstract

Abusive speech detection is becoming increasingly important as social media shifts towards voice-based interaction, particularly in multilingual and low-resource settings. Most current systems rely on automatic speech recognition (ASR) followed by text-based hate speech classification, but this pipeline is vulnerable to transcription errors and discards prosodic information carried in speech. We investigate whether Contrastive Language-Audio Pre-training (CLAP) can support abusive speech detection directly from audio. Using the ADIMA dataset, we evaluate CLAP-based representations under few-shot supervised contrastive adaptation in cross-lingual and leave-one-language-out settings, with zero-shot prompting included as an auxiliary analysis. Our results show that CLAP yields strong cross-lingual audio representations across ten Indic languages, and that lightweight projection-only…
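
The few-shot supervised contrastive adaptation with a projection-only variant described in the abstract might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: it assumes frozen CLAP audio embeddings are already available as plain lists of floats, uses a hypothetical linear projection matrix `W` as the only adaptable component, and applies the standard supervised contrastive loss formulation (Khosla et al., 2020). The function names, the temperature of 0.1, and the toy inputs are all illustrative choices; the excerpt does not specify the loss, temperature, or projection architecture.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def project(W, x):
    """Apply a linear projection W (list of rows, out_dim x in_dim) to x.

    Hypothetical stand-in for a projection-only adapter: the audio encoder
    stays frozen and only W would be trained.
    """
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of projected embeddings.

    For each anchor i, every other sample with the same label is a positive
    and all other samples form the softmax denominator. Anchors without any
    positive in the batch are skipped.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Cosine similarity to every other sample, scaled by the temperature.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy few-shot batch: 2-D stand-ins for frozen audio embeddings,
# with labels 1 = abusive and 0 = non-abusive (assumed encoding).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = [1, 1, 0, 0]
identity_W = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for a learned projection

projected = [project(identity_W, e) for e in toy_embeddings]
loss = supcon_loss(projected, toy_labels)
```

In an actual adaptation loop, `W` would be updated by gradient descent on this loss over the few labeled clips per language, which typically calls for an autodiff framework rather than plain Python. The loss is low when same-label clips sit close together in the projected space and high when they are interleaved with clips of the other class.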
