Few-Shot Contrastive Adaptation for Audio Abuse Detection in Low-Resource Indic Languages
Aditya Narayan Sankaran, Reza Farahbakhsh, Noel Crespi

TL;DR
This paper explores contrastive language-audio pre-training (CLAP) for detecting abusive speech directly from audio in low-resource, multilingual settings. It finds that CLAP yields strong cross-lingual representations and that lightweight adaptation strategies are promising.
Contribution
It demonstrates that CLAP-based audio representations can support effective cross-lingual abuse detection with minimal supervised data, highlighting the potential of contrastive audio-text models for low-resource languages.
Findings
CLAP yields strong cross-lingual audio representations across ten Indic languages.
Lightweight projection-only adaptation performs competitively with fully supervised systems.
Few-shot adaptation benefits are language-dependent and not strictly increasing with more data.
Abstract
Abusive speech detection is becoming increasingly important as social media shifts towards voice-based interaction, particularly in multilingual and low-resource settings. Most current systems rely on automatic speech recognition (ASR) followed by text-based hate speech classification, but this pipeline is vulnerable to transcription errors and discards prosodic information carried in speech. We investigate whether Contrastive Language-Audio Pre-training (CLAP) can support abusive speech detection directly from audio. Using the ADIMA dataset, we evaluate CLAP-based representations under few-shot supervised contrastive adaptation in cross-lingual and leave-one-language-out settings, with zero-shot prompting included as an auxiliary analysis. Our results show that CLAP yields strong cross-lingual audio representations across ten Indic languages, and that lightweight projection-only…
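The abstract names two technical ingredients: a supervised contrastive objective for few-shot adaptation, and a leave-one-language-out evaluation protocol. The sketch below illustrates both under stated assumptions. It is not the authors' code, and this summary does not specify their loss variant, temperature, or projection architecture. It assumes frozen CLAP audio embeddings are already computed (the 512-dimensional size and random stand-in values are hypothetical). It models "projection-only" adaptation as a single hypothetical linear map `W`, and it uses the standard supervised contrastive (SupCon) loss of Khosla et al. (2020). Only the objective is shown; gradient updates to `W` are omitted. The language labels are illustrative.

```python
import numpy as np


def supcon_loss(z: np.ndarray, labels: np.ndarray, temperature: float = 0.1) -> float:
    """Supervised contrastive (SupCon) loss over a batch of projected embeddings.

    z: (N, D) embeddings; labels: (N,) class ids (e.g. 0 = non-abusive, 1 = abusive).
    """
    # L2-normalize so dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    self_mask = np.eye(z.shape[0], dtype=bool)
    # Exclude each anchor's similarity with itself from the softmax denominator.
    sim = np.where(self_mask, -np.inf, sim)
    # Numerically stable row-wise log-softmax over all other samples.
    row_max = np.max(sim, axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.sum(np.exp(sim - row_max), axis=1, keepdims=True))
    # Positives share the anchor's label (the anchor itself is excluded).
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0  # anchors with no positive in the batch are skipped
    if not valid.any():
        raise ValueError("batch needs at least two samples sharing a label")
    mean_log_prob_pos = np.where(pos_mask, log_prob, 0.0).sum(axis=1)[valid] / pos_counts[valid]
    return float(-mean_log_prob_pos.mean())


def leave_one_language_out(languages):
    """Yield (held_out_language, train_indices, test_indices), one split per language."""
    languages = np.asarray(languages)
    for lang in np.unique(languages):
        test_idx = np.flatnonzero(languages == lang)
        train_idx = np.flatnonzero(languages != lang)
        yield str(lang), train_idx, test_idx


# Illustrative usage with random stand-ins (not real ADIMA data).
rng = np.random.default_rng(0)
audio_emb = rng.normal(size=(8, 512))       # hypothetical frozen CLAP audio embeddings
W = rng.normal(size=(512, 128)) * 0.05      # hypothetical trainable projection
labels = np.array([0, 0, 1, 1, 0, 1, 0, 1])
langs = np.array(["Hindi", "Hindi", "Tamil", "Tamil", "Bengali", "Bengali", "Hindi", "Tamil"])
for held_out, tr, te in leave_one_language_out(langs):
    loss = supcon_loss(audio_emb[tr] @ W, labels[tr])
    print(held_out, len(tr), len(te), round(loss, 3))
```

Keeping the audio encoder frozen and training only a small projection is what makes this kind of adaptation cheap in the few-shot regime. The held-out language never contributes to the contrastive objective, so the leave-one-language-out score measures cross-lingual transfer.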
