Towards Cross-Lingual Audio Abuse Detection in Low-Resource Settings with Few-Shot Learning

Aditya Narayan Sankaran; Reza Farahbakhsh; Noel Crespi

arXiv:2412.01408 · cs.CL · December 16, 2024

TL;DR

This paper explores cross-lingual audio abuse detection in low-resource languages using pre-trained models and few-shot learning, demonstrating promising generalization and classification capabilities across multiple Indian languages.

Contribution

It introduces a novel approach combining pre-trained audio representations with meta-learning for abusive language detection in low-resource multilingual audio settings.

Findings

01 Pre-trained models like Wav2Vec and Whisper effectively generalize in low-resource abuse detection.

02 Few-shot learning with 50-200 samples achieves competitive classification performance.

03 Feature visualization provides insights into model behaviour and decision boundaries.

Abstract

Online abusive content detection, particularly in low-resource settings and within the audio modality, remains underexplored. We investigate the potential of pre-trained audio representations for detecting abusive language in low-resource languages, here Indian languages, using Few-Shot Learning (FSL). Leveraging powerful representations from models such as Wav2Vec and Whisper, we explore cross-lingual abuse detection using the ADIMA dataset with FSL. Our approach integrates these representations within the Model-Agnostic Meta-Learning (MAML) framework to classify abusive language in 10 languages. We experiment with various shot sizes (50-200), evaluating the impact of limited data on performance. Additionally, we conduct a feature visualization study to better understand model behaviour. This study highlights the generalization ability of pre-trained models in low-resource…
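To make the meta-learning setup in the abstract more concrete, here is a minimal sketch of the inner/outer loop structure, written under several assumptions that are not taken from the paper: it uses the first-order approximation of MAML (often called FOMAML) rather than full second-order MAML, a logistic-regression head instead of whatever classifier the authors train, and synthetic feature vectors standing in for Wav2Vec or Whisper embeddings. The names `make_task`, `adapt`, `fomaml`, and `k_shot` are hypothetical, and treating `k_shot` as the number of support examples per language is one possible reading of "shot size". Plain Python is used only to keep the sketch dependency-free.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    # Linear head: the last entry of w is the bias.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])

def grad(w, xs, ys):
    """Gradient of the mean binary cross-entropy with respect to w."""
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            g[i] += err * xi
        g[-1] += err
    return [gi / len(xs) for gi in g]

def adapt(w, xs, ys, inner_lr=0.5, steps=3):
    """Inner loop: a few gradient steps on one task's support set."""
    w = list(w)
    for _ in range(steps):
        w = [wi - inner_lr * gi for wi, gi in zip(w, grad(w, xs, ys))]
    return w

def make_task(rng, dim, n):
    """Hypothetical 'language': synthetic features with a task-specific shift.
    A real pipeline would use pooled embeddings from a pre-trained audio model."""
    shift = [rng.gauss(0, 0.3) for _ in range(dim)]
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)  # 1 = abusive, 0 = non-abusive (toy labels)
        xs.append([rng.gauss((1.0 if y else -1.0) + s, 1.0) for s in shift])
        ys.append(y)
    return xs, ys

def fomaml(tasks, dim, k_shot, meta_lr=0.1, epochs=20, seed=0):
    """Outer loop: move the shared initialisation using query-set gradients
    evaluated at the adapted weights (first-order approximation)."""
    rng = random.Random(seed)
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        meta_g = [0.0] * len(w)
        for xs, ys in tasks:
            idx = list(range(len(xs)))
            rng.shuffle(idx)
            sup, qry = idx[:k_shot], idx[k_shot:]
            w_task = adapt(w, [xs[i] for i in sup], [ys[i] for i in sup])
            g = grad(w_task, [xs[i] for i in qry], [ys[i] for i in qry])
            meta_g = [m + gi / len(tasks) for m, gi in zip(meta_g, g)]
        w = [wi - meta_lr * mi for wi, mi in zip(w, meta_g)]
    return w

def accuracy(w, xs, ys):
    hits = sum((predict(w, x) >= 0.5) == bool(y) for x, y in zip(xs, ys))
    return hits / len(xs)
```

A possible usage pattern is to meta-train on several source tasks, then call `adapt` on the support examples of an unseen task and measure `accuracy` on its remaining examples. Full MAML would additionally differentiate through the inner-loop updates, which this first-order sketch deliberately skips.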

Code & Models

Repositories

callmesanfornow/fsl-audio-abuse
PyTorch · Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection