The Reasonable Effectiveness of Speaker Embeddings for Violence Detection
Sarthak Jain, Orchid Chetia Phukan, Arun Balaji Buduru, Rajesh Sharma

TL;DR
This paper demonstrates that embeddings from smaller speaker recognition models can effectively detect violence in audio, outperforming larger self-supervised models and enabling more practical deployment in resource-constrained environments.
Contribution
The study introduces the use of speaker recognition embeddings for violence detection, showing they outperform state-of-the-art SSL models in accuracy and efficiency.
Findings
Speaker recognition embeddings outperform SSL models in violence detection accuracy.
SVM and Random Forest classifiers effectively utilize speaker embeddings.
Proposed approach enables practical deployment in resource-limited settings.
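
The classification stage named in the findings above (SVM and Random Forest on fixed speaker embeddings) can be sketched as follows. This is only a minimal illustration, not the authors' pipeline: it assumes scikit-learn and NumPy, and it uses synthetic Gaussian vectors as a placeholder for real speaker embeddings. The helper names (make_placeholder_embeddings, evaluate_classifiers), the embedding dimension, the 80/20 split, and all hyperparameters are hypothetical choices that the summary does not specify.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_placeholder_embeddings(n_per_class=200, dim=64, seed=0):
    """Return synthetic vectors standing in for speaker embeddings.

    ASSUMPTION: real embeddings would come from a pre-trained speaker
    recognition model; these Gaussian samples are NOT real data.
    """
    rng = np.random.default_rng(seed)
    non_violent = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, dim))
    violent = rng.normal(loc=0.5, scale=1.0, size=(n_per_class, dim))
    X = np.vstack([non_violent, violent])
    y = np.concatenate([
        np.zeros(n_per_class, dtype=int),  # label 0: non-violent
        np.ones(n_per_class, dtype=int),   # label 1: violent
    ])
    return X, y


def evaluate_classifiers(X, y, seed=0):
    """Fit an SVM and a Random Forest on the embeddings; return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        # Standardising features before an RBF SVM is a common default.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


if __name__ == "__main__":
    X, y = make_placeholder_embeddings()
    for name, accuracy in evaluate_classifiers(X, y).items():
        print(f"{name}: accuracy={accuracy:.3f}")
```

Because the embedding extractor stays frozen and only a lightweight classifier is trained, the cost of this stage is small next to running a large SSL model, which is the deployment argument the summary makes. Accuracies printed by this sketch reflect the synthetic data only and say nothing about the paper's results.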
Abstract
In this paper, we focus on audio violence detection (AVD). AVD is necessary for maintaining safety, preventing harm, and ensuring security in various environments, which calls for accurate AVD systems. As in many related audio processing applications, the most common approach to improving performance is to leverage self-supervised (SSL) pre-trained models (PTMs). However, these SSL models are very large, with millions of parameters, which can hinder real-world deployment, especially in compute-constrained environments. To resolve this, we propose the use of speaker recognition models, which are much smaller than the SSL models. Through experiments with speaker recognition model embeddings and SVM and Random Forest classifiers, we show that speaker recognition model embeddings perform the best in comparison to…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Dispute Resolution and Class Actions · Law, Rights, and Freedoms
Methods: Support Vector Machine · Focus
