Detection of manatee vocalisations using the Audio Spectrogram Transformer
Stefano Schiappacasse, Taco de Wolff, Yann Henaut, Regina Cervera, Aviva Charles, Felipe Tobar

TL;DR
This paper fine-tunes the Audio Spectrogram Transformer (AST) to detect vocalisations of the endangered Antillean manatee in underwater audio, matching state-of-the-art detection performance while reducing manual labelling effort.
Contribution
It presents an end-to-end application of AST to manatee call detection, demonstrating performance on par with the state of the art without hand-tuned denoising or detection stages, in support of conservation efforts.
Findings
The model performs on par with state-of-the-art methods without hand-tuned denoising or detection stages.
It can identify missed vocalisations, reducing the labelling workload.
The approach is effective on real-world data with partial positive labels.
Abstract
The Antillean manatee (Trichechus manatus) is an endangered herbivorous aquatic mammal whose role as an ecological balancer and umbrella species underscores the importance of its conservation. An innovative approach to monitor manatee populations is passive acoustic monitoring (PAM), where vocalisations are extracted from submarine audio. We propose a novel end-to-end approach to detect manatee vocalisations building on the Audio Spectrogram Transformer (AST). In a transfer learning spirit, we fine-tune AST to detect manatee calls by redesigning its filterbanks and adapting a real-world dataset containing partial positive labels. Our experimental evaluation reveals the two key features of the proposed model: i) it performs on par with the state of the art without requiring hand-tuned denoising or detection stages, and ii) it can successfully identify missed vocalisations in the…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Adam · Label Smoothing · Linear Layer · Byte Pair Encoding · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dense Connections
