Detection of manatee vocalisations using the Audio Spectrogram Transformer

Stefano Schiappacasse; Taco de Wolff; Yann Henaut; Regina Cervera; Aviva Charles; Felipe Tobar

arXiv:2407.18083 · eess.AS · July 26, 2024 · MLSP

TL;DR

This paper introduces a deep learning model based on the Audio Spectrogram Transformer (AST) for detecting the vocalisations of endangered manatees in underwater audio, performing on par with the state of the art while reducing manual labelling effort.

Contribution

It presents the first application of AST for manatee call detection, demonstrating competitive performance without manual denoising and aiding conservation efforts.

Findings

01. The model performs on par with state-of-the-art methods.

02. It can identify missed vocalisations, reducing labelling workload.

03. The approach is effective with real-world, partially labelled data.
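
Finding 02 implies a review workflow in which a trained detector points annotators at windows that were never labelled. This page does not describe how the paper does this, so the following is only a minimal sketch under stated assumptions: the function name, the per-window score list and the 0.9 threshold are all hypothetical.

```python
def candidate_missed_calls(scores, labelled_positive, threshold=0.9):
    """Return indices of windows scored highly by a detector but lacking a positive label.

    scores            -- detector probabilities, one per audio window (hypothetical input)
    labelled_positive -- True where an annotator marked a vocalisation
    threshold         -- illustrative cut-off, not a value from the paper
    """
    if len(scores) != len(labelled_positive):
        raise ValueError("scores and labels must have the same length")
    return [
        i
        for i, (score, is_labelled) in enumerate(zip(scores, labelled_positive))
        if score >= threshold and not is_labelled
    ]


# Toy data: only window 1 was labelled, but windows 2 and 4 also score highly.
scores = [0.02, 0.97, 0.95, 0.10, 0.91]
labelled_positive = [False, True, False, False, False]
print(candidate_missed_calls(scores, labelled_positive))  # [2, 4]
```

With partial positive labels, an unlabelled window is not necessarily a negative, so windows returned here would be candidates for human review rather than confirmed calls.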

Abstract

The Antillean manatee (Trichechus manatus) is an endangered herbivorous aquatic mammal whose role as an ecological balancer and umbrella species underscores the importance of its conservation. An innovative approach to monitor manatee populations is passive acoustic monitoring (PAM), where vocalisations are extracted from submarine audio. We propose a novel end-to-end approach to detect manatee vocalisations building on the Audio Spectrogram Transformer (AST). In a transfer learning spirit, we fine-tune AST to detect manatee calls by redesigning its filterbanks and adapting a real-world dataset containing partial positive labels. Our experimental evaluation reveals the two key features of the proposed model: i) it performs on par with the state of the art without requiring hand-tuned denoising or detection stages, and ii) it can successfully identify missed vocalisations in the…
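
The abstract mentions redesigning AST's filterbanks, but this page gives no details of the design. As a rough illustration only, the sketch below builds a triangular mel filterbank restricted to a chosen frequency band, in plain Python. The sample rate, FFT size, band limits and filter count are hypothetical values picked for the example, not the paper's settings, and the function names are ours.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate, f_min, f_max):
    """Return n_filters triangular filters over the n_fft // 2 + 1 FFT bins."""
    n_bins = n_fft // 2 + 1
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_filters + 2 edge frequencies, equally spaced on the mel scale.
    edges_hz = [
        mel_to_hz(mel_lo + (mel_hi - mel_lo) * i / (n_filters + 1))
        for i in range(n_filters + 2)
    ]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for i in range(n_filters):
        left, centre, right = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        row = []
        for f in bin_hz:
            if left < f <= centre:
                row.append((f - left) / (centre - left))    # rising slope
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling slope
            else:
                row.append(0.0)                             # outside this filter
        bank.append(row)
    return bank


# Hypothetical settings, chosen only to make the example concrete.
bank = mel_filterbank(
    n_filters=128, n_fft=2048, sample_rate=96_000, f_min=1_000.0, f_max=20_000.0
)
print(len(bank), len(bank[0]))  # 128 1025
```

Multiplying each row by a frame's power spectrum and taking the log of the sum gives one log-mel feature per filter. Narrowing `f_min` and `f_max` concentrates the available filters on the band of interest, at the cost of discarding energy outside it.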

Code & Models

Repositories

tdewolff/manatees (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Adam · Label Smoothing · Linear Layer · Byte Pair Encoding · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dense Connections