Multi-Representation Attention Framework for Underwater Bioacoustic Denoising and Recognition
Amine Razig, Youssef Soulaymani, Loubna Benabbou, Pierre Cauchy

TL;DR
This paper presents a multi-representation attention framework for underwater bioacoustic denoising and recognition, improving marine mammal call detection accuracy and robustness across diverse environmental conditions and signal-to-noise ratios.
Contribution
The study introduces a novel multi-step, attention-guided framework with segmentation-driven attention and mid-level fusion, enhancing signal discrimination and generalization in marine mammal monitoring.
Findings
Segmentation-driven attention improves detection accuracy.
Mid-level fusion enhances robustness to environmental noise.
The framework maintains performance under distributional shifts.
Abstract
Automated monitoring of marine mammals in the St. Lawrence Estuary faces extreme challenges: calls span low-frequency moans to ultrasonic clicks, often overlap, and are embedded in variable anthropogenic and environmental noise. We introduce a multi-step, attention-guided framework that first segments spectrograms to generate soft masks of biologically relevant energy and then fuses these masks with the raw inputs for multi-band, denoised classification. Image and mask embeddings are integrated via mid-level fusion, enabling the model to focus on salient spectrogram regions while preserving global context. Using real-world recordings from the Saguenay St. Lawrence Marine Park Research Station in Canada, we demonstrate that segmentation-driven attention and mid-level fusion improve signal discrimination, reduce false positive detections, and produce reliable representations for…
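The mask-then-fuse idea in the abstract can be illustrated with a small sketch. This is not the authors' architecture: the toy linear encoders, the random weights, the tensor sizes, and the use of plain concatenation as the fusion step are all assumptions made for illustration (the names `encode` and `mid_level_fusion` are hypothetical). It only shows what fusing at the embedding level, rather than at the input or output level, looks like.

```python
import numpy as np

def encode(x, weights):
    """Toy encoder (assumption): flatten each input, apply a linear map, then ReLU."""
    flat = x.reshape(x.shape[0], -1)
    return np.maximum(flat @ weights, 0.0)

def mid_level_fusion(spec, soft_mask, w_img, w_mask, w_head):
    """Embed the spectrogram and its soft mask separately, then fuse the embeddings."""
    z_img = encode(spec, w_img)        # embedding of the raw spectrogram (global context)
    z_mask = encode(soft_mask, w_mask) # embedding of the segmentation-derived soft mask
    fused = np.concatenate([z_img, z_mask], axis=1)  # concatenation: one simple fusion choice
    return fused @ w_head              # class logits from the fused representation

# Hypothetical sizes and random data, for illustration only.
rng = np.random.default_rng(0)
batch, n_freq, n_time, dim, n_classes = 4, 16, 32, 8, 3
spec = rng.random((batch, n_freq, n_time))
soft_mask = rng.random((batch, n_freq, n_time))
w_img = rng.normal(size=(n_freq * n_time, dim))
w_mask = rng.normal(size=(n_freq * n_time, dim))
w_head = rng.normal(size=(2 * dim, n_classes))

logits = mid_level_fusion(spec, soft_mask, w_img, w_mask, w_head)
print(logits.shape)
```

In a trained system the two encoders would be learned networks and the fusion step could be replaced by another operator; the sketch fixes only the order of operations (encode each stream, fuse, classify).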
Peer Reviews
Decision·Submitted to ICLR 2026
The problem statement is interesting and an attractive setting for studying practical applications of machine learning. The study may be of interest to the bioacoustics community.
The technical contribution is very small. Some of the formulation is non-standard. For example, equations 2 to 3 seem to describe a convolution between the binary mask and a Gaussian kernel (normalized so that its sum equals 1), but this is done in a complicated way. Equation 5 is not needed, as this is a very standard error metric.
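The simpler reading suggested in this review, a binary mask convolved with a unit-sum Gaussian kernel, can be sketched as follows. The paper's equations 2 to 3 are not reproduced on this page, so this shows only the operation the reviewer describes; the function names, `sigma`, `radius`, and the toy mask are assumptions.

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius):
    """1-D Gaussian kernel normalized so that its entries sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def soften_mask(binary_mask, sigma=1.0, radius=3):
    """Convolve a binary mask with a separable 2-D Gaussian (zero-padded borders)."""
    k = gaussian_kernel_1d(sigma, radius)
    m = binary_mask.astype(float)
    # A 2-D Gaussian is separable: smooth along frequency, then along time.
    m = np.apply_along_axis(np.convolve, 0, m, k, mode="same")
    m = np.apply_along_axis(np.convolve, 1, m, k, mode="same")
    return m

# Toy binary mask (assumption): a 4x4 block of "call" energy in a 16x16 spectrogram.
mask = np.zeros((16, 16))
mask[6:10, 6:10] = 1.0
soft = soften_mask(mask)
```

Because the kernel sums to 1, the soft mask stays within [0, 1], and its total mass matches the binary mask whenever the smoothed region does not reach the border.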
- The presented framework is novel, and the reported results show significant improvements over baselines. - The paper is generally well written and easy to follow.
- The experimental design is hard to follow; it is unclear to me: - What data exactly is used for which parts of training, and how is it split? - Which architecture / pretraining checkpoint is used in the "Multimodal" model? - The presented framework is evaluated only on a new dataset, making it hard to assess the significance of the results. - The presented baselines are from the image domain only; no audio or bioacoustic baselines are included. - Formatting glitch in l.250, l.513
- Practical, domain-aware design: Using segmentation-derived soft masks is a good idea for bioacoustics. It provides a way to guide the model's attention towards biologically relevant signal components while preserving global context. - Strong in-distribution performance: The paper demonstrates clear performance gains on in-distribution data, with the cross-attention fusion strategy showing the most substantial improvements. - Ablations: The authors provide ablations that compare dif
**Presentation and related work** - **Insufficient related work:** The literature review is very limited (only a handful of cited papers), failing to properly position the work within the current landscape of bioacoustic machine learning. In particular, other marine models (e.g., Surfperch) and other bioacoustic tasks, such as avian bioacoustics, provide a large body of literature that would help position this paper. - **Poor citation practices and formatting:** The paper contains numerous errors, i
Taxonomy
Topics: Marine animal studies overview · Underwater Acoustics Research · Ichthyology and Marine Biology
