CoarseSoundNet: Building a reliable model for ecological soundscape analysis

Alexander Gebhard; Andreas Triantafyllopoulos; Dominik Arend; Sandra Müller; Svenja Schmidt; Michael Scherer-Lorenzen; Björn W. Schuller

arXiv:2605.21143·cs.SD·May 22, 2026

TL;DR

This paper introduces CoarseSoundNet, a deep learning model designed to reliably classify the three ecological soundscape components (biophony, geophony, and anthropophony) in noisy passive acoustic monitoring data, so that each component can be analysed separately.

Contribution

The study presents a reproducible ML framework and the CoarseSoundNet model for coarse soundscape classification, with systematic analysis of architecture, data, and evaluation strategies under realistic conditions.

Findings

01

Model performance improves with more PAM training data, especially data that is similar to the target domain.

02

Introducing an explicit silence class during training enhances classification accuracy.

03

Pre-filtering with CoarseSoundNet yields ecological indices comparable to ground-truth filtering.
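
The pre-filtering idea in finding 03 can be illustrated with a minimal sketch. It assumes each audio segment has already received exactly one predicted label, and that the label set is the three soundscape components plus the explicit silence class from finding 02. The function names (`prefilter`, `rms`, `index_over`) are hypothetical, and RMS amplitude is only a stand-in for a real ecological index; none of this is taken from the paper's code.

```python
from statistics import mean

# Assumed label set: three soundscape components plus an explicit silence class.
LABELS = ("biophony", "geophony", "anthropophony", "silence")


def prefilter(segments, predicted_labels, keep=("biophony",)):
    """Return only the segments whose predicted label is in `keep`."""
    if len(segments) != len(predicted_labels):
        raise ValueError("segments and predicted_labels must have equal length")
    return [seg for seg, lab in zip(segments, predicted_labels) if lab in keep]


def rms(segment):
    """Root-mean-square amplitude; a placeholder for a real acoustic index."""
    return (sum(x * x for x in segment) / len(segment)) ** 0.5


def index_over(segments, index_fn=rms):
    """Average an index over segments; None if no segment survived filtering."""
    return mean(index_fn(s) for s in segments) if segments else None


# Toy data: three short segments with hypothetical model predictions.
segments = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
labels = ["silence", "biophony", "geophony"]

kept = prefilter(segments, labels)
biophony_index = index_over(kept)
```

Running the same index once on segments selected by predicted labels and once on segments selected by ground-truth labels, then comparing the two values, is one way to check whether model-based filtering is comparable to ground-truth filtering.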

Abstract

A soundscape is composed of three types of sound: biophony (sounds made by animals), geophony (natural abiotic sounds) and anthropophony (sounds made by humans). A key research question in the field of soundscape ecology is how these components interact with each other, specifically how biophony responds to geophony and anthropophony. Nevertheless, as of today, there are not many analytical instruments that enable the distinct quantification of these elements. Recent machine learning (ML) approaches aim to support automated analysis but often rely on task-specific or clean data, limiting generalisation to noisy passive acoustic monitoring (PAM) recordings. This study presents a clear and reproducible structure to build ML models for coarse soundscape classification and introduces CoarseSoundNet, a deep learning model trained to distinguish biophony, geophony, and anthropophony under…
