# Automated Classification of Humpback Whale Calls Using Deep Learning: A Comparative Study of Neural Architectures and Acoustic Feature Representations

**Authors:** Jack C. Johnson, Yue Rong

PMC · DOI: 10.3390/s26020715 · Sensors (Basel, Switzerland) · 2026-01-21

## TL;DR

This study compares different deep learning models and audio features for automatically detecting humpback whale calls, finding that mel spectrograms with MobileNetV2 perform best.

## Contribution

A novel data-processing pipeline and comparison of neural architectures and acoustic features for humpback whale call classification.

## Key findings

- Mel spectrogram inputs outperformed MFCC features across all model types.
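- MobileNetV2 with mel spectrograms (no augmentation) achieved 99.01% test accuracy, with precision and recall of 99% and a Matthews correlation coefficient of 0.98.
- The custom CNN with mel spectrograms also performed strongly, with 98.92% accuracy and a false negative rate of 0.75%.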
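
The two feature representations compared above are closely related: MFCCs are conventionally obtained by applying a discrete cosine transform to the log-mel spectrogram, which keeps a compact summary of each frame's spectral envelope and discards finer detail. The following is a minimal NumPy sketch of that relationship, not the authors' pipeline. The function names and the parameter values (`n_fft`, `hop`, `n_mels`, `n_mfcc`, the 8 kHz sample rate) are illustrative assumptions; the summary does not state the settings or the library used in the paper.

```python
import numpy as np

def hz_to_mel(f):
    # Common (HTK-style) mel scale formula
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose centers are equally spaced on the mel scale."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def power_spectrogram(y, n_fft, hop):
    """Hann-windowed short-time power spectrum, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def log_mel_spectrogram(y, sr, n_fft=1024, hop=256, n_mels=64):
    """Log-scaled (dB) mel spectrogram, shape (n_frames, n_mels)."""
    mel = power_spectrogram(y, n_fft, hop) @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(mel + 1e-10)  # small offset avoids log(0)

def mfcc(log_mel, n_mfcc=13):
    """Unnormalized DCT-II along the mel axis, keeping the first n_mfcc coefficients."""
    n_mels = log_mel.shape[1]
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))  # (n_mfcc, n_mels)
    return log_mel @ basis.T

# Usage on a synthetic 1-second, 400 Hz tone (a stand-in, not whale audio)
sr = 8000
tone = np.sin(2 * np.pi * 400 * np.arange(sr) / sr)
log_mel = log_mel_spectrogram(tone, sr)
coeffs = mfcc(log_mel)
print(log_mel.shape, coeffs.shape)
```

In this sketch the mel spectrogram keeps 64 values per frame while the MFCC keeps 13, so the MFCC input carries less spectral detail. That is one plausible reason a convolutional model might favor mel spectrograms, though the summary itself does not attribute the performance gap to a specific cause.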

## Abstract

Passive acoustic monitoring (PAM) using hydrophones enables acoustic data to be collected in large and diverse quantities, necessitating a reliable automated classification system. This paper presents a data-processing pipeline and a set of neural networks designed for a humpback-whale-detection system. A collection of audio segments is compiled from publicly available audio repositories and manually curated through thorough examination, editing and clipping to produce a dataset that minimizes bias and categorization errors. An array of standard data-augmentation techniques is applied to the collected audio, diversifying and expanding the original dataset. Multiple neural networks are designed and trained using the TensorFlow 2.20.0 and Keras 3.13.1 frameworks, resulting in a custom architecture refined through research and iterative improvement. The pretrained model MobileNetV2 is also included for further analysis. Model performance demonstrates a strong dependence on both feature representation and network architecture. Mel spectrogram inputs consistently outperformed MFCC (Mel-Frequency Cepstral Coefficients) features across all model types. The highest performance was achieved by the pretrained MobileNetV2 using mel spectrograms without augmentation, reaching a test accuracy of 99.01% with balanced precision and recall of 99% and a Matthews correlation coefficient of 0.98. The custom CNN with mel spectrograms also achieved strong performance, with 98.92% accuracy and a false negative rate of only 0.75%. In contrast, models trained with MFCC representations exhibited consistently lower robustness and higher false negative rates. These results highlight the comparative strengths of the evaluated feature representations and network architectures for humpback whale detection.
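
The metrics quoted in the abstract all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions in plain Python. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not the paper's confusion matrix, which this summary does not report.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # False negative rate: share of true whale calls the model missed
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    # Matthews correlation coefficient, defined as 0 when the denominator vanishes
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "fnr": fnr,
        "mcc": mcc,
    }

# Hypothetical balanced test set: 500 whale clips, 500 non-whale clips,
# with 5 errors in each class.
metrics = binary_metrics(tp=495, fp=5, tn=495, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these illustrative counts, accuracy, precision and recall are each 0.99 and the MCC is 0.98. For a balanced test set with symmetric errors the MCC equals 2 × accuracy − 1, which is consistent with the abstract's pairing of roughly 99% accuracy with an MCC of 0.98, assuming the test set is approximately balanced.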

## Full-text entities

- **Species:** Megaptera novaeangliae (humpback whale, species) [taxon 9773]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12845957/full.md

## Figures

25 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12845957/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/PMC12845957/full.md

---
Source: https://tomesphere.com/paper/PMC12845957