Distilling Spectrograms into Tokens: Fast and Lightweight Bioacoustic Classification for BirdCLEF+ 2025

Anthony Miyaguchi, Murilo Gustineli, and Adrian Cheung

arXiv:2507.08236 · cs.SD · July 14, 2025

TL;DR

This paper presents a fast, lightweight bioacoustic classification pipeline for BirdCLEF+ 2025. It combines pre-trained models optimized for CPU inference with a novel spectrogram tokenization method so that inference fits within the competition's 90-minute CPU-only limit.

Contribution

The paper introduces Spectrogram Token Skip-Gram (STSG), a new sequence modeling approach using spectrogram tokens and static embeddings for efficient bioacoustic classification.
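
The excerpt on this page does not describe how STSG builds its tokens or trains its embeddings, so the following is only a generic illustration of the skip-gram idea the name refers to: each token in a sequence is paired with its neighbors inside a small window, and those pairs are what an embedding model would be trained on. The function name `skipgram_pairs`, the `window` parameter, and the example token IDs are hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[int], window: int = 2) -> List[Tuple[int, int]]:
    """Return (center, context) pairs for each token and its neighbors within `window`."""
    pairs: List[Tuple[int, int]] = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is never its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical token IDs (assumption: one integer ID per quantized spectrogram frame).
example_tokens = [7, 7, 3, 9]
example_pairs = skipgram_pairs(example_tokens, window=1)
print(example_pairs)
```

Once embeddings have been learned from such pairs, they are static in the sense that each token ID maps to one fixed vector, so classification can reduce to table lookups followed by a small model. How the authors actually tokenize spectrograms and classify sequences is not stated in this excerpt.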

Findings

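1. TFLite optimization gave a nearly 10x inference speedup on the Perch model, bringing its run time to approximately 16 minutes.

2. The STSG method offered a viable fast classification route, with ROC-AUC scores above 0.5, the level expected from random ranking.

3. Optimized pre-trained models achieved competitive scores within the 90-minute CPU inference limit; the best, BirdSetEfficientNetB1, scored 0.810 on the public leaderboard and 0.778 on the private one.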
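
To make the ROC-AUC figures in these findings easier to read, here is a minimal sketch of the metric for a single binary label, computed as the fraction of (positive, negative) pairs that the scores rank correctly, with ties counted as half. The function name `roc_auc` and the toy data are illustrative assumptions; the excerpt does not say how the competition aggregates per-species scores.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]
perfect = roc_auc(labels, [0.9, 0.1, 0.8, 0.2])   # every positive outranks every negative
constant = roc_auc(labels, [0.5, 0.5, 0.5, 0.5])  # scores carry no information
print(perfect, constant)
```

A classifier whose scores carry no information lands at 0.5 under this definition, which is why a score above 0.5 is the minimum bar for a method to be doing useful ranking at all.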

Abstract

The BirdCLEF+ 2025 challenge requires classifying 206 species, including birds, mammals, insects, and amphibians, from soundscape recordings under a strict 90-minute CPU-only inference deadline, making many state-of-the-art deep learning approaches impractical. To address this constraint, the DS@GT BirdCLEF team explored two strategies. First, we establish competitive baselines by optimizing pre-trained models from the Bioacoustics Model Zoo for CPU inference. Using TFLite, we achieved a nearly 10x inference speedup for the Perch model, enabling it to run in approximately 16 minutes and achieve a final ROC-AUC score of 0.729 on the public leaderboard post-competition and 0.711 on the private leaderboard. The best model from the zoo was BirdSetEfficientNetB1, with a public score of 0.810 and a private score of 0.778. Second, we introduce a novel, lightweight pipeline named Spectrogram…
