ASCMamba: Multimodal Time-Frequency Mamba for Acoustic Scene Classification

Bochao Sun; Dong Wang; ZhanLong Yang; Jun Yang; Han Yin

arXiv:2508.15632·cs.SD·August 26, 2025

TL;DR

This paper introduces ASCMamba, a multimodal neural network that combines audio with textual metadata (recording location and time) for acoustic scene classification, and that outperforms all participating teams in the APSIPA ASC 2025 Grand Challenge.

Contribution

We propose a novel multimodal network architecture, ASCMamba, integrating hierarchical spectral features and long-range dependencies for enhanced acoustic scene understanding.

Findings

01 Outperforms all participating teams in the challenge

02 Achieves 6.2% improvement over baseline

03 Demonstrates effectiveness of multimodal integration

Abstract

Acoustic Scene Classification (ASC) is a fundamental problem in computational audition that seeks to classify environments based on their distinctive acoustic features. The APSIPA ASC 2025 Grand Challenge introduces a multimodal ASC task. Unlike traditional ASC systems, which rely solely on audio inputs, this challenge provides additional textual information as input, including the location where the audio was recorded and the time of recording. In this paper, we present our proposed system for the ASC task in the APSIPA ASC 2025 Grand Challenge. Specifically, we propose a multimodal network, ASCMamba, which integrates audio and textual information for fine-grained acoustic scene understanding and effective multimodal ASC. The proposed ASCMamba employs a DenseEncoder to extract hierarchical spectral features from spectrograms, followed by a dual-path…
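To make the multimodal setup concrete, here is a minimal sketch of how an audio representation could be fused with the location and time metadata before classification. It is not the ASCMamba architecture: the abstract is truncated, so the mean-pooling audio encoder, the embedding tables, the concatenation-based fusion, and all sizes and names below are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not from the paper).
N_LOCATIONS, N_HOURS, N_SCENES = 5, 24, 10
AUDIO_DIM, META_DIM = 16, 4

# Randomly initialised parameters standing in for trained weights.
location_table = rng.normal(size=(N_LOCATIONS, META_DIM))
hour_table = rng.normal(size=(N_HOURS, META_DIM))
W = rng.normal(size=(AUDIO_DIM + 2 * META_DIM, N_SCENES))
b = np.zeros(N_SCENES)


def audio_embedding(spectrogram):
    # Placeholder for a learned audio encoder: average over the time axis.
    # Input shape (AUDIO_DIM, n_frames) -> output shape (AUDIO_DIM,).
    return spectrogram.mean(axis=1)


def softmax(z):
    # Numerically stable softmax over a 1-D vector of logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def classify(spectrogram, location_id, hour):
    # Assumed fusion strategy: concatenate the audio embedding with the
    # looked-up location and hour embeddings, then apply a linear layer.
    fused = np.concatenate([
        audio_embedding(spectrogram),
        location_table[location_id],
        hour_table[hour],
    ])
    return softmax(fused @ W + b)


# Toy input: a random "spectrogram" with 100 frames plus two metadata fields.
spectrogram = rng.normal(size=(AUDIO_DIM, 100))
probs = classify(spectrogram, location_id=2, hour=14)
```

Here `probs` is a probability vector over the hypothetical scene classes. The point of the sketch is only that metadata enters the classifier alongside the audio features; how ASCMamba actually combines them is described in the full paper.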
