TF-SepNet: An Efficient 1D Kernel Design in CNNs for Low-Complexity Acoustic Scene Classification
Yiqiang Cai, Peihong Zhang, Shengchen Li

TL;DR
TF-SepNet introduces a novel CNN architecture for acoustic scene classification that uses separate 1D kernels along time and frequency, achieving higher efficiency and better performance than traditional 2D kernel-based models.
Contribution
The paper proposes TF-SepNet, a new CNN design with separate 1D kernels for time and frequency, enhancing efficiency and effectiveness in ASC tasks.
Findings
Outperforms state-of-the-art models with consecutive kernels
Uses fewer computational resources thanks to the 1D kernel design
Achieves a larger effective receptive field for better feature capture
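To give a rough sense of where the savings from 1D kernels could come from, the sketch below compares the parameter and multiply-accumulate counts of one full K x K convolution with those of two 1D convolutions, one along frequency and one along time. The layer sizes and the choice to give each path half of the output channels are assumptions made for illustration and are not taken from the paper; `conv2d_cost` is a hypothetical helper.

```python
def conv2d_cost(c_in, c_out, k_f, k_t, n_freq, n_time):
    """Return (parameters, multiply-accumulates) for a bias-free convolution
    with stride 1 and 'same' padding over an n_freq x n_time feature map."""
    params = c_in * c_out * k_f * k_t
    macs = params * n_freq * n_time  # every output position applies every weight once
    return params, macs

# Hypothetical layer sizes, chosen only for illustration.
C_IN, C_OUT, K, N_FREQ, N_TIME = 32, 32, 3, 64, 64

# Baseline: a single full K x K kernel.
full_params, full_macs = conv2d_cost(C_IN, C_OUT, K, K, N_FREQ, N_TIME)

# Separate paths (assumed layout): a K x 1 kernel along frequency and a
# 1 x K kernel along time, each producing half of the output channels,
# with the two outputs concatenated along the channel axis.
freq_params, freq_macs = conv2d_cost(C_IN, C_OUT // 2, K, 1, N_FREQ, N_TIME)
time_params, time_macs = conv2d_cost(C_IN, C_OUT // 2, 1, K, N_FREQ, N_TIME)
sep_params = freq_params + time_params
sep_macs = freq_macs + time_macs

print(f"full 2D kernel : {full_params} params, {full_macs} MACs")
print(f"separate 1D    : {sep_params} params, {sep_macs} MACs")
```

Under these assumptions the separate design needs K times fewer parameters and multiply-accumulates than the full kernel (a factor of 3 for K = 3). The actual saving in TF-SepNet depends on details of the architecture that this summary does not give.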
Abstract
Recent studies focus on developing efficient systems for acoustic scene classification (ASC) using convolutional neural networks (CNNs), which typically consist of consecutive kernels. This paper highlights the benefits of using separate kernels as a more powerful and efficient design approach in ASC tasks. Inspired by the time-frequency nature of audio signals, we propose TF-SepNet, a CNN architecture that separates the feature processing along the time and frequency dimensions. Features resulting from the separate paths are then merged by channels and forwarded directly to the classifier. Instead of the conventional two-dimensional (2D) kernel, TF-SepNet incorporates one-dimensional (1D) kernels to reduce computational costs. Experiments were conducted on the TAU Urban Acoustic Scenes 2022 Mobile development dataset. The results show that TF-SepNet outperforms similar…
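The data flow the abstract describes (separate processing along frequency and time, then a merge by channels) can be illustrated with a minimal pure-Python sketch. The abstract does not specify the block layout, so the details here are assumptions: the channels are split in half, each channel is filtered on its own with a shared odd-length 1D kernel, and the outputs are concatenated. The function names `conv1d_same`, `freq_path`, `time_path`, and `tf_separate` are hypothetical.

```python
def conv1d_same(seq, kernel):
    """1D cross-correlation with zero padding; the kernel length must be odd
    so that the output has the same length as the input."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(w * padded[i + j] for j, w in enumerate(kernel))
            for i in range(len(seq))]

def freq_path(fmap, kernel):
    """Filter one channel (rows = frequency bins, columns = time frames)
    along the frequency axis, one time frame at a time."""
    n_freq, n_time = len(fmap), len(fmap[0])
    cols = [conv1d_same([fmap[f][t] for f in range(n_freq)], kernel)
            for t in range(n_time)]
    return [[cols[t][f] for t in range(n_time)] for f in range(n_freq)]

def time_path(fmap, kernel):
    """Filter one channel along the time axis, one frequency bin at a time."""
    return [conv1d_same(row, kernel) for row in fmap]

def tf_separate(channels, freq_kernel, time_kernel):
    """Assumed layout: the first half of the channels goes through the
    frequency path, the second half through the time path, and the two
    outputs are merged by concatenating along the channel axis."""
    half = len(channels) // 2
    out_f = [freq_path(c, freq_kernel) for c in channels[:half]]
    out_t = [time_path(c, time_kernel) for c in channels[half:]]
    return out_f + out_t

# Toy input: two identical channels with 2 frequency bins and 3 time frames.
x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
out = tf_separate([x, x], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
print(out[0])  # frequency-path channel
print(out[1])  # time-path channel
```

Each path only ever slides a kernel along one axis, so a channel in the frequency path mixes neighboring frequency bins within a frame, while a channel in the time path mixes neighboring frames within a bin.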
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Animal Vocal Communication and Behavior
