TC-SKNet with GridMask for Low-complexity Classification of Acoustic scene

Luyuan Xie; Yan Zhong; Lin Yang; Zhaoyu Yan; Zhonghai Wu; Junjie Wang

arXiv:2210.02287·cs.SD·October 6, 2022

TL;DR

This paper introduces TC-SKNet with GridMask, a low-complexity CNN for acoustic scene classification that adjusts its receptive field to target sounds of variable length and uses AutoML to search the model structure, reaching 59.87% accuracy with only 20.9K parameters.

Contribution

The paper combines a Selective Kernel Network with temporal convolution (TC-SKNet) and GridMask data augmentation, and tunes both with AutoML, for efficient acoustic scene classification.
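To make the selective-kernel idea concrete, here is a minimal single-channel sketch in plain Python: two temporal convolution branches with different kernel lengths are summed, pooled over time, and then re-weighted by a softmax across branches. This is an illustration under stated assumptions, not the paper's TC-SKNet. The function names, the averaging kernels, and the scalar `gate_weights` (which stand in for the learned fully connected layers of a real Selective Kernel unit) are all hypothetical.

```python
import math


def conv1d_same(x, kernel):
    """Slide `kernel` along time with zero padding, keeping the input length.

    Like most deep-learning "convolutions", this is a cross-correlation.
    """
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            i = t + k - half
            if 0 <= i < len(x):
                acc += w * x[i]
        out.append(acc)
    return out


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def selective_kernel_1d(x, kernels, gate_weights):
    """Fuse several temporal branches with softmax attention across branches.

    `gate_weights` is a hypothetical stand-in for the learned gating layers:
    one scalar per branch, multiplied by the pooled summary statistic.
    """
    # Split: one temporal convolution per kernel size.
    branches = [conv1d_same(x, k) for k in kernels]
    # Fuse: element-wise sum, then global average pooling over time.
    fused = [sum(vals) for vals in zip(*branches)]
    s = sum(fused) / len(fused)
    # Select: softmax across branches, then a weighted sum of branch outputs.
    attn = softmax([w * s for w in gate_weights])
    out = [sum(a * b[t] for a, b in zip(attn, branches)) for t in range(len(x))]
    return out, attn


# Toy input: a short non-negative "energy" sequence over 8 time frames.
x = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0]
kernels = [[1 / 3] * 3, [1 / 5] * 5]  # short and long temporal receptive fields
out, attn = selective_kernel_1d(x, kernels, gate_weights=[1.0, -1.0])
```

With these toy gate weights the short kernel receives the larger attention weight. In a trained network the gating would be learned, so the preferred receptive field could differ from input to input.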

Findings

01 Achieved 59.87% accuracy with only 20.9K parameters.

02 GridMask yields a larger performance gain than spectrum augmentation.

03 AutoML effectively optimizes the model structure and hyperparameters.

Abstract

Convolutional neural networks (CNNs) perform well in low-complexity classification tasks such as acoustic scene classification (ASC). However, few studies have examined the relationship between the length of the target speech and the size of the convolution kernels. In this paper, we combine a Selective Kernel Network with Temporal Convolution (TC-SKNet) to adjust the receptive field of the convolution kernels, addressing the variable length of the target voice while keeping complexity low. GridMask is a data augmentation strategy that masks part of the raw data or feature area. It improves the generalization of the model, playing a role similar to dropout. In our experiments, the performance gain brought by GridMask is larger than that of spectrum augmentation in ASC. Finally, we adopt AutoML to search for the best structure of TC-SKNet and the best hyperparameters of GridMask for improving the classification…
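As a rough illustration of the masking idea described in the abstract, the sketch below builds a regular grid of masked squares over a time-frequency feature map and zeroes the covered bins. It is written in plain Python to stay self-contained. The function names `grid_mask` and `apply_mask` and the parameters `unit`, `mask_len`, `offset_f`, and `offset_t` are hypothetical. The paper's actual GridMask hyperparameters are searched with AutoML and are not stated here.

```python
def grid_mask(n_freq, n_time, unit, mask_len, offset_f=0, offset_t=0):
    """Return a 0/1 mask as a list of rows (frequency x time).

    The map is tiled with `unit` x `unit` cells; inside each cell a
    `mask_len` x `mask_len` square is set to 0 (masked), the rest to 1 (kept).
    All parameter names are illustrative assumptions.
    """
    if not 0 < mask_len <= unit:
        raise ValueError("mask_len must be in (0, unit]")
    mask = []
    for f in range(n_freq):
        f_in = (f - offset_f) % unit < mask_len
        row = []
        for t in range(n_time):
            t_in = (t - offset_t) % unit < mask_len
            row.append(0 if (f_in and t_in) else 1)
        mask.append(row)
    return mask


def apply_mask(spec, mask):
    """Multiply a feature map element-wise by a mask of the same shape."""
    return [[x * m for x, m in zip(srow, mrow)] for srow, mrow in zip(spec, mask)]


# Toy 8 x 8 "spectrogram" of ones, so masked bins are easy to spot.
spec = [[1.0] * 8 for _ in range(8)]
mask = grid_mask(8, 8, unit=4, mask_len=2)
masked = apply_mask(spec, mask)

# Fraction of bins that survive the mask.
kept = sum(sum(row) for row in mask) / (8 * 8)
```

In this toy setting each 4 x 4 cell loses a 2 x 2 square, so three quarters of the bins are kept. In training, the offsets would typically be drawn at random per example so that the masked regions move between epochs.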

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Softmax · Dilated Convolution · Selective Kernel Convolution · Convolution · Batch Normalization · 1x1 Convolution · Selective Kernel · GridMask