Dynamic Multi-scale Convolution for Dialect Identification
Tianlong Kong, Shouyi Yin, Dawei Zhang, Wang Geng, Xin Wang, Dandan Song, Jinwen Huang, Huiyu Shi, Xiaorui Wang

TL;DR
This paper introduces a dynamic multi-scale convolution architecture for dialect identification that adaptively captures features at various scales, significantly improving accuracy while reducing model size.
Contribution
It proposes dynamic multi-scale convolution, a novel method that combines dynamic kernel convolution, local multi-scale learning, and global multi-scale pooling, and that outperforms existing dialect identification systems.
Findings
Achieves 9% lower Cavg than previous best
Reduces model parameters by 91%
Improves EER by 45% over state-of-the-art
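EER (equal error rate) is the operating point at which the false-acceptance rate and the false-rejection rate are equal. The pure-Python sketch below illustrates the metric, assuming one score per trial with label 1 for a target trial and 0 for a non-target trial. It tries every observed score as a threshold and reports the midpoint of FAR and FRR where they come closest; evaluation toolkits often interpolate the DET curve instead, so values can differ slightly. The function name `equal_error_rate` is hypothetical and this is not the paper's evaluation script.

```python
def equal_error_rate(scores, labels):
    """Approximate EER: try each observed score as a threshold (accept if score >= t)
    and return the mean of FAR and FRR where the two are closest."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_target = sum(1 for y in labels if y == 1)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")
    best_gap, eer = float("inf"), None
    for t in sorted(set(scores)):
        # False accept: non-target trial scored at or above the threshold.
        far = sum(1 for s, y in zip(scores, labels) if y != 1 and s >= t) / n_nontarget
        # False reject: target trial scored below the threshold.
        frr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_target
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


print(equal_error_rate([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 0.0 (perfectly separated)
print(equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.5 (overlapping scores)
```

A lower EER means target and non-target scores overlap less, which is why a relative EER reduction is a common headline number for identification systems.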
Abstract
Time Delay Neural Network (TDNN)-based methods are widely used in dialect identification. However, previous TDNN-based work has neglected subtle variations across different feature scales. To address this issue, we propose a new architecture, named dynamic multi-scale convolution, which consists of dynamic kernel convolution, local multi-scale learning, and global multi-scale pooling. Dynamic kernel convolution adaptively captures features between short-term and long-term context. Local multi-scale learning, which represents multi-scale features at a granular level, increases the range of receptive fields of the convolution operation. In addition, global multi-scale pooling aggregates features from different bottleneck layers in order to collect information from multiple aspects. The proposed architecture significantly outperforms the state-of-the-art system on…
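The abstract names the three components but not their exact equations. The NumPy sketch below shows one plausible reading of each, under loudly stated assumptions: dynamic kernel convolution as selective-kernel-style attention over a short-context (dilation 1) and a long-context (dilation 3) branch, local multi-scale learning as a Res2Net-style hierarchical channel split, and global multi-scale pooling as mean and standard-deviation statistics pooling over concatenated block outputs. All function names, shapes, and dilation values are hypothetical illustrations, not the authors' implementation. Weights are random, so only the output shapes are meaningful.

```python
import numpy as np


def conv1d_same(x, w, dilation=1):
    """Dilated 1-D convolution over time with zero 'same' padding.

    x: (C_in, T) frames; w: (C_out, C_in, K) with K odd. A TDNN layer is
    equivalent to this kind of dilated temporal convolution.
    """
    c_out, _, k = w.shape
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    t = x.shape[1]
    out = np.zeros((c_out, t))
    for j in range(k):
        # Tap j reads frames offset by j * dilation in the padded input.
        out += w[:, :, j] @ xp[:, j * dilation : j * dilation + t]
    return out


def dynamic_kernel_conv(x, branch_ws, dilations, w_squeeze, w_select):
    """Assumed selective-kernel form: branches with different dilations give
    short- and long-term context; input-dependent per-channel softmax weights
    decide how much each branch contributes."""
    branches = [np.maximum(conv1d_same(x, w, d), 0.0) for w, d in zip(branch_ws, dilations)]
    summary = sum(branches).mean(axis=1)              # (C,) time-averaged context
    z = np.maximum(w_squeeze @ summary, 0.0)          # (R,) compressed descriptor
    logits = np.stack([w @ z for w in w_select])      # (n_branches, C)
    logits -= logits.max(axis=0, keepdims=True)       # stable softmax over branches
    weights = np.exp(logits)
    attn = weights / weights.sum(axis=0, keepdims=True)
    return sum(a[:, None] * b for a, b in zip(attn, branches))


def local_multiscale(x, group_ws):
    """Assumed Res2Net-style split: each channel group is convolved together with
    the previous group's output, so later groups see a wider receptive field."""
    groups = np.split(x, len(group_ws) + 1, axis=0)
    outputs, prev = [groups[0]], None                 # first group passes through
    for g, w in zip(groups[1:], group_ws):
        inp = g if prev is None else g + prev
        prev = np.maximum(conv1d_same(inp, w), 0.0)
        outputs.append(prev)
    return np.concatenate(outputs, axis=0)


def global_multiscale_pooling(feature_maps):
    """Assumed form: concatenate outputs of several blocks along channels, then
    mean + standard-deviation pooling over time gives one utterance vector."""
    h = np.concatenate(feature_maps, axis=0)          # (sum of C_i, T)
    return np.concatenate([h.mean(axis=1), h.std(axis=1)])


# Toy forward pass with random weights: only the shapes are meaningful.
rng = np.random.default_rng(0)
C, T = 8, 50
x = rng.standard_normal((C, T))
h1 = dynamic_kernel_conv(
    x,
    branch_ws=[0.1 * rng.standard_normal((C, C, 3)) for _ in range(2)],
    dilations=[1, 3],                                 # short vs. long context
    w_squeeze=0.1 * rng.standard_normal((4, C)),
    w_select=[0.1 * rng.standard_normal((C, 4)) for _ in range(2)],
)
h2 = local_multiscale(h1, [0.1 * rng.standard_normal((2, 2, 3)) for _ in range(3)])
embedding = global_multiscale_pooling([h1, h2])
print(h1.shape, h2.shape, embedding.shape)            # (8, 50) (8, 50) (32,)
```

Because the branch weights are a softmax over branches, each channel's output is a convex mix of short- and long-context features, which is one way to read "adaptively" in the abstract. Pooling the concatenation of several blocks, rather than only the last one, is what lets the utterance vector carry information from multiple depths.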
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing
Methods: Convolution
