Domain Generalization with Relaxed Instance Frequency-wise Normalization for Multi-device Acoustic Scene Classification
Byeonggeun Kim, Seunghan Yang, Jangho Kim, Hyunsin Park, Juntae Lee, Simyung Chang

TL;DR
This paper introduces Relaxed Instance Frequency-wise Normalization (RFN), a normalization technique that improves domain generalization in multi-device acoustic scene classification by normalizing frequency-wise statistics rather than channel statistics, improving robustness across recording devices and winning the DCASE2021 challenge.
Contribution
The paper proposes RFN, a new normalization module along the frequency axis, specifically designed for audio features, enhancing domain invariance and robustness in acoustic scene classification.
Findings
RFN outperforms previous domain generalization methods on acoustic scene classification.
RFN improves robustness across multiple audio devices.
RFN achieved victory in the DCASE2021 challenge.
Abstract
While using two-dimensional convolutional neural networks (2D-CNNs) in image processing, it is possible to manipulate domain information using channel statistics, and instance normalization has been a promising way to get domain-invariant features. However, unlike image processing, we analyze that domain-relevant information in an audio feature is dominant in frequency statistics rather than channel statistics. Motivated by our analysis, we introduce Relaxed Instance Frequency-wise Normalization (RFN): a plug-and-play, explicit normalization module along the frequency axis which can eliminate instance-specific domain discrepancy in an audio feature while relaxing undesirable loss of useful discriminative information. Empirically, simply adding RFN to networks shows clear margins compared to previous domain generalization approaches on acoustic scene classification and yields improved…
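The idea in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the feature tensor is taken to have shape (batch, channel, frequency, time), and the function names, the blending weight `lam`, and the use of a layer-normalization-style branch as the "relaxation" are hypothetical choices made here, since the excerpt above does not give the exact formula.

```python
import numpy as np

def instance_freq_norm(x, eps=1e-5):
    """Normalize each (instance, frequency bin) over channel and time.

    x has shape (N, C, F, T). Statistics are taken over axes C and T,
    so every frequency bin of every instance ends up with roughly
    zero mean and unit variance.
    """
    mean = x.mean(axis=(1, 3), keepdims=True)
    var = x.var(axis=(1, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    """Normalize each instance over all of C, F, and T (no learned affine)."""
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def relaxed_freq_norm(x, lam=0.5, eps=1e-5):
    """Hypothetical relaxation: blend the two normalizations.

    lam = 0 gives pure frequency-wise normalization; larger lam keeps
    more of the per-frequency structure of the input. The blend itself
    is an assumption made for this sketch.
    """
    return lam * layer_norm(x, eps) + (1.0 - lam) * instance_freq_norm(x, eps)

# Toy check: a device-like distortion modeled as a per-frequency gain and offset.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8, 16))               # (N, C, F, T)
gain = rng.uniform(0.5, 2.0, size=(1, 1, 8, 1))  # per-frequency gain
offset = rng.normal(size=(1, 1, 8, 1))           # per-frequency offset
x_dev = x * gain + offset

y = instance_freq_norm(x)
y_dev = instance_freq_norm(x_dev)
max_gap = float(np.abs(y - y_dev).max())
print("max difference after frequency-wise normalization:", max_gap)
```

In this toy setup the per-frequency gain and offset are removed almost entirely by the frequency-wise normalization, which is the sense in which such a module can reduce device-specific discrepancy; the `lam` blend trades some of that invariance for retained frequency structure.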
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Instance Normalization
