Domain Generalization with Relaxed Instance Frequency-wise Normalization for Multi-device Acoustic Scene Classification

Byeonggeun Kim; Seunghan Yang; Jangho Kim; Hyunsin Park; Juntae Lee; Simyung Chang

arXiv:2206.12513·cs.SD·June 28, 2022

TL;DR

This paper introduces Relaxed Instance Frequency-wise Normalization (RFN), a normalization module that improves domain generalization in multi-device acoustic scene classification by normalizing frequency statistics rather than channel statistics. Adding RFN to a network improves robustness across recording devices, and the method won the DCASE2021 challenge.

Contribution

The paper proposes RFN, a new normalization module along the frequency axis, specifically designed for audio features, enhancing domain invariance and robustness in acoustic scene classification.

Findings

01. RFN outperforms previous domain generalization methods on acoustic scene classification.

02. RFN improves robustness across multiple audio devices.

03. RFN won the DCASE2021 challenge.

Abstract

In image processing with two-dimensional convolutional neural networks (2D-CNNs), domain information can be manipulated through channel statistics, and instance normalization has been a promising way to obtain domain-invariant features. Unlike in image processing, however, our analysis shows that domain-relevant information in an audio feature is dominant in frequency statistics rather than channel statistics. Motivated by this analysis, we introduce Relaxed Instance Frequency-wise Normalization (RFN): a plug-and-play, explicit normalization module along the frequency axis that can eliminate instance-specific domain discrepancy in an audio feature while relaxing undesirable loss of useful discriminative information. Empirically, simply adding RFN to networks shows clear margins compared to previous domain generalization approaches on acoustic scene classification and yields improved…
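
To make the idea of normalizing "along the frequency axis" concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not give a formula, so the tensor layout (batch, channels, frequency, time), the choice to relax by mixing with a layer-norm-style term, and the mixing weight `lam` are all assumptions made for this example, as are the function names.

```python
import numpy as np

def instance_frequency_norm(x, eps=1e-5):
    """Normalize each frequency bin of each instance separately.

    Assumed layout: x has shape (batch, channels, freq, time).
    Statistics are pooled over channels and time, so every
    (instance, frequency) pair gets its own mean and variance.
    """
    mean = x.mean(axis=(1, 3), keepdims=True)
    var = x.var(axis=(1, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    """Normalize each instance with statistics over all other axes."""
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def relaxed_frequency_norm(x, lam=0.5, eps=1e-5):
    """Assumed relaxation: blend the two normalizations.

    lam = 0 gives pure frequency-wise normalization; larger lam keeps
    more of the relative differences between frequency bins.
    """
    return lam * layer_norm(x, eps) + (1.0 - lam) * instance_frequency_norm(x, eps)

# Usage: simulate a device whose response scales each frequency bin.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8, 16))                   # (batch, channels, freq, time)
gain = np.linspace(0.5, 2.0, 8).reshape(1, 1, 8, 1)  # hypothetical per-bin device gain
y = relaxed_frequency_norm(x * gain, lam=0.5)
print(y.shape)
```

With `lam=0.0` the per-bin gain is removed almost entirely, which is the sense in which frequency-wise statistics can carry device information. A nonzero `lam` retains some of the between-bin structure, which is one way to read "relaxing undesirable loss of useful discriminative information".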

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Instance Normalization