QTI Submission to DCASE 2021: residual normalization for device-imbalanced acoustic scene classification with efficient design
Byeonggeun Kim, Seunghan Yang, Jangho Kim, Simyung Chang

TL;DR
This paper presents an acoustic scene classification system for device-imbalanced data. It combines residual normalization, an efficient architecture, data augmentation, and model compression to reach high accuracy at low model complexity.
Contribution
It introduces Residual Normalization and an efficient BC-ResNet-Mod architecture, combined with data augmentation and model compression, to improve acoustic scene classification on device-imbalanced data under model complexity constraints.
Findings
Achieved 76.3% accuracy on the TAU Urban Acoustic Scenes dataset.
Reduced the model size to 61.0KB while maintaining accuracy.
Demonstrated the effectiveness of residual normalization and data augmentation.
Abstract
This technical report describes the details of our TASK1A submission of the DCASE2021 challenge. The goal of the task is to design an audio scene classification system for device-imbalanced datasets under the constraints of model complexity. This report introduces four methods to achieve the goal. First, we propose Residual Normalization, a novel feature normalization method that uses instance normalization with a shortcut path to discard unnecessary device-specific information without losing useful information for classification. Second, we design an efficient architecture, BC-ResNet-Mod, a modified version of the baseline architecture with a limited receptive field. Third, we exploit spectrogram-to-spectrogram translation from one to multiple devices to augment training data. Finally, we utilize three model compression schemes: pruning, quantization, and knowledge distillation to…
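The Residual Normalization idea in the abstract (instance normalization plus a shortcut path) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything specific here is an assumption made for illustration: the input is a single-channel spectrogram stored as a list of frequency rows, statistics are computed per frequency row over time, and the shortcut is a scaled copy of the input controlled by a hypothetical weight `lam`. The function names are also hypothetical.

```python
from statistics import fmean, pstdev


def freq_instance_norm(spec, eps=1e-5):
    """Normalize each frequency row of a (freq x time) spectrogram over time.

    Assumption: statistics are taken per frequency row; the abstract only
    says "instance normalization".
    """
    out = []
    for row in spec:
        mu = fmean(row)
        sigma = pstdev(row, mu)  # population standard deviation of the row
        scale = (sigma ** 2 + eps) ** 0.5  # eps avoids division by zero
        out.append([(v - mu) / scale for v in row])
    return out


def residual_norm(spec, lam=0.1, eps=1e-5):
    """Add a scaled shortcut of the input to its normalized version.

    `lam` is a hypothetical shortcut weight, not a value from the report.
    """
    normed = freq_instance_norm(spec, eps)
    return [
        [lam * v + n for v, n in zip(row, nrow)]
        for row, nrow in zip(spec, normed)
    ]


if __name__ == "__main__":
    # Toy spectrogram: 2 frequency rows, 3 time frames.
    toy = [[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]]
    for row in residual_norm(toy, lam=0.5):
        print(row)
```

In this sketch the normalized path removes each row's mean and scale, which is where a device-dependent gain or offset per frequency band would show up, while the shortcut keeps a scaled copy of the original values so that information is not discarded entirely.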
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Test · Knowledge Distillation · Instance Normalization
