QTI Submission to DCASE 2021: residual normalization for device-imbalanced acoustic scene classification with efficient design

Byeonggeun Kim; Seunghan Yang; Jangho Kim; Simyung Chang

arXiv:2206.13909 · cs.SD · October 26, 2022 · 27 cites

Open Access

TL;DR

This paper presents an acoustic scene classification system for device-imbalanced data. It combines residual normalization, an efficient architecture, data augmentation, and model compression to achieve high accuracy at low complexity.

Contribution

It introduces Residual Normalization and an efficient BC-ResNet-Mod architecture, combined with data augmentation and compression techniques, for improved acoustic scene classification under device imbalance and model-complexity constraints.

Findings

01 Achieved 76.3% accuracy on the TAU Urban Acoustic Scenes dataset.

02 Reduced the model size to 61.0KB while maintaining accuracy.

03 Demonstrated the effectiveness of residual normalization and data augmentation.

Abstract

This technical report describes the details of our TASK1A submission of the DCASE2021 challenge. The goal of the task is to design an audio scene classification system for device-imbalanced datasets under the constraints of model complexity. This report introduces four methods to achieve the goal. First, we propose Residual Normalization, a novel feature normalization method that uses instance normalization with a shortcut path to discard unnecessary device-specific information without losing useful information for classification. Second, we design an efficient architecture, BC-ResNet-Mod, a modified version of the baseline architecture with a limited receptive field. Third, we exploit spectrogram-to-spectrogram translation from one to multiple devices to augment training data. Finally, we utilize three model compression schemes: pruning, quantization, and knowledge distillation to…
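The Residual Normalization idea from the abstract, instance normalization plus a shortcut path, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a single-channel spectrogram stored as a list of frequency rows, assumes the statistics are taken per frequency bin over time, and uses hypothetical names (`instance_norm_rows`, `residual_norm`, and the shortcut weight `lam`) that do not appear in the source.

```python
import math


def instance_norm_rows(spec, eps=1e-5):
    """Normalize each row to zero mean and (approximately) unit variance.

    Assumption: each row of `spec` holds one frequency bin over time.
    """
    normed = []
    for row in spec:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        scale = 1.0 / math.sqrt(var + eps)
        normed.append([(v - mean) * scale for v in row])
    return normed


def residual_norm(spec, lam=0.1, eps=1e-5):
    """Add a weighted shortcut (lam * x) to the instance-normalized features.

    `lam` is a hypothetical shortcut weight chosen for illustration.
    """
    normed = instance_norm_rows(spec, eps)
    return [
        [lam * x + n for x, n in zip(row, nrow)]
        for row, nrow in zip(spec, normed)
    ]


if __name__ == "__main__":
    # Toy 2 x 3 "spectrogram": two frequency bins, three time frames.
    toy = [[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]]
    for row in residual_norm(toy, lam=0.1):
        print([round(v, 4) for v in row])
```

With `lam=0` the sketch reduces to plain instance normalization; a larger `lam` lets more of the unnormalized input, and with it the row's original level, pass through the shortcut.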

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Test · Knowledge Distillation · Instance Normalization