Integrating the Data Augmentation Scheme with Various Classifiers for Acoustic Scene Modeling
Hangting Chen, Zuozhen Liu, Zongming Liu, Pengyuan Zhang, Yonghong Yan

TL;DR
This technical report presents an acoustic scene classification system that combines data augmentation based on generative adversarial networks with multiple classifiers; its fused systems exceed 85% accuracy on the officially provided fold 1 evaluation dataset of DCASE2019 Task 1A.
Contribution
It combines GAN-based data augmentation with several classifiers, chiefly a 1D deep CNN on scalogram features and a 2D fully convolutional network on Mel filter bank features, for improved acoustic scene modeling.
Findings
Achieved over 85% accuracy on the officially provided fold 1 evaluation dataset with fusion systems A-D.
Demonstrated effectiveness of GAN-based data augmentation.
Enhanced classification performance through classifier fusion.
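As a rough illustration of what classifier fusion can look like, the sketch below averages per-class probabilities from two models and picks the highest-scoring class. This is an assumption for illustration only: the summary does not state which fusion rule the report uses, and the names `fuse_scores`, `predict`, `scalogram_cnn`, and `mel_fcnn`, as well as the probability values, are hypothetical.

```python
def fuse_scores(prob_sets, weights=None):
    """Weighted average of per-class probabilities from several classifiers.

    prob_sets: list of probability lists, one per classifier (same length).
    weights:   optional per-classifier weights; defaults to a uniform average.
    """
    n_models = len(prob_sets)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_classes = len(prob_sets[0])
    fused = [0.0] * n_classes
    for w, probs in zip(weights, prob_sets):
        for k, p in enumerate(probs):
            fused[k] += w * p
    return fused


def predict(prob_sets, weights=None):
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(prob_sets, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical posteriors over three scene classes (not from the report).
scalogram_cnn = [0.5, 0.4, 0.1]
mel_fcnn = [0.2, 0.7, 0.1]

fused = fuse_scores([scalogram_cnn, mel_fcnn])
label = predict([scalogram_cnn, mel_fcnn])
```

With uniform weights the second class wins here even though the first model alone prefers the first class, which is the kind of correction fusion is meant to provide.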
Abstract
This technical report describes the IOA team's submission for Task 1A of the DCASE2019 challenge. Our acoustic scene classification (ASC) system adopts a data augmentation scheme employing generative adversarial networks. Two major classifiers, a 1D deep convolutional neural network integrated with scalogram features and a 2D fully convolutional neural network integrated with Mel filter bank features, are deployed in the scheme. Other approaches, such as adversarial city adaptation, a temporal module based on the discrete cosine transform, and hybrid architectures, have been developed for further fusion. The results of our experiments indicate that the final fusion systems A-D achieve an accuracy higher than 85% on the officially provided fold 1 evaluation dataset.
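To make the idea of a DCT-based temporal module more concrete, here is a minimal sketch that applies an unnormalized DCT-II along the time axis of a feature sequence and keeps the first few coefficients per feature dimension. It is only one plausible reading: the abstract does not describe the module's actual design, and `dct2`, `temporal_dct`, `n_coeffs`, and the toy `frames` values are hypothetical.

```python
import math


def dct2(x):
    """Unnormalized DCT-II of a 1D sequence."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]


def temporal_dct(frames, n_coeffs):
    """Summarize each feature's trajectory over time by its low-order DCT terms.

    frames:   list of T frames, each a list of D feature values.
    n_coeffs: number of leading DCT coefficients to keep per feature.
    Returns a D x n_coeffs nested list.
    """
    n_dims = len(frames[0])
    summary = []
    for d in range(n_dims):
        trajectory = [frame[d] for frame in frames]
        summary.append(dct2(trajectory)[:n_coeffs])
    return summary


# Toy input: 4 frames, 2 features (a constant one and a linearly rising one).
frames = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
summary = temporal_dct(frames, n_coeffs=2)
```

In this toy case the constant feature puts all of its energy in coefficient 0, while the rising feature also produces a nonzero coefficient 1, showing how low-order terms capture slow temporal trends.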
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Discrete Cosine Transform
