ACGAN-based Data Augmentation Integrated with Long-term Scalogram for Acoustic Scene Classification
Hangting Chen, Zuozhen Liu, Zongming Liu, Pengyuan Zhang

TL;DR
This paper introduces a long-term wavelet feature and an ACGAN-based data augmentation method for acoustic scene classification, ranking first in DCASE19 and surpassing the top accuracies on the DCASE17 dataset.
Contribution
The paper proposes a new long-term wavelet feature for ASC and a data augmentation scheme that uses ACGANs to improve model generalization and accuracy.
Findings
Improved classification accuracy on DCASE datasets.
Achieved first place in the DCASE19 competition.
Surpassed top accuracies on the DCASE17 dataset.
Abstract
In acoustic scene classification (ASC), acoustic features play a crucial role in the extraction of scene information, which can be stored over different time scales. Moreover, the limited size of the dataset may lead to a biased model with poor performance for records from unseen cities and confusing scene classes. To overcome this, we propose a long-term wavelet feature that requires a lower storage capacity and can be classified faster and more accurately than classic Mel filter bank coefficients (FBank). This feature can be extracted with predefined wavelet scales similar to the FBank. Furthermore, a novel data augmentation scheme based on generative adversarial neural networks with auxiliary classifiers (ACGANs) is adopted to improve the generalization of the ASC systems. The scheme, which contains ACGANs and a sample filter, extends the database iteratively by…
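The idea of a scalogram computed at predefined wavelet scales and then summarized over long time windows can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the complex Morlet wavelet, the parameter `w0`, the scale values, the mean pooling, and the helper names `morlet_wavelet`, `scalogram`, and `pool_frames`.

```python
import numpy as np

def morlet_wavelet(scale, w0=6.0, support=4.0):
    """Complex Morlet wavelet for one scale (scale measured in samples).

    Assumed wavelet family; the abstract does not name the one used.
    """
    half = int(np.ceil(support * scale))
    t = np.arange(-half, half + 1) / scale
    wavelet = np.exp(1j * w0 * t) * np.exp(-0.5 * t**2)
    return wavelet / np.sqrt(scale)  # rough energy normalization across scales

def scalogram(signal, scales, w0=6.0):
    """Wavelet-transform magnitude at a fixed, predefined set of scales.

    Assumes the signal is longer than the longest wavelet.
    """
    rows = []
    for s in scales:
        # mode="same" keeps one coefficient per input sample
        coeffs = np.convolve(signal, morlet_wavelet(s, w0), mode="same")
        rows.append(np.abs(coeffs))
    return np.stack(rows)  # shape: (num_scales, num_samples)

def pool_frames(scal, hop):
    """Average each scale over non-overlapping windows of `hop` samples."""
    n_frames = scal.shape[1] // hop
    trimmed = scal[:, :n_frames * hop]
    return trimmed.reshape(scal.shape[0], n_frames, hop).mean(axis=2)

# Usage: a pure tone whose frequency matches the centre of scale 16.
n = np.arange(8000)
scales = np.array([4, 8, 16, 32, 64], dtype=float)
tone = np.cos((6.0 / 16.0) * n)
features = pool_frames(scalogram(tone, scales), hop=1000)
```

In this toy setup the pooled feature has one row per scale and one column per window, and the tone's energy concentrates in the row for scale 16. Larger scales respond to slower oscillations, which is one plausible reading of why a wavelet feature can capture information over different time scales.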
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
