Sample Mixed-Based Data Augmentation for Domestic Audio Tagging
Shengyun Wei, Kele Xu, Dezhi Wang, Feifan Liao, Huaimin Wang, Qiuqiang Kong

TL;DR
This paper introduces a novel sample mixed data augmentation approach for domestic audio tagging, significantly improving model performance and achieving state-of-the-art results on the DCASE 2016 dataset.
Contribution
It explores and applies mixup, SamplePairing, and extrapolation data augmentation methods to enhance deep learning models for audio tagging.
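The three sample-mixing schemes named above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses the commonly cited formulations (mixup blends features and labels with a Beta-distributed weight, SamplePairing averages the features and keeps the first sample's labels, extrapolation pushes a sample away from another one), and this summary does not confirm that the paper uses exactly these variants. The names `mix`, `mixup`, `sample_pairing`, `extrapolate`, `alpha`, and `lam` are hypothetical, and flat Python lists stand in for log-mel feature matrices.

```python
import random


def mix(a, b, lam):
    """Element-wise lam * a + (1 - lam) * b for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [lam * u + (1.0 - lam) * v for u, v in zip(a, b)]


def mixup(x1, y1, x2, y2, alpha=1.0, rng=random):
    """mixup: blend features AND labels with lam drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    return mix(x1, x2, lam), mix(y1, y2, lam)


def sample_pairing(x1, y1, x2):
    """SamplePairing (assumed form): average the features, keep x1's labels."""
    return mix(x1, x2, 0.5), list(y1)


def extrapolate(x1, y1, x2, lam=0.5):
    """Extrapolation (assumed form): x1 + lam * (x1 - x2), keep x1's labels."""
    return mix(x1, x2, 1.0 + lam), list(y1)
```

With multi-label tags, the mixup target becomes a soft label vector, so a loss that accepts non-binary targets (for example binary cross-entropy) is needed; the other two schemes leave the labels untouched.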
Findings
The mixup approach achieves an equal error rate (EER) of 0.10 on the DCASE 2016 dataset.
Data augmentation outperforms the baseline trained without augmentation.
Sample mixed augmentation improves generalization in audio tagging.
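For readers unfamiliar with the metric behind these findings, the equal error rate is the operating point at which the false positive rate equals the false negative rate; lower is better. The sketch below is one simple way to estimate it from binary labels and scores by scanning the observed scores as thresholds. It is an assumption-laden illustration: the function name `equal_error_rate` is hypothetical, and the official DCASE 2016 evaluation code may interpolate or average across tags differently.

```python
def equal_error_rate(labels, scores):
    """Estimate the EER for one tag from 0/1 labels and real-valued scores."""
    pos = sum(1 for l in labels if l == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    best_gap, eer = None, None
    # Predict positive when score >= threshold; +inf covers "reject everything".
    for t in sorted(set(scores)) + [float("inf")]:
        fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= t)
        fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < t)
        fpr, fnr = fp / neg, fn / pos
        gap = abs(fpr - fnr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2.0
    return eer
```

In a multi-label setting such as audio tagging, a per-tag EER computed this way would typically be averaged over tags to give a single figure.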
Abstract
Audio tagging has attracted increasing attention over the last decade and has potential applications in many fields. The objective of audio tagging is to predict the labels of an audio clip. Recently, deep learning methods have been applied to audio tagging and have achieved state-of-the-art performance. However, due to the limited size of audio tagging data such as DCASE data, the trained models tend to overfit, which leads to poor generalization on new data. Previous data augmentation methods such as pitch shifting, time stretching and adding background noise do not show much improvement in audio tagging. In this paper, we explore sample mixed data augmentation for the domestic audio tagging task, including mixup, SamplePairing and extrapolation. We apply a convolutional recurrent neural network (CRNN) with attention module with log-scaled…
Taxonomy
Topics: Music and Audio Processing · Video Analysis and Summarization · Speech Recognition and Synthesis
