Urban Sound Tagging using Convolutional Neural Networks

Sainath Adapa

arXiv:1909.12699·cs.SD·September 30, 2019

TL;DR

This paper presents a CNN-based framework for urban sound tagging that combines pre-trained image classification models with data augmentation, and that placed first on the DCASE 2019 leaderboard in a low-data setting for environmental sound classification.

Contribution

It introduces a modified MobileNetV2 model, trained with mixup, random erasing, scaling, and shifting as data augmentation, that classifies coarse and fine urban sound tags jointly in a low-data setting (fewer than 100 labeled examples per class).

Findings

01

Achieved first place on DCASE 2019 leaderboard

02

Micro-AUPRC of 0.751 for fine tags

03

Micro-AUPRC of 0.860 for coarse tags

Abstract

In this paper, we propose a framework for environmental sound classification in a low-data context (fewer than 100 labeled examples per class). We show that using pre-trained image classification models together with data augmentation techniques yields higher performance than alternative approaches. We applied this system to the Urban Sound Tagging task, part of DCASE 2019. The objective was to label different sources of noise from raw audio data. A modified form of MobileNetV2, a convolutional neural network (CNN) model, was trained to classify both coarse and fine tags jointly. The proposed model uses log-scaled Mel-spectrograms as the representation format for the audio data. Mixup, random erasing, scaling, and shifting are used as data augmentation techniques. A second model that uses scaled labels was built to account for human errors in the annotations. The…
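
The abstract names mixup as one of the augmentation techniques. As a rough illustration only, here is a minimal sketch of the general mixup idea in plain Python. It is not the author's implementation: the function name `mixup`, the use of flattened feature lists in place of spectrogram tensors, and the value `alpha=0.2` are all assumptions made for the example.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two examples and their multi-hot label vectors with one shared weight.

    x1, x2: flattened feature vectors (stand-ins for log-Mel spectrograms).
    y1, y2: multi-hot tag vectors of equal length.
    alpha:  Beta distribution parameter (hypothetical value, not from the paper).
    """
    # Mixing weight drawn from Beta(alpha, alpha); always lies in [0, 1].
    lam = rng.betavariate(alpha, alpha)
    # The same weight is applied to the inputs and to the labels.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Because the labels are blended with the same weight as the inputs, the resulting targets are soft values rather than hard 0/1 tags, which is why mixup is usually paired with a loss that accepts soft targets, such as binary cross-entropy.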

Code & Models

Repositories

sainathadapa/urban-sound-tagging
PyTorch · Official

Taxonomy

Methods: Depthwise Convolution · Pointwise Convolution · Depthwise Separable Convolution · 1x1 Convolution · Batch Normalization · Inverted Residual Block · Convolution · Average Pooling · Mixup