Learning Environmental Sounds with Multi-scale Convolutional Neural Network

Boqing Zhu; Changjian Wang; Feng Liu; Jin Lei; Zengquan Lu; Yuxing Peng

arXiv:1803.10219 · cs.SD · March 29, 2018 · 6 cites

TL;DR

This paper introduces WaveMsNet, an end-to-end neural network utilizing multi-scale convolution and a two-phase feature fusion method to improve environmental sound classification accuracy from raw waveforms and spectrograms.

Contribution

The paper proposes a novel multi-scale convolution operation and a two-phase feature fusion approach within an end-to-end network for environmental sound recognition.

Findings

01. Achieved 93.75% accuracy on the ESC-10 dataset.

02. Achieved 79.10% accuracy on the ESC-50 dataset.

03. Significantly outperforms previous methods.

Abstract

Deep learning has dramatically improved the performance of sound recognition. However, learning acoustic models directly from the raw waveform remains challenging. Current waveform-based models generally use time-domain convolutional layers to extract features, but features extracted by filters of a single size are insufficient for building discriminative representations of audio. In this paper, we propose a multi-scale convolution operation, which obtains a better audio representation by improving the frequency resolution and learning filters across all frequency areas. To leverage waveform-based and spectrogram-based features in a single model, we introduce a two-phase method to fuse the different features. Finally, we propose a novel end-to-end network called WaveMsNet, based on the multi-scale convolution operation and the two-phase method. On the environmental sounds classification…
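The multi-scale convolution idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the kernel sizes (11, 51, 101), strides (1, 5, 10), two filters per branch, random (untrained) weights, ReLU, and the helper names `conv1d`, `max_pool_to_length`, and `multi_scale_features` are all assumptions chosen for illustration. The sketch only shows the general pattern: several filter banks of different sizes run over the same raw waveform, each response is pooled to a common length, and the results are stacked as channels.

```python
import random


def conv1d(signal, kernel, stride):
    """Valid 1-D cross-correlation of `signal` with `kernel` at a given stride."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def max_pool_to_length(features, target_len):
    """Max-pool a 1-D feature list into exactly `target_len` bins."""
    n = len(features)
    if n < target_len:
        raise ValueError("feature list is shorter than the target length")
    bounds = [i * n // target_len for i in range(target_len + 1)]
    return [max(features[bounds[i]:bounds[i + 1]]) for i in range(target_len)]


def multi_scale_features(waveform, branches, target_len):
    """Apply every (kernels, stride) branch and stack the pooled channels."""
    channels = []
    for kernels, stride in branches:
        for kernel in kernels:
            response = conv1d(waveform, kernel, stride)
            rectified = [max(0.0, v) for v in response]  # ReLU
            channels.append(max_pool_to_length(rectified, target_len))
    return channels


rng = random.Random(0)


def random_kernels(count, size):
    """Stand-in for learned filters: `count` random kernels of length `size`."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(size)] for _ in range(count)]


# Synthetic "waveform" of 2000 samples (assumption: real input would be audio).
waveform = [rng.uniform(-1.0, 1.0) for _ in range(2000)]

# Three scales: short kernels resolve fine time detail, long kernels span
# more samples and can separate lower frequencies (illustrative sizes).
branches = [
    (random_kernels(2, 11), 1),
    (random_kernels(2, 51), 5),
    (random_kernels(2, 101), 10),
]

feature_map = multi_scale_features(waveform, branches, target_len=20)
print(len(feature_map), len(feature_map[0]))  # prints "6 20"
```

Pooling every branch to the same length is what lets responses from differently sized filters be concatenated into a single channels-by-time feature map for later layers. In a trained network the kernels would be learned rather than drawn at random.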

Code & Models

Repositories

Cocoxili/WaveMsNet
pytorch

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis