End-to-End Auditory Object Recognition via Inception Nucleus

Mohammad K. Ebrahimpour; Timothy Shea; Andreea Danielescu; David C. Noelle; Christopher T. Kello

arXiv:2005.12195 · cs.SD · May 26, 2020


TL;DR

This paper introduces an end-to-end deep neural network with an inception nucleus for auditory object recognition. The network classifies sounds directly from raw waveforms, outperforms prior state-of-the-art methods on UrbanSound8K, and reduces engineering effort by optimizing convolutional filter sizes on the fly instead of requiring them to be hand-tuned.

Contribution

The paper presents a new deep neural network architecture with an inception nucleus that automatically optimizes convolutional filter sizes for end-to-end auditory classification.
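The contribution above can be illustrated with a minimal sketch. It assumes the inception nucleus follows the general Inception pattern: several 1-D convolutions with different kernel sizes run in parallel over the same raw waveform, and their rectified outputs are stacked as channels, so later layers can draw on whichever receptive-field sizes prove useful. The kernel sizes (3, 9, 27), the number of filters per size, the ReLU, the random initialization, and every function name here are hypothetical choices for illustration, not the authors' configuration. Training, pooling, and the classifier head are omitted.

```python
import random
from typing import List

Signal = List[float]


def conv1d_same(signal: Signal, kernel: Signal) -> Signal:
    """1-D cross-correlation with zero padding so the output has the input's length.

    Assumes an odd kernel length so the padding is symmetric.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(signal))
    ]


def relu(xs: Signal) -> Signal:
    return [max(0.0, x) for x in xs]


def make_branches(sizes, filters_per_size: int, seed: int = 0) -> List[List[Signal]]:
    """Hypothetical random init: one branch per kernel size, several filters per branch."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, (1.0 / k) ** 0.5) for _ in range(k)] for _ in range(filters_per_size)]
        for k in sizes
    ]


def inception_nucleus(signal: Signal, branches: List[List[Signal]]) -> List[Signal]:
    """Run every branch on the same input and stack all rectified outputs as channels."""
    channels: List[Signal] = []
    for kernels in branches:  # one branch per kernel size
        for kernel in kernels:  # several filters of that size
            channels.append(relu(conv1d_same(signal, kernel)))
    return channels


if __name__ == "__main__":
    rng = random.Random(42)
    waveform = [rng.uniform(-1.0, 1.0) for _ in range(64)]  # stand-in for a raw audio clip
    branches = make_branches(sizes=(3, 9, 27), filters_per_size=2)
    feature_maps = inception_nucleus(waveform, branches)
    print(len(feature_maps), len(feature_maps[0]))  # 6 64
```

Running the script prints `6 64`: three kernel sizes times two filters give six channels, and the zero padding keeps each channel as long as the 64-sample input. Concatenating branches, rather than committing to one kernel size up front, is the design choice that would let training, rather than manual tuning, determine which filter widths matter.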

Findings

1. Outperforms the previous state of the art by 10.4 percentage points on UrbanSound8K
2. Filters in the early layers learn wavelet-like transforms
3. Reduces the engineering effort spent on feature design

Abstract

Machine learning approaches to auditory object recognition are traditionally based on engineered features, such as those derived from the spectrum or cepstrum. More recently, end-to-end classification systems for image and auditory recognition have been developed that learn features jointly with classification, resulting in improved classification accuracy. In this paper, we propose a novel end-to-end deep neural network that maps raw waveform inputs to sound class labels. Our network includes an "inception nucleus" that optimizes the size of convolutional filters on the fly, which dramatically reduces engineering effort. Classification results compared favorably against current state-of-the-art approaches, besting them by 10.4 percentage points on the UrbanSound8K dataset. Analyses of learned representations revealed that filters in the earlier hidden layers learned…
