End-to-End Auditory Object Recognition via Inception Nucleus
Mohammad K. Ebrahimpour, Timothy Shea, Andreea Danielescu, David C. Noelle, Christopher T. Kello

TL;DR
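This paper introduces an end-to-end deep neural network with an "inception nucleus" for auditory object recognition. The network maps raw waveforms directly to sound class labels and learns which convolutional filter sizes to use, which reduces feature-engineering effort. It beats prior state-of-the-art approaches by 10.4 percentage points on the Urbansound8k dataset.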
Contribution
The paper presents a deep neural network architecture whose inception nucleus optimizes convolutional filter sizes during training, rather than requiring them to be hand-tuned, for end-to-end auditory classification from raw waveforms.
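To make the idea of not hand-picking a single filter size concrete, here is a minimal, dependency-free sketch of an inception-style block for 1D audio. It assumes the usual inception layout (several convolution branches with different kernel sizes applied to the same input, with their outputs stacked as channels); this summary does not give the paper's actual kernel sizes, filter counts, or layer arrangement, so the sizes 3, 9, and 27, the filter count, and all function names below are hypothetical. The weights are random, whereas in a trained network they would be learned.

```python
import math
import random


def conv1d_same(signal, kernel):
    """Cross-correlate signal with kernel, zero-padded so output length equals input length."""
    k = len(kernel)
    left = (k - 1) // 2
    padded = [0.0] * left + list(signal) + [0.0] * (k - 1 - left)
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal))
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def inception_block(signal, branches):
    """Run every branch on the same input and stack the resulting feature maps.

    branches maps a kernel size to a list of kernels of that size.
    Returns a list of channels, each the same length as signal.
    """
    channels = []
    for size, kernels in sorted(branches.items()):
        for kernel in kernels:
            assert len(kernel) == size
            channels.append(relu(conv1d_same(signal, kernel)))
    return channels


def random_kernels(size, count, rng):
    """Randomly initialised kernels (stand-ins for learned weights)."""
    scale = 1.0 / math.sqrt(size)
    return [[rng.uniform(-scale, scale) for _ in range(size)] for _ in range(count)]


rng = random.Random(0)
# Hypothetical kernel sizes and filter counts, chosen only for illustration.
branches = {size: random_kernels(size, 2, rng) for size in (3, 9, 27)}
# Toy raw waveform: 256 samples of a 440 Hz tone at an assumed 8 kHz sample rate.
waveform = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(256)]

features = inception_block(waveform, branches)
print(len(features), len(features[0]))  # prints: 6 256
```

Because every branch sees the same waveform and the outputs are concatenated, later layers can weight whichever temporal scale turns out to be useful, which is one plausible reading of how filter size gets chosen by training rather than by the designer.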
Findings
Outperforms prior state-of-the-art approaches by 10.4 percentage points on Urbansound8k
Filters learn wavelet-like transforms in early layers
Reduces engineering effort in feature design
Abstract
Machine learning approaches to auditory object recognition are traditionally based on engineered features such as those derived from the spectrum or cepstrum. More recently, end-to-end classification systems for image and auditory recognition have been developed that learn features jointly with classification, resulting in improved classification accuracy. In this paper, we propose a novel end-to-end deep neural network that maps raw waveform inputs to sound class labels. Our network includes an "inception nucleus" that optimizes the size of convolutional filters on the fly, which dramatically reduces engineering effort. Classification results compared favorably against current state-of-the-art approaches, besting them by 10.4 percentage points on the Urbansound8k dataset. Analyses of learned representations revealed that filters in the earlier hidden layers learned…
