Conditioned Time-Dilated Convolutions for Sound Event Detection

Konstantinos Drossos; Stylianos I. Mimilakis; Tuomas Virtanen

arXiv:2007.05183·cs.SD·July 13, 2020

TL;DR

This paper introduces conditioned time-dilated convolutions for sound event detection (SED). The convolutions are conditioned on jointly learned embeddings of the classifier's own predictions, which raises the average per-frame F1 score and lowers the average per-frame error rate on the TUT-SED Synthetic dataset.

Contribution

It proposes a novel conditioning algorithm for time-dilated convolutions in SED, enhancing detection accuracy over previous methods.

Findings

01

Raised the average per-frame F1 score by 2 percentage points (0.63 to 0.65)

02

Reduced the average per-frame error rate by 3 percentage points (0.50 to 0.47)

03

Validated on TUT-SED Synthetic dataset

Abstract

Sound event detection (SED) is the task of identifying sound events along with their onset and offset times. A recent SED method based on convolutional neural networks proposed the use of depthwise separable (DWS) and time-dilated convolutions. DWS and time-dilated convolutions yielded state-of-the-art results for SED with a considerably small number of parameters. In this work we propose an extension of the time-dilated convolutions, conditioning them on jointly learned embeddings of the SED classifier's predictions. We present a novel algorithm for conditioning the time-dilated convolutions, which functions similarly to language modelling and enhances the performance of these convolutions. We employ the freely available TUT-SED Synthetic dataset, and we assess the performance of our method using the average per-frame $F_{1}$ score and average per-frame…
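The abstract names two per-frame metrics, the $F_{1}$ score and the error rate. The sketch below shows one common way such metrics are computed for polyphonic SED, assuming the widely used frame-wise definitions in which substitutions, deletions, and insertions are counted per frame and normalised by the number of active reference events. The function name `frame_metrics` and the toy inputs are hypothetical, and the paper's exact averaging procedure may differ.

```python
from typing import Sequence, Tuple


def frame_metrics(
    ref: Sequence[Sequence[int]],
    pred: Sequence[Sequence[int]],
) -> Tuple[float, float]:
    """Return (F1, error rate) for binary frame-by-class activity matrices.

    Assumption: ref[t][c] and pred[t][c] are 0/1 flags for class c in frame t.
    This is an illustrative sketch, not the authors' evaluation code.
    """
    tp = fp = fn = 0
    subs = dels = ins = 0
    n_ref = 0
    for r_frame, p_frame in zip(ref, pred):
        pairs = list(zip(r_frame, p_frame))
        f_tp = sum(1 for r, p in pairs if r and p)
        f_fp = sum(1 for r, p in pairs if p and not r)
        f_fn = sum(1 for r, p in pairs if r and not p)
        tp += f_tp
        fp += f_fp
        fn += f_fn
        # A miss paired with a false alarm in the same frame is a substitution;
        # leftover misses are deletions, leftover false alarms are insertions.
        subs += min(f_fn, f_fp)
        dels += max(0, f_fn - f_fp)
        ins += max(0, f_fp - f_fn)
        n_ref += sum(1 for r in r_frame if r)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    er = (subs + dels + ins) / n_ref if n_ref else 0.0
    return f1, er


# Toy example: 4 frames, 3 event classes (made-up data).
reference = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
predicted = [[1, 0, 0], [1, 0, 1], [0, 0, 0], [0, 1, 0]]
f1, er = frame_metrics(reference, predicted)
print(f"F1 = {f1:.2f}, ER = {er:.2f}")
```

Under these definitions a higher $F_{1}$ and a lower error rate both indicate better detection, which matches the direction of the improvements listed under Findings.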

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis