Maximizing Audio Event Detection Model Performance on Small Datasets Through Knowledge Transfer, Data Augmentation, And Pretraining: An Ablation Study

Daniel Tompkins; Kshitiz Kumar; Jian Wu

arXiv:2202.03514 · cs.SD · February 9, 2022

TL;DR

This paper investigates how knowledge transfer from ImageNet, pretraining on AudioSet, and on-the-fly data augmentation each improve audio event detection on a small dataset (ESC-50), quantifies their individual contributions, and proposes a smaller Xception model that nearly matches state-of-the-art results.

Contribution

It provides an ablation study that measures how each component (ImageNet knowledge transfer, AudioSet pretraining, and data augmentation) affects accuracy and training time, and introduces a compact Xception model with competitive accuracy.

Findings

01. Knowledge transfer from ImageNet improves accuracy.
02. Pretraining on AudioSet enhances performance.
03. A smaller model achieves near-SOTA results with fewer parameters.
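Finding 01 credits part of the gain to ImageNet weights, but an ImageNet model expects three-channel images while a spectrogram has a single channel. This page does not say how the authors bridged that gap. One common approach, assumed here, is to copy the spectrogram into all three channels; an equivalent alternative is to sum the pretrained first-layer kernel over its input-channel axis. The minimal NumPy sketch below checks that the two give identical outputs. The helper names `conv2d_valid` and `adapt_rgb_kernel_to_mono` are hypothetical, and random arrays stand in for real pretrained weights and spectrograms.

```python
import numpy as np


def conv2d_valid(x, w):
    """Naive 'valid' 2-D cross-correlation (stride 1, no padding).

    x: input of shape (H, W, C_in); w: kernel of shape (kh, kw, C_in, C_out).
    Returns an array of shape (H - kh + 1, W - kw + 1, C_out).
    """
    kh, kw, _, c_out = w.shape
    h_out, w_out = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((h_out, w_out, c_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i:i + kh, j:j + kw, :]  # (kh, kw, C_in)
            # Contract over kh, kw and C_in; leaves one value per output channel.
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out


def adapt_rgb_kernel_to_mono(w_rgb):
    """Collapse an RGB kernel (kh, kw, 3, C_out) to (kh, kw, 1, C_out).

    Summing over the input-channel axis gives the same response on a
    one-channel input as the original kernel gives on that input copied
    into all three channels, because convolution is linear in its input.
    """
    return w_rgb.sum(axis=2, keepdims=True)


rng = np.random.default_rng(0)
w_rgb = rng.normal(size=(3, 3, 3, 8))  # stand-in for pretrained first-layer weights
spec = rng.normal(size=(16, 20, 1))    # stand-in for a one-channel log-mel spectrogram

out_replicated = conv2d_valid(np.repeat(spec, 3, axis=2), w_rgb)
out_mono = conv2d_valid(spec, adapt_rgb_kernel_to_mono(w_rgb))
print(np.allclose(out_replicated, out_mono))  # True
```

Summing the kernel avoids tripling the input tensor; replicating the input avoids touching the pretrained weights. Either way, the 1000-way ImageNet classifier head would still need to be replaced by a new 50-way head for ESC-50.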

Abstract

An Xception model reaches state-of-the-art (SOTA) accuracy on the ESC-50 dataset for audio event detection through knowledge transfer from ImageNet weights, pretraining on AudioSet, and an on-the-fly data augmentation pipeline. This paper presents an ablation study that analyzes which components contribute to the boost in performance and training time. A smaller Xception model is also presented which nears SOTA performance with almost a third of the parameters.
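The abstract mentions an on-the-fly data augmentation pipeline without listing its transforms here. The sketch below shows what "on the fly" means in practice: each clip is randomly perturbed at the moment it is drawn into a batch, so the model never sees exactly the same input twice. The specific transforms (a circular time shift plus SpecAugment-style time and frequency masking), the parameter values, and the names `augment_spectrogram` and `batches` are all assumptions for illustration, not the authors' pipeline.

```python
import numpy as np


def augment_spectrogram(spec, rng, max_shift=20, max_time_mask=16, max_freq_mask=8):
    """Return a randomly augmented copy of a (n_mels, n_frames) spectrogram.

    Hypothetical choice of augmentations: a circular time shift, then one
    time mask and one frequency mask filled with the spectrogram mean
    (in the style of SpecAugment). Assumes max_time_mask <= n_frames and
    max_freq_mask <= n_mels.
    """
    # np.roll returns a new array, so the caller's spectrogram is untouched.
    out = np.roll(spec, rng.integers(-max_shift, max_shift + 1), axis=1)
    n_mels, n_frames = out.shape
    fill = out.mean()

    # Time mask: overwrite a random run of 0..max_time_mask frames.
    t = rng.integers(0, max_time_mask + 1)
    t0 = rng.integers(0, n_frames - t + 1)
    out[:, t0:t0 + t] = fill

    # Frequency mask: overwrite a random band of 0..max_freq_mask mel bins.
    f = rng.integers(0, max_freq_mask + 1)
    f0 = rng.integers(0, n_mels - f + 1)
    out[f0:f0 + f, :] = fill
    return out


def batches(specs, labels, batch_size, rng):
    """Yield batches forever, augmenting each clip at the moment it is drawn.

    Because augmentation happens here rather than in a preprocessing step,
    every draw of a clip produces a fresh random variant of it.
    """
    n = len(specs)
    while True:
        idx = rng.choice(n, size=batch_size, replace=False)
        x = np.stack([augment_spectrogram(specs[i], rng) for i in idx])
        yield x, labels[idx]


rng = np.random.default_rng(0)
specs = rng.normal(size=(10, 64, 216))  # hypothetical: 10 clips, 64 mel bins, 216 frames
labels = rng.integers(0, 50, size=10)   # ESC-50 has 50 classes
x, y = next(batches(specs, labels, 4, rng))
print(x.shape, y.shape)  # (4, 64, 216) (4,)
```

Augmenting inside the batch generator costs some CPU time per step but stores nothing extra on disk, which suits a small dataset where the goal is to multiply the effective number of distinct training examples.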

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing

Methods: Pointwise Convolution · Residual Connection · Depthwise Convolution · 1x1 Convolution · Average Pooling · Softmax · Global Average Pooling · Depthwise Separable Convolution · Max Pooling · Dense Connections
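Xception is built mainly from depthwise separable convolutions, which split a standard convolution into a depthwise step and a pointwise (1x1) step. The short sketch below counts weights for both forms to show why this design is parameter-efficient. It does not describe how the authors derived their smaller model; the layer size is hypothetical, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k convolution from c_in to c_out channels (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution (biases ignored).

    Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1x1 convolution that mixes c_in channels into c_out.
    """
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


k, c = 3, 256  # hypothetical layer: 3x3 kernel, 256 channels in and out
std = standard_conv_params(k, c, c)
sep = separable_conv_params(k, c, c)
print(std, sep)  # 589824 67840
# The ratio sep / std equals 1 / c_out + 1 / k**2, here about 0.115.
print(round(sep / std, 3))  # 0.115
```

For 3x3 kernels the ratio approaches 1/9 as the channel count grows, so the pointwise step ends up holding most of the weights.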