Filler Word Detection and Classification: A Dataset and Benchmark

Ge Zhu; Juan-Pablo Caceres; Justin Salamon

arXiv:2203.15135 · cs.CL · July 5, 2022

TL;DR

This paper introduces PodcastFillers, a large annotated dataset for filler word detection in podcasts, and proposes a pipeline combining voice activity detection (VAD) and automatic speech recognition (ASR) that achieves state-of-the-art results on that dataset, giving future work in this area a common benchmark.

Contribution

The paper provides the first large-scale annotated dataset for filler words and presents a novel detection pipeline that outperforms keyword spotting methods.

Findings

1. The proposed pipeline achieves state-of-the-art detection accuracy.

2. Leveraging ASR significantly improves filler word classification.

3. The dataset and benchmark facilitate future research in filler word detection.

Abstract

Filler words such as "uh" or "um" are sounds or words people use to signal they are pausing to think. Finding and removing filler words from recordings is a common and tedious task in media editing. Automatically detecting and classifying filler words could greatly aid in this task, but few studies have been published on this problem to date. A key reason is the absence of a dataset with annotated filler words for model training and evaluation. In this work, we present a novel speech dataset, PodcastFillers, with 35K annotated filler words and 50K annotations of other sounds that commonly occur in podcasts such as breaths, laughter, and word repetitions. We propose a pipeline that leverages VAD and ASR to detect filler candidates and a classifier to distinguish between filler word types. We evaluate our proposed pipeline on PodcastFillers, compare to several baselines, and present a…
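The candidate-detection step mentioned in the abstract can be illustrated with a small sketch. It assumes, as one plausible reading rather than the authors' confirmed implementation, that a filler candidate is any stretch of audio that VAD marks as voiced but that ASR does not cover with a recognized word. The function name `filler_candidates`, the `(start, end)` interval format in seconds, and the `min_dur` threshold are all hypothetical choices made for this example; the downstream classifier is not shown.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec), hypothetical format


def filler_candidates(
    voiced: List[Interval],
    words: List[Interval],
    min_dur: float = 0.1,  # assumed threshold, not taken from the paper
) -> List[Interval]:
    """Return the parts of the voiced intervals not covered by any ASR word.

    `voiced` would come from a VAD system and `words` from ASR word-level
    timestamps; both are plain lists of intervals here.
    """
    words = sorted(words)
    candidates: List[Interval] = []
    for v_start, v_end in sorted(voiced):
        cursor = v_start  # left edge of the still-uncovered voiced audio
        for w_start, w_end in words:
            if w_end <= cursor:
                continue  # word ends before the uncovered region begins
            if w_start >= v_end:
                break  # word starts after this voiced region ends
            if w_start > cursor:
                candidates.append((cursor, w_start))  # gap before the word
            cursor = max(cursor, w_end)
        if cursor < v_end:
            candidates.append((cursor, v_end))  # trailing gap
    # Drop gaps too short to plausibly hold a filler word.
    return [(s, e) for s, e in candidates if e - s >= min_dur]


if __name__ == "__main__":
    vad_regions = [(0.0, 2.0)]
    asr_words = [(0.0, 0.5), (1.0, 2.0)]
    print(filler_candidates(vad_regions, asr_words))
```

In a full system, each returned interval would then be passed to a classifier that decides whether it is a filler word and, if so, which type.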

Code & Models

Repositories

gzhu06/PodcastFillers_Utils (official)

Taxonomy

Topics: Music and Audio Processing · Radio, Podcasts, and Digital Media · Speech Recognition and Synthesis