Unsupervised Discovery of Recurring Speech Patterns Using Probabilistic Adaptive Metrics

Okko Räsänen; María Andrea Cruz Blandón

arXiv:2008.00731·eess.AS·August 4, 2020

TL;DR

This paper introduces PDTW, a probabilistic method for unsupervised speech pattern discovery that adaptively evaluates alignment quality, improving over traditional DTW approaches and performing well across multiple languages without dataset-specific tuning.

Contribution

The paper presents a novel probabilistic approach to DTW-based unsupervised spoken term discovery that automatically adapts to different datasets, enhancing robustness and applicability.

Findings

1. PDTW outperforms previous DTW-based systems in pattern coverage.
2. The method maintains consistent performance across five languages.
3. Fixed hyperparameters work effectively without dataset-specific tuning.

Abstract

Unsupervised spoken term discovery (UTD) aims to find recurring segments of speech in a corpus of acoustic speech data. One potential approach to this problem is to use dynamic time warping (DTW) to find well-aligning patterns in the speech data. However, automatic selection of initial candidate segments for the DTW alignment, and detection of "sufficiently good" alignments among them, require some type of pre-defined criteria, often operationalized as threshold parameters for pair-wise distance metrics between signal representations. In existing UTD systems, the optimal hyperparameters may differ across datasets, limiting their applicability to new corpora and truly low-resource scenarios. In this paper, we propose a novel probabilistic approach to DTW-based UTD called PDTW. In PDTW, distributional characteristics of the processed corpus are utilized for adaptive evaluation…
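To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' PDTW implementation. It shows plain DTW between two feature sequences and one possible way to replace a fixed distance threshold with a probability computed from the corpus's own cost distribution. The function names (`dtw_cost`, `chance_probability`), the Euclidean frame distance, the length normalization, and the Gaussian fit to background costs are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_cost(seq_a, seq_b):
    """Length-normalized DTW alignment cost between two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # acc[i][j] = minimum accumulated cost of aligning seq_a[:i] with seq_b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


def chance_probability(cost, background_costs):
    """Probability of a cost this low under a normal fit to background costs.

    Hypothetical helper: assumes costs between randomly paired segments
    are roughly Gaussian, which is an illustrative simplification.
    """
    dist = NormalDist(mean(background_costs), stdev(background_costs))
    return dist.cdf(cost)


# Toy example: b is a time-stretched copy of a, so DTW aligns them at zero cost.
a = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
b = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
background = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up costs of random segment pairs
p = chance_probability(dtw_cost(a, b), background)
print(f"probability of this cost arising by chance: {p:.3f}")
```

In this sketch the acceptance criterion is a significance level on `p` rather than a raw distance, so the same setting can carry over when the scale of the distances changes between corpora. Whether PDTW uses this particular distribution or normalization is not stated in the truncated abstract.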
