TL;DR
This paper introduces PDTW, a probabilistic method for unsupervised speech pattern discovery that adaptively evaluates alignment quality, improving over traditional DTW approaches and performing well across multiple languages without dataset-specific tuning.
Contribution
The paper presents a novel probabilistic approach to DTW-based unsupervised spoken term discovery that automatically adapts to different datasets, enhancing robustness and applicability.
Findings
PDTW outperforms previous DTW-based systems in pattern coverage.
The method maintains consistent performance across five languages.
Fixed hyperparameters work effectively without dataset-specific tuning.
Abstract
Unsupervised spoken term discovery (UTD) aims at finding recurring segments of speech in a corpus of acoustic speech data. One potential approach to this problem is to use dynamic time warping (DTW) to find well-aligning patterns in the speech data. However, automatic selection of initial candidate segments for the DTW alignment, and detection of "sufficiently good" alignments among them, require some type of pre-defined criteria, often operationalized as threshold parameters for pair-wise distance metrics between signal representations. In existing UTD systems, the optimal hyperparameters may differ across datasets, limiting their applicability to new corpora and truly low-resource scenarios. In this paper, we propose a novel probabilistic approach to DTW-based UTD named PDTW. In PDTW, distributional characteristics of the processed corpus are utilized for adaptive evaluation…
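To make the threshold problem in the abstract concrete, here is a minimal sketch of a plain DTW alignment cost and of why a fixed distance threshold transfers poorly between feature representations. It is not the PDTW procedure, whose details are cut off in the abstract above. The names `dtw_cost` and `percentile_threshold` are hypothetical, and the percentile rule is only an assumed stand-in for a criterion derived from the corpus' own distance distribution.

```python
import math


def dtw_cost(x, y):
    """Length-normalized DTW alignment cost between two sequences of feature vectors."""
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] holds the cost of the best warping path aligning x[:i] with y[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(x[i - 1], y[j - 1])  # Euclidean frame-to-frame distance
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


def percentile_threshold(costs, q=0.05):
    """Hypothetical adaptive criterion: the cost below which a fraction q of the
    observed pair-wise costs fall (assumption, not the paper's rule)."""
    ordered = sorted(costs)
    idx = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[idx]


# Two toy "segments" with 2-dimensional frames, and the same pair with all
# feature values doubled (e.g. a different feature normalization).
x = [(0.0, 1.0), (1.0, 0.0)]
y = [(1.0, 1.0), (0.0, 0.0)]
x2 = [(2 * a, 2 * b) for a, b in x]
y2 = [(2 * a, 2 * b) for a, b in y]

# The alignment cost scales with the features, so a fixed threshold tuned on
# one representation would accept or reject different pairs on the other.
print(dtw_cost(x, y), dtw_cost(x2, y2))
```

Because the cost doubles when the features are doubled, any fixed cutoff is tied to one feature scale, whereas a cutoff read from the distribution of costs (such as the percentile above) moves with the data. This is only meant to illustrate the motivation stated in the abstract.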
