Unsupervised Pattern Discovery from Thematic Speech Archives Based on Multilingual Bottleneck Features
Man-Ling Sung, Siyuan Feng, Tan Lee

TL;DR
This paper introduces an unsupervised method for discovering spoken keywords in untranscribed multilingual speech archives, leveraging deep neural networks and pattern mining to identify topic-related words without transcriptions.
Contribution
It presents a novel two-stage approach combining unsupervised acoustic modeling with pattern mining, enabling keyword discovery in low-resource and multilingual speech data.
Findings
Effective extraction of topic-related words from lecture recordings
Utilizes multilingual deep neural networks for acoustic modeling
Achieves promising results without transcriptions
Abstract
The present study tackles the problem of automatically discovering spoken keywords from untranscribed audio archives without requiring word-by-word speech transcription by automatic speech recognition (ASR) technology. The problem is of practical significance in many applications of speech analytics, including those concerning low-resource languages and large amounts of multilingual and multi-genre data. We propose a two-stage approach, which comprises unsupervised acoustic modeling and decoding, followed by pattern mining in acoustic unit sequences. The whole process starts by deriving and modeling a set of subword-level speech units with untranscribed data. With the acoustic models trained in this unsupervised manner, a given audio archive is represented by a pseudo transcription, from which spoken keywords can be discovered by string mining algorithms. For unsupervised acoustic modeling, a deep…
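To make the second stage concrete, the sketch below shows one simple way that recurring patterns could be mined from pseudo transcriptions. It is only an illustration under stated assumptions: the excerpt above does not say which string mining algorithm the authors use, so plain n-gram counting stands in for it here. The function name mine_frequent_patterns, its parameters, and the unit labels such as "u7" are all hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def mine_frequent_patterns(
    sequences: Sequence[Sequence[str]],
    min_len: int = 3,
    max_len: int = 6,
    min_count: int = 2,
) -> Dict[Tuple[str, ...], int]:
    """Count contiguous unit subsequences and keep the frequent ones.

    This is simple n-gram counting, used as a stand-in for the string
    mining step; it is not the algorithm from the paper.
    """
    counts: Counter = Counter()
    for seq in sequences:
        for n in range(min_len, max_len + 1):
            # Slide a window of length n over the unit sequence.
            for start in range(len(seq) - n + 1):
                counts[tuple(seq[start:start + n])] += 1
    return {pattern: c for pattern, c in counts.items() if c >= min_count}


# Hypothetical pseudo transcriptions: one list of acoustic unit labels
# per utterance, as a decoder with unsupervised models might produce.
pseudo_transcriptions = [
    ["u3", "u7", "u1", "u4", "u9"],
    ["u2", "u7", "u1", "u4", "u5"],
    ["u7", "u1", "u4", "u8", "u8"],
]

patterns = mine_frequent_patterns(pseudo_transcriptions, min_len=3, max_len=4)
for pattern, count in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(" ".join(pattern), count)
```

A unit sequence that recurs across many utterances is a candidate spoken keyword. Exact matching is a deliberate simplification: decoded unit sequences for the same word usually vary, so a practical system would likely need some tolerance for substitutions and a way to filter out very common but uninformative patterns.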
