Unsupervised Pattern Discovery from Thematic Speech Archives Based on Multilingual Bottleneck Features

Man-Ling Sung; Siyuan Feng; Tan Lee

arXiv:2011.01986 · eess.AS · November 5, 2020

TL;DR

This paper introduces an unsupervised method for discovering spoken keywords in untranscribed multilingual speech archives, leveraging deep neural networks and pattern mining to identify topic-related words without transcriptions.

Contribution

It presents a novel two-stage approach combining unsupervised acoustic modeling with pattern mining, enabling keyword discovery in low-resource and multilingual speech data.

Findings

1. Effective extraction of topic-related words from lecture recordings
2. Utilizes multilingual deep neural networks for acoustic modeling
3. Achieves promising results without transcriptions

Abstract

The present study tackles the problem of automatically discovering spoken keywords from untranscribed audio archives without requiring word-by-word speech transcription by automatic speech recognition (ASR) technology. The problem is of practical significance in many applications of speech analytics, including those concerning low-resource languages and large amounts of multilingual and multi-genre data. We propose a two-stage approach, which comprises unsupervised acoustic modeling and decoding, followed by pattern mining in acoustic unit sequences. The whole process starts by deriving and modeling a set of subword-level speech units with untranscribed data. With the acoustic models trained in this unsupervised manner, a given audio archive is represented by a pseudo transcription, from which spoken keywords can be discovered by string mining algorithms. For unsupervised acoustic modeling, a deep…
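To make the second stage concrete, here is a minimal sketch of going from decoded unit labels to keyword candidates. It assumes that some unsupervised acoustic model has already assigned an integer unit label to every frame (the integers below are made-up stand-ins for discovered subword units). The helper names `to_pseudo_transcription` and `mine_recurring_patterns`, the `min_frames` duration filter, and the n-gram thresholds are all hypothetical. The mining step is a simple stand-in (counting unit n-grams that recur across utterances and keeping only maximal ones), not necessarily the string mining algorithm used in the paper.

```python
from collections import defaultdict
from itertools import groupby


def to_pseudo_transcription(frame_labels, min_frames=2):
    """Collapse frame-level unit labels into a unit sequence (pseudo transcription).

    Runs shorter than min_frames are dropped as likely noise
    (min_frames is a hypothetical threshold, not taken from the paper).
    """
    units = [label for label, run in groupby(frame_labels)
             if len(list(run)) >= min_frames]
    # Dropping short runs can leave identical neighbours; merge them.
    return [u for i, u in enumerate(units) if i == 0 or u != units[i - 1]]


def _contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def mine_recurring_patterns(transcriptions, min_len=3, max_len=6, min_docs=2):
    """Return maximal unit n-grams that occur in at least min_docs transcriptions.

    Each result is (pattern, number_of_transcriptions_containing_it).
    This is a simple illustrative stand-in for string mining.
    """
    doc_sets = defaultdict(set)
    for doc_id, units in enumerate(transcriptions):
        for n in range(min_len, max_len + 1):
            for start in range(len(units) - n + 1):
                doc_sets[tuple(units[start:start + n])].add(doc_id)
    frequent = {p: d for p, d in doc_sets.items() if len(d) >= min_docs}
    # Keep a pattern only if no longer frequent pattern contains it
    # and occurs in exactly the same transcriptions.
    maximal = {
        p: len(d) for p, d in frequent.items()
        if not any(len(q) > len(p) and frequent[q] == d and _contains(q, p)
                   for q in frequent)
    }
    return sorted(maximal.items(), key=lambda item: (-item[1], -len(item[0]), item[0]))


# Toy frame-level labels for three utterances sharing the unit run 7-1-4-9.
utterances = [
    [3, 3, 7, 7, 1, 1, 4, 4, 9, 9, 2, 2],
    [5, 5, 7, 7, 7, 1, 1, 4, 4, 9, 9, 9, 6, 6],
    [8, 8, 0, 7, 7, 1, 1, 4, 4, 9, 9],
]
pseudo = [to_pseudo_transcription(u) for u in utterances]
print(pseudo)                            # [[3, 7, 1, 4, 9, 2], [5, 7, 1, 4, 9, 6], [8, 7, 1, 4, 9]]
print(mine_recurring_patterns(pseudo))   # [((7, 1, 4, 9), 3)]
```

The recurring run `(7, 1, 4, 9)` is reported as a keyword candidate, and its sub-runs such as `(7, 1, 4)` are suppressed because they never occur without it. Exact n-gram matching like this is brittle when the decoded pseudo transcriptions contain unit errors, so a practical system would likely need more tolerant matching than this sketch provides.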
