Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching

Liming Wang; Mark Hasegawa-Johnson; Chang D. Yoo

arXiv:2310.02382 · cs.CL · October 5, 2023

TL;DR

This paper introduces ESPUM, an unsupervised speech recognition system that uses lower-order N-skipgrams (up to N=3) and positional unigram statistics to address GAN instability, speech-text misalignment, and high memory demands. It reports competitive results on TIMIT.

Contribution

The novel ESPUM system combines N-skipgrams with positional unigram statistics for unsupervised speech recognition, reducing memory use and improving stability.

Findings

01 Competitive performance on TIMIT benchmark

02 Effective in phoneme segmentation tasks

03 Addresses GAN instability and memory issues

Abstract

Training unsupervised speech recognition systems presents challenges due to GAN-associated instability, misalignment between speech and text, and significant memory demands. To tackle these challenges, we introduce a novel ASR system, ESPUM. This system harnesses the power of lower-order N-skipgrams (up to N=3) combined with positional unigram statistics gathered from a small batch of samples. Evaluated on the TIMIT benchmark, our model showcases competitive performance in ASR and phoneme segmentation tasks. Access our publicly available code at https://github.com/lwang114/GraphUnsupASR.
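To make the two kinds of statistics named in the abstract concrete, here is a minimal Python sketch that counts skipgrams and per-position unigram frequencies over a small batch of phoneme sequences. It is not taken from the authors' repository. The function names, the `max_skip` parameter, the toy phoneme batch, and the exact skipgram definition (ordered tuples of n tokens whose neighbouring picks are at most `max_skip` tokens apart) are all assumptions made for illustration, and the paper may define these quantities differently.

```python
from collections import Counter
from itertools import combinations


def skipgram_counts(sequences, n=2, max_skip=1):
    """Count n-token skipgrams in a batch of token sequences.

    Assumed definition (hypothetical, for illustration): an ordered tuple of
    n tokens in which consecutive picked tokens are separated by at most
    `max_skip` skipped tokens.
    """
    counts = Counter()
    span = (n - 1) * (max_skip + 1)  # furthest reach from the first token
    for seq in sequences:
        for start in range(len(seq)):
            last = min(start + span, len(seq) - 1)
            for rest in combinations(range(start + 1, last + 1), n - 1):
                idx = (start, *rest)
                # Keep only index tuples whose gaps respect max_skip.
                if all(b - a <= max_skip + 1 for a, b in zip(idx, idx[1:])):
                    counts[tuple(seq[i] for i in idx)] += 1
    return counts


def positional_unigram(sequences):
    """Relative frequency of each token at each position across the batch."""
    per_position = {}
    for seq in sequences:
        for t, token in enumerate(seq):
            per_position.setdefault(t, Counter())[token] += 1
    return {
        t: {tok: c / sum(cnt.values()) for tok, c in cnt.items()}
        for t, cnt in per_position.items()
    }


# Toy batch of phoneme sequences (made up for this example).
batch = [["sil", "ae", "t"], ["sil", "k", "ae", "t"]]
bigrams = skipgram_counts(batch, n=2, max_skip=1)
trigrams = skipgram_counts(batch, n=3, max_skip=1)
positions = positional_unigram(batch)
```

In this sketch the same counting routines could be applied to text-derived phoneme sequences and to sequences predicted from speech, so that the two sets of statistics can be compared. How ESPUM actually turns such statistics into a training objective is described in the paper and the linked repository, not here.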

Code & Models

Repositories

lwang114/graphunsupasr (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis