Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching
Liming Wang, Mark Hasegawa-Johnson, Chang D. Yoo

TL;DR
This paper introduces ESPUM, an unsupervised speech recognition system that uses lower-order N-skipgrams (up to N=3) and positional unigram statistics to address GAN instability, speech-text misalignment, and high memory demands during training. It reports competitive results on TIMIT.
Contribution
ESPUM combines lower-order N-skipgrams with positional unigram statistics gathered from a small batch of samples for unsupervised speech recognition, with the aim of reducing memory use and improving training stability.
Findings
Competitive performance on TIMIT benchmark
Effective in phoneme segmentation tasks
Addresses GAN instability and memory issues
Abstract
Training unsupervised speech recognition systems presents challenges due to GAN-associated instability, misalignment between speech and text, and significant memory demands. To tackle these challenges, we introduce a novel ASR system, ESPUM. This system harnesses the power of lower-order N-skipgrams (up to N=3) combined with positional unigram statistics gathered from a small batch of samples. Evaluated on the TIMIT benchmark, our model showcases competitive performance in ASR and phoneme segmentation tasks. Access our publicly available code at https://github.com/lwang114/GraphUnsupASR.
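As a rough illustration of the two kinds of statistics named in the abstract, the sketch below counts them over a small batch of phoneme sequences. It is not taken from the authors' code. It assumes that a skipgram is a pair of tokens separated by a gap of k positions (k from 1 to 3), and that a positional unigram is the distribution of tokens at each position index; the paper's exact definitions may differ. The function names and the toy batch are hypothetical.

```python
from collections import Counter


def skipgram_counts(sequences, max_skip=3):
    """Count (left, right) token pairs that are k positions apart, k = 1..max_skip.

    Assumed definition for illustration only; not the authors' implementation.
    """
    counts = {k: Counter() for k in range(1, max_skip + 1)}
    for seq in sequences:
        for k in range(1, max_skip + 1):
            for i in range(len(seq) - k):
                counts[k][(seq[i], seq[i + k])] += 1
    return counts


def positional_unigram_counts(sequences):
    """Count how often each token occurs at each position index."""
    counts = {}
    for seq in sequences:
        for pos, tok in enumerate(seq):
            counts.setdefault(pos, Counter())[tok] += 1
    return counts


def normalize(counter):
    """Turn raw counts into relative frequencies."""
    total = sum(counter.values())
    return {key: value / total for key, value in counter.items()} if total else {}


# Hypothetical toy batch of phoneme sequences.
batch = [
    ["sil", "dh", "ax", "k", "ae", "t"],
    ["sil", "ax", "k", "ae", "t", "s"],
]
skip = skipgram_counts(batch)
pos = positional_unigram_counts(batch)
```

Because both statistics are simple counts over a batch, they can be accumulated from a small number of samples, which is in keeping with the abstract's emphasis on modest memory use.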
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
