Automatic Speech Recognition with Very Large Conversational Finnish and Estonian Vocabularies
Seppo Enarvi, Peter Smit, Sami Virpioja, Mikko Kurimo

TL;DR
This paper addresses the challenges of large vocabulary speech recognition in agglutinative languages like Finnish and Estonian, emphasizing the need for extensive vocabularies to handle colloquial variations, inflections, and rare words.
Contribution
It introduces methods for managing very large vocabularies in speech recognition systems for Finnish and Estonian, focusing on handling linguistic complexity and out-of-vocabulary issues.
Findings
Large vocabularies improve recognition of colloquial and rare words.
Handling millions of word forms enhances speech recognition accuracy.
The approach accommodates word compounding and inflection in agglutinative languages.
Abstract
Today, the vocabulary size for language models in large vocabulary speech recognition is typically several hundred thousand words. While this is already sufficient in some applications, out-of-vocabulary words still limit usability in others. In agglutinative languages, the vocabulary for conversational speech should include millions of word forms to cover the spelling variations due to colloquial pronunciations, in addition to word compounding and inflections. Very large vocabularies are also needed, for example, when the recognition of rare proper names is important.
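The out-of-vocabulary (OOV) rate is the fraction of test tokens that a fixed vocabulary cannot represent. The sketch below shows how it can be measured for vocabularies of increasing size. It assumes a simple top-N frequency cutoff for selecting the vocabulary, which is not necessarily how the authors built theirs. The helper `oov_rate` and the toy Finnish-like tokens are hypothetical illustrations, not data from the paper. The test sentence contains an unseen inflection (`taloissa`) and a colloquial spelling variant (`mää` alongside `mä`). These are the kinds of word forms that push conversational vocabularies toward millions of entries.

```python
from collections import Counter


def oov_rate(train_tokens, test_tokens, vocab_size):
    """Fraction of test tokens missing from the `vocab_size` most frequent
    training words. Hypothetical helper: top-N frequency selection is an
    illustrative assumption, not the paper's vocabulary construction."""
    if not test_tokens:
        return 0.0
    counts = Counter(train_tokens)
    # Keep only the N most frequent word forms as the recognition vocabulary.
    vocab = {word for word, _ in counts.most_common(vocab_size)}
    misses = sum(1 for tok in test_tokens if tok not in vocab)
    return misses / len(test_tokens)


# Hypothetical toy data: inflected forms of "talo" (house) and colloquial
# forms such as "mä"/"mää" (I) and "oon" (am).
train_corpus = ["talo", "talo", "talossa", "talossa", "talosta", "mä", "oon"]
test_corpus = ["talo", "talossa", "taloissa", "mää", "oon"]

if __name__ == "__main__":
    for n in (0, 2, 5):
        print(n, oov_rate(train_corpus, test_corpus, n))
```

Even with every training word form included, `taloissa` and `mää` remain out of vocabulary in this toy example. This illustrates why coverage in agglutinative, colloquial text grows slowly with vocabulary size.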
