Word-level Speech Recognition with a Letter to Word Encoder

Ronan Collobert; Awni Hannun; Gabriel Synnaeve

arXiv:1906.04323 · cs.CL · July 16, 2020 · 1 citation

TL;DR

This paper introduces a direct-to-word speech recognition model that learns word embeddings from letters, improving accuracy and efficiency over sub-word models while handling unseen words without retraining.

Contribution

The paper presents a word-level sequence model whose word network computes word embeddings from letters. The word network can be combined with different sequence models, including Connectionist Temporal Classification (CTC) and attention-based encoder-decoder models.
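To make "word embeddings from letters" concrete, here is a minimal sketch of a letter-to-word encoder, assuming a single 1D convolution over letter embeddings followed by ReLU and max-pooling over letter positions. The layer layout, the sizes (`EMB_DIM`, `OUT_DIM`, `KERNEL`), the `<`/`>` boundary markers and the helper names `word_embedding` and `frame_scores` are all hypothetical choices for illustration, not the paper's configuration. Weights are random here; in a real model they would be trained jointly with the acoustic model.

```python
import random

# Hypothetical hyperparameters chosen for illustration only.
ALPHABET = "abcdefghijklmnopqrstuvwxyz'"
BOW, EOW = "<", ">"   # assumed begin/end-of-word markers
EMB_DIM = 8           # letter embedding size
OUT_DIM = 16          # word embedding size (must match the acoustic frame size)
KERNEL = 3            # convolution width, in letters

rng = random.Random(0)
# Letter lookup table; random here, learned jointly with the acoustic model in practice.
letter_emb = {c: [rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)] for c in ALPHABET + BOW + EOW}
# One filter per output channel, spanning KERNEL consecutive letter embeddings.
conv_w = [[rng.gauss(0.0, 0.1) for _ in range(KERNEL * EMB_DIM)] for _ in range(OUT_DIM)]
conv_b = [0.0] * OUT_DIM


def word_embedding(word):
    """Map a spelling to a fixed-size vector: embed letters, convolve, ReLU, max-pool."""
    symbols = [BOW] + list(word.lower()) + [EOW]
    while len(symbols) < KERNEL:          # guarantee at least one full window
        symbols.append(EOW)
    vecs = [letter_emb[s] for s in symbols]
    pooled = [0.0] * OUT_DIM              # ReLU outputs are >= 0, so 0 is a safe start
    for start in range(len(vecs) - KERNEL + 1):
        window = [x for v in vecs[start:start + KERNEL] for x in v]
        for o in range(OUT_DIM):
            act = conv_b[o] + sum(w * x for w, x in zip(conv_w[o], window))
            pooled[o] = max(pooled[o], act)   # ReLU and max over positions in one step
    return pooled


def frame_scores(frame, vocab):
    """Dot-product score of one acoustic frame vector against each word's embedding."""
    return {w: sum(a * b for a, b in zip(frame, word_embedding(w))) for w in vocab}
```

In a CTC-style setup, per-frame scores like these over the vocabulary (plus a blank) would act as the emission scores. Because each word vector is a function of its spelling, the vocabulary passed to `frame_scores` can change without touching any weights.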

Findings

01. Achieves lower word error rates than sub-word models.
02. Predicts words not seen at training time without retraining.
03. Uses a larger stride than sub-word models, making training and inference more efficient without loss of accuracy.
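One way to see why a word-level model can afford a larger stride is CTC's length constraint: the downsampled output must have at least as many frames as target labels, plus one blank frame between each pair of identical adjacent labels. A word transcript has far fewer labels than its letter spelling. The sketch below checks that constraint for a hypothetical 1.5 s utterance with a 10 ms feature hop; the utterance, the hop, the candidate strides and the `|` word separator are illustrative assumptions, not values from the paper.

```python
def ctc_min_frames(labels):
    """Fewest output frames CTC needs to emit `labels`.

    Every label takes one frame, and each pair of identical adjacent labels
    needs an extra blank frame between them.
    """
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats


def output_frames(duration_s, hop_ms, stride):
    """Frames remaining after features computed every `hop_ms` are downsampled by `stride`."""
    return int(duration_s * 1000 / hop_ms) // stride


# Hypothetical utterance: 1.5 s of audio, 10 ms feature hop.
transcript = "the cat sat on the mat"
letters = list(transcript.replace(" ", "|"))  # letter targets, '|' marks word boundaries
words = transcript.split()                     # word targets

for stride in (2, 4, 8, 16):
    t = output_frames(1.5, 10, stride)
    print(stride, t, t >= ctc_min_frames(letters), t >= ctc_min_frames(words))
# 2 75 True True
# 4 37 True True
# 8 18 False True
# 16 9 False True
```

The 22 letter targets rule out strides of 8 and 16 here, while the 6 word targets fit even at stride 16. Fewer output frames also means less computation in every layer after the downsampling.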

Abstract

We propose a direct-to-word sequence model which uses a word network to learn word embeddings from letters. The word network can be integrated seamlessly with arbitrary sequence models including Connectionist Temporal Classification and encoder-decoder models with attention. We show our direct-to-word model can achieve word error rate gains over sub-word level models for speech recognition. We also show that our direct-to-word approach retains the ability to predict words not seen at training time without any retraining. Finally, we demonstrate that a word-level model can use a larger stride than a sub-word level model while maintaining accuracy. This makes the model more efficient both for training and inference.
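The claim that the model can predict unseen words without retraining follows from building the output vocabulary from spellings at inference time. The sketch below illustrates this with greedy CTC decoding at the word level. As an assumption for a deterministic example, a parameter-free featurizer (L2-normalized letter-bigram counts) stands in for the learned word network. The "acoustic frames" are idealized as copies of the spoken word's embedding, and the blank token `<b>` with its own one-hot vector is likewise hypothetical. All names (`letter_embedding`, `greedy_ctc_decode`, `BLANK`) are illustrative.

```python
import math
from collections import Counter

BLANK = "<b>"  # hypothetical CTC blank token


def letter_embedding(word):
    """Stand-in word network: L2-normalized counts of boundary-marked letter bigrams.

    Assumption: a fixed featurizer replaces the learned network so the example
    is deterministic; what matters is that the vector is computed from the spelling.
    """
    if word == BLANK:
        return {BLANK: 1.0}
    s = "<" + word + ">"
    counts = Counter(s[i:i + 2] for i in range(len(s) - 1))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {k: c / norm for k, c in counts.items()}


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(x * v.get(k, 0.0) for k, x in u.items())


def greedy_ctc_decode(frames, vocab):
    """Pick the best-scoring token per frame, merge repeats, then drop blanks."""
    # The embedding table is rebuilt from spellings, so any word can be added here.
    table = {w: letter_embedding(w) for w in [BLANK] + list(vocab)}
    best = [max(table, key=lambda w: dot(f, table[w])) for f in frames]
    out, prev = [], None
    for w in best:
        if w != prev and w != BLANK:
            out.append(w)
        prev = w
    return out


spoken = ["the", "the", BLANK, "cat", "cat", "sat"]
frames = [letter_embedding(w) for w in spoken]
print(greedy_ctc_decode(frames, ["the", "cat"]))          # ['the', 'cat']
print(greedy_ctc_decode(frames, ["the", "cat", "sat"]))   # ['the', 'cat', 'sat']
```

Without "sat" in the vocabulary, its frame is matched to the closest spelling ("cat") and merged as a repeat. Adding "sat" only requires computing its embedding from its letters, and no weights change.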

Videos

Word-Level Speech Recognition With a Letter to Word Encoder (SlidesLive)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing