Differentiable Allophone Graphs for Language-Universal Speech Recognition

Brian Yan; Siddharth Dalmia; David R. Mortensen; Florian Metze; Shinji Watanabe

arXiv:2107.11628·cs.CL·July 27, 2021·1 cites

TL;DR

This paper introduces a framework for creating universal speech recognition models by deriving phone-level supervision from phonemic transcriptions using differentiable allophone graphs, enabling multilingual and interpretable phoneme-to-allophone mappings.

Contribution

The work presents a novel differentiable allophone graph approach that learns language-specific phoneme-to-allophone mappings from phonemic transcriptions, facilitating universal and interpretable speech recognition.

Findings

1. Trained on 7 diverse languages, the system effectively models pronunciation variations.

2. The approach enables linguists to document languages and build lexicons with rich pronunciation data.

3. The model provides interpretable probabilistic mappings for each language.

Abstract

Building language-universal speech recognition systems entails producing phonological units of spoken sound that can be shared across languages. While speech annotations at the language-specific phoneme or surface levels are readily available, annotations at a universal phone level are relatively rare and difficult to produce. In this work, we present a general framework to derive phone-level supervision from only phonemic transcriptions and phone-to-phoneme mappings with learnable weights represented using weighted finite-state transducers, which we call differentiable allophone graphs. By training multilingually, we build a universal phone-based speech recognition model with interpretable probabilistic phone-to-phoneme mappings for each language. These phone-based systems with learned allophone graphs can be used by linguists to document new languages, build phone-based lexicons that…
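
To make the idea of phone-to-phoneme mappings with learnable weights more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it reduces the weighted finite-state transducer to a single-frame table of arcs, and the toy inventory, the arc set, the per-phone softmax normalization, and all function names are assumptions made for illustration only.

```python
import math

# Hypothetical toy inventory (not from the paper): a few universal phones
# and the phonemes of one imaginary language.
PHONES = ["p", "pʰ", "b"]
PHONEMES = ["/p/", "/b/"]

# Arcs of the toy allophone graph: phone index -> phoneme indices it may
# realize. Phone 0 is ambiguous here; the others map one-to-one.
ARCS = {0: [0, 1], 1: [0], 2: [1]}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def arc_probs(weights):
    """Turn learnable arc scores into P(phoneme | phone).

    Assumption: scores are normalized per phone over its outgoing arcs.
    """
    probs = {}
    for phone, targets in ARCS.items():
        normed = softmax([weights[(phone, t)] for t in targets])
        for target, p in zip(targets, normed):
            probs[(phone, target)] = p
    return probs


def phoneme_posteriors(phone_post, weights):
    """Push one frame of phone posteriors through the weighted arcs."""
    out = [0.0] * len(PHONEMES)
    for (phone, phoneme), p in arc_probs(weights).items():
        out[phoneme] += phone_post[phone] * p
    return out


# Untrained arc scores (all zero), so the ambiguous phone splits evenly.
weights = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (2, 1): 0.0}
phone_post = [0.5, 0.3, 0.2]  # made-up phone posteriors for one frame
phoneme_post = phoneme_posteriors(phone_post, weights)
print(dict(zip(PHONEMES, phoneme_post)))
```

In a trainable version of this sketch, the arc scores would be parameters updated by backpropagating a phoneme-level loss, so supervision from phonemic transcriptions alone would shape both the phone predictions and the mapping. After training, the normalized arc weights could be read off as a probabilistic phone-to-phoneme mapping for that language.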

Code & Models

Repositories

cmu-llab/meloni-2021-reimplementation
pytorch

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing