Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous?

Jialu Li; Mark Hasegawa-Johnson

arXiv:2007.14351 · eess.AS · March 29, 2022

TL;DR

This study compares CTC-based acoustic models that impose different degrees of synchrony between phones and tones, for multilingual and cross-lingual speech recognition. Synchronous models achieve a lower joint phone+tone error rate, while asynchronous models achieve a lower tone error rate.

Contribution

The paper introduces and evaluates four CTC-based acoustic models that differ in the degree of synchrony they impose between phones and tones, for multilingual and cross-lingual speech recognition.
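To make the synchronous/asynchronous distinction concrete, here is a minimal sketch of two ways training targets could be laid out. It is an illustration under stated assumptions, not the paper's implementation: the toy syllables, the tone labels `T1` and `T3`, and the helper names `joint_tier` and `separate_tiers` are all hypothetical, and placing each tone after its syllable's phones is an assumed convention.

```python
from typing import List, Tuple

# Hypothetical toy transcription: each syllable is (phone labels, tone label).
Syllable = Tuple[List[str], str]


def joint_tier(syllables: List[Syllable]) -> List[str]:
    """Build one target sequence: each syllable's phones followed by its tone.

    Assumed convention for illustration only; a single sequence like this
    forces phones and tones to share one alignment.
    """
    labels: List[str] = []
    for phones, tone in syllables:
        labels.extend(phones)
        labels.append(tone)
    return labels


def separate_tiers(syllables: List[Syllable]) -> Tuple[List[str], List[str]]:
    """Build two independent target sequences, one per tier.

    Each tier could then be aligned to the audio on its own, which is the
    intuition behind an asynchronous setup.
    """
    phone_tier = [p for phones, _ in syllables for p in phones]
    tone_tier = [tone for _, tone in syllables]
    return phone_tier, tone_tier


utterance: List[Syllable] = [(["m", "a"], "T1"), (["m", "a"], "T3")]
print(joint_tier(utterance))
print(separate_tiers(utterance))
```

In this sketch the joint tier would be scored by a single CTC loss, while the separate tiers would each need their own output layer and loss; how the four models in the paper actually divide these responsibilities is not specified in this summary.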

Findings

01 Synchronous models achieve lower error rates on the joint phone+tone tier.

02 Asynchronous models achieve lower tone error rates.

03 Both synchronous and asynchronous models are effective in multilingual and cross-lingual settings.
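The findings above are stated in terms of error rates on different tiers. As a rough sketch of how such a rate can be computed, the following uses the standard Levenshtein edit distance between a reference and a hypothesis label sequence, divided by the reference length. The function names and the toy sequences are hypothetical, the numbers are not results from the paper, and the paper's exact scoring procedure is not described in this summary.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Toy example: one tone is misrecognized (T3 decoded as T2).
ref_joint = ["m", "a", "T1", "m", "a", "T3"]
hyp_joint = ["m", "a", "T1", "m", "a", "T2"]
ref_tone = ["T1", "T3"]
hyp_tone = ["T1", "T2"]

print(error_rate(ref_joint, hyp_joint))  # joint phone+tone tier
print(error_rate(ref_tone, hyp_tone))    # tone tier only
```

The same single tone error weighs differently on each tier because the reference lengths differ, which is one reason joint and tone-only error rates are reported separately.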

Abstract

Phones, the segmental units of the International Phonetic Alphabet (IPA), are used for lexical distinctions in most human languages; tones, the suprasegmental units of the IPA, are used in perhaps 70%. Many previous studies have explored cross-lingual adaptation of automatic speech recognition (ASR) phone models, but few have explored the multilingual and cross-lingual transfer of synchronization between phones and tones. In this paper, we test four Connectionist Temporal Classification (CTC)-based acoustic models, differing in the degree of synchrony they impose between phones and tones. Models are trained and tested multilingually in three languages, then adapted and tested cross-lingually in a fourth. Both synchronous and asynchronous models are effective in both multilingual and cross-lingual settings. Synchronous models achieve lower error rate in the joint phone+tone tier, but…
