UML: A Universal Monolingual Output Layer for Multilingual ASR

Chao Zhang; Bo Li; Tara N. Sainath; Trevor Strohman; Shuo-yiin Chang

arXiv:2302.11186 · eess.AS · February 23, 2023

TL;DR

This paper introduces UML, a universal monolingual output layer for multilingual ASR that shrinks the output layer and improves efficiency by sharing output nodes across languages. The approach is demonstrated on an 11-language voice search task.

Contribution

The paper proposes UML, a novel output layer design that re-associates each output node with multiple WPMs (one per language), enabling scalable and efficient multilingual ASR.
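As a rough illustration of why sharing nodes shrinks the output layer, the sketch below compares the node count of a conventional multilingual layer (one node per distinct WPM across all languages) with a UML-style layer in which node i stands for the i-th WPM of whichever language is active. It assumes the shared layer needs only as many nodes as the largest monolingual WPM. The toy vocabularies, the hidden dimension, and the function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Toy per-language WPM vocabularies (hypothetical, for illustration only).
VOCABS: Dict[str, List[str]] = {
    "en": ["▁the", "▁a", "ing", "s"],
    "fr": ["▁le", "▁la", "s", "ent"],
    "zh": ["你", "好", "的"],
}


def multilingual_output_size(vocabs: Dict[str, List[str]]) -> int:
    """Conventional multilingual layer: one node per distinct WPM across all languages."""
    union = set()
    for tokens in vocabs.values():
        union.update(tokens)
    return len(union)


def uml_output_size(vocabs: Dict[str, List[str]]) -> int:
    """UML-style layer (assumed): node i is reused as the i-th WPM of each language,
    so it needs only as many nodes as the largest monolingual vocabulary."""
    return max(len(tokens) for tokens in vocabs.values())


def output_layer_params(num_nodes: int, hidden_dim: int) -> int:
    """Weights plus biases of a dense projection from hidden_dim to num_nodes."""
    return num_nodes * hidden_dim + num_nodes


if __name__ == "__main__":
    hidden_dim = 8  # hypothetical dimension of the layer feeding the output layer
    multi = multilingual_output_size(VOCABS)  # 10 distinct WPMs ("s" is shared)
    uml = uml_output_size(VOCABS)             # 4 = largest monolingual vocabulary
    print(multi, output_layer_params(multi, hidden_dim))  # 10 90
    print(uml, output_layer_params(uml, hidden_dim))      # 4 36
```

Under this assumption, the conventional layer grows with the union of all languages' WPMs (and languages with distinct scripts add mostly disjoint pieces), while the UML-style layer stays bounded by the largest single vocabulary.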

Findings

01. UML reduces output layer size in multilingual ASR.
02. UML achieves high recognition quality across 11 languages.
03. UML enables dynamic interpretation of output nodes based on language.

Abstract

Word-piece models (WPMs) are commonly used subword units in state-of-the-art end-to-end automatic speech recognition (ASR) systems. For multilingual ASR, due to the differences in written scripts across languages, multilingual WPMs bring the challenges of having overly large output layers and scaling to more languages. In this work, we propose a universal monolingual output layer (UML) to address such problems. Instead of associating each output node with only one WPM, UML re-associates each output node with multiple WPMs, one for each language, resulting in a smaller monolingual output layer shared across languages. Consequently, UML makes it possible to switch the interpretation of each output node depending on the language of the input speech. Experimental results on an 11-language voice search task demonstrated the feasibility of using UML for high-quality and high-efficiency multilingual…
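The language-dependent interpretation of output nodes can be pictured as a lookup: the model scores one shared set of output nodes, and a per-language table turns each chosen node index into that language's WPM. This is a minimal sketch assuming the language of the input speech is already known at decoding time and that index 0 is reserved for a blank symbol. The tables, the greedy frame-wise selection, and the function names are hypothetical and are not the paper's decoder.

```python
from typing import Dict, List, Sequence

# Hypothetical UML lookup tables: entry i is the WPM that shared output node i
# denotes for that language. Index 0 is reserved for a blank symbol (assumption).
UML_TABLES: Dict[str, List[str]] = {
    "en": ["<blank>", "▁hello", "▁world", "s"],
    "fr": ["<blank>", "▁bonjour", "▁monde", "s"],
}


def greedy_node_ids(logits: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring shared output node for each frame."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in logits]


def interpret(node_ids: Sequence[int], language: str,
              tables: Dict[str, List[str]] = UML_TABLES) -> str:
    """Map shared node indices to the given language's WPMs and join them into text.

    Blank nodes (index 0) are dropped; "▁" marks a word boundary in this sketch.
    """
    table = tables[language]
    pieces = [table[i] for i in node_ids if i != 0]
    return "".join(pieces).replace("▁", " ").strip()


if __name__ == "__main__":
    # The same frame scores over 4 shared nodes, read under two languages.
    logits = [
        [0.1, 2.0, 0.3, 0.0],
        [0.2, 0.1, 1.5, 0.0],
        [3.0, 0.0, 0.0, 0.0],
    ]
    ids = greedy_node_ids(logits)  # [1, 2, 0]
    print(interpret(ids, "en"))    # hello world
    print(interpret(ids, "fr"))    # bonjour monde
```

In practice the model would produce different node sequences for English and French speech; reusing one sequence here only shows that the same node indices are read differently under each language's table, which is what keeps the shared layer small.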

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques