Dyn-ASR: Compact, Multilingual Speech Recognition via Spoken Language and Accent Identification

Sangeeta Ghangam; Daniel Whitenack; Joshua Nemecek

arXiv:2108.02034·cs.CL·August 5, 2021·1 cites

TL;DR

This paper introduces Dyn-ASR, a compact multilingual speech recognition system that dynamically selects monolingual models based on language and accent identification, optimizing performance and resource use on edge devices.

Contribution

It presents a novel dynamic model selection method leveraging language and accent identification for efficient multilingual ASR on resource-constrained devices.

Findings

01 Uses less than 1/12th of the memory consumed by other solutions

02 Shows promising initial recognition performance

03 Efficient resource utilization on edge devices

Abstract

Running automatic speech recognition (ASR) on edge devices is non-trivial due to resource constraints, especially in scenarios that require supporting multiple languages. We propose a new approach to enable multilingual speech recognition on edge devices. This approach uses both language identification and accent identification to select one of multiple monolingual ASR models on-the-fly, each fine-tuned for a particular accent. Initial results for both recognition performance and resource usage are promising with our approach using less than 1/12th of the memory consumed by other solutions.
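The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `DynamicASR`, the callables `identify_language` and `identify_accent`, and the `loaders` registry are hypothetical names introduced here, and keeping only one ASR model loaded at a time is an assumption about how memory might be saved, not something the abstract states.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# All names here are hypothetical; the paper's abstract does not specify an API.
Audio = bytes
ModelKey = Tuple[str, str]          # (language, accent)
Transcriber = Callable[[Audio], str]


@dataclass
class DynamicASR:
    """Route each utterance to one monolingual, accent-specific ASR model."""

    identify_language: Callable[[Audio], str]
    identify_accent: Callable[[Audio, str], str]
    # Maps (language, accent) to a function that loads the matching model.
    loaders: Dict[ModelKey, Callable[[], Transcriber]]
    _active_key: Optional[ModelKey] = field(default=None, init=False)
    _active_model: Optional[Transcriber] = field(default=None, init=False)

    def transcribe(self, audio: Audio) -> str:
        # Step 1: identify the spoken language, then the accent within it.
        language = self.identify_language(audio)
        accent = self.identify_accent(audio, language)
        key = (language, accent)
        if key not in self.loaders:
            raise KeyError(f"no ASR model registered for {key}")

        # Step 2: swap models only when the selection changes.
        # Assumption: a single model is resident at any time.
        if key != self._active_key:
            self._active_key = None
            self._active_model = None   # release the previous model first
            self._active_model = self.loaders[key]()
            self._active_key = key

        # Step 3: run the selected monolingual model.
        return self._active_model(audio)
```

In this sketch the identification models stay resident while the larger ASR models are loaded on demand, so consecutive utterances in the same language and accent reuse the already loaded model.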

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing