Dyn-ASR: Compact, Multilingual Speech Recognition via Spoken Language and Accent Identification
Sangeeta Ghangam, Daniel Whitenack, Joshua Nemecek

TL;DR
This paper introduces Dyn-ASR, a compact multilingual speech recognition system for edge devices. It uses language and accent identification to select, on the fly, one of several monolingual ASR models, each fine-tuned for a particular accent.
Contribution
The paper presents a dynamic model selection method that uses language and accent identification to enable efficient multilingual ASR on resource-constrained devices.
Findings
Uses less than 1/12th of the memory consumed by other solutions
Shows promising initial recognition performance
Efficient resource utilization on edge devices
Abstract
Running automatic speech recognition (ASR) on edge devices is non-trivial due to resource constraints, especially in scenarios that require supporting multiple languages. We propose a new approach to enable multilingual speech recognition on edge devices. This approach uses both language identification and accent identification to select one of multiple monolingual ASR models on the fly, each fine-tuned for a particular accent. Initial results for both recognition performance and resource usage are promising, with our approach using less than 1/12th of the memory consumed by other solutions.
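The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything here is an assumption made for illustration: the `DynamicSelector` class, the `identify_language` and `identify_accent` callables (toy stand-ins for real identification models), the `(language, accent)` keys, and the policy of keeping only one monolingual model loaded at a time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical types: a recognizer maps raw audio bytes to a transcript.
Recognizer = Callable[[bytes], str]
ModelKey = Tuple[str, str]  # (language, accent)


@dataclass
class DynamicSelector:
    """Illustrative selector that keeps at most one monolingual model loaded."""

    identify_language: Callable[[bytes], str]          # stand-in for a language-ID model
    identify_accent: Callable[[bytes, str], str]       # stand-in for an accent-ID model
    loaders: Dict[ModelKey, Callable[[], Recognizer]]  # lazy loader per (language, accent)
    _active_key: Optional[ModelKey] = None
    _active_model: Optional[Recognizer] = None
    loads: int = 0  # number of model loads, to make swapping visible

    def transcribe(self, audio: bytes) -> str:
        language = self.identify_language(audio)
        accent = self.identify_accent(audio, language)
        key = (language, accent)
        if key not in self.loaders:
            raise KeyError(f"no model registered for {key}")
        if key != self._active_key:
            # Drop the previous model before loading the next one, so that
            # only a single monolingual model is held in memory (assumed policy).
            self._active_model = None
            self._active_model = self.loaders[key]()
            self._active_key = key
            self.loads += 1
        return self._active_model(audio)


# Toy identifiers and models; a real system would run trained classifiers here.
def toy_language_id(audio: bytes) -> str:
    return "en" if audio.startswith(b"EN") else "hi"


def toy_accent_id(audio: bytes, language: str) -> str:
    return "in" if audio.endswith(b"IN") else "us"


selector = DynamicSelector(
    identify_language=toy_language_id,
    identify_accent=toy_accent_id,
    loaders={
        ("en", "us"): lambda: (lambda audio: "transcript from en/us model"),
        ("en", "in"): lambda: (lambda audio: "transcript from en/in model"),
    },
)
```

Calling `selector.transcribe` twice with audio that maps to the same language and accent reuses the loaded model; a different accent triggers a swap. Holding one model at a time is one plausible way to keep memory low, but the paper's actual mechanism may differ.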
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
