A two-stage transliteration approach to improve performance of a multilingual ASR
Rohit Kumar

TL;DR
This paper introduces a two-stage transliteration approach for multilingual end-to-end ASR that projects language-specific graphemes into a common script, so that a single model scales across languages with fewer recognition errors.
Contribution
The paper proposes a novel two-stage transliteration method that creates a language-agnostic grapheme set, enabling better handling of multiple languages in ASR without extensive retraining.
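The idea of projecting graphemes into a common script can be illustrated with a minimal sketch. It assumes Devanagari as the common script, which this summary does not state, and relies on the fact that Unicode lays out the Telugu and Devanagari blocks in roughly parallel order. The function name `telugu_to_devanagari` is hypothetical. The sketch shows only a single projection step, not the paper's two-stage procedure, whose details are not given here.

```python
import unicodedata

# Core Telugu range (signs, vowels, consonants, vowel marks, virama).
# Assumption: a fixed block offset is a good-enough projection for illustration;
# a real system would use a curated mapping table, since a few code points
# at the same offset do not denote the same sound in both scripts.
TELUGU_CORE_START, TELUGU_CORE_END = 0x0C01, 0x0C4D
BLOCK_OFFSET = 0x0C00 - 0x0900  # Telugu block start minus Devanagari block start


def telugu_to_devanagari(text: str) -> str:
    """Project Telugu graphemes onto Devanagari; leave everything else as-is."""
    out = []
    for ch in text:
        cp = ord(ch)
        if TELUGU_CORE_START <= cp <= TELUGU_CORE_END:
            candidate = chr(cp - BLOCK_OFFSET)
            # "Cn" marks an unassigned code point: keep the original character
            # rather than emit something that is not a real grapheme.
            if unicodedata.category(candidate) != "Cn":
                out.append(candidate)
                continue
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Telugu text is projected; Nepali text is already in Devanagari and passes through.
    print(telugu_to_devanagari("\u0c24\u0c46\u0c32\u0c41\u0c17\u0c41"))
    print(telugu_to_devanagari("\u0928\u0947\u092a\u093e\u0932\u0940"))
```

With such a projection, transcripts from both languages share one grapheme inventory, so the output layer of the model does not need to grow when a language with a different script is added.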
Findings
Achieved 20% relative reduction in Word Error Rate (WER).
Achieved 24% relative reduction in Character Error Rate (CER).
Demonstrated effectiveness on Nepali and Telugu languages.
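A relative reduction measures the drop as a fraction of the baseline error rate, not in absolute percentage points. The sketch below makes the arithmetic concrete. The function name `relative_reduction` and the error rates are hypothetical and chosen for illustration; they are not the paper's numbers.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the fraction by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical error rates, for illustration only (not taken from the paper).
    baseline_wer = 0.25
    improved_wer = 0.20
    print(f"{relative_reduction(baseline_wer, improved_wer):.0%} relative reduction")
```

In this illustration, a drop of 5 absolute points from a 25% baseline corresponds to a 20% relative reduction.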
Abstract
End-to-end Automatic Speech Recognition (ASR) systems are rapidly claiming state-of-the-art status over other modeling methods. Several techniques have been introduced to improve their ability to handle multiple languages. However, because writing scripts vary across languages, acoustically similar units do not always map to an appropriate grapheme in the target language during decoding. This restricts the scalability and adaptability of the model when dealing with multiple languages in code-mixing scenarios. This paper presents an approach to build a language-agnostic end-to-end model trained on a grapheme set obtained by projecting the multilingual grapheme data to the script of a more generic target language. This approach saves the acoustic model from being retrained to span a larger space and can easily be extended to multiple languages. A two-stage…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
Methods: Sparse Evolutionary Training
