A two-stage transliteration approach to improve performance of a multilingual ASR

Rohit Kumar

arXiv:2410.14709 · cs.CL · October 22, 2024


TL;DR

This paper introduces a two-stage transliteration approach for multilingual end-to-end ASR systems that projects language-specific graphemes into a common script, which lowers recognition errors and makes the model easier to scale to additional languages.
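The TL;DR does not name the common target script. As a hypothetical illustration only: Nepali is written in Devanagari, and Unicode lays out its Indic blocks (Devanagari U+0900-U+097F, Telugu U+0C00-U+0C7F) in a parallel, ISCII-derived arrangement, so a naive projection of Telugu onto Devanagari can shift code points by a fixed offset. This is a minimal sketch of one script-projection step, not the paper's two-stage method; choosing Devanagari as the target and the name `telugu_to_devanagari` are assumptions, and a real transliterator would need exception tables for positions that do not correspond.

```python
import unicodedata

# Assumption for illustration: Devanagari is the common target script.
DEVANAGARI_START = 0x0900  # Devanagari block: U+0900-U+097F
TELUGU_START = 0x0C00      # Telugu block: U+0C00-U+0C7F
TELUGU_END = 0x0C7F
OFFSET = TELUGU_START - DEVANAGARI_START  # 0x300 between parallel positions


def telugu_to_devanagari(text: str) -> str:
    """Hypothetical naive projection: map each Telugu code point to the
    Devanagari code point at the same position in its block. Characters
    outside the Telugu block (e.g. Devanagari, Latin, spaces) pass through."""
    out = []
    for ch in text:
        cp = ord(ch)
        if TELUGU_START <= cp <= TELUGU_END:
            target = chr(cp - OFFSET)
            # Keep the original character if the parallel slot is unassigned.
            out.append(target if unicodedata.name(target, None) else ch)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Telugu "కమల" (KA, MA, LA) -> prints Devanagari "कमल"
    print(telugu_to_devanagari("\u0C15\u0C2E\u0C32"))
```

A fixed offset is the simplest possible projection and is chosen here only because the block layouts are parallel; it ignores script-specific letters and conjunct conventions.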

Contribution

The paper proposes a novel two-stage transliteration method that creates a language-agnostic grapheme set, so the ASR model can handle multiple languages without the acoustic model being retrained over a larger output space.

Findings

01. A 20% relative reduction in word error rate (WER).

02. A 24% relative reduction in character error rate (CER).

03. Effectiveness demonstrated on Nepali and Telugu.
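The findings are stated as relative reductions. As a hedged illustration of how such figures are conventionally computed (the paper's absolute baseline and system scores are not reproduced here, and the numbers in the usage lines are made up), WER and CER are Levenshtein distances over words and characters divided by the reference length, and relative reduction is (baseline - new) / baseline. For example, a hypothetical WER drop from 30% to 24% is 6 points absolute but a 20% relative reduction.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences; substitutions,
    insertions and deletions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces count as characters in this sketch."""
    return edit_distance(reference, hypothesis) / len(reference)


def relative_reduction(baseline: float, system: float) -> float:
    """Fractional improvement of `system` over `baseline` (lower is better)."""
    return (baseline - system) / baseline


if __name__ == "__main__":
    print(wer("a b c d", "a x c"))          # 0.5 (1 substitution + 1 deletion)
    print(cer("kitten", "sitting"))         # 0.5 (3 edits / 6 characters)
    print(relative_reduction(0.30, 0.24))   # approximately 0.2
```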

Abstract

End-to-end Automatic Speech Recognition (ASR) systems are rapidly establishing themselves as state-of-the-art compared with other modeling methods. Several techniques have been introduced to improve their ability to handle multiple languages. However, because different languages use different writing scripts, acoustically similar units are not always mapped to the appropriate grapheme in the target language during decoding. This limits the scalability and adaptability of the model when it deals with multiple languages in code-mixing scenarios. This paper presents an approach to build a language-agnostic end-to-end model trained on a grapheme set obtained by projecting the multilingual grapheme data to the script of a more generic target language. This approach spares the acoustic model from being retrained to span a larger output space and can easily be extended to multiple languages. A two-stage…
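One way to read the abstract's point about not retraining the acoustic model "to span over a larger space" is in terms of the output layer: once every language is projected into one script, a new language adds few or no new output units. This is a hypothetical sketch of that reading, not the paper's implementation. It builds a character-level output vocabulary, with an assumed CTC blank token at index 0, from transcripts already in a common script (Devanagari, as used for Nepali), then checks which units new text would add. Treating each Unicode code point as one grapheme unit, and the names `build_grapheme_vocab`, `unseen_units` and `encode`, are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Set

BLANK = "<blank>"  # hypothetical CTC blank token, reserved at index 0


def build_grapheme_vocab(transcripts: Iterable[str]) -> Dict[str, int]:
    """Map every distinct code point in the (already projected) transcripts
    to an output index; one code point = one unit is a simplification."""
    units = sorted({ch for text in transcripts for ch in text})
    vocab = {BLANK: 0}
    for unit in units:
        vocab[unit] = len(vocab)
    return vocab


def unseen_units(text: str, vocab: Dict[str, int]) -> Set[str]:
    """Units in `text` that the current output layer cannot emit."""
    return {ch for ch in text if ch not in vocab}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Target indices for training; raises KeyError on an unseen unit."""
    return [vocab[ch] for ch in text]


if __name__ == "__main__":
    vocab = build_grapheme_vocab(["नमस्ते", "कल"])  # Devanagari transcripts
    # Telugu "కమల" projected to Devanagari reuses existing units.
    print(unseen_units("कमल", vocab))                       # set()
    # The same word left in Telugu script needs 3 new output units.
    print(len(unseen_units("\u0C15\u0C2E\u0C32", vocab)))  # 3
```

The `KeyError` in `encode` marks exactly the situation the common-script projection is meant to avoid: a grapheme the output layer was never trained to emit.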

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing

Methods: Sparse Evolutionary Training