Reduce and Reconstruct: ASR for Low-Resource Phonetic Languages

Anuj Diwan; Preethi Jyothi

arXiv:2010.09322 · eess.AS · September 13, 2021

TL;DR

This paper introduces a technique to improve ASR for low-resource phonetic languages: the system's output alphabet is first reduced by merging phonetically similar graphemes, and a separate module then reconstructs the sequence in the original alphabet, which improves performance when training data is limited.

Contribution

The paper proposes a reduce-and-reconstruct method for low-resource ASR systems that combines linguistically meaningful reductions of the output alphabet with a finite state transducer-based reconstruction module.

Findings

01. Up to 7% relative WER reduction on Gujarati and Telugu with 10 hours of data

02. Effective alphabet reduction simplifies ASR training in low-resource settings

03. Reconstruction module accurately recovers original sequences from reduced alphabet predictions
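
To make the "relative WER reduction" in the first finding concrete, here is a minimal Python sketch of how a relative reduction differs from an absolute one. The function names and the WER values in the usage note are hypothetical, chosen only for the arithmetic; they are not results from the paper.

```python
def absolute_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, in percentage points."""
    return baseline_wer - new_wer


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER that was removed."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

For example, with an illustrative baseline WER of 40.0% and an improved WER of 37.2%, the absolute reduction is about 2.8 points and the relative reduction is about 0.07, that is, 7%.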

Abstract

This work presents a seemingly simple but effective technique to improve low-resource ASR systems for phonetic languages. By identifying sets of acoustically similar graphemes in these languages, we first reduce the output alphabet of the ASR system using linguistically meaningful reductions and then reconstruct the original alphabet using a standalone module. We demonstrate that this lessens the burden and improves the performance of low-resource end-to-end ASR systems (because only reduced-alphabet predictions are needed) and that it is possible to design a very simple but effective reconstruction module that recovers sequences in the original alphabet from sequences in the reduced alphabet. We present a finite state transducer-based reconstruction module that operates on the 1-best ASR hypothesis in the reduced alphabet. We demonstrate the efficacy of our proposed technique using ASR…
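
The reduce-then-reconstruct idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reduction map below is a hypothetical toy over Latin letters, not the paper's reductions for Gujarati or Telugu, and the reconstruction step is a plain frequency-based dictionary lookup standing in for the finite state transducer the paper describes. All names (`REDUCTION`, `reduce_word`, `build_inverse_lexicon`, `reconstruct`) are made up for this example.

```python
from collections import defaultdict

# Hypothetical reduction map: each group of "similar" graphemes collapses
# to one representative symbol. Not the paper's actual reduction.
REDUCTION = {"b": "p", "d": "t", "g": "k", "z": "s"}


def reduce_word(word: str) -> str:
    """Map a word in the original alphabet to the reduced alphabet."""
    return "".join(REDUCTION.get(ch, ch) for ch in word)


def build_inverse_lexicon(word_counts: dict) -> dict:
    """Index original words (with their counts) by their reduced form."""
    inverse = defaultdict(list)
    for word, count in word_counts.items():
        inverse[reduce_word(word)].append((count, word))
    return inverse


def reconstruct(reduced_hypothesis: str, inverse: dict) -> str:
    """Recover original-alphabet words from a reduced 1-best hypothesis.

    Each reduced token is replaced by the most frequent original word that
    reduces to it; tokens with no known source word are left unchanged.
    This unigram lookup is a simplification, not an FST.
    """
    out = []
    for token in reduced_hypothesis.split():
        candidates = inverse.get(token)
        out.append(max(candidates)[1] if candidates else token)
    return " ".join(out)
```

With a toy lexicon such as `{"bad": 5, "pat": 2, "dog": 3, "sip": 1, "zip": 4}`, the reduced hypothesis `"pat tok sip"` is reconstructed as `"bad dog zip"`. Because several original words can share one reduced form, a real reconstruction module needs context to disambiguate them, which this unigram lookup does not model.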
