Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages

Devang Kulshreshtha; Saket Dingliwal; Brady Houston; Sravan Bodapati

arXiv:2307.00759 · cs.CL · July 4, 2023

Open Access

TL;DR

This paper introduces a supervision loss and a multilingual training strategy for Contextual Adapters in CTC-based ASR. Together they substantially improve custom word recognition in low-resource languages and, as a by-product, reduce the word error rate (WER) of the base CTC model.

Contribution

It proposes a supervision loss for training Contextual Adapters and a multilingual training strategy, addressing the difficulty of custom word recognition in low-resource languages.

Findings

01. 48% F1 improvement in retrieving unseen custom entities for a low-resource language
02. 5-11% WER reduction for the base CTC model
03. An effective training strategy for ASR in low-resource languages
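Both headline numbers are stated in terms of standard ASR metrics. As a hedged illustration, the sketch below computes word error rate (word-level edit distance divided by the number of reference words) and a micro-averaged entity-retrieval F1. It assumes an entity counts as retrieved when its exact word sequence appears in the hypothesis; this page states neither the paper's matching rule nor whether the 48% figure is relative or absolute. The function names, catalog, and example strings are hypothetical.

```python
from typing import List, Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev holds edit distances from the previous reference prefix to every hypothesis prefix.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)


def contains_phrase(text: str, phrase: str) -> bool:
    """True if the phrase occurs as a contiguous word sequence in the text."""
    words, target = text.split(), phrase.split()
    n = len(target)
    return any(words[k:k + n] == target for k in range(len(words) - n + 1))


def entity_f1(references: Sequence[str], hypotheses: Sequence[str], catalog: List[str]) -> float:
    """Micro-averaged F1 of custom-entity retrieval over a set of utterances."""
    tp = fp = fn = 0
    for ref, hyp in zip(references, hypotheses):
        ref_ents = {e for e in catalog if contains_phrase(ref, e)}
        hyp_ents = {e for e in catalog if contains_phrase(hyp, e)}
        tp += len(ref_ents & hyp_ents)
        fp += len(hyp_ents - ref_ents)
        fn += len(ref_ents - hyp_ents)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    catalog = ["zorblax tower", "quintaro"]  # hypothetical custom words
    refs = ["navigate to zorblax tower", "call quintaro"]
    hyps = ["navigate to zor blacks tower", "call quintaro"]
    print(word_error_rate(refs[0], hyps[0]))  # 0.5
    print(entity_f1(refs, hyps, catalog))
```

In the usage example the misrecognized entity costs one substitution and one insertion (WER 2/4) and halves entity recall, which is exactly the failure mode that contextual biasing targets.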

Abstract

Connectionist Temporal Classification (CTC) models are popular for their balance between speed and performance for Automatic Speech Recognition (ASR). However, these CTC models still struggle in other areas, such as personalization towards custom words. A recent approach explores Contextual Adapters, wherein an attention-based biasing model for CTC is used to improve the recognition of custom entities. While this approach works well with enough data, we show that it is not an effective strategy for low-resource languages. In this work, we propose a supervision loss for smoother training of the Contextual Adapters. Further, we explore a multilingual strategy to improve performance with limited training data. Our method achieves 48% F1 improvement in retrieving unseen custom entities for a low-resource language. Interestingly, as a by-product of training the Contextual Adapters, we see…
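The abstract characterizes Contextual Adapters only as an attention-based biasing model for CTC and does not define the supervision loss. The following is a minimal sketch under explicit assumptions, not the authors' implementation: each custom word is already encoded as a vector, index 0 of the catalog is a hypothetical `<no-bias>` entry, keys and values are the catalog vectors themselves (no learned projections), and the supervision loss is assumed to be a per-frame cross-entropy between the attention weights and a target catalog index. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bias_frame(frame: Vector, catalog: List[Vector]) -> Tuple[Vector, Vector]:
    """Attend from one encoder frame to every catalog entry.

    catalog[0] is assumed to be a <no-bias> embedding; keys and values are the
    catalog vectors themselves (a real adapter would learn projections).
    Returns the biased frame and the attention weights over the catalog.
    """
    scale = 1.0 / math.sqrt(len(frame))  # scaled dot-product attention
    scores = [scale * sum(f * k for f, k in zip(frame, key)) for key in catalog]
    weights = softmax(scores)
    # Attention-weighted sum of catalog vectors, added residually to the frame.
    context = [sum(w * entry[d] for w, entry in zip(weights, catalog)) for d in range(len(frame))]
    biased = [f + c for f, c in zip(frame, context)]
    return biased, weights


def supervision_loss(weights_per_frame: List[Vector], targets: List[int]) -> float:
    """Mean per-frame cross-entropy between attention weights and a target index.

    targets[t] is the catalog index of the entity spoken at frame t, or 0
    (<no-bias>) when no custom word is spoken (an assumed labeling scheme).
    """
    eps = 1e-12  # guards against log(0)
    losses = [-math.log(w[t] + eps) for w, t in zip(weights_per_frame, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    catalog = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]  # <no-bias>, entity A, entity B
    frames = [[1.0, 0.0], [0.1, 0.1]]
    outputs = [bias_frame(f, catalog) for f in frames]
    weights = [w for _, w in outputs]
    # Frame 0 is labeled as entity A, frame 1 as <no-bias>.
    print(supervision_loss(weights, [1, 0]))
```

In this sketch the `<no-bias>` vector is all zeros, so attention concentrated on it leaves a frame unchanged. Supervising the attention weights directly gives every frame an explicit target instead of relying only on the downstream CTC loss, which is one plausible reading of how a supervision loss could make adapter training smoother.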
