Transliterated Zero-Shot Domain Adaptation for Automatic Speech Recognition

Han Zhu; Gaofeng Cheng; Qingwei Zhao; Pengyuan Zhang

arXiv:2412.11185·eess.AS·December 17, 2024

TL;DR

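This paper introduces a transliteration-based zero-shot domain adaptation (ZSDA) method for automatic speech recognition. It adapts a model to a new domain when no target-domain data is available in the target language, by transferring domain knowledge from a source language in which such data is more accessible.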

Contribution

The paper proposes transliterated ZSDA, which is designed to keep cross-lingually pre-trained knowledge from being forgotten during target-language fine-tuning, and reports a 9.2% reduction in word error rate over the baseline.

Findings

1. Reduces word error rate by 9.2% compared to the baseline
2. Outperforms self-supervised ZSDA methods
3. Matches supervised ZSDA performance

Abstract

The performance of automatic speech recognition models often degrades on domains not covered by the training data. Domain adaptation can address this issue, assuming the availability of target domain data in the target language. However, this assumption does not hold in many real-world applications. To make domain adaptation more applicable, we address the problem of zero-shot domain adaptation (ZSDA), where target domain data is unavailable in the target language. Instead, we transfer the target domain knowledge from another source language where the target domain data is more accessible. To do that, we first perform cross-lingual pre-training (XLPT) to share domain knowledge across languages, then use target language fine-tuning to build the final model. One challenge in this practice is that the pre-trained knowledge can be forgotten during fine-tuning, resulting in…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing