Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data

Mozhdeh Gheini; Tatiana Likhomanenko; Matthias Sperber; Hendra Setiawan

arXiv:2212.09982 · cs.CL · December 21, 2022 · 1 citation

Open Access

TL;DR

This paper explores pseudo-labeling techniques to improve joint speech transcription and translation in data-scarce scenarios, addressing domain mismatch issues with filtering and augmentation, leading to modest performance gains.

Contribution

It introduces two remedies for domain mismatch in pseudo-labeling for joint speech transcription and translation: pseudo-label filtering and data augmentation. Both improve performance without requiring additional supervision.

Findings

01 Up to 0.6% absolute WER improvement.

02 Up to 2.2 BLEU points of improvement.

03 Pseudo-label filtering and data augmentation mitigate the domain mismatch.

Abstract

Self-training has been shown to be helpful in addressing data scarcity for many domains, including vision, speech, and language. Specifically, self-training, or pseudo-labeling, labels unsupervised data and adds that to the training pool. In this work, we investigate and use pseudo-labeling for a recently proposed novel setup: joint transcription and translation of speech, which suffers from an absence of sufficient data resources. We show that under such data-deficient circumstances, the unlabeled data can significantly vary in domain from the supervised data, which results in pseudo-label quality degradation. We investigate two categories of remedies that require no additional supervision and target the domain mismatch: pseudo-label filtering and data augmentation. We show that pseudo-label analysis and processing as such results in additional gains on top of the vanilla…
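As a rough illustration of the pseudo-labeling and filtering idea described in the abstract, the following minimal sketch labels unlabeled inputs with a model, discards low-scoring pseudo-labels, and adds the rest to the training pool. All names here (`pseudo_label`, `build_training_pool`, `toy_predict`, `min_score`) are hypothetical, and the confidence-threshold filter is an assumed stand-in; the abstract does not say which filtering criterion the authors use.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input id, label)


def pseudo_label(
    predict: Callable[[str], Tuple[str, float]],
    unlabeled: List[str],
    min_score: float,
) -> List[Example]:
    """Label unlabeled inputs and keep predictions scoring at least min_score."""
    kept: List[Example] = []
    for item in unlabeled:
        label, score = predict(item)
        # Assumed filter: a simple confidence threshold (not the paper's criterion).
        if score >= min_score:
            kept.append((item, label))
    return kept


def build_training_pool(labeled: List[Example], pseudo: List[Example]) -> List[Example]:
    """Combine supervised examples with the filtered pseudo-labeled ones."""
    return list(labeled) + list(pseudo)


def toy_predict(item: str) -> Tuple[str, float]:
    # Hypothetical stand-in for a trained model: returns (hypothesis, confidence).
    table = {"audio_a": ("hello", 0.9), "audio_b": ("wrld", 0.3)}
    return table[item]


labeled = [("audio_0", "hi")]
pseudo = pseudo_label(toy_predict, ["audio_a", "audio_b"], min_score=0.5)
pool = build_training_pool(labeled, pseudo)
```

In a real self-training setup the model would then be retrained on `pool`, and the threshold would be tuned on held-out data.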

Taxonomy

Topics: Multimodal Machine Learning Applications · Speech Recognition and Synthesis · Speech and Audio Processing