Non-Parametric Domain Adaptation for End-to-End Speech Translation
Yichao Du, Weizhi Wang, Zhirui Zhang, Boxing Chen, Tong Xu, Jun Xie, and Enhong Chen

TL;DR
This paper introduces a non-parametric domain adaptation method for end-to-end speech translation that leverages domain-specific text data and k-nearest-neighbor search to improve translation quality without extensive in-domain speech data.
Contribution
It proposes a novel non-parametric approach that uses an additional encoder and kNN classifier for effective domain adaptation in E2E speech translation.
Findings
Achieves a 12.82 BLEU improvement over the baseline on the Europarl-ST benchmark.
Outperforms in-domain fine-tuning methods.
Effectively utilizes only text translation data for domain adaptation.
Abstract
End-to-End Speech Translation (E2E-ST) has received increasing attention due to its potential for less error propagation, lower latency, and fewer parameters. However, the effectiveness of neural approaches to this task is severely limited by the available training corpus, especially for domain adaptation, where in-domain triplet training data is scarce or nonexistent. In this paper, we propose a novel non-parametric method that leverages a domain-specific text translation corpus to achieve domain adaptation for the E2E-ST system. To this end, we first incorporate an additional encoder into the pre-trained E2E-ST model to realize text translation modelling, and then unify the decoder's output representation for text and speech translation tasks by reducing the corresponding representation mismatch in the available triplet training data. During domain adaptation, a k-nearest-neighbor…
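The k-nearest-neighbor step mentioned in the abstract can be illustrated with a minimal sketch of generic kNN-based retrieval at decoding time. This is not the authors' implementation: it assumes a datastore of (decoder representation, next target token) pairs, squared L2 distance, a softmax over negative distances, and linear interpolation with the model's own distribution, as is common in kNN-augmented translation. The function names, toy data, and the values of `k`, `temperature`, and `lam` are all hypothetical.

```python
import numpy as np

def knn_distribution(query, keys, values, vocab_size, k=4, temperature=10.0):
    """Turn the k nearest datastore entries into a distribution over target tokens.

    Assumed layout (not from the paper): `keys` holds one decoder
    representation per row, `values` holds the matching next-token ids.
    """
    # Squared L2 distance from the query to every stored key.
    dists = np.sum((keys - query) ** 2, axis=1)
    k = min(k, len(keys))
    nearest = np.argsort(dists)[:k]
    # Softmax over negative distances; subtracting the max keeps exp() stable.
    logits = -dists[nearest] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    probs = np.zeros(vocab_size)
    # Neighbors that share a target token pool their weight.
    np.add.at(probs, values[nearest], weights)
    return probs

def interpolate(model_probs, knn_probs, lam=0.5):
    """Mix the retrieved distribution with the model's own prediction."""
    return lam * knn_probs + (1.0 - lam) * model_probs

# Toy datastore: two clusters of representations, mapped to tokens 1 and 2.
keys = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
values = np.array([1, 1, 2, 2])
query = np.array([0.0, 0.2])

knn_probs = knn_distribution(query, keys, values, vocab_size=4, k=2)
model_probs = np.array([0.1, 0.2, 0.6, 0.1])  # model alone prefers token 2
combined = interpolate(model_probs, knn_probs, lam=0.5)
print(combined.argmax())
```

In this toy case the model alone would pick token 2, but both retrieved neighbors vote for token 1, so the interpolated distribution shifts toward the in-domain evidence. A real system would replace the brute-force distance computation with an approximate nearest-neighbor index.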
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
