Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation

Hirofumi Inaguma; Tatsuya Kawahara; Shinji Watanabe

arXiv:2104.06457 · cs.CL · April 15, 2021 · 5 cites

TL;DR

This paper introduces a bidirectional knowledge distillation approach for end-to-end speech translation, utilizing both forward and backward sequence-level distillation from text-based NMT models to enhance translation accuracy.

Contribution

It proposes bidirectional SeqKD, which combines forward SeqKD with backward SeqKD, an auxiliary task in which the model predicts paraphrased transcriptions, so that speech translation models make better use of source-language information.

Findings

01 Bidirectional SeqKD improves translation performance consistently.

02 Forward and backward SeqKD provide complementary benefits.

03 The approach enhances models of varying capacities.

Abstract

A conventional approach to improving the performance of end-to-end speech translation (E2E-ST) models is to leverage the source transcription via pre-training and joint training with automatic speech recognition (ASR) and neural machine translation (NMT) tasks. However, since the input modalities are different, it is difficult to leverage source language text successfully. In this work, we focus on sequence-level knowledge distillation (SeqKD) from external text-based NMT models. To leverage the full potential of the source language information, we propose backward SeqKD, SeqKD from a target-to-source backward NMT model. To this end, we train a bilingual E2E-ST model to predict paraphrased transcriptions as an auxiliary task with a single decoder. The paraphrases are generated from the translations in bitext via back-translation. We further propose bidirectional SeqKD in which SeqKD…
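
The data-preparation side of the approach described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `forward_nmt` and `backward_nmt` are hypothetical callables standing in for the external text-based NMT teachers, the `<2tgt>` and `<2src>` language tags are an assumed way of steering a single decoder, and the toy teachers at the bottom only mark which model produced each string.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Example:
    utt_id: str       # identifier of the source speech utterance
    lang_tag: str     # assumed tag telling the single decoder which language to emit
    target_text: str  # text the E2E-ST model is trained to predict


def build_bidirectional_seqkd_targets(
    bitext: Iterable[Tuple[str, str, str]],
    forward_nmt: Callable[[str], str],
    backward_nmt: Callable[[str], str],
    src_tag: str = "<2src>",
    tgt_tag: str = "<2tgt>",
) -> List[Example]:
    """Create two decoder targets per utterance from (utt_id, transcript, translation)."""
    examples: List[Example] = []
    for utt_id, transcript, translation in bitext:
        # Forward SeqKD: the source-to-target teacher translates the transcript.
        distilled_translation = forward_nmt(transcript)
        # Backward SeqKD: the target-to-source teacher back-translates the
        # reference translation, giving a paraphrase of the transcript.
        paraphrased_transcript = backward_nmt(translation)
        examples.append(Example(utt_id, tgt_tag, distilled_translation))
        examples.append(Example(utt_id, src_tag, paraphrased_transcript))
    return examples


# Toy stand-ins for the teachers; real NMT models would be plugged in here.
def toy_forward(text: str) -> str:
    return f"FWD({text})"


def toy_backward(text: str) -> str:
    return f"BWD({text})"


examples = build_bidirectional_seqkd_targets(
    [("utt1", "hello world", "hallo welt")], toy_forward, toy_backward
)
for ex in examples:
    print(ex.utt_id, ex.lang_tag, ex.target_text)
```

Each utterance yields one target-language example (forward SeqKD) and one source-language paraphrase example (backward SeqKD), which is the pairing a bilingual single-decoder model would consume during training.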

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Knowledge Distillation