A Technical Report: BUT Speech Translation Systems

Hari Krishna Vydana; Lukas Burget; Jan Cernocky

arXiv:2010.11593 · cs.CL · October 23, 2020

TL;DR

This paper presents BUT's English→German offline speech translation systems. They jointly train the ASR and MT modules with the ASR objective as an auxiliary loss, which reduces the degradation observed when translating ASR hypotheses instead of reference text.

Contribution

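The paper introduces a joint training approach in which the ASR and MT modules are connected through neural hidden representations. It builds on the authors' previous work on jointly trained transformers and aims to reduce the translation degradation caused by ASR errors.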
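One way to write such a joint objective is shown below. This is an illustrative formulation, not the paper's exact equation. Let $X$ be the input speech, $z$ its English transcript, $y$ the German translation, and $\mathbf{H}$ the hidden representations produced by the ASR module. The weight $\lambda \ge 0$ on the auxiliary ASR term is an assumed hyperparameter.

```latex
\mathcal{L}(\theta_{\mathrm{ASR}}, \theta_{\mathrm{MT}})
  = \underbrace{-\log p_{\theta_{\mathrm{MT}}}\!\left(y \mid \mathbf{H}\right)}_{\text{translation loss}}
  \;+\; \lambda \underbrace{\left(-\log p_{\theta_{\mathrm{ASR}}}\!\left(z \mid X\right)\right)}_{\text{auxiliary ASR loss}},
\qquad \mathbf{H} = \mathrm{ASR}_{\theta_{\mathrm{ASR}}}(X)
```

$\mathbf{H}$ is continuous, so the gradient of the translation term reaches $\theta_{\mathrm{ASR}}$ as well as $\theta_{\mathrm{MT}}$.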

Findings

01

Joint training reduces translation degradation from ASR errors.

02

Ensembling further improves translation accuracy.

03

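The jointly trained model has an end-to-end differentiable path with respect to the final translation objective, while still using the ASR objective for better optimization.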
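A toy one-dimensional example shows why passing hidden representations differs from passing decoded ASR text. This is a minimal sketch under simplifying assumptions, not the paper's model. The scalar `h` stands in for an ASR hidden state, and `mt_loss` is a hypothetical smooth downstream loss. Python's `round` stands in for the discrete decoding step of a conventional cascade. A central finite difference gives a nonzero gradient through the continuous path and a zero gradient through the discrete one.

```python
from typing import Callable


def mt_loss(x: float, target: float = 2.0) -> float:
    """Hypothetical smooth downstream (MT) loss: squared error to a target."""
    return (x - target) ** 2


def via_hidden(h: float) -> float:
    """Continuous path: the MT stand-in consumes the ASR hidden value directly."""
    return mt_loss(h)


def via_decoded(h: float) -> float:
    """Discrete path: the hidden value is first 'decoded' (rounded) to a symbol."""
    return mt_loss(float(round(h)))


def finite_diff(f: Callable[[float], float], x: float, eps: float = 1e-4) -> float:
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


h = 1.3
print(finite_diff(via_hidden, h))   # approximately -1.4, i.e. 2 * (1.3 - 2.0)
print(finite_diff(via_decoded, h))  # 0.0: the rounding step blocks the gradient
```

In a real system the discrete step is beam search or argmax over a vocabulary rather than rounding. The effect is the same: no useful gradient passes through it.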

Abstract

This paper describes BUT's speech translation systems, which are English→German offline speech translation systems based on our previous work \cite{Jointly_trained_transformers}. End-to-end and cascade (ASR-MT) spoken language translation (SLT) systems are reaching comparable performance. However, a large degradation is observed when translating ASR hypotheses compared to translating the oracle input text. To reduce this degradation, we jointly train the ASR and MT modules with the ASR objective as an auxiliary loss. The two networks are connected through neural hidden representations. As a result, the model has an end-to-end differentiable path with respect to the final objective function and also uses the ASR objective for better optimization. During inference, both modules (i.e., ASR and MT) are connected through the hidden representations…
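A training objective with the ASR loss as an auxiliary term can be sketched as a weighted sum of two token-level cross-entropy losses. This minimal, framework-free illustration rests on assumptions. The weight `asr_weight`, the helper names, and the toy probability tables are hypothetical and do not come from the paper, and the abstract does not state a weighting. In a real system an autodiff framework would compute both losses on the outputs of the jointly trained modules, so gradients from the MT loss also reach the ASR module.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean negative log-likelihood of the target token at each position.

    probs[t][k] is the predicted probability of token k at position t.
    """
    if len(probs) != len(targets):
        raise ValueError("one target per position is required")
    nll = [-math.log(p[y]) for p, y in zip(probs, targets)]
    return sum(nll) / len(nll)


def joint_loss(mt_probs, mt_targets, asr_probs, asr_targets, asr_weight: float = 0.5) -> float:
    """Hypothetical joint objective: MT loss plus a weighted auxiliary ASR loss."""
    mt = cross_entropy(mt_probs, mt_targets)
    asr = cross_entropy(asr_probs, asr_targets)
    return mt + asr_weight * asr


# Toy distributions over a 3-token vocabulary (illustrative values only).
mt_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
asr_probs = [[0.5, 0.25, 0.25]]
print(joint_loss(mt_probs, [0, 1], asr_probs, [0], asr_weight=0.5))
```

Setting `asr_weight=0.0` reduces the objective to the translation loss alone. Larger values push the shared path toward producing hidden representations that also support transcription.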

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis