A Technical Report: BUT Speech Translation Systems
Hari Krishna Vydana, Lukas Burget, Jan Cernocky

TL;DR
This paper presents BUT's English-German offline speech translation systems that improve performance by jointly training ASR and MT modules with an auxiliary loss, reducing degradation when translating ASR hypotheses.
Contribution
The paper introduces a joint training approach for ASR and MT modules using hidden representations, enhancing speech translation performance over previous methods.
Findings
Joint training reduces translation degradation from ASR errors.
Ensembling further improves translation accuracy.
The jointly trained model keeps an end-to-end differentiable path to the final translation objective, and uses the ASR objective for better optimization.
Abstract
This paper describes BUT's speech translation systems, which are English-German offline speech translation systems based on our previous work \cite{Jointly_trained_transformers}. Though End-to-End and cascade~(ASR-MT) spoken language translation~(SLT) systems are reaching comparable performance, a large degradation is observed when translating ASR hypotheses compared to the oracle input text. To reduce this performance degradation, we have jointly trained the ASR and MT modules with the ASR objective as an auxiliary loss. Both networks are connected through the neural hidden representations. This model has an End-to-End differentiable path with respect to the final objective function and also utilizes the ASR objective for better optimization. During inference, both modules (i.e., ASR and MT) are connected through the hidden representations…
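The training objective described in the abstract can be illustrated with a minimal sketch. The abstract says only that the ASR objective serves as an auxiliary loss, so the weighted-sum form, the function names, and the `asr_weight` value below are all assumptions made for illustration, not details taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits_per_step, target_ids):
    """Mean negative log-likelihood of the target tokens."""
    nll = 0.0
    for logits, target in zip(logits_per_step, target_ids):
        nll -= math.log(softmax(logits)[target])
    return nll / len(target_ids)


def joint_loss(mt_logits, mt_targets, asr_logits, asr_targets, asr_weight=0.3):
    """Hypothetical joint objective: MT loss plus a weighted auxiliary ASR loss.

    asr_weight is an illustrative value, not one reported by the authors.
    """
    mt = cross_entropy(mt_logits, mt_targets)
    asr = cross_entropy(asr_logits, asr_targets)
    return mt + asr_weight * asr, mt, asr


# Toy example: two decoding steps per module, vocabulary of 4 tokens.
mt_logits = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
asr_logits = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
total, mt, asr = joint_loss(mt_logits, [1, 2], asr_logits, [0, 3])
print(total, mt, asr)
```

In a real system both losses would be computed inside an automatic-differentiation framework, so that gradients of the MT loss flow back through the shared hidden representations into the ASR module while the ASR loss supervises that module directly.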
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
