Contextualized Translation of Automatically Segmented Speech

Marco Gaido; Mattia Antonino Di Gangi; Matteo Negri; Mauro Cettolo; Marco Turchi

arXiv:2008.02270·cs.CL·August 6, 2020

TL;DR

This paper proposes a context-aware approach to improve speech-to-text translation robustness against sub-optimal, VAD-based audio segmentation by training models on randomly segmented data and incorporating previous segments as context.

Contribution

It introduces a novel context-aware training method that enhances translation quality when dealing with imperfect, VAD-based audio segmentation.

Findings

01. The context-aware model outperforms baseline models by up to 4.25 BLEU points on VAD-segmented input.

02. Training on randomly segmented data improves robustness to the mismatch between training-time and inference-time segmentation.

03. Adding the previous segment as context improves translation quality.
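The two ideas in the findings above, random re-segmentation of the training data and feeding the previous segment as context, can be illustrated with a minimal sketch. It is not the authors' implementation: it works on a plain list of tokens rather than on audio aligned with its translation, and the function names `random_segments` and `pair_with_previous` as well as the `min_len`/`max_len` parameters are hypothetical choices made only for this illustration.

```python
import random
from typing import List, Optional, Tuple


def random_segments(tokens: List[str], min_len: int, max_len: int,
                    rng: random.Random) -> List[List[str]]:
    """Cut a token stream into consecutive chunks of random length.

    Chunk boundaries ignore sentence boundaries, loosely imitating the
    fragments a VAD produces. (Assumption: the real procedure operates on
    audio/text alignments, which this toy version does not model.)
    """
    if min_len < 1 or max_len < min_len:
        raise ValueError("require 1 <= min_len <= max_len")
    segments = []
    i = 0
    while i < len(tokens):
        n = rng.randint(min_len, max_len)  # inclusive on both ends
        segments.append(tokens[i:i + n])   # the last chunk may be shorter
        i += n
    return segments


def pair_with_previous(
    segments: List[List[str]],
) -> List[Tuple[Optional[List[str]], List[str]]]:
    """Pair each segment with the one before it (None for the first)."""
    pairs = []
    previous = None
    for current in segments:
        pairs.append((previous, current))
        previous = current
    return pairs


if __name__ == "__main__":
    words = "so what I want to show you today is something we built last year".split()
    segments = random_segments(words, min_len=2, max_len=5, rng=random.Random(0))
    for context, current in pair_with_previous(segments):
        print(context, "->", current)
```

In a context-aware setup along these lines, each training example would consist of the current segment plus the preceding one, so the model can recover information that an arbitrary cut pushed into the previous fragment.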

Abstract

Direct speech-to-text translation (ST) models are usually trained on corpora segmented at sentence level, but at inference time they are commonly fed with audio split by a voice activity detector (VAD). Since VAD segmentation is not syntax-informed, the resulting segments do not necessarily correspond to well-formed sentences uttered by the speaker but, most likely, to fragments of one or more sentences. This segmentation mismatch considerably degrades the quality of ST models' output. So far, researchers have focused on improving audio segmentation towards producing sentence-like splits. In this paper, instead, we address the issue in the model, making it more robust to a different, potentially sub-optimal segmentation. To this end, we train our models on randomly segmented data and compare two approaches: fine-tuning and adding the previous segment as context. We show that our…

Code & Models

Repositories

mgaido91/FBK-fairseq-ST
PyTorch · Official
