TL;DR
This paper studies re-translation for simultaneous translation of long-form speech. The growing source is repeatedly translated from scratch, which gives low latency and high final quality, and simple inference heuristics improve stability across multiple languages.
Contribution
It applies a re-translation approach to simultaneous translation of long-form speech, improving stability and scalability across multiple languages.
Findings
Re-translation yields low latency and high final quality.
Stability improvements are effective across seven languages.
The pipeline combines industry-grade speech recognition and translation tools with simple inference heuristics that improve stability.
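This summary does not name the stability heuristics. As an illustrative assumption, the sketch below shows one heuristic often paired with re-translation: withholding the last k tokens of every intermediate translation, because those tokens are the most likely to change once more source arrives. The helper `mask_k` and its parameters are hypothetical and are not taken from the paper.

```python
from typing import List


def mask_k(translation: List[str], k: int, is_final: bool) -> List[str]:
    """Hide the last k tokens of an intermediate translation (hypothetical helper).

    Withholding the unstable tail trades a little latency for fewer visible
    revisions. Once the source is complete (is_final=True), the full
    translation is shown.
    """
    if is_final or k <= 0:
        return list(translation)
    # If k covers the whole hypothesis, show nothing yet.
    return translation[:-k] if k < len(translation) else []


if __name__ == "__main__":
    partial = ["Das", "ist", "ein", "kleiner"]
    print(mask_k(partial, k=2, is_final=False))  # ['Das', 'ist']
    print(mask_k(partial, k=2, is_final=True))   # ['Das', 'ist', 'ein', 'kleiner']
```

Larger k gives steadier captions but delays every displayed token, so k acts as a latency/stability knob.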
Abstract
We investigate the problem of simultaneous machine translation of long-form speech content. We target a continuous speech-to-text scenario, generating translated captions for a live audio feed, such as a lecture or play-by-play commentary. As this scenario allows for revisions to our incremental translations, we adopt a re-translation approach to simultaneous translation, where the source is repeatedly translated from scratch as it grows. This approach naturally exhibits very low latency and high final quality, but at the cost of incremental instability as the output is continuously refined. We experiment with a pipeline of industry-grade speech recognition and translation tools, augmented with simple inference heuristics to improve stability. We use TED Talks as a source of multilingual test data, developing our techniques on English-to-German spoken language translation. Our…
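A minimal sketch of the re-translation loop described above, assuming a hypothetical `translate` function that maps a full source prefix to target tokens. Instability is measured here with an erasure count (tokens of the previous output that must be deleted before the new output can be shown). This metric and its normalization by final output length are assumptions for illustration, not definitions quoted from the paper.

```python
from typing import Callable, List, Sequence


def retranslate_stream(
    source_prefixes: Sequence[str],
    translate: Callable[[str], List[str]],
) -> List[List[str]]:
    """Translate each growing source prefix from scratch (re-translation).

    `translate` is a hypothetical stand-in for a full-sentence MT system; it
    is called on the whole prefix every time, with no state carried over.
    """
    return [translate(prefix) for prefix in source_prefixes]


def erasure(prev: List[str], curr: List[str]) -> int:
    """Tokens of `prev` that must be deleted before `curr` can be shown."""
    common = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        common += 1
    return len(prev) - common


def normalized_erasure(outputs: List[List[str]]) -> float:
    """Total erasure over the stream divided by the final output length."""
    if not outputs:
        return 0.0
    total = sum(erasure(p, c) for p, c in zip(outputs, outputs[1:]))
    return total / max(len(outputs[-1]), 1)


if __name__ == "__main__":
    # Toy "translator": a lookup table standing in for a real MT system.
    table = {
        "I": ["Ich"],
        "I saw": ["Ich", "sah"],
        "I saw her": ["Ich", "sah", "sie"],
        "I saw her duck": ["Ich", "sah", "ihre", "Ente"],
    }
    prefixes = ["I", "I saw", "I saw her", "I saw her duck"]
    outputs = retranslate_stream(prefixes, lambda s: table[s])
    print(normalized_erasure(outputs))  # 0.25
```

In the toy stream, only the last update revises earlier output ("sie" becomes "ihre Ente"), erasing one token over a four-token final translation. This is the kind of instability that the stability heuristics aim to reduce.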
