Unified Segment-to-Segment Framework for Simultaneous Sequence Generation
Shaolei Zhang, Yang Feng

TL;DR
This paper introduces a unified framework for simultaneous sequence generation that adaptively learns optimal source-target mappings, improving performance and generality across tasks like streaming speech recognition and machine translation.
Contribution
The proposed Seg2Seg framework is the first to unify various simultaneous sequence generation tasks with an adaptive, learnable mapping approach, surpassing task-specific heuristics.
Findings
Achieves state-of-the-art results on multiple tasks
Demonstrates better generality across different sequence generation tasks
Effectively learns optimal generation moments through expectation training
Abstract
Simultaneous sequence generation is a pivotal task for real-time scenarios, such as streaming speech recognition, simultaneous machine translation and simultaneous speech translation, where the target sequence is generated while receiving the source sequence. The crux of achieving high-quality generation with low latency lies in identifying the optimal moments for generating, accomplished by learning a mapping between the source and target sequences. However, existing methods often rely on task-specific heuristics for different sequence types, limiting the model's capacity to adaptively learn the source-target mapping and hindering the exploration of multi-task learning for various simultaneous tasks. In this paper, we propose a unified segment-to-segment framework (Seg2Seg) for simultaneous sequence generation, which learns the mapping in an adaptive and unified manner. During the…
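As a rough illustration of the setting the abstract describes (emitting target output while the source is still arriving), the loop below is a minimal sketch. It is not the Seg2Seg method: the emission rule here is a fixed placeholder heuristic, whereas the paper learns when to generate. All names (`simultaneous_generate`, `should_emit`, `generate_segment`, `emit_every_two`, `uppercase`) are hypothetical and chosen only for this example.

```python
from typing import Callable, Iterable, List, Tuple


def simultaneous_generate(
    source_stream: Iterable[str],
    should_emit: Callable[[List[str]], bool],
    generate_segment: Callable[[List[str]], List[str]],
) -> Tuple[List[str], List[Tuple[int, List[str]]]]:
    """Consume source tokens one at a time and emit target segments on the fly.

    `should_emit` and `generate_segment` are hypothetical stand-ins for a
    policy and a model; they are assumptions, not the paper's components.
    """
    received: List[str] = []
    target: List[str] = []
    trace: List[Tuple[int, List[str]]] = []  # (source tokens read, segment emitted)
    segment_start = 0

    for token in source_stream:
        received.append(token)
        pending = received[segment_start:]
        if should_emit(pending):
            segment = generate_segment(pending)
            target.extend(segment)
            trace.append((len(received), segment))
            segment_start = len(received)

    # Flush whatever is still pending once the source has ended.
    if segment_start < len(received):
        segment = generate_segment(received[segment_start:])
        target.extend(segment)
        trace.append((len(received), segment))

    return target, trace


def emit_every_two(pending: List[str]) -> bool:
    # Placeholder heuristic: emit after every two unread source tokens.
    return len(pending) >= 2


def uppercase(pending: List[str]) -> List[str]:
    # Placeholder "model": maps each pending source token to one target token.
    return [t.upper() for t in pending]


target, trace = simultaneous_generate(["a", "b", "c", "d", "e"], emit_every_two, uppercase)
```

The `trace` records how many source tokens had been read at each emission, which is the kind of information latency measures are built on. Swapping `emit_every_two` for a different rule changes the quality/latency trade-off without touching the loop.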
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Multimodal Machine Learning Applications
