Improving Sequence-to-Sequence Learning via Optimal Transport
Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao, Zhang, Bai Li, Dinghan Shen, Changyou Chen, Lawrence Carin

TL;DR
This paper introduces a novel sequence-to-sequence training method using optimal transport to incorporate global semantic information, improving performance across various NLP tasks.
Contribution
It proposes a sequence-level supervision approach based on optimal transport, enhancing the modeling of semantic structures beyond local syntactic patterns.
Findings
Consistent improvements in machine translation, summarization, and image captioning.
Effective global semantic guidance via optimal transport.
The method aligns model output distribution with ground truth using Wasserstein gradient flow.
Abstract
Sequence-to-sequence models are commonly trained via maximum likelihood estimation (MLE). However, standard MLE training considers a word-level objective, predicting the next word given the previous ground-truth partial sentence. This procedure focuses on modeling local syntactic patterns, and may fail to capture long-range semantic structure. We present a novel solution to alleviate these issues. Our approach imposes global sequence-level guidance via new supervision based on optimal transport, enabling the overall characterization and preservation of semantic features. We further show that this method can be understood as a Wasserstein gradient flow trying to match our model to the ground truth sequence distribution. Extensive experiments are conducted to validate the utility of the proposed approach, showing consistent improvements over a wide variety of NLP tasks, including machine…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
