Multi-Granularity Optimization for Non-Autoregressive Translation
Yafu Li, Leyang Cui, Yongjing Yin, Yue Zhang

TL;DR
This paper introduces multi-granularity optimization for non-autoregressive translation (NAT). Instead of matching the reference token by token, training integrates feedback on translation segments of various granularities, which mitigates the effect of NAT's independence assumption and improves translation quality.
Contribution
The paper proposes a multi-granularity optimization method that improves non-autoregressive translation by integrating feedback on translation segments of multiple granularities.
Findings
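Significantly outperforms baseline models trained with cross-entropy loss on four WMT benchmarks
Best performance on WMT'16 En-Ro among fully non-autoregressive models
Highly competitive results on WMT'14 En-De among fully non-autoregressive models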
Abstract
Despite low latency, non-autoregressive machine translation (NAT) suffers severe performance deterioration due to the naive independence assumption. This assumption is further strengthened by cross-entropy loss, which encourages a strict match between the hypothesis and the reference token by token. To alleviate this issue, we propose multi-granularity optimization for NAT, which collects model behaviors on translation segments of various granularities and integrates feedback for backpropagation. Experiments on four WMT benchmarks show that the proposed method significantly outperforms the baseline models trained with cross-entropy loss, and achieves the best performance on WMT'16 En-Ro and highly competitive results on WMT'14 En-De for fully non-autoregressive translation.
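The independence assumption mentioned in the abstract can be illustrated with a small sketch. This is not the paper's method: it is a toy example with made-up probabilities and hypothetical helper names (`sequence_log_prob`, `token_cross_entropy`), assuming a NAT model that predicts each target position independently given the source.

```python
import math

# Toy per-position distributions p(y_t | x) for a two-token target.
# Made-up numbers (an assumption for illustration): the source has two
# valid translations, "thank you" and "many thanks", each equally likely.
position_probs = [
    {"thank": 0.5, "many": 0.5},
    {"you": 0.5, "thanks": 0.5},
]

def sequence_log_prob(tokens, probs=position_probs):
    """log p(Y | x) under the independence assumption: sum_t log p(y_t | x)."""
    return sum(math.log(probs[t][tok]) for t, tok in enumerate(tokens))

def token_cross_entropy(reference, probs=position_probs):
    """Token-by-token cross-entropy against a single reference."""
    return -sequence_log_prob(reference, probs)

# A valid translation and an inconsistent mix of the two valid ones.
valid = sequence_log_prob(["thank", "you"])
mixed = sequence_log_prob(["thank", "thanks"])
```

Because each position is scored on its own, the inconsistent output "thank thanks" receives exactly the same probability as the valid "thank you" in this toy setup. Token-level cross-entropy against a single reference does not penalize that kind of mismatch between positions, which is the motivation the abstract gives for collecting feedback on longer segments.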
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Data Classification
