On the Information Redundancy in Non-Autoregressive Translation
Zhihao Wang, Longyue Wang, Jinsong Su, Junfeng Yao, Zhaopeng Tu

TL;DR
This paper investigates various types of information redundancy errors in non-autoregressive translation models, introduces automatic metrics for their evaluation, and enhances understanding of model errors beyond traditional repetition metrics.
Contribution
The study identifies new types of redundancy errors in NAT models and proposes automatic metrics for their evaluation, improving analysis of model performance.
Findings
Advanced NAT models exhibit multiple types of information redundancy errors, not only token repetition.
Traditional repetition metrics are insufficient to capture all errors.
Proposed metrics enable comprehensive evaluation of redundancy errors.
Abstract
Token repetition is a typical form of the multi-modality problem in fully non-autoregressive translation (NAT). In this work, we revisit the multi-modality problem in recently proposed NAT models. Our study reveals that these advanced models have introduced other types of information redundancy errors, which cannot be measured by the conventional metric, the continuous repetition ratio. By manually annotating the NAT outputs, we identify two types of information redundancy errors that correspond well to the lexical and reordering multi-modality problems. Since human annotation is time-consuming and labor-intensive, we propose automatic metrics to evaluate the two types of redundancy errors. Our metrics allow future studies to evaluate new methods and gain a more comprehensive understanding of their effectiveness.
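To make the conventional metric named in the abstract concrete, here is a minimal sketch of a continuous repetition ratio. It assumes the common definition (the fraction of output tokens that are identical to the token immediately before them); the abstract does not spell out the exact formula, so the normalization, the function name `continuous_repetition_ratio`, and the example sentences are illustrative assumptions, not taken from the paper.

```python
from typing import List


def continuous_repetition_ratio(hypotheses: List[List[str]]) -> float:
    """Fraction of tokens that repeat the immediately preceding token.

    Assumed definition: consecutive duplicates divided by total tokens,
    pooled over all hypotheses. The paper may normalize differently.
    """
    repeated = 0
    total = 0
    for tokens in hypotheses:
        total += len(tokens)
        # Compare each token with its left neighbour.
        repeated += sum(1 for prev, cur in zip(tokens, tokens[1:]) if prev == cur)
    return repeated / total if total else 0.0


# Hypothetical NAT outputs with consecutive token repetition.
repetitive = [
    "we we need to go".split(),
    "thank you very very much".split(),
]
ratio_repetitive = continuous_repetition_ratio(repetitive)

# Hypothetical output that is redundant ("a lot of" and "many" say the
# same thing) but contains no consecutive duplicate token.
redundant = ["there are a lot of many farmers".split()]
ratio_redundant = continuous_repetition_ratio(redundant)

print(ratio_repetitive, ratio_redundant)
```

Under this assumed definition, the second example scores zero even though it repeats information, which is the kind of gap the abstract describes: redundancy that is not a verbatim adjacent repeat is invisible to the metric.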
Taxonomy
Topics: Natural Language Processing Techniques
