On Systematic Style Differences between Unsupervised and Supervised MT and an Application for High-Resource Machine Translation
Kelly Marchisio, Markus Freitag, David Grangier

TL;DR
This paper compares supervised and unsupervised machine translation systems of similar quality, finds systematic differences in the fluency and structure of their output, and proposes a combined system that improves adequacy and fluency as rated by human evaluators.
Contribution
It identifies systematic style differences between supervised and unsupervised MT systems of similar quality and introduces a way to combine the benefits of both into a single system that human evaluators rate higher on adequacy and fluency.
Findings
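Unsupervised MT output is more fluent than supervised MT output of similar quality.
Unsupervised MT output differs more in structure from human translation than supervised MT output does.
A single system combining the benefits of both methods improves adequacy and fluency as rated by human evaluators.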
Abstract
Modern unsupervised machine translation (MT) systems reach reasonable translation quality under clean and controlled data conditions. As the performance gap between supervised and unsupervised MT narrows, it is interesting to ask whether the different training methods result in systematically different output beyond what is visible via quality metrics like adequacy or BLEU. We compare translations from supervised and unsupervised MT systems of similar quality, finding that unsupervised output is more fluent and more structurally different in comparison to human translation than is supervised MT. We then demonstrate a way to combine the benefits of both methods into a single system which results in improved adequacy and fluency as rated by human evaluators. Our results open the door to interesting discussions about how supervised and unsupervised MT might be different yet…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research
