Reciprocal Supervised Learning Improves Neural Machine Translation
Minkai Xu, Mingxuan Wang, Zhouhan Lin, Hao Zhou, Weinan Zhang, Lei Li

TL;DR
Reciprocal Supervised Learning (RSL) improves neural machine translation by having multiple models each generate pseudo parallel data and then training every model on the combined synthetic corpus, exploiting the models' different inductive biases to gain accuracy at lower cost than an ensemble.
Contribution
This paper introduces RSL, a novel cooperative training method that improves NMT by jointly exploiting multiple models' agreement, surpassing previous knowledge distillation approaches.
Findings
RSL significantly improves translation accuracy on multiple benchmarks.
It outperforms traditional knowledge distillation and ensemble methods.
RSL is more computationally efficient than ensemble approaches.
Abstract
Despite the recent success on image classification, self-training has only achieved limited gains on structured prediction tasks such as neural machine translation (NMT). This is mainly due to the compositionality of the target space, where the far-away prediction hypotheses lead to the notorious reinforced mistake problem. In this paper, we revisit the utilization of multiple diverse models and present a simple yet effective approach named Reciprocal-Supervised Learning (RSL). RSL first exploits individual models to generate pseudo parallel data, and then cooperatively trains each model on the combined synthetic corpus. RSL leverages the fact that different parameterized models have different inductive biases, and better predictions can be made by jointly exploiting the agreement among each other. Unlike the previous knowledge distillation methods built upon a much stronger teacher,…
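The two steps described in the abstract (each model generates pseudo parallel data, then every model trains on the pooled corpus) can be sketched as follows. This is a toy illustration, not the authors' implementation: `ToyTranslator`, `generate_pseudo_corpus`, and `reciprocal_round` are hypothetical names, the word-for-word lexicon stands in for a real NMT model, and the choice of source sentences is an assumption.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


class ToyTranslator:
    """Hypothetical stand-in for an NMT model: a word-for-word lexicon."""

    def __init__(self, lexicon: Dict[str, str]) -> None:
        self.lexicon = dict(lexicon)

    def translate(self, source: str) -> str:
        # Words missing from the lexicon are copied through unchanged.
        return " ".join(self.lexicon.get(w, w) for w in source.split())

    def train(self, corpus: Sequence[Pair]) -> None:
        # Toy "training": memorise word pairs not yet known, ignoring
        # copied-through words because they carry no translation signal.
        for src, tgt in corpus:
            for s, t in zip(src.split(), tgt.split()):
                if s != t:
                    self.lexicon.setdefault(s, t)


def generate_pseudo_corpus(
    models: Sequence[ToyTranslator], sources: Sequence[str]
) -> List[Pair]:
    # Step 1: every model translates every source sentence.
    return [(src, m.translate(src)) for m in models for src in sources]


def reciprocal_round(
    models: Sequence[ToyTranslator],
    real_pairs: Sequence[Pair],
    sources: Sequence[str],
) -> None:
    # All pseudo data is produced before any model is updated.
    combined = list(real_pairs) + generate_pseudo_corpus(models, sources)
    # Step 2: each model trains on the combined synthetic corpus.
    for m in models:
        m.train(combined)


# Two models with complementary knowledge teach each other.
a = ToyTranslator({"hallo": "hello"})
b = ToyTranslator({"welt": "world"})
reciprocal_round([a, b], real_pairs=[], sources=["hallo welt"])
print(a.translate("hallo welt"))  # prints "hello world"
```

Neither toy model is a stronger teacher here; each fills a gap the other has, which is the sense in which the supervision is reciprocal.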
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Knowledge Distillation
