TL;DR
This paper revisits recent criticisms of reinforcement learning (policy gradient methods) for neural machine translation, in particular those of Choshen et al. (2020), and presents empirical counter-evidence that highlights the importance of exploration and reward scaling.
Contribution
It offers a comprehensive empirical study that revisits prior claims about RL weaknesses in NMT, emphasizing the roles of exploration and reward scaling.
Findings
Exploration and reward scaling are crucial for RL success in NMT.
Experiments under a wider range of configurations provide counter-evidence to previously claimed weaknesses of RL for NMT.
RL can be effective in both in-domain and cross-domain NMT tasks.
Abstract
Policy gradient algorithms have found wide adoption in NLP, but have recently become subject to criticism, doubting their suitability for NMT. Choshen et al. (2020) identify multiple weaknesses and suspect that their success is determined by the shape of output distributions rather than the reward. In this paper, we revisit these claims and study them under a wider range of configurations. Our experiments on in-domain and cross-domain adaptation reveal the importance of exploration and reward scaling, and provide empirical counter-evidence to these claims.
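To make the two ingredients named in the abstract concrete, here is a minimal sketch of a policy gradient (REINFORCE-style) surrogate loss on a toy problem. It is not the authors' implementation: the summary above does not specify how rewards are scaled or how exploration is controlled, so the min-max scaling, the mean-reward baseline, and the sampling temperature used here are assumptions chosen because they are common options. All function names and the toy numbers are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Softmax with temperature; temperature > 1 flattens the distribution,
    which makes sampling explore lower-scored outputs more often."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_max_scale(rewards):
    """Rescale a batch of rewards to [0, 1] (assumed scaling scheme).
    A batch with identical rewards maps to all zeros."""
    lo, hi = min(rewards), max(rewards)
    if hi == lo:
        return [0.0 for _ in rewards]
    return [(r - lo) / (hi - lo) for r in rewards]


def reinforce_loss(log_probs, rewards, baseline=0.0):
    """Surrogate loss whose gradient is the Monte Carlo policy gradient:
    -mean((reward - baseline) * log p(sample))."""
    terms = [-(r - baseline) * lp for lp, r in zip(log_probs, rewards)]
    return sum(terms) / len(terms)


# Toy setup: three candidate translations with hypothetical scores and rewards.
rng = random.Random(0)
logits = [2.0, 0.5, -1.0]
toy_rewards = [0.31, 0.28, 0.05]  # hypothetical sentence-level rewards

probs = softmax(logits, temperature=2.0)            # exploration knob
samples = rng.choices(range(3), weights=probs, k=4)  # sample from the policy
log_probs = [math.log(probs[i]) for i in samples]
scaled = min_max_scale([toy_rewards[i] for i in samples])
baseline = sum(scaled) / len(scaled)                 # mean-reward baseline
loss = reinforce_loss(log_probs, scaled, baseline=baseline)
```

In this sketch, raising the temperature spreads probability mass over more candidates, and scaling plus a baseline lets samples with below-average reward receive a negative learning signal instead of every sample being reinforced. In a real NMT system the log-probabilities would come from the translation model and the loss would be differentiated by an autodiff framework.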
