Differentiable lower bound for expected BLEU score
Vlad Zhukov, Eugene Golikov, Maksim Kretov

TL;DR
This paper introduces a differentiable lower bound for the expected BLEU score, enabling gradient-based optimization without costly sampling methods like REINFORCE, thus addressing the loss-evaluation mismatch in NLP tasks.
Contribution
It proposes a novel method to compute a differentiable lower bound of expected BLEU score, improving optimization efficiency in NLP models.
Findings
The method provides a computationally efficient alternative to REINFORCE.
It targets the loss-evaluation mismatch, the gap between optimizing a surrogate loss and improving the BLEU score.
The approach lets NLP models be trained by directly optimizing a differentiable lower bound of the expected BLEU score.
Abstract
In natural language processing tasks, the performance of models is often measured with a non-differentiable metric, such as the BLEU score. To use efficient gradient-based methods for optimization, a common workaround is to optimize a surrogate loss function. This approach is effective if optimizing such a loss also improves the target metric. The corresponding problem is referred to as loss-evaluation mismatch. In the present work we propose a method for calculating a differentiable lower bound of the expected BLEU score that does not involve a computationally expensive sampling procedure, such as the one required when using the REINFORCE rule from the reinforcement learning (RL) framework.
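To make the sampling cost mentioned in the abstract concrete, the following is a minimal sketch of the REINFORCE (score-function) gradient estimator on a toy problem. It is not the paper's method: the single categorical "token" distribution, the `reward` vector standing in for a non-differentiable metric, and the function names are all assumptions made for illustration. The toy is small enough that the exact gradient of the expected reward is available in closed form, so the sampled estimate can be compared against it.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def exact_grad(logits, reward):
    """Exact gradient of E_p[reward] with respect to the logits.

    For p = softmax(logits): d/d logit_k sum_j p_j r_j = p_k * (r_k - E_p[r]).
    """
    p = softmax(logits)
    expected = p @ reward
    return p * (reward - expected)

def reinforce_grad(logits, reward, n_samples, rng):
    """Monte Carlo REINFORCE estimate of the same gradient.

    Uses grad log p(x) = onehot(x) - p and averages reward(x) * grad log p(x)
    over sampled tokens x ~ p.
    """
    p = softmax(logits)
    samples = rng.choice(len(p), size=n_samples, p=p)
    score = np.eye(len(p))[samples] - p            # shape (n_samples, K)
    return (reward[samples][:, None] * score).mean(axis=0)

# Hypothetical toy values: three tokens, reward plays the role of the metric.
rng = np.random.default_rng(0)
logits = np.array([0.5, -0.2, 0.1])
reward = np.array([1.0, 0.0, 0.5])

g_exact = exact_grad(logits, reward)
g_est = reinforce_grad(logits, reward, n_samples=20000, rng=rng)
print("exact    :", g_exact)
print("REINFORCE:", g_est)
```

The estimate only approaches the exact gradient as the number of samples grows, and in sequence generation each sample would require decoding a full sentence and scoring it. That per-sample cost is what a closed-form differentiable bound avoids.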
Taxonomy
Topics: Topic Modeling · Reinforcement Learning in Robotics · Machine Learning in Healthcare
Methods: REINFORCE
