Transfer Reward Learning for Policy Gradient-Based Text Generation
James O'Neill, Danushka Bollegala

TL;DR
This paper introduces a transfer learning approach for reward models in policy gradient-based text generation, improving semantic evaluation metrics in image captioning tasks.
Contribution
It proposes a transferable reward learner that enhances policy gradient models by using model-based rewards from sentence similarity tasks, outperforming n-gram overlap measures.
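To make the limitation of n-gram overlap rewards concrete, here is a minimal sketch using clipped unigram precision. This is a simplified stand-in chosen for illustration, not a metric taken from the paper, and the function name and example sentences are hypothetical. It compares a paraphrase and a lexically close but semantically different caption against the same reference.

```python
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision: share of candidate tokens also found in the reference."""
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(cand_tokens)
    # Each candidate token is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_counts[tok]) for tok, count in cand_counts.items())
    return matched / len(cand_tokens)


# Hypothetical captions, for illustration only.
reference = "a man is riding a horse"
paraphrase = "a person rides a pony"      # similar meaning, few shared words
mismatch = "a man is riding a bike"       # different meaning, many shared words

paraphrase_score = unigram_precision(paraphrase, reference)
mismatch_score = unigram_precision(mismatch, reference)
```

In this toy case the semantically wrong caption scores higher than the paraphrase, which is the kind of behaviour a reward model trained on sentence-pair similarity is intended to avoid.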
Findings
Improved semantic similarity scores on the MSCOCO dataset.
Enhanced performance on the Flickr-30k dataset.
Demonstrated the general applicability of transfer learning to reward models.
Abstract
Task-specific scores are often used to optimize for and evaluate the performance of conditional text generation systems. However, such scores are non-differentiable and cannot be used in the standard supervised learning paradigm. Hence, policy gradient methods are used since the gradient can be computed without requiring a differentiable objective. However, we argue that current n-gram overlap based measures that are used as rewards can be improved by using model-based rewards transferred from tasks that directly compare the similarity of sentence pairs. These reward models either output a score of sentence-level syntactic and semantic similarity between entire predicted and target sentences as the expected return, or for intermediate phrases as segmented accumulative rewards. We demonstrate that using a Transferable Reward Learner leads to improved results on semantical…
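The point that policy gradients do not require a differentiable objective can be sketched with a toy REINFORCE loop. This is a minimal illustration under stated assumptions, not the paper's model: the policy is a single softmax over four hypothetical tokens, and `reward` is a made-up black-box score standing in for whatever reward (n-gram overlap or a learned similarity model) is used in practice. Only the log-probability of the sampled action is differentiated; the reward is treated as a plain number.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(token_index: int) -> float:
    """Hypothetical black-box reward: non-differentiable, 1 only for token 2."""
    return 1.0 if token_index == 2 else 0.0


def reinforce(num_tokens=4, steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * num_tokens
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(num_tokens), weights=probs)[0]
        r = reward(action)
        advantage = r - baseline
        # Gradient of log pi(action) w.r.t. the logits is one_hot(action) - probs.
        for k in range(num_tokens):
            grad_log_prob = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_prob
        baseline += 0.05 * (r - baseline)
    return softmax(logits)


final_probs = reinforce()
best_token = final_probs.index(max(final_probs))
```

After training, the policy should place most of its probability on the rewarded token. Swapping `reward` for a different scoring function changes nothing else in the loop, which is what makes it possible to substitute a transferred reward model for an n-gram overlap score.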
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
