A Better Variant of Self-Critical Sequence Training
Ruotian Luo

TL;DR
This paper introduces an improved variant of Self-Critical Sequence Training that changes the baseline function used in the REINFORCE algorithm, improving performance over the greedy decoding baseline at no additional computational cost.
Contribution
The paper proposes a simple change to the choice of baseline function in REINFORCE, which leads to better sequence training performance than the greedy decoding baseline.
Findings
Better performance than the greedy decoding baseline used in Self-Critical Sequence Training.
No extra computational cost.
The only change is the choice of baseline function in REINFORCE.
Abstract
In this work, we present a simple yet better variant of Self-Critical Sequence Training. We make a simple change in the choice of baseline function in the REINFORCE algorithm. Compared to the greedy decoding baseline, the new baseline brings better performance at no extra cost.
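To make the role of the baseline concrete, the sketch below contrasts two ways of turning per-sample rewards into REINFORCE advantages. The first subtracts the reward of a greedy decode, as in standard Self-Critical Sequence Training. The second is an assumption on our part, since the abstract does not spell out the new baseline: it subtracts the mean reward of the other sampled sequences (a leave-one-out average). All function names here are hypothetical and chosen for illustration only.

```python
from typing import List


def greedy_baseline_advantages(sample_rewards: List[float],
                               greedy_reward: float) -> List[float]:
    """SCST-style advantages: sample reward minus the greedy decode's reward."""
    return [r - greedy_reward for r in sample_rewards]


def leave_one_out_advantages(sample_rewards: List[float]) -> List[float]:
    """Assumed variant: sample reward minus the mean reward of the OTHER samples."""
    k = len(sample_rewards)
    if k < 2:
        raise ValueError("need at least two samples for a leave-one-out baseline")
    total = sum(sample_rewards)
    return [r - (total - r) / (k - 1) for r in sample_rewards]


def reinforce_loss(log_probs: List[float], advantages: List[float]) -> float:
    """Surrogate loss -mean(advantage * log p); advantages are treated as constants."""
    if len(log_probs) != len(advantages):
        raise ValueError("log_probs and advantages must have the same length")
    return -sum(a * lp for a, lp in zip(advantages, log_probs)) / len(log_probs)


if __name__ == "__main__":
    rewards = [1.0, 0.5, 0.0]        # e.g. a caption metric score per sampled sequence
    log_probs = [-1.0, -2.0, -3.0]   # sequence log-probabilities under the model
    print(greedy_baseline_advantages(rewards, greedy_reward=0.5))
    print(leave_one_out_advantages(rewards))
    print(reinforce_loss(log_probs, leave_one_out_advantages(rewards)))
```

Under this assumption, the second baseline is computed entirely from rewards that are already available for the sampled sequences, whereas the greedy baseline requires one additional greedy decode per input. That is one plausible reading of the "no extra cost" claim, not a statement of the paper's exact procedure.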
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Machine Learning and Algorithms
Methods: REINFORCE
