A request for clarity over the End of Sequence token in the Self-Critical Sequence Training
Jia Cheng Hu, Roberto Cavicchioli, Alessandro Capotondi

TL;DR
This paper highlights how omitting the <Eos> token in Self-Critical Sequence Training for image captioning can artificially inflate performance metrics and hamper fair evaluation, and it proposes the SacreEOS library as a remedy.
Contribution
It raises awareness about the <Eos> token omission problem and introduces SacreEOS to promote transparency and consistency in evaluation.
Findings
Omission of <Eos> can increase CIDEr-D scores by up to 4.1 points.
The lack of <Eos> awareness affects fair comparison of models.
SacreEOS helps standardize <Eos> handling in training and evaluation.
Abstract
The Image Captioning research field is currently compromised by the lack of transparency and awareness over the End-of-Sequence token (<Eos>) in Self-Critical Sequence Training. If the <Eos> token is omitted, a model can boost its performance by up to +4.1 CIDEr-D using trivial sentence fragments. While this phenomenon poses an obstacle to a fair evaluation and comparison of established works, people involved in new projects are given the arduous choice between lower scores and unsatisfactory descriptions due to the competitive nature of the research. This work proposes to solve the problem by spreading awareness of the issue itself. In particular, we invite future works to share a simple and informative signature with the help of a library called SacreEOS. Code available at https://github.com/jchenghu/sacreeos
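To make the mechanism concrete, here is a minimal toy sketch of why dropping <Eos> from the reward can favour sentence fragments. It does not use SacreEOS and does not compute CIDEr-D: the function `ngram_precision`, the `include_eos` flag, and the example captions are all hypothetical, and a clipped bigram precision stands in for the real metric. The assumption illustrated is that a truncated caption is only penalized when the end marker takes part in the n-gram matching.

```python
from collections import Counter

EOS = "<eos>"  # hypothetical end-of-sequence marker used only in this sketch


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate, reference, n=2, include_eos=False):
    """Clipped n-gram precision: a toy stand-in for CIDEr-D, not the real metric."""
    cand = candidate.split()
    ref = reference.split()
    if include_eos:
        # Treat the end marker as an ordinary token, so the final n-gram
        # of a truncated caption no longer matches the reference.
        cand = cand + [EOS]
        ref = ref + [EOS]
    cand_counts = ngrams(cand, n)
    ref_counts = ngrams(ref, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


reference = "a man riding a horse on a beach"
fragment = "a man riding a horse on a"  # trivial truncated caption

without_eos = ngram_precision(fragment, reference, include_eos=False)
with_eos = ngram_precision(fragment, reference, include_eos=True)

print(f"reward without <eos>: {without_eos:.3f}")
print(f"reward with <eos>:    {with_eos:.3f}")
```

In this toy setting the fragment gets a perfect score when the marker is left out, because every one of its bigrams appears in the reference. Once the marker is included, the closing bigram ("a", "<eos>") has no match and the score drops to 6/7. A reward of the first kind gives a self-critical objective no reason to prefer complete sentences.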
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Lib
