On Distinctive Image Captioning via Comparing and Reweighting
Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan

TL;DR
This paper introduces a distinctiveness metric and a training strategy that compare each caption against the captions of similar images, producing more unique and informative descriptions without sacrificing accuracy.
Contribution
It proposes CIDErBtw, a distinctiveness metric, and a reweighting training method that emphasizes rare words and uses negative sampling to generate more distinctive image captions.
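The idea behind the metric and the reweighting can be illustrated with a minimal sketch. This is not the authors' implementation: a plain n-gram cosine similarity stands in for CIDEr, and the weight normalization is an assumption made for the example. The names `between_set_similarity` and `distinctiveness_weights` are hypothetical. A caption that overlaps heavily with the captions of similar images gets a high between-set score (less distinctive) and therefore a lower training weight.

```python
from collections import Counter
import math


def ngram_counts(caption, max_n=2):
    """Count all 1..max_n-grams of a lowercased, whitespace-tokenized caption."""
    tokens = caption.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def between_set_similarity(caption, similar_image_captions):
    """Average similarity of a caption to captions of similar images.

    Simplified stand-in for CIDErBtw (assumption: cosine over n-gram
    counts instead of CIDEr). Lower means more distinctive.
    """
    if not similar_image_captions:
        return 0.0
    c = ngram_counts(caption)
    sims = [cosine(c, ngram_counts(o)) for o in similar_image_captions]
    return sum(sims) / len(sims)


def distinctiveness_weights(gt_captions, similar_image_captions):
    """Give each ground-truth caption a weight that grows as its
    between-set similarity shrinks. Weights are rescaled to sum to the
    number of captions (an illustrative choice, not the paper's formula).
    """
    raw = [max(0.0, 1.0 - between_set_similarity(c, similar_image_captions))
           for c in gt_captions]
    total = sum(raw)
    if total == 0:
        return [1.0] * len(raw)
    return [len(raw) * r / total for r in raw]


# Two ground-truth captions for one image, plus captions of similar images.
gt = [
    "a man riding a wave on a surfboard",
    "a surfer in a red wetsuit carves a large green wave",
]
similar = [
    "a man riding a wave on a surfboard",
    "a person surfing on a wave in the ocean",
]
weights = distinctiveness_weights(gt, similar)
```

In this toy example the second caption shares fewer n-grams with the similar-image captions, so it receives the larger weight. In training, such weights would scale each ground-truth caption's contribution to the loss.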
Findings
Significantly improves caption distinctiveness as measured by CIDErBtw and retrieval metrics.
Enhances caption accuracy while increasing uniqueness.
Validated through extensive experiments and user studies.
Abstract
Recent image captioning models are achieving impressive results based on popular metrics, i.e., BLEU, CIDEr, and SPICE. However, focusing on the most popular metrics that only consider the overlap between the generated captions and human annotation could result in using common words and phrases, which lacks distinctiveness, i.e., many similar images have the same caption. In this paper, we aim to improve the distinctiveness of image captions via comparing and reweighting with a set of similar images. First, we propose a distinctiveness metric -- between-set CIDEr (CIDErBtw) to evaluate the distinctiveness of a caption with respect to those of similar images. Our metric reveals that the human annotations of each image in the MSCOCO dataset are not equivalent based on distinctiveness; however, previous works normally treat the human annotations equally during training, which could be a…
