The Solution for the CVPR2024 NICE Image Captioning Challenge
Longfei Huang, Shupeng Zhong, Xiangyu Wu, Ruoxuan Li

TL;DR
This paper presents a zero-shot image captioning solution for the CVPR 2024 NICE challenge that combines retrieval augmentation and caption grading with a large pre-trained model to generate high-quality, semantically rich captions.
Contribution
It introduces a retrieval-augmented, caption-level strategy integrated with a large-scale pre-trained model for improved zero-shot image captioning.
Findings
Achieved a CIDEr score of 234.11
Enhanced caption quality through retrieval augmentation
Effectively addressed style and content gaps in captions
Abstract
This report introduces a solution to Topic 1 (Zero-shot Image Captioning) of the 2024 NICE: New frontiers for zero-shot Image Captioning Evaluation. In contrast to the NICE 2023 datasets, this challenge involves new human annotations that differ significantly in caption style and content. We therefore enhance image captions through retrieval augmentation and caption grading. At the data level, we use high-quality captions generated by image caption models as training data to close the gap in text styles. At the model level, we employ OFA (a large-scale visual-language pre-training model based on handcrafted templates) to perform the image captioning task. Subsequently, we propose a caption-level strategy for the high-quality caption data generated by the image caption models and integrate it with a retrieval augmentation strategy into the template to compel…
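The abstract is cut off before it gives details, so the following is only a minimal sketch of how retrieval augmentation and caption grading could be combined into a template prompt. Everything in it is an assumption for illustration: the function names, the toy 3-dimensional embeddings, the grading thresholds, and the template wording are hypothetical and are not taken from the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_captions(image_vec, caption_bank, k=2):
    """Return the k captions whose embeddings are closest to the image.

    caption_bank is a list of (caption_text, embedding) pairs. In practice
    the embeddings would come from a pre-trained image-text encoder; here
    they are hand-written toy vectors (an assumption).
    """
    ranked = sorted(
        caption_bank,
        key=lambda item: cosine(image_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def grade_caption(score, thresholds=(0.5, 0.8)):
    """Map a caption quality score to a coarse level.

    The thresholds are hypothetical; the source does not state how
    captions are graded.
    """
    low, high = thresholds
    if score >= high:
        return "high"
    if score >= low:
        return "medium"
    return "low"


def build_prompt(retrieved, level):
    """Fill a handcrafted template with retrieved captions and a level.

    The wording of this template is illustrative only.
    """
    hints = "; ".join(retrieved)
    return (
        f"Related captions: {hints}. "
        f"Quality: {level}. What does the image describe?"
    )


if __name__ == "__main__":
    bank = [
        ("a dog on grass", [1.0, 0.0, 0.0]),
        ("a city skyline at night", [0.0, 1.0, 0.0]),
        ("a puppy playing in a park", [0.9, 0.1, 0.0]),
    ]
    retrieved = retrieve_captions([1.0, 0.05, 0.0], bank, k=2)
    # At inference time one would plausibly request the best level.
    print(build_prompt(retrieved, grade_caption(0.9)))
```

Under these assumptions, training examples would carry the level of their own caption, while inference would always request the highest level so that the model is steered toward its best captions.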
Taxonomy
Topics: Multimodal Machine Learning Applications · Lung Cancer Diagnosis and Treatment · COVID-19 diagnosis using AI
Methods: Normalizing Flows · Affine Coupling · Non-linear Independent Component Estimation · OFA
