Partially-Supervised Novel Object Captioning Leveraging Context from Paired Data
Shashank Bujimalla, Mahesh Subedar, Omesh Tickoo

TL;DR
This paper introduces PS-NOC, a training approach for image captioning that effectively incorporates novel objects without caption labels, achieving state-of-the-art results on out-of-domain MS COCO data.
Contribution
The paper presents a novel training method for captioning models that leverages partially paired data and synthetic captions for novel objects, improving out-of-domain performance.
Findings
Achieved an F1-score of 85.9 on novel objects, an 85.9-point improvement over the baseline.
Improved the CIDEr score to 103.8, a 34.1-point increase over the baseline.
Demonstrated effectiveness through extensive ablation studies.
Abstract
In this paper, we propose an approach to improve image captioning for images containing novel objects, i.e., objects that have no caption labels in the training dataset. We refer to our approach as Partially-Supervised Novel Object Captioning (PS-NOC). PS-NOC is agnostic to model architecture and focuses primarily on the training approach, which uses existing fully paired image-caption data together with images that have only novel object detection labels (partially paired data). We create synthetic paired captioning data for novel objects by leveraging context from existing image-caption pairs. We then create pseudo-label captions for partially paired images with novel objects, and use this additional data to fine-tune the captioning model. We also propose a variant of Self-Critical Sequence Training (SCST) within PS-NOC, called SCST-F1, that directly optimizes the F1-score of novel objects. Using a popular captioning model (Up-Down)…
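The idea of using the novel-object F1-score as a self-critical reward can be illustrated with a minimal sketch. This is not the authors' formulation: it assumes a per-caption F1 between the novel objects mentioned in a caption and the objects detected in the image, whitespace tokenization, and a greedy-decoded caption as the SCST baseline. The names `object_f1`, `self_critical_advantage`, and the example vocabulary are hypothetical.

```python
from typing import Iterable, Set


def mentioned_objects(caption: str, vocabulary: Iterable[str]) -> Set[str]:
    """Return the vocabulary words that appear as tokens in the caption."""
    tokens = set(caption.lower().split())
    return {obj for obj in vocabulary if obj in tokens}


def object_f1(caption: str, detected: Iterable[str], vocabulary: Iterable[str]) -> float:
    """Per-caption F1 between mentioned and detected novel objects (an assumption)."""
    mentioned = mentioned_objects(caption, vocabulary)
    target = set(detected)
    true_positives = len(mentioned & target)
    if true_positives == 0:
        # Covers empty sets too; F1 is treated as 0 when nothing matches.
        return 0.0
    precision = true_positives / len(mentioned)
    recall = true_positives / len(target)
    return 2 * precision * recall / (precision + recall)


def self_critical_advantage(
    sampled: str, greedy: str, detected: Iterable[str], vocabulary: Iterable[str]
) -> float:
    """SCST-style advantage: reward of the sampled caption minus the greedy baseline."""
    return object_f1(sampled, detected, vocabulary) - object_f1(greedy, detected, vocabulary)


# Illustrative vocabulary and detections, not taken from the paper.
novel_vocabulary = ["bus", "pizza", "zebra"]
advantage = self_critical_advantage(
    sampled="a zebra standing in a field",
    greedy="a horse standing in a field",
    detected=["zebra"],
    vocabulary=novel_vocabulary,
)
```

In a policy-gradient update, a positive advantage would increase the log-probability of the sampled caption, so captions that name the detected novel object are favored over the greedy output that misses it.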
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Self-critical Sequence Training
