Object-Centric Unsupervised Image Captioning
Zihang Meng, David Yang, Xuefei Cao, Ashish Shah, Ser-Nam Lim

TL;DR
This paper introduces an object-centric approach to unsupervised image captioning that leverages object harvesting and transformer models, significantly outperforming existing methods and extending to non-English datasets.
Contribution
The work proposes a novel object harvesting technique for unsupervised image captioning, improving object coverage and caption quality without requiring paired annotations.
Findings
Outperforms current state-of-the-art unsupervised methods
Object harvesting enhances object coverage in captions
Method extends effectively to non-English image captioning
Abstract
Image captioning is a longstanding problem in the field of computer vision and natural language processing. To date, researchers have produced impressive state-of-the-art performance in the age of deep learning. Most of these state-of-the-art, however, requires large volume of annotated image-caption pairs in order to train their models. When given an image dataset of interests, practitioner needs to annotate the caption for each image in the training set and this process needs to happen for each newly collected image dataset. In this paper, we explore the task of unsupervised image captioning which utilizes unpaired images and texts to train the model so that the texts can come from different sources than the images. A main school of research on this topic that has been shown to be effective is to construct pairs from the images and texts in the training set according to their overlap…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
