Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition
Peipei Zhu, Xiao Wang, Yong Luo, Zhenglong Sun, Wei-Shi Zheng, Yaowei Wang, and Changwen Chen

TL;DR
This paper introduces a cost-effective weakly-supervised approach for unpaired image captioning that leverages image-level labels to recognize visual concepts and improve caption quality without expensive annotations.
Contribution
It proposes a novel weakly-supervised method for visual concept recognition in UIC using only image-level labels, reducing annotation costs and enhancing captioning performance.
Findings
Achieves results comparable to or better than previous methods on the COCO dataset.
Effectively alleviates issues with generating sentences containing nonexistent objects.
Reduces labeling costs significantly while maintaining high captioning quality.
Abstract
The goal of unpaired image captioning (UIC) is to describe images without using image-caption pairs in the training phase. Although challenging, we expect the task can be accomplished by leveraging a training set of images aligned with visual concepts. Most existing studies use off-the-shelf algorithms to obtain the visual concepts because the Bounding Box (BBox) labels or relationship-triplet labels used for the training are expensive to acquire. In order to resolve the problem of expensive annotations, we propose a novel approach to achieve cost-effective UIC. Specifically, we adopt image-level labels for the optimization of the UIC model in a weakly-supervised manner. For each image, we assume that only the image-level labels are available without specific locations and numbers. The image-level labels are utilized to train a weakly-supervised object recognition model to extract…
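To make the idea of training from image-level labels alone more concrete, here is a minimal sketch of one common weakly-supervised recipe: per-region class scores are max-pooled into one score per class, and that pooled score is compared against the image-level label with binary cross-entropy. This is an illustration under stated assumptions, not the authors' model. The pooling choice, the loss, the function names, and the toy numbers are all hypothetical and do not come from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def image_level_scores(region_logits):
    """Max-pool per-class logits over regions (assumed pooling choice).

    region_logits: list of regions, each a list of per-class logits.
    Returns one logit per class for the whole image.
    """
    num_classes = len(region_logits[0])
    return [max(region[c] for region in region_logits) for c in range(num_classes)]


def bce_loss(image_logits, labels):
    """Mean binary cross-entropy against image-level labels (0 or 1 per class).

    No box locations or object counts are needed, only presence/absence.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(image_logits, labels):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(labels)


def predict_concepts(image_logits, class_names, threshold=0.5):
    """Return the class names whose pooled probability reaches the threshold."""
    return [name for z, name in zip(image_logits, class_names) if sigmoid(z) >= threshold]


# Toy example: 2 hypothetical regions, 3 hypothetical classes.
class_names = ["dog", "frisbee", "car"]
region_logits = [
    [2.0, -1.0, -3.0],  # region 0 responds to "dog"
    [-0.5, 1.5, -2.0],  # region 1 responds to "frisbee"
]
labels = [1, 1, 0]  # image-level labels: dog and frisbee present, car absent

logits = image_level_scores(region_logits)
loss = bce_loss(logits, labels)
concepts = predict_concepts(logits, class_names)
```

In this toy setup the recognized concepts are the classes whose pooled probability clears the threshold. A captioning model could then be conditioned on such concepts in place of detector outputs trained on bounding boxes.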
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Graph Neural Network
