Unpaired Image Captioning via Scene Graph Alignments

Jiuxiang Gu; Shafiq Joty; Jianfei Cai; Handong Zhao; Xu Yang; Gang Wang

arXiv:1903.10658·cs.CV·August 20, 2019·26 cites

Open Access

TL;DR

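This paper introduces a scene graph-based framework for image captioning that needs no paired image-caption training data. An unsupervised feature alignment method maps scene graph features from the image modality to the sentence modality, and the authors report that the model outperforms existing unpaired captioning methods by a wide margin.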

Contribution

The paper presents a novel unpaired image captioning framework using scene graph alignment, eliminating the need for large-scale paired datasets.

Findings

01 Outperforms existing unpaired captioning methods

02 Generates promising captions without paired training data

03 Uses unsupervised feature alignment for scene graphs

Abstract

Most current image captioning models rely heavily on paired image-caption datasets. However, collecting large-scale image-caption paired data is labor-intensive and time-consuming. In this paper, we present a scene graph-based approach for unpaired image captioning. Our framework comprises an image scene graph generator, a sentence scene graph generator, a scene graph encoder, and a sentence decoder. Specifically, we first train the scene graph encoder and the sentence decoder on the text modality. To align the scene graphs between images and sentences, we propose an unsupervised feature alignment method that maps the scene graph features from the image to the sentence modality. Experimental results show that our proposed model can generate quite promising results without using any image-caption training pairs, outperforming existing methods by a wide margin.
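The idea of mapping image-side features toward the sentence modality without any paired examples can be illustrated with a toy sketch. The abstract does not specify how the alignment is learned, so the code below swaps in a much simpler stand-in: matching the per-dimension mean and standard deviation of two unpaired feature sets. The function name `fit_moment_alignment` and the tiny feature vectors are hypothetical, chosen only for illustration; this is not the paper's method.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence

Vector = List[float]


def fit_moment_alignment(
    image_feats: Sequence[Vector], sentence_feats: Sequence[Vector]
) -> Callable[[Vector], Vector]:
    """Fit a per-dimension shift and scale from UNPAIRED feature sets.

    Illustrative stand-in only (simple moment matching); the paper's own
    alignment method is not described in the abstract.
    """
    dims = len(image_feats[0])
    params = []
    for d in range(dims):
        img_col = [v[d] for v in image_feats]
        sen_col = [v[d] for v in sentence_feats]
        img_mu, sen_mu = mean(img_col), mean(sen_col)
        img_sd, sen_sd = pstdev(img_col), pstdev(sen_col)
        # Guard against a constant dimension on the image side.
        scale = sen_sd / img_sd if img_sd > 0 else 1.0
        params.append((img_mu, sen_mu, scale))

    def align(v: Vector) -> Vector:
        # Center on image statistics, rescale, re-center on sentence statistics.
        return [(x - i_mu) * s + s_mu for x, (i_mu, s_mu, s) in zip(v, params)]

    return align


# Hypothetical scene graph features; the two sets differ in size and
# share no image-caption correspondence.
image_feats = [[0.0, 10.0], [2.0, 14.0]]
sentence_feats = [[5.0, 1.0], [7.0, 3.0], [9.0, 5.0]]

align = fit_moment_alignment(image_feats, sentence_feats)
mapped = [align(v) for v in image_feats]
```

After fitting, the mapped image features share the sentence features' mean and spread in each dimension, so a decoder trained only on sentence-side features would see inputs with matching first- and second-order statistics. This captures only the unpaired nature of the problem; any real alignment method would need to go well beyond matching such simple statistics.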

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques