Auto-Encoding Scene Graphs for Image Captioning

Xu Yang; Kaihua Tang; Hanwang Zhang; Jianfei Cai

arXiv:1812.02378 · cs.CV · December 12, 2018 · 26 cites

TL;DR

This paper introduces Scene Graph Auto-Encoder (SGAE), a novel approach that incorporates language inductive bias via scene graphs and shared dictionaries to improve image captioning, achieving state-of-the-art results on MS-COCO.

Contribution

The paper presents a new SGAE framework that transfers language priors across vision and language domains using scene graphs and shared dictionaries, enhancing captioning performance.

Findings

01 Achieved 127.8 CIDEr-D on MS-COCO, surpassing previous models.

02 The shared dictionary effectively transfers the language inductive bias across the vision and language domains.

03 The single-model SGAE outperforms ensemble models on the MS-COCO benchmark.

Abstract

We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language inductive bias into the encoder-decoder image captioning framework for more human-like captions. Intuitively, we humans use the inductive bias to compose collocations and contextual inference in discourse. For example, when we see the relation 'person on bike', it is natural to replace 'on' with 'ride' and infer 'person riding bike on a road' even if the 'road' is not evident. Therefore, exploiting such bias as a language prior is expected to make conventional encoder-decoder models less likely to overfit to the dataset bias and more likely to focus on reasoning. Specifically, we use the scene graph --- a directed graph ( $G$ ) where an object node is connected by adjective nodes and relationship nodes --- to represent the complex structural layout of both image ( $I$ ) and sentence ( $S$ ). In the textual…
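
As an illustration of the scene graph described in the abstract, here is a minimal Python sketch of a directed graph in which object nodes carry adjective (attribute) nodes and are linked by relationship nodes. The class name `SceneGraph`, its methods, and the attribute "young" are hypothetical, chosen only for this example; they are not taken from the paper or its released code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneGraph:
    """Hypothetical container for a scene graph (illustration only)."""
    objects: List[str] = field(default_factory=list)
    # Maps an object index to the adjectives attached to that object.
    attributes: Dict[int, List[str]] = field(default_factory=dict)
    # Directed edges stored as (subject index, relationship, object index).
    relations: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, name: str, attributes: Tuple[str, ...] = ()) -> int:
        """Add an object node and return its index."""
        idx = len(self.objects)
        self.objects.append(name)
        self.attributes[idx] = list(attributes)
        return idx

    def add_relation(self, subj: int, predicate: str, obj: int) -> None:
        """Add a directed relationship from subject to object."""
        self.relations.append((subj, predicate, obj))

    def triples(self) -> List[Tuple[str, str, str]]:
        """Return the relationships as (subject, relationship, object) names."""
        return [(self.objects[s], p, self.objects[o])
                for s, p, o in self.relations]


# The 'person on bike' relation from the abstract; "young" is an assumed attribute.
graph = SceneGraph()
person = graph.add_object("person", ("young",))
bike = graph.add_object("bike")
graph.add_relation(person, "on", bike)
print(graph.triples())  # [('person', 'on', 'bike')]
```

In this sketch the same structure could hold a graph parsed from a sentence or one detected in an image, which is the property the abstract relies on when it uses scene graphs for both $I$ and $S$.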

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning