Are scene graphs good enough to improve Image Captioning?

Victor Milewski; Marie-Francine Moens; Iacer Calixto

arXiv:2009.12313 · cs.CV · October 28, 2020

TL;DR

This paper investigates the effectiveness of scene graphs in image captioning. It finds that scene graphs predicted by current models are too noisy to significantly improve caption quality, whereas high-quality scene graphs can offer notable gains.

Contribution

The study introduces a conditional graph attention network (C-GAT), in which the captioning decoder state conditions the graph updates, and provides a comprehensive empirical analysis of how useful scene graphs are for captioning.

Findings

01 No significant improvement with current scene graph models

02 High-quality scene graphs can improve captioning metrics

03 Scene graph noise impacts caption quality

Abstract

Many top-performing image captioning models rely solely on object features computed with an object detection model to generate image descriptions. However, recent studies propose to directly use scene graphs to introduce information about object relations into captioning, hoping to better describe interactions between objects. In this work, we thoroughly investigate the use of scene graphs in image captioning. We empirically study whether using additional scene graph encoders can lead to better image descriptions and propose a conditional graph attention network (C-GAT), where the image captioning decoder state is used to condition the graph updates. Finally, we determine to what extent noise in the predicted scene graphs influences caption quality. Overall, we find no significant difference between models that use scene graph features and models that only use object detection features…
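
The idea of conditioning graph updates on the decoder state can be illustrated with a small sketch. This is not the paper's C-GAT formulation. It is a generic dot-product graph attention step under stated assumptions: the decoder state has already been projected to the node feature dimension, and it is simply added to each node's query before neighbours are scored. The function names, the scoring rule, and the toy features are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def conditional_graph_attention(nodes, edges, decoder_state):
    """One attention-based node update, conditioned on the decoder state.

    nodes:         list of feature vectors, one per scene graph node
    edges:         list of (source, target) index pairs
    decoder_state: vector with the same dimension as the node features
                   (an assumption made to keep the sketch short)
    """
    dim = len(decoder_state)

    # Every node attends to itself and to the sources of its incoming edges.
    neighbours = [[i] for i in range(len(nodes))]
    for src, dst in edges:
        neighbours[dst].append(src)

    updated = []
    for i, feats in enumerate(nodes):
        # Conditioning step (hypothetical): mix the decoder state into the query,
        # so the same graph yields different attention weights at each decoding step.
        query = [h + s for h, s in zip(feats, decoder_state)]
        weights = softmax(
            [dot(query, nodes[j]) / math.sqrt(dim) for j in neighbours[i]]
        )
        # New node feature: attention-weighted average of neighbour features.
        updated.append([
            sum(w * nodes[j][k] for w, j in zip(weights, neighbours[i]))
            for k in range(dim)
        ])
    return updated


# Toy graph: nodes 0 and 1 (objects) both point at node 2 (a relation).
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 2), (1, 2)]
out_a = conditional_graph_attention(nodes, edges, [2.0, 0.0])
out_b = conditional_graph_attention(nodes, edges, [0.0, 2.0])
```

In this toy graph the two object nodes have no incoming edges, so they keep their features, while the relation node leans towards whichever object the decoder state is aligned with. A learned model would replace the plain addition and dot product with trained projections.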


Code & Models

Repositories

iacercalixto/butd-image-captioning
pytorch
