Compositional Generalization in Image Captioning

Mitja Nikolaus, Mostafa Abdou, Matthew Lamm, Rahul Aralikatte, and Desmond Elliott

arXiv:1909.04402 · cs.LG · November 12, 2019

TL;DR

This paper investigates the ability of image captioning models to generalize to unseen concept combinations, revealing poor performance of current models and proposing a multi-task approach that significantly improves compositional generalization.

Contribution

The paper introduces a multi-task model combining captioning and ranking with a re-ranking decoding mechanism to enhance compositional generalization in image captioning.

Findings

01. The proposed model outperforms state-of-the-art models on unseen concept combinations.

02. Current models show poor generalization to novel concept compositions.

03. Multi-task training improves the ability to generalize in image captioning.
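
To make "unseen concept combinations" concrete, here is a minimal sketch of how a held-out split could be built. It is an illustration only: the function names, the whitespace tokenization, and the example pair ("black", "cat") are assumptions, not the procedure used in the paper. Every image whose captions mention both words of the pair is kept out of training, so each concept is still seen on its own but never together.

```python
def mentions_pair(caption, pair):
    """Return True if every word of the concept pair occurs in the caption.

    Assumption: naive lowercase whitespace tokenization, no lemmatization
    and no punctuation handling.
    """
    tokens = caption.lower().split()
    return all(word in tokens for word in pair)


def split_by_held_out_pair(examples, pair):
    """Split image ids into (train, held_out) for one concept pair.

    examples: dict mapping image id -> list of caption strings.
    An image is held out if any of its captions mentions the full pair.
    """
    train, held_out = [], []
    for image_id, captions in examples.items():
        if any(mentions_pair(caption, pair) for caption in captions):
            held_out.append(image_id)
        else:
            train.append(image_id)
    return train, held_out


# Hypothetical toy data, not taken from any dataset.
examples = {
    "img1": ["a black cat on a sofa"],
    "img2": ["a black dog in the park"],
    "img3": ["a white cat sleeping"],
}
train_ids, held_out_ids = split_by_held_out_pair(examples, ("black", "cat"))
```

In this toy data, "black" and "cat" each still appear in training captions, while the only image that combines them is reserved for evaluation.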

Abstract

Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts. We study the problem of compositional generalization, which measures how well a model composes unseen combinations of concepts when describing images. State-of-the-art image captioning models show poor generalization performance on this task. To address the poor performance, we propose a multi-task model that combines caption generation and image–sentence ranking, and uses a decoding mechanism that re-ranks the captions according to their similarity to the image. This model is substantially better at generalizing to unseen combinations of concepts compared to state-of-the-art captioning models.
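
The re-ranking step described in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the image and the candidate captions are assumed to be already embedded in a shared vector space (which a ranking objective would provide), cosine similarity is assumed as the similarity measure, and the names `cosine` and `rerank` are hypothetical rather than taken from the authors' repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(image_vec, candidates):
    """Sort candidate captions by similarity to the image, most similar first.

    candidates: list of (caption, caption_vec) pairs, e.g. produced by a
    decoder such as beam search (assumption: embeddings are precomputed).
    """
    return sorted(candidates, key=lambda c: cosine(image_vec, c[1]), reverse=True)


# Hypothetical 2-d embeddings purely for illustration.
image_vec = [1.0, 0.0]
candidates = [
    ("a dog on a sofa", [0.0, 1.0]),
    ("a black cat on a sofa", [1.0, 0.1]),
    ("a cat on a sofa", [1.0, 1.0]),
]
ranked = rerank(image_vec, candidates)
best_caption = ranked[0][0]
```

The design idea is that the generator proposes several captions while the similarity score, rather than the generator's own likelihood, decides which one is returned.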

Code & Models

Repositories

mitjanikolaus/compositional-image-captioning
PyTorch · Official
