Captioning Images with Diverse Objects

Subhashini Venugopalan; Lisa Anne Hendricks; Marcus Rohrbach; Raymond Mooney; Trevor Darrell; Kate Saenko

arXiv:1606.07770 · cs.CV · July 24, 2017

TL;DR

This paper introduces NOC, a captioning model that draws on external object recognition datasets and semantic knowledge from unannotated text to describe many object categories that never appear in paired image-caption datasets.

Contribution

The paper presents NOC, a deep visual semantic captioning model that generalizes to novel objects by integrating external recognition data and semantic embeddings.

Findings

01

NOC can generate captions for hundreds of unseen object categories.

02

The model outperforms prior work in describing diverse object categories.

03

Automatic and human evaluations confirm improved captioning diversity.

Abstract

Recent captioning models are limited in their ability to scale and describe concepts unseen in paired image-text corpora. We propose the Novel Object Captioner (NOC), a deep visual semantic captioning model that can describe a large number of object categories not present in existing image-caption datasets. Our model takes advantage of external sources -- labeled images from object recognition datasets, and semantic knowledge extracted from unannotated text. We propose minimizing a joint objective which can learn from these diverse data sources and leverage distributional semantic embeddings, enabling the model to generalize and describe novel objects outside of image-caption datasets. We demonstrate that our model exploits semantic information to generate captions for hundreds of object categories in the ImageNet object recognition dataset that are not observed in MSCOCO image-caption…
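
The joint objective mentioned in the abstract can be pictured as a combination of per-source losses, one for each kind of training data. The sketch below is illustrative only: the three-term decomposition (paired captions, labeled images, unannotated text), the equal default weights, and the names `cross_entropy` and `joint_objective` are assumptions made for this example, not details stated on this page.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target_index])


def joint_objective(caption_loss, image_loss, text_loss, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three per-source losses (hypothetical formulation).

    caption_loss: loss on paired image-caption data
    image_loss:   loss on labeled images from an object recognition dataset
    text_loss:    language-modeling loss on unannotated text
    """
    w_caption, w_image, w_text = weights
    return w_caption * caption_loss + w_image * image_loss + w_text * text_loss


# Toy predicted distributions; every number here is made up for illustration.
caption_loss = cross_entropy([0.7, 0.2, 0.1], target_index=0)
image_loss = cross_entropy([0.1, 0.8, 0.1], target_index=1)
text_loss = cross_entropy([0.25, 0.25, 0.5], target_index=2)

total = joint_objective(caption_loss, image_loss, text_loss)
print(f"joint loss: {total:.4f}")
```

Minimizing a single summed quantity like this lets one set of shared parameters receive gradient signal from all three data sources, which is one plausible way a model could pick up words and objects that never occur in the paired data.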

Code & Models

Repositories

willT97/Zero-shot-Image-Captioner (PyTorch)

Videos

Captioning Images With Diverse Objects · YouTube