Decoupled Novel Object Captioner

Yu Wu; Linchao Zhu; Lu Jiang; Yi Yang

arXiv:1804.03803 · cs.CV · August 14, 2018

TL;DR

This paper introduces a zero-shot image captioning method that describes novel objects without additional sentence annotations by decoupling language generation from object recognition.

Contribution

It proposes the Decoupled Novel Object Captioner (DNOC) framework, which separates language modeling from object descriptions using placeholders and a key-value object memory.
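
The placeholder-and-memory idea can be illustrated with a toy sketch. This is not the authors' implementation: the `<PL>` token name, the dot-product lookup, the hand-written vectors, and all function names are assumptions made for illustration. It only shows how a sentence containing placeholders could be completed by querying a key-value store whose keys are feature vectors and whose values are words.

```python
from typing import List, Sequence, Tuple

# Hypothetical placeholder token; the name is an assumption for this sketch.
PLACEHOLDER = "<PL>"

Vector = Sequence[float]
Memory = List[Tuple[Vector, str]]  # (key feature vector, value word)


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product, used here as an assumed similarity score."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_word(query: Vector, memory: Memory) -> str:
    """Return the word whose key is most similar to the query."""
    _, best_word = max(memory, key=lambda entry: dot(query, entry[0]))
    return best_word


def fill_placeholders(tokens: List[str], queries: List[Vector], memory: Memory) -> List[str]:
    """Replace each placeholder, in order, with a word retrieved from memory."""
    query_iter = iter(queries)
    filled = []
    for token in tokens:
        if token == PLACEHOLDER:
            filled.append(retrieve_word(next(query_iter), memory))
        else:
            filled.append(token)
    return filled


# Toy object memory: in practice the keys would come from an object detector.
memory: Memory = [([1.0, 0.0], "zebra"), ([0.0, 1.0], "grass")]

# Sentence produced by a language model that never saw "zebra" or "grass".
tokens = ["a", PLACEHOLDER, "standing", "on", PLACEHOLDER]
queries = [[0.9, 0.1], [0.2, 0.8]]  # one assumed query vector per placeholder

caption = " ".join(fill_placeholders(tokens, queries, memory))
print(caption)
```

The point of the design is that the sentence structure and the object vocabulary live in separate components, so adding a new object category only means adding an entry to the memory, not retraining the language model on new captions.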

Findings

01. Describes novel objects in the zero-shot setting.

02. Outperforms baseline methods on the MSCOCO dataset.

03. Demonstrates effective decoupling of language generation from object recognition.

Abstract

Image captioning is a challenging task in which a machine automatically describes an image with sentences or phrases. It often requires a large number of paired image-sentence annotations for training. However, a pre-trained captioning model can hardly be applied to a new domain containing novel object categories, i.e., objects and description words that were unseen during model training. Captioning a novel object correctly requires professional human workers to annotate images with sentences containing the novel words. This is labor-intensive and thus limits use in real-world applications. In this paper, we introduce the zero-shot novel object captioning task, in which the machine generates descriptions without extra sentences about the novel object. To tackle this challenging problem, we propose a Decoupled Novel Object Captioner (DNOC) framework that can fully decouple…

Code & Models

Repositories

Pranav21091996/Semantic_Fidelity-and-Egoshots (TensorFlow)
