Length-Controllable Image Captioning

Chaorui Deng; Ning Ding; Mingkui Tan; Qi Wu

arXiv:2007.09580·cs.CV·July 21, 2020


TL;DR

This paper introduces a length level embedding for image captioning, enabling controllable caption length, and proposes a non-autoregressive model that improves efficiency and diversity, achieving state-of-the-art results on MS COCO.

Contribution

It presents a simple length level embedding method for controllable image captioning and a non-autoregressive model that enhances efficiency and diversity.
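The length level embedding can be illustrated with a small sketch. It assumes that caption lengths are bucketed into a few levels and that one learned vector per level is added to every token embedding of a caption. The level boundaries, the embedding size, and the names `LEVEL_UPPER_BOUNDS`, `length_level`, and `embed_tokens` are hypothetical choices for illustration, not taken from this page or from the official repository.

```python
import bisect
import random

# Hypothetical inclusive upper bound of each length level (an assumption).
LEVEL_UPPER_BOUNDS = [9, 14, 19, 25]
EMBED_DIM = 4  # tiny dimension, chosen only to keep the example readable


def length_level(num_tokens):
    """Return the 0-based level whose length range contains num_tokens."""
    if num_tokens < 1 or num_tokens > LEVEL_UPPER_BOUNDS[-1]:
        raise ValueError("caption length outside the supported range")
    # bisect_left finds the first upper bound that is >= num_tokens.
    return bisect.bisect_left(LEVEL_UPPER_BOUNDS, num_tokens)


def make_table(rows, dim, rng):
    """Random stand-in for a learned embedding table (rows x dim)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]


def embed_tokens(token_ids, level, token_table, level_table):
    """Add the chosen level's vector to the embedding of every token."""
    level_vec = level_table[level]
    return [
        [t + l for t, l in zip(token_table[tok], level_vec)]
        for tok in token_ids
    ]


rng = random.Random(0)
token_table = make_table(rows=10, dim=EMBED_DIM, rng=rng)  # toy vocabulary of 10 ids
level_table = make_table(rows=len(LEVEL_UPPER_BOUNDS), dim=EMBED_DIM, rng=rng)

# Request a caption in the second level (10 to 14 tokens under these bounds).
inputs = embed_tokens([3, 1, 4], length_level(12), token_table, level_table)
```

At inference time, changing the requested level changes only the added vector, which is what would let the same decoder be steered toward shorter or longer captions under these assumptions.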

Findings

1. Achieves state-of-the-art performance on MS COCO
2. Generates controllable and diverse captions
3. Significantly improves decoding efficiency for long captions

Abstract

The last decade has witnessed remarkable progress in the image captioning task; however, most existing methods cannot control their captions, e.g., by choosing to describe the image either roughly or in detail. In this paper, we propose to use a simple length level embedding to endow them with this ability. Moreover, due to their autoregressive nature, the computational complexity of existing models increases linearly as the length of the generated captions grows. Thus, we further devise a non-autoregressive image captioning approach that can generate captions with length-irrelevant complexity. We verify the merit of the proposed length level embedding on three models: two state-of-the-art (SOTA) autoregressive models with different types of decoder, as well as our proposed non-autoregressive model, to show its generalization ability. In the experiments, our length-controllable…
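The complexity argument in the abstract can be made concrete by counting sequential decoder passes. The sketch below is a toy model, not the paper's decoder: `CountingDecoder` is a hypothetical stand-in that predicts nothing and only counts calls, and the fixed number of refinement passes (`refine_steps`) is an assumption about how a non-autoregressive decoder might be run.

```python
class CountingDecoder:
    """Hypothetical stand-in decoder that only counts forward passes."""

    def __init__(self):
        self.calls = 0

    def forward(self, tokens):
        self.calls += 1
        return list(tokens)  # no real prediction, tokens pass through


def decode_autoregressive(decoder, length):
    """One forward pass per generated token, so passes grow with length."""
    tokens = []
    for _ in range(length):
        decoder.forward(tokens)
        tokens.append("<tok>")  # placeholder for the predicted word
    return tokens


def decode_non_autoregressive(decoder, length, refine_steps):
    """All positions are updated in parallel for a fixed number of passes."""
    tokens = ["[MASK]"] * length
    for _ in range(refine_steps):
        tokens = decoder.forward(tokens)
    return tokens


ar_short, ar_long = CountingDecoder(), CountingDecoder()
decode_autoregressive(ar_short, 10)
decode_autoregressive(ar_long, 25)

nar_short, nar_long = CountingDecoder(), CountingDecoder()
decode_non_autoregressive(nar_short, 10, refine_steps=8)
decode_non_autoregressive(nar_long, 25, refine_steps=8)

print(ar_short.calls, ar_long.calls)    # passes track caption length
print(nar_short.calls, nar_long.calls)  # passes track refine_steps only
```

This counts sequential passes only. The cost of a single pass can still depend on sequence length in a real model, so the sketch illustrates the scaling claim rather than measuring actual speed.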


Code & Models

Repositories

bearcatt/LaBERT (PyTorch, official)
