Masked Non-Autoregressive Image Captioning

Junlong Gao; Xi Meng; Shiqi Wang; Xia Li; Shanshe Wang; Siwei Ma; Wen Gao

arXiv:1906.00717 · cs.CV · June 4, 2019 · 25 cites

Open Access

TL;DR

This paper introduces masked non-autoregressive decoding for image captioning, enabling parallel caption generation that improves diversity and semantic preservation over traditional autoregressive methods.

Contribution

The paper proposes masked non-autoregressive decoding, an approach that targets the slow generation and lack of diversity of autoregressive captioning models.

Findings

01 More diverse caption generation

02 Better semantic content preservation

03 Faster inference compared to autoregressive models

Abstract

Existing captioning models often adopt the encoder-decoder architecture, where the decoder uses autoregressive decoding to generate captions, such that each token is generated sequentially given the preceding generated tokens. However, autoregressive decoding results in issues such as sequential error accumulation, slow generation, improper semantics and lack of diversity. Non-autoregressive decoding has been proposed to tackle slow generation for neural machine translation but suffers from the multimodality problem due to the indirect modeling of the target distribution. In this paper, we propose masked non-autoregressive decoding to tackle the issues of both autoregressive decoding and non-autoregressive decoding. In masked non-autoregressive decoding, we mask the input sequences at several ratios during training, and generate captions in parallel in several stages from a totally…
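The two ideas named in the abstract, masking input sequences at several ratios during training and generating in stages with all positions predicted in parallel, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration rather than the authors' implementation: the names `mask_tokens`, `staged_decode` and `predict` are hypothetical, the model is replaced by a caller-supplied function, and the rule of re-masking the lowest-confidence positions between stages is borrowed from mask-predict style decoding, not taken from the abstract.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol


def mask_tokens(tokens, ratio, rng):
    """Return a copy of tokens with round(ratio * len(tokens)) positions masked."""
    n_mask = round(ratio * len(tokens))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]


def staged_decode(predict, length, remask_ratios):
    """Decode in stages, starting from a fully masked sequence.

    predict(seq) is a stand-in for a trained model (assumption): in one
    parallel pass it returns a (token, confidence) pair for every position.
    remask_ratios gives the fraction of positions masked again after each
    intermediate stage; a final stage with ratio 0.0 is always appended.
    """
    seq = [MASK] * length
    for ratio in list(remask_ratios) + [0.0]:
        preds = predict(seq)
        # Fill only the masked positions; visible tokens are kept.
        seq = [p[0] if s == MASK else s for s, p in zip(seq, preds)]
        # Assumed rule: re-mask the least confident positions.
        n_mask = round(ratio * length)
        lowest = sorted(range(length), key=lambda i: preds[i][1])[:n_mask]
        for i in lowest:
            seq[i] = MASK
    return seq
```

In training, `mask_tokens` would be called with a different ratio per example so the model sees inputs ranging from lightly to fully masked; at inference, the number of stages is `len(remask_ratios) + 1`, independent of caption length, which is where the speed advantage over token-by-token decoding would come from.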

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning