CNN+CNN: Convolutional Decoders for Image Captioning

Qingzhong Wang; Antoni B. Chan

arXiv:1805.09019 · cs.CV · May 24, 2018 · 70 cites

TL;DR

This paper introduces an image captioning framework built entirely from convolutional neural networks. Because convolutions over a caption can be computed in parallel, it trains faster than traditional RNN/LSTM-based models while achieving comparable or better performance.
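
The training speed-up comes from how the decoder is evaluated during training, when the whole ground-truth caption is available as input. The minimal sketch below contrasts the two kinds of decoder under simplifying assumptions: a single channel with scalar inputs, a generic causal convolution, and a plain Elman-style RNN. It is not the authors' architecture or their PyTorch code, and the names `causal_conv1d` and `simple_rnn` are hypothetical. Every output of the causal convolution reads only inputs, so all positions can be computed at once; each RNN step must wait for the previous hidden state. At inference time, words are still generated one at a time in either model.

```python
import math
from typing import List


def causal_conv1d(xs: List[float], kernel: List[float]) -> List[float]:
    """Single-channel causal convolution: y[t] = sum_j kernel[j] * xs[t - j].

    kernel[0] weights the current position, kernel[1] the previous one, and so on.
    Positions before the start of the sequence count as zero (left padding),
    so y[t] never depends on xs[t + 1:], i.e. on future words.
    """
    ys = []
    for t in range(len(xs)):
        # y[t] reads only the inputs, never another output, so every t could be
        # computed simultaneously (e.g. as one batched GPU operation).
        ys.append(sum(w * xs[t - j] for j, w in enumerate(kernel) if t - j >= 0))
    return ys


def simple_rnn(xs: List[float], w_h: float, w_x: float) -> List[float]:
    """Elman-style RNN for contrast: h[t] = tanh(w_h * h[t-1] + w_x * xs[t])."""
    hs, h = [], 0.0
    for x in xs:
        # h[t] needs h[t-1], so the steps must run strictly one after another.
        h = math.tanh(w_h * h + w_x * x)
        hs.append(h)
    return hs


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    print(causal_conv1d(xs, [0.5, 0.25]))  # [0.5, 1.25, 2.0, 2.75]
```

Changing the last input (say `xs[3]`) leaves the first three convolution outputs unchanged, which is the causality property a captioning decoder needs so that each word is predicted only from earlier words.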

Contribution

The authors propose a CNN-only model for image captioning that trains faster than LSTM-based models and achieves competitive or superior scores on captioning metrics.

Findings

1. The CNN-based model trains around 3 times faster than the LSTM-based NIC.
2. Compared with LSTM-based models that apply similar attention mechanisms, the model achieves comparable BLEU and METEOR scores and higher CIDEr scores.
3. It outperforms hierarchical LSTMs on the paragraph annotation task.

Abstract

Image captioning is a challenging task that combines the fields of computer vision and natural language processing. A variety of approaches have been proposed to automatically describe an image, and models based on recurrent neural networks (RNNs) or long short-term memory (LSTM) dominate the field. However, RNNs and LSTMs cannot be computed in parallel and ignore the underlying hierarchical structure of a sentence. In this paper, we propose a framework that employs only convolutional neural networks (CNNs) to generate captions. Owing to parallel computation, our basic model is around 3 times faster than NIC (an LSTM-based model) during training, while also providing better results. We conduct extensive experiments on MSCOCO and investigate the influence of model width and depth. Compared with LSTM-based models that apply similar attention mechanisms, our…
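
The abstract contrasts sequential LSTMs with the hierarchical structure of a sentence and studies the effect of model depth. One standard way to see why depth matters for a convolutional decoder (an illustration under stated assumptions, not an analysis taken from the paper): each stacked causal convolution layer with kernel size k, stride 1, and no dilation lets a prediction see k - 1 more previous words, so L layers cover L(k - 1) + 1 positions. Lower layers combine short n-grams, and higher layers combine those into longer phrases. The sketch below computes that span by formula and checks it by pushing an impulse through a stack of all-ones kernels; the function names and the kernel sizes used are hypothetical.

```python
from typing import List


def receptive_field(num_layers: int, kernel_size: int) -> int:
    """Input positions visible to one output after stacking num_layers causal
    convolutions (assumed: stride 1, no dilation). Each layer adds kernel_size - 1."""
    return num_layers * (kernel_size - 1) + 1


def causal_conv1d(xs: List[float], kernel: List[float]) -> List[float]:
    # y[t] = sum_j kernel[j] * xs[t - j], with zeros before the sequence start.
    return [sum(w * xs[t - j] for j, w in enumerate(kernel) if t - j >= 0)
            for t in range(len(xs))]


def empirical_receptive_field(num_layers: int, kernel_size: int, seq_len: int = 64) -> int:
    """Push an impulse at position 0 through the stack. With all-ones (positive)
    kernels nothing cancels, so the nonzero outputs are exactly the positions
    whose receptive field includes position 0."""
    xs = [1.0] + [0.0] * (seq_len - 1)
    for _ in range(num_layers):
        xs = causal_conv1d(xs, [1.0] * kernel_size)
    return sum(1 for y in xs if y != 0.0)


if __name__ == "__main__":
    # Hypothetical kernel size 3: each extra layer widens the visible context by 2 words.
    print([receptive_field(n, 3) for n in range(1, 5)])  # [3, 5, 7, 9]
```

Dilated convolutions would grow this span faster than linearly with depth; the linear formula above holds only for the plain stride-1 case assumed here.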

Code & Models

Repositories

qingzwang/GHA-ImageCaptioning (PyTorch)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Image and Video Retrieval Techniques