Comparative evaluation of CNN architectures for Image Caption Generation

Sulabh Katiyar; Samir Kumar Borgohain

arXiv:2102.11506·cs.CV·February 24, 2021

TL;DR

This paper systematically compares 17 CNN architectures to determine their effectiveness in image caption generation, revealing that higher complexity and object recognition accuracy do not always lead to better captioning performance.

Contribution

The paper provides a systematic evaluation of 17 pre-trained CNN architectures as visual feature extractors for image captioning, addressing the lack of any prior study comparing their relative efficacy.

Findings

01 Higher model complexity does not necessarily lead to better captioning performance.

02 Object recognition accuracy is not a reliable predictor of captioning quality.

03 Certain CNN architectures outperform others regardless of size or recognition accuracy.
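
One way a reader could probe findings 01 and 02 on their own results is a rank correlation between a per-model property and a per-model captioning score. The sketch below computes Spearman's rank correlation in plain Python. It is an illustrative technique chosen here, not the analysis the authors report. The function names are hypothetical, and the choice of complexity proxy (for example, parameter count) and captioning metric is left to the reader as an assumption.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions are 0-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

Calling `spearman(model_property, caption_scores)` with one entry per CNN gives a value in [-1, 1]. A value near zero would be consistent with the property not predicting captioning quality, although with only 17 models any such estimate is noisy.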

Abstract

Aided by recent advances in Deep Learning, Image Caption Generation has seen tremendous progress over the last few years. Most methods use transfer learning to extract visual information, in the form of image features, with the help of pre-trained Convolutional Neural Network models followed by transformation of the visual information using a Caption Generator module to generate the output sentences. Different methods have used different Convolutional Neural Network Architectures and, to the best of our knowledge, there is no systematic study which compares the relative efficacy of different Convolutional Neural Network architectures for extracting the visual information. In this work, we have evaluated 17 different Convolutional Neural Networks on two popular Image Caption Generation frameworks: the first based on Neural Image Caption (NIC) generation model and the second based on…

Code & Models

Repositories

iamsulabh/cnn_variants
PyTorch · Official