An Image captioning algorithm based on the Hybrid Deep Learning Technique (CNN+GRU)

Rana Adnan Ahmad; Muhammad Azhar; Hina Sattar

arXiv:2301.02440·cs.CV·January 9, 2023

TL;DR

This paper introduces a CNN-GRU-based image captioning model that improves semantic understanding, reduces time complexity, and outperforms existing LSTM-based models in accuracy.

Contribution

The paper proposes a novel CNN-GRU encoder-decoder framework that enhances semantic comprehension and efficiency in image captioning tasks.

Findings

01 Outperforms LSTM-A5 in accuracy

02 Reduces time complexity compared to previous models

03 Enhances semantic understanding in caption generation

Abstract

Image captioning with the encoder-decoder framework has advanced tremendously over the last decade, with a CNN mainly used as the encoder and an LSTM as the decoder. Despite its impressive accuracy on simple images, this pairing is inefficient in terms of time and space complexity. In addition, on complex images containing many objects and a lot of information, the performance of the CNN-LSTM pair degrades exponentially because it lacks a semantic understanding of the scenes in the images. To address these issues, we present a CNN-GRU encoder-decoder framework for a caption-to-image reconstructor that accounts for both semantic context and time complexity. By taking the hidden states of the decoder into consideration, the input image and its similar semantic representations are reconstructed and…
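To make the GRU-versus-LSTM efficiency argument concrete, the following is a minimal sketch of one step of a standard GRU cell in plain Python, together with a rough parameter count for both cell types. It is not the authors' implementation. The function names, the single bias vector per gate, the update convention, and the layer sizes used in the usage note are all assumptions chosen for illustration; the abstract does not state them.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, U, b, x, h):
    # W @ x + U @ h + b for one gate, using plain Python lists.
    return [
        sum(w * xi for w, xi in zip(W[i], x))
        + sum(u * hi for u, hi in zip(U[i], h))
        + b[i]
        for i in range(len(b))
    ]


def gru_step(params, x, h):
    """One GRU step; params maps 'z', 'r', 'n' to (W, U, b) tuples (hypothetical layout)."""
    z = [sigmoid(v) for v in affine(*params["z"], x, h)]      # update gate
    r = [sigmoid(v) for v in affine(*params["r"], x, h)]      # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]                    # reset applied to old state
    n = [math.tanh(v) for v in affine(*params["n"], x, rh)]   # candidate state
    # One common convention: z close to 1 keeps the old state.
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def zero_params(input_size, hidden_size):
    # All-zero weights, only useful for checking shapes and the update rule.
    def gate():
        W = [[0.0] * input_size for _ in range(hidden_size)]
        U = [[0.0] * hidden_size for _ in range(hidden_size)]
        b = [0.0] * hidden_size
        return (W, U, b)
    return {"z": gate(), "r": gate(), "n": gate()}


def recurrent_params(gates, input_size, hidden_size):
    # Each gate block has an input matrix, a recurrent matrix, and one bias vector.
    # A GRU has 3 such blocks, an LSTM has 4 (assumes a single bias per gate).
    return gates * (hidden_size * (input_size + hidden_size) + hidden_size)
```

For example, with a hypothetical input size of 256 and hidden size of 512, `recurrent_params(3, 256, 512)` gives the GRU count and `recurrent_params(4, 256, 512)` the LSTM count. Under these assumptions the GRU cell carries three quarters of the LSTM cell's recurrent parameters, which is one plausible source of the reduced time and space cost the abstract refers to.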

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory