An Image captioning algorithm based on the Hybrid Deep Learning Technique (CNN+GRU)
Rana Adnan Ahmad, Muhammad Azhar, Hina Sattar

TL;DR
This paper introduces a CNN-GRU based image captioning model that improves semantic understanding, reduces time complexity, and outperforms existing LSTM-based models in accuracy.
Contribution
The paper proposes a novel CNN-GRU encoder-decoder framework that enhances semantic comprehension and efficiency in image captioning tasks.
Findings
Outperforms LSTM-A5 in accuracy
Reduces time complexity compared to previous models
Enhances semantic understanding in caption generation
Abstract
Image captioning by the encoder-decoder framework has shown tremendous advancement in the last decade where CNN is mainly used as encoder and LSTM is used as a decoder. Despite such an impressive achievement in terms of accuracy in simple images, it lacks in terms of time complexity and space complexity efficiency. In addition to this, in case of complex images with a lot of information and objects, the performance of this CNN-LSTM pair downgraded exponentially due to the lack of semantic understanding of the scenes presented in the images. Thus, to take these issues into consideration, we present CNN-GRU encoder decode framework for caption-to-image reconstructor to handle the semantic context into consideration as well as the time complexity. By taking the hidden states of the decoder into consideration, the input image and its similar semantic representations is reconstructed and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
MethodsSigmoid Activation · Tanh Activation · Long Short-Term Memory
