Image to Bengali Caption Generation Using Deep CNN and Bidirectional Gated Recurrent Unit
Al Momin Faruk, Hasan Al Faraby, Md. Muzahidul Azad, Md. Riduyan, Fedous, Md. Kishor Morol

TL;DR
This research introduces a model based on a CNN and a Bidirectional GRU for generating Bengali image captions. It addresses the language's underrepresentation in image captioning research and aids communication and accessibility for Bengali speakers and visually impaired individuals.
Contribution
The paper presents a novel Bengali image captioning model using a pre-trained CNN and a Bidirectional GRU, along with a new dataset, BNATURE, for training and evaluation.
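The Bidirectional GRU named above is, according to the abstract, the caption decoder. As a hedged, framework-free illustration (not the authors' code), the sketch below shows what a bidirectional GRU layer computes. One GRU reads the sequence left to right, a second reads it right to left, and their hidden states are concatenated at each time step, so every position carries both past and future context. It uses the standard GRU update gate, reset gate and candidate state, with random weights and no biases. Library layers add biases and differ in small details (for example, which side of the interpolation the update gate weights). How the paper feeds image features and partial captions into this layer is not stated in this summary, and the inputs below are made up.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """One GRU cell (update and reset gates); biases omitted for brevity."""

    def __init__(self, input_size: int, hidden_size: int, seed: int) -> None:
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # W_* act on the input x_t, U_* act on the previous hidden state h_{t-1}.
        self.W_z, self.U_z = rand(hidden_size, input_size), rand(hidden_size, hidden_size)
        self.W_r, self.U_r = rand(hidden_size, input_size), rand(hidden_size, hidden_size)
        self.W_h, self.U_h = rand(hidden_size, input_size), rand(hidden_size, hidden_size)

    def step(self, x: Vector, h: Vector) -> Vector:
        # Update gate z and reset gate r, each elementwise in (0, 1).
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W_z, x), matvec(self.U_z, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W_r, x), matvec(self.U_r, h))]
        # Candidate state is computed from the reset-gated previous state.
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.W_h, x), matvec(self.U_h, gated_h))]
        # New state interpolates between the old state and the candidate.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_gru(cell: GRUCell, seq: List[Vector]) -> List[Vector]:
    """Run a GRU over a sequence from a zero initial state; return every hidden state."""
    h = [0.0] * cell.hidden_size
    states = []
    for x in seq:
        h = cell.step(x, h)
        states.append(h)
    return states


def bidirectional_gru(seq: List[Vector], fwd: GRUCell, bwd: GRUCell) -> List[Vector]:
    """Concatenate forward states with backward states re-aligned to the same time step."""
    forward_states = run_gru(fwd, seq)
    backward_states = run_gru(bwd, seq[::-1])[::-1]
    return [f + b for f, b in zip(forward_states, backward_states)]


# Four hypothetical 3-dimensional input vectors (e.g. word embeddings).
seq = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4], [0.2, 0.2, 0.2], [-0.1, 0.0, 0.5]]
out = bidirectional_gru(seq, GRUCell(3, 4, seed=0), GRUCell(3, 4, seed=1))
print(len(out), len(out[0]))  # 4 8
```

Each output vector has twice the hidden size because the two directions are concatenated. Keras's `Bidirectional` wrapper also concatenates by default, though whether the authors used Keras is not stated here.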
Findings
Achieved a BLEU-1 score of 42.6
Achieved a BLEU-4 score of 23
Achieved a METEOR score of 16.41
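BLEU-n measures n-gram overlap between a generated caption and one or more reference captions. BLEU lies in [0, 1], so the figures above are presumably on the common ×100 scale. The following Python sketch implements the standard sentence-level definition with uniform weights: clipped n-gram precision for n = 1..N, their geometric mean, and a brevity penalty. It is not the paper's evaluation code, which this summary does not describe and which may be corpus-level or smoothed. The example captions are made up.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Sentence-level BLEU-max_n with uniform weights and no smoothing."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref_counts: Counter = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # unsmoothed BLEU is zero if any n-gram precision is zero
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


cand = "a man is riding a boat on the river".split()
ref = "a man is rowing a boat on the river".split()
print(round(sentence_bleu(cand, [ref], max_n=1), 3))  # ≈ 0.889
print(round(sentence_bleu(cand, [ref], max_n=4), 3))  # ≈ 0.597
```

A single wrong word costs little at BLEU-1 (8 of 9 unigrams match) but more at BLEU-4, because it breaks every 2-, 3- and 4-gram it touches. This is one reason BLEU-4 scores are typically much lower than BLEU-1.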
Abstract
There is very little notable research on generating image descriptions in the Bengali language. About 243 million people speak Bengali, and it is the 7th most spoken language on the planet. The purpose of this research is to propose a CNN and Bidirectional GRU based architecture that generates natural-language captions in Bengali from an image. Bengali speakers can use this research to break the language barrier and better understand each other's perspectives. It will also help many blind people in their everyday lives. This paper used an encoder-decoder approach to generate captions. We used a pre-trained deep convolutional neural network (DCNN), the InceptionV3 image embedding model, as the encoder for analysis, classification, and annotation of the dataset's images, and a Bidirectional Gated Recurrent Unit (BGRU) layer as the decoder to generate captions. Argmax and Beam…
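The abstract breaks off at "Argmax and Beam…", which presumably refers to argmax (greedy) decoding and beam search. As a hedged illustration of the difference, the sketch below decodes from a hypothetical toy next-word table (`TOY_MODEL`, invented here). A real decoder would condition on the InceptionV3 image features and the whole partial caption, not just the previous word. Greedy argmax commits to "a" because it is the most likely first word. Beam search with width 2 keeps "the" alive and finds the caption with higher overall probability.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy "decoder": P(next word | previous word), invented for illustration.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "<s>": {"a": 0.5, "the": 0.4, "</s>": 0.1},
    "a": {"dog": 0.4, "cat": 0.3, "</s>": 0.3},
    "the": {"dog": 0.9, "</s>": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}
END = "</s>"


def next_word_probs(prefix: List[str]) -> Dict[str, float]:
    return TOY_MODEL[prefix[-1]]


def greedy_decode(max_len: int = 10) -> List[str]:
    """Argmax decoding: commit to the single most probable word at each step."""
    seq = ["<s>"]
    for _ in range(max_len):
        word = max(next_word_probs(seq).items(), key=lambda kv: kv[1])[0]
        if word == END:
            break
        seq.append(word)
    return seq[1:]


def beam_search_decode(beam_width: int = 2, max_len: int = 10) -> Tuple[List[str], float]:
    """Keep the beam_width partial captions with the highest log-probability at each step."""
    beams: List[Tuple[float, List[str]]] = [(0.0, ["<s>"])]
    for _ in range(max_len):
        candidates: List[Tuple[float, List[str]]] = []
        for logp, seq in beams:
            if seq[-1] == END:
                candidates.append((logp, seq))  # finished captions are carried over unchanged
                continue
            for word, p in next_word_probs(seq).items():
                candidates.append((logp + math.log(p), seq + [word]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
        if all(seq[-1] == END for _, seq in beams):
            break
    best_logp, best_seq = beams[0]
    words = [w for w in best_seq[1:] if w != END]
    return words, math.exp(best_logp)


print(greedy_decode())  # ['a', 'dog']
words, prob = beam_search_decode(beam_width=2)
print(words, round(prob, 2))  # ['the', 'dog'] 0.36
```

With `beam_width=1` beam search reduces to greedy decoding. Scores here are raw summed log-probabilities. Practical captioning decoders often add length normalization, which this sketch omits.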
Taxonomy
Methods: Gated Recurrent Unit
