Chittron: An Automatic Bangla Image Captioning System
Motiur Rahman, Nabeel Mohammed, Nafees Mansoor, Sifat Momen

TL;DR
This paper presents Chittron, an automatic Bangla image captioning system trained on a new dataset of 16,000 manually annotated images. It combines a pre-trained VGG16 image encoder with stacked LSTM layers to generate captions in Bangla.
Contribution
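The paper introduces what the authors present as the first Bangla image captioning system, together with a new manually annotated Bangla dataset. This addresses the lack of Bangla language resources for the task, and the system shows promising results.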
Findings
Model generates Bangla captions with reasonable accuracy.
BLEU scores indicate effective language modeling.
Dataset of 16,000 images supports future research.
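The Findings above cite BLEU scores without saying which variant was used. The sketch below shows the standard BLEU-N computation: clipped n-gram precisions, a geometric mean with uniform weights, and a brevity penalty. It assumes sentence-level scoring, whitespace tokenization, and no smoothing, and none of these are confirmed by the paper. The caption pair is a hypothetical illustration, not an entry from the Chittron dataset.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Sentence-level BLEU-N with uniform weights and no smoothing (assumed variant)."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        total = sum(cand.values())
        if total == 0:  # candidate is shorter than n tokens
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        if clipped == 0:  # unsmoothed BLEU is zero if any precision is zero
            return 0.0
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(log_precision_sum / max_n)


# Hypothetical Bangla caption pair ("A boy is playing football in the field").
reference = "একটি ছেলে মাঠে ফুটবল খেলছে".split()
hypothesis = "একটি ছেলে মাঠে খেলছে".split()
score = sentence_bleu(hypothesis, [reference], max_n=2)
```

Here the hypothesis matches all 4 unigrams and 2 of its 3 bigrams. Because it is one token shorter than the reference, the brevity penalty also lowers the score. Corpus-level BLEU, which pools counts over all test captions before dividing, generally gives different numbers. Reported scores are only comparable when the variant is known.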
Abstract
Automatic image caption generation aims to produce an accurate natural-language description of an image. However, Bangla, the fifth most widely spoken language in the world, lags considerably behind in research and development in this domain. Moreover, while many established datasets related to image annotation exist for English, no such resource exists for Bangla yet. Hence, this paper outlines the development of "Chittron", an automatic image captioning system for Bangla. To address the lack of data, 16,000 images from the Bangladeshi context have been collected and manually annotated in Bangla. This dataset is then used to train a model that integrates a pre-trained VGG16 image embedding model with stacked LSTM layers. Given an image as input, the model is trained to predict its caption one word at a time. The results…
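The abstract says the model predicts a caption "one word at a time". The sketch below shows what that usually means for training data and for inference. Each annotated caption is expanded into (image, word prefix, next word) examples. At inference time, a greedy loop repeatedly asks the model for the most probable next word. Several details are assumptions and are not taken from the paper: the boundary tokens `<start>`/`<end>`, whitespace tokenization, greedy rather than beam search, and a 4096-dimensional VGG16 feature vector. The trained VGG16 + LSTM network is treated as a black-box function `next_word_probs`, and `toy_model` is a hypothetical stand-in for it.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical caption-boundary tokens; the paper does not name its own.
START, END = "<start>", "<end>"


def make_training_pairs(image_id: str, caption: str) -> List[Tuple[str, List[str], str]]:
    """Expand one captioned image into (image, word prefix, next word) examples."""
    words = [START] + caption.split() + [END]  # whitespace tokenization (assumed)
    return [(image_id, words[:i], words[i]) for i in range(1, len(words))]


def greedy_decode(
    image_features: Sequence[float],
    next_word_probs: Callable[[Sequence[float], List[str]], Dict[str, float]],
    max_len: int = 20,
) -> List[str]:
    """Generate a caption one word at a time, always picking the most probable word."""
    words = [START]
    for _ in range(max_len):
        probs = next_word_probs(image_features, words)
        word = max(probs, key=probs.get)  # greedy choice
        if word == END:
            break
        words.append(word)
    return words[1:]  # drop the start token


# Toy stand-in for a trained VGG16 + LSTM model: it ignores the image
# features and always continues one fixed, hypothetical caption.
TOY_CAPTION = "একটি ছেলে মাঠে ফুটবল খেলছে".split()


def toy_model(features: Sequence[float], prefix: List[str]) -> Dict[str, float]:
    target = TOY_CAPTION + [END]
    nxt = target[len(prefix) - 1]
    return {END: 1.0} if nxt == END else {nxt: 0.9, END: 0.1}


features = [0.0] * 4096  # placeholder for a VGG16 fully connected activation (assumed size)
pairs = make_training_pairs("img_0001", " ".join(TOY_CAPTION))  # 6 examples for 5 words
caption = greedy_decode(features, toy_model)
```

A caption of k words yields k + 1 training examples, because predicting `<end>` is also a step. The abstract does not say whether the image vector initializes the LSTM state or is merged with its output. The decoding loop works the same way in either case.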
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
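The taxonomy lists sigmoid and tanh activations alongside LSTM because these are the gate and candidate nonlinearities inside an LSTM cell. The sketch below computes one time step of the standard LSTM with a forget gate, using plain Python lists. The paper's hidden size, number of stacked layers, and framework are not given here. The toy sizes, the all-zero weights, and the parameter layout (`params` mapping a gate name to `(W, U, b)`) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
GateParams = Tuple[Matrix, Matrix, Vector]  # (W input weights, U recurrent weights, b bias)


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W: Matrix, x: Vector, U: Matrix, h: Vector, b: Vector) -> Vector:
    """Compute W x + U h + b, one output unit per row."""
    return [
        sum(w * xi for w, xi in zip(w_row, x)) + sum(u * hi for u, hi in zip(u_row, h)) + bk
        for w_row, u_row, bk in zip(W, U, b)
    ]


def lstm_step(x: Vector, h: Vector, c: Vector,
              params: Dict[str, GateParams]) -> Tuple[Vector, Vector]:
    """One time step of a standard LSTM cell; returns (new hidden state, new cell state)."""

    def gate(name: str, act: Callable[[float], float]) -> Vector:
        W, U, b = params[name]
        return [act(z) for z in affine(W, x, U, h, b)]

    f = gate("f", sigmoid)    # forget gate: how much of the old cell state to keep
    i = gate("i", sigmoid)    # input gate: how much of the candidate to write
    o = gate("o", sigmoid)    # output gate: how much of the cell state to expose
    g = gate("g", math.tanh)  # candidate cell values in (-1, 1)
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


# Toy sizes (hypothetical): 3-dimensional input, 2 hidden units, all-zero weights.
params = {name: (zeros(2, 3), zeros(2, 2), [0.0, 0.0]) for name in "fiog"}
h1, c1 = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], params)
```

With all-zero weights, every sigmoid gate outputs 0.5 and the candidate is 0, so the cell state is simply halved. In a stacked LSTM, like the one the abstract describes, the hidden state `h` of one layer serves as the input `x` of the next layer at the same time step.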
