Bornon: Bengali Image Captioning with Transformer-based Deep learning approach
Faisal Muhammad Shah, Mayeesha Humaira, Md Abidur Rahman Khan Jim, Amit Saha Ami, Shimul Paul

TL;DR
This paper introduces Bornon, a transformer-based model for Bengali image captioning, demonstrating its effectiveness and comparing it with other approaches on multiple datasets.
Contribution
First to generate Bengali image captions using a transformer model and compare its performance with attention-based and other models.
Findings
Transformer model outperforms traditional encoder-decoder approaches.
Bengali captions generated with competitive accuracy.
Performance varies across different datasets.
Abstract
Image captioning with an Encoder-Decoder approach, where a CNN serves as the Encoder and a sequence generator such as an RNN serves as the Decoder, has proven to be very effective. However, this method has a drawback: the sequence must be processed in order. To overcome this drawback, some researchers have used the Transformer model to generate captions from images using English datasets. However, none of them generated captions in Bengali with the Transformer model. To address this gap, we used three different Bengali datasets to generate Bengali captions from images with the Transformer model. Additionally, we compared the performance of the Transformer-based model with a visual attention-based Encoder-Decoder approach. Finally, we compared the results of the Transformer-based model with other models that employed different Bengali image captioning datasets.
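The abstract contrasts RNN decoders, which must process a sequence in order, with the Transformer. The snippet below is a minimal sketch of scaled dot-product self-attention, the operation that removes that ordering constraint. It is not the authors' code. The helper names (`matmul`, `scaled_dot_product_attention`) and the toy embeddings are illustrative assumptions. The learned Q/K/V projections and multiple heads of a real model are also omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for every query row.

    Each output row depends only on its own query row and on the full K and V,
    not on previously computed output rows, so all positions can be computed
    independently (in parallel), unlike the step-by-step loop of an RNN.
    """
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


# Toy self-attention over a 3-token sequence of 2-d embeddings (made-up values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
full = scaled_dot_product_attention(x, x, x)
# Computing only the last position gives the same row as the full pass.
last_only = scaled_dot_product_attention([x[2]], x, x)
```

This parallelism applies to the encoder and to decoder training with teacher forcing. At inference time, a Transformer decoder still emits caption tokens one at a time, using masked self-attention so that each token sees only earlier tokens.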
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Dense Connections · Label Smoothing · Residual Connection · Adam
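Two of the components tagged above are absolute position encodings and label smoothing. The page does not say how Bornon configures them. The sketch below therefore assumes the sinusoidal encoding from "Attention Is All You Need" and the smoothing value epsilon = 0.1 used in that paper. The function names are hypothetical, and the sketch is not taken from the authors' implementation.

```python
import math
from typing import List


def sinusoidal_position_encoding(max_len: int, d_model: int) -> List[List[float]]:
    """Absolute sinusoidal encodings (assumed variant):
    PE[pos][2i] = sin(pos / 10000^(2i/d_model)), PE[pos][2i+1] = cos(same angle).
    """
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for even in range(0, d_model, 2):  # `even` plays the role of 2i
            angle = pos / (10000 ** (even / d_model))
            pe[pos][even] = math.sin(angle)
            if even + 1 < d_model:
                pe[pos][even + 1] = math.cos(angle)
    return pe


def label_smoothed_cross_entropy(logits: List[float], target: int, epsilon: float = 0.1) -> float:
    """Cross-entropy against a smoothed target distribution: (1 - epsilon) on
    the true token plus epsilon / K spread uniformly over all K vocabulary
    entries. epsilon = 0.1 is an assumed default, not a value from this paper."""
    k = len(logits)
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))  # log-sum-exp
    log_probs = [x - log_z for x in logits]
    smoothed = [epsilon / k] * k
    smoothed[target] += 1.0 - epsilon
    return -sum(t * lp for t, lp in zip(smoothed, log_probs))


# Illustrative sizes and logits only.
pe = sinusoidal_position_encoding(max_len=4, d_model=6)
loss = label_smoothed_cross_entropy([2.0, 0.5, -1.0], target=0)
```

The position encodings are added to the token embeddings so that attention, which is otherwise order-agnostic, can distinguish word positions in a caption. Label smoothing slightly raises the loss on confident correct predictions, which discourages the decoder from becoming over-confident.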
