Efficient Urdu Caption Generation using Attention based LSTM

Inaam Ilahi; Hafiz Muhammad Abdullah Zia; Muhammad Ahtazaz Ahsan; Rauf; Tabassam; Armaghan Ahmed

arXiv:2008.01663 · cs.CL · June 22, 2021 · 1 citation

TL;DR

This paper presents an attention-based LSTM model for automatic Urdu image captioning. It fills a language gap with a new Urdu caption dataset and achieves a BLEU score of 0.83, advancing Urdu language processing in vision-language tasks.

Contribution

It introduces the first Urdu caption generation model based on attention-based deep learning. It also creates a new Urdu caption dataset by translating a subset of 700 Flickr8k images.

Findings

1. Achieved a BLEU score of 0.83 on the Urdu caption dataset.
2. Improved caption quality with advanced CNN architectures.
3. Demonstrated potential for grammar correction in the generated captions.
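BLEU scores a generated caption by its n-gram overlap with reference captions. This page does not say which BLEU variant produced the 0.83. It could be BLEU-1 or BLEU-4, and sentence-level or corpus-level. The following is therefore only a minimal sketch of sentence-level BLEU with uniform weights and no smoothing. It assumes Urdu captions are tokenized on whitespace. `ngrams` and `sentence_bleu` are hypothetical helper names, not the authors' code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights 1/max_n and no smoothing.

    Returns 0.0 if any n-gram order has no candidate n-grams or no matches.
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty: use the reference length closest to the candidate (shorter on ties).
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


if __name__ == "__main__":
    candidate = "ایک آدمی دوڑ رہا ہے".split()           # "a man is running"
    reference = "ایک آدمی سڑک پر دوڑ رہا ہے".split()    # "a man is running on the road"
    # Every candidate unigram appears in the reference, so BLEU-1 equals the
    # brevity penalty exp(1 - 7/5).
    print(round(sentence_bleu(candidate, [reference], max_n=1), 3))  # → 0.67
```

Library implementations such as NLTK's `nltk.translate.bleu_score` module also offer corpus-level BLEU and smoothing functions, which this sketch omits.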

Abstract

Recent advances in deep learning have created many opportunities to solve real-world problems that remained unsolved for more than a decade. Automatic caption generation is a major research field, and the research community has studied it extensively in widely spoken languages such as English. Urdu is the national language of Pakistan and is also widely spoken and understood across the Pakistan-India subcontinent, yet no work has been done on Urdu caption generation. Our research aims to fill this gap by developing an attention-based deep learning model that uses sequence-modeling techniques specialized for the Urdu language. We have prepared an Urdu dataset by translating a subset of the "Flickr8k" dataset containing 700 'man' images. We evaluate our proposed technique on this dataset and show that it achieves a BLEU score of 0.83 in Urdu.…
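The abstract describes an attention-based LSTM captioner but does not specify the attention mechanism. The sketch below is purely illustrative and assumes additive soft attention in the style of Show, Attend and Tell (Xu et al., 2015). The decoder's previous hidden state scores each CNN image-region feature, a softmax turns the scores into weights, and the weighted sum of regions becomes the context vector fed to the LSTM at the next step. The names `soft_attention`, `W_f`, `W_h` and `v` are hypothetical, and plain Python lists stand in for tensors so the block stays self-contained.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, hidden, W_f, W_h, v):
    """Additive soft attention over image regions (assumed variant, not the paper's code).

    features: L region vectors a_i of length D (e.g. a CNN feature map flattened to L locations)
    hidden:   previous decoder LSTM state h_{t-1}, length H
    W_f: A x D, W_h: A x H, v: length A
    Score e_i = v . tanh(W_f a_i + W_h h); returns (alpha, context) with
    alpha = softmax(e) and context = sum_i alpha_i * a_i.
    """
    projected_h = matvec(W_h, hidden)
    scores = []
    for a in features:
        projected_a = matvec(W_f, a)
        scores.append(sum(vj * math.tanh(pa + ph)
                          for vj, pa, ph in zip(v, projected_a, projected_h)))
    alpha = softmax(scores)
    dim = len(features[0])
    context = [sum(alpha[i] * features[i][d] for i in range(len(features)))
               for d in range(dim)]
    return alpha, context


if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 regions, D = 2
    hidden = [0.5, -0.5]                             # H = 2
    W_f = [[0.1, 0.2], [0.3, 0.4]]                   # A = 2
    W_h = [[0.5, 0.0], [0.0, 0.5]]
    v = [1.0, -1.0]
    alpha, context = soft_attention(features, hidden, W_f, W_h, v)
    print([round(a, 3) for a in alpha], [round(c, 3) for c in context])
```

The weights `alpha` always sum to 1, so the context vector is a convex combination of region features. This is what lets the decoder "look at" different image regions while emitting each Urdu word.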

Code & Models

Repositories

abdullahzia510/Urdu_Caption_Generation (official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition