Script Identification in Natural Scene Image and Video Frame using Attention based Convolutional-LSTM Network

Ankan Kumar Bhunia; Aishik Konwer; Ayan Kumar Bhunia; Abir Bhowmick; Partha P. Roy; Umapada Pal

arXiv:1801.00470 · cs.CV · August 8, 2018

TL;DR

This paper introduces an attention-based CNN-LSTM framework for script identification in scene images and videos, effectively handling low quality and complex backgrounds to improve recognition accuracy.

Contribution

The novel approach combines local and global feature extraction with dynamic weighting using attention mechanisms within a CNN-LSTM framework for script identification.

Findings

01. Achieved superior accuracy on four public datasets.

02. Effectively handled low-quality images and complex backgrounds.

03. Demonstrated improvement over conventional methods.

Abstract

Script identification plays a significant role in analysing documents and videos. In this paper, we focus on the problem of script identification in scene text images and video scripts. Because of low image quality, complex backgrounds and the similar character layouts shared by some scripts such as Greek and Latin, text recognition in these cases becomes challenging. We propose a novel method that extracts local and global features using a CNN-LSTM framework and weights them dynamically for script identification. First, we convert the images into patches and feed them into the CNN-LSTM framework. Attention-based patch weights are calculated by applying a softmax layer after the LSTM. Next, we multiply these weights patch-wise with the corresponding CNN features to yield local features. Global features are also extracted from the last cell state of the LSTM. We employ a fusion…
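The patch-weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `softmax`, `weight_patches`, `patch_features` and `patch_scores` are hypothetical, the per-patch scores are assumed to come from the LSTM outputs (the LSTM itself is not modelled here), and the fusion step is left out because the abstract is truncated before describing it.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weight_patches(patch_features, patch_scores):
    """Scale each patch's feature vector by its attention weight.

    patch_features: one feature vector (list of floats) per patch,
                    standing in for the CNN output of that patch.
    patch_scores:   one scalar per patch, assumed here to be derived
                    from the LSTM output for that patch.
    """
    if len(patch_features) != len(patch_scores):
        raise ValueError("need exactly one score per patch")
    weights = softmax(patch_scores)
    # Patch-wise multiplication: every element of a patch's feature
    # vector is scaled by that patch's attention weight.
    local = [[w * x for x in feat] for w, feat in zip(weights, patch_features)]
    return weights, local


# Toy input: three patches with two-dimensional features and equal scores,
# so every patch receives the same attention weight.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
scores = [0.0, 0.0, 0.0]
weights, local = weight_patches(features, scores)
```

Because the weights come from a softmax, they are positive and sum to one, so a patch with a higher score contributes proportionally more to the local features than a patch dominated by background clutter.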

Code & Models

Repositories

ankanbhunia/AttenScriptNetPR (TensorFlow, official)

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory