SAFL: A Self-Attention Scene Text Recognizer with Focal Loss

Bao Hieu Tran; Thanh Le-Cong; Huu Manh Nguyen; Duc Anh Le; Thanh Hung Nguyen; Phi Le Nguyen

arXiv:2201.00132·cs.CV·January 4, 2022

TL;DR

SAFL is a scene text recognizer that combines a self-attention network, focal loss, and a spatial transformer network to handle distorted text and irregular layouts.

Contribution

The paper presents a self-attention-based scene text recognizer that uses focal loss and a spatial transformer network, and reports better results than existing methods.

Findings

01. Achieves state-of-the-art performance on seven benchmarks.

02. Effectively handles distortions and irregular text layouts.

03. Focal loss improves training on low-frequency samples.
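
To illustrate the third finding, the following is a minimal sketch of focal loss for a single classification step, written in plain Python. It is not taken from the SAFL repository. The function names, the toy logits, and the value gamma = 2.0 (a common default from the original focal loss formulation) are assumptions made for illustration; the setting used in the paper is not stated on this page.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard negative log-likelihood of the target class."""
    p_t = softmax(logits)[target]
    return -math.log(p_t)


def focal_loss(logits, target, gamma=2.0):
    """Focal loss: -(1 - p_t)^gamma * log(p_t).

    gamma = 0 recovers cross-entropy. Larger gamma shrinks the loss of
    examples the model already classifies confidently, so harder (often
    rarer) examples contribute relatively more to the total.
    gamma=2.0 is an assumed default, not a value confirmed for SAFL.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Hypothetical logits over three character classes; the target is class 0.
easy = [4.0, 0.0, 0.0]  # model is confident and correct
hard = [0.0, 1.0, 0.0]  # model prefers the wrong class

for name, logits in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(logits, 0)
    fl = focal_loss(logits, 0)
    print(f"{name}: cross-entropy={ce:.4f}  focal={fl:.4f}  ratio={fl / ce:.4f}")
```

The printed ratio is the modulating factor (1 - p_t)^gamma. It is much smaller for the easy example than for the hard one, which is the mechanism by which focal loss shifts training signal toward poorly predicted samples.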

Abstract

In recent decades, scene text recognition has gained worldwide attention from both the academic community and actual users due to its importance in a wide range of applications. Despite achievements in optical character recognition, scene text recognition remains challenging due to inherent problems such as distortions or irregular layouts. Most existing approaches rely on recurrence- or convolution-based neural networks. However, recurrent neural networks (RNNs) usually suffer from slow training due to sequential computation and encounter problems such as vanishing gradients or bottlenecks, while CNNs face a trade-off between complexity and performance. In this paper, we introduce SAFL, a self-attention-based neural network model with focal loss for scene text recognition, to overcome the limitations of the existing approaches. The use of focal loss instead of…
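
As a rough illustration of why self-attention avoids the sequential computation the abstract attributes to RNNs, here is a minimal single-head scaled dot-product self-attention in plain Python. It is a sketch under simplifying assumptions, not the SAFL architecture: queries, keys, and values are all set to the input itself (no learned projections, no multiple heads, no positional encoding), and the function name and toy input are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    x is a list of n feature vectors, each of dimension d.
    Returns (outputs, weights): outputs has the same shape as x,
    and weights[i][j] is how much position i attends to position j.
    """
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        # Weighted average of all value vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, x)) for i in range(d)])
        weights.append(w)
    return outputs, weights


# Hypothetical sequence of three 2-dimensional feature vectors.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(features)
for row in weights:
    print([round(w, 3) for w in row])
```

Each output position depends only on the input sequence, not on the previous output position, so every iteration of the outer loop could run in parallel. An RNN, by contrast, must finish step t before starting step t + 1.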

Code & Models

Repositories

ICMLA-SAFL/SAFL_pytorch
PyTorch · Official

Taxonomy

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focal Loss