Context-Free TextSpotter for Real-Time and Mobile End-to-End Text Detection and Recognition

Ryota Yoshihashi, Tomohiro Tanaka, Kenji Doi, Takumi Fujino, and Naoaki Yamashita

arXiv:2106.05611·cs.CV·June 11, 2021

TL;DR

This paper introduces Context-Free TextSpotter, a lightweight, real-time end-to-end text detection and recognition model suitable for mobile devices, achieving competitive accuracy with minimal computation.

Contribution

It presents a novel, simple, convolution-based end-to-end (E2E) text spotting method that is significantly smaller and faster than existing models, enabling deployment on mobile devices.

Findings

01

Achieves real-time text spotting on a GPU with only three million parameters.

02

Runs efficiently on smartphones with acceptable latency.

03

Maintains competitive transcription quality despite simplified architecture.

Abstract

In the deployment of scene-text spotting systems on mobile platforms, lightweight models with low computation are preferable. In concept, end-to-end (E2E) text spotting is suitable for such purposes because it performs text detection and recognition in a single model. However, current state-of-the-art E2E methods rely on heavy feature extractors, recurrent sequence modelling, and complex shape aligners to pursue accuracy, which means their computations are still heavy. We explore the opposite direction: How far can we go without bells and whistles in E2E text spotting? To this end, we propose a text-spotting method that consists of simple convolutions and a few post-processes, named Context-Free TextSpotter. Experiments using standard benchmarks show that Context-Free TextSpotter achieves real-time text spotting on a GPU with only three million parameters, which is the smallest and…
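The abstract describes the method only as simple convolutions plus a few post-processes, with no recurrent sequence modelling. As a hedged illustration of what decoding without sequence context can mean, the sketch below assumes a hypothetical network output for a single horizontal text line: a character-center heat map and per-pixel character-class scores. It takes local maxima as character positions, classifies each one independently by argmax, and reads them left to right. The functions `find_peaks` and `decode_line`, the 3×3 peak rule, and the 0.5 threshold are illustrative assumptions, not the paper's actual post-processing.

```python
from typing import List, Tuple

# Hypothetical post-processing sketch; NOT the authors' implementation.

def find_peaks(heat: List[List[float]], thresh: float = 0.5) -> List[Tuple[int, int]]:
    """Return (y, x) cells whose score is >= thresh and >= all 8 neighbours."""
    h, w = len(heat), len(heat[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heat[y][x]
            if v < thresh:
                continue
            neighbours = [
                heat[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(v >= n for n in neighbours):
                peaks.append((y, x))
    return peaks


def decode_line(heat, class_scores, alphabet: str, thresh: float = 0.5) -> str:
    """Classify each detected character on its own (no context), then join."""
    peaks = find_peaks(heat, thresh)
    peaks.sort(key=lambda p: p[1])  # left-to-right order; assumes one horizontal line
    chars = []
    for y, x in peaks:
        scores = class_scores[y][x]  # this character's own class scores only
        best = max(range(len(scores)), key=lambda k: scores[k])
        chars.append(alphabet[best])
    return "".join(chars)


# Toy example: a 3x7 map with three character centers on the middle row.
heat = [[0.0] * 7, [0.1, 0.9, 0.2, 0.8, 0.3, 0.95, 0.1], [0.0] * 7]
class_scores = [[[0.0, 0.0, 0.0] for _ in range(7)] for _ in range(3)]
class_scores[1][1] = [0.1, 0.8, 0.1]  # 'c'
class_scores[1][3] = [0.7, 0.2, 0.1]  # 'a'
class_scores[1][5] = [0.1, 0.1, 0.8]  # 't'

print(decode_line(heat, class_scores, "act"))  # prints "cat"
```

Because each character is classified from its own score vector, no recurrent state is carried between characters. In this sketch, the trade-off is that the decoder cannot use neighbouring characters to correct an ambiguous one.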

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Advanced Image and Video Retrieval Techniques