# A Multitask Network for Localization and Recognition of Text in Images

**Authors:** Mohammad Reza Sarshogh, Keegan E. Hines

arXiv: 1906.09266 · 2019-06-25

## TL;DR

This paper introduces an end-to-end trainable multi-task network that localizes and recognizes text in images simultaneously, with no post-processing, cropping, or word grouping. It combines a shared convolutional backbone and Feature Pyramid Network with a dynamic pooling mechanism and a convolutional, attention-based recognition head.

## Contribution

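The paper presents a multi-task architecture that performs text localization, classification, and text recognition in a single end-to-end trainable model. Its two main additions are a dynamic pooling mechanism that retains high-resolution information across all regions of interest (RoIs) and a convolutional recognition mechanism with attention.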

## Key findings

- Achieves high performance in challenging regimes of non-traditional OCR when evaluated against benchmark datasets and comparable methods.
- The convolutional recognition mechanism with attention outperforms more common recurrent architectures.
- Identifies text segments with no post-processing, cropping, or word grouping.

## Abstract

We present an end-to-end trainable multi-task network that addresses the problem of lexicon-free text extraction from complex documents. This network simultaneously solves the problems of text localization and text recognition, and text segments are identified with no post-processing, cropping, or word grouping. A convolutional backbone and Feature Pyramid Network are combined to provide a shared representation that benefits each of three model heads: text localization, classification, and text recognition. To improve recognition accuracy, we describe a dynamic pooling mechanism that retains high-resolution information across all RoIs. For text recognition, we propose a convolutional mechanism with attention which outperforms more common recurrent architectures. Our model is evaluated against benchmark datasets and comparable methods and achieves high performance in challenging regimes of non-traditional OCR.
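The abstract does not say how the dynamic pooling mechanism chooses its output size, so the following is only a rough illustration of why a non-fixed pooled grid can retain more resolution for wide text regions than a conventional fixed grid. It assumes a hypothetical rule in which the pooled grid has one cell per feature-map cell covered by the RoI, up to a cap. The names `dynamic_pool_shape` and `max_pool_region`, and the parameters `stride`, `max_h`, and `max_w`, are illustrative and do not come from the paper.

```python
import math

FIXED_SIZE = (7, 7)  # a conventional fixed RoI-pooling grid, shown for comparison only


def dynamic_pool_shape(roi_w, roi_h, stride=4, max_h=8, max_w=64):
    """Hypothetical rule (an assumption, not the paper's): one pooled cell per
    feature-map cell the RoI covers, capped at max_h x max_w."""
    out_h = min(max_h, max(1, math.ceil(roi_h / stride)))
    out_w = min(max_w, max(1, math.ceil(roi_w / stride)))
    return out_h, out_w


def max_pool_region(feature, out_h, out_w):
    """Max-pool a 2D list `feature` (one RoI's feature crop) into out_h x out_w bins."""
    in_h, in_w = len(feature), len(feature[0])
    pooled = []
    for i in range(out_h):
        # Integer bin edges; each bin covers at least one input row.
        r0 = (i * in_h) // out_h
        r1 = max(r0 + 1, ((i + 1) * in_h) // out_h)
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = max(c0 + 1, ((j + 1) * in_w) // out_w)
            row.append(max(feature[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# A text line 200 px wide and 16 px tall on a stride-4 feature map.
print(dynamic_pool_shape(200, 16))  # (4, 50)
print(FIXED_SIZE)                   # (7, 7)
```

Under these assumptions, the 200-pixel-wide text line keeps 50 horizontal cells, whereas a fixed 7 x 7 grid would squeeze the same 50 columns into 7 bins, which is the kind of resolution loss a dynamic scheme is meant to avoid.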

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.09266/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1906.09266/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1906.09266/full.md

---
Source: https://tomesphere.com/paper/1906.09266