# Visual attention models for scene text recognition

**Authors:** Suman K. Ghosh, Ernest Valveny, Andrew D. Bagdanov

arXiv: 1706.01487 · 2017-06-07

## TL;DR

This paper introduces an LSTM-based soft visual attention model for lexicon-free scene text recognition. It attends over spatial convolutional features and integrates an explicit language model into beam search, reaching state-of-the-art results on SVT and ICDAR'03.

## Contribution

The paper presents a new end-to-end trainable attention-based framework for lexicon-free scene text recognition, combining convolutional features with a learned attention mechanism.
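To make the attention idea concrete, here is a minimal sketch of one soft-attention step: region scores are normalised with a softmax, and the resulting weights form a weighted combination of the per-region convolutional feature vectors. The function names `softmax` and `attend` are illustrative, and the scores are assumed to be given; in a full model they would typically be computed from the LSTM state and the features, which this sketch does not attempt to reproduce.

```python
import math


def softmax(scores):
    """Turn unnormalised scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, scores):
    """One soft-attention step (illustrative, not the authors' code).

    features: list of region feature vectors, all of the same length
    scores:   one unnormalised relevance score per region (assumed given)
    Returns the attention weights and the weighted-sum context vector.
    """
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * f[d] for w, f in zip(weights, features))
        for d in range(dim)
    ]
    return weights, context


# Two image regions with 2-dimensional features and equal scores:
# both regions receive the same weight.
weights, context = attend([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With equal scores the context vector is the plain average of the region features; raising one score shifts the context toward that region, which is how the model can focus on different parts of the image at each time step.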

## Key findings

- Achieves state-of-the-art accuracy on SVT and ICDAR'03 datasets.
- Integrating an explicit language model into beam search leads to significantly better recognition results.
- Works in the unconstrained (lexicon-free) setting and can be trained end-to-end using only word-level annotations.

## Abstract

In this paper we propose an approach to lexicon-free recognition of text in scene images. Our approach relies on an LSTM-based soft visual attention model learned from convolutional features. A set of feature vectors is derived from an intermediate convolutional layer corresponding to different areas of the image. This permits encoding of spatial information into the image representation. In this way, the framework is able to learn how to selectively focus on different parts of the image. At every time step the recognizer emits one character using a weighted combination of the convolutional feature vectors according to the learned attention model. Training can be done end-to-end using only word-level annotations. In addition, we show that modifying the beam search algorithm by integrating an explicit language model leads to significantly better recognition results. We validate the performance of our approach on standard SVT and ICDAR'03 scene text datasets, showing state-of-the-art performance in unconstrained text recognition.
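The abstract does not spell out how the language model enters beam search, so the following is only one plausible sketch: each candidate extension is scored by the recognizer's character log-probability plus a weighted language-model log-probability. The names `beam_search_with_lm`, `toy_lm`, `lm_weight`, and the toy probabilities are all assumptions made for illustration. As a further simplification, the per-step character distributions are fixed in advance rather than conditioned on the prefix, as they would be in an LSTM decoder.

```python
import math


def beam_search_with_lm(step_log_probs, lm_log_prob, beam_width=3, lm_weight=1.0):
    """Beam search with an additive language-model term (illustrative).

    step_log_probs: list of dicts, one per time step, char -> log-probability
    lm_log_prob:    function (prefix, char) -> log-probability under the LM
    Returns the final beam as (string, score) pairs, best first.
    """
    beams = [("", 0.0)]
    for log_probs in step_log_probs:
        candidates = []
        for prefix, score in beams:
            for ch, lp in log_probs.items():
                total = score + lp + lm_weight * lm_log_prob(prefix, ch)
                candidates.append((prefix + ch, total))
        # Keep only the highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def toy_lm(prefix, ch):
    """Hypothetical bigram LM that favours the bigrams 'ca' and 'at'."""
    if not prefix:
        return 0.0  # no penalty for the first character
    likely = {("c", "a"), ("a", "t")}
    return math.log(0.5) if (prefix[-1], ch) in likely else math.log(0.05)


# Toy recognizer output: the visually preferred reading is "clt".
steps = [
    {"c": math.log(0.9), "e": math.log(0.1)},
    {"l": math.log(0.55), "a": math.log(0.45)},
    {"t": math.log(0.9), "f": math.log(0.1)},
]

best_without_lm = beam_search_with_lm(steps, toy_lm, lm_weight=0.0)[0][0]
best_with_lm = beam_search_with_lm(steps, toy_lm, lm_weight=1.0)[0][0]
```

In this toy setup the recognizer alone prefers `clt`, while adding the language-model term lets the more plausible string `cat` win. The weight `lm_weight` controls how strongly the language model can override the visual evidence.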

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.01487/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1706.01487/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/1706.01487/full.md

---
Source: https://tomesphere.com/paper/1706.01487