# Transcribing Content from Structural Images with Spotlight Mechanism

**Authors:** Yu Yin, Zhenya Huang, Enhong Chen, Qi Liu, Fuzheng Zhang, Xing Xie, and Guoping Hu

arXiv: 1905.10954 · 2019-05-28

## TL;DR

This paper introduces a hierarchical Spotlight Transcribing Network (STN) framework with a novel spotlight mechanism for transcribing complex structural images, effectively capturing internal structures and content.

## Contribution

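The paper proposes a new hierarchical framework with a spotlight mechanism and two implementations, STNM (spotlight movement follows the Markov property) and STNR (spotlight movement follows recurrent modeling), enhancing recognition of complex structured images beyond existing methods.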

## Key findings

- Effective in transcribing complex structural images
- Outperforms existing recognition methods on structural datasets
- A reinforcement method refines the framework by self-improving the spotlight mechanism

## Abstract

Transcribing content from structural images, e.g., writing notes from music scores, is a challenging task as not only the content objects should be recognized, but the internal structure should also be preserved. Existing image recognition methods mainly work on images with simple content (e.g., text lines with characters), but are not capable of identifying ones with more complex content (e.g., structured symbols), which often follow a fine-grained grammar. To this end, in this paper, we propose a hierarchical Spotlight Transcribing Network (STN) framework followed by a two-stage "where-to-what" solution. Specifically, we first decide "where-to-look" through a novel spotlight mechanism to focus on different areas of the original image following its structure. Then, we decide "what-to-write" by developing a GRU-based network with the spotlight areas for transcribing the content accordingly. Moreover, we propose two implementations on the basis of STN, i.e., STNM and STNR, where the spotlight movement follows the Markov property and Recurrent modeling, respectively. We also design a reinforcement method to refine the framework by self-improving the spotlight mechanism. We conduct extensive experiments on many structural image datasets, where the results clearly demonstrate the effectiveness of the STN framework.
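
To make the "where-to-look" step more concrete, here is a minimal sketch of one way a spotlight could select an image region. The abstract does not state the functional form of the spotlight, so the Gaussian weighting, the `(center, sigma)` parameterization, and the function names `spotlight_weights` and `spotlight_context` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def spotlight_weights(height, width, center, sigma):
    """Return an H x W grid of weights that sum to 1.

    Assumption: the spotlight is a Gaussian bump at `center` = (row, col)
    with radius `sigma`. This form is illustrative only.
    """
    cy, cx = center
    raw = [
        [
            math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]
    total = sum(sum(row) for row in raw)
    return [[value / total for value in row] for row in raw]


def spotlight_context(features, center, sigma):
    """Summarize an H x W x C feature map under the spotlight.

    `features` is a nested list indexed as features[y][x][c]. The result is
    the weighted average feature vector (length C) and the weight grid.
    """
    height, width = len(features), len(features[0])
    channels = len(features[0][0])
    weights = spotlight_weights(height, width, center, sigma)
    context = [0.0] * channels
    for y in range(height):
        for x in range(width):
            for c in range(channels):
                context[c] += weights[y][x] * features[y][x][c]
    return context, weights
```

In a full pipeline, the returned context vector would be what a "what-to-write" decoder (a GRU-based network in the paper) consumes at each step, while a separate module would choose the next `center` and `sigma`, either from the previous spotlight alone (Markov-style, as in STNM) or from a recurrent state (as in STNR).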

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.10954/full.md

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/1905.10954/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/1905.10954/full.md

---
Source: https://tomesphere.com/paper/1905.10954