# A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning

**Authors:** Pengfei Wang, Chengquan Zhang, Fei Qi, Zuming Huang, Mengyi En, Junyu Han, Jingtuo Liu, Errui Ding, Guangming Shi

arXiv: 1908.05498 · 2019-08-16

## TL;DR

This paper introduces SAST, a segmentation-based scene text detector that uses context-attended multi-task learning on a Fully Convolutional Network (FCN) to detect arbitrarily-shaped text. On SCUT-CTW1500 it reaches an Hmean of 81.0% at 27.63 FPS on a single NVIDIA Titan Xp.

## Contribution

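The paper presents a single-shot, segmentation-based text detector that combines a Context Attention Block, which captures long-range dependencies between pixels, with a Point-to-Quad assignment step that clusters pixels into text instances. The learned geometric properties are then used to reconstruct a polygonal representation of each text region.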

## Key findings

- Achieves an Hmean of 81.0% on the SCUT-CTW1500 benchmark.
- Runs at 27.63 FPS on SCUT-CTW1500 on a single NVIDIA Titan Xp GPU.
- Outperforms most existing segmentation-based methods.
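
For readers unfamiliar with the metric, Hmean in text detection benchmarks is the harmonic mean of precision and recall. The sketch below shows that arithmetic only. The function names and the example counts are illustrative assumptions, not values or code from the paper.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (also called F-measure)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing matches
    return 2 * precision * recall / (precision + recall)


def hmean_from_counts(matched: int, num_detections: int, num_ground_truth: int) -> float:
    """Hmean from raw counts; `matched` is the number of correct detections."""
    precision = matched / num_detections if num_detections else 0.0
    recall = matched / num_ground_truth if num_ground_truth else 0.0
    return hmean(precision, recall)


# Hypothetical counts, chosen only to illustrate the calculation.
score = hmean_from_counts(matched=80, num_detections=100, num_ground_truth=100)
print(f"Hmean = {score:.3f}")
```

Because the harmonic mean is dominated by the smaller of its two inputs, a detector cannot reach a high Hmean by trading away recall for precision or the reverse.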

## Abstract

Detecting scene text of arbitrary shapes has been a challenging task over the past years. In this paper, we propose a novel segmentation-based text detector, namely SAST, which employs a context attended multi-task learning framework based on a Fully Convolutional Network (FCN) to learn various geometric properties for the reconstruction of polygonal representation of text regions. Taking sequential characteristics of text into consideration, a Context Attention Block is introduced to capture long-range dependencies of pixel information to obtain a more reliable segmentation. In post-processing, a Point-to-Quad assignment method is proposed to cluster pixels into text instances by integrating both high-level object knowledge and low-level pixel information in a single shot. Moreover, the polygonal representation of arbitrarily-shaped text can be extracted with the proposed geometric properties much more effectively. Experiments on several benchmarks, including ICDAR2015, ICDAR2017-MLT, SCUT-CTW1500, and Total-Text, demonstrate that SAST achieves better or comparable performance in terms of accuracy. Furthermore, the proposed algorithm runs at 27.63 FPS on SCUT-CTW1500 with a Hmean of 81.0% on a single NVIDIA Titan Xp graphics card, surpassing most of the existing segmentation-based methods.
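
The abstract does not spell out how the Point-to-Quad assignment works, so the following is only a minimal sketch of one plausible reading: each text pixel (low-level information) votes for an instance center via a predicted offset, and is assigned to the candidate quadrilateral (high-level object knowledge) whose center is closest to that vote. The per-pixel offsets, the nearest-center rule, and all names here are assumptions made for illustration, not the authors' implementation.

```python
from math import hypot
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Quad = Sequence[Point]  # four (x, y) corners of a candidate text box


def quad_center(quad: Quad) -> Point:
    """Mean of the quad's corner coordinates."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return (sum(xs) / len(quad), sum(ys) / len(quad))


def assign_pixels_to_quads(
    pixels: Sequence[Point],
    offsets: Sequence[Point],
    quads: Sequence[Quad],
) -> Dict[int, List[Point]]:
    """Cluster pixels by the quad whose center is nearest to each pixel's vote.

    `offsets[i]` is a hypothetical predicted displacement from `pixels[i]`
    to the center of the text instance it belongs to.
    """
    if not quads:
        return {}
    centers = [quad_center(q) for q in quads]
    clusters: Dict[int, List[Point]] = {i: [] for i in range(len(quads))}
    for (x, y), (dx, dy) in zip(pixels, offsets):
        vx, vy = x + dx, y + dy  # the center this pixel votes for
        best = min(
            range(len(centers)),
            key=lambda i: hypot(vx - centers[i][0], vy - centers[i][1]),
        )
        clusters[best].append((x, y))
    return clusters


# Toy input: two candidate quads and three pixels with made-up offsets.
quads = [
    [(0, 0), (10, 0), (10, 4), (0, 4)],
    [(20, 0), (30, 0), (30, 4), (20, 4)],
]
pixels = [(2, 2), (8, 2), (22, 2)]
offsets = [(3, 0), (-3, 0), (3, 0)]
clusters = assign_pixels_to_quads(pixels, offsets, quads)
print(clusters)
```

In this toy input the first two pixels vote for the center of the first quad and the third votes for the center of the second, so they fall into two separate instances in a single pass, with no iterative grouping step.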

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.05498/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1908.05498/full.md

## References

50 references — full list in the complete paper: https://tomesphere.com/paper/1908.05498/full.md

---
Source: https://tomesphere.com/paper/1908.05498