UNITS: Unsupervised Intermediate Training Stage for Scene Text Detection
Youhui Guo, Yu Zhou, Xugong Qin, Enze Xie, Weiping Wang

TL;DR
This paper introduces UNITS, an unsupervised intermediate training stage for scene text detection that bridges the gap between synthetic pre-training and real-world fine-tuning, improving performance without extra inference costs.
Contribution
The paper proposes a novel unsupervised intermediate training paradigm, UNITS, to enhance scene text detection models by reducing domain discrepancy without additional inference overhead.
Findings
Improves detection accuracy on three public datasets.
Does not add parameters or computation during inference.
Provides a new training strategy for domain adaptation in scene text detection.
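The stage ordering summarized above (synthetic pre-training, then an unsupervised stage on real-world data, then supervised fine-tuning) can be sketched as a simple schedule. This is a minimal illustration, not the authors' implementation. The names `Stage`, `build_schedule`, `run_schedule`, and `toy_step` are hypothetical, the "model" is a plain list of floats, and the three unsupervised strategies explored in the paper are not modeled because this page does not describe them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str          # hypothetical stage label
    data: str          # data source the stage reads ("synthetic" or "real")
    uses_labels: bool  # whether ground-truth annotations are consumed


def build_schedule(use_intermediate: bool = True) -> List[Stage]:
    """Return the training stages in order; the middle stage is optional."""
    stages = [Stage("pretrain", "synthetic", True)]
    if use_intermediate:
        # Unsupervised stage: real-world images, no annotations.
        stages.append(Stage("intermediate", "real", False))
    stages.append(Stage("finetune", "real", True))
    return stages


def run_schedule(
    schedule: List[Stage],
    weights: List[float],
    step: Callable[[List[float], Stage], List[float]],
) -> List[float]:
    """Apply each stage in turn; every stage updates the same weights."""
    for stage in schedule:
        weights = step(weights, stage)
    return weights


def toy_step(weights: List[float], stage: Stage) -> List[float]:
    """Placeholder update (an assumption): changes values, never the count."""
    delta = 0.1 if stage.uses_labels else 0.05
    return [w + delta for w in weights]


if __name__ == "__main__":
    initial = [0.0, 0.0, 0.0]
    final = run_schedule(build_schedule(), initial, toy_step)
    # The extra stage only changes how the weights are trained, so the
    # parameter count seen at inference time is unchanged.
    print(len(initial) == len(final))
```

Because the intermediate stage only changes how the existing weights are trained, the model used at inference has the same size with or without it, which is the property the second finding refers to.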
Abstract
Recent scene text detection methods are almost all data-driven and based on deep learning. Synthetic data is commonly adopted for pre-training because annotation is expensive. However, there are obvious domain discrepancies between synthetic and real-world data, so directly fine-tuning a model initialized on synthetic data may lead to sub-optimal performance. In this paper, we propose a new training paradigm for scene text detection that introduces an UNsupervised Intermediate Training Stage (UNITS), which builds a buffer path to real-world data and can alleviate the gap between the pre-training stage and the fine-tuning stage. Three training strategies are further explored to perceive information from real-world data in an unsupervised way. With UNITS, scene text detectors are improved without introducing any parameters and…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Text and Document Classification Technologies · Topic Modeling
