SegHist: A General Segmentation-based Framework for Chinese Historical Document Text Line Detection
Xingjian Hu, Baole Wei, Liangcai Gao, Jun Wang

TL;DR
SegHist is a versatile segmentation-based framework that significantly improves Chinese historical document text line detection, especially for high aspect ratio and rotated text lines, achieving state-of-the-art results on multiple datasets.
Contribution
The paper introduces SegHist, a general framework that enables existing segmentation-based text detection methods to handle the challenges of historical document text line detection, and DB-SegHist, its integration with DB++.
Findings
Achieves state-of-the-art results on CHDAC and MTHv2, and competitive results on HDRC.
Improves results by 1.19% on CHDAC, the most challenging dataset, which features more text lines with high aspect ratios.
Attains state-of-the-art results on rotated MTHv2 and rotated HDRC, demonstrating robustness to rotated text lines.
Abstract
Text line detection is a key task in historical document analysis, and it faces many challenges, including arbitrary-shaped text lines, dense text, and text lines with high aspect ratios. In this paper, we propose a general framework for historical document text detection (SegHist), enabling existing segmentation-based text detection methods to effectively address these challenges, especially text lines with high aspect ratios. Integrating the SegHist framework with the commonly used method DB++, we develop DB-SegHist. This approach achieves SOTA on the CHDAC and MTHv2 datasets and competitive results on HDRC, with a significant improvement of 1.19% on CHDAC, the most challenging dataset, which features more text lines with high aspect ratios. Moreover, our method attains SOTA on rotated MTHv2 and rotated HDRC, demonstrating its rotational robustness. The code is available at…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Web Data Mining and Analysis · Text and Document Classification Technologies
