TL;DR
This paper presents a fine-grained layout analysis method for historical Tibetan documents. It uses an enhanced SOLOv2 model to segment text lines, sentences, and titles directly, without a separate text-line segmentation step afterwards.
Contribution
It introduces a dynamic dataset creation method and adapts SOLOv2 for sub-line level layout analysis of complex Tibetan documents, addressing a gap in existing research.
Findings
Achieved 72.7% average precision on the Tibetan document dataset.
Demonstrated the effectiveness of SOLOv2 for fine-grained layout analysis.
Provided insights applicable to documents in other languages that have complex layouts.
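The average precision figure above is, for instance segmentation models such as SOLOv2, usually derived from the overlap between predicted and annotated masks. The summary does not say which AP variant or IoU thresholds were used, so the following is only a minimal sketch of the underlying mask IoU computation. The function name `mask_iou` and the toy masks are hypothetical and do not come from the paper.

```python
def mask_iou(pred, truth):
    """Intersection over union of two binary masks.

    Both masks are equal-sized nested lists of 0/1 values (assumed format,
    chosen here only to keep the sketch dependency-free).
    """
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks have no defined overlap; report 0.0 by convention.
    return inter / union if union else 0.0


# Hypothetical 3x4 masks for one predicted and one annotated text region.
pred = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
truth = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

iou = mask_iou(pred, truth)
print(f"mask IoU = {iou:.2f}")
```

A prediction is typically counted as correct when its IoU with an annotated instance exceeds a chosen threshold, and AP summarizes precision over the ranked predictions; the threshold choice here would be an assumption.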
Abstract
Accurate layout analysis that needs no subsequent text-line segmentation remains an open challenge, especially for the Kangyur, a kind of historical Tibetan document with many touching components and a mottled background. Layout analysis, which identifies the different regions in a document image, is indispensable for subsequent procedures such as character recognition. However, little research has addressed line-level layout analysis, and the existing work fails to handle the Kangyur. To obtain optimal results, we present a fine-grained sub-line level layout analysis approach. First, we introduce an accelerated method to build the dataset, which is dynamic and reliable. Second, we enhance SOLOv2 according to the characteristics of the Kangyur. Then, we feed the enhanced SOLOv2 with the prepared annotation file during the…
