PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table Recognition to HTML
Jiaquan Ye, Xianbiao Qi, Yelin He, Yihao Chen, Dengyi Gu, Peng Gao, and Rong Xiao

TL;DR
This paper introduces a complete table recognition system for scientific literature. The system combines table structure recognition, text line detection, text line recognition, and box assignment, and it reached a TEDS score above 96% in the ICDAR 2021 competition.
Contribution
The method adapts MASTER, an image text recognition model, for both table structure recognition and text line recognition. It uses PSENet for text line detection and merges the outputs through a box assignment step.
Findings
Achieved a 96.84% TEDS score on 9,115 validation samples in the development phase.
Achieved a 96.32% TEDS score in the final evaluation phase.
Combined table structure prediction with text line detection and recognition through a box assignment step.
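The TEDS figures refer to Tree-Edit-Distance-based Similarity, the metric introduced with the PubTabNet dataset and used to score this task. As commonly defined, it compares the predicted and ground-truth tables as HTML trees $T_a$ and $T_b$, where $\mathrm{EditDist}$ is the tree edit distance and $|T|$ is the number of nodes in a tree. A per-sample score is computed this way and then averaged over the evaluation set. The node counts in the worked example below are hypothetical and only illustrate the arithmetic:

```latex
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max(|T_a|, |T_b|)}

% Hypothetical example: |T_a| = 50, |T_b| = 52, EditDist = 2
\mathrm{TEDS} = 1 - \frac{2}{52} \approx 0.9615
```

Because the distance is normalized by the larger tree, a few misplaced tags in a large table cost far less than the same errors in a small one.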
Abstract
This paper presents our solution for the ICDAR 2021 Competition on Scientific Literature Parsing, Task B: table recognition to HTML. In our method, we divide the table content recognition task into four sub-tasks: table structure recognition, text line detection, text line recognition, and box assignment. Our table structure recognition algorithm is customized from MASTER [1], a robust image text recognition algorithm. PSENet [2] is used to detect each text line in the table image. For text line recognition, our model is also built on MASTER. Finally, in the box assignment phase, we associate the text boxes detected by PSENet with the structure items reconstructed by table structure prediction, and fill the recognized content of each text line into the corresponding item. Our proposed method achieves a 96.84% TEDS score on 9,115 validation samples in the development phase, and a 96.32% TEDS score…
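The box assignment step can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each reconstructed structure item comes with an axis-aligned cell box in `(x1, y1, x2, y2)` form. It assigns each detected text line to the cell that covers the largest fraction of the line's area, subject to a hypothetical `min_overlap` threshold of 0.5. The names `Cell`, `TextLine`, and `assign_boxes` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box format: axis-aligned (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class TextLine:
    box: Box   # text line box (e.g. from a detector such as PSENet)
    text: str  # recognized content of the line


@dataclass
class Cell:
    box: Box  # cell box assumed to come from structure prediction
    contents: List[TextLine] = field(default_factory=list)


def area(a: Box) -> float:
    return max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])


def intersection_area(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def assign_boxes(cells: List[Cell], lines: List[TextLine],
                 min_overlap: float = 0.5) -> List[str]:
    """Hypothetical rule: put each text line into the cell covering the largest
    fraction of its area; lines below min_overlap are dropped. Returns each
    cell's text, joining its lines sorted by top edge, then left edge."""
    for line in lines:
        line_area = area(line.box)
        if line_area == 0 or not cells:
            continue
        # Fraction of the text line that falls inside each cell.
        overlaps = [intersection_area(line.box, c.box) / line_area for c in cells]
        best = max(range(len(cells)), key=lambda i: overlaps[i])
        if overlaps[best] >= min_overlap:
            cells[best].contents.append(line)
    results = []
    for cell in cells:
        ordered = sorted(cell.contents, key=lambda t: (t.box[1], t.box[0]))
        results.append(" ".join(t.text for t in ordered))
    return results


if __name__ == "__main__":
    cells = [Cell((0, 0, 100, 20)), Cell((100, 0, 200, 20))]
    lines = [TextLine((5, 2, 60, 18), "Method"), TextLine((105, 2, 150, 18), "TEDS")]
    print(assign_boxes(cells, lines))  # ['Method', 'TEDS']
```

Normalizing the overlap by the text line's area, rather than using IoU, keeps a short line inside a large cell from being penalized. Whether the authors used this or a different matching criterion is not stated here.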
Taxonomy
Topics: Handwritten Text Recognition Techniques · Topic Modeling · Text and Document Classification Technologies
