Complicated Table Structure Recognition
Zewen Chi, Heyan Huang, Heng-Da Xu, Houjin Yu, Wanxuan Yin, and Xian-Ling Mao

TL;DR
This paper introduces GraphTSR, a graph neural network for recognizing complex table structures in PDFs, especially those with spanning cells, and provides a large dataset for evaluation.
Contribution
It proposes a novel graph neural network model for complex table structure recognition and constructs a large-scale dataset for benchmarking.
Findings
GraphTSR outperforms existing methods on benchmark datasets.
The SciTSR dataset contains 15,000 annotated tables from scientific papers.
The model effectively handles tables with spanning cells.
Abstract
The task of table structure recognition aims to recognize the internal structure of a table, which is a key step to make machines understand tables. Currently, there are lots of studies on this task for different file formats such as ASCII text and HTML. It also attracts lots of attention to recognize the table structures in PDF files. However, it is hard for the existing methods to accurately recognize the structure of complicated tables in PDF files. The complicated tables contain spanning cells which occupy at least two columns or rows. To address the issue, we propose a novel graph neural network for recognizing the table structure in PDF files, named GraphTSR. Specifically, it takes table cells as input, and then recognizes the table structures by predicting relations among cells. Moreover, to evaluate the task better, we construct a large-scale table structure recognition dataset…
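The abstract's formulation (cells as input, structure recovered by predicting relations among cells) can be illustrated with a small sketch. It is not the GraphTSR model: it only shows what the prediction targets might look like when the true grid position of each cell is known. The `Cell` class, the `relation` function, and the three-way label set (`horizontal`, `vertical`, `none`) are assumptions made for illustration, not details taken from the text above.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Cell:
    """Hypothetical cell record; row/column ranges are inclusive."""
    text: str
    row_start: int
    row_end: int
    col_start: int
    col_end: int

    def is_spanning(self) -> bool:
        # A spanning cell occupies at least two rows or two columns.
        return self.row_end > self.row_start or self.col_end > self.col_start


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the two inclusive integer ranges share at least one index."""
    return a_start <= b_end and b_start <= a_end


def relation(a: Cell, b: Cell) -> str:
    """Label a cell pair (assumed label set: horizontal / vertical / none)."""
    share_rows = overlaps(a.row_start, a.row_end, b.row_start, b.row_end)
    share_cols = overlaps(a.col_start, a.col_end, b.col_start, b.col_end)
    # Horizontal neighbours share a row and sit in adjacent columns.
    if share_rows and (a.col_end + 1 == b.col_start or b.col_end + 1 == a.col_start):
        return "horizontal"
    # Vertical neighbours share a column and sit in adjacent rows.
    if share_cols and (a.row_end + 1 == b.row_start or b.row_end + 1 == a.row_start):
        return "vertical"
    return "none"


# A 3x2 table whose header spans both columns.
cells = [
    Cell("Header", 0, 0, 0, 1),
    Cell("Left", 1, 1, 0, 0),
    Cell("Right", 1, 1, 1, 1),
    Cell("1", 2, 2, 0, 0),
    Cell("2", 2, 2, 1, 1),
]

# One label per unordered cell pair: these are the edges a model would classify.
labels = {(a.text, b.text): relation(a, b) for a, b in combinations(cells, 2)}

for pair, label in labels.items():
    print(pair, label)
```

In this sketch the spanning header is vertically related to both cells beneath it, which is the kind of case a fixed row-and-column grid cannot express directly. A learned model would have to predict these labels from cell content and position alone, without access to the ground-truth indices used here.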
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Data Quality and Management
Methods: Graph Neural Network
