Split, embed and merge: An accurate table structure recognizer
Zhenrong Zhang, Jianshu Zhang, Jun Du

TL;DR
This paper presents SEM, a novel model for accurate table structure recognition that combines vision and language features, achieving state-of-the-art results on multiple datasets and complex tables.
Contribution
The paper introduces SEM, a three-stage model that accurately recognizes complex table structures by integrating visual and textual information with an attention-based merging process.
Findings
Achieves 97.11% F1-Measure on SciTSR dataset
Outperforms existing methods significantly
Wins first place in ICDAR 2021 Scientific Literature Parsing competition
Abstract
Table structure recognition is an essential part of making machines understand tables. Its main task is to recognize the internal structure of a table. However, due to the complexity and diversity of table structures and styles, it is very difficult to parse tabular data into a structured format that machines can easily understand, especially for complex tables. In this paper, we introduce Split, Embed and Merge (SEM), an accurate table structure recognizer. Our model takes table images as input and can correctly recognize the structure of tables, whether they are simple or complex. SEM is composed of three main parts: a splitter, an embedder and a merger. In the first stage, we apply the splitter to predict the potential regions of the table row (column) separators, and obtain the fine grid structure of the table. In the second stage, by taking full consideration of the…
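The first stage described in the abstract, turning predicted row and column separator regions into a fine grid, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the splitter output has already been reduced to two 1-D binary masks (one entry per pixel row or pixel column, 1 meaning "inside a separator region"), and the function names `separator_centers` and `grid_cells` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), half-open


def separator_centers(mask: List[int]) -> List[int]:
    """Return the center index of each run of consecutive 1s in a 1-D mask."""
    centers = []
    start = None
    for i, value in enumerate(mask):
        if value and start is None:
            start = i  # a separator run begins here
        elif not value and start is not None:
            centers.append((start + i - 1) // 2)  # run covered start..i-1
            start = None
    if start is not None:  # run reaches the end of the mask
        centers.append((start + len(mask) - 1) // 2)
    return centers


def grid_cells(row_mask: List[int], col_mask: List[int]) -> List[List[Box]]:
    """Build the grid of boxes bounded by the image border and separator centers.

    Assumption: one split line is placed at the center of each separator run.
    """
    # sorted(set(...)) drops duplicates when a run touches the image border.
    ys = sorted(set([0] + separator_centers(row_mask) + [len(row_mask)]))
    xs = sorted(set([0] + separator_centers(col_mask) + [len(col_mask)]))
    return [
        [(ys[r], xs[c], ys[r + 1], xs[c + 1]) for c in range(len(xs) - 1)]
        for r in range(len(ys) - 1)
    ]


# Toy masks for an 8 x 10 pixel "table" (hypothetical data).
row_mask = [0, 0, 1, 1, 1, 0, 0, 0]
col_mask = [0, 0, 0, 1, 0, 0, 1, 1, 0, 0]
cells = grid_cells(row_mask, col_mask)
print(len(cells), len(cells[0]))
```

In a pipeline of this shape, the resulting grid boxes would be the units that later stages embed and merge into spanning cells; how SEM actually represents and merges them is not specified in the text above.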
Taxonomy
Topics: Advanced Text Analysis Techniques · Data Quality and Management · Handwritten Text Recognition Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Convolution · Max Pooling · Fully Convolutional Network · Linear Warmup With Linear Decay · Residual Connection · Dense Connections
