Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling
Yongshuai Huang, Ning Lu, Dapeng Chen, Yibo Li, Zecheng Xie, Shenggao Zhu, Liangcai Gao, Wei Peng

TL;DR
This paper introduces VAST, an end-to-end framework that improves table structure recognition by modeling bounding box coordinates as sequences and incorporating visual alignment, leading to state-of-the-art results.
Contribution
The paper proposes a novel coordinate sequence decoder and visual-alignment loss to enhance logical and physical table structure recognition.
Findings
Achieves state-of-the-art performance in table structure recognition.
The coordinate sequence decoder improves bounding box accuracy.
Visual-alignment loss enhances local visual detail in logical representations.
Abstract
Table structure recognition aims to extract the logical and physical structure of unstructured table images into a machine-readable format. The latest end-to-end image-to-text approaches simultaneously predict the two structures by two decoders, where the prediction of the physical structure (the bounding boxes of the cells) is based on the representation of the logical structure. However, the previous methods struggle with imprecise bounding boxes as the logical representation lacks local visual information. To address this issue, we propose an end-to-end sequential modeling framework for table structure recognition called VAST. It contains a novel coordinate sequence decoder triggered by the representation of the non-empty cell from the logical structure decoder. In the coordinate sequence decoder, we model the bounding box coordinates as a language sequence, where the left, top,…
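One common way to treat coordinates as a language-like sequence is to quantize each coordinate into a discrete token drawn from a fixed vocabulary of bins. The sketch below illustrates that general idea only; it is not taken from the paper. The function names, the bin count, the assumption that coordinates are normalized to [0, 1], and the four-value box in the usage lines are all hypothetical choices made for illustration, and the abstract above does not confirm any of them.

```python
from typing import List, Sequence


def coords_to_tokens(coords: Sequence[float], n_bins: int = 1000) -> List[int]:
    """Quantize normalized coordinates in [0, 1] into token ids in [0, n_bins - 1].

    Hypothetical helper: the bin count and the normalization are assumptions.
    """
    tokens = []
    for c in coords:
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"coordinate {c} is outside [0, 1]")
        # Scale to the bin range; clamp so that c == 1.0 falls in the last bin.
        tokens.append(min(int(c * n_bins), n_bins - 1))
    return tokens


def tokens_to_coords(tokens: Sequence[int], n_bins: int = 1000) -> List[float]:
    """Map each token id back to the center of its bin (lossy inverse)."""
    return [(t + 0.5) / n_bins for t in tokens]


# Illustrative box with four normalized values; the ordering is an assumption.
box = [0.125, 0.25, 0.5, 0.75]
tokens = coords_to_tokens(box, n_bins=8)
recovered = tokens_to_coords(tokens, n_bins=8)
```

Once coordinates are tokens, a decoder can emit them one at a time, so each coordinate can condition on the ones already predicted. The round trip is lossy: each recovered value differs from the original by at most half a bin width, so the bin count trades vocabulary size against localization precision.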
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Currency Recognition and Detection
