GFTE: Graph-based Financial Table Extraction
Yiren Li, Zheng Huang, Junchi Yan, Yi Zhou, Fan Ye, Xianhui Liu

TL;DR
This paper introduces GFTE, a graph-based neural network model for extracting structured financial tables from unstructured digital files, supported by a new Chinese dataset called FinTab with over 1,600 tables.
Contribution
The paper presents a novel graph-based neural network model GFTE and a comprehensive dataset FinTab for financial table extraction from unstructured files.
Findings
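GFTE reaches good overall results on edge prediction, the task it uses to recover table structure, and is offered as a baseline for future comparison.
The FinTab dataset contains more than 1,600 financial tables of diverse kinds, each with a structure representation in JSON.
GFTE integrates image, position, and textual features for edge prediction.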
Abstract
Tabular data is a crucial form of information expression: it organizes data in a standard structure for easy retrieval and comparison. However, in the financial industry and many other fields, tables are often disclosed in unstructured digital files, e.g. Portable Document Format (PDF) files and images, from which they are difficult to extract directly. In this paper, to facilitate deep-learning-based table extraction from unstructured digital files, we publish a standard Chinese dataset named FinTab, which contains more than 1,600 financial tables of diverse kinds and their corresponding structure representations in JSON. In addition, we propose a novel graph-based convolutional neural network model named GFTE as a baseline for future comparison. GFTE integrates image, position, and textual features for precise edge prediction and reaches good overall results.
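To make the edge-prediction framing concrete, the sketch below treats each text block as a graph node and labels each node pair with two relations, "same row" and "same column". It is only an illustration: the `Cell` class, the helper names, and the bounding-box overlap rule are assumptions introduced here. The overlap rule is a hand-written geometric stand-in for GFTE's learned classifier, which the abstract says also uses image and textual features; it is not the authors' method.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Cell:
    """Hypothetical node: one text block with its bounding box."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the intersection of intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def same_row(a: Cell, b: Cell) -> bool:
    """Assumed rule: vertical overlap exceeds half the smaller height."""
    smaller = min(a.y1 - a.y0, b.y1 - b.y0)
    return overlap(a.y0, a.y1, b.y0, b.y1) > 0.5 * smaller


def same_col(a: Cell, b: Cell) -> bool:
    """Assumed rule: horizontal overlap exceeds half the smaller width."""
    smaller = min(a.x1 - a.x0, b.x1 - b.x0)
    return overlap(a.x0, a.x1, b.x0, b.x1) > 0.5 * smaller


def predict_edges(cells: list[Cell]) -> dict[tuple[int, int], dict[str, bool]]:
    """Label every node pair (i, j), i < j, with row and column relations."""
    return {
        (i, j): {"row": same_row(cells[i], cells[j]),
                 "col": same_col(cells[i], cells[j])}
        for i, j in combinations(range(len(cells)), 2)
    }


# A 2x2 table laid out on a page; coordinates are made up for illustration.
cells = [
    Cell("Item", 0, 0, 10, 5), Cell("Amount", 20, 0, 30, 5),
    Cell("Revenue", 0, 10, 10, 15), Cell("1,200", 20, 10, 30, 15),
]
edges = predict_edges(cells)
```

Once every pair carries these two labels, rows and columns can be recovered by grouping nodes connected through "row" or "col" edges, which is why table structure recognition can be posed as edge classification on a graph.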
Taxonomy
Topics: Advanced Text Analysis Techniques · Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques
