Deep Structured Feature Networks for Table Detection and Tabular Data Extraction from Scanned Financial Document Images
Siwen Luo, Mengting Wu, Yiwen Gong, Wanying Zhou, Josiah Poon

TL;DR
This paper introduces a deep learning-based method for detecting tables and extracting tabular data from scanned financial PDFs, addressing challenges like noise and data integrity for improved financial data processing.
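Faster R-CNN-style detectors output many scored candidate boxes per page. Overlapping candidates are pruned with non-maximum suppression (NMS) before the remaining table areas are passed on for extraction. The sketch below shows score thresholding plus greedy NMS in plain Python. It assumes boxes are `(x0, y0, x1, y1)` pixel tuples. The function names and both thresholds (`score_thresh`, `iou_thresh`) are hypothetical illustrations, not values from the paper.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Greedy NMS: visit candidates from highest to lowest score and keep a box
    only if it overlaps every already-kept box by at most iou_thresh.
    Returns the indices of the kept boxes."""
    candidates = [i for i, s in enumerate(scores) if s >= score_thresh]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections on one page: two near-duplicate boxes for the same
# table, a second table lower on the page, and a low-confidence false positive.
boxes = [(50, 100, 550, 400), (55, 105, 545, 395), (60, 500, 540, 700), (0, 0, 10, 10)]
scores = [0.98, 0.90, 0.85, 0.30]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The near-duplicate box (IoU of about 0.95 with the top box) is suppressed. The low-score box is dropped by the threshold before NMS runs.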
Contribution
It presents a new annotated dataset of financial documents, a table detection model based on Faster R-CNN with a Feature Pyramid Network (FPN) that achieves high detection accuracy on that dataset, and a rule-based layout segmentation technique for accurate data extraction.
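Here the layout segmentation is described only as rule-based and built on OCR output. The following is a minimal sketch of one such rule set. It assumes OCR returns words with pixel bounding boxes. The code groups words into lines by vertical overlap, then splits each line into cells wherever the horizontal gap between neighbouring words exceeds a threshold. The `Word` type and the `min_overlap` and `gap` values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR token; (x0, y0) is the top-left corner, (x1, y1) the bottom-right."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def group_rows(words, min_overlap=0.5):
    """Assign each word to the first line whose first word overlaps it vertically
    by at least min_overlap of the shorter height; otherwise start a new line."""
    rows = []
    for w in sorted(words, key=lambda w: (w.y0, w.x0)):
        for row in rows:
            ref = row[0]
            overlap = min(w.y1, ref.y1) - max(w.y0, ref.y0)
            height = min(w.y1 - w.y0, ref.y1 - ref.y0)
            if height > 0 and overlap / height >= min_overlap:
                row.append(w)
                break
        else:
            rows.append([w])
    # Lines are already top-to-bottom; order words left-to-right within each line.
    return [sorted(row, key=lambda w: w.x0) for row in rows]


def split_cells(row, gap=15.0):
    """Start a new cell wherever the gap to the previous word exceeds `gap` pixels."""
    cells, current = [], [row[0]]
    for prev, w in zip(row, row[1:]):
        if w.x0 - prev.x1 > gap:
            cells.append(" ".join(t.text for t in current))
            current = []
        current.append(w)
    cells.append(" ".join(t.text for t in current))
    return cells


def segment_table(words, gap=15.0):
    return [split_cells(row, gap) for row in group_rows(words)]


# Hypothetical OCR output for a small two-row table region.
words = [
    Word("Revenue", 10, 100, 70, 112), Word("1,200", 200, 101, 240, 113),
    Word("980", 300, 100, 325, 112), Word("Total", 10, 120, 50, 132),
    Word("income", 55, 120, 105, 132), Word("2,400", 200, 121, 240, 133),
    Word("1,950", 300, 120, 340, 132),
]
table = segment_table(words)
print(table)  # [['Revenue', '1,200', '980'], ['Total income', '2,400', '1,950']]
```

"Total" and "income" are only 5 pixels apart, so they merge into a single label cell. The wide gaps before the amounts start new cells.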
Findings
High table detection accuracy on the new dataset
Effective extraction of structured tabular data
Scalable rule-based filtering improves data quality
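This summary does not detail the rule-based filtering and restructuring step. The minimal sketch below assumes rows arrive as lists of cell strings and applies three rules:

* Drop rows made only of punctuation or whitespace, such as OCR'd ruling lines.
* Pad the remaining rows to a common width.
* Parse financial amounts, reading parentheses as negatives, as is conventional in financial statements.

All names and regular expressions are hypothetical.

```python
import re

# A cell is "noise" if it holds only punctuation, underscores, or whitespace.
NOISE = re.compile(r"^[\W_]*$")
# An amount: optional "(", "-", "$", then a digit, more digits/commas, optional decimals.
AMOUNT = re.compile(r"^\(?-?\$?\d[\d,]*(?:\.\d+)?\)?$")


def is_noise_row(cells):
    return all(NOISE.match(c.strip()) for c in cells)


def parse_amount(cell):
    """Return the cell as a float, or None if it is not an amount.
    Parentheses, e.g. "(980)", denote a negative value."""
    s = cell.strip()
    if not AMOUNT.match(s):
        return None
    negative = (s.startswith("(") and s.endswith(")")) or "-" in s
    digits = s.strip("()").replace("$", "").replace(",", "").replace("-", "")
    value = float(digits)
    return -value if negative else value


def filter_rows(rows):
    """Drop noise rows and pad the remaining rows to a common column count."""
    kept = [row for row in rows if not is_noise_row(row)]
    width = max((len(row) for row in kept), default=0)
    return [row + [""] * (width - len(row)) for row in kept]


rows = [["Revenue", "1,200", "(980)"], ["---", "", "|"], ["Total", "2,400"]]
clean = filter_rows(rows)
print(clean)  # [['Revenue', '1,200', '(980)'], ['Total', '2,400', '']]
print([parse_amount(c) for c in clean[0]])  # [None, 1200.0, -980.0]
```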
Abstract
Automatic table detection in PDF documents has achieved great success, but tabular data extraction is still challenging due to integrity and noise issues in the detected table areas. Accurate data extraction is crucial in the financial domain. Motivated by this, this research aims to automate table detection and tabular data extraction from financial PDF documents. We propose a method consisting of three main processes: detecting table areas on each page image with a Faster R-CNN (Region-based Convolutional Neural Network) model with a Feature Pyramid Network (FPN); extracting contents and structures with a compound layout segmentation technique based on optical character recognition (OCR); and formulating regular expression rules for table header separation. The tabular data extraction feature is embedded with rule-based filtering and restructuring…
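The sketch below illustrates regular-expression header separation. It treats the leading run of rows whose value cells (every cell after the label column) are empty or match typical financial header tokens as the header. Those tokens are fiscal years, `$'000` units, and `Note`. It then merges stacked header cells into column names. The token pattern and helper names are assumptions for illustration, not the authors' actual rules.

```python
import re

# Hypothetical header tokens: fiscal years, "$'000"/"$000s" units, "$m", "Note(s)", "%".
HEADER_TOKEN = re.compile(r"^(?:(?:19|20)\d{2}|\$'?000s?|\$m|notes?|%)$", re.IGNORECASE)


def split_header(rows):
    """Split rows into (header, body). The header is the leading run of rows
    whose value cells (all cells after the label column) are empty or match
    HEADER_TOKEN; scanning stops at the first row that fails the rule."""
    n = 0
    for row in rows:
        values = [c.strip() for c in row[1:] if c.strip()]
        if not all(HEADER_TOKEN.match(v) for v in values):
            break
        n += 1
    return rows[:n], rows[n:]


def column_names(header):
    """Merge stacked header cells column by column, e.g. "2020" + "$'000".
    Assumes header rows have equal length (zip truncates to the shortest)."""
    return [" ".join(c.strip() for c in col if c.strip()) for col in zip(*header)]


rows = [
    ["", "Note", "2020", "2019"],
    ["", "", "$'000", "$'000"],
    ["Revenue", "3", "1,200", "980"],
    ["Total income", "", "2,400", "1,950"],
]
header, body = split_header(rows)
print(column_names(header))  # ['', 'Note', "2020 $'000", "2019 $'000"]
print(len(body))  # 2
```

Scanning stops at the first row containing a non-header value (here the note reference "3"). A bare year-like amount further down the body therefore cannot be mistaken for a header.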
Taxonomy
Topics: Handwritten Text Recognition Techniques · Currency Recognition and Detection · Vehicle License Plate Recognition
Methods: RoIPool · Softmax · Convolution · Region Proposal Network · Faster R-CNN
