On Cropped versus Uncropped Training Sets in Tabular Structure Detection
Yakup Akkaya, Murat Simsek, Burak Kantarci, Shahzad Khan

TL;DR
This paper compares the effectiveness of cropped versus uncropped datasets in table structure detection, revealing that cropping improves detection performance especially at higher IoU thresholds, with minimal impact at lower thresholds.
Contribution
It provides the first systematic analysis of how dataset cropping affects deep learning-based table structure detection performance.
Findings
Cropped datasets improve detection by up to 9% in average precision and average recall.
The impact of cropping is negligible at IoU thresholds of 50%-70%.
Cropped datasets outperform uncropped ones at IoU thresholds above 70%.
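The findings above depend on the IoU threshold that decides whether a predicted box counts as a correct detection. The following is a minimal sketch of that check, not the authors' evaluation code. The helper names `iou` and `is_match`, the box format, and the example coordinates are assumptions made for illustration.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float) -> bool:
    """A prediction counts as correct when its IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


# Hypothetical table row: the prediction covers the full width
# but only three quarters of the row height.
truth_row = (0, 0, 100, 20)
pred_row = (0, 0, 100, 15)
```

With these example boxes the IoU is 0.75, so the prediction is accepted at thresholds of 0.5 and 0.7 but rejected at 0.8. Small localization errors of this kind are invisible at low thresholds and only start to cost precision and recall at stricter ones, which is consistent with cropping mattering mainly above 70% IoU.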
Abstract
Automated document processing for tabular information extraction is highly desired in many organizations, from industry to government. Prior works have addressed this problem under table detection and table structure detection tasks. Proposed solutions leveraging deep learning approaches have been giving promising results in these tasks. However, the impact of dataset structures on table structure detection has not been investigated. In this study, we provide a comparison of table structure detection performance with cropped and uncropped datasets. The cropped set consists of only table images that are cropped from documents assuming tables are detected perfectly. The uncropped set consists of regular document images. Experiments show that deep learning models can improve the detection performance by up to 9% in average precision and average recall on the cropped versions. Furthermore,…
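The abstract describes the cropped set as table images cut from documents under the assumption of perfect table detection. One way such a set could be derived from an uncropped one is to shift each structure annotation into the coordinate frame of its table crop. The sketch below illustrates that step under stated assumptions: the function name `crop_annotations`, the box format, and the clipping rule are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in document pixels.
Box = Tuple[int, int, int, int]


def crop_annotations(table_box: Box, structure_boxes: List[Box]) -> List[Box]:
    """Translate row/column/cell boxes into the cropped table's coordinates.

    Boxes are clipped to the table area; boxes that fall entirely
    outside the table are dropped (an assumed policy).
    """
    tx1, ty1, tx2, ty2 = table_box
    width, height = tx2 - tx1, ty2 - ty1

    shifted = []
    for x1, y1, x2, y2 in structure_boxes:
        # Shift by the table origin, then clamp to the crop bounds.
        nx1 = min(max(x1 - tx1, 0), width)
        ny1 = min(max(y1 - ty1, 0), height)
        nx2 = min(max(x2 - tx1, 0), width)
        ny2 = min(max(y2 - ty1, 0), height)
        if nx2 > nx1 and ny2 > ny1:
            shifted.append((nx1, ny1, nx2, ny2))
    return shifted


# Hypothetical ground-truth table region and two row annotations,
# the second of which spills past the table boundary.
table = (50, 100, 250, 200)
rows = [(50, 100, 250, 120), (60, 190, 300, 220)]
cropped_rows = crop_annotations(table, rows)
```

Here `cropped_rows` becomes `[(0, 0, 200, 20), (10, 90, 200, 100)]`. The image itself would be cropped to the same `table` region with any image library, so that annotations and pixels share one coordinate frame.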
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Digital Media Forensic Detection
