Uncertainty-Aware Complex Scientific Table Data Extraction
Kehinde Ajayi, Yi He, Jian Wu

TL;DR
This paper introduces an uncertainty-aware framework for extracting data from complex scientific tables. It uses conformal prediction to improve data quality while reducing manual verification effort.
Contribution
It presents a novel uncertainty quantification (UQ) method for scientific table data extraction that improves error detection and reduces the manual verification workload.
Findings
Using UQ to decide which results need review, data quality improves by 30% with 47% of the manual verification workload.
The framework effectively detects extraction errors in complex scientific tables.
Quantitative evaluation demonstrates the potential of UQ to streamline data extraction workflows.
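The 30% / 47% figures pair a quality gain with a verification budget. As a hedged illustration of how numbers of this kind can be computed, the sketch below assumes three things. Data quality is cell-level accuracy. Only cells flagged as uncertain are sent to a human. Every verified cell ends up correct. The paper's actual metric definitions may differ. The function name `triage_metrics` is hypothetical, and the toy labels are invented rather than taken from its datasets.

```python
def triage_metrics(cells: list[tuple[bool, bool]]) -> tuple[float, float, float]:
    """Summarize an uncertainty-based review pass (hypothetical helper).

    cells: one (is_correct, is_flagged) pair per extracted cell.
    Assumes every flagged cell is manually verified and corrected.
    Returns (accuracy_before, accuracy_after, fraction_verified).
    """
    n = len(cells)
    if n == 0:
        raise ValueError("no cells to evaluate")
    before = sum(ok for ok, _ in cells) / n
    # A cell is right after review if it was already right or a human fixed it.
    after = sum(ok or flagged for ok, flagged in cells) / n
    verified = sum(flagged for _, flagged in cells) / n
    return before, after, verified


# Toy example (made-up labels): 10 cells, 3 flagged, 2 of the 3 errors caught.
cells = [(True, False)] * 6 + [(False, True)] * 2 + [(True, True)] + [(False, False)]
before, after, verified = triage_metrics(cells)
print(before, after, verified)  # 0.7 0.9 0.3
```

The gap between `after` and `before` is the quality gain bought by reviewing a `verified` fraction of the cells. A useful uncertainty score concentrates the real errors among the flagged cells, so the gain approaches `1 - before` while `verified` stays well below 1.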
Abstract
Table structure recognition (TSR) and optical character recognition (OCR) play crucial roles in extracting structured data from tables in scientific documents. However, existing extraction frameworks built on top of TSR and OCR methods often fail to quantify the uncertainties of extracted results. To obtain highly accurate data for scientific domains, all extracted data must be manually verified, which can be time-consuming and labor-intensive. We propose a framework that performs uncertainty-aware data extraction for complex scientific tables, built on conformal prediction, a model-agnostic method for uncertainty quantification (UQ). We explored various uncertainty scoring methods to aggregate the uncertainties introduced by TSR and OCR. We rigorously evaluated the framework using a standard benchmark and an in-house dataset consisting of complex scientific tables in six scientific…
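A minimal sketch of how split conformal prediction could turn per-cell TSR and OCR confidences into a review flag. Everything here is an assumption for illustration and not the authors' implementation. That includes the `CellCandidates` structure and the idea that OCR returns several candidate readings with probabilities. It also includes the product aggregation `1 - tsr_conf * p_ocr`, which treats the two error sources as independent. The paper explores several uncertainty scoring methods and may use a different one. A cell is flagged for manual verification when its conformal prediction set does not contain exactly one candidate.

```python
import math
from dataclasses import dataclass


@dataclass
class CellCandidates:
    """Hypothetical per-cell extraction output (not from the paper)."""
    candidates: dict[str, float]  # candidate OCR readings -> probability
    tsr_conf: float               # confidence that TSR located the cell correctly


def nonconformity(cell: CellCandidates, text: str) -> float:
    # Assumed aggregation: treat TSR and OCR errors as independent, so the
    # probability a reading is right is tsr_conf * P(text). Higher = less plausible.
    return 1.0 - cell.tsr_conf * cell.candidates.get(text, 0.0)


def calibrate(cells: list[CellCandidates], truths: list[str], alpha: float) -> float:
    # Split conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest
    # nonconformity score of the true readings on a held-out calibration set.
    scores = sorted(nonconformity(c, t) for c, t in zip(cells, truths))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        raise ValueError("too few calibration cells for this alpha")
    return scores[k - 1]


def prediction_set(cell: CellCandidates, qhat: float) -> set[str]:
    # All candidate readings whose score does not exceed the threshold.
    return {t for t in cell.candidates if nonconformity(cell, t) <= qhat}


def needs_review(cell: CellCandidates, qhat: float) -> bool:
    # Ambiguous (several readings) or empty sets are routed to a human.
    return len(prediction_set(cell, qhat)) != 1


# Toy calibration data: one candidate per cell, perfect TSR.
calib_probs = [0.95, 0.9, 0.9, 0.85, 0.8, 0.8, 0.7, 0.6, 0.3]
calib = [CellCandidates({"v": p}, 1.0) for p in calib_probs]
qhat = calibrate(calib, ["v"] * len(calib), alpha=0.2)

clear = CellCandidates({"12.5": 0.9, "12.6": 0.05}, tsr_conf=1.0)
ambiguous = CellCandidates({"0.81": 0.65, "0.31": 0.62}, tsr_conf=1.0)
bad_structure = CellCandidates({"7": 0.9}, tsr_conf=0.5)
print(round(qhat, 2), [needs_review(c, qhat) for c in (clear, ambiguous, bad_structure)])
# prints: 0.4 [False, True, True]
```

In standard split conformal prediction, when calibration and test cells are exchangeable, the set covers the true label with probability at least `1 - alpha`. In this sketch that guarantee only extends to readings that appear among the OCR candidates. Lowering `alpha` raises the threshold, which enlarges sets and flags more cells for review.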
Taxonomy
Topics: Handwritten Text Recognition Techniques · Mathematics, Computing, and Information Processing · Data Quality and Management
