Automated data extraction of bar chart raster images
Alex Carderas, Ye Yuan, Itamar Livnat, Ryan Yanagihara, Rosita Saul, Gabrielle Montes De Oca, Kai Zheng, Andrew W. Browne

TL;DR
This paper presents an automated method using OCR to extract data from bar chart images, aiming to facilitate meta-analyses by reducing manual effort and improving accuracy.
Contribution
The authors developed a multistep OCR-based software for automatic data extraction from bar charts, with validation against manual methods showing high agreement.
Findings
91.8% of data points within the Bland-Altman limits of agreement
Automated extraction accuracy: x-axis labels 79.5%, y-axis tick labels 88.6%, bar values with <5% error 88.0%
Potential improvements include neural network-based redundancy checks
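The summary does not say how bar heights are turned into numbers. One common approach is to fit a linear map from pixel rows to data values using the OCR-read y-axis tick labels, apply that map to the top edge of each bar, and then score each bar by its relative error against the manually extracted value, matching the <5% error metric above. This is a minimal sketch under stated assumptions, not the authors' code. The function names (`fit_y_axis`, `bar_values`, `fraction_within`), the linear (non-log) axis, the pixel coordinates, and all numbers are hypothetical.

```python
import numpy as np


def fit_y_axis(tick_pixels, tick_values):
    # Hypothetical helper: least-squares line value = slope * pixel_row + intercept.
    # Assumes a linear (non-log) y-axis and at least two correctly OCR'd tick labels.
    slope, intercept = np.polyfit(tick_pixels, tick_values, deg=1)
    return slope, intercept


def bar_values(bar_top_pixels, slope, intercept):
    # Hypothetical helper: convert the pixel row of each bar's top edge to a data value.
    return slope * np.asarray(bar_top_pixels, dtype=float) + intercept


def fraction_within(auto, manual, tol=0.05):
    # Share of bars whose relative error versus the manual value is below tol.
    # Assumes every manual value is nonzero (otherwise the division is undefined).
    auto = np.asarray(auto, dtype=float)
    manual = np.asarray(manual, dtype=float)
    rel_err = np.abs(auto - manual) / np.abs(manual)
    return float(np.mean(rel_err < tol))


# Hypothetical example: tick labels 0, 10, 20, 30 detected at pixel rows 400, 300, 200, 100.
# Image rows grow downward, so the fitted slope is negative.
slope, intercept = fit_y_axis([400, 300, 200, 100], [0, 10, 20, 30])
auto = bar_values([250, 150, 320], slope, intercept)  # approximately [15.0, 25.0, 8.0]
manual = [15.2, 24.0, 9.0]                            # hypothetical manual readings
share = fraction_within(auto, manual)                 # 2 of the 3 bars are under 5% error
```

Fitting a line to all detected ticks, rather than using only two of them, gives some tolerance to a single slightly misplaced tick box. A tick label that OCR misreads, however, will still pull the fit off.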
Abstract
Objective: To develop software that uses optical character recognition (OCR) to automatically extract data from bar charts for meta-analysis. Methods: We used a multistep data extraction approach comprising figure extraction, text detection, and image disassembly. The PubMed Central papers processed in this way reported clinical trials on macular degeneration, a blinding disease with a heavy disease burden and many clinical trials. Bar chart characteristics were extracted both automatically and manually, and the two sets of results were compared for accuracy using a Bland-Altman analysis. Results: Based on Bland-Altman analysis, 91.8% of data points were within the limits of agreement. Compared with manual data extraction, automated data extraction yielded the following…
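A Bland-Altman analysis compares two measurement methods through their paired differences. The bias is the mean difference, and the conventional 95% limits of agreement are the bias plus or minus 1.96 standard deviations of the differences. The sketch below computes these quantities and the share of points that fall inside the limits, which is the kind of statistic reported as 91.8%. It assumes differences are taken as automated minus manual on raw values. The paper's exact conventions, such as percentage differences or a log transform, are not stated here. The function name `bland_altman` and the sample values are hypothetical.

```python
import numpy as np


def bland_altman(auto, manual, z=1.96):
    """Return the bias, the limits of agreement, and the share of paired points inside them.

    Differences are automated minus manual (an assumption). The limits are
    bias +/- z * SD, using the sample standard deviation (ddof=1).
    """
    auto = np.asarray(auto, dtype=float)
    manual = np.asarray(manual, dtype=float)
    diff = auto - manual
    bias = diff.mean()
    sd = diff.std(ddof=1)
    lower, upper = bias - z * sd, bias + z * sd
    inside = float(np.mean((diff >= lower) & (diff <= upper)))
    return float(bias), (float(lower), float(upper)), inside


# Hypothetical paired readings of five bars (not data from the paper).
auto = [10.0, 12.1, 8.9, 15.3, 20.0]
manual = [10.2, 12.0, 9.0, 15.0, 21.5]
bias, (lower, upper), inside = bland_altman(auto, manual)
print(f"bias={bias:.2f}, LoA=({lower:.2f}, {upper:.2f}), inside={inside:.1%}")
```

In the usual Bland-Altman plot, each point's horizontal coordinate is the mean of the two methods, `(auto + manual) / 2`, and its vertical coordinate is their difference. Horizontal lines are drawn at the bias and at both limits.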
Taxonomy
Topics: Retinal Imaging and Analysis · Digital Imaging for Blood Diseases · AI in cancer detection
