# Scatteract: Automated extraction of data from scatter plots

**Authors:** Mathieu Cliche, David Rosenberg, Dhruv Madeka, Connie Yee

arXiv: 1704.06687 · 2018-10-10

## TL;DR

Scatteract is a fully automated system that recovers the numerical values of data points from scatter plot images. It uses deep learning to locate the chart's key components, and OCR with robust regression to convert pixel positions into chart coordinates.

## Contribution

To the authors' knowledge, Scatteract is the first fully automatic approach to extracting data from scatter plots. It targets plots with linear scales and combines deep learning, OCR, and robust regression.

## Key findings

- Achieves successful data extraction on 89% of the plots in the authors' test set.
- Uses deep learning to identify the key components of the chart.
- Uses OCR together with robust regression to map pixels to the chart's coordinate system.
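
The pixel-to-data mapping in the last bullet can be illustrated with a minimal sketch. This summary does not say which robust estimator the paper uses, so the code below substitutes the Theil-Sen estimator (median of pairwise slopes) as a stand-in. The function names, tick positions, and OCR readings are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from statistics import median


def fit_axis_map(pixels, values):
    """Fit value = slope * pixel + intercept for one axis using Theil-Sen.

    Assumption: Theil-Sen stands in for the unspecified robust regression.
    """
    pairs = list(zip(pixels, values))
    # Slope between every pair of ticks; the median ignores a minority of
    # pairs that involve a misread label.
    slopes = [
        (v2 - v1) / (p2 - p1)
        for (p1, v1), (p2, v2) in combinations(pairs, 2)
        if p2 != p1
    ]
    if not slopes:
        raise ValueError("need at least two ticks at distinct pixel positions")
    slope = median(slopes)
    intercept = median(v - slope * p for p, v in pairs)
    return slope, intercept


def pixel_to_value(pixel, slope, intercept):
    """Convert a pixel coordinate on one axis to a chart coordinate."""
    return slope * pixel + intercept


# Hypothetical x-axis ticks: pixel column of each label and the number OCR read.
# The label at pixel 300 should read 20.0; "2.0" simulates a dropped digit.
tick_pixels = [100, 200, 300, 400, 500]
tick_values = [0.0, 10.0, 2.0, 30.0, 40.0]

slope, intercept = fit_axis_map(tick_pixels, tick_values)

# Map the pixel column of a (hypothetical) detected point to its x value.
print(pixel_to_value(250, slope, intercept))
```

An ordinary least-squares fit would be pulled toward the misread tick, whereas a median-based fit stays on the line defined by the correctly read labels. The same function applies to the y axis, where the fitted slope is typically negative because image rows increase downward.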

## Abstract

Charts are an excellent way to convey patterns and trends in data, but they do not facilitate further modeling of the data or close inspection of individual data points. We present a fully automated system for extracting the numerical values of data points from images of scatter plots. We use deep learning techniques to identify the key components of the chart, and optical character recognition together with robust regression to map from pixels to the coordinate system of the chart. We focus on scatter plots with linear scales, which already have several interesting challenges. Previous work has done fully automatic extraction for other types of charts, but to our knowledge this is the first approach that is fully automatic for scatter plots. Our method performs well, achieving successful data extraction on 89% of the plots in our test set.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.06687/full.md

## Figures

29 figures with captions in the complete paper: https://tomesphere.com/paper/1704.06687/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/1704.06687/full.md

---
Source: https://tomesphere.com/paper/1704.06687