VizExtract: Automatic Relation Extraction from Data Visualizations

Dale Decatur; Sanjay Krishnan

arXiv:2112.03485 · cs.CV · December 8, 2021 · 1 citation

Open Access

TL;DR

This paper introduces VizExtract, a computer vision framework that automatically extracts variable relationships from diverse scientific visualizations, aiding search, fact-checking, and data extraction tasks.

Contribution

It presents a novel CV-based approach trained on synthetic data to identify and analyze relationships in various types of statistical charts.

Findings

01. 87.5% accuracy in classifying correlations in controlled experiments
02. 72.8% accuracy on real-world internet graphs
03. 84.7% accuracy on the FigureQA dataset

Abstract

Visual graphics, such as plots, charts, and figures, are widely used to communicate statistical conclusions. Extracting information directly from such visualizations is a key sub-problem for effective search through scientific corpora, fact-checking, and data extraction. This paper presents a framework for automatically extracting compared variables from statistical charts. Because charting styles, libraries, and tools vary widely, we use a computer-vision-based framework to automatically identify and localize visualization facets in line graphs, scatter plots, and bar graphs, each of which can include multiple series. The framework is trained on a large, synthetically generated corpus of matplotlib charts, and we evaluate the trained model on other chart datasets. In controlled experiments, our framework is able to classify, with 87.5% accuracy, the correlation…
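To make the idea of a synthetically generated, labeled chart corpus concrete, here is a minimal sketch of how one labeled example might be produced. It is not the authors' pipeline: the three-way label set (`positive`, `negative`, `neutral`), the helper names `make_series`, `pearson`, and `render_chart`, and all parameter values are assumptions chosen for illustration. Rendering uses matplotlib, which the abstract names as the charting library; the import is deferred so the data-generation part runs without it.

```python
import random
import statistics


def make_series(label, n=50, noise=0.1, rng=None):
    """Return (xs, ys) whose trend matches `label`.

    The label set here is a hypothetical one, not taken from the paper.
    """
    rng = rng or random.Random()
    slope = {"positive": 1.0, "negative": -1.0, "neutral": 0.0}[label]
    xs = [i / (n - 1) for i in range(n)]
    # Linear trend plus Gaussian noise; a neutral series is noise only.
    ys = [slope * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def pearson(xs, ys):
    """Pearson correlation coefficient, used to sanity-check a label."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def render_chart(xs, ys, path, kind="scatter"):
    """Save the series as a scatter plot, line graph, or bar graph image."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, no display required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(4, 3))
    if kind == "scatter":
        ax.scatter(xs, ys, s=8)
    elif kind == "line":
        ax.plot(xs, ys)
    elif kind == "bar":
        ax.bar(xs, ys, width=xs[1] - xs[0])
    else:
        raise ValueError(f"unknown chart kind: {kind}")
    fig.savefig(path, dpi=100)
    plt.close(fig)
```

A training example would then be the pair of the saved image and its label, for instance `render_chart(*make_series("negative"), "chart_0.png", kind="line")` paired with `"negative"`. Varying the chart kind, noise level, and number of series per figure is one plausible way to approximate the stylistic diversity the abstract mentions.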

Taxonomy

Topics: Data Visualization and Analytics · Advanced Text Analysis Techniques · Data Management and Algorithms