VizExtract: Automatic Relation Extraction from Data Visualizations
Dale Decatur, Sanjay Krishnan

TL;DR
This paper introduces VizExtract, a computer vision framework that automatically extracts variable relationships from diverse scientific visualizations, aiding search, fact-checking, and data extraction tasks.
Contribution
It presents a novel CV-based approach trained on synthetic data to identify and analyze relationships in various types of statistical charts.
Findings
87.5% accuracy in classifying correlations in controlled experiments
72.8% accuracy on real-world internet graphs
84.7% accuracy on the FigureQA dataset
Abstract
Visual graphics, such as plots, charts, and figures, are widely used to communicate statistical conclusions. Extracting information directly from such visualizations is a key sub-problem for effective search through scientific corpora, fact-checking, and data extraction. This paper presents a framework for automatically extracting compared variables from statistical charts. Because charting styles, libraries, and tools vary widely, we use a computer-vision-based framework to automatically identify and localize visualization facets in line graphs, scatter plots, and bar graphs, each of which may contain multiple series. The framework is trained on a large synthetically generated corpus of matplotlib charts, and we evaluate the trained model on other chart datasets. In controlled experiments, our framework is able to classify, with 87.5% accuracy, the correlation…
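The abstract mentions training on synthetically generated charts whose variable relationships are known in advance. As a rough illustration only, the sketch below shows one way labeled series could be produced before being rendered as charts: draw a random linear trend with noise, then label it by its Pearson correlation. The label names (`positive`, `negative`, `neutral`), the 0.4 threshold, and all function names are assumptions made for this example, not details taken from the paper, and the matplotlib rendering step is omitted.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant series; treat it as none.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def label_correlation(r, threshold=0.4):
    """Map a correlation coefficient to a class label.

    The three labels and the threshold are hypothetical choices.
    """
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "neutral"


def make_synthetic_series(rng, n_points=50, noise=1.0):
    """Generate one noisy linear series and its correlation label."""
    slope = rng.uniform(-2.0, 2.0)
    xs = [10.0 * i / (n_points - 1) for i in range(n_points)]
    ys = [slope * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys, label_correlation(pearson(xs, ys))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the corpus is reproducible
    for _ in range(3):
        xs, ys, label = make_synthetic_series(rng)
        print(len(xs), label)
```

Each generated `(xs, ys, label)` triple could then be plotted with randomized styling and saved as an image, giving a model a picture as input and the label as its training target.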
Taxonomy
Topics: Data Visualization and Analytics · Advanced Text Analysis Techniques · Data Management and Algorithms
