The State of the Art in Creating Visualization Corpora for Automated Chart Analysis
Chen Chen, Zhicheng Liu

TL;DR
This paper surveys 56 papers that create or use visualization corpora for automated chart analysis, examining the corpora's properties, common practices, and gaps, and outlining what future corpora need in order to advance research in this field.
Contribution
It provides a comprehensive overview of existing visualization corpora, introduces a multi-level task taxonomy (goal, method, and outputs) for analyzing them, and highlights research gaps and future directions.
Findings
Identifies common formats and collection methods for chart corpora.
Highlights gaps in diversity and annotation practices.
Recommends tools and properties needed for future benchmark corpora.
Abstract
We present a state-of-the-art report on visualization corpora in automated chart analysis research. We survey 56 papers that created or used a visualization corpus as the input of their research techniques or systems. Based on a multi-level task taxonomy that identifies the goal, method, and outputs of automated chart analysis, we examine the property space of existing chart corpora along five dimensions: format, scope, collection method, annotations, and diversity. Through the survey, we summarize common patterns and practices of creating chart corpora, identify research gaps and opportunities, and discuss the desired properties of future benchmark corpora and the required tools to create them.
Taxonomy
Topics: Data Visualization and Analytics · Advanced Text Analysis Techniques · Computational and Text Analysis Methods
