Datasets of Visualization for Machine Learning
Can Liu, Ruike Jiang, Shaocong Tan, Jiacheng Yu, Chaofan Yang, Hanning Shao, Xiaoru Yuan

TL;DR
This paper surveys existing visualization datasets for machine learning, analyzes their characteristics and challenges, and proposes a model for understanding their diversity, with the aim of guiding future dataset development and standardization.
Contribution
It provides a comprehensive overview of visualization datasets, introduces a what-why-how model, and discusses challenges and future directions for dataset standardization and expansion.
Findings
Diverse data types and formats in visualization datasets
Limited availability of large-scale datasets
Challenges in standardization and dataset construction
Abstract
Datasets of visualization play a crucial role in automating data-driven visualization pipelines, serving as the foundation for supervised model training and algorithm benchmarking. In this paper, we survey the literature on visualization datasets and provide a comprehensive overview of existing visualization datasets, including their data types, formats, supported tasks, and openness. We propose a what-why-how model for visualization datasets, considering the content of the dataset (what), the supported tasks (why), and the dataset construction process (how). This model provides a clear understanding of the diversity and complexity of visualization datasets. Additionally, we highlight the challenges faced by existing visualization datasets, including the lack of standardization in data types and formats and the limited availability of large-scale datasets. To address these challenges,…
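One way to picture the what-why-how model is as a catalog record with one field per axis. The sketch below is only an illustration under stated assumptions: the class name `VisDatasetRecord`, its field names, the helper `group_by_task`, and the two sample entries are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class VisDatasetRecord:
    """Hypothetical catalog entry; names are illustrative, not the paper's schema."""
    name: str
    what: dict = field(default_factory=dict)  # content: data types and formats
    why: list = field(default_factory=list)   # tasks the dataset supports
    how: str = ""                             # how the dataset was constructed
    is_open: bool = False                     # whether it is publicly available


def group_by_task(records):
    """Index dataset names by each supported task (the 'why' axis)."""
    index = {}
    for rec in records:
        for task in rec.why:
            index.setdefault(task, []).append(rec.name)
    return index


# Made-up placeholder entries, not real datasets from the survey.
records = [
    VisDatasetRecord(
        name="example-a",
        what={"data_type": "chart image", "format": "png"},
        why=["chart classification"],
        how="crawled",
        is_open=True,
    ),
    VisDatasetRecord(
        name="example-b",
        what={"data_type": "chart specification", "format": "json"},
        why=["chart classification", "visualization recommendation"],
        how="synthesized",
    ),
]

task_index = group_by_task(records)
```

Grouping by the "why" axis makes it easy to see which tasks are covered by several datasets and which are covered by few, which is the kind of comparison the survey's overview supports.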
