Datasets of Visualization for Machine Learning

Can Liu; Ruike Jiang; Shaocong Tan; Jiacheng Yu; Chaofan Yang; Hanning Shao; Xiaoru Yuan

arXiv:2407.16351·cs.HC·July 24, 2024

TL;DR

This paper surveys existing visualization datasets for machine learning, analyzes their characteristics and challenges, and proposes a what-why-how model for understanding their diversity, with the aim of guiding future dataset development and standardization.

Contribution

It provides a comprehensive overview of visualization datasets, introduces a what-why-how model, and discusses challenges and future directions for dataset standardization and expansion.

Findings

01. Diverse data types and formats in visualization datasets

02. Limited availability of large-scale datasets

03. Challenges in standardization and dataset construction

Abstract

Datasets of visualization play a crucial role in automating data-driven visualization pipelines, serving as the foundation for supervised model training and algorithm benchmarking. In this paper, we survey the literature on visualization datasets and provide a comprehensive overview of existing visualization datasets, including their data types, formats, supported tasks, and openness. We propose a what-why-how model for visualization datasets, considering the content of the dataset (what), the supported tasks (why), and the dataset construction process (how). This model provides a clear understanding of the diversity and complexity of visualization datasets. Additionally, we highlight the challenges faced by existing visualization datasets, including the lack of standardization in data types and formats and the limited availability of large-scale datasets. To address these challenges,…
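
As a rough illustration of the what-why-how framing described in the abstract, a catalog entry for a visualization dataset could be sketched as a small record type. This is only a sketch under stated assumptions: the class name `VisDatasetRecord`, its field names, the helper `supporting`, and the two placeholder entries are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class VisDatasetRecord:
    """Hypothetical catalog entry; field names are illustrative, not the paper's schema."""
    name: str
    # "what": the content of the dataset
    data_types: list[str] = field(default_factory=list)
    formats: list[str] = field(default_factory=list)
    # "why": the tasks the dataset supports
    tasks: list[str] = field(default_factory=list)
    # "how": the way the dataset was constructed
    construction: str = "unknown"
    # openness: whether the dataset is publicly available
    is_open: bool = False


def supporting(records, task):
    """Return the names of open records that list the given task."""
    return [r.name for r in records if r.is_open and task in r.tasks]


# Placeholder entries, invented purely for illustration.
catalog = [
    VisDatasetRecord(
        name="example-a",
        data_types=["chart image"],
        formats=["png"],
        tasks=["chart classification"],
        construction="web crawl",
        is_open=True,
    ),
    VisDatasetRecord(
        name="example-b",
        data_types=["chart specification"],
        formats=["json"],
        tasks=["visualization recommendation"],
        construction="synthetic generation",
        is_open=False,
    ),
]

print(supporting(catalog, "chart classification"))
```

Keeping the what, why, and how attributes as separate fields makes it straightforward to filter a catalog along any one of the three dimensions, which is the kind of comparison the survey's model is meant to support.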
