Artificial Data, Real Insights: Evaluating Opportunities and Risks of Expanding the Data Ecosystem with Synthetic Data
Richard Timpone, Yongwei Yang

TL;DR
This paper explores the expanding role of synthetic data generated by AI, analyzing its opportunities, risks, and evaluation criteria across various applications in scientific research and data ecosystems.
Contribution
It provides a comprehensive taxonomy of synthetic data, links advances in generative AI to scientific discovery, and discusses evaluation frameworks for diverse use cases.
Findings
Synthetic data enables new research opportunities and applications.
Evaluation criteria vary depending on use case and data type.
Generative AI can create diverse synthetic datasets including populations and expert systems.
Abstract
Synthetic Data is not new, but recent advances in Generative AI have raised interest in expanding the research toolbox, creating new opportunities and risks. This article provides a taxonomy of the full breadth of the Synthetic Data domain. We discuss its place in the research ecosystem by linking the advances in computational social science with the idea of the Fourth Paradigm of scientific discovery that integrates the elements of the evolution from empirical to theoretic to computational models. Further, leveraging the framework of Truth, Beauty, and Justice, we discuss how evaluation criteria vary across use cases as the information is used to add value and draw insights. Building a framework to organize different types of synthetic data, we end by describing the opportunities and challenges with detailed examples of using Generative AI to create synthetic quantitative and…
Taxonomy
Topics: Big Data and Business Intelligence · Scientific Computing and Data Management · Data Quality and Management
