scatteR: Generating instance space based on scagnostics
Janith C. Wanniarachchi, Thiyanga S. Talagala

TL;DR
scatteR is a novel data generation method that uses Scagnostics measures and an optimization process to create datasets with specific structural characteristics, offering an alternative to traditional model-based synthetic data methods.
Contribution
scatteR introduces a new approach to synthetic data generation that treats Scagnostics measurements as optimization targets, so it controls the structure of the data directly rather than the parameters of a model.
Findings
Generates 50 data points in under 30 seconds
Achieves an average RMSE of 0.05 between the generated and target Scagnostics measurements
Provides a pedagogical tool for teaching statistical data characteristics
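As a reference point for the RMSE figure, one natural way to score how closely a generated scatterplot matches its target is the root mean squared error across the k Scagnostics measures being controlled. The form below assumes every measure is weighted equally; the exact objective used by scatteR is not stated here.

```latex
\mathrm{RMSE}(\mathbf{s}, \mathbf{t}) = \sqrt{\frac{1}{k}\sum_{j=1}^{k} \left(s_j - t_j\right)^2}
```

Here s_j is the current value and t_j the target value of measure j, both in [0, 1]. For instance, if only one measure (k = 1) is targeted at t = 0.9 and the generated data reach s = 0.85, then RMSE = sqrt(0.05^2) = 0.05.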
Abstract
Traditional synthetic data generation methods rely on model-based approaches that tune the parameters of a model rather than focusing on the structure of the data itself. In contrast, Scagnostics is an exploratory graphical method that captures the structure of bivariate data using graph-theoretic measures. This paper presents a novel data generation method, scatteR, that uses Scagnostics measurements to control the characteristics of the generated dataset. By using an iterative Generalized Simulated Annealing optimizer, scatteR finds the optimal arrangement of data points that minimizes the distance between current and target Scagnostics measurements. The results demonstrate that scatteR can generate 50 data points in under 30 seconds with an average Root Mean Squared Error of 0.05, making it a useful pedagogical tool for teaching statistical methods. Overall, scatteR provides an entry…
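To make the optimization loop concrete, here is a minimal Python sketch of the idea: start from random points, repeatedly perturb one point, and keep moves that bring the current measurements closer to the target under RMSE. It rests on several assumptions. It uses classical simulated annealing (Metropolis acceptance, linear cooling) in place of the Generalized Simulated Annealing optimizer the abstract names. It replaces the full Scagnostics suite with simplified stand-ins for two measures, monotonic (squared Spearman correlation) and skewed (a quantile ratio of minimum-spanning-tree edge lengths), computed on raw points without the binning and outlier trimming of full Scagnostics. All function and parameter names (`measures`, `anneal`, `t0`, `step`, and so on) are hypothetical and are not scatteR's API.

```python
import math
import random

# Hypothetical sketch: simplified stand-ins for two Scagnostics measures,
# computed on raw points in the unit square (no hexagonal binning or
# outlier trimming, which full Scagnostics performs).


def spearman_sq(xs, ys):
    """'Monotonic' stand-in: squared Spearman rank correlation (ties not averaged)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r

    rx, ry = ranks(xs), ranks(ys)
    mean = (len(xs) - 1) / 2
    num = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    den = sum((a - mean) ** 2 for a in rx)  # same for ry: both are permutations of 0..n-1
    return (num / den) ** 2


def mst_edge_lengths(pts):
    """Edge lengths of the Euclidean minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(pts)
    in_tree = [False] * n
    best = [math.inf] * n
    best[0] = 0.0
    lengths = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if step > 0:
            lengths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(pts[u], pts[v]))
    return lengths


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already-sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def skewed(pts):
    """'Skewed' stand-in: (q90 - q50) / (q90 - q10) of MST edge lengths, in [0, 1]."""
    e = sorted(mst_edge_lengths(pts))
    q10, q50, q90 = quantile(e, 0.1), quantile(e, 0.5), quantile(e, 0.9)
    return (q90 - q50) / (q90 - q10) if q90 > q10 else 0.0


def measures(pts):
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    return {"monotonic": spearman_sq(xs, ys), "skewed": skewed(pts)}


def rmse(current, target):
    """Root mean squared error over the measures named in `target`."""
    return math.sqrt(sum((current[k] - target[k]) ** 2 for k in target) / len(target))


def anneal(target, n=30, iters=2000, t0=0.1, step=0.05, seed=0):
    """Classical simulated annealing over point positions (NOT Generalized SA)."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    cost = rmse(measures(pts), target)
    best_pts, best_cost = list(pts), cost
    for k in range(iters):
        temp = t0 * (1 - k / iters) + 1e-9  # linear cooling schedule
        i = rng.randrange(n)
        x, y = pts[i]
        cand = list(pts)
        # Move one point with Gaussian noise, clamped to the unit square
        cand[i] = (min(1.0, max(0.0, x + rng.gauss(0, step))),
                   min(1.0, max(0.0, y + rng.gauss(0, step))))
        c = rmse(measures(cand), target)
        # Metropolis rule: accept improvements; accept worse moves with prob exp(-delta/temp)
        if c < cost or rng.random() < math.exp((cost - c) / temp):
            pts, cost = cand, c
            if cost < best_cost:
                best_pts, best_cost = list(pts), cost
    return best_pts, best_cost
```

For example, `anneal({"monotonic": 0.9, "skewed": 0.5}, n=30)` returns the best point set found and its RMSE against the target. Moving a single point per step keeps each candidate close to the current arrangement, while the temperature still lets the search accept some worse moves early on and escape poor configurations.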
Taxonomy
Topics: Statistics Education and Methodologies · Machine Learning and Data Classification · Evolutionary Algorithms and Applications
