Synthetic Data Generation and Automated Multidimensional Data Labeling for AI/ML in General and Circular Coordinates
Alice Williams, Boris Kovalerchuk

TL;DR
This paper introduces a unified method for synthetic data generation and automated labeling that visualizes multidimensional data losslessly in General Line Coordinates, including new Circular Coordinates, to improve the quality of AI/ML training data.
Contribution
It presents a novel unified SDG-ADL algorithm utilizing General and Circular Coordinates for multidimensional data visualization and labeling, with interactive software implementation.
Findings
Improved classifier performance with synthetic data
Effective outlier detection using General Line Coordinates (GLCs)
Enhanced data visualization in multidimensional space
Abstract
An insufficient amount of available training data is a critical challenge for both the development and deployment of artificial intelligence and machine learning (AI/ML) models. This paper proposes a unified approach to synthetic data generation (SDG) and automated data labeling (ADL) in a single SDG-ADL algorithm. SDG-ADL uses multidimensional (n-D) representations of data visualized losslessly with General Line Coordinates (GLCs), relying on the reversibility of GLCs to visualize n-D data in multiple GLCs. This paper demonstrates the use of the new Circular Coordinates in Static and Dynamic forms, together with Parallel Coordinates and Shifted Paired Coordinates, since each GLC exemplifies unique data properties, such as interattribute n-D distributions and outlier detection. The approach is interactively implemented in computer software with the Dynamic Coordinates Visualization system…
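The lossless, reversible property that the abstract relies on can be illustrated with a minimal sketch. It is not the authors' SDG-ADL algorithm or their software; the function names, the `shift` parameter, and the horizontal-only shift used for Shifted Paired Coordinates are assumptions made for illustration. In Parallel Coordinates, an n-D point becomes a polyline with one vertex per axis; in this simplified Shifted Paired Coordinates, consecutive attribute pairs become 2-D nodes drawn in frames offset from each other. In both cases the original point can be read back from the drawing.

```python
import numpy as np


def to_parallel_coordinates(point):
    """Map an n-D point to polyline vertices (axis index, attribute value)."""
    point = np.asarray(point, dtype=float)
    axes = np.arange(point.size, dtype=float)
    return np.column_stack([axes, point])


def from_parallel_coordinates(polyline):
    """Recover the n-D point by reading the value at each axis."""
    return np.asarray(polyline, dtype=float)[:, 1].copy()


def to_shifted_paired_coordinates(point, shift=1.0):
    """Map an even-dimensional point to 2-D nodes, one per attribute pair.

    Assumption: the k-th pair is drawn in a frame shifted horizontally
    by k * shift. This is a simplification chosen for illustration.
    """
    point = np.asarray(point, dtype=float)
    if point.size % 2 != 0:
        raise ValueError("this sketch needs an even number of attributes")
    nodes = point.reshape(-1, 2).copy()
    nodes[:, 0] += np.arange(len(nodes)) * shift
    return nodes


def from_shifted_paired_coordinates(nodes, shift=1.0):
    """Undo the per-pair shift and flatten the nodes back to the n-D point."""
    nodes = np.asarray(nodes, dtype=float).copy()
    nodes[:, 0] -= np.arange(len(nodes)) * shift
    return nodes.reshape(-1)


x = np.array([0.2, 0.9, 0.5, 0.1])
polyline = to_parallel_coordinates(x)
nodes = to_shifted_paired_coordinates(x)
```

Because both mappings are invertible (up to floating-point rounding in the shifted case), a point selected or generated in the visualization corresponds to exactly one n-D point, which is the property that makes visual labeling of n-D data meaningful.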
Taxonomy
Topics: Statistical and Computational Modeling · Advanced Data Processing Techniques
