Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data
Hyeongmin Cho, Sangkyun Lee

TL;DR
This paper introduces two novel data quality measures focusing on class separability and in-class variability, along with efficient algorithms for large-scale high-dimensional data, enhancing data assessment in machine learning applications.
Contribution
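The paper proposes two data quality measures, one for class separability and one for in-class variability, together with efficient algorithms for evaluating them on large-scale high-dimensional datasets. Classical measures, by contrast, tend to focus only on class separability.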
Findings
Measures are compatible with classical ones on small data
Algorithms significantly reduce computation time on large datasets
Methods effectively evaluate data quality for high-dimensional data
Abstract
Machine learning has been proven to be effective in various application areas, such as object and speech recognition on mobile systems. Since a critical key to machine learning success is the availability of large training data, many datasets are being disclosed and published online. From a data consumer or manager point of view, measuring data quality is an important first step in the learning process. We need to determine which datasets to use, update, and maintain. However, not many practical ways to measure data quality are available today, especially when it comes to large-scale high-dimensional data, such as images and videos. This paper proposes two data quality measures that can compute class separability and in-class variability, the two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we…
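The abstract above is truncated, so the paper's own definitions are not reproduced here. As a rough illustration of the two aspects it names, the sketch below computes a classical scatter-based separability ratio and a simple in-class spread score. This is not the authors' method; the function names (`class_means`, `sq_dist`, `scatter_scores`) and the toy data are assumptions made for illustration only.

```python
from collections import defaultdict


def class_means(X, y):
    """Return {label: mean vector} for feature rows X and labels y."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def scatter_scores(X, y):
    """Illustrative scores only (hypothetical helper, not the paper's measures).

    separability: between-class scatter divided by within-class scatter
    variability:  mean squared distance of each sample to its own class mean
    """
    n = len(X)
    means = class_means(X, y)
    overall = [sum(col) / n for col in zip(*X)]

    counts = defaultdict(int)
    for label in y:
        counts[label] += 1

    # Spread of the class means around the overall mean, weighted by class size.
    between = sum(counts[c] * sq_dist(means[c], overall) for c in means)
    # Spread of the samples around their own class mean.
    within = sum(sq_dist(row, means[label]) for row, label in zip(X, y))

    separability = between / within if within > 0 else float("inf")
    variability = within / n
    return separability, variability


if __name__ == "__main__":
    # Two tight, well-separated toy classes in 2-D (made-up data).
    X = [[0, 0], [0, 2], [10, 0], [10, 2]]
    y = [0, 0, 1, 1]
    print(scatter_scores(X, y))  # (25.0, 1.0)
```

A high ratio means the class means lie far apart relative to the spread inside each class. The second score reports that in-class spread on its own, which a separability-only measure does not expose. The sketch is a single pass over an in-memory list and makes no attempt at the efficiency on large-scale data that the paper targets.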
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Face and Expression Recognition
