Spectral Metric for Dataset Complexity Assessment
Frederic Branchaud-Charron, Andrew Achkar, Pierre-Marc Jodoin

TL;DR
This paper introduces the cumulative spectral gradient (CSG), a new spectral-based metric for assessing image dataset complexity that correlates with CNN test accuracy and aids in dataset analysis and reduction.
Contribution
The paper presents the CSG measure, a novel spectral clustering-based complexity metric that outperforms previous methods in accuracy and speed for dataset complexity assessment.
Findings
CSG correlates strongly with CNN test accuracy.
CSG is more accurate and faster than previous complexity measures across 11 datasets and three CNN models.
The metric reflects the overall separability of a dataset and indicates which classes are hardest to disentangle.
Abstract
In this paper, we propose a new measure to gauge the complexity of image classification problems. Given an annotated image dataset, our method computes a complexity measure called the cumulative spectral gradient (CSG), which strongly correlates with the test accuracy of convolutional neural networks (CNNs). The CSG measure is derived from the probabilistic divergence between classes in a spectral clustering framework. We show that this metric correlates with the overall separability of the dataset and thus its inherent complexity. As will be shown, our metric can be used for dataset reduction, to assess which classes are more difficult to disentangle, and to approximate the accuracy one could expect to get with a CNN. Results obtained on 11 datasets and three CNN models reveal that our method is more accurate and faster than previous complexity measures.
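To make the phrase "probabilistic divergence between classes in a spectral clustering framework" more concrete, here is a minimal sketch of one plausible reading. It is not the authors' reference implementation. The function names, the k-nearest-neighbour estimate of class overlap, the Bray-Curtis style edge weights, and the normalisation of the eigenvalue gaps are all assumptions of this sketch; the abstract itself does not specify them. The inputs are assumed to be low-dimensional embeddings of the images with integer class labels.

```python
import numpy as np


def class_similarity(embeddings, labels, k=5):
    """Estimate a class-overlap matrix S (assumption of this sketch).

    S[i, j] is the average fraction of class-j points among the k nearest
    neighbours of the points of class i, so every row of S sums to 1.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)

    # Dense pairwise Euclidean distances; fine for small illustrative inputs.
    dist = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbour_labels = labels[np.argsort(dist, axis=1)[:, :k]]

    S = np.zeros((len(classes), len(classes)))
    for i, ci in enumerate(classes):
        rows = neighbour_labels[labels == ci]
        for j, cj in enumerate(classes):
            S[i, j] = np.mean(rows == cj)
    return S


def cumulative_spectral_gradient(S):
    """Turn a class-overlap matrix into one complexity score (hypothetical recipe)."""
    K = S.shape[0]

    # Edge weights between classes: 1 minus a Bray-Curtis style distance
    # between the rows of S (assumed choice of divergence).
    num = np.abs(S[:, None, :] - S[None, :, :]).sum(axis=-1)
    den = np.abs(S[:, None, :] + S[None, :, :]).sum(axis=-1)
    W = 1.0 - num / den

    # Graph Laplacian of the class graph and its sorted spectrum.
    laplacian = np.diag(W.sum(axis=1)) - W
    eigenvalues = np.sort(np.linalg.eigvalsh(laplacian))

    # Gaps between consecutive eigenvalues, normalised by position (assumed),
    # then accumulated with a running maximum.
    gaps = np.diff(eigenvalues) / (K - np.arange(K - 1))
    return float(np.maximum.accumulate(gaps).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 50)
    separated = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(100, 1, (50, 2))])
    overlapping = rng.normal(0, 1, (100, 2))
    print("separated:  ", cumulative_spectral_gradient(class_similarity(separated, y)))
    print("overlapping:", cumulative_spectral_gradient(class_similarity(overlapping, y)))
```

Under these assumptions, two classes that never share neighbours yield a zero Laplacian and a score of zero, while heavily overlapping classes yield a larger score, which matches the intuition that a higher value signals a harder classification problem.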
Taxonomy
Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
Methods: Spectral Clustering
