Dataset Complexity Assessment Based on Cumulative Maximum Scaled Area Under Laplacian Spectrum
Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama

TL;DR
This paper introduces cmsAULS, a method for assessing dataset complexity before training deep neural networks, so that classification performance can be predicted in advance and used for classifier selection and dataset reduction.
Contribution
The paper proposes a novel complexity assessment method, cmsAULS, based on Laplacian spectrum analysis, and reports state-of-the-art complexity assessment performance on six datasets.
Findings
cmsAULS outperforms existing complexity measures
Effective pre-training dataset complexity prediction
Facilitates classifier selection and dataset reduction
Abstract
Dataset complexity assessment aims to predict classification performance on a dataset by computing its complexity before a classifier is trained; the result can also be used for classifier selection and dataset reduction. Training deep convolutional neural networks (DCNNs) is iterative and time-consuming because of hyperparameter uncertainty and the domain shift introduced by different datasets. Hence, it is valuable to predict classification performance by assessing dataset complexity effectively before training DCNN models. This paper proposes a novel method called cumulative maximum scaled Area Under Laplacian Spectrum (cmsAULS), which achieves state-of-the-art complexity assessment performance on six datasets.
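The abstract names the ingredients of the score (a Laplacian spectrum, an area under it, scaling, and a cumulative maximum) but does not give the formula. The following is a minimal sketch of how such a score might be assembled, not the paper's definition. It assumes a precomputed K x K inter-class similarity matrix as input, a trapezoid area between adjacent eigenvalues, and a scaling factor of K - i; the function names `laplacian_spectrum` and `cumulative_max_scaled_area` are hypothetical.

```python
import numpy as np


def laplacian_spectrum(similarity):
    """Eigenvalues (ascending) of the graph Laplacian L = D - W.

    `similarity` is assumed to be a K x K inter-class similarity matrix;
    how it is estimated from data is outside the scope of this sketch.
    """
    w = np.asarray(similarity, dtype=float)
    w = (w + w.T) / 2.0           # symmetrize so eigenvalues are real
    np.fill_diagonal(w, 0.0)      # ignore self-similarity
    degree = np.diag(w.sum(axis=1))
    return np.linalg.eigvalsh(degree - w)


def cumulative_max_scaled_area(eigenvalues):
    """Assumed form of the score: sum of the running maximum of scaled areas.

    Area i is the unit-width trapezoid between eigenvalues i and i+1,
    divided by (K - i). Both choices are assumptions of this sketch.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    k = lam.size
    areas = (lam[:-1] + lam[1:]) / 2.0
    scaled = areas / (k - np.arange(k - 1))
    return float(np.maximum.accumulate(scaled).sum())


# Three classes with no overlap versus three classes with uniform overlap.
separable = np.eye(3)
overlapping = np.full((3, 3), 0.5)
score_separable = cumulative_max_scaled_area(laplacian_spectrum(separable))
score_overlapping = cumulative_max_scaled_area(laplacian_spectrum(overlapping))
```

Under these assumptions a dataset whose classes do not overlap scores zero, and the score grows as inter-class similarity increases, which matches the intended reading of a higher value as a harder dataset.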
Taxonomy
Methods: Diffusion-Convolutional Neural Networks
