Dataset Complexity Assessment Based on Cumulative Maximum Scaled Area Under Laplacian Spectrum

Guang Li; Ren Togo; Takahiro Ogawa; Miki Haseyama

arXiv:2209.14743 · cs.CV · September 30, 2022

TL;DR

This paper introduces cmsAULS, a method that assesses dataset complexity before a deep neural network is trained, so that classification performance can be predicted in advance and used to guide classifier selection and dataset reduction.

Contribution

The paper proposes a novel complexity assessment method, cmsAULS (cumulative maximum scaled Area Under Laplacian Spectrum), based on Laplacian spectrum analysis. The authors report state-of-the-art complexity assessment performance on six datasets.

Findings

01. cmsAULS outperforms existing complexity measures

02. Assesses dataset complexity effectively before any classifier is trained

03. Facilitates classifier selection and dataset reduction

Abstract

Dataset complexity assessment aims to predict classification performance on a dataset with complexity calculation before training a classifier, which can also be used for classifier selection and dataset reduction. The training process of deep convolutional neural networks (DCNNs) is iterative and time-consuming because of hyperparameter uncertainty and the domain shift introduced by different datasets. Hence, it is meaningful to predict classification performance by assessing the complexity of datasets effectively before training DCNN models. This paper proposes a novel method called cumulative maximum scaled Area Under Laplacian Spectrum (cmsAULS), which can achieve state-of-the-art complexity assessment performance on six datasets.
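The abstract names the method but does not give its formula, so the following is only a generic sketch of what an "area under a Laplacian spectrum" can look like, not the authors' cmsAULS definition. It assumes a hypothetical symmetric inter-class similarity matrix `W`, builds the unnormalized graph Laplacian `L = D - W`, and sums trapezoids under the sorted eigenvalues. The function names, the choice of the unnormalized Laplacian, and the trapezoidal area are all assumptions made for illustration. The scaling and cumulative-maximum steps implied by the method's name are deliberately left out because the abstract does not specify them.

```python
import numpy as np


def laplacian_spectrum(W):
    """Ascending eigenvalues of the unnormalized graph Laplacian L = D - W.

    W is a hypothetical inter-class similarity matrix (assumption, not from
    the paper): W[i, j] is large when classes i and j overlap heavily.
    """
    W = np.asarray(W, dtype=float)
    W = (W + W.T) / 2.0           # enforce symmetry (creates a new array)
    np.fill_diagonal(W, 0.0)      # ignore self-similarity
    D = np.diag(W.sum(axis=1))    # degree matrix
    return np.linalg.eigvalsh(D - W)


def area_under_spectrum(eigvals):
    """Trapezoidal area under the eigenvalue curve with unit index spacing."""
    ev = np.asarray(eigvals, dtype=float)
    return float(np.sum((ev[1:] + ev[:-1]) / 2.0))


# Toy example: three classes that all overlap equally (illustrative values).
W_overlapping = np.ones((3, 3))
# Toy example: three classes with almost no overlap (illustrative values).
W_separated = 0.01 * np.ones((3, 3))

area_hard = area_under_spectrum(laplacian_spectrum(W_overlapping))
area_easy = area_under_spectrum(laplacian_spectrum(W_separated))
print(area_hard > area_easy)
```

In this toy setting, heavier class overlap yields larger Laplacian eigenvalues and therefore a larger area, which is the intuition behind using a spectrum-based quantity as a complexity score.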

Taxonomy

Methods: Diffusion-Convolutional Neural Networks