Optimal properties of centroid-based classifiers for very high-dimensional data

Peter Hall; Tung Pham

arXiv:1002.4781·math.ST·February 26, 2010

TL;DR

This paper shows that scale-adjusted centroid classifiers have optimal properties for discriminating between two very high-dimensional populations that differ mainly in location, whereas certain other distance-based methods, such as those founded on nearest-neighbor distance, are not optimal in the same sense.

Contribution

It establishes optimal properties for a scale-adjusted centroid classifier on very high-dimensional data, allowing varying degrees of sparsity and signal strength, widely varying marginal distributions, and only mild conditions on dependence among vector components.

Findings

01 Scale adjustment removes confounding scale effects.

02 Centroid classifier achieves optimal discrimination in high dimensions.

03 Numerical results support theoretical performance claims.

Abstract

We show that scale-adjusted versions of the centroid-based classifier enjoy optimal properties when used to discriminate between two very high-dimensional populations where the principal differences are in location. The scale adjustment removes the tendency of scale differences to confound differences in means. Certain other distance-based methods, for example, those founded on nearest-neighbor distance, do not have optimal performance in the sense that we propose. Our results permit varying degrees of sparsity and signal strength to be treated, and require only mild conditions on dependence of vector components. Additionally, we permit the marginal distributions of vector components to vary extensively. In addition to providing theory, we explore numerical properties of a centroid-based classifier, and show that these features reflect theoretical accounts of performance.
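To make the idea of scale adjustment concrete, here is a minimal Python sketch. It is not taken from the paper: the exact form of the adjustment is an assumption, and the names `scale_adjusted_score` and `classify` are hypothetical. The sketch relies on the standard identity that, for a new observation z independent of a sample of size m with mean mu and covariance Sigma, the expected squared distance from z to the sample centroid exceeds the squared distance from z to mu by tr(Sigma)/m. Subtracting an estimate of that excess from each squared distance keeps a noisier or smaller training sample from distorting the comparison of locations.

```python
import numpy as np


def scale_adjusted_score(z, X, Y):
    """Return a score that is positive when z looks closer to the X population.

    z: new observation, shape (p,)
    X: training sample from the first population, shape (m, p)
    Y: training sample from the second population, shape (n, p)

    Assumed form of the adjustment (illustrative, not the paper's definition):
    subtract the estimated trace of each covariance divided by the sample size.
    """
    z = np.asarray(z, dtype=float)
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    m, n = X.shape[0], Y.shape[0]

    x_bar = X.mean(axis=0)
    y_bar = Y.mean(axis=0)

    # Sum of unbiased component-wise variances estimates tr(Sigma).
    tr_x = X.var(axis=0, ddof=1).sum()
    tr_y = Y.var(axis=0, ddof=1).sum()

    # Squared distances to each centroid, with the scale-driven excess removed.
    d_x = np.sum((z - x_bar) ** 2) - tr_x / m
    d_y = np.sum((z - y_bar) ** 2) - tr_y / n

    return d_y - d_x


def classify(z, X, Y):
    """Assign z to "X" when the adjusted distance to X's centroid is smaller."""
    return "X" if scale_adjusted_score(z, X, Y) > 0 else "Y"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = 1000  # many more components than observations
    X = rng.normal(loc=0.0, scale=1.0, size=(10, p))
    Y = rng.normal(loc=0.5, scale=3.0, size=(15, p))  # shifted and noisier
    z = rng.normal(loc=0.0, scale=1.0, size=p)        # drawn like X
    print(classify(z, X, Y))
```

Dropping the two `tr / m` terms recovers the unadjusted centroid rule, which makes it easy to compare both versions on simulated data with unequal scales or sample sizes.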
