Sparse classification boundaries

Yuri I. Ingster (LETI), Christophe Pouet (LATP), Alexandre B. Tsybakov (PMA, CREST)

arXiv:0903.4807·math.ST·March 30, 2009

TL;DR

This paper determines the sharp classification boundary for deciding whether a new high-dimensional observation comes from a population or from pure noise when the two differ only by a sparse shift, under Gaussian and non-Gaussian noise, and proposes classifiers that attain this boundary.

Contribution

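It derives the sharp classification boundary for sparse shifts in high dimensions under Gaussian noise, introduces classifiers that attain this boundary, and extends the result to sample sizes growing with the dimension and to non-Gaussian noise satisfying the Cramér condition.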

Findings

01. Established the sharp classification boundary for Gaussian noise, with fixed sample size $m$ and dimension $d \to \infty$.

02. Proposed classifiers that attain this boundary.

03. Extended the result to sample sizes $m$ that grow with $d$ and to non-Gaussian noise satisfying the Cramér condition.

Abstract

Given a training sample of size $m$ from a $d$-dimensional population, we wish to allocate a new observation $Z \in \mathbb{R}^{d}$ to this population or to the noise. We suppose that the difference between the distribution of the population and that of the noise is only in a shift, which is a sparse vector. For the Gaussian noise, fixed sample size $m$, and the dimension $d$ that tends to infinity, we obtain the sharp classification boundary and we propose classifiers attaining this boundary. We also give extensions of this result to the case where the sample size $m$ depends on $d$ and satisfies the condition $(\log m)/\log d \to \gamma$, $0 \leq \gamma < 1$, and to the case of non-Gaussian noise satisfying the Cramér condition.
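To make the setting in the abstract concrete, the following is a minimal simulation sketch, not the classifiers proposed in the paper. It assumes standard Gaussian noise, a shift with equal nonzero entries, and a simple illustrative rule: select coordinates by thresholding the training mean, then apply a linear discriminant on them. The names `simulate` and `classify`, the parameters `n_nonzero` and `a`, and the threshold $\sqrt{2 \log d}$ are all assumptions introduced for illustration.

```python
import numpy as np


def simulate(d, m, n_nonzero, a, rng):
    """Draw one instance of the problem (illustrative parametrization).

    The shift v has n_nonzero entries equal to a and zeros elsewhere.
    The training sample X has m rows, each distributed as v + N(0, I_d).
    The new observation Z is v + noise or pure noise, with probability 1/2 each.
    """
    v = np.zeros(d)
    support = rng.choice(d, size=n_nonzero, replace=False)
    v[support] = a
    X = v + rng.standard_normal((m, d))
    from_population = bool(rng.integers(2))
    Z = rng.standard_normal(d) + (v if from_population else 0.0)
    return X, Z, from_population


def classify(X, Z, threshold):
    """Return True if Z is allocated to the population, False if to the noise.

    Assumed rule: keep coordinates whose scaled training mean exceeds the
    threshold, then test whether Z is closer to the estimated shift than to 0.
    """
    m = X.shape[0]
    mean = X.mean(axis=0)
    selected = np.sqrt(m) * np.abs(mean) > threshold
    if not selected.any():
        return False  # no coordinate looks shifted: fall back to "noise"
    stat = np.sum(mean[selected] * (Z[selected] - mean[selected] / 2.0))
    return bool(stat > 0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, m = 2000, 5
    threshold = np.sqrt(2.0 * np.log(d))
    trials = 200
    errors = 0
    for _ in range(trials):
        X, Z, label = simulate(d, m, n_nonzero=20, a=3.0, rng=rng)
        errors += classify(X, Z, threshold) != label
    print("empirical error rate:", errors / trials)
```

Lowering `a` or `n_nonzero` in this sketch makes the two hypotheses harder to separate, which is the regime where a sharp boundary between possible and impossible classification becomes the question of interest.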

Taxonomy

Topics: Statistical Methods and Inference · Machine Learning and Algorithms · Imbalanced Data Classification Techniques