Quantile-based classifiers

Christian Hennig; Cinzia Viroli

arXiv:1303.1282·stat.ME·November 13, 2013

TL;DR

This paper introduces quantile classifiers for potentially high-dimensional data. The quantile level is chosen by minimizing the misclassification error in the training sample; the paper shows this choice is consistent as $n \to \infty$ and reports very good performance in simulations and on a real data set.

Contribution

It proposes a quantile-based classification method for potentially high-dimensional data and, by examining the role of skewness in the variables, derives an improved version of the classifier.

Findings

01

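The optimal quantile classifier performs very well compared to nine other classifiers in a comprehensive simulation study and on a real data set from chemistry (classification of bioaerosols).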

02

The quantile level is chosen by minimizing the misclassification error in the training sample.

03

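This selection is consistent as $n \to \infty$ for the classification rule with asymptotically optimal quantile, and, under some assumptions, the probability of correct classification converges to one as $p \to \infty$.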

Abstract

Quantile classifiers for potentially high-dimensional data are defined by classifying an observation according to a sum of appropriately weighted component-wise distances of the components of the observation to the within-class quantiles. An optimal percentage for the quantiles can be chosen by minimizing the misclassification error in the training sample. It is shown that this is consistent, for $n \to \infty$, for the classification rule with asymptotically optimal quantile, and that, under some assumptions, for $p \to \infty$ the probability of correct classification converges to one. The role of skewness of the involved variables is discussed, which leads to an improved classifier. The optimal quantile classifier performs very well in a comprehensive simulation study and a real data set from chemistry (classification of bioaerosols) compared to nine other classifiers, including…
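The rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions not stated above: the component-wise distance is taken to be the asymmetric check loss used in quantile regression, all component weights are set to one, the skewness adjustment is omitted, and the function names (`check_loss`, `fit_quantiles`, `predict`, `select_theta`) and the default grid of quantile levels are hypothetical.

```python
import numpy as np


def check_loss(x, q, theta):
    """Asymmetric distance of x to a quantile q at level theta.

    Assumed form (quantile-regression check loss):
    theta * (x - q) if x > q, else (1 - theta) * (q - x).
    """
    diff = x - q
    return np.where(diff > 0, theta * diff, (theta - 1.0) * diff)


def fit_quantiles(X, y, theta):
    """Component-wise theta-quantiles of X within each class."""
    classes = np.unique(y)
    Q = np.vstack([np.quantile(X[y == c], theta, axis=0) for c in classes])
    return classes, Q  # Q has shape (n_classes, n_features)


def predict(X, classes, Q, theta):
    """Assign each row of X to the class with the smallest summed distance."""
    # dist[i, k]: sum over components of the distance from X[i]
    # to the quantiles of class k (unit weights, an assumption).
    dist = check_loss(X[:, None, :], Q[None, :, :], theta).sum(axis=2)
    return classes[np.argmin(dist, axis=1)]


def select_theta(X, y, grid=None):
    """Pick the quantile level that minimizes the training error."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # hypothetical default grid
    best_theta, best_err = None, np.inf
    for theta in grid:
        classes, Q = fit_quantiles(X, y, theta)
        err = np.mean(predict(X, classes, Q, theta) != y)
        if err < best_err:
            best_theta, best_err = float(theta), float(err)
    return best_theta, best_err
```

Calling `select_theta(X, y)` on a training matrix `X` of shape `(n, p)` with labels `y` returns the selected level and its training error; `fit_quantiles` and `predict` can then be applied at that level to classify new observations.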

Taxonomy

Topics: Advanced Statistical Methods and Models · Statistical Methods and Inference · Gene expression and cancer classification