Classification with High-Dimensional Sparse Samples

Dayu Huang; Sean Meyn

arXiv:1202.1574·cs.IT·April 18, 2016

TL;DR

This paper characterizes when binary classification remains asymptotically consistent as the training length N, the test length n, and the alphabet size m all grow, and it introduces classifiers with different error decay rates.

Contribution

The paper extends the previous consistency result to unequal training and test sample sizes (N ≠ n) and introduces classifiers with improved error decay in sparse high-dimensional settings.

Findings

1. Asymptotic consistency requires m = o(min{N^2, Nn}).

2. Finer error decay results for sparse samples: -log(P_e) is proportional to min{N^2, Nn}/m.

3. The weighted coincidence classifier achieves a positive error exponent J.
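
The quantity min{N^2, Nn}/m appears in both the consistency condition and the error decay rate in the findings above. A minimal Python sketch for evaluating it on concrete sizes follows; the function name `scaling_ratio` is hypothetical and not taken from the paper, and the result only indicates the regime, since the findings are asymptotic statements.

```python
def scaling_ratio(N: int, n: int, m: int) -> float:
    """Return min{N^2, N*n} / m for training length N, test length n, alphabet size m.

    Illustrative helper only (name is an assumption, not from the paper).
    The consistency condition m = o(min{N^2, N n}) corresponds to this
    ratio growing without bound along the sequence of problem sizes.
    """
    if N <= 0 or n <= 0 or m <= 0:
        raise ValueError("N, n, and m must be positive")
    return min(N * N, N * n) / m


if __name__ == "__main__":
    # Equal sample sizes: the ratio reduces to N^2 / m.
    print(scaling_ratio(1000, 1000, 10_000))
    # A shorter test sequence: N*n is the binding term.
    print(scaling_ratio(100, 50, 10_000))
```

When N = n the minimum is N^2, so the condition reduces to m = o(N^2), which matches the equal-length case that the paper says it extends.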

Abstract

The task of the binary classification problem is to determine which of two distributions has generated a length-$n$ test sequence. The two distributions are unknown; two training sequences of length $N$, one from each distribution, are observed. The distributions share an alphabet of size $m$, which is significantly larger than $n$ and $N$. How do $N, n, m$ affect the probability of classification error? We characterize the achievable error rate in a high-dimensional setting in which $N, n, m$ all tend to infinity, under the assumption that the probability of any symbol is $O(m^{-1})$. The results are: 1. There exists an asymptotically consistent classifier if and only if $m = o(\min\{N^{2}, N n\})$. This extends the previous consistency result in [1] to the case $N \neq n$. 2. For the sparse sample case where $\max\{n, N\} = o(m)$, finer results are obtained: The best achievable probability of…
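
To make the problem setup concrete, here is a minimal Python sketch of the classification task. The findings name a weighted coincidence classifier, but this summary does not define it, so the code uses a simplified, unweighted stand-in: it counts how many (test, training) symbol pairs coincide and picks the training sequence with more coincidences. The function names, the tie-breaking rule, and the two example distributions (symbol probabilities 1.5/m and 0.5/m, which satisfy the $O(m^{-1})$ assumption) are all assumptions made for illustration, not the authors' construction.

```python
import random
from collections import Counter


def coincidences(test, train):
    """Count pairs (i, j) with test[i] == train[j]."""
    train_counts = Counter(train)
    # Counter returns 0 for symbols that never appear in the training sequence.
    return sum(train_counts[symbol] for symbol in test)


def classify(test, train0, train1):
    """Return 0 or 1: the training sequence sharing more coincidences with test.

    Ties go to class 0 (an arbitrary choice for this sketch).
    """
    return 0 if coincidences(test, train0) >= coincidences(test, train1) else 1


def estimate_error(m, N, n, trials=200, seed=0):
    """Monte Carlo estimate of the error rate on a hypothetical pair of distributions.

    Class 0 puts probability proportional to 1.5 on the first half of the
    alphabet and 0.5 on the second half; class 1 is the mirror image.
    """
    rng = random.Random(seed)
    half = m // 2
    w0 = [1.5] * half + [0.5] * (m - half)
    w1 = [0.5] * half + [1.5] * (m - half)
    symbols = range(m)
    errors = 0
    for _ in range(trials):
        train0 = rng.choices(symbols, weights=w0, k=N)
        train1 = rng.choices(symbols, weights=w1, k=N)
        truth = rng.randrange(2)
        test = rng.choices(symbols, weights=(w1 if truth else w0), k=n)
        errors += classify(test, train0, train1) != truth
    return errors / trials


if __name__ == "__main__":
    # Alphabet larger than both sample sizes, but N * n well above m.
    print(estimate_error(m=2000, N=200, n=200))
```

Varying `m` relative to `N * n` in `estimate_error` gives an informal feel for how the alphabet size affects the error rate in this toy setting; it is not a substitute for the asymptotic analysis in the paper.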
