Classification with High-Dimensional Sparse Samples
Dayu Huang, Sean Meyn

TL;DR
This paper analyzes the conditions under which high-dimensional binary classification is consistent when sample sizes and alphabet size grow, introducing classifiers with different error decay properties.
Contribution
It extends previous results to the case of unequal training and test sample sizes and introduces classifiers with improved error decay in sparse high-dimensional settings.
Findings
An asymptotically consistent classifier exists if and only if m = o(min{N^2, Nn})
Finer error decay result for sparse samples: -log(P_e) is proportional to min{N^2, Nn}/m
A weighted coincidence classifier achieves a positive error exponent J
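To make the scaling in these findings concrete, the short sketch below evaluates the ratio min{N^2, Nn}/m for given sizes. The function name `sparsity_ratio` and the example values are illustrative assumptions, not taken from the paper. The findings are asymptotic, so a single finite value proves nothing by itself; the ratio is only a rough guide to how much information the samples carry relative to the alphabet size.

```python
def sparsity_ratio(N, n, m):
    """Return min{N^2, N*n} / m for training length N, test length n, alphabet size m.

    Hypothetical helper for illustration: the findings say consistency needs this
    ratio to tend to infinity, and that -log(P_e) scales in proportion to it.
    """
    return min(N * N, N * n) / m


# Example values chosen for illustration only.
print(sparsity_ratio(1000, 100, 10_000))  # min(1e6, 1e5) / 1e4

# With N = n = k and m = k^1.5 the ratio grows like sqrt(k), so m = o(k^2) holds.
for k in (100, 10_000):
    m = round(k ** 1.5)
    print(k, m, sparsity_ratio(k, k, m))
```

When m grows as fast as N^2 (for example N = n = k and m = k^2), the ratio stays bounded, which is the regime where the consistency condition fails.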
Abstract
The task of the binary classification problem is to determine which of two distributions has generated a length-n test sequence. The two distributions are unknown; two training sequences of length N, one from each distribution, are observed. The distributions share an alphabet of size m, which is significantly larger than n and N. How do N, n and m affect the probability of classification error? We characterize the achievable error rate in a high-dimensional setting in which N, n and m all tend to infinity, under the assumption that the probability of any symbol is O(1/m). The results are: 1. There exists an asymptotically consistent classifier if and only if m = o(min{N^2, Nn}). This extends the previous consistency result in [1] to the case N ≠ n. 2. For the sparse sample case where max{n, N} = o(m), finer results are obtained: The best achievable probability of…
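The setup in the abstract can be sketched with a simple coincidence-counting rule. This is a minimal illustration under stated assumptions, not the weighted coincidence classifier of the paper: it estimates the squared l2 distance between the test distribution and each training distribution from counts of repeated symbols, and picks the closer one. The names `cross_coincidences`, `self_coincidences`, `classify`, and `empirical_accuracy`, and the two distributions used in the simulation, are hypothetical choices made here.

```python
import random
from collections import Counter


def cross_coincidences(a, b):
    """Number of pairs (i, j) with a[i] == b[j]."""
    ca, cb = Counter(a), Counter(b)
    return sum(count * cb[sym] for sym, count in ca.items())


def self_coincidences(a):
    """Number of unordered pairs i < j with a[i] == a[j]."""
    return sum(c * (c - 1) // 2 for c in Counter(a).values())


def classify(test, train1, train2):
    """Return 1 or 2: the training sequence judged closer to the test sequence.

    For each class the score estimates ||p||^2 - 2<q, p>, where p is the
    training distribution and q the test distribution. The term ||q||^2 is
    common to both classes and is dropped. Requires len(train) >= 2.
    """
    n = len(test)

    def score(train):
        N = len(train)
        norm_sq = self_coincidences(train) / (N * (N - 1) / 2)  # estimates sum_x p(x)^2
        inner = cross_coincidences(test, train) / (n * N)       # estimates sum_x q(x) p(x)
        return norm_sq - 2 * inner

    return 1 if score(train1) <= score(train2) else 2


def empirical_accuracy(m, N, n, trials=200, seed=0):
    """Fraction of trials classified correctly for two assumed distributions.

    Class 1 is uniform on {0, ..., m-1}; class 2 puts relative weight 1.5 on the
    first half of the alphabet and 0.5 on the second half. Both keep every
    symbol probability of order 1/m. The test sequence is drawn from class 2.
    """
    rng = random.Random(seed)
    symbols = range(m)
    w1 = [1.0] * m
    w2 = [1.5 if x < m // 2 else 0.5 for x in symbols]
    correct = 0
    for _ in range(trials):
        train1 = rng.choices(symbols, weights=w1, k=N)
        train2 = rng.choices(symbols, weights=w2, k=N)
        test = rng.choices(symbols, weights=w2, k=n)
        correct += classify(test, train1, train2) == 2
    return correct / trials
```

Calling `empirical_accuracy(m, N, n)` for a few sizes gives a rough feel for how accuracy changes as min{N^2, Nn}/m grows or shrinks; the numbers depend on the assumed distributions and the seed, so they should not be read as the error rates derived in the paper.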
