Consistency of the $k$-Nearest Neighbor Regressor under Complex Survey Designs
Caren Hasler

TL;DR
This paper establishes conditions under which the $k$-nearest neighbor regressor remains consistent when applied to complex survey data, extending results known for i.i.d. data to more intricate sampling designs.
Contribution
It provides a theoretical proof of $k$-nearest neighbor consistency under complex survey designs, a setting for which such results were previously lacking, together with lower bounds on the rate of convergence that reveal the effect of high dimensionality.
Findings
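The $k$-nearest neighbor regressor is consistent under regularity conditions on the sampling design and the data distribution
Lower bounds on the convergence rate exhibit the curse of dimensionality, as in the i.i.d. setting
Empirical studies on simulated and real data illustrate the theoretical findings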
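The curse of dimensionality mentioned in the findings can be made concrete with a generic nearest-neighbor calculation: for a fixed number of observations, the $k$-th nearest neighbor of a query point lies farther away as the dimension grows, so the local average draws on less similar units. The sketch below is a minimal illustration that assumes uniformly distributed covariates on the unit cube drawn i.i.d. (not under a survey design). The function names are hypothetical, and the sketch does not reproduce the paper's bounds.

```python
import math
import random

def kth_nn_distance(points, query, k):
    # Euclidean distance from `query` to its k-th nearest neighbor in `points`
    dists = sorted(math.dist(p, query) for p in points)
    return dists[k - 1]

def mean_kth_nn_distance(n, d, k, rng, n_queries=50):
    # Assumption: n points drawn uniformly on [0, 1]^d; average the k-th
    # nearest-neighbor distance over random query points in the same cube
    points = [[rng.random() for _ in range(d)] for _ in range(n)]
    total = 0.0
    for _ in range(n_queries):
        query = [rng.random() for _ in range(d)]
        total += kth_nn_distance(points, query, k)
    return total / n_queries

rng = random.Random(0)
for d in (1, 2, 5, 10):
    dist = mean_kth_nn_distance(2000, d, 10, rng)
    print(f"d={d:2d}  mean distance to 10th neighbor: {dist:.3f}")
```

With the sample size held at 2000, the printed distances grow with `d`. A neighborhood containing the same number of points must therefore cover a much larger part of the space, which is the mechanism that slows nearest-neighbor convergence in high dimensions.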
Abstract
We study the consistency of the $k$-nearest neighbor regressor under complex survey designs. While consistency results for this algorithm are well established for independent and identically distributed data, corresponding results for complex survey data are lacking. We show that the $k$-nearest neighbor regressor is consistent under regularity conditions on the sampling design and the distribution of the data. We derive lower bounds for the rate of convergence and show that these bounds exhibit the curse of dimensionality, as in the independent and identically distributed setting. Empirical studies based on simulated and real data illustrate our theoretical findings.
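The setting in the abstract can be illustrated with a small simulation. A finite population is generated, a sample is drawn with unequal inclusion probabilities, and the $k$-nearest neighbor regressor is fitted to the sampled units only. The sketch is minimal and rests on several assumptions. It uses Poisson sampling, one simple design in which each unit is included independently with probability $\pi_i$. The regression function, noise level, and inclusion probabilities are hypothetical. The predictor is the plain unweighted average of the $k$ nearest sampled responses. The abstract does not specify which designs or which variant of the regressor the paper analyzes.

```python
import math
import random

def regression_function(x):
    # Hypothetical true regression function m(x) on [0, 1]
    return math.sin(2 * math.pi * x)

def poisson_sample(population, inclusion_prob, rng):
    # Poisson sampling: each unit enters the sample independently with probability pi_i
    return [unit for unit, p in zip(population, inclusion_prob) if rng.random() < p]

def knn_predict(sample, x0, k):
    # Unweighted k-NN: average the responses of the k sampled units closest to x0
    nearest = sorted(sample, key=lambda unit: abs(unit[0] - x0))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def simulate(N, rng, n_test=200):
    # Finite population of size N with noisy responses (assumed noise sd 0.3)
    xs = [rng.random() for _ in range(N)]
    population = [(x, regression_function(x) + rng.gauss(0, 0.3)) for x in xs]
    # Assumed unequal inclusion probabilities, depending on x and bounded away from 0
    inclusion_prob = [0.1 + 0.4 * x for x in xs]
    sample = poisson_sample(population, inclusion_prob, rng)
    n = len(sample)
    k = max(1, round(math.sqrt(n)))  # k grows with n while k / n shrinks
    test_points = [rng.random() for _ in range(n_test)]
    mse = sum((knn_predict(sample, x0, k) - regression_function(x0)) ** 2
              for x0 in test_points) / n_test
    return n, k, mse

rng = random.Random(0)
for N in (500, 2000, 8000):
    n, k, mse = simulate(N, rng)
    print(f"N={N:5d}  n={n:4d}  k={k:3d}  MSE={mse:.4f}")
```

Choosing $k \approx \sqrt{n}$ is a standard way to let $k \to \infty$ while $k/n \to 0$, the usual requirement for consistency in the i.i.d. case. Here it is only a heuristic and not a tuning rule taken from the paper. As the population and sample grow, the estimated error against the true $m(x)$ should decrease.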
Taxonomy
Topics: Statistical Methods and Inference · Distributed Sensor Networks and Detection Algorithms · Privacy-Preserving Technologies in Data
