Adaptive Noisy Clustering
Michael Chichignoud, Sébastien Loustau

TL;DR
This paper introduces an adaptive noisy clustering method that employs a deconvolution-based $k$-means approach with a data-driven bandwidth selection rule, achieving fast convergence rates despite noise.
Contribution
The paper proposes an adaptive noisy $k$-means clustering method together with a new bandwidth selection rule based on empirical risk comparison (ERC); the rule also applies to other $M$-estimation problems.
Findings
Achieves fast rates of convergence for the excess risk.
Develops a data-driven bandwidth selection rule, ERC (Empirical Risk Comparison).
Demonstrates that the selection rule applies to multiple statistical problems.
Abstract
The problem of adaptive noisy clustering is investigated. Given a set of noisy observations $Z_i = X_i + \epsilon_i$, $i = 1, \dots, n$, the goal is to design clusters associated with the law of the $X_i$'s, with unknown density $f$ with respect to the Lebesgue measure. Since we observe a corrupted sample, a direct approach such as the popular $k$-means is not suitable in this case. In this paper, we propose a noisy $k$-means minimization, which is based on the $k$-means loss function and a deconvolution estimator of the density $f$. In particular, this approach suffers from the dependence on a bandwidth involved in the deconvolution kernel. Fast rates of convergence for the excess risk are proposed for a particular choice of the bandwidth, which depends on the smoothness of the density $f$. Then, we turn to the main issue of the paper: the data-driven choice of the bandwidth. We state an…
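The setting described in the abstract can be sketched in formulas as follows. This is an illustrative outline under assumed notation, not the paper's exact statement: the observations are written $Z_i = X_i + \epsilon_i$ with $X_i \in \mathbb{R}^d$ of density $f$ and independent noise $\epsilon_i$ of known density $\eta$; the codebook $c = (c_1, \dots, c_k)$, the scalar bandwidth $h$, the kernel $K$, the deconvolution kernel $\widetilde{K}_h$, and the Fourier transform $\mathcal{F}$ are symbols introduced here for illustration. Integrals are taken over the domain on which the clustering problem is posed.

```latex
% k-means loss of a codebook c = (c_1, ..., c_k) at a point x
\ell(c, x) = \min_{1 \le j \le k} \lVert x - c_j \rVert^2

% True risk, defined with the (unobserved) density f of the X_i
R(c) = \int \ell(c, x)\, f(x)\, dx

% Deconvolution kernel estimator of f built from the noisy Z_i
\hat f_h(x) = \frac{1}{n h^d} \sum_{i=1}^{n} \widetilde{K}_h\!\left(\frac{x - Z_i}{h}\right),
\qquad
\mathcal{F}[\widetilde{K}_h](t) = \frac{\mathcal{F}[K](t)}{\mathcal{F}[\eta](t/h)}

% Noisy empirical risk: f is replaced by its deconvolution estimate
\hat R_h(c) = \int \ell(c, x)\, \hat f_h(x)\, dx,
\qquad
\hat c_h \in \arg\min_{c} \hat R_h(c)

% Excess risk of the resulting codebook
R(\hat c_h) - \inf_{c} R(c)
```

Written this way, the role of the bandwidth is visible: $\hat c_h$ depends on $h$ only through $\hat f_h$, so a data-driven selection rule has to compare the empirical risks $\hat R_h$ across candidate values of $h$.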
