Adaptive Noisy Clustering

Michael Chichignoud, Sébastien Loustau

arXiv:1306.2194 · math.ST · June 11, 2013 · IEEE Trans. Inf. Theory

TL;DR

This paper introduces an adaptive noisy clustering method that employs a deconvolution-based $k$-means approach with a data-driven bandwidth selection rule, achieving fast convergence rates despite noise.

Contribution

The paper proposes a novel adaptive noisy $k$-means clustering method together with a new bandwidth selection rule based on empirical risk comparison (ERC). The rule applies to a range of $M$-estimation problems.
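The following is a generic, Lepski-type sketch of how a rule "based on empirical risk comparison" might be organised. The concrete criterion is our own assumption and is not necessarily the paper's exact rule. A bandwidth is called admissible if, for every smaller bandwidth λ', its estimate scores nearly as well as λ''s own minimizer under the empirical risk computed with λ'. The largest admissible bandwidth is then returned. The names `erc_select`, `minimizer`, `risk`, and `threshold` are hypothetical, and the threshold is left as a user-supplied function because this page does not specify its calibration.

```python
from typing import Any, Callable, Sequence


def erc_select(
    bandwidths: Sequence[float],
    minimizer: Callable[[float], Any],
    risk: Callable[[float, Any], float],
    threshold: Callable[[float], float],
) -> float:
    """Hypothetical Lepski-type reading of empirical risk comparison (ERC).

    minimizer(lam)     -> estimate (e.g. cluster centers) computed with bandwidth lam
    risk(lam_p, c)     -> empirical risk at bandwidth lam_p, evaluated at estimate c
    threshold(lam_p)   -> tolerance for the comparison at bandwidth lam_p (assumed given)

    A bandwidth lam is admissible if, for every smaller lam_p,
        risk(lam_p, c_lam) - risk(lam_p, c_lam_p) <= threshold(lam_p).
    The largest admissible bandwidth is returned. This is NOT claimed to be
    the paper's exact criterion.
    """
    lams = sorted(bandwidths)
    # Compute one estimate per candidate bandwidth up front.
    estimates = {lam: minimizer(lam) for lam in lams}
    selected = lams[0]  # the smallest bandwidth is trivially admissible
    for lam in lams:
        admissible = all(
            risk(lp, estimates[lam]) - risk(lp, estimates[lp]) <= threshold(lp)
            for lp in lams
            if lp < lam
        )
        if admissible:
            selected = lam  # keep the largest admissible bandwidth seen so far
    return selected


if __name__ == "__main__":
    # Toy stand-ins (hypothetical): estimates are stable up to lam = 0.3,
    # then drift away, and the "risk" is a squared distance to the lam_p estimate.
    def toy_minimizer(lam: float) -> float:
        return 0.0 if lam <= 0.3 else 10.0 * (lam - 0.3)

    def toy_risk(lp: float, c: float) -> float:
        return (c - toy_minimizer(lp)) ** 2

    print(erc_select([0.1, 0.2, 0.3, 0.4, 0.5], toy_minimizer, toy_risk, lambda lp: 0.5))
    # prints 0.3
```

In the toy run, the estimates at 0.4 and 0.5 move too far from the estimates at smaller bandwidths, so 0.3 is the largest admissible choice.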

Findings

1. Achieves fast convergence rates for the excess risk.
2. Develops a data-driven bandwidth selection rule based on empirical risk comparison (ERC).
3. Shows that the bandwidth selection rule applies to a range of $M$-estimation problems.

Abstract

The problem of adaptive noisy clustering is investigated. Given a set of noisy observations $Z_i = X_i + \epsilon_i$, $i = 1, \dots, n$, the goal is to design clusters associated with the law of the $X_i$'s, which have an unknown density $f$ with respect to the Lebesgue measure. Since only a corrupted sample is observed, a direct approach such as the popular $k$-means is not suitable in this case. In this paper, we propose a noisy $k$-means minimization, which is based on the $k$-means loss function and a deconvolution estimator of the density $f$. In particular, this approach suffers from its dependence on a bandwidth involved in the deconvolution kernel. Fast rates of convergence for the excess risk are derived for a particular choice of the bandwidth, which depends on the smoothness of the density $f$. We then turn to the main issue of the paper: the data-driven choice of the bandwidth. We state an…
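The following is a minimal one-dimensional sketch of the noisy $k$-means idea. It rests on assumptions of our own choosing:

- the noise $\epsilon_i$ is Laplace with known scale $b$;
- $K$ is a standard Gaussian kernel;
- the unknown $f$ in the $k$-means risk $\int \min_j (x - c_j)^2 f(x)\,dx$ is replaced by the deconvolution estimator $\hat f_\lambda(x) = \frac{1}{n\lambda}\sum_{i=1}^n \tilde K_\lambda\big((x - Z_i)/\lambda\big)$.

For this kernel/noise pair, Fourier inversion of $\mathcal{F}[K](t)/\mathcal{F}[\eta](t/\lambda)$ gives the closed form $\tilde K_\lambda(u) = \phi(u)\,\big(1 + (b/\lambda)^2 (1 - u^2)\big)$, where $\phi$ is the standard normal density. The function names and the brute-force search over a grid of candidate centers are illustrative choices, not the paper's algorithm.

```python
import itertools

import numpy as np


def deconv_kernel(u, lam, b):
    """Deconvolution kernel for a standard Gaussian K and Laplace(0, b) noise.

    Inverse Fourier transform of exp(-t^2 / 2) * (1 + b^2 t^2 / lam^2), which
    equals phi(u) * (1 + (b / lam)^2 * (1 - u^2)).
    """
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return phi * (1.0 + (b / lam) ** 2 * (1.0 - u ** 2))


def deconv_density(x_grid, z, lam, b):
    """Deconvolution estimate of the density of X on x_grid (may be negative)."""
    u = (x_grid[:, None] - z[None, :]) / lam
    return deconv_kernel(u, lam, b).mean(axis=1) / lam


def noisy_kmeans_risk(centers, x_grid, f_hat):
    """Riemann-sum approximation of the integral of min_j (x - c_j)^2 * f_hat(x)."""
    dx = x_grid[1] - x_grid[0]
    loss = np.min((x_grid[:, None] - centers[None, :]) ** 2, axis=1)
    return float(np.sum(loss * f_hat) * dx)


def noisy_kmeans_1d(z, lam, b, k=2, n_cand=61):
    """Minimize the noisy k-means risk by exhaustive search over candidate centers.

    Brute force keeps the sketch deterministic; it is only practical for
    small k in one dimension.
    """
    # Integration grid padded by 5 bandwidths so kernel tails are not cut off.
    x_grid = np.linspace(z.min() - 5 * lam, z.max() + 5 * lam, 1001)
    f_hat = deconv_density(x_grid, z, lam, b)
    cands = np.linspace(z.min(), z.max(), n_cand)
    best, best_risk = None, np.inf
    for combo in itertools.combinations(cands, k):  # sorted k-tuples of centers
        centers = np.array(combo)
        r = noisy_kmeans_risk(centers, x_grid, f_hat)
        if r < best_risk:
            best, best_risk = centers, r
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    # Hidden X: two clusters around -3 and +3; only Z = X + eps is observed.
    x = np.where(rng.random(n) < 0.5, -3.0, 3.0) + 0.3 * rng.standard_normal(n)
    z = x + rng.laplace(0.0, 0.5, n)
    print(noisy_kmeans_1d(z, lam=0.5, b=0.5))
```

The estimate $\hat f_\lambda$ can take negative values, which is typical of deconvolution estimators, and the sketch integrates against it as is. The bandwidth `lam` is fixed by hand here, which is exactly the choice that a data-driven bandwidth rule is meant to automate.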
