Iterative Subsampling in Solution Path Clustering of Noisy Big Data

Yuliya Marchetti; Qing Zhou

arXiv:1412.1559 · stat.ME · September 16, 2016

TL;DR

This paper introduces an iterative subsampling technique that makes solution path clustering (SPC) computationally practical for large, noisy datasets while preserving its ability to recognize noise and detect small clusters.

Contribution

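The paper presents an iterative subsampling approach that alternates between clustering a small subsample and sequentially assigning the remaining data points. This attains orders of magnitude of computational savings over the original SPC method while preserving noise isolation and small cluster detection.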

Findings

01 Attains orders of magnitude of computational savings over the original SPC method.

02 Preserves the ability to isolate noise and to extract small, tight clusters.

03 Effectively handles large, noisy datasets in gene expression analysis.

Abstract

We develop an iterative subsampling approach to improve the computational efficiency of our previous work on solution path clustering (SPC). The SPC method achieves clustering by concave regularization on the pairwise distances between cluster centers. This clustering method has the important capability to recognize noise and to provide a short path of clustering solutions; however, it is not sufficiently fast for big datasets. Thus, we propose a method that iterates between clustering a small subsample of the full data and sequentially assigning the other data points to attain orders of magnitude of computational savings. The new method preserves the ability to isolate noise, includes a solution selection mechanism that ultimately provides one clustering solution with an estimated number of clusters, and is shown to be able to extract small tight clusters from noisy data. The method's…
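
The loop described in the abstract (cluster a small subsample, then sequentially assign the other points, and repeat) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: `cluster_subsample` is a hypothetical stand-in that uses simple greedy leader clustering instead of the concave-regularized SPC step, the names `radius`, `min_size`, `subsample_size`, and `max_iter` are illustrative parameters that do not come from the paper, and the solution selection mechanism is omitted entirely.

```python
import numpy as np


def cluster_subsample(points, radius, min_size):
    """Hypothetical stand-in for the SPC step (NOT the paper's method).

    Greedy leader clustering: a point joins the nearest center if it lies
    within `radius`, otherwise it starts a new center. Centers backed by
    fewer than `min_size` points are dropped as likely noise.
    """
    centers, counts = [], []
    for p in points:
        if centers:
            d = np.linalg.norm(np.asarray(centers) - p, axis=1)
            j = int(np.argmin(d))
            if d[j] <= radius:
                counts[j] += 1
                centers[j] += (p - centers[j]) / counts[j]  # running mean
                continue
        centers.append(np.array(p, dtype=float))
        counts.append(1)
    return [c for c, n in zip(centers, counts) if n >= min_size]


def iterative_subsample_clustering(X, subsample_size, radius,
                                   min_size=3, max_iter=10, seed=0):
    """Alternate between clustering a subsample and assigning the rest.

    Returns (labels, centers); label -1 marks points left unassigned,
    which this sketch treats as noise.
    """
    rng = np.random.default_rng(seed)
    labels = np.full(len(X), -1)
    centers = []
    for _ in range(max_iter):
        remaining = np.flatnonzero(labels == -1)
        if len(remaining) < min_size:
            break
        size = min(subsample_size, len(remaining))
        pick = rng.choice(remaining, size=size, replace=False)
        new_centers = cluster_subsample(X[pick], radius, min_size)
        if not new_centers:
            continue  # this subsample held only isolated points
        offset = len(centers)
        centers.extend(new_centers)
        C = np.asarray(new_centers)
        # Sequentially assign every still-unlabeled point to a nearby center.
        for i in remaining:
            d = np.linalg.norm(C - X[i], axis=1)
            j = int(np.argmin(d))
            if d[j] <= radius:
                labels[i] = offset + j
    return labels, np.asarray(centers)
```

In this sketch only the subsample is ever clustered directly, so the expensive step runs on `subsample_size` points per iteration rather than on the full data. Points that never fall within `radius` of a retained center keep the label -1, which loosely mirrors the idea of isolating noise rather than forcing every point into a cluster.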
