A Probabilistic $\ell_1$ Method for Clustering High Dimensional Data

Tsvetan Asamov; Adi Ben-Israel

arXiv:1504.01294·math.ST·April 26, 2016

TL;DR

This paper introduces a probabilistic iterative clustering method for high-dimensional data using the $\ell_1$-metric, addressing distance unreliability and computational complexity issues in high-dimensional spaces.

Contribution

It presents an $\ell_1$-based clustering algorithm whose complexity is linear in the dimension of the data space and whose performance was observed to improve as the dimension increases.

Findings

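01 The complexity of the algorithm is linear in the dimension of the data space.

02 Performance was observed to improve significantly as the dimension increases.

03 For $K$ clusters in $\mathbb{R}^{n}$, each iteration reduces to finding $Kn$ weighted medians of points on a line.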

Abstract

In general, the clustering problem is NP-hard, and global optimality cannot be established for non-trivial instances. For high-dimensional data, distance-based methods for clustering or classification face an additional difficulty, the unreliability of distances in very high-dimensional spaces. We propose a distance-based iterative method for clustering data in very high-dimensional space, using the $\ell_1$-metric that is less sensitive to high dimensionality than the Euclidean distance. For $K$ clusters in $\mathbb{R}^{n}$, the problem decomposes to $K$ problems coupled by probabilities, and an iteration reduces to finding $Kn$ weighted medians of points on a line. The complexity of the algorithm is linear in the dimension of the data space, and its performance was observed to improve significantly as the dimension increases.
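The abstract's central primitive, a weighted median of points on a line, can be illustrated with a short sketch. This is not the authors' implementation: the function names `weighted_median` and `update_centers` are hypothetical, and the weight matrix `W` is treated as a given input, since the abstract does not state how the weights are derived from the cluster probabilities. The sketch only shows why a center update under the $\ell_1$-metric splits into $Kn$ independent one-dimensional problems: the $\ell_1$ distance is a sum over coordinates, so each coordinate of each center can be minimized separately.

```python
import numpy as np


def weighted_median(points, weights):
    """Return a minimizer c of sum_i weights[i] * |points[i] - c|.

    Assumes nonnegative weights with a positive total.
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(points)
    cum = np.cumsum(weights[order])
    # First sorted point at which the cumulative weight reaches half the total.
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return points[order][idx]


def update_centers(X, W):
    """Coordinate-wise weighted medians (illustrative sketch).

    X: (N, n) array of data points.
    W: (N, K) array of nonnegative weights; how these are computed from
       the cluster probabilities is NOT specified here (assumed given).
    Returns a (K, n) array of centers, i.e. K * n weighted medians.
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    n = X.shape[1]
    K = W.shape[1]
    centers = np.empty((K, n))
    for k in range(K):
        for j in range(n):
            centers[k, j] = weighted_median(X[:, j], W[:, k])
    return centers
```

In this sketch the work per cluster grows linearly with the number of coordinates $n$, which is consistent with the linear dependence on dimension stated in the abstract; the sort-based median used here is chosen for brevity, not speed.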


Taxonomy

Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms · Face and Expression Recognition