Mending the Big-Data Missing Information

Hadassa Daltrophe; Shlomi Dolev; Zvi Lotker

arXiv:1405.2512·cs.OH·May 10, 2016

TL;DR

This paper introduces a clustering algorithm for high-dimensional, incomplete data sets modeled as affine subspaces, leveraging probabilistic analysis to ensure efficiency and correctness.

Contribution

The paper presents a novel clustering method that handles partial information in high-dimensional data using affine subspace projections with proven probabilistic guarantees.

Findings

01. Algorithm achieves poly-logarithmic time complexity.

02. Probabilistic analysis confirms the correctness of the clustering approach.

03. Method effectively clusters data with missing features in high dimensions.

Abstract

Consider a high-dimensional data set in which the information for every data point is incomplete. Each object in the data set represents a real entity, which is described by a point in high-dimensional space. We model the lack of information for a given object as an affine subspace in $\mathbb{R}^{d}$ whose dimension $k$ is the number of missing features. Our goal in this study is to find clusters of objects, where the main problem is coping with partial information and high dimension. Assuming the data set is separable, namely, that it emerges from clusters that can be modeled as a set of disjoint balls in $\mathbb{R}^{d}$, we suggest a simple data clustering algorithm. Our suggested algorithm uses the minimum distance between affine subspaces and calculates pair-wise projections of the data, achieving poly-logarithmic time complexity. We use probabilistic considerations to prove the algorithm's…
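To make the "minimum distance between affine subspaces" concrete, here is a minimal sketch and not the authors' implementation. It assumes each object is an axis-parallel affine subspace: known features are fixed and every missing feature is a free coordinate. The `None` encoding and the name `flat_min_distance` are hypothetical choices made for illustration. Under that assumption, any coordinate missing in either object can be chosen to match the other, so only the coordinates known in both objects contribute to the minimum distance.

```python
import math
from typing import Optional, Sequence

# Hypothetical encoding: a point in R^d where None marks a missing feature.
# A point with k missing features is read as a k-dimensional axis-parallel
# affine subspace (assumption; the paper may treat more general subspaces).
Point = Sequence[Optional[float]]


def flat_min_distance(a: Point, b: Point) -> float:
    """Minimum Euclidean distance between two axis-parallel affine subspaces."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimension d")
    squared = 0.0
    for x, y in zip(a, b):
        # A coordinate missing in either object is free, so it can be set
        # to match the other object and contributes nothing to the minimum.
        if x is None or y is None:
            continue
        squared += (x - y) ** 2
    return math.sqrt(squared)


if __name__ == "__main__":
    a = (1.0, None, 4.0)   # second feature missing
    b = (4.0, 2.0, None)   # third feature missing
    print(flat_min_distance(a, b))  # 3.0 (only the first coordinate is known in both)
```

This quantity is a lower bound on the distance between the two underlying complete points, so two objects from the same cluster always look close, while objects from different clusters may also look close when they share few known features.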
