Clustering of Data with Missing Entries

Sunrita Poddar; Mathews Jacob

arXiv:1801.01455·cs.LG·January 8, 2018

TL;DR

This paper introduces a clustering algorithm that handles datasets with a large fraction of missing entries by solving an $ℓ_{0}$ fusion penalty based optimization problem, supported by a theoretical analysis of cluster recovery and by demonstrations on simulated and real datasets.

Contribution

The paper presents a clustering method that continues to perform well when entries are missing, a relaxation of the underlying problem that uses saturating non-convex fusion penalties, and a theoretical analysis of the conditions needed for successful cluster recovery.

Findings

01. Performs well with large fractions of missing data

02. Successfully recovers clusters in simulated datasets

03. Demonstrates effectiveness on real datasets

Abstract

The analysis of large datasets is often complicated by the presence of missing entries, mainly because most current machine learning algorithms are designed to work with full data. The main focus of this work is to introduce a clustering algorithm that provides good clustering even in the presence of missing data. The proposed technique solves an $ℓ_{0}$ fusion penalty based optimization problem to recover the clusters. We theoretically analyze the conditions needed for the successful recovery of the clusters. We also propose an algorithm to solve a relaxation of this problem using saturating non-convex fusion penalties. The method is demonstrated on simulated and real datasets, and is observed to perform well in the presence of large fractions of missing entries.
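
To make the idea of a fusion penalty on partially observed data concrete, the following is a minimal sketch, not the authors' implementation. It evaluates a relaxed objective of the general form "data fidelity on observed entries plus a saturating penalty on pairwise distances between cluster representatives". The function names, the Gaussian-shaped penalty `1 - exp(-t^2 / (2 sigma^2))`, and the parameters `lam` and `sigma` are all assumptions chosen for illustration; the abstract does not specify the penalty or the solver.

```python
import numpy as np

def saturating_penalty(t, sigma=1.0):
    # Hypothetical saturating penalty (an assumption, not taken from the paper):
    # behaves like t^2 / (2 sigma^2) near zero and levels off at 1 for large t.
    return 1.0 - np.exp(-(t ** 2) / (2.0 * sigma ** 2))

def fusion_objective(U, X, mask, lam=1.0, sigma=1.0):
    """Evaluate a relaxed fusion-penalty objective (illustrative only).

    U    : (n, d) array of candidate representatives, one row per point
    X    : (n, d) array of data; unobserved entries may be NaN
    mask : (n, d) boolean array, True where X is observed
    """
    # Data fidelity is measured on observed entries only;
    # np.where discards the NaN residuals at unobserved positions.
    resid = np.where(mask, U - X, 0.0)
    fidelity = np.sum(resid ** 2)

    # Pairwise Euclidean distances between representatives, each pair counted once.
    diffs = U[:, None, :] - U[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    i, j = np.triu_indices(U.shape[0], k=1)
    fusion = np.sum(saturating_penalty(dists[i, j], sigma))

    return fidelity + lam * fusion

# Two points; the second feature of the first point is missing.
X = np.array([[0.0, np.nan],
              [0.0, 1.0]])
mask = ~np.isnan(X)

U_fused = np.array([[0.0, 1.0], [0.0, 1.0]])   # both points share one representative
U_split = np.array([[0.0, 0.0], [0.0, 1.0]])   # representatives differ
print(fusion_objective(U_fused, X, mask), fusion_objective(U_split, X, mask))
```

In this toy case both choices of `U` fit the observed entries exactly, so the objective differs only through the fusion term, which favors the fused solution. As `sigma` shrinks, this particular penalty approaches an indicator of whether two representatives differ, which is the sense in which a saturating penalty can stand in for an $ℓ_{0}$-type count.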
