Clustering Unclustered Data: Unsupervised Binary Labeling of Two Datasets Having Different Class Balances

Marthinus Christoffel du Plessis; Masashi Sugiyama

arXiv:1305.0103 · cs.LG · May 2, 2013 · 1 citation

TL;DR

This paper presents a novel unsupervised method for binary labeling of two datasets with different class balances by estimating the sign of their density difference, bypassing traditional clustering limitations.

Contribution

It introduces a new approach to label unlabeled data using density difference sign estimation, applicable even when data isn't well-clustered.
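The reason the sign of a density difference can act as a labeler follows from a short identity. The block below is a sketch under stated assumptions: the symbols \pi and \pi' (the positive-class proportions of the two datasets) and p, p' (their densities) are notation introduced here for illustration, and the two datasets are assumed to share the same class-conditional densities and differ only in class balance.

```latex
\begin{align*}
p(x)  &= \pi\, p(x \mid y=+1) + (1-\pi)\, p(x \mid y=-1), \\
p'(x) &= \pi'\, p(x \mid y=+1) + (1-\pi')\, p(x \mid y=-1), \\
p(x) - p'(x) &= (\pi - \pi')\,\bigl[\, p(x \mid y=+1) - p(x \mid y=-1) \,\bigr].
\end{align*}
```

When \pi \neq \pi', the sign of p(x) - p'(x) therefore matches the sign of p(x | y=+1) - p(x | y=-1), which is the optimal decision rule when the two classes are equally likely, up to a global flip of the labels determined by the sign of \pi - \pi'.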

Findings

01

The method compares favorably with several clustering methods on various toy problems and real-world datasets.

02

Direct density difference sign estimation is effective without explicit density modeling.

03

The method is applicable to real-world data whenever two unlabeled datasets with different class balances are available.

Abstract

We consider the unsupervised learning problem of assigning labels to unlabeled data. A naive approach is to use clustering methods, but this works well only when data is properly clustered and each cluster corresponds to an underlying class. In this paper, we first show that this unsupervised labeling problem in balanced binary cases can be solved if two unlabeled datasets having different class balances are available. More specifically, estimation of the sign of the difference between probability densities of two unlabeled datasets gives the solution. We then introduce a new method to directly estimate the sign of the density difference without density estimation. Finally, we demonstrate the usefulness of the proposed method against several clustering methods on various toy problems and real-world datasets.
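To make the labeling rule concrete, here is a minimal Python sketch on toy data. It is not the paper's method: it uses a naive two-step plug-in approach (a separate Gaussian kernel density estimate for each dataset, then the sign of their difference), which is exactly the kind of explicit density estimation the proposed direct estimator avoids. The function names, the bandwidth, the class means, and the class proportions 0.8 and 0.2 are all assumptions chosen for illustration.

```python
import math
import random


def kde(samples, x, bandwidth):
    """Gaussian kernel density estimate of the samples, evaluated at x."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm


def sample_mixture(n, prior_pos, rng):
    """Draw n points; class +1 ~ N(+1, 1) with probability prior_pos, else class -1 ~ N(-1, 1)."""
    xs, ys = [], []
    for _ in range(n):
        y = 1 if rng.random() < prior_pos else -1
        xs.append(rng.gauss(float(y), 1.0))
        ys.append(y)
    return xs, ys


def label_by_density_difference(x_a, x_b, queries, bandwidth=0.5):
    """Label each query by the sign of the plug-in estimate p_a(x) - p_b(x)."""
    return [
        1 if kde(x_a, q, bandwidth) - kde(x_b, q, bandwidth) >= 0 else -1
        for q in queries
    ]


rng = random.Random(0)
# Two unlabeled datasets with the same classes but different class balances.
# The true labels y_a, y_b are kept only to score the result afterwards.
x_a, y_a = sample_mixture(300, 0.8, rng)
x_b, y_b = sample_mixture(300, 0.2, rng)

queries = x_a + x_b
truth = y_a + y_b
pred = label_by_density_difference(x_a, x_b, queries)

agreement = sum(p == t for p, t in zip(pred, truth)) / len(truth)
# Labels are identified only up to a global flip, so score both assignments.
accuracy = max(agreement, 1.0 - agreement)
print(f"accuracy up to label flip: {accuracy:.2f}")
```

The two classes overlap heavily in this toy setup, so the pooled data do not form two clean clusters, yet the sign of the estimated density difference still separates them near the midpoint. Because the classes overlap, even an ideal rule cannot label every point correctly.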

Taxonomy

Topics: Machine Learning and Data Classification · Automated Road and Building Extraction · Rough Sets and Fuzzy Logic