Decorrelated Clustering with Data Selection Bias

Xiao Wang; Shaohua Fan; Kun Kuang; Chuan Shi; Jiawei Liu; Bai Wang

arXiv:2006.15874 · cs.LG · July 3, 2020 · 1 citation

TL;DR

This paper introduces DCKM, a clustering algorithm that mitigates selection bias effects by decorrelating features through sample weighting, leading to improved clustering performance on biased data.

Contribution

The paper proposes a novel decorrelation regularizer, integrated with k-means, to address selection bias in clustering, a problem that remains largely unexplored.

Findings

01 DCKM outperforms existing methods on real-world datasets.

02 Removing feature correlations improves clustering accuracy.

03 The approach effectively balances sample distribution.

Abstract

Most existing clustering algorithms are proposed without considering selection bias in the data. In many real applications, however, one cannot guarantee that the data is unbiased. Selection bias may introduce unexpected correlations between features, and ignoring those correlations hurts the performance of clustering algorithms. How to remove the unexpected correlations induced by selection bias is therefore extremely important, yet largely unexplored, for clustering. In this paper, we propose a novel Decorrelation regularized K-Means algorithm (DCKM) for clustering with data selection bias. Specifically, the decorrelation regularizer aims to learn global sample weights that balance the sample distribution, so as to remove unexpected correlations among features. Meanwhile, the learned weights are combined with k-means, which makes the reweighted…
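The two ingredients named in the abstract, a decorrelation term over sample weights and a k-means step that uses those weights, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the exact DCKM regularizer and its joint optimization are not given on this page, so the penalty below (squared off-diagonal entries of the weighted covariance) is an assumed stand-in, and the names `decorrelation_penalty` and `weighted_kmeans` are hypothetical.

```python
import numpy as np


def decorrelation_penalty(X, w):
    """Assumed stand-in for a decorrelation term (not the paper's exact form).

    Returns the sum of squared off-diagonal entries of the weighted
    covariance matrix of X; it is zero when the reweighted features are
    pairwise uncorrelated.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                        # normalize weights to sum to 1
    mean = w @ X                           # weighted feature means, shape (d,)
    Xc = X - mean                          # center with the weighted mean
    cov = (Xc * w[:, None]).T @ Xc         # weighted covariance, shape (d, d)
    off_diag = cov - np.diag(np.diag(cov))
    return float((off_diag ** 2).sum())


def weighted_kmeans(X, w, k, n_iter=50, seed=0):
    """Lloyd-style k-means where each center is a weighted mean.

    Assumes strictly positive sample weights w.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():                 # keep the old center if a cluster is empty
                centers[j] = np.average(X[mask], axis=0, weights=w[mask])
    # Final assignment against the last set of centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centers


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
    w = np.ones(len(X))
    labels, centers = weighted_kmeans(X, w, k=2)
    print(labels, centers, decorrelation_penalty(X, w))
```

In this sketch the weights are supplied by the caller. In the method described above they are learned, so a fuller version would minimize a term like `decorrelation_penalty` over `w` while alternating with the clustering step.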

Code & Models

Repositories

googlebaba/IJCAI2020-DCKM (Official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Face and Expression Recognition · Text and Document Classification Technologies