Decorrelated Clustering with Data Selection Bias
Xiao Wang, Shaohua Fan, Kun Kuang, Chuan Shi, Jiawei Liu, Bai Wang

TL;DR
This paper introduces DCKM, a clustering algorithm that mitigates selection bias effects by decorrelating features through sample weighting, leading to improved clustering performance on biased data.
Contribution
The paper proposes a novel decorrelation regularizer integrated with k-means to address selection bias in clustering, a largely unexplored area.
Findings
DCKM outperforms existing methods on real-world datasets.
Removing feature correlations improves clustering accuracy.
The learned sample weights balance the sample distribution.
Abstract
Most existing clustering algorithms are proposed without considering selection bias in the data. In many real applications, however, one cannot guarantee that the data is unbiased. Selection bias might introduce unexpected correlations between features, and ignoring those correlations will hurt the performance of clustering algorithms. Therefore, how to remove the unexpected correlations induced by selection bias is extremely important yet largely unexplored for clustering. In this paper, we propose a novel Decorrelation regularized K-Means algorithm (DCKM) for clustering with data selection bias. Specifically, the decorrelation regularizer aims to learn global sample weights that are capable of balancing the sample distribution, so as to remove unexpected correlations among features. Meanwhile, the learned weights are combined with k-means, which makes the reweighted…
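The two ingredients named in the abstract, a decorrelation term over sample weights and a k-means step that uses those weights, can be illustrated with a minimal pure-Python sketch. This is not the paper's exact objective or optimizer: the penalty below is simply the sum of squared weighted covariances between feature pairs, the weights are supplied by hand rather than learned, and the names `decorrelation_penalty` and `weighted_kmeans` are hypothetical.

```python
def decorrelation_penalty(X, w):
    """Sum of squared weighted covariances over all distinct feature pairs.

    Illustrative stand-in for a decorrelation regularizer (an assumption,
    not the paper's formula). A value of 0 means the reweighted features
    are pairwise uncorrelated.
    """
    total = sum(w)
    p = [wi / total for wi in w]  # normalise weights to sum to 1
    d = len(X[0])
    mean = [sum(pi * x[j] for pi, x in zip(p, X)) for j in range(d)]
    penalty = 0.0
    for a in range(d):
        for b in range(a + 1, d):
            cov = sum(pi * (x[a] - mean[a]) * (x[b] - mean[b])
                      for pi, x in zip(p, X))
            penalty += cov ** 2
    return penalty


def weighted_kmeans(X, w, centers, n_iter=20):
    """Lloyd-style iterations where each centroid is a weighted mean.

    `centers` is the caller-supplied initialisation (hypothetical choice;
    any standard seeding scheme could be used instead).
    """
    centers = [list(c) for c in centers]
    labels = [0] * len(X)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centers)),
                key=lambda c: sum((xj - cj) ** 2
                                  for xj, cj in zip(x, centers[c])))
            for x in X
        ]
        # Update step: weighted mean of the points assigned to each centroid.
        for c in range(len(centers)):
            members = [(wi, x) for wi, x, lab in zip(w, X, labels) if lab == c]
            mass = sum(wi for wi, _ in members)
            if mass > 0:  # keep the old centroid if the cluster is empty
                centers[c] = [sum(wi * x[j] for wi, x in members) / mass
                              for j in range(len(X[0]))]
    return labels, centers


if __name__ == "__main__":
    # Toy data: a biased sample (first two rows only) makes the two
    # features look correlated; the full, balanced sample does not.
    X_toy = [[0, 0], [1, 1], [0, 1], [1, 0]]
    print(decorrelation_penalty(X_toy, [1, 1, 0, 0]))  # biased weighting
    print(decorrelation_penalty(X_toy, [1, 1, 1, 1]))  # balanced weighting
```

In a full method the weights would be optimized jointly with the clustering so that the penalty shrinks; here they are fixed inputs so that each piece can be inspected separately.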
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Face and Expression Recognition · Text and Document Classification Technologies
