Joint Debiased Representation Learning and Imbalanced Data Clustering
Mina Rezaei, Emilio Dorigatti, David Ruegamer, Bernd Bischl

TL;DR
This paper introduces a novel unsupervised framework that jointly learns debiased representations and performs image clustering, effectively handling class imbalance and out-of-distribution samples to improve clustering accuracy and generalization.
Contribution
The proposed method combines deep representation learning with clustering using a statistics pooling block to address class imbalance and out-of-distribution issues in unsupervised image clustering.
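The summary above does not say which statistics the pooling block computes or where it sits in the networks. As a rough illustration only, the sketch below assumes it reduces a set of feature vectors to per-dimension mean and variance plus the set size; the function name `statistics_pool` and the output layout are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def statistics_pool(features: Sequence[Sequence[float]]) -> List[float]:
    """Pool feature vectors into [means..., variances..., count].

    Hypothetical helper: the choice of statistics and their ordering are
    assumptions for illustration, not the paper's definition of its block.
    """
    n = len(features)
    if n == 0:
        raise ValueError("need at least one feature vector")
    dim = len(features[0])
    # Per-dimension mean over the set of vectors.
    means = [sum(v[j] for v in features) / n for j in range(dim)]
    # Per-dimension population variance (divides by n, not n - 1).
    variances = [
        sum((v[j] - means[j]) ** 2 for v in features) / n for j in range(dim)
    ]
    # Append the set size so that small groups remain distinguishable.
    return means + variances + [float(n)]


pooled = statistics_pool([[1.0, 2.0], [3.0, 6.0]])
print(pooled)
```

Under these assumptions, a summary that carries spread and group size alongside the mean gives a downstream clustering loss more to work with than a bare centroid, which is one plausible reading of why such a block might help with imbalanced classes.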
Findings
Significantly improves clustering results on imbalanced datasets
Enhances generalization to out-of-distribution datasets
Utilizes a novel statistics pooling block for debiased learning
Abstract
One of the most promising approaches for unsupervised learning is combining deep representation learning and deep clustering. Some recent works propose to simultaneously learn representations using deep neural networks and perform clustering by defining a clustering loss on top of embedded features. However, these approaches are sensitive to imbalanced data and out-of-distribution samples. As a consequence, these methods optimize clustering by pushing data close to randomly initialized cluster centers. This is problematic when the number of instances varies widely across classes, or when a cluster with few samples has less chance of being assigned a good centroid. To overcome these limitations, we introduce a new unsupervised framework for joint debiased representation learning and image clustering. We simultaneously train two deep learning models, a deep representation network that…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare · Imbalanced Data Classification Techniques
