PET-TURTLE: Deep Unsupervised Support Vector Machines for Imbalanced Data Clusters
Javier Salazar Cavazos

TL;DR
PET-TURTLE is a novel deep clustering algorithm that effectively handles imbalanced data by incorporating a power law prior and sparse logits, improving accuracy and cluster balance in unsupervised learning.
Contribution
It extends TURTLE with a new cost function designed for imbalanced data and with sparse logits in the labeling step, improving deep clustering performance on imbalanced datasets.
Findings
Improves clustering accuracy on imbalanced datasets.
Reduces over-prediction of minority clusters.
Enhances overall clustering quality.
Abstract
Foundation vision, audio, and language models enable zero-shot performance on downstream tasks via their latent representations. Recently, unsupervised learning of data group structure with deep learning methods has gained popularity. TURTLE, a state-of-the-art deep clustering algorithm, uncovers data labeling without supervision by alternating label and hyperplane updates, maximizing the hyperplane margin in a fashion similar to support vector machines (SVMs). However, TURTLE assumes clusters are balanced; when data is imbalanced, it yields non-ideal hyperplanes that cause higher clustering error. We propose PET-TURTLE, which generalizes the cost function to handle imbalanced data distributions via a power law prior. Additionally, by introducing sparse logits in the labeling process, PET-TURTLE optimizes a simpler search space that in turn improves accuracy for balanced datasets.…
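The two ingredients named in the abstract, a power law prior over cluster sizes and sparse logits in the labeling step, can be illustrated with a small NumPy sketch. The abstract does not give PET-TURTLE's actual cost function, so everything below is an assumption made for illustration: the prior form p_k ∝ k^(-alpha), the KL-divergence penalty between the sorted average cluster assignment and that prior, the top-k masking used for sparsity, and all function and parameter names (`power_law_prior`, `sparsify_logits`, `prior_mismatch`, `alpha`, `keep`) are hypothetical.

```python
import numpy as np


def power_law_prior(num_clusters, alpha):
    """Assumed prior over cluster sizes: p_k proportional to k^(-alpha), k = 1..K."""
    ranks = np.arange(1, num_clusters + 1, dtype=float)
    weights = ranks ** (-alpha)
    return weights / weights.sum()


def softmax(logits):
    """Row-wise softmax; entries equal to -inf receive exactly zero probability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sparsify_logits(logits, keep):
    """Assumed sparsification: keep the `keep` largest logits per row, mask the rest.

    Ties at the threshold are all kept, so a row may retain more than `keep` entries.
    """
    kth_largest = np.sort(logits, axis=1)[:, -keep][:, None]
    return np.where(logits >= kth_largest, logits, -np.inf)


def prior_mismatch(probs, prior, eps=1e-12):
    """Assumed penalty: KL(sorted mean cluster assignment || prior).

    Sorting in descending order matches the largest predicted cluster
    with the largest prior mass, since cluster indices are arbitrary.
    """
    mean_assign = np.sort(probs.mean(axis=0))[::-1]
    return float(np.sum(mean_assign * (np.log(mean_assign + eps) - np.log(prior + eps))))


# Toy usage: random logits for 6 samples and 4 clusters.
rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 4))
probs = softmax(sparsify_logits(logits, keep=2))
penalty = prior_mismatch(probs, power_law_prior(4, alpha=1.0))
```

Setting `alpha=0` in this sketch recovers a uniform prior, which corresponds to the balanced-cluster assumption the abstract attributes to TURTLE; larger `alpha` values favor a few large clusters and many small ones.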
Taxonomy
Topics: Imbalanced Data Classification Techniques · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
