TL;DR
UCSL is a versatile EM-based framework that combines supervised learning with clustering to discover meaningful subgroups, validated on synthetic and real datasets with improved accuracy over existing methods.
Contribution
The paper introduces UCSL, a novel EM ensemble method that integrates supervised models into clustering. It handles both classification and regression tasks and includes a new dimension-reduction technique.
Findings
Improves balanced accuracy by 1.9 points over the state of the art.
Effectively identifies consistent subtypes in psychiatric disease data.
Demonstrates robustness and generalization on synthetic and real datasets.
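Balanced accuracy, the metric behind the 1.9-point figure, is commonly defined as the unweighted mean of per-class recall, so a classifier cannot score well simply by favouring the majority class. The pure-Python sketch below implements that standard definition. The summary does not say which implementation the authors used, and the function name `balanced_accuracy` is illustrative.

```python
def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        # indices of samples whose true label is c
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

# Imbalanced toy labels: four samples of class 0, two of class 1.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred))  # 0.75, while plain accuracy is 5/6
```

A predictor that always outputs class 0 would reach a plain accuracy of 4/6 on these labels, but its balanced accuracy is only 0.5.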
Abstract
Subtype Discovery consists of finding interpretable and consistent sub-parts of a dataset that are also relevant to a certain supervised task. From a mathematical point of view, this can be defined as a clustering task driven by supervised learning in order to uncover subgroups in line with the supervised prediction. In this paper, we propose a general Expectation-Maximization ensemble framework entitled UCSL (Unsupervised Clustering driven by Supervised Learning). Our method is generic: it can integrate any clustering method and can be driven by both binary classification and regression. We propose to construct a non-linear model by merging multiple linear estimators, one per cluster. Each hyperplane is estimated so that it correctly discriminates (or predicts) only one cluster. We use SVC or Logistic Regression for classification and SVR for regression. Furthermore, to perform…
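The abstract describes the core loop only at a high level. Below is a minimal, hypothetical NumPy sketch of the idea for the binary-classification case. It is not the authors' UCSL implementation and rests on several assumptions:

* Only the positive class is split into subtypes.
* k-means with farthest-point initialisation serves as the clustering step.
* A plain gradient-descent logistic regression stands in for the SVC / Logistic Regression estimators.
* A hard-assignment E-step gives each positive sample to the cluster whose hyperplane scores it highest.

All function names (`fit_logreg`, `kmeans`, `scores`, `fit_subtypes`, `predict`) are illustrative.

```python
import numpy as np

def fit_logreg(X, y, lr=0.1, n_iter=500, l2=1e-3):
    """Gradient-descent logistic regression; returns [w..., b]. Stand-in for SVC/LogisticRegression."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * (Xb.T @ (p - y) / len(y) + l2 * w)
    return w

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's k-means with farthest-point initialisation; returns hard labels."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def scores(X, W):
    """Linear score of every sample under every per-cluster hyperplane: shape (n, k)."""
    return np.hstack([X, np.ones((len(X), 1))]) @ W.T

def fit_subtypes(X, y, k=2, n_em=10):
    """Hard-EM sketch: alternate per-cluster classifiers (M) and cluster reassignment (E)."""
    neg, pos = X[y == 0], X[y == 1]
    assign = kmeans(pos, k)  # initial subtypes from clustering the positives
    for _ in range(n_em):
        # M-step: one hyperplane per cluster, separating that cluster's positives from all negatives
        W = np.array([
            fit_logreg(np.vstack([neg, pos[assign == j]]),
                       np.r_[np.zeros(len(neg)), np.ones(np.sum(assign == j))])
            for j in range(k)
        ])
        # E-step: give each positive to the cluster whose hyperplane scores it highest
        new_assign = scores(pos, W).argmax(axis=1)
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
    return W, assign

def predict(X, W):
    """Non-linear rule built from k linear ones: positive if any hyperplane scores above 0."""
    return (scores(X, W).max(axis=1) > 0).astype(int)

# Toy data: negatives near the origin, two positive subtypes on opposite sides,
# so no single hyperplane separates the two classes.
rng = np.random.default_rng(0)
neg = rng.normal([0.0, 0.0], 0.5, size=(100, 2))
sub_a = rng.normal([4.0, 0.0], 0.5, size=(50, 2))
sub_b = rng.normal([-4.0, 0.0], 0.5, size=(50, 2))
X = np.vstack([neg, sub_a, sub_b])
y = np.r_[np.zeros(100), np.ones(100)].astype(int)
true_subtype = np.r_[np.zeros(50), np.ones(50)].astype(int)

W, assign = fit_subtypes(X, y, k=2)
train_acc = np.mean(predict(X, W) == y)
print(train_acc, np.bincount(assign))
```

Taking the maximum over the per-cluster linear scores turns the k hyperplanes into a single non-linear decision rule. In the toy data, the two positive subtypes lie on opposite sides of the negatives, so one hyperplane cannot separate the classes but two can. The E-step and clustering used by UCSL itself may differ from this simplification.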
Taxonomy
Methods: Logistic Regression
