TL;DR
TabClustPFN is a prior-fitted network (PFN) for unsupervised clustering of tabular data. Pretrained on synthetic datasets, it amortizes Bayesian inference so that an unseen dataset with heterogeneous features is clustered in a single forward pass, without retraining.
Contribution
It extends prior-fitted networks to clustering, inferring both the cluster assignments and the number of clusters for unseen tabular datasets in a single forward pass, without dataset-specific tuning.
Findings
Outperforms classical and deep clustering methods on benchmarks.
Handles heterogeneous numerical and categorical features effectively.
Demonstrates robustness in exploratory data analysis settings.
Abstract
Clustering tabular data is a fundamental yet challenging problem due to heterogeneous feature types, diverse data-generating mechanisms, and the absence of transferable inductive biases across datasets. Prior-fitted networks (PFNs) have recently demonstrated strong generalization in supervised tabular learning by amortizing Bayesian inference under a broad synthetic prior. Extending this paradigm to clustering is nontrivial: clustering is unsupervised, admits a combinatorial and permutation-invariant output space, and requires inferring the number of clusters. We introduce TabClustPFN, a prior-fitted network for tabular data clustering that performs amortized Bayesian inference over both cluster assignments and cluster cardinality. Pretrained on synthetic datasets drawn from a flexible clustering prior, TabClustPFN clusters unseen datasets in a single forward pass, without…
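The abstract's remark that clustering has a permutation-invariant output space can be made concrete with a small sketch. The code below is not taken from TabClustPFN; the function names (`co_membership`, `same_partition`, `canonicalize`) are hypothetical and chosen only for illustration. It shows that two label vectors describe the same clustering whenever they group the same points together, regardless of which integer is assigned to which cluster.

```python
from itertools import combinations


def co_membership(labels):
    """Return the set of index pairs (i, j), i < j, that share a cluster label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def same_partition(labels_a, labels_b):
    """True if both label vectors induce the same grouping of points."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    return co_membership(labels_a) == co_membership(labels_b)


def canonicalize(labels):
    """Relabel clusters 0, 1, 2, ... in order of first appearance."""
    mapping = {}
    result = []
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping)
        result.append(mapping[label])
    return result


# Two labelings that differ only by a renaming of cluster ids.
a = [0, 0, 1, 2]
b = [2, 2, 0, 1]
print(same_partition(a, b))          # the groupings are compared, not the ids
print(canonicalize(b))               # ids reassigned by first appearance
print(len(set(canonicalize(b))))     # number of distinct clusters
```

Any model or metric for clustering has to treat `a` and `b` as equivalent, which is one reason a supervised classification head with fixed class indices cannot be reused directly.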
