A Data-Informed Variational Clustering Framework for Noisy High-Dimensional Data

Wan Ping Chen

arXiv:2604.06864·stat.ML·April 9, 2026

TL;DR

DIVI is a variational clustering framework for noisy high-dimensional data. It combines feature relevance learning with adaptive structure growth to improve stability and interpretability.

Contribution

The paper introduces a data-informed variational approach with global feature gating and split-based adaptive structure expansion, targeting two problems: severe feature noise and a number of clusters that is not known in advance.
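
This summary does not give DIVI's equations, so the following is only a minimal sketch of what a differentiable global feature gate could look like, not the paper's formulation. It assumes a unit-variance diagonal Gaussian mixture in which each feature's log-density is blended, through a sigmoid gate, with a cluster-independent background density. The function name `gated_responsibilities`, the `gate_logits` parameter, and the background model are all hypothetical choices made for illustration.

```python
import numpy as np

def gated_responsibilities(X, means, gate_logits, log_weights=None):
    """Soft cluster assignments under a gated unit-variance Gaussian mixture.

    Illustrative assumption, not DIVI itself: feature d contributes
    gate_d * log N(x_d; mu_kd, 1) + (1 - gate_d) * log N(x_d; bg_mean_d, bg_var_d).
    """
    n, d = X.shape
    k = means.shape[0]
    # One global gate per feature, in (0, 1); differentiable in gate_logits.
    gates = 1.0 / (1.0 + np.exp(-gate_logits))
    if log_weights is None:
        log_weights = np.full(k, -np.log(k))  # uniform mixing weights

    # Cluster-specific per-feature log-density, shape (n, k, d).
    diff = X[:, None, :] - means[None, :, :]
    log_cluster = -0.5 * diff**2 - 0.5 * np.log(2.0 * np.pi)

    # Cluster-independent background log-density, shape (n, 1, d).
    bg_mean = X.mean(axis=0)
    bg_var = X.var(axis=0) + 1e-8
    log_bg = -0.5 * (X - bg_mean) ** 2 / bg_var - 0.5 * np.log(2.0 * np.pi * bg_var)
    log_bg = log_bg[:, None, :]

    # Blend per feature, sum over features, add mixing weights.
    log_joint = (gates * log_cluster + (1.0 - gates) * log_bg).sum(axis=2)
    log_joint = log_joint + log_weights

    # Normalize over clusters with the usual max-subtraction for stability.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    resp = np.exp(log_joint)
    return resp / resp.sum(axis=1, keepdims=True)
```

Under this assumed form, a feature whose gate is near zero adds the same background term to every cluster, so it cancels in the normalization and no longer influences the assignments. That is the sense in which a gate can suppress noisy dimensions while remaining trainable by gradient methods.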

Findings

01 Performs well under severe feature noise

02 Maintains computational feasibility

03 Provides interpretable feature relevance behavior

Abstract

Clustering in high-dimensional settings with severe feature noise remains challenging, especially when only a small subset of dimensions is informative and the final number of clusters is not specified in advance. In such regimes, partition recovery, feature relevance learning, and structural adaptation are tightly coupled, and standard likelihood-based methods can become unstable or overly sensitive to noisy dimensions. We propose DIVI, a data-informed variational clustering framework that combines global feature gating with split-based adaptive structure growth. DIVI uses informative prior initialization to stabilize optimization, learns feature relevance in a differentiable manner, and expands model complexity only when local diagnostics indicate underfit. Beyond clustering performance, we also examine runtime scalability and parameter sensitivity in order to clarify the…
