TL;DR
This paper introduces scalable unsupervised feature selection methods that leverage weight stability across Minkowski exponents, improving clustering in high-dimensional data.
Contribution
It proposes new algorithms, FS-MWK++ and SFS-MWK++, with theoretical guarantees for identifying relevant features across Minkowski exponents.
Findings
The algorithms effectively distinguish relevant from noise features.
Theoretical analysis confirms consistent feature weighting under certain conditions.
Software implementation is publicly available at the provided GitHub link.
Abstract
Unsupervised feature selection is critical for improving clustering performance in high-dimensional data, where irrelevant features can obscure meaningful structure. In this work, we introduce the Minkowski weighted k-means++, a novel initialisation strategy for the Minkowski Weighted k-means. Our initialisation selects centroids probabilistically using feature relevance estimates derived from the data itself. Building on this, we propose two new feature selection algorithms, FS-MWK++, which aggregates feature weights across a range of Minkowski exponents to identify stable and informative features, and SFS-MWK++, a scalable variant based on subsampling. We support our approach with a theoretical analysis, demonstrating that, under explicit assumptions on noise features and cluster structure, relevant features are assigned consistently higher weights than noise features across a…
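The idea of aggregating feature weights across Minkowski exponents can be illustrated with a minimal sketch. It is not the authors' FS-MWK++ implementation and rests on several assumptions: the weight update is the dispersion-based rule commonly given for Minkowski weighted k-means, cluster labels are fixed in advance rather than obtained from the proposed initialisation, cluster means stand in for Minkowski centres, and weights are aggregated by a plain average. The function names `mwk_feature_weights` and `aggregate_weights` are hypothetical.

```python
import numpy as np


def mwk_feature_weights(X, labels, centroids, p, eps=1e-12):
    """Per-cluster feature weights for one Minkowski exponent p > 1.

    Assumed rule: w_v = 1 / sum_u (D_v / D_u)^(1 / (p - 1)), where D_v is
    the within-cluster dispersion of feature v. Each row sums to 1.
    """
    k, n_features = centroids.shape
    weights = np.empty((k, n_features))
    for c in range(k):
        members = X[labels == c]
        # Dispersion of every feature inside cluster c (eps avoids 0/0).
        D = (np.abs(members - centroids[c]) ** p).sum(axis=0) + eps
        # ratios[v, u] = (D_v / D_u)^(1 / (p - 1))
        ratios = (D[:, None] / D[None, :]) ** (1.0 / (p - 1.0))
        weights[c] = 1.0 / ratios.sum(axis=1)
    return weights


def aggregate_weights(X, labels, exponents):
    """Average feature weights over clusters and over exponents.

    Simplification: cluster means are used as centroids for every p.
    """
    cluster_ids = np.unique(labels)
    # Relabel to 0..k-1 so rows of `centroids` line up with labels.
    relabeled = np.searchsorted(cluster_ids, labels)
    centroids = np.vstack([X[relabeled == c].mean(axis=0)
                           for c in range(len(cluster_ids))])
    per_exponent = [
        mwk_feature_weights(X, relabeled, centroids, p).mean(axis=0)
        for p in exponents
    ]
    return np.mean(per_exponent, axis=0)


# Toy data: feature 0 separates the two clusters, feature 1 is noise-like.
X = np.array([[0.0, 0.0], [0.1, 5.0], [0.2, -5.0],
              [10.0, 0.0], [10.1, 5.0], [10.2, -5.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
scores = aggregate_weights(X, labels, exponents=[1.5, 2.0, 3.0])
```

In this toy setting the feature with small within-cluster dispersion receives the larger aggregated score, so thresholding or ranking `scores` would keep it and drop the noise-like feature. How the paper actually combines weights and chooses a cut-off is not stated in the abstract.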
