i-IF-Learn: Iterative Feature Selection and Unsupervised Learning for High-Dimensional Complex Data
Chen Ma, Wanjie Wang, Shuhao Fan

TL;DR
i-IF-Learn is an iterative unsupervised framework that jointly performs feature selection and clustering in high-dimensional data, effectively identifying influential features and improving clustering accuracy.
Contribution
It introduces an adaptive feature selection statistic that combines pseudo-labels with unsupervised signals, reducing error propagation in iterative clustering.
Findings
Outperforms classical and deep clustering methods on gene and single-cell datasets.
Selected influential features improve downstream deep learning models.
Demonstrates the importance of targeted feature selection in high-dimensional clustering.
Abstract
Unsupervised learning of high-dimensional data is challenging because irrelevant or noisy features obscure the underlying structure. Often only a few features, called the influential features, meaningfully define the clusters. Recovering these influential features helps with both data interpretation and clustering. We propose i-IF-Learn, an iterative unsupervised framework that jointly performs feature selection and clustering. Our core innovation is an adaptive feature selection statistic that combines pseudo-label supervision with unsupervised signals, dynamically adjusting based on intermediate label reliability to mitigate the error propagation common in iterative frameworks. Leveraging low-dimensional embeddings (PCA or Laplacian eigenmaps) followed by k-means, i-IF-Learn simultaneously outputs an influential feature subset and clustering labels. Numerical experiments…
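The loop described in the abstract (select features, embed, cluster, re-score features with the pseudo-labels, repeat) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the adaptive statistic, so the code substitutes stand-ins that are assumptions here. The unsupervised signal is the absolute excess kurtosis of each feature, the pseudo-label signal is a between/within variance ratio, and the reliability weight `w` is the share of embedding variance explained by the clusters. All function names (`rank01`, `reliability`, `iterative_select_cluster`) and parameters (`n_keep`, `n_rounds`) are hypothetical.

```python
import numpy as np

def pca_embed(X, n_components):
    # PCA scores via SVD of the column-centered matrix.
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def kmeans(Z, k, n_iter=50, seed=0):
    # Plain Lloyd iterations with random initial centers.
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def rank01(s):
    # Map scores to ranks in [0, 1] so two statistics are comparable.
    return np.argsort(np.argsort(s)) / (len(s) - 1)

def f_score(X, labels, k):
    # Per-feature between-cluster / within-cluster sum-of-squares ratio.
    grand = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for j in range(k):
        G = X[labels == j]
        if len(G) == 0:
            continue
        between += len(G) * (G.mean(axis=0) - grand) ** 2
        within += ((G - G.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)

def reliability(Z, labels, k):
    # ASSUMPTION: label reliability proxied by the fraction of embedding
    # variance explained by the clustering (a value in [0, 1]).
    total = ((Z - Z.mean(axis=0)) ** 2).sum()
    within = sum(((Z[labels == j] - Z[labels == j].mean(axis=0)) ** 2).sum()
                 for j in range(k) if np.any(labels == j))
    return 1.0 - within / (total + 1e-12)

def iterative_select_cluster(X, k, n_keep, n_rounds=5, seed=0):
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # ASSUMPTION: unsupervised signal = |excess kurtosis| of each feature.
    unsup = rank01(np.abs((X ** 4).mean(axis=0) - 3.0))
    selected = np.argsort(unsup)[-n_keep:]
    for _ in range(n_rounds):
        Z = pca_embed(X[:, selected], k - 1)
        labels = kmeans(Z, k, seed=seed)
        w = reliability(Z, labels, k)
        # Blend pseudo-label and unsupervised scores; trust the labels
        # more when the clustering looks cleaner.
        score = w * rank01(f_score(X, labels, k)) + (1 - w) * unsup
        selected = np.argsort(score)[-n_keep:]
    # Labels come from the last round; features are re-scored with them.
    return np.sort(selected), labels

# Synthetic check: 2 clusters, 500 features, the first 20 carry the signal.
rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 500))
X[truth == 1, :20] += 4.0
selected, labels = iterative_select_cluster(X, k=2, n_keep=20)
```

Rank-normalizing both scores before blending keeps either statistic from dominating purely because of its scale. A Laplacian-eigenmap embedding could replace `pca_embed` without changing the rest of the loop.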
Taxonomy
Topics: Single-cell and spatial transcriptomics · Gene expression and cancer classification · Domain Adaptation and Few-Shot Learning
