Improving cluster recovery with feature rescaling factors
Renato Cordeiro de Amorim, Vladimir Makarenkov

TL;DR
This paper proposes a novel feature rescaling method for clustering that weights each feature by its within-cluster degree of relevance, improving cluster recovery over traditional normalization.
Contribution
The paper introduces a feature rescaling approach that considers feature relevance within clusters, enhancing clustering accuracy over traditional normalization methods.
Findings
Clustering methods that use the proposed rescaling outperform those that use traditional normalization in the authors' simulation study.
The improvement holds on both real and synthetic data.
The improvement also holds when noise features are added to the data.
Abstract
The data preprocessing stage is crucial in clustering. Features may describe entities using different scales. To rectify this, one usually applies feature normalisation aiming at rescaling features so that none of them overpowers the others in the objective function of the selected clustering algorithm. In this paper, we argue that the rescaling procedure should not treat all features identically. Instead, it should favour the features that are more meaningful for clustering. With this in mind, we introduce a feature rescaling method that takes into account the within-cluster degree of relevance of each feature. Our comprehensive simulation study, carried out on real and synthetic data, with and without noise features, demonstrates that clustering methods that use the proposed data normalisation strategy clearly outperform those that use traditional data normalisation.
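Illustrative sketch
The abstract does not give the rescaling formula, so the Python sketch below only illustrates the general idea of favouring features that are more meaningful for clustering. It is not the authors' method. The function names, the use of inverse within-cluster dispersion as a stand-in for "relevance", and the assumption that a tentative partition (`labels`) is already available are all hypothetical choices made for this example.

```python
import numpy as np


def range_normalise(X):
    """Traditional normalisation: centre each feature, divide by its range."""
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # leave constant features unscaled
    return (X - X.mean(axis=0)) / span


def relevance_factors(X, labels):
    """Hypothetical rescaling factors (an assumption, not the paper's formula).

    A feature with small within-cluster dispersion under the given
    partition receives a larger factor. Factors are normalised to sum to 1.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dispersion = np.zeros(X.shape[1])
    for k in np.unique(labels):
        members = X[labels == k]
        dispersion += ((members - members.mean(axis=0)) ** 2).sum(axis=0)
    inverse = 1.0 / (dispersion + 1e-12)  # small constant avoids division by zero
    return inverse / inverse.sum()


def rescale(X, labels):
    """Range-normalise, then multiply each feature by its factor."""
    Z = range_normalise(X)
    return Z * relevance_factors(Z, labels)


# Toy data: feature 0 separates two clusters, feature 1 is uniform noise.
rng = np.random.default_rng(0)
informative = np.concatenate([rng.normal(0.0, 0.1, 50), rng.normal(1.0, 0.1, 50)])
noise = rng.uniform(0.0, 1.0, 100)
X = np.column_stack([informative, noise])
labels = np.repeat([0, 1], 50)  # assumed tentative partition

factors = relevance_factors(range_normalise(X), labels)
rescaled = rescale(X, labels)
print(factors)
```

On this toy data the informative feature receives the larger factor, so it contributes more to distance computations than the noise feature does. Under range normalisation alone, both features would enter the clustering objective on a comparable scale.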
