Dimension reduction for data of unknown cluster structure
Ewa Nowakowska, Jacek Koronacki, Stan Lipovetsky

TL;DR
This paper introduces a dimension reduction method for data from Gaussian mixtures that preserves clustering structure without prior knowledge of the clusters, using a data transformation followed by PCA.
Contribution
The paper proposes a novel transformation approach that aligns data variability with class separation directions, enabling PCA to approximate Fisher's subspace without prior cluster information.
Findings
Transformation aligns variability with class separation
PCA approximates Fisher's subspace after transformation
Preserves clustering structure in reduced dimensions
Abstract
For numerous reasons there arises a need for dimension reduction that preserves certain characteristics of data. In this work we focus on data coming from a mixture of Gaussian distributions and we propose a method that preserves distinctness of clustering structure, although the structure is assumed to be yet unknown. The rationale behind the method is the following: (i) had one known the clusters (classes) within the data, one could facilitate further analysis and reduce space dimension by projecting the data onto Fisher's linear subspace, which -- by definition -- best preserves the structure of the given classes; (ii) under some reasonable assumptions, this can be done, albeit approximately, without the prior knowledge of the clusters (classes). In the paper, we show how this approach works. We present a method of preliminary data transformation that brings the directions of…
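Point (i) of the rationale can be illustrated with a minimal sketch, assuming the class labels are known. It computes a basis of Fisher's linear subspace from the within-class and between-class scatter matrices and compares it with the leading direction of plain PCA on untransformed data. This is not the paper's preliminary transformation, which the truncated abstract does not specify; the function names `fisher_subspace` and `pca_directions` and the synthetic two-cluster data are hypothetical and chosen only for illustration.

```python
import numpy as np


def fisher_subspace(X, labels, dim):
    """Basis (n_features x dim) of Fisher's subspace for KNOWN class labels.

    Hypothetical helper for illustration; solves S_b v = lambda * S_w v.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    overall_mean = X.mean(axis=0)
    p = X.shape[1]
    S_w = np.zeros((p, p))  # within-class scatter
    S_b = np.zeros((p, p))  # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = mean_c - overall_mean
        S_b += len(Xc) * np.outer(diff, diff)
    # Whiten with the Cholesky factor of S_w so the problem becomes symmetric.
    L = np.linalg.cholesky(S_w)
    L_inv = np.linalg.inv(L)
    M = L_inv @ S_b @ L_inv.T
    _, eigvecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :dim]      # leading eigenvectors first
    W = L_inv.T @ top                    # map back to the original coordinates
    return W / np.linalg.norm(W, axis=0)


def pca_directions(X, dim):
    """Leading principal directions of the sample covariance (plain PCA)."""
    cov = np.cov(np.asarray(X, dtype=float), rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, ::-1][:, :dim]


# Synthetic data (an assumption for this sketch): two Gaussian clusters that
# are separated along the y-axis but vary far more along the x-axis.
rng = np.random.default_rng(0)
n = 200
cluster_a = rng.normal(loc=[0.0, -2.0], scale=[5.0, 0.5], size=(n, 2))
cluster_b = rng.normal(loc=[0.0, 2.0], scale=[5.0, 0.5], size=(n, 2))
X = np.vstack([cluster_a, cluster_b])
labels = np.repeat([0, 1], n)

fisher_dir = fisher_subspace(X, labels, dim=1)[:, 0]
pca_dir = pca_directions(X, dim=1)[:, 0]
print("Fisher direction:", fisher_dir)
print("PCA direction:   ", pca_dir)
```

On this toy data the Fisher direction lies almost entirely along the y-axis, where the clusters separate, while plain PCA follows the high-variance x-axis. That mismatch is the reason a preliminary transformation is needed before PCA can approximate Fisher's subspace when the labels are unknown.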
Taxonomy
Topics: Bayesian Methods and Mixture Models · Face and Expression Recognition · Advanced Clustering Algorithms Research
