Clustering, multicollinearity, and singular vectors

Hamid Usefi

arXiv:2008.03368·cs.LG·August 11, 2020

TL;DR

This paper identifies redundant features in a data matrix $A$ by analyzing the structure of the matrix $S = I - A^{\dagger} A$, where $A^{\dagger}$ is the pseudo-inverse of $A$, with applications in clustering and feature selection.

Contribution

It proves that, after re-ordering the columns of $A$, the matrix $S = I - A^{\dagger} A$ takes a block-diagonal form in which each block corresponds to a set of linearly dependent columns, which makes redundant features detectable.

Findings

01

Matrix S has a block-diagonal form after column reordering.

02

The method helps identify linearly dependent columns in data matrices.

03

Applications explored include feature selection, clustering, and the sensitivity of least-squares solutions.

Abstract

Let $A$ be a matrix with pseudo-inverse $A^{\dagger}$ and set $S = I - A^{\dagger} A$. We prove that, after re-ordering the columns of $A$, the matrix $S$ has a block-diagonal form where each block corresponds to a set of linearly dependent columns. This allows us to identify redundant columns in $A$. We explore some applications in supervised and unsupervised learning, especially feature selection, clustering, and the sensitivity of least-squares solutions.
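
The block structure described in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's algorithm: it builds a small synthetic matrix with two known dependency groups, forms $S = I - A^{\dagger} A$ with NumPy, and reads off the groups as connected components of the nonzero pattern of $S$. The function name `dependent_column_groups`, the thresholds `tol` and `rcond`, the connected-components grouping step, and the test matrix are all assumptions made for illustration.

```python
import numpy as np


def dependent_column_groups(A, tol=1e-8, rcond=1e-10):
    """Return S = I - pinv(A) @ A and the column groups linked by nonzeros of S.

    Grouping by connected components is an illustrative choice, not
    necessarily the procedure used in the paper.
    """
    n = A.shape[1]
    # S is the orthogonal projector onto the null space of A.
    S = np.eye(n) - np.linalg.pinv(A, rcond=rcond) @ A
    linked = np.abs(S) > tol
    seen = np.zeros(n, dtype=bool)
    groups = []
    for start in range(n):
        # A zero row of S means the column takes part in no linear dependency.
        if seen[start] or not linked[start].any():
            continue
        seen[start] = True
        stack, group = [start], []
        while stack:
            i = stack.pop()
            group.append(i)
            for j in np.flatnonzero(linked[i]):
                if not seen[j]:
                    seen[j] = True
                    stack.append(int(j))
        groups.append(sorted(group))
    return S, groups


# Synthetic data (assumed for illustration): four independent random columns.
rng = np.random.default_rng(0)
B = rng.standard_normal((20, 4))
A = np.column_stack([
    B[:, 0],             # column 0
    B[:, 1],             # column 1
    B[:, 0] + B[:, 1],   # column 2 = column 0 + column 1
    B[:, 2],             # column 3
    2.0 * B[:, 2],       # column 4 = 2 * column 3
    B[:, 3],             # column 5: independent of all the others
])

S, groups = dependent_column_groups(A)
print(groups)  # [[0, 1, 2], [3, 4]]
```

In this example the columns are already ordered so that each dependency group is contiguous, so $S$ is block-diagonal without any re-ordering. The row and column of $S$ belonging to the independent column 5 are zero up to rounding, which is why it appears in no group. The thresholds matter in practice: with noisy data, exact dependencies become near-dependencies and the choice of `tol` and `rcond` determines which columns are reported as redundant.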

Taxonomy

Topics: Face and Expression Recognition · Advanced Statistical Methods and Models · Sparse and Compressive Sensing Techniques