Discovering Structure in High-Dimensional Data Through Correlation Explanation

Greg Ver Steeg; Aram Galstyan

arXiv:1406.1222·cs.LG·November 3, 2014·64 cites


TL;DR

This paper presents CorEx, an unsupervised, scalable method that uncovers hierarchical, meaningful structures in high-dimensional data by optimizing an information-theoretic objective based on multivariate mutual information.

Contribution

The paper introduces CorEx, an unsupervised approach that learns hierarchical representations without model assumptions and is suited to high-dimensional datasets.

Findings

01. Automatically discovers meaningful structure in data.

02. Effective across diverse data sources, including personality tests, DNA, and human language.

03. Scales linearly with the number of variables.

Abstract

We introduce a method to learn a hierarchy of successively more abstract representations of complex data based on optimizing an information-theoretic objective. Intuitively, the optimization searches for a set of latent factors that best explain the correlations in the data as measured by multivariate mutual information. The method is unsupervised, requires no model assumptions, and scales linearly with the number of variables which makes it an attractive approach for very high dimensional systems. We demonstrate that Correlation Explanation (CorEx) automatically discovers meaningful structure for data from diverse sources including personality tests, DNA, and human language.
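
To make the objective more concrete, here is a minimal sketch of the quantity the abstract refers to. It assumes that "multivariate mutual information" is formalized as total correlation, TC(X) = sum_i H(X_i) - H(X), and that a latent factor Y "explains" correlations to the extent that TC(X | Y) is smaller than TC(X). The function names and the toy data are hypothetical, and the code only evaluates the measure on discrete samples; it is not the CorEx optimization itself.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Empirical Shannon entropy, in bits, of a list of hashable outcomes."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def total_correlation(rows):
    """TC(X) = sum_i H(X_i) - H(X_1, ..., X_n), estimated from rows of discrete data."""
    columns = list(zip(*rows))
    marginal = sum(entropy(list(col)) for col in columns)
    joint = entropy([tuple(r) for r in rows])
    return marginal - joint


def conditional_total_correlation(rows, labels):
    """TC(X | Y) = sum_y p(y) * TC(X | Y = y), with Y given as one label per row."""
    n = len(rows)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(y, []).append(row)
    return sum(len(g) / n * total_correlation(g) for g in groups.values())


# Toy data (an assumption for illustration): a hidden bit y is copied into
# three observed variables, so y accounts for all of their dependence.
labels = [0, 1] * 4
rows = [(y, y, y) for y in labels]

tc = total_correlation(rows)
tc_given_y = conditional_total_correlation(rows, labels)
explained = tc - tc_given_y  # correlation explained by the latent factor
print(f"TC(X) = {tc:.3f} bits, TC(X|Y) = {tc_given_y:.3f} bits, explained = {explained:.3f} bits")
```

In this toy case each observed variable carries one bit and the joint distribution also carries one bit, so TC(X) is 2 bits, and conditioning on the hidden bit removes all of it. A method in the spirit of the abstract would search for such a Y rather than being handed it.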


Taxonomy

Topics: Advanced Text Analysis Techniques · Bioinformatics and Genomic Networks · Data Analysis with R