Big Data Scaling through Metric Mapping: Exploiting the Remarkable Simplicity of Very High Dimensional Spaces using Correspondence Analysis

Fionn Murtagh

arXiv:1512.04052·stat.ML·December 15, 2015

TL;DR

This paper demonstrates how Correspondence Analysis can effectively scale and analyze very high dimensional data, such as in digital chemistry and finance, by exploiting the simplicity of high-dimensional spaces.

Contribution

It reports new findings on using Correspondence Analysis to scale very high dimensional data (dimensionalities up to around one million), with particular attention to power law distributed datasets.

Findings

01

Effective high-dimensional scaling with Correspondence Analysis

02

Suitable for power law distributed data

03

Applicable to digital chemistry and finance datasets

Abstract

We present new findings in regard to data analysis in very high dimensional spaces. We use dimensionalities up to around one million. A particular benefit of Correspondence Analysis is its suitability for carrying out an orthonormal mapping, or scaling, of power law distributed data. Power law distributed data are found in many domains. Correspondence factor analysis provides a latent semantic or principal axes mapping. Our experiments use data from digital chemistry and finance, and other statistically generated data.
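
As a rough illustration of the kind of mapping the abstract describes, the following is a minimal sketch of textbook Correspondence Analysis computed through a singular value decomposition of the standardized residuals. It is not the paper's implementation: the function name `correspondence_analysis`, the variable names, and the small heavy-tailed toy matrix are all assumptions made for illustration, and the dense SVD used here would not scale to the dimensionalities discussed in the paper.

```python
import numpy as np


def correspondence_analysis(counts):
    """Textbook Correspondence Analysis of a nonnegative matrix (illustrative sketch).

    Returns row principal coordinates, column principal coordinates,
    and the principal inertias (squared singular values).
    Assumes every row and every column has a strictly positive sum.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                  # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)                # row masses
    c = P.sum(axis=0)                # column masses
    expected = np.outer(r, c)        # independence model r c^T
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sigma) / np.sqrt(r)[:, None]      # D_r^{-1/2} U Sigma
    col_coords = (Vt.T * sigma) / np.sqrt(c)[:, None]   # D_c^{-1/2} V Sigma
    return row_coords, col_coords, sigma ** 2


# Hypothetical toy data: strictly positive, heavy-tailed (Pareto) values.
rng = np.random.default_rng(0)
toy = rng.pareto(2.0, size=(20, 50)) + 1.0
rows, cols, inertias = correspondence_analysis(toy)
print(rows.shape, cols.shape, inertias.sum())
```

In this formulation, Euclidean distances between rows in the full factor space equal the chi-squared distances between the corresponding row profiles, which is the sense in which the factor space is a Euclidean (orthonormal axes) embedding of the input data.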
