Big Data Scaling through Metric Mapping: Exploiting the Remarkable Simplicity of Very High Dimensional Spaces using Correspondence Analysis
Fionn Murtagh

TL;DR
This paper demonstrates how Correspondence Analysis can scale and analyze very high dimensional data by exploiting the simplicity of high-dimensional spaces, with experiments on digital chemistry and finance data.
Contribution
It applies Correspondence Analysis to the scaling of very high dimensional data (dimensionalities up to around one million), with particular attention to power law distributed data, which arise in many domains.
Findings
Effective high-dimensional scaling with Correspondence Analysis
Suitable for power law distributed data
Applicable to digital chemistry and finance datasets
Abstract
We present new findings in regard to data analysis in very high dimensional spaces. We use dimensionalities up to around one million. A particular benefit of Correspondence Analysis is its suitability for carrying out an orthonormal mapping, or scaling, of power law distributed data. Power law distributed data are found in many domains. Correspondence factor analysis provides a latent semantic or principal axes mapping. Our experiments use data from digital chemistry and finance, and other statistically generated data.
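As an illustration of the principal axes mapping mentioned in the abstract, here is a minimal sketch of Correspondence Analysis computed through a singular value decomposition of the standardized residuals. It is a textbook formulation, not the paper's own code: the function name `correspondence_analysis`, the parameter `n_factors`, and the Pareto-distributed test table are all assumptions made for this example, and the table is far smaller than the dimensionalities discussed in the paper.

```python
import numpy as np


def correspondence_analysis(table, n_factors=2):
    """Textbook CA via SVD (illustrative sketch, not the paper's implementation).

    Assumes a nonnegative table in which every row and column sum is positive.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses

    # Standardized residuals: departure from independence in the chi-squared metric.
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)

    U, sv, Vt = np.linalg.svd(S, full_matrices=False)

    k = n_factors
    # Principal coordinates of rows and columns on the first k factors.
    row_coords = (U[:, :k] * sv[:k]) / np.sqrt(r)[:, None]
    col_coords = (Vt.T[:, :k] * sv[:k]) / np.sqrt(c)[:, None]
    inertia = sv ** 2                    # inertia explained by each factor
    return row_coords, col_coords, inertia


# Hypothetical test data: heavy-tailed (Pareto) nonnegative values.
rng = np.random.default_rng(0)
table = rng.pareto(a=2.0, size=(50, 200)) + 1e-9

rows, cols, inertia = correspondence_analysis(table, n_factors=2)
print(rows.shape, cols.shape)
print(inertia[:2] / inertia.sum())       # share of inertia on the first two factors
```

In this formulation the factors are orthogonal and the mass-weighted mean of the row coordinates is zero on every factor. A dense SVD is used only for brevity; for dimensionalities near one million, a sparse or truncated solver would be needed.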
