Statistical embedding: Beyond principal components
Dag Tjøstheim, Martin Jullum, Anders Løland

TL;DR
This survey reviews recent advances in high-dimensional and nonlinear data embedding techniques, covering methods from principal curves to topological and network data embeddings, and their applications in visualization.
Contribution
It provides a comprehensive overview of diverse embedding methods, compares algorithmic and statistical approaches, and illustrates techniques with simulated data examples.
Findings
Different embedding methods have unique strengths and limitations.
Topological and network embeddings enable analysis of complex data structures.
Visualization techniques such as t-SNE, UMAP, and LargeVis represent high-dimensional data effectively in low-dimensional plots.
Abstract
There has been intense recent activity in the embedding of very high-dimensional and nonlinear data structures, much of it in the data science and machine learning literature. We survey this activity in four parts. In the first part we cover nonlinear methods such as principal curves, multidimensional scaling, local linear methods, ISOMAP, graph-based methods and diffusion mapping, kernel-based methods, and random projections. The second part is concerned with topological embedding methods, in particular mapping topological properties into persistence diagrams and the Mapper algorithm. Another type of data set with tremendous growth is very high-dimensional network data. The task considered in part three is how to embed such data in a vector space of moderate dimension to make the data amenable to traditional techniques such as clustering and classification. Arguably this is…
Taxonomy
Topics: Topological and Geometric Data Analysis · Complex Network Analysis Techniques · Bioinformatics and Genomic Networks
Methods: Diffusion
