On landmark selection and sampling in high-dimensional data analysis
Mohamed-Ali Belabbas, Patrick J. Wolfe

TL;DR
This paper reviews spectral methods for high-dimensional data analysis, focusing on landmark selection and sampling techniques to improve computational efficiency and extract low-dimensional structures from massive datasets.
Contribution
It introduces a quantitative framework for analyzing landmark selection in the Nyström extension and uses it to establish performance bounds for a range of practical landmark selection approaches.
Findings
Performance bounds for landmark selection algorithms
Effective extraction of low-dimensional structures from high-dimensional data
Demonstration of methods on real-world computer vision datasets
Abstract
In recent years, the spectral analysis of appropriately defined kernel matrices has emerged as a principled way to extract the low-dimensional structure often prevalent in high-dimensional data. Here we provide an introduction to spectral methods for linear and nonlinear dimension reduction, emphasizing ways to overcome the computational limitations currently faced by practitioners with massive datasets. In particular, a data subsampling or landmark selection process is often employed to construct a kernel based on partial information, followed by an approximate spectral analysis termed the Nyström extension. We provide a quantitative framework to analyse this procedure, and use it to demonstrate algorithmic performance bounds on a range of practical approaches designed to optimize the landmark selection process. We compare the practical implications of these bounds by way of real-world…
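The landmark-then-extend procedure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Gaussian (RBF) kernel, the uniform random choice of landmarks, and the names `rbf_kernel`, `nystrom_approximation`, and `gamma` are all assumptions made for illustration. The sketch builds only the kernel columns for the chosen landmarks and reconstructs the full kernel as C W⁺ Cᵀ, the standard Nyström approximation.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian kernel between rows of X and rows of Y (assumed kernel choice)."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_approximation(X, landmark_idx, gamma=0.5):
    """Approximate the full n x n kernel from k landmark columns only."""
    C = rbf_kernel(X, X[landmark_idx], gamma)  # n x k: kernel against landmarks
    W = C[landmark_idx, :]                     # k x k: landmark-landmark block
    return C @ np.linalg.pinv(W) @ C.T         # Nystrom reconstruction

# Synthetic data and uniformly sampled landmarks (both are illustrative assumptions).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
landmarks = rng.choice(len(X), size=20, replace=False)

K_approx = nystrom_approximation(X, landmarks)
K_full = rbf_kernel(X, X)  # computed here only to measure the approximation error
rel_err = np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full)
print(f"relative Frobenius error: {rel_err:.3f}")
```

In this sketch the rows and columns belonging to the landmarks are reproduced exactly (up to numerical error), and the error on the remaining entries depends on which landmarks are chosen. Replacing the uniform `rng.choice` call with a different selection rule is the point at which alternative landmark selection strategies would plug in.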
