On landmark selection and sampling in high-dimensional data analysis

Mohamed-Ali Belabbas; Patrick J. Wolfe

arXiv:0906.4582·stat.ML·April 20, 2010

TL;DR

This paper reviews spectral methods for high-dimensional data analysis, focusing on landmark selection and sampling techniques to improve computational efficiency and extract low-dimensional structures from massive datasets.

Contribution

The paper provides a quantitative framework for analysing landmark selection followed by the Nyström extension, and uses it to derive algorithmic performance bounds for a range of practical landmark selection approaches.

Findings

1. Performance bounds for landmark selection algorithms

2. Effective extraction of low-dimensional structures from high-dimensional data

3. Demonstration of methods on real-world computer vision datasets

Abstract

In recent years, the spectral analysis of appropriately defined kernel matrices has emerged as a principled way to extract the low-dimensional structure often prevalent in high-dimensional data. Here we provide an introduction to spectral methods for linear and nonlinear dimension reduction, emphasizing ways to overcome the computational limitations currently faced by practitioners with massive datasets. In particular, a data subsampling or landmark selection process is often employed to construct a kernel based on partial information, followed by an approximate spectral analysis termed the Nyström extension. We provide a quantitative framework to analyse this procedure, and use it to demonstrate algorithmic performance bounds on a range of practical approaches designed to optimize the landmark selection process. We compare the practical implications of these bounds by way of real-world…
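
The subsample-then-extend procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the Gaussian kernel, the uniform random choice of landmarks, the synthetic data, and the names `rbf_kernel` and `nystrom_approximation` are all assumptions made for illustration. The sketch builds the kernel only between all points and the landmarks, then forms the standard Nyström low-rank approximation C W⁺ Cᵀ of the full kernel matrix.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.1):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        (A ** 2).sum(axis=1)[:, None]
        + (B ** 2).sum(axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def nystrom_approximation(X, landmark_idx, gamma=0.1):
    """Approximate the full n x n kernel matrix from m landmark columns."""
    landmarks = X[landmark_idx]
    C = rbf_kernel(X, landmarks, gamma)          # n x m: all points vs. landmarks
    W = rbf_kernel(landmarks, landmarks, gamma)  # m x m: landmarks vs. landmarks
    return C @ np.linalg.pinv(W) @ C.T           # n x n matrix of rank at most m


# Synthetic data and uniformly sampled landmarks (both assumptions).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
idx = rng.choice(X.shape[0], size=20, replace=False)

K = rbf_kernel(X, X)                 # full kernel, computed here only for comparison
K_hat = nystrom_approximation(X, idx)
err = np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro")
print(f"relative Frobenius error with {len(idx)} landmarks: {err:.3f}")
```

The approximation reproduces the kernel exactly on the landmark rows and columns, so the error depends entirely on which landmarks are chosen. Uniform sampling is used here only as the simplest baseline; replacing the `idx` line with another selection rule leaves the rest of the sketch unchanged.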
