Linear Dimensionality Reduction in Linear Time: Johnson-Lindenstrauss-type Guarantees for Random Subspace

Nick Lim; Robert J. Durrant

arXiv:1705.06408 · stat.ML · May 19, 2017 · 2 cites

Open Access

TL;DR

This paper proves data-dependent Johnson-Lindenstrauss-type norm-preservation guarantees for the random subspace method, a fast linear dimensionality reduction technique, covering both dense and sparse data.

Contribution

It provides theoretical guarantees for the random subspace method in dimensionality reduction, together with a novel densifying preprocessing step for sparse data, supported by empirical validation.

Findings

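01. Random subspace approximately preserves Euclidean geometry with high probability when the data satisfy a mild regularity condition.

02. A densifying preprocessing step improves performance on sparse data.

03. The required projection dimension is logarithmic in the number of data points, with a constant that depends on the regularity of the data.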

Abstract

We consider the problem of efficient randomized dimensionality reduction with norm-preservation guarantees. Specifically we prove data-dependent Johnson-Lindenstrauss-type geometry preservation guarantees for Ho's random subspace method: When data satisfy a mild regularity condition -- the extent of which can be estimated by sampling from the data -- then random subspace approximately preserves the Euclidean geometry of the data with high probability. Our guarantees are of the same order as those for random projection, namely the required dimension for projection is logarithmic in the number of data points, but have a larger constant term in the bound which depends upon this regularity. A challenging situation is when the original data have a sparse representation, since this implies a very large projection dimension is required: We show how this situation can be improved for sparse…
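
As a rough illustration of the method the abstract refers to, the following Python sketch implements random subspace as uniform coordinate sampling without replacement. The function names `random_subspace` and `norm_distortion`, the `sqrt(d / k)` rescaling, and the Gaussian test data are assumptions made for this example and are not taken from the paper; the paper's regularity condition and its constants are not reproduced here.

```python
import numpy as np

def random_subspace(X, k, rng=None):
    """Keep k of the d coordinates, chosen uniformly at random without
    replacement, and rescale by sqrt(d / k) so that squared norms are
    preserved in expectation. (Rescaling is an assumption of this sketch.)"""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    if not 1 <= k <= d:
        raise ValueError("k must satisfy 1 <= k <= d")
    idx = rng.choice(d, size=k, replace=False)  # indices of retained coordinates
    return np.sqrt(d / k) * X[:, idx], idx

def norm_distortion(X, Z):
    """Per-row ratio of squared norm after reduction to squared norm before."""
    return np.sum(Z ** 2, axis=1) / np.sum(X ** 2, axis=1)

# Hypothetical dense data: 50 points in 10,000 dimensions.
data_rng = np.random.default_rng(0)
X_dense = data_rng.standard_normal((50, 10_000))

Z, idx = random_subspace(X_dense, k=1_000, rng=1)
ratios = norm_distortion(X_dense, Z)
print("min ratio:", ratios.min(), "max ratio:", ratios.max())
```

Selecting columns costs time linear in the size of the reduced data and needs no matrix multiplication, which is where the speed advantage over dense random projection comes from. The sketch also hints at why sparse data are harder: a vector with a single non-zero coordinate is mapped to zero whenever that coordinate is not sampled, which happens with probability 1 - k/d.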

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods