Random projections: data perturbation for classification problems

Timothy I. Cannings

arXiv:1911.10800·stat.ME·November 26, 2019

TL;DR

This paper explores the use of random projections for data perturbation in high-dimensional classification, focusing on ensemble methods and hashing techniques to improve accuracy and computational efficiency.

Contribution

It describes and compares the two general random projection techniques for large-scale classification problems: ensemble aggregation, and hashing or sketching.

Findings

01

Aggregating an ensemble of random projections aims to improve classification accuracy.

02

Hashing and sketching techniques reduce problem complexity, potentially with a large computational saving.

03

Hashing and sketching approximately preserve statistical efficiency in high-dimensional data.

Abstract

Random projections offer an appealing and flexible approach to a wide range of large-scale statistical problems. They are particularly useful in high-dimensional settings, where we have many covariates recorded for each observation. In classification problems there are two general techniques using random projections. The first involves many projections in an ensemble -- the idea here is to aggregate the results after applying different random projections, with the aim of achieving superior statistical accuracy. The second class of methods includes hashing and sketching techniques, which are straightforward ways to reduce the complexity of a problem, potentially with a huge computational saving, while approximately preserving the statistical efficiency.
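
To make the first technique concrete, here is a minimal Python sketch of an ensemble of random projections. It is an illustration under stated assumptions and not the paper's method: the Gaussian projection matrices, the nearest-centroid base classifier, the majority-vote aggregation, and the names rp_ensemble_predict, d (projected dimension) and B (number of projections) are all hypothetical choices made for this example.

```python
import numpy as np


def nearest_centroid_fit(X, y):
    """Fit a nearest-centroid classifier (assumed base classifier)."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def nearest_centroid_predict(model, X):
    """Assign each row of X to the class with the closest centroid."""
    classes, centroids = model
    # Squared Euclidean distance from every point to every centroid.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]


def rp_ensemble_predict(X_train, y_train, X_test, d=5, B=50, seed=0):
    """Classify X_test by majority vote over B random projections.

    Each projection maps the p covariates to d dimensions using an
    independent Gaussian matrix (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    p = X_train.shape[1]
    classes = np.unique(y_train)
    votes = np.zeros((X_test.shape[0], classes.size), dtype=int)
    for _ in range(B):
        # Draw a fresh p x d Gaussian projection matrix.
        A = rng.standard_normal((p, d)) / np.sqrt(d)
        # Train and predict in the projected, low-dimensional space.
        model = nearest_centroid_fit(X_train @ A, y_train)
        pred = nearest_centroid_predict(model, X_test @ A)
        # Record one vote per test point for the predicted class.
        votes += pred[:, None] == classes[None, :]
    # Aggregate: return the class with the most votes.
    return classes[votes.argmax(axis=1)]
```

Calling rp_ensemble_predict(X_train, y_train, X_test) returns one predicted label per row of X_test. Setting B=1 reduces the sketch to a single random projection, which shows the dimension reduction step on its own without any aggregation.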
