Supervised Dimensionality Reduction for Big Data

Joshua T. Vogelstein; Eric Bridgeford; Minh Tang; Da Zheng; Christopher Douville; Randal Burns; Mauro Maggioni

arXiv:1709.01233 · stat.ML · January 26, 2021

TL;DR

This paper introduces a scalable supervised dimensionality reduction method, LOL, that improves classification accuracy on ultra-high-dimensional biomedical data while maintaining computational efficiency.

Contribution

The paper presents XOX, a novel framework extending PCA with class-conditional moments, and introduces LOL, a simple yet effective supervised reduction technique with theoretical guarantees.

Findings

01. LOL outperforms existing methods in accuracy on large biomedical datasets

02. LOL scales efficiently to millions of features within minutes

03. Theoretical analysis supports the effectiveness of the proposed approach

Abstract

To solve key biomedical problems, experimentalists now routinely measure millions or billions of features (dimensions) per sample, with the hope that data science techniques will be able to build accurate data-driven inferences. Because sample sizes are typically orders of magnitude smaller than the dimensionality of these data, valid inferences require finding a low-dimensional representation that preserves the discriminating information (e.g., whether the individual suffers from a particular disease). There is a lack of interpretable supervised dimensionality reduction methods that scale to millions of dimensions with strong statistical theoretical guarantees. We introduce an approach, XOX, to extending principal components analysis by incorporating class-conditional moment estimates into the low-dimensional projection. The simplest version, "Linear Optimal Low-rank" projection (LOL),…
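As a rough illustration of what "incorporating class-conditional moment estimates into the low-dimensional projection" could look like, the sketch below builds a projection from two ingredients: differences between class means (first moments) and the top principal directions of the class-centered data (second moments). The abstract as shown here is truncated and does not spell out the construction, so the way the two parts are combined, the final orthonormalization, and the function name `lol_projection_sketch` are all assumptions made for illustration, not the authors' reference implementation.

```python
import numpy as np


def lol_projection_sketch(X, y, n_components):
    """Hypothetical helper (not from the paper's code): returns a
    (d, n_components) matrix with orthonormal columns.

    Assumed recipe: stack class-mean differences with the leading
    principal directions of the class-centered data.
    """
    classes = np.unique(y)
    # Class-conditional first moments: one mean vector per class, shape (K, d).
    means = np.stack([X[y == c].mean(axis=0) for c in classes])

    # Mean-difference directions relative to the first class, shape (d, K-1).
    deltas = (means[1:] - means[0]).T

    n_pca = n_components - deltas.shape[1]
    if n_pca < 0:
        raise ValueError("n_components must be at least (number of classes - 1)")

    # Class-conditional second moments: subtract each sample's own class mean,
    # then take the top right singular vectors of the centered data.
    class_index = np.searchsorted(classes, y)
    X_centered = X - means[class_index]
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)

    # Concatenate both sets of directions and orthonormalize the columns.
    A = np.hstack([deltas, Vt[:n_pca].T])
    Q, _ = np.linalg.qr(A)
    return Q


# Toy usage on synthetic data with far more features than samples.
rng = np.random.default_rng(0)
n, d = 40, 500
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, d))
X[y == 1, :5] += 2.0  # the class signal lives in the first five features

A = lol_projection_sketch(X, y, n_components=3)
Z = X @ A  # low-dimensional representation, shape (n, 3)
```

Because the mean-difference column is placed first, the direction separating the class means is always retained in the projection, whereas unsupervised PCA may discard it when it carries little variance. A classifier such as LDA would then be fit on `Z` rather than on the original features.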

Taxonomy

Methods: Principal Components Analysis