Classification of Big Data with Application to Imaging Genetics

Magnus O. Ulfarsson, Frosti Palsson, Jakob Sigurdsson, Johannes R. Sveinsson

arXiv:1605.04932 · physics.data-an · May 18, 2016

TL;DR

This paper introduces a classification method for high-dimensional data in which the variables far outnumber the observations (p >> n). It estimates the covariance used in linear discriminant analysis (LDA) with sparse noisy principal component analysis (nPCA), aiming to improve variable selection and classification accuracy in imaging genetics.

Contribution

The paper develops an LDA-based classification approach that estimates the covariance with sparse noisy PCA. This avoids the singular sample covariance of p >> n data while selecting the relevant variables.
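For reference, the standard LDA decision rule and the noisy PCA covariance model can be written as follows. This is a sketch assuming the usual nPCA model with q components and isotropic noise; the symbols W, q, σ² and π_k (class prior) are introduced here for illustration, and the sparsity penalty the paper places on the model is not shown.

```latex
\hat{y}(\mathbf{x}) = \arg\max_k \, \delta_k(\mathbf{x}), \qquad
\delta_k(\mathbf{x}) = \mathbf{x}^\top \Sigma^{-1} \boldsymbol{\mu}_k
  - \tfrac{1}{2} \boldsymbol{\mu}_k^\top \Sigma^{-1} \boldsymbol{\mu}_k + \log \pi_k

\Sigma = W W^\top + \sigma^2 I_p, \qquad W \in \mathbb{R}^{p \times q}, \quad q \ll p

\Sigma^{-1} = \sigma^{-2} \left( I_p - W \left( \sigma^2 I_q + W^\top W \right)^{-1} W^\top \right)
```

Because σ² > 0, this Σ is positive definite even when p >> n, and the last line (the Woodbury identity) only requires inverting a q × q matrix.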

Findings

1. Outperforms existing methods on simulated data
2. Effective variable selection in imaging genetics
3. Improved classification accuracy

Abstract

Big data applications, such as medical imaging and genetics, typically generate datasets that consist of few observations n on many more variables p, a scenario that we denote as p>>n. Traditional data processing methods are often insufficient for extracting information out of big data. This calls for the development of new algorithms that can deal with the size, complexity, and the special structure of such datasets. In this paper, we consider the problem of classifying p>>n data and propose a classification method based on linear discriminant analysis (LDA). Traditional LDA depends on the covariance estimate of the data, but when p>>n the sample covariance estimate is singular. The proposed method estimates the covariance by using a sparse version of noisy principal component analysis (nPCA). The use of sparsity in this setting aims at automatically selecting variables that are…
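The abstract's two central points, that the sample covariance is singular when p >> n and that a noisy-PCA covariance estimate avoids this, can be checked numerically. The sketch below is a minimal illustration on simulated data and is not the paper's method: it swaps in plain (non-sparse) nPCA fitted in closed form (the probabilistic PCA maximum-likelihood solution), so it performs no variable selection. The function names, the number of components q, and the simulation settings are all hypothetical.

```python
import numpy as np

def npca_covariance(Xc, q):
    """Closed-form ML fit of the noisy-PCA model Sigma = W W^T + sigma2 * I.

    Xc is the (n x p) centered data. Plain nPCA with no sparsity penalty;
    a stand-in for illustration, not the paper's sparse estimator.
    """
    n, p = Xc.shape
    # Thin SVD of the data avoids forming the p x p sample covariance.
    _, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    eig = d**2 / n                                   # nonzero eigenvalues of S = Xc^T Xc / n
    sigma2 = (eig.sum() - eig[:q].sum()) / (p - q)   # mean of the p - q discarded eigenvalues
    W = Vt[:q].T * np.sqrt(np.maximum(eig[:q] - sigma2, 0.0))  # p x q loadings
    return W, sigma2

def apply_inverse(W, sigma2, B):
    """Return Sigma^{-1} B via the Woodbury identity (only a q x q solve)."""
    M = sigma2 * np.eye(W.shape[1]) + W.T @ W
    return (B - W @ np.linalg.solve(M, W.T @ B)) / sigma2

def fit_lda(X, y, q):
    """Standard LDA with the covariance replaced by the nPCA estimate."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])  # K x p class means
    priors = np.array([np.mean(y == k) for k in classes])
    Xc = X - means[np.searchsorted(classes, y)]                  # pooled within-class centering
    W, sigma2 = npca_covariance(Xc, q)
    coef = apply_inverse(W, sigma2, means.T)                     # p x K, columns Sigma^{-1} mu_k
    intercept = -0.5 * np.sum(means.T * coef, axis=0) + np.log(priors)
    return classes, coef, intercept

def predict(model, X):
    classes, coef, intercept = model
    return classes[np.argmax(X @ coef + intercept, axis=1)]

# Hypothetical simulation: n = 40 observations, p = 500 variables, 10 informative.
rng = np.random.default_rng(0)
n, p = 40, 500
shift = np.zeros(p)
shift[:10] = 3.0

def simulate(n_obs):
    y = np.repeat([0, 1], n_obs // 2)
    return rng.standard_normal((n_obs, p)) + np.outer(y, shift), y

X, y = simulate(n)

# Pooled within-class sample covariance: rank at most n - 2, so it is singular.
means = np.array([X[y == k].mean(axis=0) for k in (0, 1)])
Xc = X - means[y]
S = Xc.T @ Xc / n
rank = np.linalg.matrix_rank(S)

# The nPCA estimate is invertible because sigma2 > 0.
W, sigma2 = npca_covariance(Xc, q=3)

model = fit_lda(X, y, q=3)
X_test, y_test = simulate(200)
acc = np.mean(predict(model, X_test) == y_test)
print(f"rank(S) = {rank} of p = {p}; sigma2 = {sigma2:.3f}; test accuracy = {acc:.2f}")
```

The Woodbury step in `apply_inverse` keeps the cost linear in p, which matters when p is very large. A sparse nPCA fit would presumably replace `npca_covariance` while the discriminant step stays the same.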

Taxonomy

Methods: Linear Discriminant Analysis