Classification of Big Data with Application to Imaging Genetics
Magnus O. Ulfarsson, Frosti Palsson, Jakob Sigurdsson, Johannes R. Sveinsson

TL;DR
This paper introduces a novel classification method for high-dimensional data where variables far outnumber observations, utilizing sparse noisy PCA within LDA to improve variable selection and classification accuracy in imaging genetics.
Contribution
The paper develops a new LDA-based classification approach that incorporates sparse noisy PCA for effective variable selection in p>>n datasets, addressing covariance estimation issues.
Findings
Outperforms existing methods on simulated data
Effective variable selection in imaging genetics
Improved classification accuracy
Abstract
Big data applications, such as medical imaging and genetics, typically generate datasets that consist of few observations n on many more variables p, a scenario that we denote as p>>n. Traditional data processing methods are often insufficient for extracting information out of big data. This calls for the development of new algorithms that can deal with the size, complexity, and the special structure of such datasets. In this paper, we consider the problem of classifying p>>n data and propose a classification method based on linear discriminant analysis (LDA). Traditional LDA depends on the covariance estimate of the data, but when p>>n the sample covariance estimate is singular. The proposed method estimates the covariance by using a sparse version of noisy principal component analysis (nPCA). The use of sparsity in this setting aims at automatically selecting variables that are…
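The covariance problem described in the abstract can be illustrated with a minimal sketch. The code below first checks that the sample covariance of n observations on p variables has rank at most n - 1, so it is singular when p>>n, and then fits an LDA classifier using a simple shrinkage estimate of the covariance. Note that the shrinkage toward a scaled identity is a generic stand-in chosen for illustration; it is not the sparse nPCA estimator proposed in the paper and performs no variable selection. All function names and the parameter `alpha` are hypothetical.

```python
import numpy as np


def sample_covariance_rank(n, p, seed=0):
    """Rank of the sample covariance of n Gaussian observations on p variables."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    S = np.cov(X, rowvar=False)  # p x p sample covariance
    return np.linalg.matrix_rank(S)


def fit_shrinkage_lda(X, y, alpha=0.5):
    """LDA with the pooled covariance shrunk toward a scaled identity.

    Assumption: alpha in (0, 1] is an illustrative shrinkage weight,
    not a quantity defined in the paper.
    """
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])  # K x p
    # Center each observation by its own class mean.
    centered = X - means[np.searchsorted(classes, y)]
    S = centered.T @ centered / (len(X) - len(classes))  # pooled covariance
    p = X.shape[1]
    # Shrinkage makes the estimate invertible even when p >> n.
    S_reg = (1.0 - alpha) * S + alpha * (np.trace(S) / p) * np.eye(p)
    W = np.linalg.solve(S_reg, means.T)  # p x K discriminant directions
    priors = np.array([(y == c).mean() for c in classes])
    b = -0.5 * np.sum(means.T * W, axis=0) + np.log(priors)
    return classes, W, b


def predict_lda(model, X):
    """Assign each row of X to the class with the largest linear score."""
    classes, W, b = model
    return classes[np.argmax(X @ W + b, axis=1)]
```

With n = 20 and p = 200, `sample_covariance_rank(20, 200)` returns a value no larger than 19, far below p, which is why the unregularized LDA rule cannot be computed in this regime.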
Taxonomy
Methods: Linear Discriminant Analysis
