Classification and sparse-signature extraction from gene-expression data
Andrea Pagnani, Francesca Tria, Martin Weigt

TL;DR
This paper introduces a statistical-mechanics-based algorithm that classifies high-dimensional gene-expression data and extracts sparse signatures, and reports better performance than many existing methods despite the low quantity and quality of available data.
Contribution
It presents a novel message-passing algorithm for simultaneous classification and sparse-signature extraction in high-dimensional data. The message-passing approach is motivated by the NP-hardness of the signature-extraction step.
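To make the task concrete, here is a minimal sketch of the problem setting: many features, few samples, and a binary label that depends on only a handful of features. It does not implement the authors' message-passing algorithm. As a stand-in it uses L1-penalized logistic regression solved by proximal gradient descent, a different and much simpler technique that also returns a sparse weight vector. All names (`make_data`, `fit_sparse_logistic`, `soft_threshold`) and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def soft_threshold(x, t):
    """Shrink x toward zero by t; values with |x| <= t become exactly 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def make_data(n_samples, n_features, n_informative, rng):
    """Synthetic data (assumed setup): only the first n_informative
    features determine the binary label."""
    teacher = [rng.choice((-1.0, 1.0)) if i < n_informative else 0.0
               for i in range(n_features)]
    X, y = [], []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        field = sum(w * v for w, v in zip(teacher, x))
        X.append(x)
        y.append(1.0 if field >= 0 else -1.0)
    return X, y, teacher


def fit_sparse_logistic(X, y, l1=0.05, lr=0.1, n_iter=200):
    """L1-penalized logistic regression via proximal gradient descent.
    A stand-in for illustration, NOT the paper's message-passing method."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        grad = [0.0] * d
        for x, label in zip(X, y):
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # d/d(margin) of log(1 + exp(-margin)) is -1 / (1 + exp(margin));
            # the margin is clamped to avoid overflow in exp.
            coef = -label / (1.0 + math.exp(min(margin, 30.0)))
            for j in range(d):
                grad[j] += coef * x[j] / n
        # Gradient step followed by soft-thresholding, which zeroes small weights.
        w = [soft_threshold(wj - lr * gj, lr * l1) for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Binary label from the sign of the linear field."""
    return 1.0 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1.0


def accuracy(w, X, y):
    """Fraction of samples whose predicted label matches the given label."""
    return sum(predict(w, x) == label for x, label in zip(X, y)) / len(y)


if __name__ == "__main__":
    rng = random.Random(0)
    # Fewer samples (40) than features (100), with 5 informative features.
    X, y, teacher = make_data(40, 100, 5, rng)
    w = fit_sparse_logistic(X, y)
    signature = [j for j, wj in enumerate(w) if wj != 0.0]
    print("selected features:", signature)
    print("training accuracy:", accuracy(w, X, y))
```

The indices with nonzero weight play the role of the sparse signature. Raising `l1` shrinks the signature and lowering it lets more features in; a convex penalty of this kind is only one way to sidestep the combinatorial search over feature subsets.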
Findings
The algorithm performs better than many state-of-the-art bioinformatics methods.
Tests on artificial data validate the approach and expose its possible limitations.
The approach is applied to gene-expression data measured in various types of cancer tissues.
Abstract
In this work we suggest a statistical mechanics approach to the classification of high-dimensional data according to a binary label. We propose an algorithm whose aim is twofold: first, it learns a classifier from a relatively small amount of data; second, it extracts a sparse signature, i.e., a lower-dimensional subspace carrying the information needed for the classification. The second part of the task in particular is NP-hard, so we propose a statistical-mechanics-based message-passing approach. The resulting algorithm is first tested on artificial data, both to establish its validity and to elucidate possible limitations. As an important application, we consider the classification of gene-expression data measured in various types of cancer tissues. We find that, despite the currently low quantity and quality of available data (the number of available samples is much…
