Classification and sparse-signature extraction from gene-expression data

Andrea Pagnani; Francesca Tria; Martin Weigt

arXiv:0907.3687·cond-mat.stat-mech·July 22, 2009

TL;DR

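This paper introduces a statistical-mechanics-based message-passing algorithm that classifies high-dimensional gene-expression data according to a binary label and extracts a sparse signature. The authors report that it performs better than many state-of-the-art bioinformatics methods, even though the available data are currently low in quantity and quality.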

Contribution

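It presents a novel message-passing algorithm that learns a classifier and extracts a sparse signature from high-dimensional data at the same time. The signature-extraction part of the task is NP-hard, which is why the authors turn to a statistical-mechanics-based message-passing approach.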

Findings

01 The algorithm performs better than many state-of-the-art bioinformatics methods.

02 The approach is first tested on artificial data, both to check its validity and to elucidate possible limitations.

03 The method is applied to gene-expression data measured in various types of cancer tissues.

Abstract

In this work we suggest a statistical mechanics approach to the classification of high-dimensional data according to a binary label. We propose an algorithm whose aim is twofold: first, it learns a classifier from a relatively small number of data; second, it extracts a sparse signature, i.e. a lower-dimensional subspace carrying the information needed for the classification. In particular, the second part of the task is NP-hard; therefore we propose a statistical-mechanics based message-passing approach. The resulting algorithm is firstly tested on artificial data to prove its validity, but also to elucidate possible limitations. As an important application, we consider the classification of gene-expression data measured in various types of cancer tissues. We find that, despite the currently low quantity and quality of available data (the number of available samples is much…
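
To make the two-part task in the abstract concrete, the following is a minimal sketch of the problem setup on artificial data: many features, few samples, and a binary label that depends on only a handful of features (the sparse signature). It is not the authors' message-passing algorithm. It swaps in a much simpler univariate baseline that scores each feature by its average product with the label and keeps the top-scoring ones. All names (`make_data`, `feature_scores`, `select_signature`, `predict`), the data model (Gaussian features, label equal to the sign of a sum), and the sizes are assumptions chosen for illustration, and the signature size `k` is assumed to be known in advance.

```python
import random

def make_data(n_samples, n_features, signature, rng):
    """Gaussian features; the label is the sign of the sum over the signature features."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [1 if sum(row[i] for i in signature) > 0 else -1 for row in X]
    return X, y

def feature_scores(X, y):
    """Average of x_i * y over the samples: a crude per-feature correlation with the label."""
    m = len(X)
    n = len(X[0])
    return [sum(row[i] * label for row, label in zip(X, y)) / m for i in range(n)]

def select_signature(scores, k):
    """Keep the k features with the largest absolute score (k is assumed known)."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return sorted(ranked[:k])

def predict(row, selected, scores):
    """Classify with a signed sum restricted to the selected features."""
    s = sum((1 if scores[i] > 0 else -1) * row[i] for i in selected)
    return 1 if s > 0 else -1

rng = random.Random(0)           # fixed seed so the run is repeatable
true_signature = [3, 41, 250]    # hypothetical informative features
n_features = 500                 # far more features than training samples

X_train, y_train = make_data(200, n_features, true_signature, rng)
X_test, y_test = make_data(200, n_features, true_signature, rng)

scores = feature_scores(X_train, y_train)
selected = select_signature(scores, k=3)
accuracy = sum(predict(r, selected, scores) == t
               for r, t in zip(X_test, y_test)) / len(y_test)

print("true signature:    ", true_signature)
print("selected signature:", selected)
print("test accuracy:     ", accuracy)
```

A univariate score like this treats every feature independently, so it can miss features that are informative only in combination. Searching over subsets of features jointly is the part of the task the abstract describes as NP-hard and addresses with message passing.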
