Greedy Biomarker Discovery in the Genome with Applications to Antimicrobial Resistance

Alexandre Drouin; Sébastien Giguère; Maxime Déraspe; François Laviolette; Mario Marchand; Jacques Corbeil

arXiv:1505.06249·q-bio.GN·May 26, 2015

Open Access

TL;DR

This paper extends the Set Covering Machine (SCM) algorithm to extremely high-dimensional genomic data and shows that, when predicting antimicrobial resistance in three human pathogens, it compares favorably in sparsity and accuracy with L1- and L2-regularized Support Vector Machines and CART decision trees.

Contribution

The paper introduces an extension of the SCM for large-scale genomic datasets, enabling direct analysis without feature filtering, and evaluates its performance on antimicrobial resistance prediction.

Findings

01

SCM compares favorably with L1- and L2-regularized SVMs and CART in sparsity and accuracy

02

SCM can analyze the full feature space without filtering

03

SCM is computationally feasible for datasets with over 10^7 features

Abstract

The Set Covering Machine (SCM) is a greedy learning algorithm that produces sparse classifiers. We extend the SCM to datasets that contain a huge number of features. The whole genetic material of living organisms is an example of such a case, where the number of features exceeds 10^7. Three human pathogens were used to evaluate the performance of the SCM at predicting antimicrobial resistance. Our results show that the SCM compares favorably in terms of sparsity and accuracy against L1- and L2-regularized Support Vector Machines and CART decision trees. Moreover, the SCM was the only algorithm that could consider the full feature space. For all other algorithms, the feature space had to be filtered as a preprocessing step.
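
To make the "greedy" and "sparse" parts of the abstract concrete, here is a minimal sketch of the classic SCM greedy loop for learning a conjunction of boolean rules. It is an illustration under stated assumptions, not the authors' implementation: the binary feature matrix, the presence/absence rule set, the trade-off parameter `p`, and the function names `scm_conjunction` and `predict` are all hypothetical choices made for this example, and the paper's extension to more than 10^7 features is not reproduced here.

```python
import numpy as np

def scm_conjunction(X, y, p=1.0, max_rules=5):
    """Greedy SCM sketch: learn a conjunction of boolean rules.

    X: (n_examples, n_features) array of 0/1 values; y: 0/1 labels.
    Candidate rules (an assumption of this sketch):
      index j      -> "feature j is present" (X[:, j] == 1)
      index d + j  -> "feature j is absent"  (X[:, j] == 0)
    Returns the list of selected rule indices.
    """
    X = np.asarray(X, dtype=bool)
    y = np.asarray(y, dtype=bool)
    R = np.hstack([X, ~X])   # R[i, r] = output of rule r on example i
    neg = ~y                 # negative examples not yet covered
    pos = y.copy()           # positive examples not yet misclassified
    chosen = []
    while neg.any() and len(chosen) < max_rules:
        # A rule "covers" a negative example when it outputs False on it,
        # and it "loses" a positive example when it outputs False on it.
        covered_neg = (~R[neg]).sum(axis=0)
        lost_pos = (~R[pos]).sum(axis=0)
        utility = covered_neg - p * lost_pos
        best = int(np.argmax(utility))
        if covered_neg[best] == 0:
            break            # no remaining rule covers any negative example
        chosen.append(best)
        keep = R[:, best]    # only examples the rule accepts stay in play
        neg &= keep
        pos &= keep
    return chosen

def predict(X, chosen):
    """Predict 1 only when every selected rule outputs True."""
    X = np.asarray(X, dtype=bool)
    R = np.hstack([X, ~X])
    return R[:, chosen].all(axis=1).astype(int)
```

Each iteration adds at most one rule, so `max_rules` bounds the size of the model directly; this is where the sparsity comes from. The parameter `p` controls how heavily a rule is penalized for rejecting positive examples, and in practice both values would be chosen by cross-validation.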

Taxonomy

Topics: Machine Learning in Bioinformatics · Gene expression and cancer classification · Text and Document Classification Technologies