# MetaBoot: a machine learning framework of taxonomical biomarker discovery for different microbial communities based on metagenomic data

**Authors:** Xiaojun Wang, Xiaoquan Su, Xinping Cui, Kang Ning

PMC · DOI: 10.7717/peerj.993 · 2015-07-07

## TL;DR

MetaBoot is a new machine learning method for finding non-redundant taxonomical biomarkers in microbial communities using metagenomic data.

## Contribution

MetaBoot combines mRMR and bootstrapping to robustly and accurately discover biomarkers for microbial communities.
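The combination of mRMR and bootstrapping can be illustrated with a minimal sketch. This is not MetaBoot's actual implementation. The function names (`mutual_info`, `discretize`, `mrmr_select`, `bootstrap_mrmr`), the equal-frequency binning, the mutual-information-difference form of the mRMR score, and the ranking of taxa by selection frequency are all assumptions made for illustration, and the abundance table is synthetic.

```python
import numpy as np
from collections import Counter


def mutual_info(a, b):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    _, a_idx = np.unique(np.asarray(a).ravel(), return_inverse=True)
    _, b_idx = np.unique(np.asarray(b).ravel(), return_inverse=True)
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1.0)  # contingency table
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])))


def discretize(X, n_bins=3):
    """Equal-frequency binning per taxon (an assumed preprocessing step)."""
    X = np.asarray(X, dtype=float)
    out = np.zeros(X.shape, dtype=int)
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]
    for j in range(X.shape[1]):
        edges = np.quantile(X[:, j], qs)
        out[:, j] = np.searchsorted(edges, X[:, j], side="right")
    return out


def mrmr_select(Xd, y, k):
    """Greedy mRMR: maximize relevance to y minus mean redundancy with picks."""
    n_features = Xd.shape[1]
    relevance = np.array([mutual_info(Xd[:, j], y) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]
    redundancy_sum = np.zeros(n_features)
    while len(selected) < min(k, n_features):
        last = selected[-1]
        redundancy_sum += np.array(
            [mutual_info(Xd[:, j], Xd[:, last]) for j in range(n_features)]
        )
        score = relevance - redundancy_sum / len(selected)
        score[selected] = -np.inf  # never pick the same taxon twice
        selected.append(int(np.argmax(score)))
    return selected


def bootstrap_mrmr(X, y, k=2, n_boot=50, seed=0):
    """Run mRMR on bootstrap resamples; return (taxon, selection frequency)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[0]
    counts = Counter()
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        if np.unique(y[idx]).size < 2:
            continue  # skip degenerate resamples containing a single class
        counts.update(mrmr_select(discretize(X[idx]), y[idx], k))
    return [(f, c / n_boot) for f, c in counts.most_common()]


# Synthetic abundance table: 60 samples x 5 taxa, two phenotypes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
t0 = 10.0 * y + rng.normal(size=60)        # strongly discriminative taxon
t1 = t0 + 0.01 * rng.normal(size=60)       # near-duplicate of taxon 0
t2 = 1.5 * y + rng.normal(size=60)         # weaker, independent signal
noise = rng.gamma(shape=2.0, size=(60, 2)) # uninformative taxa
X = np.column_stack([t0, t1, t2, noise])

ranking = bootstrap_mrmr(X, y, k=2, n_boot=50)
for taxon, freq in ranking:
    print(f"taxon {taxon}: selected in {freq:.0%} of resamples")
```

In this toy setup, taxa 0 and 1 carry almost the same information, so the redundancy penalty keeps them from being selected together in one resample. Aggregating over resamples then favors taxa that are chosen consistently rather than by chance in a single sample set.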

## Key findings

- MetaBoot outperforms existing methods in selecting non-redundant and discriminative biomarkers.
- It is robust across datasets with varied complexity and taxonomical distribution patterns.
- The method shows high accuracy and biological consistency in biomarker discovery.

## Abstract

As more than 90% of species in a microbial community cannot be isolated and cultivated, metagenomic methods have become one of the most important ways to analyze a microbial community as a whole. With the fast accumulation of metagenomic samples and the advance of next-generation sequencing techniques, it is now possible to qualitatively and quantitatively assess all taxa (features) in a microbial community. A set of taxa, characterized by their presence/absence or by their differing abundances, could potentially be used as taxonomical biomarkers for identifying the corresponding microbial community's phenotype. Though some bioinformatics methods exist for metagenomic biomarker discovery, current methods are not robust, accurate and fast enough at selecting non-redundant biomarkers for predicting a microbial community's phenotype. In this study, we propose a novel method, MetaBoot, that combines the techniques of mRMR (minimal redundancy maximal relevance) and bootstrapping for the discovery of non-redundant biomarkers for microbial communities through mining of metagenomic data. MetaBoot has been tested and compared with other methods on well-designed simulated datasets based on normal and gamma distributions, as well as on publicly available metagenomic datasets. Results show that MetaBoot was robust across datasets of varied complexity and taxonomical distribution patterns and could also select discriminative biomarkers with quite high accuracy and biological consistency. Thus, MetaBoot is suitable for the robust and accurate discovery of taxonomical biomarkers for different microbial communities.

## Full-text entities

- **Genes:** PODXL2 (podocalyxin like 2) [NCBI Gene 50512] {aka EG, PODLX2}
- **Diseases:** periodontal diseases (MESH:D010510), oral disease (MESH:D009059), gingivitis (MESH:D005891), periodontitis (MESH:D010518), oral infections (MESH:D007239), oral (MESH:D020820), endocarditis (MESH:D004696)
- **Species:** Streptococcus (genus) [taxon 1301], Treponema (genus) [taxon 157], Peptostreptococcus (genus) [taxon 1257], Leptotrichia (genus) [taxon 32067], Homo sapiens (human, species) [taxon 9606], Cardiobacterium (genus) [taxon 2717]
- **Cell lines:** S2 — Drosophila melanogaster (Fruit fly), Spontaneously immortalized cell line (CVCL_Z232)

## Figures

19 figures with captions in the complete paper: https://tomesphere.com/paper/PMC4512773/full.md

---
Source: https://tomesphere.com/paper/PMC4512773