TAGINE: fast taxonomy-based feature engineering for microbiome analysis
Shiri Baum, Ido Meshulam, Yadid M Algavi, Omri Peleg, Elhanan Borenstein

TL;DR
TAGINE is a fast feature engineering method for microbiome data that uses the microbial taxonomic tree to build compact, multi-level feature sets for predictive models.
Contribution
TAGINE introduces a novel taxonomy-based feature engineering algorithm that runs faster and produces more compact feature sets than existing standard and taxonomy-based methods.
Findings
TAGINE produces more compact feature sets than other standard and taxonomy-based feature engineering methods.
TAGINE is orders of magnitude faster than existing methods while maintaining accuracy.
The algorithm preserves biological relevance and interpretability.
Abstract
TAGINE is a feature engineering algorithm that leverages the microbial taxonomic tree to optimize feature sets in microbiome data for predictive modeling. The algorithm starts with features at high taxonomic levels and iteratively splits them into lower-level clades in cases where it improves predictive accuracy, ultimately producing a feature set spanning multiple taxonomic levels. This approach aims to markedly reduce the number of features while preserving biological relevance and interpretability. We compare TAGINE’s performance to other standard and taxonomy-based feature engineering methods on several different datasets, and show that TAGINE yields more compact feature sets and is orders of magnitude faster than other methods, while maintaining predictive accuracy. TAGINE is freely available under the MIT license with source code available at…
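The top-down splitting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`clade_abundance`, `loo_1nn_accuracy`, `top_down_split`), the greedy acceptance rule (split only on a strict score gain), the leave-one-out 1-nearest-neighbour score, and the toy abundance table are all assumptions made for illustration. It only shows the general idea of starting at a high taxonomic level and replacing a clade with its children when that improves a predictive score.

```python
from collections import deque


def clade_abundance(clade, children, leaf_abundance):
    """Per-sample abundance of a clade: the sum over all leaves beneath it."""
    if clade in leaf_abundance:
        return list(leaf_abundance[clade])
    totals = None
    for child in children[clade]:
        values = clade_abundance(child, children, leaf_abundance)
        totals = values if totals is None else [a + b for a, b in zip(totals, values)]
    return totals


def loo_1nn_accuracy(columns, labels):
    """Toy stand-in score: leave-one-out 1-nearest-neighbour accuracy."""
    rows = list(zip(*columns))  # one tuple of feature values per sample
    n = len(rows)
    correct = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = min(
            others,
            key=lambda j: sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / n


def top_down_split(start, children, leaf_abundance, labels, score=loo_1nn_accuracy):
    """Greedily replace a clade with its children when the score strictly improves."""
    def evaluate(clades):
        cols = [clade_abundance(c, children, leaf_abundance) for c in clades]
        return score(cols, labels)

    selected = list(start)
    best = evaluate(selected)
    queue = deque(start)
    while queue:
        clade = queue.popleft()
        kids = children.get(clade, [])
        if not kids:  # leaf of the taxonomy: nothing to split
            continue
        idx = selected.index(clade)
        candidate = selected[:idx] + kids + selected[idx + 1:]
        candidate_score = evaluate(candidate)
        if candidate_score > best:  # keep the split only if it helps
            selected, best = candidate, candidate_score
            queue.extend(kids)      # children may be split further
    return selected, best


# Hypothetical toy taxonomy and abundance table (6 samples, 2 classes).
children = {
    "p__Firmicutes": ["g__Lactobacillus", "g__Clostridium"],
    "p__Bacteroidetes": ["g__Bacteroides", "g__Prevotella"],
}
leaf_abundance = {
    "g__Lactobacillus": [8, 7, 9, 2, 3, 1],
    "g__Clostridium": [2, 3, 1, 8, 7, 9],
    "g__Bacteroides": [3, 2, 3, 2, 3, 2],
    "g__Prevotella": [2, 3, 2, 3, 2, 3],
}
labels = [0, 0, 0, 1, 1, 1]

features, accuracy = top_down_split(
    ["p__Firmicutes", "p__Bacteroidetes"], children, leaf_abundance, labels
)
print(features, accuracy)
```

In this toy table the phylum totals are identical across samples, so only the Firmicutes split carries signal. The sketch therefore ends with a mixed-level feature set (two genera plus one unsplit phylum), which mirrors the "feature set spanning multiple taxonomic levels" described above. In practice the score would be a cross-validated model metric rather than this toy classifier.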
Taxonomy
Topics: Gut microbiota and health · Machine Learning in Bioinformatics · Gene expression and cancer classification
