# TAGINE: fast taxonomy-based feature engineering for microbiome analysis

**Authors:** Shiri Baum, Ido Meshulam, Yadid M Algavi, Omri Peleg, Elhanan Borenstein

PMC · DOI: 10.1093/bioadv/vbag056 · 2026-02-17

## TL;DR

TAGINE is a fast feature engineering method for microbiome data. It uses the microbial taxonomic tree to build a compact feature set that spans multiple taxonomic levels and supports predictive modeling.

## Contribution

TAGINE introduces a taxonomy-based feature engineering algorithm that runs faster and yields more compact feature sets than standard and other taxonomy-based feature engineering methods.

## Key findings

- TAGINE yields more compact feature sets than other standard and taxonomy-based feature engineering methods across several datasets.
- TAGINE is orders of magnitude faster than those methods while maintaining predictive accuracy.
- The resulting multi-level feature set is designed to preserve biological relevance and interpretability.

## Abstract

TAGINE is a feature engineering algorithm that leverages the microbial taxonomic tree to optimize feature sets in microbiome data for predictive modeling. The algorithm starts with features at high taxonomic levels and iteratively splits them into lower-level clades in cases where it improves predictive accuracy, ultimately producing a feature set spanning multiple taxonomic levels. This approach aims to markedly reduce the number of features while preserving biological relevance and interpretability. We compare TAGINE’s performance to other standard and taxonomy-based feature engineering methods on several different datasets, and show that TAGINE yields more compact feature sets and is orders of magnitude faster than other methods, while maintaining predictive accuracy.

TAGINE is freely available under the MIT license with source code available at https://github.com/borenstein-lab/tagine_fe.
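
The top-down splitting procedure described in the abstract can be illustrated with a minimal sketch. This is not the code from the `tagine_fe` repository, and it does not reproduce the paper's actual scoring or stopping criteria. The function names (`clade_abundance`, `loo_centroid_accuracy`, `top_down_select`), the dictionary-based taxonomy, the toy abundance data, and the leave-one-out nearest-centroid scorer are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Taxonomy = Dict[str, List[str]]      # clade -> child clades (leaves have no entry)
Abundances = Dict[str, List[float]]  # leaf taxon -> per-sample abundance


def clade_abundance(clade: str, taxonomy: Taxonomy, leaves: Abundances) -> List[float]:
    """Per-sample abundance of a clade: the sum over all leaves below it."""
    children = taxonomy.get(clade, [])
    if not children:
        return list(leaves[clade])
    vectors = [clade_abundance(child, taxonomy, leaves) for child in children]
    return [sum(values) for values in zip(*vectors)]


def loo_centroid_accuracy(features: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Hypothetical stand-in scorer: leave-one-out nearest-centroid accuracy."""
    n = len(labels)
    correct = 0
    for i in range(n):
        centroids = {}
        for label in set(labels):
            rows = [j for j in range(n) if j != i and labels[j] == label]
            if rows:
                centroids[label] = [sum(f[j] for j in rows) / len(rows) for f in features]

        def dist(label: int) -> float:
            return sum((f[i] - c) ** 2 for f, c in zip(features, centroids[label]))

        # Ties are broken in favor of the smallest label.
        predicted = min(sorted(centroids), key=dist)
        correct += predicted == labels[i]
    return correct / n


def top_down_select(
    roots: Sequence[str],
    taxonomy: Taxonomy,
    leaves: Abundances,
    labels: Sequence[int],
    score: Callable[[Sequence[Sequence[float]], Sequence[int]], float] = loo_centroid_accuracy,
) -> Tuple[List[str], float]:
    """Start from high-level clades; split a clade only if the score improves."""
    selected = list(roots)

    def evaluate(clades: Sequence[str]) -> float:
        return score([clade_abundance(c, taxonomy, leaves) for c in clades], labels)

    best = evaluate(selected)
    improved = True
    while improved:
        improved = False
        for clade in list(selected):
            children = taxonomy.get(clade, [])
            if not children:
                continue  # already at the lowest level
            candidate = [c for c in selected if c != clade] + children
            candidate_score = evaluate(candidate)
            if candidate_score > best:
                selected, best, improved = candidate, candidate_score, True
                break  # restart the scan with the updated feature set
    return selected, best


# Toy data (invented): the Firmicutes total is constant across samples,
# but its two classes separate the groups; Bacteroidetes carries no signal.
taxonomy = {
    "Firmicutes": ["Clostridia", "Bacilli"],
    "Bacteroidetes": ["Bacteroidia", "Flavobacteriia"],
}
leaves = {
    "Clostridia": [8, 9, 7, 2, 1, 3],
    "Bacilli": [2, 1, 3, 8, 9, 7],
    "Bacteroidia": [5, 5, 5, 5, 5, 5],
    "Flavobacteriia": [1, 1, 1, 1, 1, 1],
}
labels = [0, 0, 0, 1, 1, 1]

selected, best = top_down_select(["Firmicutes", "Bacteroidetes"], taxonomy, leaves, labels)
print(selected, best)
```

In this toy case only Firmicutes is split, because splitting Bacteroidetes does not raise the score. The selected set therefore mixes phylum-level and class-level features, which mirrors the multi-level output the abstract describes.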

## Full-text entities

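- **Diseases:** gastric cancer (GC; MESH:D013274), obesity (MESH:D009765), Crohn's disease (CD; MESH:D003424), end-stage renal disease (ESRD; MESH:D007676), colorectal cancer (CRC; MESH:D015179), inflammatory bowel disease (IBD; MESH:D015212)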
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12961271/full.md

---
Source: https://tomesphere.com/paper/PMC12961271