# UniCor and UniCorP: a novel metric and hierarchical feature selection algorithm for microbial community analysis

**Authors:** Sebastian Staab, Kim-Isabelle Mayer, Anny Cárdenas, Raquel S Peixoto, Falk Schreiber, Christian R Voolstra

PMC · DOI: 10.1093/ismeco/ycaf174 · ISME Communications · 2025-09-27

## TL;DR

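The paper introduces UniCor, a metric that scores each feature by combining its uniqueness within a dataset with its correlation to a continuous target variable, and UniCorP, an algorithm that uses this metric to select features and propagate them through a dataset's hierarchy, such as the taxonomic levels of a microbiome dataset.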

## Contribution

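UniCor is a novel metric that combines a feature's uniqueness within a dataset with its correlation to a continuous target variable. UniCorP builds on it with a propagation algorithm that exploits the dataset's inherent hierarchy when selecting features.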
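
To make the idea of combining uniqueness with correlation concrete, here is a minimal sketch. This summary does not give the published UniCor formula, so the score below is an illustrative assumption: relevance is taken as the absolute Pearson correlation with the target, and uniqueness as one minus the strongest absolute correlation with any other feature. The function names `pearson` and `toy_score`, the feature names, and the data are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    # Clamp to [-1, 1] to absorb floating-point rounding.
    return max(-1.0, min(1.0, sxy / sqrt(sxx * syy)))


def toy_score(features, target):
    """ASSUMED score, not the published UniCor definition.

    relevance  = |corr(feature, target)|
    uniqueness = 1 - max |corr(feature, any other feature)|
    score      = relevance * uniqueness
    """
    scores = {}
    for name, values in features.items():
        relevance = abs(pearson(values, target))
        redundancy = max(
            (abs(pearson(values, other))
             for other_name, other in features.items()
             if other_name != name),
            default=0.0,
        )
        scores[name] = relevance * (1.0 - redundancy)
    return scores


# Hypothetical toy data: three features measured in five samples.
temperature = [10.0, 12.0, 15.0, 18.0, 21.0]
abundances = {
    "taxon_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "taxon_b": [2.0, 4.0, 6.0, 8.0, 10.0],  # exact multiple of taxon_a
    "taxon_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
scores = toy_score(abundances, temperature)
```

In this toy score, `taxon_a` and `taxon_b` track the target closely but are fully redundant with each other, so both score near zero, while the weakly correlated but distinct `taxon_c` keeps a positive score. How the actual metric balances these two components is described in the full paper.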

## Key findings

- UniCor identifies uniquely correlated entities in microbiome datasets with continuous variables.
- UniCorP improves predictive performance by propagating selected features through taxonomic hierarchies.
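- After propagation with UniCorP, predictive performance in cross-validated Random Forest regressions consistently exceeded that of control trials for taxonomic aggregation, even at the least granular hierarchical level.
- Compared with existing feature aggregation methods, the approach offers stable, competitive predictive performance and feature reduction.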

## Abstract

The rapid advancement of technologies and methods in the life sciences has significantly increased the availability of big data, presenting new challenges for its analysis. Microbiome datasets, in particular, are characterized by extensive feature sets with defined but complex hierarchical structures that are often overlooked or underutilized. Here we introduce a novel metric, UniCor, to identify UNIquely CORrelated eNtities (UNICORNs) in quantitative datasets associated with continuous target variables. These datasets may include microbiome community structures in relation to environmental factors (e.g., temperature, pH, salinity) or biotic variables (e.g., thermal tolerance, oxidative stress). The UniCor metric combines the uniqueness of a given feature within a dataset with its correlation to a target variable of interest. To further enhance its utility, we developed a propagation algorithm (UniCorP), which exploits inherent dataset hierarchies, such as taxonomic levels in microbiome datasets, by selecting and propagating features based on their UniCor metric. Using bacterial community datasets with hierarchical taxonomic annotations and various continuous environmental variables, we demonstrate the ability of the novel metric to reduce features and increase predictive performance in cross-validated Random Forest Regressions. After propagating features with UniCorP and enriching the hierarchical levels with UNICORNs, the predictive performance consistently outperformed control trials for taxonomic aggregation, even at the least granular hierarchical level, allowing a substantial reduction of the feature space. We also compared the metric to existing methods for feature aggregation, showing that it offers stable, competitive predictive performance and feature reduction, within a simple and adaptable framework.
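
The contrast between plain taxonomic aggregation and a hierarchy level enriched with selected features can be sketched as follows. This is one possible reading of the propagation step, not the published UniCorP algorithm: children are summed into their parent, except for features in a `selected` set, which are carried up as their own columns. The function `aggregate_level`, the taxon names, and the counts are hypothetical.

```python
def aggregate_level(abundances, parent_of, selected=frozenset()):
    """Sum per-sample abundances of children into their parent taxon.

    Features listed in `selected` are not merged; they are carried to the
    coarser level unchanged. This is an ASSUMED reading of "propagation".
    """
    aggregated = {}
    for name, values in abundances.items():
        key = name if name in selected else parent_of[name]
        if key in aggregated:
            aggregated[key] = [a + b for a, b in zip(aggregated[key], values)]
        else:
            aggregated[key] = list(values)
    return aggregated


# Hypothetical genus-level counts for three samples.
genus_counts = {
    "genus_1": [30, 10, 5],
    "genus_2": [2, 3, 1],
    "genus_3": [1, 8, 20],
}
# Hypothetical genus -> family mapping.
family_of = {
    "genus_1": "family_A",
    "genus_2": "family_A",
    "genus_3": "family_B",
}

# Control: plain aggregation to family level.
plain = aggregate_level(genus_counts, family_of)

# Enriched: genus_1 is treated as a selected feature and kept separate.
enriched = aggregate_level(genus_counts, family_of, selected={"genus_1"})
```

Here `plain` has two family-level columns, while `enriched` has three: the selected genus plus the two families built from the remaining genera. The feature space stays small, yet the signal of the selected feature is not diluted by its siblings.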

## Full-text entities

- **Genes:** DCLK3 (doublecortin like kinase 3) [NCBI Gene 85443] {aka CLR, DCAMKL3, DCDC3C, DCK3}
- **Diseases:** ML (MESH:D007859)
- **Species:** Stylophora pistillata (species) [taxon 50429], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12560769/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12560769/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC12560769/full.md

---
Source: https://tomesphere.com/paper/PMC12560769