# Host-Filtered Blood Nucleic Acids for Pathogen Detection: Shared Background, Sparse Signal, and Methodological Limits

**Authors:** Zhaoxia Wang, Guangchan Chen, Mei Yang, Saihua Wang, Jiahui Fang, Ce Shi, Yuying Gu, Zhongping Ning

PMC · DOI: 10.3390/pathogens15010055 · 2026-01-06

## TL;DR

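This study re-analyzes host-filtered plasma cell-free RNA (cfRNA) sequencing data from tuberculosis and coronary artery disease cohorts to test blood-based pathogen detection. It finds that most non-host signal comes from a shared, low-abundance microbial background rather than from specific disease-causing organisms.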

## Contribution

The paper provides a quantitative benchmark of the limitations of plasma cfRNA metagenomics for pathogen detection.

## Key findings

- Classified non-host reads formed a small fraction of total cfRNA, dominated by low-abundance skin, oral, and environmental taxa.
- Mycobacterium tuberculosis reads were sparse (≤0.001% of total cfRNA) and occurred at similar orders of magnitude in a subset of TB-negative samples, precluding reliable discrimination.
- Taxa that appeared visually 'enriched' in TB-positive samples came mainly from background-associated clades, not from a distinct pathogen-specific cluster.
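
The finding about M. tuberculosis reads is a statement about discriminability: if TB-negative samples reach the same range of Mtb-assigned read fractions as TB-positive ones, no cut-off separates the groups. A minimal sketch, assuming per-sample Mtb read fractions have already been computed, of two simple checks: a rank-based empirical AUC (the Mann-Whitney U statistic normalized by the number of positive/negative pairs) and a range-overlap test. The function names are hypothetical, and these are not necessarily the statistics the authors used.

```python
from itertools import product
from typing import Sequence


def empirical_auc(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """Probability that a randomly chosen positive sample scores higher than a
    randomly chosen negative one, counting ties as 0.5.

    This equals the Mann-Whitney U statistic divided by n_pos * n_neg;
    0.5 means no discrimination, 1.0 means perfect separation.
    """
    if not positives or not negatives:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for pos, neg in product(positives, negatives):
        if pos > neg:
            wins += 1.0
        elif pos == neg:
            wins += 0.5
    return wins / (len(positives) * len(negatives))


def ranges_overlap(positives: Sequence[float], negatives: Sequence[float]) -> bool:
    """True when the highest negative value reaches the lowest positive value,
    i.e. no single cut-off classifies every sample correctly."""
    return max(negatives) >= min(positives)
```

Here `positives` and `negatives` would hold per-sample fractions of Mtb-assigned reads for TB-positive and TB-negative plasma. Any overlap keeps the AUC below 1.0 and makes `ranges_overlap` return `True`.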

## Abstract

Plasma cell-free RNA (cfRNA) metagenomics is increasingly explored for blood-based pathogen detection, but the structure of the shared background “blood microbiome”, the reproducibility of reported signals, and the practical limits of this approach remain unclear. We performed a critical re-analysis and benchmarking (“stress test”) of host-filtered blood RNA sequencing data from two cohorts: a bacteriologically confirmed tuberculosis (TB) cohort (n = 51) previously used only to derive host cfRNA signatures, and a coronary artery disease (CAD) cohort (n = 16) previously reported to show a CAD-shifted “blood microbiome” enriched for periodontal taxa. Both datasets were processed with a unified pipeline combining stringent human read removal with taxonomic profiling by the latest versions of the specialized tools Kraken2 and MetaPhlAn4. Across both cohorts, only a minority of non-host reads were classifiable; under strict host filtering, classified non-host reads comprised 7.3% (5.0–12.0%) in CAD and 21.8% (5.4–31.5%) in TB, which still represents only a small fraction of total cfRNA. Classified non-host communities were dominated by recurrent, low-abundance taxa from skin, oral, and environmental lineages, forming a largely shared, low-complexity background in both TB and CAD. Background-derived bacterial signatures showed only modest separation between disease and control groups, with wide intra-group variability. Mycobacterium tuberculosis-assigned reads were detectable in many TB-positive samples but accounted for ≤0.001% of total cfRNA and occurred at similar orders of magnitude in a subset of TB-negative samples, precluding robust discrimination. Phylogeny-aware visualization confirmed that visually “enriched” taxa in TB-positive plasma arose mainly from background-associated clades rather than from a distinct pathogen-specific cluster.
Collectively, these findings provide a quantitative benchmark of the background-dominated regime and practical limits of plasma cfRNA metagenomics for pathogen detection. Practical performance is constrained more by a shared, low-complexity background and sparse pathogen-derived fragments than by large disease-specific shifts, which underscores the need for transparent host filtering, explicit background modeling, and integration with targeted or orthogonal assays.
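
The call for explicit background modeling can be made concrete in several ways. One simple option, sketched here under stated assumptions, is a prevalence filter: taxa detected in a large share of both case and control samples are treated as shared background rather than disease signal. The per-sample dict layout, the function names, and the 0.5 prevalence threshold are hypothetical choices for illustration, not the authors' method.

```python
from typing import Dict, List, Set

# Hypothetical layout: one dict per sample mapping taxon name -> read count.
Sample = Dict[str, int]


def prevalence(samples: List[Sample]) -> Dict[str, float]:
    """Fraction of samples in which each taxon has at least one read."""
    if not samples:
        return {}
    detected: Dict[str, int] = {}
    for sample in samples:
        for taxon, reads in sample.items():
            if reads > 0:
                detected[taxon] = detected.get(taxon, 0) + 1
    return {taxon: n / len(samples) for taxon, n in detected.items()}


def flag_background(cases: List[Sample], controls: List[Sample],
                    min_prevalence: float = 0.5) -> Set[str]:
    """Taxa detected in at least `min_prevalence` of BOTH groups are treated as
    shared background rather than disease-specific signal.
    The 0.5 default is an arbitrary illustrative threshold."""
    prev_cases = prevalence(cases)
    prev_controls = prevalence(controls)
    return {
        taxon
        for taxon in prev_cases.keys() & prev_controls.keys()
        if prev_cases[taxon] >= min_prevalence
        and prev_controls[taxon] >= min_prevalence
    }


def remove_background(sample: Sample, background: Set[str]) -> Sample:
    """Drop flagged background taxa from one sample's profile."""
    return {taxon: reads for taxon, reads in sample.items() if taxon not in background}
```

A filter of this kind only removes taxa shared across groups; it cannot by itself rescue a pathogen signal whose abundance overlaps between cases and controls.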

## Linked entities

- **Diseases:** tuberculosis (MONDO:0018076), coronary artery disease (MONDO:0005010)
- **Species:** Mycobacterium tuberculosis (taxon 1773)

## Full-text entities

- **Diseases:** TB (MESH:D014376), CAD (MESH:D003324)
- **Species:** Homo sapiens (human, species) [taxon 9606], Mycobacterium tuberculosis (species) [taxon 1773]
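
The NCBI taxonomy IDs listed above (9606 for the human host, 1773 for Mycobacterium tuberculosis) are the identifiers that classifiers such as Kraken2 write into their per-sample reports. A minimal sketch, assuming standard Kraken2 report files produced from host-filtered reads and a separately recorded total read count per sample, of how the two kinds of quantities quoted in the abstract (classified non-host fraction; Mtb reads as a fraction of total cfRNA) could be derived. The function names and the choice to subtract residual human-classified reads are assumptions, not the authors' pipeline.

```python
from typing import Dict, Iterable, Tuple

UNCLASSIFIED_TAXID = 0
ROOT_TAXID = 1
HUMAN_TAXID = 9606  # Homo sapiens
MTB_TAXID = 1773    # Mycobacterium tuberculosis


def parse_kraken2_report(lines: Iterable[str]) -> Dict[int, int]:
    """Map NCBI taxid -> clade read count from Kraken2 report lines.

    Standard report columns (tab-separated): percent, clade reads, direct
    reads, rank code, taxid, indented name. With --report-minimizer-data two
    extra columns precede the rank code, so the taxid is taken from the
    second-to-last column and the clade count from the second column.
    """
    clade_reads: Dict[int, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads[int(fields[-2])] = int(fields[1])
    return clade_reads


def summarize(clade_reads: Dict[int, int], total_reads: int) -> Tuple[float, float]:
    """Return (classified non-host fraction, Mtb reads / total reads).

    Assumes the report was produced on host-filtered reads; any residual
    human-classified reads are subtracted as host. `total_reads` is the
    sequenced read count before host removal, supplied separately.
    """
    unclassified = clade_reads.get(UNCLASSIFIED_TAXID, 0)
    classified = clade_reads.get(ROOT_TAXID, 0)
    human = clade_reads.get(HUMAN_TAXID, 0)
    non_host = classified + unclassified - human
    classified_non_host = classified - human
    classified_fraction = classified_non_host / non_host if non_host else 0.0
    mtb_fraction = clade_reads.get(MTB_TAXID, 0) / total_reads if total_reads else 0.0
    return classified_fraction, mtb_fraction
```

Because `parse_kraken2_report` accepts any iterable of lines, an open file handle for a (hypothetically named) report file can be passed to it directly. A threshold of ≤0.001% of total cfRNA corresponds to an `mtb_fraction` of at most 1e-5.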

## Figures

The complete paper includes 3 figures with captions: https://tomesphere.com/paper/PMC12845112/full.md

---
Source: https://tomesphere.com/paper/PMC12845112