# Towards accurate artificial intelligence models for strain-level phage–host prediction

**Authors:** Chris J Malajczuk, Andrew Vaitekenas, Joshua J Iszatt, Stephen M Stick, Anthony Kicic, Yuliya V Karpievitch

PMC · DOI: 10.1093/bib/bbag085 · Briefings in Bioinformatics · 2026-02-26

## TL;DR

This paper reviews AI models for predicting phage–host interactions at the strain level, focusing on their accuracy and their applicability to phage therapy.

## Contribution

The paper surveys AI-driven strain-level phage–host prediction methods, from biologically grounded feature-based models to end-to-end deep learning architectures, and analyses the challenges they face in real-world settings.

## Key findings

- Current AI models are limited by sparse and imbalanced data on phage–host interactions.
- Evaluation strategies significantly influence the perceived performance of strain-level prediction models.
- Improving model robustness and interpretability is crucial for clinical translation of phage therapy.
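Sparse, imbalanced labels interact with metric choice. The sketch below is a toy illustration, not taken from the paper: it assumes 100 hypothetical phage–strain pairs of which only 10 are lytic, and shows that a trivial model predicting "no lysis" everywhere scores high accuracy while balanced accuracy and the Matthews correlation coefficient (MCC) expose it as uninformative.

```python
import math

# Hypothetical toy labels: 1 = lytic interaction, 0 = no lysis.
y_true = [1] * 10 + [0] * 90
# A degenerate "model" that always predicts no lysis.
y_pred = [0] * 100


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) counts for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity + specificity) / 2


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; defined as 0.0 when undefined."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


print(accuracy(y_true, y_pred))           # → 0.9
print(balanced_accuracy(y_true, y_pred))  # → 0.5
print(mcc(y_true, y_pred))                # → 0.0
```

The 90% accuracy here reflects only the class ratio, which is one reason the choice of evaluation metric matters when lytic pairs are rare.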

## Abstract

Strain-level prediction of phage–host interactions (PHIs) is essential for developing targeted phage therapies. Traditional empirical and homology-based methods often lack the resolution and scalability needed for precision applications. Recently, a new generation of artificial intelligence-driven models has emerged leveraging genomic information to infer PHIs at strain-level resolution. Here, we review recent advances in strain-level PHI prediction, spanning biologically grounded feature-based models, hybrid representation-learning frameworks, phylogeny-agnostic machine learning approaches, and end-to-end deep learning architectures. We examine how these modelling strategies navigate shared structural constraints arising from sparse and imbalanced outcome data, assay-dependent labels, infection complexity, and limited generalization. We further analyse how evaluation design, negative definition, and train-test splitting strategies shape apparent strain-level performance, and why inappropriate benchmarking can inflate claims of biological resolution. Framing these issues in the context of clinical phage therapy, we examine how current strain-level PHI prediction frameworks perform under the biological, experimental, and data constraints characteristic of real-world therapeutic settings. Finally, we outline pragmatic pathways toward more robust, interpretable, and clinically translatable PHI prediction systems.
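To make the point about train-test splitting concrete, here is a minimal sketch contrasting a random split over phage–host pairs with a host-held-out split. It is an illustrative assumption rather than the authors' benchmarking protocol, and the phage and strain names and helper functions are hypothetical. With a random pair-level split, the same host strains appear in both training and test sets, so test performance partly reflects memorised host identity. Holding out whole strains asks the model to generalise to unseen hosts.

```python
import random

# Hypothetical interaction matrix: 8 phages x 10 host strains = 80 pairs.
phages = [f"phage_{i}" for i in range(8)]
hosts = [f"strain_{j}" for j in range(10)]
pairs = [(p, h) for p in phages for h in hosts]


def random_pair_split(pairs, test_frac, seed=0):
    """Shuffle individual (phage, host) pairs and cut off a test fraction."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def host_held_out_split(pairs, test_frac, seed=0):
    """Assign whole host strains to the test set so no test host is seen in training."""
    unique_hosts = sorted({h for _, h in pairs})
    rng = random.Random(seed)
    rng.shuffle(unique_hosts)
    n_test = max(1, int(len(unique_hosts) * test_frac))
    test_hosts = set(unique_hosts[:n_test])
    train = [p for p in pairs if p[1] not in test_hosts]
    test = [p for p in pairs if p[1] in test_hosts]
    return train, test


def shared_hosts(train, test):
    """Host strains present in both splits (a source of leakage)."""
    return {h for _, h in train} & {h for _, h in test}


rand_train, rand_test = random_pair_split(pairs, test_frac=0.25)
held_train, held_test = host_held_out_split(pairs, test_frac=0.2)

print(len(shared_hosts(rand_train, rand_test)))  # several hosts leak across splits
print(len(shared_hosts(held_train, held_test)))  # → 0
print(len(held_train), len(held_test))           # → 64 16
```

The same grouping idea applies to holding out phages, or both phages and hosts at once. Each choice tests a different kind of generalisation, so the split should match the intended use case.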

## Full-text entities

- **Diseases:** Cystic Fibrosis (MESH:D003550), Infection (MESH:D007239), PBIP (MESH:C563663)
- **Chemicals:** LPS (MESH:D008070), O-antigen (MESH:D019081), PxPC (-)
- **Species:** Klebsiella pneumoniae (species) [taxon 573], Homo sapiens (human, species) [taxon 9606], Pseudomonas aeruginosa (species) [taxon 287], Escherichia coli (E. coli, species) [taxon 562]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12936788/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12936788/full.md

## References

134 references — full list in the complete paper: https://tomesphere.com/paper/PMC12936788/full.md

---
Source: https://tomesphere.com/paper/PMC12936788