Pan-infection Foundation Framework Enables Multiple Pathogen Prediction
Lingrui Zhang, Haonan Wu, Nana Jin, Chenqing Zheng, Jize Xie, Qitai Cai, Jun Wang, Qin Cao, Xubin Zheng, Jiankun Wang, Lixin Cheng

TL;DR
This study introduces a comprehensive host-response transcriptome framework for accurate, generalizable pathogen prediction, utilizing a large dataset and knowledge distillation to create lightweight, disease-specific diagnostic models suitable for clinical use.
Contribution
The paper presents the largest infection host-response dataset and a novel knowledge distillation approach to develop accurate, lightweight pathogen diagnostic models from a pan-infection foundation.
Findings
Achieved high diagnostic accuracy with AUCs above 0.93 for multiple pathogens.
Developed lightweight models suitable for clinical deployment.
Enabled cross-disease analysis from pan-infection to sepsis.
Abstract
Host-response-based diagnostics can improve the accuracy of diagnosing bacterial and viral infections, thereby reducing inappropriate antibiotic prescriptions. However, existing cohorts, with their limited sample sizes and coarse infection types, cannot support the development of an accurate and generalizable diagnostic model. Here, we curate the largest infection host-response transcriptome dataset, comprising 11,247 samples across 89 blood transcriptome datasets from 13 countries and 21 platforms. We build a diagnostic model for pathogen prediction, starting from a pan-infection model (AUC = 0.97) trained on the pan-infection dataset as the foundation. Then, we use knowledge distillation to efficiently transfer the insights from this "teacher" model to four lightweight pathogen "student" models, i.e., staphylococcal infection (AUC = 0.99), streptococcal infection (AUC = 0.94), HIV…
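The abstract does not spell out the distillation objective, so the following is only a minimal sketch of the commonly used soft-target formulation (temperature-scaled KL divergence between teacher and student, blended with cross-entropy on the true label). The function names, the `temperature` and `alpha` parameters, and the toy logits are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical per-sample distillation loss (an assumption, not the paper's).

    soft term: KL(teacher || student) on temperature-softened outputs,
               scaled by temperature**2 as in the standard formulation
    hard term: cross-entropy of the student against the true class index
    alpha:     weight of the soft term; (1 - alpha) weights the hard term
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    cross_entropy = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * cross_entropy

# Toy two-class example (e.g. "pathogen present" vs "absent"); logits are made up.
teacher = [2.0, -1.0]
agreeing_student = [1.5, -0.5]
disagreeing_student = [-1.0, 2.0]
loss_agree = distillation_loss(agreeing_student, teacher, label=0)
loss_disagree = distillation_loss(disagreeing_student, teacher, label=0)
```

In this sketch, a student whose outputs track the teacher's receives a smaller loss than one that contradicts it, which is the mechanism by which a lightweight student can inherit behavior from a larger foundation model.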
Taxonomy
Topics: COVID-19 diagnosis using AI
Methods: Knowledge Distillation
