A Clinical Prediction Model for Bacterial Coinfection in Children with Respiratory Syncytial Virus Infection: A Development and Validation Study
Di Lian, Jianxing Wei, Dong Wang, Meiling Xie, Chenye Lin, Qiuyu Tang

TL;DR
This study created a model to identify bacterial coinfections in children with RSV using blood markers, helping reduce unnecessary antibiotic use.
Contribution
A novel clinical prediction model using NLR, CRP, and SAA for bacterial coinfection in RSV-infected children.
Findings
The model achieved an AUC of 0.832 in the training set and 0.811 in the test set.
NLR, CRP, and SAA were identified as key predictors of bacterial coinfection.
The model showed good calibration and clinical utility across various threshold probabilities.
Abstract
Objectives: Respiratory syncytial virus (RSV) is a leading cause of hospitalization for acute lower respiratory tract infections (ALRIs) in children, with bacterial coinfection complicating diagnosis and often driving antibiotic overuse. This study aimed to develop and validate a clinical prediction model using common laboratory biomarkers to enable early, accurate identification of clinically significant bacterial coinfection in children with RSV infection. Methods: A single-center, retrospective cohort study was conducted at Fujian Children’s Hospital, enrolling 518 hospitalized children with RSV infection, which was confirmed via targeted next-generation sequencing (tNGS). Patients were randomly divided into a training set (n = 363) and a test set (n = 155) at a 7:3 ratio. The primary outcome, bacterial coinfection, was defined by a composite reference standard integrating…
Funding: Natural Science Foundation of Fujian Province.
1. Introduction
Globally, respiratory syncytial virus (RSV) is recognized as the primary pathogen responsible for acute lower respiratory tract infections (ALRIs) in the pediatric population, representing a significant challenge to public health systems [1]. A key challenge in managing RSV infections is the early detection of bacterial coinfection: coinfection exacerbates disease severity, and uncertainty about its presence drives antibiotic overuse, particularly when clinical presentations overlap with viral pathology [2,3]. Targeted next-generation sequencing (tNGS) has substantially improved the sensitivity of pathogen detection. However, distinguishing clinically significant bacterial coinfection from colonization remains difficult and often requires integrating several laboratory findings [4].
Conventional inflammatory indices, such as C-reactive protein (CRP) and white blood cell count (WBC), lack sufficient diagnostic accuracy for detecting bacterial coinfection in patients with RSV [5]. Emerging evidence suggests that novel inflammatory biomarkers, such as the neutrophil-to-lymphocyte ratio (NLR) and serum amyloid A (SAA), captured through routine laboratory testing, may enhance diagnostic accuracy by reflecting distinct immune responses [6]. However, the diagnostic potential of individual markers is constrained, highlighting the need for multivariable models that leverage laboratory data [7]. The integration of tNGS with conventional biomarkers presents a unique opportunity in laboratory medicine to develop precise diagnostic tools.
The primary objective of this research was to construct and internally validate a predictive tool that integrates NLR, CRP, and SAA for the detection of bacterial coinfection in children with tNGS-verified RSV infection. We further developed a nomogram to visualize this model, providing clinicians with a practical tool to interpret complex laboratory data. This approach aims to support evidence-based antibiotic prescribing and facilitate precision medicine in pediatric RSV cases.
2. Materials and Methods
2.1. Study Design and Ethical Statement
This single-center, retrospective cohort study was conducted at Fujian Children’s Hospital. The study protocol was approved by the Ethics Committee of Fujian Children’s Hospital (Approval No.: 2025ETKLRK10017) and adhered to the Declaration of Helsinki. Individual informed consent was waived due to the retrospective nature and the use of anonymized data. The development and reporting of this prediction model strictly adhered to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement [8].
2.2. Study Population
We screened all pediatric patients (aged 28 days to 14 years) hospitalized with a primary diagnosis of RSV infection between January 2022 and August 2025. While the initial clinical diagnosis of RSV was established via either multiplex PCR or tNGS, eligibility for this specific analysis was strictly limited to patients with available tNGS results from respiratory tract specimens (nasopharyngeal swabs or bronchoalveolar lavage fluid). To ensure cohort homogeneity, the following exclusion criteria were applied: (1) presence of severe underlying comorbidities that could confound infection assessment (e.g., congenital heart disease, severe immunodeficiency); (2) administration of systemic antibiotics for more than 48 h prior to admission; (3) incomplete electronic medical records regarding key modeling variables; and (4) co-detection of other respiratory viruses (e.g., adenovirus, influenza) or atypical pathogens (e.g., Mycoplasma pneumoniae) via tNGS. This final criterion was implemented to focus the analysis exclusively on distinguishing pure RSV infection from RSV combined with typical bacterial coinfection. Following this selection process, eligible patients were randomly partitioned into a training set and a test set at a 7:3 ratio for model development and internal validation.
2.3. Data Collection and Definitions
Data were extracted from the hospital’s electronic medical record (EMR) system by two independent researchers using a standardized form, with discrepancies resolved by consensus. Collected variables included: demographic data (age, sex, weight), clinical outcomes (hospital stay length, severity, ICU admission, mechanical ventilation), imaging findings (chest X-ray/CT: increased markings, consolidation, infiltrates, pleural effusion), and laboratory parameters (first 24 h results: WBC, platelet count, neutrophil/lymphocyte counts, CRP, procalcitonin, SAA, ferritin, LDH, albumin, ALT, AST, D-dimer). Derived indices included NLR and PLR (platelet/lymphocyte ratio). RSV subtyping (A/B) was determined via tNGS.
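As an illustration of the two derived indices, the sketch below computes NLR and PLR from absolute counts taken from the same blood draw. It is a minimal example, not the study's extraction code; the `BloodCount` type and `derived_indices` function are hypothetical names, and the example values are invented.

```python
from dataclasses import dataclass


@dataclass
class BloodCount:
    """Absolute counts from one blood draw, in 10^9/L (units cancel in the ratios)."""
    neutrophils: float
    lymphocytes: float
    platelets: float


def derived_indices(bc: BloodCount) -> dict:
    """Return NLR (neutrophil/lymphocyte) and PLR (platelet/lymphocyte)."""
    # Both ratios share the lymphocyte count as denominator, so it must be positive.
    if bc.lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive to form the ratios")
    return {
        "NLR": bc.neutrophils / bc.lymphocytes,
        "PLR": bc.platelets / bc.lymphocytes,
    }


# Invented example values, for illustration only.
print(derived_indices(BloodCount(neutrophils=3.0, lymphocytes=2.0, platelets=300.0)))
```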
2.4. Targeted Next-Generation Sequencing (tNGS) and Pathogen Identification
Respiratory tract specimens (nasopharyngeal swabs or bronchoalveolar lavage fluid) were typically collected within 24 h of admission and transported to certified third-party clinical laboratories for analysis. Pathogen identification was performed using commercial multiplex PCR-based tNGS assays provided by Dian Diagnostics (Hangzhou, China) or KingMed Diagnostics (Fuzhou/Hangzhou, China). Briefly, total nucleic acids (DNA and RNA) were extracted from 300 µL of each clinical sample using automated extraction systems, and RNA was reverse-transcribed into cDNA for the detection of RNA viruses. Library preparation involved multiplex PCR amplification with primers targeting hypervariable regions or specific gene sequences of the pathogens. The libraries were then sequenced on high-throughput platforms such as the Illumina NextSeq (Illumina, San Diego, CA, USA) or MGISEQ-2000 (MGI Tech, Shenzhen, China). The bioinformatic pipeline included adapter trimming, removal of low-quality reads and human host sequences, and alignment of high-quality reads against a curated reference database covering more than 200 respiratory pathogens. Both assays used internal standards to enable semi-quantitative analysis (reported as copies/mL), with a lower limit of detection of 100 to 500 copies/mL.
2.5. Outcome Definition and Adjudication
The primary endpoint, clinically significant bacterial coinfection, was determined using a composite reference standard (CRS) [9]. A patient was classified as coinfected only if tNGS revealed a substantial bacterial load (sequence read count > 10,000 or concentration > 10³ copies/mL) AND this was accompanied by at least two of the following clinical indicators: (1) worsening clinical status (e.g., persistent fever, increased work of breathing); (2) elevated inflammatory biomarkers (procalcitonin > 0.5 µg/L or CRP > 20 mg/L); (3) radiological evidence of new consolidation or infiltrates; and (4) a positive therapeutic response to targeted antibiotics. Although individual indicators are nonspecific in isolation, requiring at least two of them to coincide with a high bacterial load on tNGS is intended to yield a highly specific diagnostic profile. To distinguish true pathogenicity from colonization, a critical challenge in respiratory diagnostics, the final classification was adjudicated by a blinded panel of two independent pediatricians, with discrepancies resolved by a senior consultant. The adjudication triangulated the available microbiological, clinical, inflammatory, and imaging data and explicitly accounted for differences in diagnostic value between specimen types (nasopharyngeal swabs vs. bronchoalveolar lavage fluid) to minimize misclassification bias [10].
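The written part of the CRS can be expressed as a screening rule: a high tNGS bacterial load plus at least two of the four clinical indicators. The sketch below is a hypothetical encoding of that rule only (the `CandidateCase` fields and the function names are illustrative, and the example values are invented). It does not capture the blinded panel's judgment on colonization versus infection, which remained the final arbiter.

```python
from dataclasses import dataclass


@dataclass
class CandidateCase:
    read_count: int                 # bacterial sequence reads reported by tNGS
    copies_per_ml: float            # semi-quantitative bacterial load reported by tNGS
    worsening_status: bool          # indicator (1): e.g. persistent fever, increased work of breathing
    procalcitonin_ug_l: float       # indicator (2), procalcitonin part
    crp_mg_l: float                 # indicator (2), CRP part
    new_consolidation: bool         # indicator (3): new consolidation or infiltrates on imaging
    responded_to_antibiotics: bool  # indicator (4): response to targeted antibiotics


def high_bacterial_load(c: CandidateCase) -> bool:
    # Read count > 10,000 OR concentration > 10^3 copies/mL.
    return c.read_count > 10_000 or c.copies_per_ml > 1e3


def clinical_indicator_count(c: CandidateCase) -> int:
    # Indicator (2) counts once if either biomarker exceeds its cut-off.
    inflammatory = c.procalcitonin_ug_l > 0.5 or c.crp_mg_l > 20
    return sum([c.worsening_status, inflammatory,
                c.new_consolidation, c.responded_to_antibiotics])


def meets_crs_screen(c: CandidateCase) -> bool:
    """True if the written threshold rule is met; panel adjudication is not modeled."""
    return high_bacterial_load(c) and clinical_indicator_count(c) >= 2
```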
2.6. Statistical Analysis
The hospital dataset was randomly split into a training set (n = 363) and a test set (n = 155) at a 7:3 ratio using stratified sampling. Data distribution was assessed with the Shapiro–Wilk test; normally distributed variables are summarized as mean ± standard deviation and non-normally distributed variables as median (interquartile range). Categorical variables were compared using Pearson’s chi-squared or Fisher’s exact test, and continuous variables using Student’s t-test or the Mann–Whitney U test, depending on normality. Records containing missing values (approximately 5.2%) were excluded from the final analysis. In the training set, Least Absolute Shrinkage and Selection Operator (LASSO) regression with 10-fold cross-validation (glmnet package in R) was used to select predictors of bacterial coinfection, which were then entered into a multivariable logistic regression model. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC; range 0.5–1.0) for discrimination, the Hosmer–Lemeshow test for calibration, and decision curve analysis (DCA) for net benefit [11,12]. Statistical computations were performed using R software (version 4.2.2, R Foundation for Statistical Computing, Vienna, Austria) and MSTATA software (https://www.mstata.com/). A two-sided p-value < 0.05 was considered statistically significant.
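The 7:3 stratified split can be sketched as follows. The section does not state the stratification variable, so this sketch assumes stratification on the outcome label. The function name and default seed are hypothetical, and the study itself used R rather than Python.

```python
import random
from collections import defaultdict


def stratified_split(ids, labels, train_frac=0.7, seed=2025):
    """Split ids into (train, test), keeping label proportions similar in both sets.

    Assumption: stratification on the outcome label; seed is an arbitrary choice.
    """
    by_label = defaultdict(list)
    for i, y in zip(ids, labels):
        by_label[y].append(i)
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    train, test = [], []
    for y in sorted(by_label):
        group = by_label[y][:]
        rng.shuffle(group)
        n_train = round(len(group) * train_frac)
        train.extend(group[:n_train])
        test.extend(group[n_train:])
    return train, test
```

Stratifying on the outcome keeps the coinfection prevalence close to equal in the two sets, which matters when the test set is small.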
3. Results
3.1. Patient Enrollment and Baseline Characteristics
Between January 2022 and August 2025, 2615 children hospitalized with RSV infection were screened at Fujian Children’s Hospital. Of these, 1235 underwent targeted next-generation sequencing (tNGS). Among them, 102 were excluded due to severe underlying conditions (e.g., congenital heart disease), 376 due to systemic antibiotic use > 48 h before admission, 47 due to co-detection of other respiratory pathogens (e.g., adenovirus, influenza), and 192 due to incomplete records, yielding 518 eligible patients. These were randomly allocated to a training set (n = 363) and a test set (n = 155) at a 7:3 ratio (Figure 1). Baseline characteristics did not differ significantly between the two sets in sex (male: 57.9% vs. 60.6%, p = 0.554), median age (12 vs. 12 months, p = 0.271), or weight (10.7 vs. 9.9 kg, p = 0.325), and laboratory and clinical features were also balanced (all p > 0.05, Table 1), consistent with successful randomization.
3.2. Univariate Analysis of Risk Factors for Bacterial Coinfection
In the training set (n = 363), 129 patients (35.5%) were adjudicated as having bacterial coinfection. In univariate analysis, the coinfection group had a higher median age (24 vs. 12 months, p < 0.001) and higher white blood cell count (10.2 vs. 8.2 × 10⁹/L, p < 0.001), NLR (1.64 vs. 0.54, p < 0.001), PLR (103 vs. 74, p < 0.001), CRP (11 vs. 3 mg/L, p < 0.001), procalcitonin (0.12 vs. 0.09 µg/L, p < 0.001), and SAA (52 vs. 20 mg/L, p < 0.001) than the non-coinfection group (Table 2), suggesting that these inflammatory markers may help distinguish the two infection states.
3.3. Development of the Predictive Model via LASSO Regression
LASSO regression with 10-fold cross-validation in the training set identified NLR, CRP, and SAA as key predictors at an optimal penalty coefficient of λ = 0.0943 (Figure 2A). The coefficient profile plot shows the other candidate variables shrinking to zero while these three markers are retained (Figure 2B). Multivariable logistic regression confirmed that each remained independently associated with bacterial coinfection: NLR (OR = 2.13, 95% CI: 1.64–2.79, p < 0.001), CRP (OR = 1.03, 95% CI: 1.01–1.06, p = 0.017), and SAA (OR = 1.01, 95% CI: 1.00–1.01, p = 0.007) (Table 3).
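For reference, the LASSO step and the final model can be written out. With glmnet's binomial family and its default α = 1 (a pure L1 penalty), the coefficients at penalty λ minimize the penalized negative log-likelihood in the first line, where y_i ∈ {0, 1} is the adjudicated outcome and x_i the vector of p candidate predictors for patient i. The second line is the refitted three-predictor logistic model, and the third gives approximate coefficients recovered as β_j = ln(OR_j) from the rounded odds ratios in Table 3; the intercept β_0 is not reported in this section.

```latex
\[
(\hat\beta_0,\hat\beta) = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Big[\,y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big]
  + \lambda \sum_{j=1}^{p} \lvert\beta_j\rvert
\]
\[
\operatorname{logit}(p) = \ln\frac{p}{1-p}
  = \beta_0 + \beta_1\,\mathrm{NLR} + \beta_2\,\mathrm{CRP} + \beta_3\,\mathrm{SAA}
\]
\[
\beta_1 = \ln 2.13 \approx 0.756, \qquad
\beta_2 = \ln 1.03 \approx 0.030, \qquad
\beta_3 = \ln 1.01 \approx 0.010
\]
```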
3.4. Nomogram for Clinical Application
Based on the developed model, a nomogram was constructed to estimate bacterial coinfection probability, integrating NLR, CRP, and SAA values. Clinicians can sum points from each variable’s axis and project the total onto a risk scale for rapid assessment (Figure 3A).
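The point scale of a nomogram follows from the model's log-odds coefficients. A common convention gives 100 points to the predictor whose coefficient multiplied by its axis range is largest and scales the others in proportion. The sketch below applies that convention to β = ln(OR) from Table 3. The axis ranges are hypothetical placeholders, not the study's, so the resulting points will not match Figure 3A; converting total points to a probability would also require the intercept, which is not reported in this section.

```python
import math

# Log-odds coefficients derived from the rounded odds ratios in Table 3.
BETAS = {"NLR": math.log(2.13), "CRP": math.log(1.03), "SAA": math.log(1.01)}

# Hypothetical axis ranges (min, max), for illustration only; NOT from the study.
AXIS_RANGES = {"NLR": (0.0, 10.0), "CRP": (0.0, 200.0), "SAA": (0.0, 300.0)}


def points_per_unit(betas=BETAS, ranges=AXIS_RANGES, max_points=100.0):
    """Points per unit of each predictor; the widest log-odds span maps to max_points."""
    spans = {k: abs(betas[k]) * (hi - lo) for k, (lo, hi) in ranges.items()}
    widest = max(spans.values())
    return {k: max_points * betas[k] / widest for k in betas}


def total_points(values, betas=BETAS, ranges=AXIS_RANGES):
    """Sum each predictor's points, measured from the bottom of its axis.

    Assumes all coefficients are positive, as they are in Table 3.
    """
    ppu = points_per_unit(betas, ranges)
    return sum(ppu[k] * (values[k] - ranges[k][0]) for k in values)


print(total_points({"NLR": 2.0, "CRP": 10.0, "SAA": 50.0}))
```

Because total points are a linear rescaling of the linear predictor, ranking patients by points is equivalent to ranking them by predicted risk.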
3.5. Performance and Validation of the Predictive Model
The model showed good discrimination, with an AUC of 0.832 (95% CI: 0.788–0.875) in the training set and 0.811 (95% CI: 0.737–0.885) in the test set (Figure 3B). Calibration curves showed good agreement between predicted and observed probabilities (Hosmer–Lemeshow p > 0.05; Figure 3C,D). Decision curve analysis indicated a net clinical benefit across threshold probabilities of 10–80% (Figure 3E,F), supporting the model’s practical utility.
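The three performance measures can be illustrated with small, dependency-free functions. This is a sketch of the standard definitions, not the R code used in the study: AUC via its Mann–Whitney interpretation, the Hosmer–Lemeshow statistic over ten probability-sorted groups (one of several grouping variants in use), and decision-curve net benefit at a threshold probability p_t. The function names are hypothetical, and the p-value helper uses the closed-form chi-square tail that holds only for even degrees of freedom (here 10 − 2 = 8).

```python
import math


def auc_mann_whitney(scores, labels):
    """AUC = P(random positive scores above random negative); ties count 1/2.

    Requires at least one positive and one negative label.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs, labels, groups=10):
    """HL chi-square over probability-sorted groups of near-equal size; df = groups - 2."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        m = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected events in this group
        observed = sum(y for _, y in chunk)  # observed events in this group
        # (O - E)^2 / (E * (1 - E/m)); assumes predicted probabilities lie strictly in (0, 1)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / m))
    return stat, chi2_sf_even_df(stat, groups - 2)


def net_benefit(probs, labels, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)
```

In a decision curve, `net_benefit` is evaluated over a grid of thresholds (for example 0.10 to 0.80) and compared with the treat-none strategy (net benefit 0) and the treat-all strategy (the same formula with every patient treated).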
3.6. Pathogen Distribution of Bacterial Coinfections
Among the 129 patients with bacterial coinfection, tNGS identified Haemophilus influenzae (98 cases, 18.92%), Streptococcus pneumoniae (65 cases, 12.55%), Moraxella catarrhalis (28 cases, 5.41%), Bordetella pertussis (12 cases, 2.32%), Staphylococcus aureus (10 cases, 1.93%), and Klebsiella pneumoniae (9 cases, 1.74%) as the predominant pathogens (Table 4); these percentages are calculated with the full cohort of 518 patients as the denominator.
4. Discussion
In this study, we developed and internally validated a clinical prediction model that integrates NLR, CRP, and SAA for the early identification of clinically significant bacterial coinfection in children hospitalized with RSV infection. The model showed good discrimination (AUC > 0.8) and calibration in internal validation, and, more importantly, decision curve analysis supported its clinical utility across a wide range of threshold probabilities. To our knowledge, this is one of the first studies to combine these three common inflammatory markers for this specific clinical scenario, providing an evidence-based tool to address a persistent diagnostic challenge and to support antibiotic stewardship in pediatrics.
Our multivariable analysis showed that NLR, CRP, and SAA were each independently associated with bacterial coinfection in children with RSV-associated ALRI. As a marker of systemic inflammation, an elevated NLR reflects neutrophil activation and relative lymphocyte suppression, an immune profile closely associated with the host stress response during bacterial infection [13,14]. The odds ratio for NLR in our study was 2.13 (95% CI: 1.64–2.79), indicating that each one-unit increase in NLR approximately doubles the odds (not the risk) of bacterial coinfection. This finding is consistent with numerous studies in sepsis and intra-abdominal infection, in which NLR has been shown to be an important predictor of infection severity and prognosis [15]. CRP, an acute-phase protein, rises markedly during bacterial infection; although its OR in our study appears modest at 1.03 (95% CI: 1.01–1.06), this is a per-mg/L effect, so differences of tens of mg/L correspond to substantially larger changes in odds, and its dynamic changes should not be overlooked as an indicator of bacterial infection [16]. SAA, another sensitive inflammatory marker, had a per-mg/L OR of 1.01 (95% CI: 1.00–1.01), further supporting its value in differentiating viral from bacterial infections. Integrating these markers in a multivariable model improves predictive accuracy for bacterial coinfection and helps overcome the limitations of any single marker.
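Because a logistic model is linear on the log-odds scale, a per-unit OR compounds over larger increments: the OR for a rise of Δ units is OR^Δ. The hypothetical helper below makes this explicit, which is why per-mg/L ORs close to 1 for CRP and SAA can still be clinically meaningful.

```python
import math


def or_per_increment(or_per_unit: float, increment: float) -> float:
    """Odds ratio for an `increment`-unit rise, given a per-unit OR from a logistic model."""
    # log(OR) is linear in the predictor, so it scales with the size of the increment.
    return math.exp(increment * math.log(or_per_unit))


print(or_per_increment(1.03, 10))  # CRP, +10 mg/L
print(or_per_increment(1.01, 50))  # SAA, +50 mg/L
```

With the published values, a 10 mg/L rise in CRP corresponds to an OR of about 1.34 and a 50 mg/L rise in SAA to about 1.64. Because the reported ORs are rounded to two decimals, such extrapolations are approximate: a true SAA OR anywhere between 1.005 and 1.015 would put the 50 mg/L figure between roughly 1.28 and 2.11.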
Compared with the existing literature, the main strengths of our study lie in its methodological rigor and specific clinical focus. First, regarding outcome definition, we addressed the difficulty of distinguishing true pathogens from colonizers in tNGS results. Rather than relying solely on positive tNGS reads, we adopted a composite reference standard encompassing etiological, clinical, inflammatory, and imaging evidence, adjudicated through a blinded expert panel [17]. This approach minimizes misclassification of the outcome, so that the model predicts a “clinically significant” infection that warrants intervention rather than asymptomatic carriage, which enhances the clinical relevance of our findings. Second, in terms of statistical strategy, LASSO regression mitigated the subjectivity of manual variable selection and the collinearity among candidate predictors that affect traditional multivariable analysis, yielding a data-driven, parsimonious model with only three core predictors [18]. This simplicity is key to the model’s potential for clinical translation, as it is easy to remember, calculate, and implement. The approach also helped address potential confounding. Although patients in the coinfection group were older (median 24 vs. 12 months), age, which was included as a candidate variable alongside the inflammatory markers, was not retained by the LASSO algorithm, whereas NLR, CRP, and SAA were. This suggests that the elevated inflammatory markers carry information about bacterial coinfection beyond what age explains, rather than serving merely as a proxy for age-related immune responses.
This study also provides important local microbiological evidence for clinical practice. Haemophilus influenzae and Streptococcus pneumoniae were the predominant pathogens in children with RSV and bacterial coinfection. This is consistent with the pathogen spectrum of pediatric community-acquired pneumonia in many regions, but confirming their leading role in the context of RSV infection provides more precise guidance for empirical antibiotic selection [19,20]. For example, while awaiting etiological results, choosing an antibiotic that reliably covers these two pathogens (e.g., amoxicillin–clavulanate or a second- or third-generation cephalosporin) would be a more evidence-based choice for RSV-infected children with a high suspicion of bacterial coinfection [21,22]. Our cohort data are consistent with the benefit of such targeted treatment. Interestingly, univariate analysis showed no significant difference in length of hospital stay or rate of severe disease between the coinfection and non-coinfection groups. This finding likely reflects the “treatment paradox” often observed in retrospective studies: in our cohort, all patients in the bacterial coinfection group received targeted antibiotic therapy (predominantly the regimens mentioned above) based on clinical judgment or tNGS results. This timely intervention probably mitigated disease progression, resulting in clinical outcomes comparable to those of the viral-only group. This underscores the potential clinical value of our prediction model: without early identification and subsequent antibiotic treatment, these high-risk patients might have experienced worse outcomes.
Despite its strengths, our study has limitations. As a single-center, retrospective analysis, its findings should be interpreted with caution, because patient demographics and local practice patterns may have influenced the results; the model’s generalizability therefore remains to be established through external validation. Although our sample size was sufficient for the primary analysis, it may not have been large enough for detailed subgroup evaluations, such as across different age brackets. In addition, although we included Bordetella pertussis in our coinfection analysis because of its clinical significance, we acknowledge that it represents a distinct pathology compared with typical bacterial superinfections. Furthermore, our model was intentionally parsimonious, relying on three common inflammatory markers; future iterations could potentially be enhanced by incorporating other variables, such as viral load or additional cytokines. To further facilitate clinical use, future work will focus on integrating the prediction formula directly into EMR systems, allowing automated risk calculation to complement the manual nomogram.
These limitations naturally guide our future work. An external, multicenter prospective validation is the immediate and essential next step to confirm the model’s robustness across diverse populations. Future studies could also explore dynamic models that track biomarker changes over the first 48 h to further improve predictive accuracy. Ultimately, the true clinical value of this tool can only be confirmed through a randomized controlled trial (RCT) designed to test whether a model-guided antibiotic strategy improves patient outcomes, such as reducing antibiotic usage and hospital stay.
5. Conclusions
In conclusion, this study developed and internally validated a clinical prediction model integrating NLR, CRP, and SAA. The model serves as a straightforward, objective instrument to help clinicians identify, early and accurately, patients with clinically significant bacterial coinfection in the complex clinical scenario of RSV infection. It therefore offers a practical aid for precision antibiotic therapy and antimicrobial stewardship (AMS). However, given the retrospective nature of this study, external validation is required before widespread clinical implementation.
References
- [1] Li Y., Wang X., Blau D.M., Caballero M.T., Feikin D.R., Gill C.J., Madhi S.A., Omer S.B., Simões E.A.F., Campbell H. Global, Regional, and National Disease Burden Estimates of Acute Lower Respiratory Infections Due to Respiratory Syncytial Virus in Children Younger than 5 Years in 2019: A Systematic Analysis. Lancet 2022, 399, 2047–2064. doi:10.1016/S0140-6736(22)00478-0. PMID: 35598608; PMCID: PMC7613574.
- [2] Shi T., McAllister D.A., O’Brien K.L., Simoes E.A.F., Madhi S.A., Gessner B.D., Polack F.P., Balsells E., Acacio S., Aguayo C. Global, Regional, and National Disease Burden Estimates of Acute Lower Respiratory Infections Due to Respiratory Syncytial Virus in Young Children in 2015: A Systematic Review and Modelling Study. Lancet 2017, 390, 946–958. doi:10.1016/S0140-6736(17)30938-8. PMID: 28689664; PMCID: PMC5592248.
- [3] Fleming-Dutra K.E., Hersh A.L., Shapiro D.J., Bartoces M., Enns E.A., File T.M., Finkelstein J.A., Gerber J.S., Hyun D.Y., Linder J.A. Prevalence of Inappropriate Antibiotic Prescriptions Among US Ambulatory Care Visits, 2010–2011. JAMA 2016, 315, 1864–1873. doi:10.1001/jama.2016.4151. PMID: 27139059.
- [4] Liang A., Wu X., Zhu Y., Pan L., Wang A., Wu C., Xia J. Targeted Next-Generation Sequencing (tNGS): An Upcoming Application for Pathogen Identification in Clinical Diagnosis. J. Infect. Public Health 2025, 18, 102936. doi:10.1016/j.jiph.2025.102936. PMID: 40857774.
- [5] Qu X., Ye X., Yu J., Zheng F., Tang Y., Yuan F., Xie Q. Epidemiological and Clinical Characteristics of Bacterial Co-Detection in Respiratory Syncytial Virus-Positive Children in Wenzhou, China, 2021 to 2023. BMC Infect. Dis. 2025, 25, 697. doi:10.1186/s12879-025-11086-z. PMID: 40369488; PMCID: PMC12076938.
- [6] Fu S., Zhang M.-M., Zhang L., Wu L.-F., Hu Q.-L. The Value of Combined Serum Amyloid A Protein and Neutrophil-to-Lymphocyte Ratio Testing in the Diagnosis and Treatment of Influenza A in Children. Int. J. Gen. Med. 2021, 14, 3729–3735. doi:10.2147/IJGM.S313895. PMID: 34326659; PMCID: PMC8314685.
- [7] Wang Y., Wang T., Wen X., Feng C. Establishing a Predictive Model for Liver Fluke Infection on the Basis of Early Changes in Laboratory Indicators: A Retrospective Study. Parasit. Vectors 2025, 18, 186. doi:10.1186/s13071-025-06833-9. PMID: 40405253; PMCID: PMC12096801.
- [8] Collins G.S., Reitsma J.B., Altman D.G., Moons K.G. Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD): The TRIPOD Statement. BMJ 2015, 350, g7594. doi:10.1136/bmj.g7594. PMID: 25569120.
