# External validation of a web- and artificial intelligence-based HIV/STI risk assessment tool: performance evaluation using data from Sydney sexual health centre

**Authors:** Phyu Mon Latt, Anik Ray, Heng Lu, Nyi N. Soe, Xianglong Xu, Yining Bao, Jason J. Ong, Eric P. F. Chow, Rick Varma, Lei Zhang, Christopher K. Fairley

PMC · DOI: 10.1186/s12879-025-12087-8 · 2025-11-25

## TL;DR

A machine learning tool for predicting HIV and STI risk was tested in a new clinic and showed moderate accuracy, with performance varying by demographic group.

## Contribution

The study externally validates MySTIRisk at a second Australian sexual health centre, showing how well the tool generalises and how its performance varies across demographic subgroups.

## Key findings

- MySTIRisk achieved AUC values of 0.67 for HIV and 0.73 for gonorrhoea, both lower than in the original Melbourne validation.
- The model performed better for HIV among men who have sex with men (AUC 0.78) and for gonorrhoea among attendees younger than 25 years (AUC 0.79).
- At balanced thresholds, the tool identified 58.6–64.1% of infections while testing only 25.8–39.4% of the population.

## Abstract

HIV and sexually transmitted infections (STIs) continue to pose significant public health challenges globally. MySTIRisk, developed at Melbourne Sexual Health Centre (MSHC), is a machine learning-based tool that predicts individual risk for HIV, syphilis, gonorrhoea, and chlamydia using demographic and behavioural data. While initial validation showed promising results, external validation is crucial to assess its generalisability. This study externally validates MySTIRisk using data from the Sydney Sexual Health Centre (SSHC), Australia’s second-largest sexual health centre.

Following TRIPOD guidelines, we analysed consultations from patients aged 18 years and older attending SSHC between January 2013 and December 2023. Pre-trained MySTIRisk models were applied directly without modification. Performance was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity at multiple thresholds, with subgroup analyses across demographic characteristics.
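The evaluation metrics named above can be illustrated with a minimal sketch. It is not the authors' code: the function names `auc_score` and `sens_spec` and the toy data are assumptions made for illustration, and the pairwise AUC calculation is chosen for readability rather than for use on hundreds of thousands of consultations.

```python
from typing import Sequence, Tuple


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outranks a random negative.

    A tie between a positive and a negative counts as half a win.
    Runs in O(n_pos * n_neg), which is fine for a toy example only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data invented for illustration only (not SSHC data).
toy_labels = [0, 0, 1, 0, 1, 0, 1, 0]
toy_scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.50, 0.70, 0.05]

toy_auc = auc_score(toy_labels, toy_scores)
toy_sens, toy_spec = sens_spec(toy_labels, toy_scores, threshold=0.4)
print(f"AUC={toy_auc:.3f} sensitivity={toy_sens:.3f} specificity={toy_spec:.3f}")
```

Repeating `sens_spec` over several thresholds gives the kind of multi-threshold table described above; a subgroup analysis would apply the same functions to each demographic subset separately.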

We analysed 159,043 to 207,582 consultations at SSHC, depending on the infection tested. The median age was 30 years, and 60.2% to 68.8% of consultations involved men who have sex with men. AUC values at SSHC were 0.67 (95% CI: 0.65–0.68) for HIV, 0.70 (95% CI: 0.69–0.71) for syphilis, 0.73 (95% CI: 0.73–0.74) for gonorrhoea, and 0.65 (95% CI: 0.65–0.66) for chlamydia, all lower than the original MSHC validation metrics (0.74–0.87, all p < 0.001). Model performance varied across demographic subgroups: HIV prediction was stronger among men who have sex with men (AUC 0.78), and gonorrhoea prediction was better among attendees younger than 25 years (AUC 0.79). At balanced sensitivity-specificity thresholds, the models identified 58.6–64.1% of infections while requiring testing of only 25.8–39.4% of the population.
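The trade-off in the last sentence, the share of infections caught against the share of people flagged for testing, can be sketched as follows. The abstract does not define how a "balanced" threshold is chosen, so this sketch assumes it is the observed score at which sensitivity and specificity are closest. The helper names `rates_at` and `balanced_threshold` and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def rates_at(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, fraction of people flagged) at a threshold."""
    flagged = [s >= threshold for s in scores]
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = sum(1 for y, f in zip(labels, flagged) if y == 1 and f)
    tn = sum(1 for y, f in zip(labels, flagged) if y == 0 and not f)
    return tp / n_pos, tn / n_neg, sum(flagged) / len(labels)


def balanced_threshold(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Assumed definition: the observed score minimising |sensitivity - specificity|."""

    def gap(t: float) -> float:
        sens, spec, _ = rates_at(labels, scores, t)
        return abs(sens - spec)

    return min(sorted(set(scores)), key=gap)


# Toy data invented for illustration only (not SSHC data): 3 infections in 12 people.
toy_labels = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05, 0.02]

toy_threshold = balanced_threshold(toy_labels, toy_scores)
toy_sens, toy_spec, toy_flagged = rates_at(toy_labels, toy_scores, toy_threshold)
print(
    f"threshold={toy_threshold}: identifies {toy_sens:.1%} of infections "
    f"while flagging {toy_flagged:.1%} of people for testing"
)
```

In this toy case the balanced threshold catches two of the three infections while flagging five of twelve people. That is the same kind of statement as the 58.6–64.1% and 25.8–39.4% figures reported above, though the numbers themselves are unrelated.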

Despite performance decrements in external validation using SSHC data, MySTIRisk maintained moderate to good predictive ability across all infections, demonstrating reasonable generalisability across different clinical populations. The demographic variations in performance highlight the importance of context-specific implementation and potential recalibration to optimise clinical utility.

The online version contains supplementary material available at 10.1186/s12879-025-12087-8.

## Linked entities

- **Diseases:** syphilis (MONDO:0005976)

## Full-text entities

- **Diseases:** HIV/STI (MESH:D012749)

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12648915/full.md

---
Source: https://tomesphere.com/paper/PMC12648915