External Validation of a Web- and Artificial Intelligence-Based HIV/STI Risk Assessment Tool: Performance Evaluation Using Data from Sydney Sexual Health Centre

Abstract

Introduction HIV and sexually transmitted infections (STIs) continue to pose significant public health challenges globally. MySTIRisk, developed at Melbourne Sexual Health Centre (MSHC), is a machine learning-based tool that predicts individual risk for HIV, syphilis, gonorrhoea, and chlamydia using demographic and behavioural data. While initial validation showed promising results, external validation is crucial to assess its generalisability. This study externally validates MySTIRisk using data from the Sydney Sexual Health Centre (SSHC), Australia's second-largest sexual health centre.
Methods Following TRIPOD guidelines, we analysed consultations from patients aged 18 years and older attending SSHC between January 2013 and December 2023. Pre-trained MySTIRisk models were applied directly without modification. Performance was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity at multiple thresholds, with subgroup analyses across demographic characteristics.
Results We analysed 159,043 to 207,582 consultations at SSHC, with a median age of 30 years and 60.2–68.8% of the consultations involving men who have sex with men. AUC values on the SSHC data were 0.67 (95% CI: 0.65–0.68) for HIV, 0.70 (95% CI: 0.69–0.71) for syphilis, 0.73 (95% CI: 0.73–0.74) for gonorrhoea, and 0.65 (95% CI: 0.65–0.66) for chlamydia, all lower than the original MSHC validation metrics (0.74–0.87, all p < 0.001). Model performance varied across demographic subgroups, with stronger HIV prediction among men who have sex with men (AUC 0.78) and better gonorrhoea prediction among attendees younger than 25 years (AUC 0.79). At balanced sensitivity-specificity thresholds, the models identified 58.6–64.1% of infections while requiring testing of only 25.8–39.4% of the population.
Conclusions Despite performance decrements in external validation using SSHC data, MySTIRisk maintained moderate to good predictive ability across all infections, demonstrating reasonable generalisability across different clinical populations. The demographic variations in performance highlight the importance of context-specific implementation and potential recalibration to optimise clinical utility.
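
As a rough illustration of the evaluation metrics named in the Methods (AUC, sensitivity, specificity, and the share of the population flagged for testing at a balanced threshold), the sketch below computes them on a tiny synthetic example. The scores, labels, and function names are hypothetical and are not taken from MySTIRisk or the SSHC data; the balanced threshold is assumed here to be the cut-off that minimises the gap between sensitivity and specificity, which may differ from the authors' exact procedure.

```python
from typing import Dict, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(scores: Sequence[float], labels: Sequence[int],
                      threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity, and fraction flagged for testing
    when everyone with score >= threshold is flagged."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y == 1 for f, y in zip(flagged, labels))
    fn = sum((not f) and y == 1 for f, y in zip(flagged, labels))
    tn = sum((not f) and y == 0 for f, y in zip(flagged, labels))
    fp = sum(f and y == 0 for f, y in zip(flagged, labels))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fraction_tested": sum(flagged) / len(scores),
    }


def balanced_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Assumed definition: the observed score that minimises
    |sensitivity - specificity|."""
    def gap(t: float) -> float:
        m = threshold_metrics(scores, labels, t)
        return abs(m["sensitivity"] - m["specificity"])
    return min(sorted(set(scores)), key=gap)


# Synthetic risk scores and infection outcomes (1 = infection diagnosed).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]

toy_auc = auc(toy_scores, toy_labels)
toy_threshold = balanced_threshold(toy_scores, toy_labels)
toy_metrics = threshold_metrics(toy_scores, toy_labels, toy_threshold)

print(toy_auc, toy_threshold, toy_metrics)
```

In this framing, sensitivity corresponds to the proportion of infections identified and `fraction_tested` to the proportion of the population that would need testing, the two quantities the Results report at balanced thresholds.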
