Assessing putative bias in prediction of anti-microbial resistance from real-world genotyping data under explicit causal assumptions
Mattia Prosperi, Simone Marini, Christina Boucher, Jiang Bian

TL;DR
This study evaluates the impact of sampling bias on antimicrobial resistance prediction models using genomic data, demonstrating that bias adjustment can modestly improve model performance under explicit causal assumptions.
Contribution
It introduces a causal framework for assessing and adjusting bias in genotype-based AMR prediction models, comparing bias-handling methods with standard approaches.
Findings
Bias due to species, location, and year affects AMR prediction.
Bias adjustment yields 1-5% AUROC improvement.
Model performance remains high with AUROCs around 0.94-0.95.
Abstract
Whole genome sequencing (WGS) is quickly becoming the customary means for identification of antimicrobial resistance (AMR) due to its ability to obtain high resolution information about the genes and mechanisms that are causing resistance and driving pathogen mobility. By contrast, traditional phenotypic (antibiogram) testing cannot easily elucidate such information. Yet development of AMR prediction tools from genotype-phenotype data can be biased, since sampling is non-randomized. Sample provenience, period of collection, and species representation can confound the association of genetic traits with AMR. Thus, prediction models can perform poorly on new data with sampling distribution shifts. In this work -- under an explicit set of causal assumptions -- we evaluate the effectiveness of propensity-based rebalancing and confounding adjustment on AMR prediction using genotype-phenotype…
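The propensity-based rebalancing mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single categorical confounder (species), a binary genetic feature, and a propensity estimated from within-stratum frequencies rather than from a fitted model such as logistic regression. All function names, variable names, and the isolate data below are hypothetical and synthetic.

```python
from collections import Counter

def stabilized_weights(exposure, stratum):
    """Stabilized inverse-propensity weights P(T=t) / P(T=t | stratum).

    The propensity is estimated as the observed frequency of the exposure
    level within each stratum (an assumption made for this sketch).
    """
    n = len(exposure)
    exposure_counts = Counter(exposure)
    stratum_counts = Counter(stratum)
    joint_counts = Counter(zip(stratum, exposure))
    weights = []
    for t, s in zip(exposure, stratum):
        marginal = exposure_counts[t] / n
        propensity = joint_counts[(s, t)] / stratum_counts[s]
        weights.append(marginal / propensity)
    return weights

def risk_difference(exposure, outcome, weights=None):
    """(Weighted) outcome rate among exposed minus rate among unexposed."""
    if weights is None:
        weights = [1.0] * len(exposure)

    def rate(level):
        total = sum(w for t, w in zip(exposure, weights) if t == level)
        hits = sum(w * y for t, y, w in zip(exposure, outcome, weights) if t == level)
        return hits / total

    return rate(1) - rate(0)

# Synthetic isolates: (species, gene_present, resistant).
# Resistance depends only on species here; the gene is merely more common in species A.
rows = (
    [("A", 1, 1)] * 3 + [("A", 1, 0)] * 3 +
    [("A", 0, 1)] * 1 + [("A", 0, 0)] * 1 +
    [("B", 1, 0)] * 2 + [("B", 0, 0)] * 6
)
species = [r[0] for r in rows]
gene = [r[1] for r in rows]
resistant = [r[2] for r in rows]

naive = risk_difference(gene, resistant)
adjusted = risk_difference(gene, resistant, stabilized_weights(gene, species))
print(f"naive: {naive:.3f}, adjusted: {adjusted:.3f}")
```

On this toy data the unweighted risk difference is 0.25, even though the gene has no effect within either species. After reweighting, the difference is approximately zero, which is the sense in which rebalancing removes a spurious association induced by uneven species representation.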
Taxonomy
Topics: Antibiotic Resistance in Bacteria · Mycobacterium research and diagnosis · Evolution and Genetic Dynamics
Methods: Logistic Regression
