A Bayesian framework for molecular strain identification from mixed diagnostic samples
Lauri Mustonen, Xiangxi Gao, Asteroide Santana, Rebecca Mitchell, Ymir Vigfusson, Lars Ruthotto

TL;DR
This paper introduces a Bayesian computational framework for identifying multiple microbial strains from mixed DNA samples, crucial for public health diagnostics and outbreak monitoring.
Contribution
It formulates strain identification as a Bayesian inverse problem with both binary and real-valued unknowns, and develops two scalable algorithms for solving the resulting non-convex optimization problem.
Findings
Effective on both synthetic and experimental data
Provides uncertainty quantification of solutions
Addresses binary constraints in strain identification
Abstract
We provide a mathematical formulation and develop a computational framework for identifying multiple strains of microorganisms from mixed samples of DNA. Our method is applicable in public health domains where efficient identification of pathogens is paramount, e.g., for the monitoring of disease outbreaks. We formulate strain identification as an inverse problem that aims at simultaneously estimating a binary matrix (encoding presence or absence of mutations in each strain) and a real-valued vector (representing the mixture of strains) such that their product is approximately equal to the measured data vector. The problem at hand has a similar structure to blind deconvolution, except for the presence of binary constraints, which we enforce in our approach. Following a Bayesian approach, we derive a posterior density. We present two computational methods for solving the non-convex…
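The structure described in the abstract (a binary matrix times a mixture vector approximating the data) can be illustrated with a minimal sketch. This is not one of the paper's two algorithms; it only shows the forward model and one simple observation: if the mixture vector is held fixed, the binary least-squares subproblem decouples by row and can be solved by enumeration when the number of strains is small. The names `forward`, `update_binary_rows`, `B_true`, `w_true`, and all sizes and noise levels are hypothetical choices made for illustration.

```python
import itertools
import numpy as np

def forward(B, w):
    """Predicted data: entry i sums the mixture weights of strains carrying mutation i."""
    return B @ w

def update_binary_rows(d, w):
    """For a fixed mixture w, minimize ||B w - d||^2 over binary B.

    Each row of B affects only one data entry, so every row is chosen
    independently by enumerating all 2^n binary patterns (small n only).
    """
    n = w.size
    patterns = np.array(list(itertools.product([0, 1], repeat=n)))  # shape (2^n, n)
    predictions = patterns @ w                                      # shape (2^n,)
    best = np.argmin((d[:, None] - predictions[None, :]) ** 2, axis=1)
    return patterns[best]

# Synthetic example (assumed sizes): 20 mutation sites, 3 strains.
rng = np.random.default_rng(0)
m, n = 20, 3
B_true = rng.integers(0, 2, size=(m, n))   # presence/absence of each mutation per strain
w_true = np.array([0.6, 0.3, 0.1])         # mixture proportions, chosen with distinct subset sums
d_clean = forward(B_true, w_true)
d_noisy = d_clean + 0.01 * rng.standard_normal(m)  # assumed additive Gaussian noise

B_hat = update_binary_rows(d_noisy, w_true)
print("entries recovered:", int((B_hat == B_true).sum()), "of", m * n)
```

In the actual problem the mixture vector is unknown as well, which is what makes the joint estimation non-convex and similar in structure to blind deconvolution. The enumeration step also scales exponentially in the number of strains, so it should be read as a toy illustration only.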
