Analysis of Diagnostics (Part I): Prevalence, Uncertainty Quantification, and Machine Learning
Paul N. Patrone, Raquel A. Binder, Catherine S. Forconi, Ann M. Moormann, Anthony J. Kearsley

TL;DR
This paper explores how prevalence influences classification accuracy and uncertainty quantification in machine learning, introducing new theoretical insights and a numerical algorithm validated on synthetic and SARS-CoV-2 data.
Contribution
It establishes a connection between prevalence and classification theory, introduces a homotopy algorithm for estimating relative probability level-sets, and shows how prevalence supports a more complete theory of uncertainty quantification (UQ) for certain types of ML.
Findings
Classifiers that minimize a prevalence-weighted error contain the same probabilistic information as Bayes-optimal classifiers.
The proposed homotopy algorithm estimates relative probability level-sets.
The methods are validated on synthetic data and on a SARS-CoV-2 ELISA.
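The first finding can be made concrete with a short derivation. It is a sketch under assumptions of our own, not the paper's exact formulation: a measurement r in a domain Ω, class-conditional densities P(r) for positives and N(r) for negatives, prevalence q, and a rule that labels r positive exactly when r lies in a set D_P. The notation, including the boundary set Γ(q), is hypothetical and may differ from the paper's.

```latex
\begin{aligned}
\mathcal{E}(D_P; q)
  &= (1-q)\int_{D_P} N(r)\,\mathrm{d}r \;+\; q\int_{\Omega\setminus D_P} P(r)\,\mathrm{d}r \\
  &= q \;+\; \int_{D_P}\bigl[(1-q)\,N(r) - q\,P(r)\bigr]\,\mathrm{d}r, \\[4pt]
D_P^\star(q) &= \{\, r \in \Omega : q\,P(r) > (1-q)\,N(r) \,\}, \\
\Gamma(q)    &= \{\, r \in \Omega : q\,P(r) = (1-q)\,N(r) \,\}.
\end{aligned}
```

The second line uses the normalization of P over Ω. The error is smallest when D_P contains exactly the points where the bracketed integrand is negative, which gives D_P^⋆(q); equivalently, r is labeled positive when the posterior qP(r)/[qP(r) + (1−q)N(r)] exceeds 1/2, which is the Bayes-optimal rule. On Γ(q) the density ratio satisfies P(r)/N(r) = (1−q)/q, so tracking these boundaries as q varies recovers the level sets of P/N, one way to read the claim that such classifiers carry the same probabilistic information.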
Abstract
Diagnostic testing provides a unique setting for studying and developing tools in classification theory. In such contexts, the concept of prevalence, i.e., the fraction of individuals in a population who have a given condition, is fundamental, both as an inherent quantity of interest and as a parameter that controls classification accuracy. This manuscript is the first in a two-part series that studies deeper connections between classification theory and prevalence, showing how the latter establishes a more complete theory of uncertainty quantification (UQ) for certain types of machine learning (ML). We motivate this analysis via a lemma demonstrating that general classifiers minimizing a prevalence-weighted error contain the same probabilistic information as Bayes-optimal classifiers, which depend on conditional probability densities. This leads us to study relative probability level-sets ,…
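The homotopy idea can be illustrated with a deliberately simple sketch. Everything here is an assumption for illustration, not the authors' algorithm: one-dimensional synthetic data with negatives drawn from Normal(0, 1) and positives from Normal(2, 1), a threshold rule "positive iff r > t", and hypothetical helper names (`weighted_errors`, `bayes_threshold`, `homotopy_thresholds`). For each prevalence q in an increasing sequence, the sketch minimizes the empirical prevalence-weighted error (1−q)·(false-positive rate) + q·(false-negative rate), warm-starting from the previous answer. For this pair of normals the exact boundary where qP(r) = (1−q)N(r) is t = 1 − ½ ln(q/(1−q)), which serves as a reference.

```python
import numpy as np

# Illustrative synthetic data (an assumption, not the paper's data):
# negatives ~ Normal(0, 1), positives ~ Normal(2, 1).
rng = np.random.default_rng(0)
neg = rng.normal(0.0, 1.0, size=50_000)
pos = rng.normal(2.0, 1.0, size=50_000)


def weighted_errors(ts, q, neg_sorted, pos_sorted):
    """Empirical prevalence-weighted error of the rule 'positive iff r > t', for each t in ts."""
    # searchsorted(..., side="right") counts samples <= t.
    fp = 1.0 - np.searchsorted(neg_sorted, ts, side="right") / neg_sorted.size  # negatives with r > t
    fn = np.searchsorted(pos_sorted, ts, side="right") / pos_sorted.size        # positives with r <= t
    return (1.0 - q) * fp + q * fn


def bayes_threshold(q):
    """Exact t where q*P(t) == (1-q)*N(t) for the two unit-variance normals above."""
    return 1.0 - 0.5 * np.log(q / (1.0 - q))


def homotopy_thresholds(qs, neg, pos):
    """Continuation in q: minimize the empirical error for increasing q, warm-starting each step."""
    qs = np.asarray(qs, dtype=float)
    if np.any(np.diff(qs) <= 0):
        raise ValueError("qs must be strictly increasing")
    neg_s, pos_s = np.sort(neg), np.sort(pos)
    candidates = np.sort(np.concatenate([neg, pos]))  # all sample values as candidate thresholds
    upper = candidates.size  # current search window is candidates[:upper]
    thresholds = np.empty_like(qs)
    for i, q in enumerate(qs):
        errs = weighted_errors(candidates[:upper], q, neg_s, pos_s)
        k = int(np.argmin(errs))
        thresholds[i] = candidates[k]
        # A larger q can only enlarge the positive set, i.e. lower the threshold,
        # so the next search is restricted to candidates at or below this one.
        upper = k + 1
    return thresholds


qs = np.linspace(0.1, 0.9, 9)
est = homotopy_thresholds(qs, neg, pos)
exact = bayes_threshold(qs)
for q, t_hat, t in zip(qs, est, exact):
    print(f"q={q:.1f}  estimated t={t_hat:+.3f}  exact t={t:+.3f}")
```

Because the empirical error is piecewise constant in t, gradient-based minimization is not useful here, so the sketch scans sorted sample values. Shrinking the search window after each step relies on the fact that the exact positive set {r : qP(r) > (1−q)N(r)} can only grow as q increases; it also keeps the estimated thresholds monotone even where sampling noise alone would not.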
Taxonomy
Topics: Tuberculosis Research and Epidemiology · Computational Drug Discovery Methods · Statistical Methods and Inference
