The Midpoint Mixed Model with a Missingness Mechanism (M5): A Likelihood-Based Framework for Quantification of Mass Spectrometry Proteomics Data (Preprint)
Jonathon O'Brien, Harsha Gunawardena, Xian Chen, Joseph Ibrahim, Bahjat Qaqish

TL;DR
This paper introduces a likelihood-based statistical model for proteomics data that accounts for nonignorable missingness while using peptide-level matched pairs, with the aim of improving the accuracy of protein quantification.
Contribution
It presents the first model to incorporate nonignorable missing data mechanisms while utilizing peptide-level matched pairs in proteomics analysis.
Findings
The model reduces mean squared error (MSE) by 35% compared with median ratio estimates.
In simulation, basic ANOVA estimates have 371% higher MSE than median ratio estimates.
Applied to breast cancer data, the model increases the number of proteins estimated by 22%.
Abstract
Statistical models for proteomics data often estimate protein fold changes between two samples, A and B, as the average peptide intensity from sample A divided by the average peptide intensity from sample B. Such average intensity ratios fail to take full advantage of the experimental design which eliminates unseen confounding variables by processing peptides from both samples under identical conditions. Typically this structure is exploited through the estimation of a protein ratio as the median ratio of matched peptide intensities. This simple solution fails to account for a substantial missing data bias which has led to the development of more sophisticated average intensity models. Here we develop the first statistical model that accounts for nonignorable missingness while utilizing peptide level matched pairs across samples. Our simulation analysis shows that models which fail to…
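The two simple estimators contrasted in the abstract can be sketched as follows. This is a minimal illustration, not the paper's M5 model: the function names and the intensity values are hypothetical, and missing values are simply dropped rather than modeled, which is the kind of shortcut the paper argues against.

```python
from statistics import mean, median

def average_intensity_ratio(a, b):
    """Ratio of mean observed intensities; peptide pairing is ignored."""
    obs_a = [x for x in a if x is not None]
    obs_b = [y for y in b if y is not None]
    return mean(obs_a) / mean(obs_b)

def median_matched_ratio(a, b):
    """Median of per-peptide ratios a_i / b_i over peptides seen in both samples."""
    ratios = [x / y for x, y in zip(a, b) if x is not None and y is not None]
    return median(ratios)

# Hypothetical intensities for four peptides of one protein (None = missing).
sample_a = [200.0, 4000.0, 60.0, 900.0]
sample_b = [100.0, 2000.0, 30.0, 300.0]

avg_ratio = average_intensity_ratio(sample_a, sample_b)
med_ratio = median_matched_ratio(sample_a, sample_b)
print(f"average intensity ratio: {avg_ratio:.3f}")
print(f"median matched ratio:    {med_ratio:.3f}")
```

In this toy input the per-peptide ratios are 2, 2, 2, and 3, so the median matched ratio is 2, while the average intensity ratio is pulled toward the high-intensity peptides. Neither function accounts for why an intensity is missing.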
Taxonomy
Topics: Advanced Proteomics Techniques and Applications · Gene expression and cancer classification · Wheat and Barley Genetics and Pathology
