Protein abundance inference via expectation-maximization in fluorosequencing
Javier Kipen, Matthew Beauregard Smith, Thomas Blom, Sophia Bailing Zhou, Edward M Marcotte, Joakim Jaldén

TL;DR
This paper introduces an expectation-maximization method that estimates relative protein abundances from fluorosequencing reads, and evaluates its accuracy and scalability in simulations ranging from five proteins to the full human proteome.
Contribution
The paper introduces a probabilistic framework that adapts EM to the fluorosequencing measurement process in order to infer relative protein abundances.
Findings
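In five-protein simulations, a simple Python implementation processes one million reads in under ten seconds on a standard workstation and reduces the mean absolute error by over an order of magnitude relative to a uniform-abundance guess.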
Ten million reads are processed in under four hours on a GPU, showing scalability.
Improved fluorosequencing chemistry could lead to more accurate protein abundance estimates.
Abstract
Fluorosequencing generates millions of single peptide reads, yet a principled route to quantitative protein abundances has been lacking. We present a probabilistic framework that adapts expectation–maximization (EM) to the fluorosequencing measurement process, using posterior peptide probabilities from existing classifiers to estimate relative protein abundances. The algorithm iteratively updates abundances to maximize the likelihood of observed reads. We first evaluate five-protein simulations with realistic labeling and system errors. A simple Python implementation processes one million reads in under ten seconds on a standard workstation and reduces the mean absolute error by over an order of magnitude relative to a uniform-abundance guess, indicating robust performance in small-scale settings. We also assess scalability with full human-proteome simulations (20 642 proteins). Ten…
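The iterative update described in the abstract can be illustrated with a minimal sketch of EM for mixture proportions. This is not the authors' implementation. It assumes that each read already comes with a score per protein that is proportional to the likelihood of the read given that protein (for example, classifier posteriors computed under a uniform prior and summed over each protein's peptides), and it treats the fraction of reads from each protein as a stand-in for relative abundance. The function name `em_abundances`, the toy confusion matrix, and all parameter values are hypothetical.

```python
import numpy as np


def em_abundances(scores, n_iter=200, tol=1e-10):
    """Estimate mixture proportions by EM.

    scores[i, j] is assumed proportional to P(read i | protein j).
    Returns a vector of proportions that sums to 1.
    """
    n_proteins = scores.shape[1]
    pi = np.full(n_proteins, 1.0 / n_proteins)  # start from a uniform guess
    for _ in range(n_iter):
        # E-step: posterior responsibility of each protein for each read.
        weighted = scores * pi
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: new proportions are the average responsibilities.
        new_pi = resp.mean(axis=0)
        converged = np.abs(new_pi - pi).max() < tol
        pi = new_pi
        if converged:
            break
    return pi


# Toy five-protein simulation (all values are illustrative assumptions).
rng = np.random.default_rng(0)
true_pi = np.array([0.5, 0.25, 0.15, 0.07, 0.03])
n_reads = 50_000
labels = rng.choice(5, size=n_reads, p=true_pi)

# Hypothetical noisy classifier: 70% of the mass on the true protein,
# the remainder spread evenly over the other four.
confusion = np.full((5, 5), 0.075)
np.fill_diagonal(confusion, 0.7)

# Draw one observed class per read from the row of its true protein.
cdf = confusion.cumsum(axis=1)[labels]
u = rng.random(n_reads)
observed = np.minimum((u[:, None] > cdf).sum(axis=1), 4)

# scores[i, j] = P(observed class of read i | protein j).
scores = confusion[:, observed].T

estimate = em_abundances(scores)
uniform = np.full(5, 0.2)
mae_em = np.abs(estimate - true_pi).mean()
mae_uniform = np.abs(uniform - true_pi).mean()
print("estimate:", np.round(estimate, 3))
print("MAE (EM):", mae_em, "MAE (uniform guess):", mae_uniform)
```

The comparison at the end mirrors the kind of baseline named in the abstract (mean absolute error against a uniform-abundance guess), but the numbers it prints come from this toy setup only and say nothing about the paper's reported results.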
Taxonomy
Topics: Advanced Proteomics Techniques and Applications · Machine Learning in Bioinformatics · Vaccines and Immunoinformatics Approaches
