Subcellular proteome niche discovery using semi-supervised functional clustering
Ziyue Zheng, Loay J. Jabre, Matthew McIlvin, Mak A. Saito, Sangwon Hyun

TL;DR
This paper introduces FSPmix, a semi-supervised clustering tool that improves protein localization predictions in subcellular proteomics, especially for non-model organisms with noisy data.
Contribution
FSPmix is a novel semi-supervised clustering method that leverages partial annotations to enhance subcellular localization predictions in proteomics data.
Findings
FSPmix successfully assigned probabilistic localizations to proteins in a marine diatom dataset.
The method uncovered potentially new protein functions based on localization.
FSPmix demonstrated robustness in low signal-to-noise data regimes.
Abstract
Intracellular compartmentalization of proteins underpins their function and the metabolic processes they sustain. Various mass spectrometry-based proteomics methods (subcellular spatial proteomics) now allow high-throughput subcellular protein localization. Yet the curation, analysis and interpretation of these data remain challenging, particularly in non-model organisms where establishing reliable marker proteins is difficult, and in contexts where experimental replication and subcellular fractionation are constrained. Here, we develop FSPmix, a semi-supervised functional clustering method implemented as an open-source R package, which leverages partial annotations from a subset of marker proteins to predict protein subcellular localization from subcellular spatial proteomics data. This method explicitly assumes that protein signatures vary smoothly across subcellular fractions,…
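To make the idea of semi-supervised clustering with marker proteins concrete, here is a minimal sketch in Python. It is not FSPmix and does not reproduce the R package's algorithm or API: it fits a plain spherical Gaussian mixture by EM, clamps the marker proteins to their annotated compartment, and lets unlabeled proteins receive probabilistic assignments. The function name `semi_supervised_mixture`, the synthetic profiles, and every parameter are assumptions made for illustration, and the sketch does not model the smoothness across fractions that the abstract mentions.

```python
import numpy as np

def semi_supervised_mixture(X, labels, n_components, n_iter=50, var_floor=1e-6):
    """Illustrative semi-supervised EM (hypothetical helper, not FSPmix).

    X      : (n_proteins, n_fractions) abundance profiles.
    labels : integer array; compartment index for markers, -1 if unlabeled.
    Returns (responsibilities, component mean profiles).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape
    is_marker = labels >= 0

    # Assumption of this sketch: every compartment has at least one marker.
    counts = np.bincount(labels[is_marker], minlength=n_components)
    if (counts == 0).any():
        raise ValueError("each component needs at least one marker protein")

    # Markers start (and stay) one-hot; unlabeled proteins start uniform.
    resp = np.full((n, n_components), 1.0 / n_components)
    resp[is_marker] = np.eye(n_components)[labels[is_marker]]

    for _ in range(n_iter):
        # M-step: weighted mean profile and spherical variance per component.
        weights = resp.sum(axis=0)
        means = (resp.T @ X) / weights[:, None]
        sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        variances = (resp * sq_dist).sum(axis=0) / (d * weights) + var_floor
        mix = weights / n

        # E-step: posterior membership, computed in log space for stability.
        log_p = (np.log(mix)
                 - 0.5 * d * np.log(2 * np.pi * variances)
                 - 0.5 * sq_dist / variances)
        log_p -= log_p.max(axis=1, keepdims=True)
        new_resp = np.exp(log_p)
        new_resp /= new_resp.sum(axis=1, keepdims=True)

        # Clamp markers to their annotation (the semi-supervised part).
        new_resp[is_marker] = resp[is_marker]
        resp = new_resp

    return resp, means

# Synthetic example: two compartments with mirrored profiles over 5 fractions.
rng = np.random.default_rng(0)
profile_a = np.array([1.0, 0.8, 0.5, 0.2, 0.0])
profile_b = profile_a[::-1]
truth = np.repeat([0, 1], 20)
X = np.where(truth[:, None] == 0, profile_a, profile_b) + rng.normal(0, 0.05, (40, 5))

labels = np.full(40, -1)   # -1 marks an unlabeled protein
labels[:3] = 0             # three markers for compartment 0
labels[20:23] = 1          # three markers for compartment 1

resp, means = semi_supervised_mixture(X, labels, n_components=2)
```

Each row of `resp` is a probability vector over compartments, which is the kind of probabilistic localization the summary describes. Clamping the markers is only one simple way to use partial annotations. A softer prior on the marker rows would be an alternative if marker labels are themselves uncertain.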
Taxonomy
Topics: Advanced Proteomics Techniques and Applications · Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks
