Ridge Regression Estimated Linear Probability Model Predictions of O-glycosylation in Proteins with Structural and Sequence Data
Rajaram Gana, Sona Vasudevan

TL;DR
This study uses ridge regression to predict O-GlcNAc glycosylation in human proteins, demonstrating the importance of structural data alongside sequence information for accurate predictions.
Contribution
It introduces a linear probability model, estimated by ridge regression, that combines structural and sequence data to predict O-glycosylation, and it highlights the significance of structural features.
Findings
Structural attributes significantly predict glycosylation.
Sequence data alone yields lower prediction accuracy.
Structural data improves model performance (KS=99%).
Abstract
The likelihood of O-GlcNAc glycosylation in human proteins is predicted using the ridge regression estimated linear probability model (LPM). To achieve this, sequences from three similar post-translational modifications (PTMs) of proteins occurring at, or very near, the S or T site are analyzed: N-glycosylation, O-mucin type (O-GalNAc) glycosylation, and phosphorylation. The results include: 1) The consensus composite sequon for O-glycosylation does NOT have W on either side of the glycosylation site. 2) The same holds for the consensus sequon for phosphorylation. 3) For LPM estimation, N-glycosylated sequences are found to be good approximations to non-O-glycosylatable sequences. 4) The selective positioning of an amino acid along the sequence differentiates the PTMs of proteins. 5) Some N-glycosylated sequences are also phosphorylated at the S or T site. 6) ASA values for…
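As a rough illustration of what a ridge regression estimated LPM involves, the sketch below regresses a 0/1 label on indicator features with an L2 penalty and reads the fitted values as probabilities. It is not the authors' pipeline: the peptide windows, labels, one-hot encoding, penalty value, and the function names one_hot_window, fit_ridge_lpm, and predict_lpm are all hypothetical and chosen only for this example. Structural features such as ASA could be appended as extra columns, but none are included here.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot_window(window):
    """Encode a peptide window as a flat position-by-residue indicator vector."""
    vec = np.zeros(len(window) * len(AMINO_ACIDS))
    for pos, aa in enumerate(window):
        idx = AMINO_ACIDS.find(aa)
        if idx >= 0:  # unknown residues are left as all zeros
            vec[pos * len(AMINO_ACIDS) + idx] = 1.0
    return vec

def fit_ridge_lpm(X, y, lam):
    """Closed-form ridge fit of a 0/1 outcome; the intercept is not penalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean  # centering keeps the penalty off the intercept
    gram = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ beta
    return intercept, beta

def predict_lpm(X, intercept, beta):
    """Fitted LPM values; these are not guaranteed to lie in [0, 1]."""
    return intercept + np.asarray(X, dtype=float) @ beta

# Hypothetical toy data: made-up 5-residue windows centered on S or T.
# Label 1 = treated as O-glycosylated, 0 = treated as not (illustration only).
windows = ["PVSTA", "APSPA", "TVTPA", "NGSWK", "NWTCL", "NKSWE"]
labels = np.array([1, 1, 1, 0, 0, 0])

X = np.vstack([one_hot_window(w) for w in windows])
intercept, beta = fit_ridge_lpm(X, labels, lam=1.0)
scores = predict_lpm(X, intercept, beta)
```

Because an LPM is ordinary linear regression on a binary outcome, the scores can fall outside the unit interval; whether and how to handle that is a modeling choice this sketch leaves open. The penalty lam would normally be tuned on held-out data rather than fixed at 1.0.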
