Ridge Regression Estimated Linear Probability Model Predictions of N-glycosylation in Proteins with Structural and Sequence Data

Rajaram Gana, Swagata Naha, Raja Mazumder, Radoslav Goldman, and Sona Vasudevan

arXiv:1803.06002 · q-bio.QM · March 26, 2018

Open Access

TL;DR

This study develops a ridge regression model to predict N-glycosylation likelihood in human proteins using sequence and structural data, aiding experimental design without requiring prior experimental evidence.

Contribution

The paper introduces a novel ridge regression-based approach that integrates sequence and structural features to predict N-glycosylation in proteins.

Findings

01

Model achieves a Kolmogorov-Smirnov statistic of about 74% and a Gini coefficient of about 89%

02

Predicts the likelihood of N-glycosylation for human protein N-sites that lack experimental evidence

03

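Incorporates the amino acid distribution around the NxS/T sequon, structural attributes of the N-site, its relative location in the primary sequence, and the nature of the bound glycan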

Abstract

Absent experimental evidence, a robust methodology to predict the likelihood of N-glycosylation in human proteins is essential for guiding experimental work. Based on the distribution of amino acids in the neighborhood of the NxS/T sequon (N-site); the structural attributes of the N-site that include Accessible Surface Area, secondary structural elements, main-chain phi-psi, turn types; the relative location of the N-site in the primary sequence; and the nature of the glycan bound, the ridge regression estimated linear probability model is used to predict this likelihood. This model yields a Kolmogorov-Smirnov (Gini coefficient) statistic value of about 74% (89%), which is reasonable.
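To make the abstract's terms concrete, the following is a minimal sketch of a ridge-estimated linear probability model and of the two reported statistics, run on synthetic data only. It is not the authors' code or data: the feature matrix, the penalty value `lam`, the choice to leave the intercept unpenalized, and the function names `fit_ridge_lpm`, `ks_statistic`, and `gini_coefficient` are all assumptions made for illustration. The Gini coefficient is taken here as 2·AUC − 1, a common convention that the paper may or may not follow.

```python
import numpy as np


def fit_ridge_lpm(X, y, lam):
    """Ridge estimate of a linear probability model y ~ b0 + X b.

    The 0/1 outcome is regressed directly on the features; the slopes are
    penalized by lam, the intercept is not (an assumption of this sketch).
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean  # centering removes the intercept from the penalized fit
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def ks_statistic(scores, y):
    """Largest gap between the empirical score CDFs of the two classes."""
    thresholds = np.unique(scores)  # sorted unique score values
    pos = np.sort(scores[y == 1])
    neg = np.sort(scores[y == 0])
    cdf_pos = np.searchsorted(pos, thresholds, side="right") / pos.size
    cdf_neg = np.searchsorted(neg, thresholds, side="right") / neg.size
    return float(np.max(np.abs(cdf_pos - cdf_neg)))


def gini_coefficient(scores, y):
    """Gini as 2*AUC - 1, with AUC from pairwise comparisons (ties count half)."""
    pos = scores[y == 1]
    neg = scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)
    return float(2.0 * auc - 1.0)


# Synthetic stand-in for N-site features (hypothetical, not the paper's data).
rng = np.random.default_rng(0)
n, p = 500, 6
X = rng.normal(size=(n, p))
true_beta = np.array([0.8, -0.5, 0.3, 0.0, 0.0, 0.2])
prob = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = (rng.uniform(size=n) < prob).astype(float)

intercept, beta = fit_ridge_lpm(X, y, lam=10.0)
scores = intercept + X @ beta  # fitted "probabilities"; may fall outside [0, 1]

ks = ks_statistic(scores, y)
gini = gini_coefficient(scores, y)
print(f"KS = {ks:.3f}, Gini = {gini:.3f}")
```

Both statistics depend only on how the scores rank the two classes, so the fact that a linear probability model can produce fitted values below 0 or above 1 does not affect them.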

Taxonomy

Topics: Glycosylation and Glycoproteins Research · Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies