DeepIEP: a Peptide Sequence Model of Isoelectric Point (IEP/pI) using Recurrent Neural Networks (RNNs)
Esben Jannik Bjerrum

TL;DR
DeepIEP uses an LSTM-based RNN to predict peptide isoelectric points directly from sequence, capturing sequence-dependent charge effects without taking pKa values as input.
Contribution
This study introduces a novel RNN model for peptide IEP prediction that outperforms traditional pKa-based methods and considers sequence context.
Findings
RMSE of 0.28 on the external test set
R² of 0.95 on the external test set
Predictions for constructed test sequences rank residues consistently with known amino acid pKa values
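As a reminder of what the two reported metrics measure, here is a minimal sketch in plain Python. The pI values in the usage lines are made-up illustrative numbers, not data from the paper, and R² is computed here as the coefficient of determination, which is an assumption since the page does not say which definition was used.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted pI values, for illustration only.
measured = [4.0, 6.0, 8.0, 10.0]
predicted = [4.5, 5.5, 8.5, 9.5]

print(rmse(measured, predicted))
print(r_squared(measured, predicted))
```

RMSE is expressed in pH units, so it can be read directly as a typical prediction error, while R² describes how much of the variance in the measured pI values the predictions account for.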
Abstract
The isoelectric point (IEP or pI) is the pH where the net charge on the molecular ensemble of peptides and proteins is zero. This physical-chemical property depends on protonatable/deprotonatable sidechains and their pKa values. Here a pI prediction model is trained from a database of peptide sequences and pIs using a recurrent neural network (RNN) with long short-term memory (LSTM) cells. The trained model obtains an RMSE and R² of 0.28 and 0.95, respectively, on the external test set. The model is not based on pKa values, but predictions for constructed test sequences show rankings similar to those of already known pKa values. The prediction depends mostly on the presence of known acidic and basic amino acids, with fine adjustments based on the neighboring sequence and the position of the charged amino acids in the peptide chain.
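For contrast with the learned model, the conventional pKa-based view of the pI described in the abstract can be sketched as follows: sum Henderson-Hasselbalch charges over the ionizable groups and bisect for the pH where the net charge is zero. This is not the DeepIEP method. The pKa table holds approximate illustrative values (published sets differ), and the function names are hypothetical.

```python
# Approximate, illustrative pKa values; published tables differ (assumption).
POSITIVE_PKA = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH, ignoring sequence context."""
    # One free N-terminus and one free C-terminus are assumed.
    charge = 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA["N_term"]))
    charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA["C_term"] - ph))
    for residue in sequence:
        if residue in POSITIVE_PKA:
            # Basic group: protonated (+1) below its pKa.
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            # Acidic group: deprotonated (-1) above its pKa.
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge

def isoelectric_point(sequence, tol=1e-6):
    """Bisect on pH in [0, 14]; net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid  # still positively charged, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0

print(isoelectric_point("DDDD"))  # acidic peptide, low pI
print(isoelectric_point("KKKK"))  # basic peptide, high pI
```

Because each residue contributes the same charge wherever it sits in the chain, this baseline cannot represent the neighbor and position effects that the abstract attributes to the RNN model.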
Taxonomy
Topics: Machine Learning in Bioinformatics · Computational Drug Discovery Methods · Chemical Synthesis and Analysis
