Develop machine learning based predictive models for engineering protein solubility
X. Han, X. Wang, K. Zhou

TL;DR
This paper develops machine learning models to predict protein solubility as a continuous variable from amino acid sequences, aiding protein engineering and potentially serving as an indirect predictor of protein activity.
Contribution
The paper predicts protein solubility as a continuous value rather than the binary soluble/insoluble label used by earlier models, and reports 76.28% prediction accuracy with an SVM.
Findings
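The SVM model achieved 76.28% prediction accuracy.
Predicting solubility as a continuous value, rather than a binary label, is more useful for protein engineering.
Because activity and solubility are correlated for some proteins, the models could serve as an indirect predictor of protein activity from sequence.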
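To make "predicting from amino acid sequence" concrete, here is a minimal sketch of one common way to turn a sequence into a fixed-length numeric vector: the fraction of each of the 20 standard amino acids. This is an assumption for illustration only. The summary does not state which features the authors used, and the function name amino_acid_composition and the example sequence are hypothetical.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids (fixed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard amino acid.

    Assumed featurization for illustration; not confirmed as the paper's method.
    Characters outside the 20 standard codes are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical 10-residue example sequence.
features = amino_acid_composition("MKTAYIAKQR")
```

A vector like this, paired with a measured solubility value for each protein, is the kind of input a regression model such as a support vector machine could be trained on to output a continuous solubility estimate.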
Abstract
Protein activity is a significant characteristic for recombinant proteins which can be used as biocatalysts. High activity of proteins reduces the cost of biocatalysts. A model that can predict protein activity from amino acid sequence is highly desired, as it aids experimental improvement of proteins. However, only limited data for protein activity are currently available, which prevents the development of such models. Since protein activity and solubility are correlated for some proteins, the publicly available solubility dataset may be adopted to develop models that can predict protein solubility from sequence. The models could serve as a tool to indirectly predict protein activity from sequence. In literature, predicting protein solubility from sequence has been intensively explored, but the predicted solubility represented in binary values from all the developed models was not…
Taxonomy
Topics: Protein purification and stability · Protein Structure and Dynamics · Microbial Metabolic Engineering and Bioproduction
