Open-Source Protein Language Models for Function Prediction and Protein Design
Shivasankaran Vanaja Pandi, Bharath Ramsundar

TL;DR
This paper integrates an open-source protein language model into DeepChem, a widely used open-source framework, to make protein function prediction and enzyme design more accessible. It reports reasonable results across protein prediction benchmarks and lays groundwork for future enzyme design research.
Contribution
The study introduces an accessible platform combining PLMs with DeepChem, enabling broader use in protein research and enzyme design without extensive computational resources.
Findings
Achieves reasonable performance on protein prediction benchmarks.
Demonstrates potential for generating enzyme candidates via embeddings.
Provides a foundation for future enzyme design research.
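As a rough illustration of what generating candidates "via embeddings" can involve, the sketch below mean-pools per-residue embeddings into one vector per protein and walks a straight line between two such vectors. It is a minimal sketch under stated assumptions, not the authors' method: the random arrays stand in for real PLM outputs, the helper names `mean_pool` and `interpolate` are hypothetical, and the step of decoding latent points back into sequences is not shown.

```python
import numpy as np


def mean_pool(residue_embeddings):
    """Collapse a (length, dim) per-residue matrix into one (dim,) vector."""
    return np.asarray(residue_embeddings, dtype=float).mean(axis=0)


def interpolate(z_start, z_end, steps):
    """Return `steps` evenly spaced points on the line from z_start to z_end."""
    alphas = np.linspace(0.0, 1.0, steps)
    return np.stack([(1.0 - a) * z_start + a * z_end for a in alphas])


# Placeholder data (assumption): random matrices standing in for the
# per-residue embeddings a protein language model would produce.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(120, 32))  # hypothetical protein A, 120 residues
emb_b = rng.normal(size=(95, 32))   # hypothetical protein B, 95 residues

z_a = mean_pool(emb_a)
z_b = mean_pool(emb_b)

# Five latent points from A to B; the interior points are the "candidates"
# that a decoder (not shown here) would have to map back to sequences.
path = interpolate(z_a, z_b, steps=5)
```

Linear interpolation is only one simple form of latent space manipulation. Mean pooling is used here because it yields a fixed-size vector regardless of sequence length, which is what makes proteins of different lengths comparable in the same space.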
Abstract
Protein language models (PLMs) have shown promise in improving the understanding of protein sequences, contributing to advances in areas such as function prediction and protein engineering. However, training these models from scratch requires significant computational resources, limiting their accessibility. To address this, we integrate a PLM into DeepChem, an open-source framework for computational biology and chemistry, to provide a more accessible platform for protein-related tasks. We evaluate the performance of the integrated model on various protein prediction tasks, showing that it achieves reasonable results across benchmarks. Additionally, we present an exploration of generating plastic-degrading enzyme candidates using the model's embeddings and latent space manipulation techniques. While the results suggest that further refinement is needed, this approach provides a…
Taxonomy
Topics: Machine Learning in Bioinformatics · Biomedical Text Mining and Ontologies · Genetics, Bioinformatics, and Biomedical Research
