BERT and LLMs-Based avGFP Brightness Prediction and Mutation Design
X. Guo, W. Che

TL;DR
This paper uses Transformer models and large language models (LLMs) to predict avGFP brightness and design brighter mutants, demonstrating the potential of deep learning in protein engineering and offering a methodology for future research.
Contribution
It introduces a novel approach combining BERT and LLMs for avGFP brightness prediction and mutant design, integrating prior knowledge for improved protein engineering.
Findings
Designed and screened 10 new high-brightness avGFP mutants
Developed a Transformer-based prediction model with high accuracy
Showed the effectiveness of LLMs in guiding protein mutation design
Abstract
This study uses Transformer models and large language models (such as GPT and Claude) to predict the brightness of Aequorea victoria green fluorescent protein (avGFP) and to design mutants with higher brightness. Because traditional experimental screening is slow and costly, we apply machine learning techniques to improve research efficiency. We first read and preprocessed a proprietary dataset containing approximately 140,000 protein sequences, including about 30,000 avGFP sequences. We then constructed and trained a Transformer-based prediction model to screen and design new avGFP mutants that are expected to exhibit higher brightness. Our methodology consists of two primary stages: first, the construction of a scoring model using BERT, and second, the screening and generation of mutants using mutation site statistics and large…
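The two-stage idea in the abstract (score sequences, then propose mutants from mutation site statistics) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the six-residue toy sequences (not the real avGFP sequence), the premise that the variant list has already been filtered to bright sequences, and `toy_score`, which merely stands in for a trained BERT-based scoring model.

```python
from collections import Counter

def mutation_site_counts(reference, variants):
    """Count how often each (position, new_residue) substitution occurs."""
    counts = Counter()
    for seq in variants:
        if len(seq) != len(reference):
            continue  # this sketch handles substitutions only, not indels
        for pos, (ref_aa, var_aa) in enumerate(zip(reference, seq)):
            if ref_aa != var_aa:
                counts[(pos, var_aa)] += 1
    return counts

def propose_mutants(reference, counts, top_k=3):
    """Apply each of the top_k most frequent substitutions on its own."""
    mutants = []
    for (pos, aa), _ in counts.most_common(top_k):
        mutants.append(reference[:pos] + aa + reference[pos + 1:])
    return mutants

def rank_by_score(candidates, score_fn):
    """Sort candidates from highest to lowest predicted score."""
    return sorted(candidates, key=score_fn, reverse=True)

def toy_score(seq):
    """Hypothetical placeholder for a trained brightness predictor."""
    return seq.count("A")

# Toy data: a made-up reference and variants assumed to be bright.
reference = "MSKGEE"
bright_variants = ["MSKGEA", "MTKGEA", "MSKGEE", "MTKGEE", "MSRGEA"]

counts = mutation_site_counts(reference, bright_variants)
candidates = propose_mutants(reference, counts, top_k=2)
ranked = rank_by_score(candidates, toy_score)
print(ranked)
```

In a real pipeline, `toy_score` would be replaced by the trained model's prediction, and the top-ranked candidates would still need experimental validation.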
Taxonomy
Topics: Advanced Computing and Algorithms · Genomics and Phylogenetic Studies
Methods: Cosine Annealing · Linear Layer · Attention Dropout · Residual Connection · Linear Warmup With Cosine Annealing · Linear Warmup With Linear Decay · Discriminative Fine-Tuning · WordPiece · Multi-Head Attention
