BERT and LLMs-Based avGFP Brightness Prediction and Mutation Design

X. Guo; W. Che

arXiv:2407.20534 · q-bio.OT · July 31, 2024

Open Access

TL;DR

This paper uses Transformer models and large language models to predict avGFP brightness and to design brighter mutants, illustrating how deep learning can reduce the experimental screening burden in protein engineering.

Contribution

It introduces a novel approach combining BERT and LLMs for avGFP brightness prediction and mutant design, integrating prior knowledge for improved protein engineering.

Findings

01

Designed and screened 10 new high-brightness avGFP mutants

02

Developed a Transformer-based prediction model with high accuracy

03

Showed the effectiveness of LLMs in guiding protein mutation design

Abstract

This study aims to utilize Transformer models and large language models (such as GPT and Claude) to predict the brightness of Aequorea victoria green fluorescent protein (avGFP) and design mutants with higher brightness. Considering the time and cost associated with traditional experimental screening methods, this study employs machine learning techniques to enhance research efficiency. We first read and preprocessed a proprietary dataset containing approximately 140,000 protein sequences, including about 30,000 avGFP sequences. Subsequently, we constructed and trained a Transformer-based prediction model to screen and design new avGFP mutants that are expected to exhibit higher brightness. Our methodology consists of two primary stages: first, the construction of a scoring model using BERT, and second, the screening and generation of mutants using mutation site statistics and large…
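
The abstract mentions screening mutants with "mutation site statistics" but the excerpt does not say how those statistics are computed. The Python sketch below is one plausible reading, not the authors' method: it counts how often each substitution appears among variants brighter than a chosen threshold. The function names, the threshold, and the five-residue toy sequences are all hypothetical and are not data from the paper.

```python
from collections import Counter


def mutation_sites(wild_type, variant):
    """Return (1-based position, wild-type residue, variant residue) per substitution."""
    if len(wild_type) != len(variant):
        # Assumption: only substitutions are considered, so lengths must match.
        raise ValueError("sequences must have equal length")
    return [
        (i + 1, w, v)
        for i, (w, v) in enumerate(zip(wild_type, variant))
        if w != v
    ]


def site_statistics(wild_type, variants, brightness, threshold):
    """Count each substitution among variants whose brightness exceeds the threshold."""
    counts = Counter()
    for seq, score in zip(variants, brightness):
        if score > threshold:
            counts.update(mutation_sites(wild_type, seq))
    return counts


# Toy data for illustration only (hypothetical sequences and scores).
wild_type = "MSKGE"
variants = ["MSKGE", "MAKGE", "MAKGD", "MSRGE"]
brightness = [3.0, 3.6, 3.8, 1.2]

stats = site_statistics(wild_type, variants, brightness, threshold=3.0)
top = stats.most_common(1)  # most frequent substitution among bright variants
```

In a pipeline of this kind, the most frequent substitutions could serve as candidate sites that are then combined and ranked by a trained scoring model.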

Taxonomy

Topics: Advanced Computing and Algorithms · Genomics and Phylogenetic Studies

Methods: Cosine Annealing · Linear Layer · Attention Dropout · Residual Connection · Linear Warmup With Cosine Annealing · Linear Warmup With Linear Decay · Discriminative Fine-Tuning · WordPiece · Multi-Head Attention