Generative modeling, design and analysis of spider silk protein sequences for enhanced mechanical properties
Wei Lu, David L. Kaplan, Markus J. Buehler

TL;DR
This paper introduces a generative language model for designing novel spider silk protein sequences with enhanced mechanical properties, enabling exploration of sequence-property relationships and synthetic silk development.
Contribution
A novel deep learning model that predicts and designs spider silk sequences with desired mechanical traits, expanding the silkome dataset and aiding synthetic silk engineering.
Findings
Generated sequences exhibit properties not found in nature.
Model accurately classifies and evaluates protein sequences.
Sequence motifs linked to mechanical properties identified.
Abstract
Spider silks are remarkable materials characterized by superb mechanical properties such as strength, extensibility and lightweightedness. Yet, to date, few models are available to fully explore sequence-property relationships for analysis and design. Here we propose a custom generative large-language model to enable design of novel spider silk protein sequences to meet complex combinations of target mechanical properties. The model, pretrained on a large set of protein sequences, is fine-tuned on ~1,000 major ampullate spidroin (MaSp) sequences for which associated fiber-level mechanical properties exist, to yield an end-to-end forward and inverse generative strategy. Performance is assessed through: (1) a novelty analysis and protein type classification for generated spidroin sequences through BLAST searches, (2) property evaluation and comparison with similar sequences, (3)…
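The novelty analysis in step (1) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract says BLAST searches are used, whereas the snippet below swaps in Python's standard-library difflib.SequenceMatcher as a crude stand-in for a similarity score. The function names (similarity, novelty_report) and the short toy sequences are hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity ratio in [0, 1]; a stand-in for a BLAST identity score."""
    # autojunk=False disables difflib's heuristic that ignores frequent
    # characters, which would otherwise distort repeat-rich sequences.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def novelty_report(generated, reference):
    """For each generated sequence, return (sequence, closest reference, score)."""
    report = []
    for seq in generated:
        best = max(reference, key=lambda ref: similarity(seq, ref))
        report.append((seq, best, similarity(seq, best)))
    return report


# Toy, made-up fragments (assumption: NOT real spidroin data from the paper).
reference = ["GGAGQGGYGGLGSQGAGRGGLGGQGAGAAAAAAA", "GPGGYGPGQQGPGAAAAAAS"]
generated = ["GGAGQGGYGGLGSQGAGRGGLGGQGAGAAAAAAA", "GPGQQGPGGYGPSGPGSAAAAAAAA"]

for seq, best, score in novelty_report(generated, reference):
    # A score of exactly 1.0 means the sequence already exists in the reference set.
    label = "known" if score == 1.0 else "candidate novel"
    print(f"{seq[:12]}...  closest={best[:12]}...  score={score:.2f}  {label}")
```

In practice a real alignment tool such as BLAST would replace similarity, since it handles gaps, substitution matrices and statistical significance, none of which this character-level ratio captures.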
Taxonomy
Topics: Silk-based biomaterials and applications · Biochemical and Structural Characterization · Genomics and Phylogenetic Studies
