Protein Secondary Structure Prediction Using Transformers
Manzi Kevin Maxime

TL;DR
This paper introduces a transformer-based model for predicting protein secondary structures from amino acid sequences, leveraging attention mechanisms and data augmentation to improve accuracy and generalization across variable-length sequences.
Contribution
The work presents a novel transformer architecture tailored for protein secondary structure prediction, incorporating a sliding-window data augmentation technique.
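The sliding-window augmentation can be illustrated with a minimal sketch. The summary does not give the window size, the stride, or how short sequences are handled, so the `window` and `stride` values and the function name `sliding_windows` below are hypothetical choices for illustration, not the author's settings.

```python
def sliding_windows(sequence, labels, window=5, stride=2):
    """Yield (sub-sequence, sub-labels) pairs cut from one labelled protein.

    `window` and `stride` are illustrative assumptions; the paper summary
    does not state the values actually used.
    """
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")
    if len(sequence) <= window:
        # Assumption: proteins shorter than the window are kept whole.
        yield sequence, labels
        return
    for start in range(0, len(sequence) - window + 1, stride):
        end = start + window
        yield sequence[start:end], labels[start:end]


# Toy example: 10 residues with per-residue labels (H = helix, E = strand, C = coil).
seq = "MKTAYIAKQR"
ss = "CCHHHHHEEC"
pairs = list(sliding_windows(seq, ss, window=5, stride=2))
for sub_seq, sub_ss in pairs:
    print(sub_seq, sub_ss)
```

Each window becomes its own training sample, so one labelled protein contributes several overlapping examples. With these toy settings the final residue falls outside every window; a real pipeline would need to decide whether to pad or add a trailing window.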
Findings
The transformer achieves high accuracy on the CB513 dataset
The model captures both local and long-range residue interactions
Generalization across sequence lengths improves
Abstract
Predicting protein secondary structures such as alpha helices, beta sheets, and coils from amino acid sequences is essential for understanding protein function. This work presents a transformer-based model that applies attention mechanisms to protein sequence data to predict structural motifs. A sliding-window data augmentation technique is applied to the CB513 dataset to expand the training set. The transformer generalizes well across variable-length sequences and captures both local and long-range residue interactions.
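To show how attention can relate residues regardless of their distance in the sequence, here is a minimal sketch of single-head scaled dot-product self-attention in NumPy. It is a generic illustration of the mechanism, not the paper's architecture: the dimensions, the random weights, and the function name `self_attention` are all assumptions.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over one sequence.

    x has shape (seq_len, d_model); each row stands for one residue embedding.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Every residue scores every other residue, near or far in the chain.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
seq_len, d_model = 8, 16
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Because the weight matrix has one entry per residue pair, the same code works for any sequence length, which is one reason attention suits variable-length protein inputs.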
Taxonomy
Topics: Machine Learning in Bioinformatics · Protein Structure and Dynamics · Fractal and DNA sequence analysis
