Protein Secondary Structure Prediction Using Transformers

Manzi Kevin Maxime

arXiv:2512.08613 · cs.AI · December 10, 2025

TL;DR

This paper introduces a transformer-based model for predicting protein secondary structures from amino acid sequences, leveraging attention mechanisms and data augmentation to improve accuracy and generalization across variable-length sequences.

Contribution

The work presents a novel transformer architecture tailored for protein secondary structure prediction, incorporating a sliding-window data augmentation technique.
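
The listing does not give the window length or stride used for the sliding-window augmentation. The following is therefore only a minimal Python sketch under assumed parameters. `sliding_windows`, `window`, and `stride` are hypothetical names, and each protein is assumed to come as an amino acid string with an aligned per-residue label string. Each overlapping window becomes an extra training sample that keeps its own labels.

```python
from typing import List, Tuple


def sliding_windows(seq: str, labels: str, window: int, stride: int) -> List[Tuple[str, str]]:
    """Cut one (sequence, labels) pair into overlapping fixed-length windows.

    Hypothetical helper: window and stride are illustrative parameters,
    not values reported in the paper.
    """
    if len(seq) != len(labels):
        raise ValueError("sequence and labels must have the same length")
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    # Sequences no longer than the window are kept whole (padding is left to batching).
    if len(seq) <= window:
        return [(seq, labels)]
    starts = list(range(0, len(seq) - window + 1, stride))
    # Add a final window if the stride skipped the tail of the sequence.
    if starts[-1] != len(seq) - window:
        starts.append(len(seq) - window)
    return [(seq[s:s + window], labels[s:s + window]) for s in starts]
```

For example, `sliding_windows("MKTAYIAKQR", "CCHHHHHHCC", window=4, stride=3)` yields the windows `MKTA`, `AYIA`, and `AKQR` with labels `CCHH`, `HHHH`, and `HHCC`. A stride smaller than the window produces overlapping samples, trading redundancy for more training examples per protein.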

Findings

01 High prediction accuracy on the CB513 dataset
02 Effective capture of both local and long-range residue interactions
03 Improved generalization across sequence lengths
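
The listing reports accuracy on CB513 without naming the metric. Assuming it is the usual three-state per-residue accuracy (Q3) over helix (H), strand (E), and coil (C), it can be computed as in the sketch below. `q3_accuracy` and the `DSSP8_TO_3` table are hypothetical; the table encodes one common reduction of the eight DSSP states to three, and other reductions exist and can change the score.

```python
from typing import Sequence

# One common 8-to-3 reduction (an assumption): H, G, I -> helix; E, B -> strand;
# everything else (turns, bends, loops) -> coil. Other conventions exist.
DSSP8_TO_3 = {"H": "H", "G": "H", "I": "H",
              "E": "E", "B": "E",
              "T": "C", "S": "C", "L": "C", "C": "C", "-": "C"}


def q3_accuracy(true_labels: Sequence[str], pred_labels: Sequence[str]) -> float:
    """Per-residue 3-state accuracy pooled over all proteins.

    Each element is one protein's label string; unknown characters raise KeyError.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("need one prediction per protein")
    correct = total = 0
    for t, p in zip(true_labels, pred_labels):
        if len(t) != len(p):
            raise ValueError("each prediction must cover every residue")
        for a, b in zip(t, p):
            # Compare labels after collapsing both to the 3-state alphabet.
            correct += int(DSSP8_TO_3[a] == DSSP8_TO_3[b])
            total += 1
    return correct / total if total else 0.0
```

For example, `q3_accuracy(["HHHH", "EEC"], ["HHCC", "EEE"])` counts 4 correct residues out of 7. This version pools residues across proteins; some evaluations instead average the per-protein accuracy, which weights short chains more heavily.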

Abstract

Predicting protein secondary structures such as alpha helices, beta sheets, and coils from amino acid sequences is essential for understanding protein function. This work presents a transformer-based model that applies attention mechanisms to protein sequence data to predict structural motifs. A sliding-window data augmentation technique is applied to the CB513 dataset to expand the set of training samples. The transformer shows a strong ability to generalize across variable-length sequences while effectively capturing both local and long-range residue interactions.
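
To illustrate why attention can relate residues that are far apart in the sequence, here is a minimal single-head scaled dot-product self-attention in pure Python. It is a generic sketch, not the paper's architecture: the function name `masked_self_attention`, the single head, and the `valid_len` padding convention are assumptions. Every real residue attends to every other real residue regardless of distance, and positions past `valid_len` are treated as padding, so that batching variable-length sequences does not leak padding into the result.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per residue position


def masked_self_attention(q: Matrix, k: Matrix, v: Matrix, valid_len: int) -> Matrix:
    """Single-head scaled dot-product attention for one padded sequence.

    Positions >= valid_len are padding: they are excluded as keys, so they
    never contribute to any output row. Their own output rows are still
    computed and are assumed to be ignored downstream (hypothetical convention).
    """
    if not 0 < valid_len <= len(k):
        raise ValueError("valid_len must be between 1 and the padded length")
    scale = 1.0 / math.sqrt(len(q[0]))  # 1 / sqrt(d_k)
    keys, values = k[:valid_len], v[:valid_len]
    out: Matrix = []
    for qi in q:
        # Similarity of this query residue to every real (non-padding) residue.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in keys]
        m = max(scores)  # subtract the max before exp for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # softmax over valid positions only
        # Output row = attention-weighted average of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, values))
                    for c in range(len(v[0]))])
    return out
```

For example, with all-zero queries the softmax weights are uniform, so `masked_self_attention([[0.0, 0.0]] * 3, [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [[1.0, 0.0], [3.0, 2.0], [100.0, 100.0]], valid_len=2)` returns `[[2.0, 1.0]] * 3`, the mean of the two real value rows; the padding row `[100.0, 100.0]` has no effect.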

Taxonomy

Topics: Machine Learning in Bioinformatics · Protein Structure and Dynamics · Fractal and DNA sequence analysis