ProtAlign: Contrastive learning paradigm for Sequence and structure alignment

Aditya Ranganath; Hasin Us Sami; Kowshik Thopalli; Bhavya Kailkhura; Wesam Sakla

arXiv:2603.06722 · cs.LG · March 10, 2026

TL;DR

This paper introduces ProtAlign, a contrastive learning framework that aligns protein sequences and structures in a shared embedding space, enhancing cross-modal retrieval and downstream protein prediction tasks.

Contribution

ProtAlign is the first contrastive learning approach to unify protein sequence and structure embeddings, enabling improved cross-modal retrieval and functional predictions.

Findings

01 Enhanced cross-modal retrieval accuracy

02 Improved protein function annotation performance

03 Better understanding of structural organization
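
To make the first finding concrete, cross-modal retrieval accuracy is commonly summarized with a recall@k score. The sketch below is a hypothetical illustration, not the paper's evaluation protocol: the function name `recall_at_k`, the use of cosine similarity, and the convention that row i of the sequence embeddings matches row i of the structure embeddings are all assumptions made for this example.

```python
import numpy as np

def recall_at_k(query_emb, gallery_emb, k=1):
    """Fraction of queries whose true match appears in the top-k gallery hits.

    Assumption: row i of query_emb (e.g. sequence embeddings) is paired with
    row i of gallery_emb (e.g. structure embeddings).
    """
    # Normalize rows so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T                              # (n_queries, n_gallery)
    # Indices of the k most similar gallery items for each query.
    topk = np.argsort(-sims, axis=1)[:, :k]
    targets = np.arange(q.shape[0])[:, None]    # true match index per query
    return float((topk == targets).any(axis=1).mean())

# Usage with random stand-in embeddings (not real protein data).
rng = np.random.default_rng(0)
seq_emb = rng.normal(size=(8, 16))
struct_emb = seq_emb + 0.01 * rng.normal(size=(8, 16))  # nearly aligned pairs
score = recall_at_k(seq_emb, struct_emb, k=1)
```

Swapping the two arguments gives the structure-to-sequence direction, which may behave differently from sequence-to-structure on real data.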

Abstract

Protein language models often account for the alignment between a protein sequence and its textual description, but they do not incorporate structural information. Traditional methods treat sequence and structure separately, which limits their ability to exploit the alignment between structure and sequence embeddings. In this paper, we introduce a sequence-structure contrastive alignment framework that learns a shared embedding space in which proteins are represented consistently across modalities. By training on large-scale pairs of sequences and experimentally resolved or predicted structures, the model maximizes agreement between matched sequence-structure pairs while pushing apart unrelated pairs. This alignment enables cross-modal retrieval (e.g., finding structural neighbors given a sequence), improves downstream prediction tasks such as function…
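
The objective described in the abstract, maximizing agreement between matched pairs while pushing apart unrelated ones, resembles a symmetric InfoNCE (CLIP-style) loss. The sketch below assumes that form; the abstract does not state the exact loss, so the function name `symmetric_info_nce`, the temperature value, and the use of in-batch negatives are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against zero vectors)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(seq_emb, struct_emb, temperature=0.07):
    """Assumed CLIP-style loss: row i of seq_emb pairs with row i of struct_emb.

    All other rows in the batch act as negatives. The temperature of 0.07 is
    a common default, not a value taken from the paper.
    """
    seq = l2_normalize(seq_emb)
    struct = l2_normalize(struct_emb)
    logits = seq @ struct.T / temperature       # (batch, batch) similarities
    idx = np.arange(logits.shape[0])
    # Sequence -> structure: each row should put its mass on the diagonal.
    loss_s2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Structure -> sequence: same criterion applied column-wise.
    loss_t2s = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_s2t + loss_t2s)

# Usage with random stand-in embeddings (not real encoder outputs).
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
loss_matched = symmetric_info_nce(emb, emb)           # pairs line up
loss_mismatched = symmetric_info_nce(emb, emb[::-1])  # pairs scrambled
```

In this toy setup the matched batch yields a lower loss than the scrambled one, which is the behavior a contrastive alignment objective is meant to reward.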

Taxonomy

Topics: Protein Structure and Dynamics · Machine Learning in Bioinformatics · Vaccines and Immunoinformatics Approaches