ProtAlign: Contrastive learning paradigm for Sequence and structure alignment
Aditya Ranganath, Hasin Us Sami, Kowshik Thopalli, Bhavya Kailkhura, Wesam Sakla

TL;DR
This paper introduces ProtAlign, a contrastive learning framework that aligns protein sequences and structures in a shared embedding space, enhancing cross-modal retrieval and downstream protein prediction tasks.
Contribution
ProtAlign is the first contrastive learning approach to unify protein sequence and structure embeddings, enabling improved cross-modal retrieval and functional predictions.
Findings
Enhanced cross-modal retrieval accuracy
Improved protein function annotation performance
Better understanding of structural organization
Abstract
Protein language models often consider the alignment between a protein sequence and its textual description, but they do not take structural information into account. Traditional methods treat sequence and structure separately, limiting the ability to exploit the alignment between structure and sequence embeddings. In this paper, we introduce a sequence-structure contrastive alignment framework, which learns a shared embedding space where proteins are represented consistently across modalities. By training on large-scale pairs of sequences and experimentally resolved or predicted structures, the model maximizes agreement between matched sequence-structure pairs while pushing apart unrelated pairs. This alignment enables cross-modal retrieval (e.g., finding structural neighbors given a sequence), improves downstream prediction tasks such as function…
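The objective described in the abstract (pull matched sequence and structure embeddings together, push unrelated pairs apart) can be illustrated with a symmetric InfoNCE loss of the kind used in CLIP-style training. The abstract does not state the exact loss, so this is only a minimal sketch under that assumption. The function names, the temperature value of 0.07, and the toy 2-D embeddings are all hypothetical and chosen for illustration; real embeddings would come from sequence and structure encoders that are not shown here.

```python
import math

def normalize(v):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def similarity_matrix(seq_embs, struct_embs):
    """Cosine similarity between every sequence and every structure embedding."""
    seqs = [normalize(v) for v in seq_embs]
    structs = [normalize(v) for v in struct_embs]
    return [[sum(a * b for a, b in zip(s, t)) for t in structs] for s in seqs]

def _cross_entropy_diag(logits):
    """Mean of -log softmax(row_i)[i]: the matched pair sits on the diagonal."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def symmetric_info_nce(seq_embs, struct_embs, temperature=0.07):
    """Assumed CLIP-style loss: average of sequence-to-structure and
    structure-to-sequence cross-entropy over a batch of matched pairs.
    Row i of seq_embs is assumed to pair with row i of struct_embs."""
    sims = similarity_matrix(seq_embs, struct_embs)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))

def retrieve_structures(query_seq_emb, struct_embs, k=1):
    """Cross-modal retrieval: indices of the k structures most similar
    to a query sequence embedding in the shared space."""
    sims = similarity_matrix([query_seq_emb], struct_embs)[0]
    return sorted(range(len(sims)), key=lambda j: -sims[j])[:k]

# Toy 2-D embeddings (hypothetical): pair i of seq matches pair i of struct.
seq = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # matched pairs point the same way
swapped = [[0.1, 0.9], [0.9, 0.1]]   # matched pairs point apart

loss_aligned = symmetric_info_nce(seq, aligned)
loss_swapped = symmetric_info_nce(seq, swapped)
nearest = retrieve_structures([1.0, 0.0], aligned, k=1)
```

In this toy setup the loss is lower for the aligned batch than for the swapped one, which is the behavior the training objective rewards. Once the two modalities share a space, retrieval reduces to a nearest-neighbor lookup over structure embeddings.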
Taxonomy
Topics: Protein Structure and Dynamics · Machine Learning in Bioinformatics · Vaccines and Immunoinformatics Approaches
