TL;DR
This paper analyzes attention in protein Transformer models and shows that it tracks folding structure, binding sites, and biophysical properties, consistently across three architectures (BERT, ALBERT, XLNet) and two protein datasets.
Contribution
It introduces methods to interpret attention in protein Transformers, linking attention patterns to protein structure and function, with visualizations and cross-architecture validation.
Findings
Attention captures protein folding structure.
Attention targets binding sites.
Attention focuses on progressively more complex biophysical properties as layer depth increases.
Abstract
Transformer architectures have proven to learn useful representations for protein classification and generation tasks. However, these representations present challenges in interpretability. In this work, we demonstrate a set of methods for analyzing protein Transformer models through the lens of attention. We show that attention: (1) captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure, (2) targets binding sites, a key functional component of proteins, and (3) focuses on progressively more complex biophysical properties with increasing layer depth. We find this behavior to be consistent across three Transformer architectures (BERT, ALBERT, XLNet) and two distinct protein datasets. We also present a three-dimensional visualization of the interaction between attention and…
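One way to make claim (1) concrete is to measure how much of a head's high-confidence attention lands on residue pairs that are in spatial contact. The sketch below is illustrative only and is not taken from the paper's code: the function name, the attention threshold (0.3), the contact distance (8 Å), and the minimum sequence separation (6) are all assumptions chosen for the example. It expects one attention matrix and one set of per-residue 3D coordinates.

```python
import numpy as np


def attention_contact_agreement(attn, coords, min_attn=0.3, max_dist=8.0, min_sep=6):
    """Share of strong attention mass that falls on residue pairs in contact.

    attn     : (n, n) attention weights for one head (row i attends to column j).
    coords   : (n, 3) one 3D coordinate per residue, in angstroms.
    min_attn : assumed threshold; weights at or below it are ignored.
    max_dist : assumed contact cutoff in angstroms.
    min_sep  : assumed minimum distance along the sequence, so that trivially
               adjacent residues do not count as contacts.
    """
    attn = np.asarray(attn, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = attn.shape[0]

    # Pairwise Euclidean distances between residues.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    # Separation along the sequence for every pair (i, j).
    idx = np.arange(n)
    sep = np.abs(idx[:, None] - idx[None, :])

    # A pair is a contact if it is close in space but far apart in sequence.
    contact = (dist < max_dist) & (sep >= min_sep)

    # Keep only high-confidence attention weights.
    strong = attn > min_attn
    total = attn[strong].sum()
    if total == 0:
        return float("nan")  # no attention above the threshold
    return float(attn[strong & contact].sum() / total)
```

A value near 1 would mean that almost all strong attention from that head connects residues that are far apart in sequence but close in space; comparing it against the background contact rate of the dataset indicates whether the head is doing better than chance.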
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · WordPiece · LAMB · ALBERT · Residual Connection · Label Smoothing · Multi-Head Attention
