Sampling at intermediate temperatures is optimal for training large language models in protein structure prediction
L. Ghiringhelli, A. Zambon, G. Tiana

TL;DR
This study applies a statistical mechanics framework to transformer models trained on protein sequence data. It finds a range of intermediate sampling temperatures with good learning properties and shows that the choice of embedding dimension matters for how well parameters are conserved.
Contribution
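The paper samples the loss landscape at varying temperatures with Langevin dynamics to characterize the low-loss manifold. It identifies the temperature range with good learning properties, gives an operative way to find the optimal embedding dimension, and relates attention matrices to protein contact maps.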
Findings
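Unlike feedforward networks, transformers show no first-order-like transition in the loss, which produces a range of intermediate temperatures with good learning properties.
When the embedding dimension is optimal, the parameters of most layers are highly conserved at these intermediate temperatures.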
Attention matrices are more predictive of protein contact maps at higher temperatures and embedding dimensions.
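One common way to quantify how predictive an attention matrix is of a contact map is the precision of its top-scoring residue pairs. The sketch below illustrates that idea only; the summary does not say which metric the paper uses, and the function name `contact_precision`, the minimum sequence separation of 6, and the choice of keeping the top L pairs are all assumptions made here for illustration.

```python
import numpy as np

def contact_precision(attention, contacts, min_separation=6, top_k=None):
    """Fraction of the top-scoring attention pairs that are true contacts.

    attention: (L, L) array of attention weights (hypothetical input).
    contacts:  (L, L) boolean contact map.
    min_separation and top_k are illustrative choices, not the paper's.
    """
    attention = np.asarray(attention, dtype=float)
    contacts = np.asarray(contacts, dtype=bool)
    length = attention.shape[0]

    # Contact maps are symmetric, so symmetrize the attention scores.
    sym = 0.5 * (attention + attention.T)

    # Keep only pairs (i, j) with j - i >= min_separation.
    i, j = np.triu_indices(length, k=min_separation)
    scores = sym[i, j]

    # Default to the top L pairs, capped by the number of available pairs.
    top_k = length if top_k is None else top_k
    top_k = min(top_k, scores.size)

    # Indices of the top_k highest-scoring pairs.
    best = np.argsort(scores)[::-1][:top_k]
    return float(np.mean(contacts[i[best], j[best]]))
```

Computing this score for attention matrices sampled at several temperatures and embedding dimensions would give one curve per setting, which is one way to make a comparison like the finding above concrete.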
Abstract
We investigate the parameter space of transformer models trained on protein sequence data using a statistical mechanics framework, sampling the loss landscape at varying temperatures by Langevin dynamics to characterize the low-loss manifold and understand the mechanisms underlying the superior performance of transformers in protein structure prediction. We find that, at variance with feedforward networks, the lack of a first-order-like transition in the loss of the transformer produces a range of intermediate temperatures with good learning properties. We show that the parameters of most layers are highly conserved at these temperatures if the dimension of the embedding is optimal, and we provide an operative way to find this dimension. Finally, we show that the attention matrix is more predictive of the contact maps of the protein at higher temperatures and for higher dimensions of…
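To make "sampling the loss landscape at varying temperatures by Langevin dynamics" concrete, here is a minimal sketch of the standard overdamped Langevin update, θ ← θ − η ∇L(θ) + sqrt(2 η T) ξ with ξ standard Gaussian noise, whose samples approximately follow exp(−L/T) for small step size η. This is not the paper's code: the quadratic toy loss, the function names, the step size, and the number of steps are assumptions chosen to keep the example self-contained.

```python
import numpy as np

def langevin_sample(grad_loss, theta0, temperature,
                    step_size=0.05, n_steps=20000, rng=None):
    """Overdamped Langevin dynamics at a fixed temperature.

    grad_loss: function returning the gradient of the loss at theta.
    All defaults are illustrative assumptions, not the paper's settings.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.array(theta0, dtype=float)
    samples = np.empty((n_steps, theta.size))
    # The noise amplitude sets the temperature of the sampled distribution.
    noise_scale = np.sqrt(2.0 * step_size * temperature)
    for t in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        theta = theta - step_size * grad_loss(theta) + noise_scale * noise
        samples[t] = theta
    return samples

# Toy stand-in for a network loss: L(theta) = 0.5 * |theta|^2.
def toy_loss(theta):
    return 0.5 * np.sum(theta ** 2, axis=-1)

def toy_grad(theta):
    return theta

def mean_loss_at(temperature, dim=10, seed=0):
    """Average toy loss over the second half of a Langevin trajectory."""
    rng = np.random.default_rng(seed)
    samples = langevin_sample(toy_grad, np.zeros(dim), temperature, rng=rng)
    return float(np.mean(toy_loss(samples[len(samples) // 2:])))
```

Calling `mean_loss_at` over a grid of temperatures traces the average loss as a function of temperature. For a real model, `toy_grad` would be replaced by the gradient of the training loss with respect to the network parameters, and the shape of that curve is where a first-order-like transition would or would not show up.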
