On Recovering Higher-order Interactions from Protein Language Models
Darin Tsui, Amirali Aghazadeh

TL;DR
This paper introduces a Fourier analysis framework for efficiently extracting higher-order mutational interactions from protein language models, reducing computational time 15,000-fold while recovering interactions with R^2 = 0.72 in sparse regions.
Contribution
The authors develop a systematic Fourier analysis approach to recover interactions from protein language models, revealing the model's sparsity properties and enabling scalable interaction extraction.
Findings
The behavior of the ESM2 model is dominated by three regions in the sparsity-ruggedness plane.
Interactions are recovered with high accuracy (R^2 = 0.72) in sparse regions.
Computational time is reduced by a factor of 15,000.
Abstract
Protein language models leverage evolutionary information to perform state-of-the-art 3D structure and zero-shot variant prediction. Yet, extracting and explaining all the mutational interactions that govern model predictions remains difficult as it requires querying the entire amino acid space for L sites using 20^L sequences, which is computationally expensive even for moderate values of L (e.g., L ~ 10). Although approaches to lower the sample complexity exist, they often limit the interpretability of the model to just single and pairwise interactions. Recently, computationally scalable algorithms relying on the assumption of sparsity in the Fourier domain have emerged to learn interactions from experimental data. However, extracting interactions from language models poses unique challenges: it's unclear if sparsity is always present or if it is the only metric needed to…
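To make the cost and the sparsity assumption in the abstract concrete, here is a minimal Python sketch. It is not the authors' method and does not query ESM2. It assumes a toy alphabet of 4 letters over 3 sites, a hand-built landscape with one single-site effect and one pairwise effect, and the discrete Fourier transform over sequence space (via NumPy's `fftn`) as one common choice of Fourier basis. All function names are hypothetical.

```python
import itertools

import numpy as np


def full_query_count(alphabet_size: int, num_sites: int) -> int:
    """Number of sequences needed to query every variant at num_sites sites."""
    return alphabet_size ** num_sites


def toy_landscape(q: int, num_sites: int) -> np.ndarray:
    """Hypothetical landscape (an assumption, not model output).

    One additive effect at site 0 plus one pairwise effect between
    sites 1 and 2. Requires num_sites >= 3 and q >= 4.
    """
    table = np.zeros((q,) * num_sites)
    for seq in itertools.product(range(q), repeat=num_sites):
        additive = 1.0 if seq[0] == 1 else 0.0
        pairwise = 2.0 if (seq[1] == 2 and seq[2] == 3) else 0.0
        table[seq] = additive + pairwise
    return table


def fourier_coefficients(table: np.ndarray) -> np.ndarray:
    """Normalized DFT over all sites; one coefficient per frequency vector."""
    return np.fft.fftn(table) / table.size


def interaction_order(freq: tuple) -> int:
    """Order of an interaction = number of sites with a nonzero frequency."""
    return sum(1 for k in freq if k != 0)


def count_by_order(coeffs: np.ndarray, tol: float = 1e-9) -> dict:
    """Count the non-negligible Fourier coefficients at each interaction order."""
    counts = {}
    for freq in np.ndindex(coeffs.shape):
        if abs(coeffs[freq]) > tol:
            order = interaction_order(freq)
            counts[order] = counts.get(order, 0) + 1
    return counts


if __name__ == "__main__":
    # Exhaustive querying for 20 amino acids at 10 sites.
    print("20^10 =", full_query_count(20, 10))

    coeffs = fourier_coefficients(toy_landscape(q=4, num_sites=3))
    counts = count_by_order(coeffs)
    print("nonzero coefficients by order:", counts)
    print("total nonzero:", sum(counts.values()), "of", coeffs.size)
```

In this toy case only 19 of the 64 coefficients are nonzero and none are third order, which is the kind of structure sparse Fourier algorithms exploit. Whether a real protein language model is this sparse is exactly the question the paper examines.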
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Bioinformatics · Topic Modeling
