Sparse generative modeling via parameter-reduction of Boltzmann machines: application to protein-sequence families
Pierre Barrat-Charlaix, Anna Paola Muntoni, Kai Shimagaki, Martin Weigt, Francesco Zamponi

TL;DR
This paper introduces a parameter-reduction method for Boltzmann machines used in protein sequence modeling, significantly simplifying models while maintaining their predictive power and robustness.
Contribution
It presents a novel iterative decimation procedure that removes couplings from Boltzmann machine models, improving interpretability and robustness without sacrificing performance.
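To make the idea of one decimation step concrete, here is a minimal sketch in Python. It is not the authors' procedure: the selection criterion (Frobenius norm of each coupling block), the pruning of whole site pairs rather than individual entries, and the names `decimate_step`, `mask`, and `fraction` are all assumptions chosen for illustration.

```python
import numpy as np

def decimate_step(J, mask, fraction=0.1):
    """Deactivate the weakest `fraction` of the still-active site pairs.

    J    : array of shape (L, L, q, q); J[i, j] is the coupling block for sites i, j
    mask : boolean array of shape (L, L); True where a coupling block is active
    Strength is the Frobenius norm of each block, an illustrative
    stand-in (assumption), not necessarily the paper's criterion.
    """
    L = J.shape[0]
    iu, ju = np.triu_indices(L, k=1)        # each pair i < j once
    active = mask[iu, ju]
    iu, ju = iu[active], ju[active]
    if iu.size == 0:
        return mask.copy()                   # nothing left to remove
    norms = np.linalg.norm(J[iu, ju], axis=(1, 2))
    n_remove = max(1, int(fraction * iu.size))
    weakest = np.argsort(norms)[:n_remove]
    new_mask = mask.copy()
    new_mask[iu[weakest], ju[weakest]] = False
    new_mask[ju[weakest], iu[weakest]] = False   # keep the mask symmetric
    return new_mask
```

In an iterative scheme, a call like this would alternate with re-fitting the remaining parameters to the data; that training step is omitted here.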
Findings
Removed over 90% of couplings in protein models
Preserved predictive and generative capabilities
Models became less sensitive to noise
Abstract
Boltzmann machines (BM) are widely used as generative models. For example, pairwise Potts models (PM), which are instances of the BM class, provide accurate statistical models of families of evolutionarily related protein sequences. Their parameters are the local fields, which describe site-specific patterns of amino-acid conservation, and the two-site couplings, which mirror the coevolution between pairs of sites. This coevolution reflects structural and functional constraints acting on protein sequences during evolution. The most conservative choice to describe the coevolution signal is to include all possible two-site couplings into the PM. This choice, typical of what is known as Direct Coupling Analysis, has been successful for predicting residue contacts in the three-dimensional structure, mutational effects, and in generating new functional sequences. However, the resulting PM…
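As a rough illustration of the fields and couplings described in the abstract, the sketch below evaluates the energy of a sequence under a pairwise Potts model. The sign convention (probability proportional to exp(-E)), the array layout, and the name `potts_energy` are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def potts_energy(seq, h, J):
    """E(s) = -sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j].

    seq : integer array of length L, one state (e.g. amino acid) per site
    h   : array of shape (L, q), local fields
    J   : array of shape (L, L, q, q), two-site couplings
    """
    seq = np.asarray(seq)
    L = len(seq)
    idx = np.arange(L)
    field_term = h[idx, seq].sum()
    # pair[i, j] = J[i, j, s_i, s_j] for every pair of sites
    pair = J[idx[:, None], idx[None, :], seq[:, None], seq[None, :]]
    coupling_term = np.triu(pair, k=1).sum()   # count each pair i < j once
    return -(field_term + coupling_term)
```

A sparse model in this picture is one where many entries of `J` are fixed to zero, so they contribute nothing to the coupling term.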
