DiffPaSS -- High-performance differentiable pairing of protein sequences using soft scores
Umberto Lupo, Damiano Sgarbossa, Martina Milighetti, Anne-Florence Bitbol

TL;DR
DiffPaSS is a differentiable framework that pairs interacting protein sequences by optimizing a sequence similarity or coevolution score. It outperforms existing methods, aids protein complex structure prediction, and can be applied to non-aligned as well as aligned sequences.
Contribution
We introduce DiffPaSS, a flexible, fast, and hyperparameter-free differentiable method for pairing protein sequences across various scores, applicable to aligned and non-aligned sequences.
Findings
DiffPaSS outperforms existing algorithms on benchmark datasets.
Its pairings aid protein complex structure prediction.
It is applicable to both aligned and non-aligned sequences.
Abstract
Identifying interacting partners from two sets of protein sequences has important applications in computational biology. Interacting partners share similarities across species due to their common evolutionary history, and feature correlations in amino acid usage due to the need to maintain complementary interaction interfaces. Thus, the problem of finding interacting pairs can be formulated as searching for a pairing of sequences that maximizes a sequence similarity or a coevolution score. Several methods have been developed to address this problem, applying different approximate optimization methods to different scores. We introduce DiffPaSS, a differentiable framework for flexible, fast, and hyperparameter-free optimization for pairing interacting biological sequences, which can be applied to a wide variety of scores. We apply it to a benchmark prokaryotic dataset, using mutual…
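The formulation in the abstract, searching for the pairing that maximizes a score, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from this page: the similarity matrix is a toy, the score is a simple linear sum rather than any score used in the paper, and the Sinkhorn-style relaxation is just one common way to make a search over permutations differentiable. The names `pairing_score`, `best_pairing_brute_force`, `sinkhorn`, and `soft_score` are hypothetical.

```python
import itertools
import numpy as np

def pairing_score(sim, perm):
    """Toy score: total similarity when sequence i of set A is paired with perm[i] of set B."""
    return float(sum(sim[i, j] for i, j in enumerate(perm)))

def best_pairing_brute_force(sim):
    """Exhaustive search over all pairings; only feasible for very small sets."""
    n = sim.shape[0]
    return max(itertools.permutations(range(n)), key=lambda p: pairing_score(sim, p))

def _logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis, keeping dimensions.
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def sinkhorn(logits, n_iters=50):
    """Alternately normalise rows and columns in log space.

    The result is an approximately doubly stochastic matrix, i.e. a "soft"
    permutation whose entries vary smoothly with the logits.
    """
    log_p = np.array(logits, dtype=float)  # copy, so the input is not modified
    for _ in range(n_iters):
        log_p -= _logsumexp(log_p, axis=1)  # rows sum to 1
        log_p -= _logsumexp(log_p, axis=0)  # columns sum to 1
    return np.exp(log_p)

def soft_score(sim, soft_perm):
    """Relaxed version of pairing_score: the score averaged under the soft pairing."""
    return float((soft_perm * sim).sum())

# Toy similarity matrix (assumed values): rows are sequences in set A, columns in set B.
sim = np.array([
    [0.1, 0.9, 0.2],
    [0.8, 0.1, 0.3],
    [0.2, 0.3, 0.7],
])

hard = best_pairing_brute_force(sim)   # exact answer by enumeration
soft = sinkhorn(sim / 0.05)            # low temperature sharpens the soft pairing
recovered = tuple(int(j) for j in soft.argmax(axis=1))
```

Exhaustive search scales factorially with the number of sequences, which is why approximate optimization is needed in practice. In a gradient-based setting, the logits passed to `sinkhorn` would be trainable parameters updated to increase `soft_score`, with a hard pairing read off at the end as in `recovered`.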
Taxonomy
Topics: Machine Learning in Bioinformatics · Gene expression and cancer classification · Algorithms and Data Compression
