Identifying relevant positions in proteins by Critical Variable Selection

Silvia Grigolon; Silvio Franz; Matteo Marsili

arXiv:1503.03815·q-bio.QM·April 12, 2016

TL;DR

This paper introduces Critical Variable Selection, a new method to identify key sites in proteins from sequence data, capturing complex dependencies beyond pairwise correlations and revealing biologically relevant structural and functional sites.

Contribution

The paper presents a novel method for extracting relevant protein sites from sequence alignments that captures higher-order dependencies and complements existing analysis techniques.

Findings

01. Recovers information beyond pairwise correlations

02. Works effectively with small datasets of a few hundred sequences

03. Identifies biologically relevant sites consistent with known data

Abstract

Evolution in its course found a variety of solutions to the same optimisation problem. The advent of high-throughput genomic sequencing has made available extensive data from which, in principle, one can infer the underlying structure on which biological functions rely. In this paper, we present a new method aimed at extracting sites encoding structural and functional properties from a set of protein primary sequences, namely a Multiple Sequence Alignment. The method, called Critical Variable Selection, is based on the idea that subsets of relevant sites correspond to subsequences that occur with a particularly broad frequency distribution in the dataset. By applying this algorithm to in silico sequences, to the Response Regulator Receiver and to the Voltage Sensor Domain of Ion Channels, we show that this procedure recovers not only information encoded in single site statistics and…
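To make the idea of a "broad frequency distribution" concrete, here is a minimal Python sketch. The abstract does not say how breadth is measured, so the score below is an assumption made for illustration: the entropy of the occurrence counts of the subsequences read off a chosen subset of alignment columns. The names `subsequence_counts` and `frequency_breadth` and the toy alignment are hypothetical, and this is not presented as the authors' implementation.

```python
from collections import Counter
from math import log


def subsequence_counts(alignment, sites):
    """Count how often each subsequence restricted to `sites` occurs."""
    return Counter(tuple(seq[i] for i in sites) for seq in alignment)


def frequency_breadth(alignment, sites):
    """Entropy of the occurrence counts of subsequences on `sites`.

    Assumed breadth score (not taken from the abstract):
    m_k is the number of distinct subsequences observed exactly k times,
    so a sequence drawn from the alignment carries a subsequence seen
    k times with probability k * m_k / M, where M is the alignment depth.
    """
    M = len(alignment)
    counts = subsequence_counts(alignment, sites)
    m = Counter(counts.values())  # maps k -> m_k
    return -sum((k * mk / M) * log(k * mk / M) for k, mk in m.items())


# Hypothetical toy alignment: six sequences, four aligned positions.
toy_alignment = ["ACDA", "ACDC", "ACEA", "GCEC", "GCFA", "TCFC"]

conserved = frequency_breadth(toy_alignment, (1,))         # one subsequence, seen 6 times
all_sites = frequency_breadth(toy_alignment, (0, 1, 2, 3))  # six subsequences, each seen once
site_zero = frequency_breadth(toy_alignment, (0,))          # counts 3, 2 and 1
```

Under this assumed score, a fully conserved column and the full-length sequence both give zero, because every subsequence then occurs the same number of times, while site 0 gives a positive value because its subsequences occur 3, 2 and 1 times. A selection procedure in this spirit would search for the subset of sites that maximises the score.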
