Protein Repeats from First Principles
Pablo Turjanski, R. Gonzalo Parra, Rocío Espada, Verónica Becher, Diego U. Ferreiro

TL;DR
This paper introduces a mathematical framework for analyzing protein repeats. It shows that long perfect repeats are rare within a single natural protein but abundant across that protein's family, and it presents a fast method for classifying proteins into families based on these repeats.
Contribution
It provides a systematic quantification of protein repetitiveness and a novel classifier for protein family assignment based on repeat analysis.
Findings
Long perfect repeats are rare within individual proteins.
Natural repeat proteins show abundant repeats of 6+ amino acids within families.
Repetitiveness also occurs in globular domains, not just repeat proteins.
Abstract
Some natural proteins display recurrent structural patterns. Despite being highly similar at the tertiary structure level, repetitions within a single repeat protein can be extremely variable at the sequence level. We propose a mathematical definition of a repeat and investigate the occurrences of these in different protein families. We found that long stretches of perfect repetitions are infrequent in individual natural proteins, even for those known to fold into structures of recurrent structural motifs. We found that natural repeat proteins are indeed repetitive in their families, exhibiting abundant stretches of 6 amino acids or longer that are perfect repetitions in the reference family. We provide a systematic quantification for this repetitiveness, and show that this form of repetitiveness is not exclusive to repeat proteins, but also occurs in globular domains. A…
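To make the idea of "perfect repetitions in the reference family" concrete, here is a minimal Python sketch. It is not the authors' definition or algorithm: it assumes a naive fixed-length substring (k-mer) match with k = 6, and the function names `kmers` and `family_coverage` and the toy sequences are hypothetical, chosen only for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def family_coverage(query, family, k=6):
    """Fraction of positions in `query` covered by at least one length-k
    substring that also occurs exactly in some sequence of `family`.

    Assumption: a naive stand-in for "perfect repetition in the reference
    family"; the paper's formal definition may differ.
    """
    family_kmers = set()
    for member in family:
        family_kmers |= kmers(member, k)

    covered = [False] * len(query)
    for i in range(len(query) - k + 1):
        if query[i:i + k] in family_kmers:
            # Mark every position spanned by the matching substring.
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(query) if query else 0.0


# Toy sequences, made up for illustration (not real proteins).
family = ["MKTAYIAKQR", "GGSLLNAYIAKQW"]
query = "PPAYIAKQRPP"
coverage = family_coverage(query, family)
```

In this toy case the query shares the stretches AYIAKQ and YIAKQR with the family, so 7 of its 11 positions are covered. A realistic analysis would use an index structure such as a suffix array rather than a set of k-mers, and would not fix the match length in advance.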
Taxonomy
Topics: Protein Structure and Dynamics · Algorithms and Data Compression · Enzyme Structure and Function
