# Distribution of phenotype sizes in sequence-to-structure genotype-phenotype maps

**Authors:** Susanna Manrubia, Jose A. Cuesta

arXiv: 1702.00351 · 2017-04-20

## TL;DR

This paper analytically derives the distribution of phenotype sizes for a hierarchy of models of sequence-to-structure genotype-phenotype maps, and identifies which features of the map shape that distribution, with implications for evolvability.

## Contribution

The paper introduces a hierarchy of models whose phenotype size distributions interpolate between a powerlaw and a lognormal, and identifies, qualitatively and quantitatively, the features of the sequence-to-structure map that lead to each distribution.

## Key findings

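- The distribution of phenotype sizes interpolates between two limits: a powerlaw and a lognormal.
- A powerlaw arises when the ordering of sites in the prototypical sequence is strongly constrained; a lognormal, as suggested for RNA, arises when different orderings of the same set of sites yield different phenotypes.
- Phenotype sizes matter for evolvability because the navigability of genotype space relies on sufficiently large genotype networks.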

## Abstract

An essential quantity to ensure evolvability of populations is the navigability of the genotype space. Navigability relies on the existence of sufficiently large genotype networks, that is ensembles of sequences with the same phenotype that guarantee an efficient random drift through sequence space. The number of sequences compatible with a given structure (e.g. the number of RNA sequences folding into a particular secondary structure, or the number of DNA sequences coding for the same protein structure) is astronomically large in all functional molecules investigated. However, an exhaustive experimental or computational study of all RNA folds or all protein structures becomes impossible even for moderately long sequences. Here, we analytically derive the distribution of phenotype sizes for a hierarchy of models which successively incorporate features of increasingly realistic sequence-to-structure genotype-phenotype maps. The main feature of these models relies on the characterization of each phenotype through a prototypical sequence whose sites admit a variable fraction of letters of the alphabet. Our models interpolate between two limit distributions: a powerlaw distribution, when the ordering of sites in the prototypical sequence is strongly constrained, and a lognormal distribution, as suggested for RNA, when different orderings of the same set of sites yield different phenotypes. Our main result is the qualitative and quantitative identification of those features of the sequence-to-structure map that lead to different distributions of phenotype sizes.
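The lognormal limit mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not one of the authors' models: it assumes each phenotype is a prototypical sequence whose sites independently admit 1, 2, or 4 letters of a four-letter alphabet, with hypothetical probabilities 0.5, 0.3, and 0.2. Under that assumption the phenotype size is the product of the per-site letter counts, so its logarithm is a sum of independent terms and should be roughly normal for long sequences. All function names and parameter values here are illustrative.

```python
import math
import random
import statistics

# Hypothetical per-site options: number of admissible letters and their probabilities.
CHOICES = (1, 2, 4)
WEIGHTS = (0.5, 0.3, 0.2)


def phenotype_size(allowed_letters):
    """Count sequences compatible with a prototype: product of per-site letter counts."""
    return math.prod(allowed_letters)


def sample_log_sizes(n_phenotypes, length, seed=0):
    """Draw random prototypes with independent sites and return ln(size) for each."""
    rng = random.Random(seed)
    log_sizes = []
    for _ in range(n_phenotypes):
        sites = rng.choices(CHOICES, weights=WEIGHTS, k=length)
        log_sizes.append(math.log(phenotype_size(sites)))
    return log_sizes


def skewness(values):
    """Population skewness; close to 0 for a symmetric (e.g. normal) sample."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


LENGTH = 100
log_sizes = sample_log_sizes(n_phenotypes=5000, length=LENGTH)

# Mean of ln(size) predicted by summing the expected per-site contribution.
expected_mean = LENGTH * sum(w * math.log(c) for c, w in zip(CHOICES, WEIGHTS))

print("mean ln(size):", statistics.fmean(log_sizes), "predicted:", expected_mean)
print("skewness of ln(size):", skewness(log_sizes))
```

A sample mean close to the prediction and a skewness near zero are consistent with an approximately normal ln(size), that is, a lognormal size distribution. The powerlaw limit depends on constraints on site ordering that this toy sampling does not represent.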

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1702.00351/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1702.00351/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/1702.00351/full.md

---
Source: https://tomesphere.com/paper/1702.00351