Calculating complexity of large randomized libraries

Yong Kong

arXiv:1410.5851·q-bio.QM·May 28, 2024

TL;DR

This paper develops formulas and software to calculate the mean and variance of the number of unique sequences in large randomized libraries with arbitrary nucleotide ratios, aiding library design and evaluation.

Contribution

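The paper introduces new formulas and a computer program for computing statistics of large randomized libraries with arbitrary nucleotide ratios. Previous methods give only the mean number of unique sequences and handle only equal molar ratios of the four nucleotides at a small number of mutation sites.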

Findings

01 Nucleotide ratios significantly influence library statistics.

02 Skewed ratios require larger libraries for the same diversity.

03 The software can handle libraries with mutations in over 20 amino acids.
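
The first two findings can be illustrated with a minimal sketch. It is not the paper's program: it assumes each clone is an independent draw, each nucleotide site is filled independently with the same four fractions, and it uses ordinary floating point rather than arbitrary precision. The function name `expected_unique` and its parameters are hypothetical. Under these assumptions a sequence with probability p is present in a library of N clones with probability 1 - (1 - p)^N, and sequences with the same base composition share the same p, so the sum runs over compositions instead of all 4^n sequences.

```python
import math


def expected_unique(ratios, n_sites, library_size):
    """Mean number of distinct sequences among `library_size` independent clones.

    ratios       -- fractions of the four nucleotides (A, C, G, T), summing to 1
    n_sites      -- number of randomized nucleotide positions
    library_size -- number of clones sampled (N)
    """
    p_a, p_c, p_g, p_t = ratios
    n_fact = math.factorial(n_sites)
    total = 0.0
    # Enumerate base compositions (a, c, g, t) with a + c + g + t = n_sites.
    for a in range(n_sites + 1):
        for c in range(n_sites + 1 - a):
            for g in range(n_sites + 1 - a - c):
                t = n_sites - a - c - g
                # Number of sequences sharing this composition.
                count = n_fact // (
                    math.factorial(a) * math.factorial(c)
                    * math.factorial(g) * math.factorial(t)
                )
                # Probability of any single sequence with this composition.
                p = p_a**a * p_c**c * p_g**g * p_t**t
                if p >= 1.0:
                    present = 1.0
                else:
                    # 1 - (1 - p)^N, evaluated stably for very small p.
                    present = -math.expm1(library_size * math.log1p(-p))
                total += count * present
    return total


# Two randomized codons (6 nucleotide sites, 4096 possible sequences).
equal = expected_unique((0.25, 0.25, 0.25, 0.25), 6, 4096)
skewed = expected_unique((0.4, 0.3, 0.2, 0.1), 6, 4096)
print(equal, skewed)
```

For a fixed library size, the skewed mixture yields fewer distinct sequences on average than the equimolar one, which is why a skewed library has to be larger to reach the same diversity.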

Abstract

Randomized libraries are increasingly popular in protein engineering and other biomedical research fields. Statistics of the libraries are useful to guide and evaluate randomized library construction. Previous works give only the mean number of unique sequences in the library, and they can handle only equal molar ratios of the four nucleotides at a small number of mutation sites. We derive formulas to calculate the mean and variance of the number of unique sequences in libraries generated by cassette mutagenesis with mixtures of arbitrary nucleotide ratios. A computer program was developed that uses an arbitrary-precision numerical software package to calculate the statistics of large libraries. The statistics of libraries with mutations in more than $20$ amino acids can be calculated easily. Results show that the nucleotide ratios have significant effects on these statistics. The…
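
As a rough guide to what "mean and variance of the number of unique sequences" involves, the standard occupancy identities can be written down under a simplifying assumption: the library consists of $N$ clones drawn independently, and sequence $i$ occurs with probability $p_i$ (the product of the nucleotide fractions at its sites). The symbols $N$, $p_i$, and $U$ are introduced here for illustration and are not taken from the paper, whose own formulas may be organized differently. With $U$ the number of distinct sequences in the library:

$$
\mathrm{E}[U] = \sum_i \left[ 1 - (1 - p_i)^N \right]
$$

$$
\mathrm{Var}[U] = \sum_i (1 - p_i)^N \left[ 1 - (1 - p_i)^N \right] + \sum_{i \neq j} \left[ (1 - p_i - p_j)^N - (1 - p_i)^N (1 - p_j)^N \right]
$$

Both follow from writing $U$ as a sum of presence indicators: sequence $i$ is absent with probability $(1 - p_i)^N$, and sequences $i$ and $j$ are both absent with probability $(1 - p_i - p_j)^N$. With more than $20$ randomized amino acids the sums run over an astronomically large number of sequences and the terms involve differences of numbers very close to each other, which is consistent with the abstract's use of arbitrary-precision arithmetic.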
