Coding limits on the number of transcription factors
Shalev Itzkovitz, Tsvi Tlusty, Uri Alon

TL;DR
This study examines how many transcription factors from each super-family are encoded in diverse genomes. It finds that these numbers appear to be bounded, relates the bounds to the DNA recognition mechanism of each super-family, and supports coding theory predictions about binding specificity and functional similarity.
Contribution
It provides evidence that the number of transcription factors in most super-families is bounded, and links these bounds to DNA recognition mechanisms and coding theory predictions.
Findings
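The number of transcription factors from most super-families appears to be bounded.
The maximal number of transcription factors in a super-family correlates with the number of DNA bases effectively recognized by its binding mechanism.
Transcription factors with similar binding sequences tend to regulate similar functions.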
Abstract
Transcription factor proteins bind specific DNA sequences to control the expression of genes. They contain DNA binding domains which belong to several super-families, each with a specific mechanism of DNA binding. The total number of transcription factors encoded in a genome increases with the number of genes in the genome. Here, we examined the number of transcription factors from each super-family in diverse organisms. We find that the number of transcription factors from most super-families appears to be bounded. For example, the number of winged helix factors does not generally exceed 300, even in very large genomes. The magnitude of the maximal number of transcription factors from each super-family seems to correlate with the number of DNA bases effectively recognized by the binding mechanism of that super-family. Coding theory predicts that such upper bounds on the number of…
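To give a feel for why the number of recognized bases could cap the number of distinguishable factors, the following is a minimal sketch using the classical sphere-packing (Hamming) bound from coding theory. It is an illustration only, not the authors' model: the function name `hamming_bound`, the choice of minimum distance, and the example site lengths are all assumptions introduced here.

```python
from math import comb


def hamming_bound(n_bases: int, min_distance: int, alphabet: int = 4) -> int:
    """Sphere-packing upper bound on the number of length-n sequences
    over a 4-letter DNA alphabet that pairwise differ in at least
    `min_distance` positions. Hypothetical helper for illustration."""
    # Number of mismatches each sequence can absorb without ambiguity.
    t = (min_distance - 1) // 2
    # Size of a Hamming ball of radius t around one sequence.
    ball = sum(comb(n_bases, k) * (alphabet - 1) ** k for k in range(t + 1))
    # Disjoint balls must fit inside the space of alphabet**n sequences.
    return alphabet ** n_bases // ball


if __name__ == "__main__":
    # Assumed example lengths; real effective site lengths vary by super-family.
    for n in (4, 6, 8):
        print(n, hamming_bound(n, 1), hamming_bound(n, 3))
```

With `min_distance=1` the bound is simply 4^n, the count of all distinct sites. Requiring sites to differ in at least three positions shrinks it sharply, which is the qualitative behavior a bound tied to the number of recognized bases would show.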
Taxonomy
Topics: Genomics and Chromatin Dynamics · Gene Regulatory Network Analysis · Gene expression and cancer classification
