A Binary Representation of the Genetic Code
Louis R. Nemzer

TL;DR
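This paper proposes a new binary coding scheme for the genetic code that captures both the structural similarities of the nucleotides and the physicochemical properties of the encoded amino acids, enabling analysis of point-mutation severity and of how amino acid properties map onto the code.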
Contribution
It introduces a hierarchical binary representation of the genetic code that models mutations as binary operations and correlates amino acid properties with code structure.
Findings
Transition and transversion mutations are expressed as binary operations on the codon labels
Amino acid properties related to protein folding correlate with specific bit positions of the labels
Mutation impact on protein function can be estimated
Abstract
This article introduces a novel binary representation of the canonical genetic code based on both the structural similarities of the nucleotides and the physicochemical properties of the encoded amino acids. Each of the four mRNA bases is assigned a unique 2-bit identifier, so that each of the 64 triplet codons is indexed by a 6-bit label. The ordering of the bits reflects the hierarchical organization manifested by the DNA replication/repair and tRNA translation systems. In this system, transition and transversion mutations are naturally expressed as binary operations, and the severities of the different point mutations can be analyzed. Using principal component analysis, it is shown that the physicochemical properties of amino acids related to protein folding also correlate with certain bit positions of their respective labels. Thus, the likelihood for a point mutation to be…
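The abstract does not state which 2-bit identifier each base receives or exactly how the bits are ordered, so the sketch below is a hypothetical illustration of the general idea, not the author's scheme. It assumes U=00, C=01, A=10, G=11 (high bit = purine vs. pyrimidine) and packs the first codon position into the highest bits. Under that assumption, a transition (A↔G or C↔U) is an XOR of 01 within one base's 2-bit field, while any transversion flips the purine/pyrimidine bit (XOR of 10 or 11). The names `codon_label`, `label_codon`, and `compare_codons` are illustrative only.

```python
# HYPOTHETICAL 2-bit assignment (an assumption, not taken from the paper):
# high bit = purine (1) vs. pyrimidine (0); low bit separates the two bases in a class.
BASE_BITS = {"U": 0b00, "C": 0b01, "A": 0b10, "G": 0b11}
BITS_BASE = {bits: base for base, bits in BASE_BITS.items()}
SHIFTS = (4, 2, 0)  # assumed ordering: codon position 1 occupies the two highest bits


def codon_label(codon: str) -> int:
    """Pack a 3-base mRNA codon into a 6-bit integer label (0..63)."""
    if len(codon) != 3:
        raise ValueError("a codon has exactly 3 bases")
    label = 0
    for base in codon.upper():
        label = (label << 2) | BASE_BITS[base]
    return label


def label_codon(label: int) -> str:
    """Invert codon_label: unpack a 6-bit label back into a codon string."""
    if not 0 <= label < 64:
        raise ValueError("label must be in 0..63")
    return "".join(BITS_BASE[(label >> s) & 0b11] for s in SHIFTS)


def compare_codons(c1: str, c2: str):
    """Classify a single-base change via XOR of the two labels.

    Returns None if the codons are identical, otherwise (position, kind) with
    position in 1..3 and kind "transition" (field XOR == 01) or "transversion".
    """
    diff = codon_label(c1) ^ codon_label(c2)
    fields = [(diff >> s) & 0b11 for s in SHIFTS]  # per-position XOR
    changed = [i for i, f in enumerate(fields) if f]
    if not changed:
        return None
    if len(changed) > 1:
        raise ValueError("codons differ at more than one position")
    pos = changed[0]
    kind = "transition" if fields[pos] == 0b01 else "transversion"
    return pos + 1, kind


if __name__ == "__main__":
    print(codon_label("AUG"))             # 10 00 11 in binary
    print(compare_codons("AUG", "GUG"))   # A->G at position 1
    print(compare_codons("UUU", "UUA"))   # U->A at position 3
```

With this particular assignment, `codon_label("AUG")` is `0b100011` (35), `compare_codons("AUG", "GUG")` returns `(1, "transition")`, and `compare_codons("UUU", "UUA")` returns `(3, "transversion")`. A different base-to-bits mapping or bit ordering, such as the one the paper actually uses, would change the labels but not the XOR-based way of reading off mutation types.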
