Symmetry and Minimum Principle at the Basis of the Genetic Code

A. Sciarrino; P.Sorba

arXiv:1704.00940·q-bio.OT·April 5, 2017

TL;DR

This paper reviews a symmetry-based mathematical model of the genetic code, demonstrating its applications in understanding codon usage, amino-acid properties, and evolutionary dynamics through a minimum energy principle.

Contribution

It introduces and applies the Crystal Basis Model, a symmetry-based framework, to analyze genetic code structure, codon-anticodon interactions, and evolutionary patterns.

Findings

01. Sum rules for codon usage probabilities verified.
02. Relations between amino-acid properties established.
03. Genetic code evolution modeled with good agreement.

Tables

Table 1: The eukaryotic code

codon	a.a.	codon	a.a.	codon	a.a.	codon	a.a.
CCC	Pro P	UCC	Ser S	GCC	Ala A	ACC	Thr T
CCU	Pro P	UCU	Ser S	GCU	Ala A	ACU	Thr T
CCG	Pro P	UCG	Ser S	GCG	Ala A	ACG	Thr T
CCA	Pro P	UCA	Ser S	GCA	Ala A	ACA	Thr T
CUC	Leu L	UUC	Phe F	GUC	Val V	AUC	Ile I
CUU	Leu L	UUU	Phe F	GUU	Val V	AUU	Ile I
CUG	Leu L	UUG	Leu L	GUG	Val V	AUG	Met M
CUA	Leu L	UUA	Leu L	GUA	Val V	AUA	Ile I
CGC	Arg R	UGC	Cys C	GGC	Gly G	AGC	Ser S
CGU	Arg R	UGU	Cys C	GGU	Gly G	AGU	Ser S
CGG	Arg R	UGG	Trp W	GGG	Gly G	AGG	Arg R
CGA	Arg R	UGA	Stop	GGA	Gly G	AGA	Arg R
CAC	His H	UAC	Tyr Y	GAC	Asp D	AAC	Asn N
CAU	His H	UAU	Tyr Y	GAU	Asp D	AAU	Asn N
CAG	Gln Q	UAG	Stop	GAG	Glu E	AAG	Lys K
CAA	Gln Q	UAA	Stop	GAA	Glu E	AAA	Lys K

Table 2: Assignments of the codons of the eukaryotic code in the crystal basis model. The upper label denotes different irreducible representations.

codon	a.a.	$(J_{H}, J_{V})$	$J_{H,3}$	$J_{V,3}$	codon	a.a.	$(J_{H}, J_{V})$	$J_{H,3}$	$J_{V,3}$
CCC	Pro P	$(3/2, 3/2)$	3/2	3/2	UCC	Ser S	$(3/2, 3/2)$	1/2	3/2
CCU	Pro P	$(1/2, 3/2)^{1}$	1/2	3/2	UCU	Ser S	$(1/2, 3/2)^{1}$	-1/2	3/2
CCG	Pro P	$(3/2, 1/2)^{1}$	3/2	1/2	UCG	Ser S	$(3/2, 1/2)^{1}$	1/2	1/2
CCA	Pro P	$(1/2, 1/2)^{1}$	1/2	1/2	UCA	Ser S	$(1/2, 1/2)^{1}$	-1/2	1/2
CUC	Leu L	$(1/2, 3/2)^{2}$	1/2	3/2	UUC	Phe F	$(3/2, 3/2)$	-1/2	3/2
CUU	Leu L	$(1/2, 3/2)^{2}$	-1/2	3/2	UUU	Phe F	$(3/2, 3/2)$	-3/2	3/2
CUG	Leu L	$(1/2, 1/2)^{3}$	1/2	1/2	UUG	Leu L	$(3/2, 1/2)^{1}$	-1/2	1/2
CUA	Leu L	$(1/2, 1/2)^{3}$	-1/2	1/2	UUA	Leu L	$(3/2, 1/2)^{1}$	-3/2	1/2
CGC	Arg R	$(3/2, 1/2)^{2}$	3/2	1/2	UGC	Cys C	$(3/2, 1/2)^{2}$	1/2	1/2
CGU	Arg R	$(1/2, 1/2)^{2}$	1/2	1/2	UGU	Cys C	$(1/2, 1/2)^{2}$	-1/2	1/2
CGG	Arg R	$(3/2, 1/2)^{2}$	3/2	-1/2	UGG	Trp W	$(3/2, 1/2)^{2}$	1/2	-1/2
CGA	Arg R	$(1/2, 1/2)^{2}$	1/2	-1/2	UGA	Ter	$(1/2, 1/2)^{2}$	-1/2	-1/2
CAC	His H	$(1/2, 1/2)^{4}$	1/2	1/2	UAC	Tyr Y	$(3/2, 1/2)^{2}$	-1/2	1/2
CAU	His H	$(1/2, 1/2)^{4}$	-1/2	1/2	UAU	Tyr Y	$(3/2, 1/2)^{2}$	-3/2	1/2
CAG	Gln Q	$(1/2, 1/2)^{4}$	1/2	-1/2	UAG	Ter	$(3/2, 1/2)^{2}$	-1/2	-1/2
CAA	Gln Q	$(1/2, 1/2)^{4}$	-1/2	-1/2	UAA	Ter	$(3/2, 1/2)^{2}$	-3/2	-1/2
GCC	Ala A	$(3/2, 3/2)$	3/2	1/2	ACC	Thr T	$(3/2, 3/2)$	1/2	1/2
GCU	Ala A	$(1/2, 3/2)^{1}$	1/2	1/2	ACU	Thr T	$(1/2, 3/2)^{1}$	-1/2	1/2
GCG	Ala A	$(3/2, 1/2)^{1}$	3/2	-1/2	ACG	Thr T	$(3/2, 1/2)^{1}$	1/2	-1/2
GCA	Ala A	$(1/2, 1/2)^{1}$	1/2	-1/2	ACA	Thr T	$(1/2, 1/2)^{1}$	-1/2	-1/2
GUC	Val V	$(1/2, 3/2)^{2}$	1/2	1/2	AUC	Ile I	$(3/2, 3/2)$	-1/2	1/2
GUU	Val V	$(1/2, 3/2)^{2}$	-1/2	1/2	AUU	Ile I	$(3/2, 3/2)$	-3/2	1/2
GUG	Val V	$(1/2, 1/2)^{3}$	1/2	-1/2	AUG	Met M	$(3/2, 1/2)^{1}$	-1/2	-1/2
GUA	Val V	$(1/2, 1/2)^{3}$	-1/2	-1/2	AUA	Ile I	$(3/2, 1/2)^{1}$	-3/2	-1/2
GGC	Gly G	$(3/2, 3/2)$	3/2	-1/2	AGC	Ser S	$(3/2, 3/2)$	1/2	-1/2
GGU	Gly G	$(1/2, 3/2)^{1}$	1/2	-1/2	AGU	Ser S	$(1/2, 3/2)^{1}$	-1/2	-1/2
GGG	Gly G	$(3/2, 3/2)$	3/2	-3/2	AGG	Arg R	$(3/2, 3/2)$	1/2	-3/2
GGA	Gly G	$(1/2, 3/2)^{1}$	1/2	-3/2	AGA	Arg R	$(1/2, 3/2)^{1}$	-1/2	-3/2
GAC	Asp D	$(1/2, 3/2)^{2}$	1/2	-1/2	AAC	Asn N	$(3/2, 3/2)$	-1/2	-1/2
GAU	Asp D	$(1/2, 3/2)^{2}$	-1/2	-1/2	AAU	Asn N	$(3/2, 3/2)$	-3/2	-1/2
GAG	Glu E	$(1/2, 3/2)^{2}$	1/2	-3/2	AAG	Lys K	$(3/2, 3/2)$	-1/2	-3/2
GAA	Glu E	$(1/2, 3/2)^{2}$	-1/2	-3/2	AAA	Lys K	$(3/2, 3/2)$	-3/2	-3/2

Table 3: The vertebrate mitochondrial code. In bold (italic) the anticodons reading quadruplets (resp. doublets).

codon	a.a.	anticodon	codon	a.a.	anticodon
CCC	P		UCC	S
CCU	P		UCU	S
CCG	P	UGG	UCG	S	UGA
CCA	P		UCA	S
CUC	L		UUC	F
CUU	L		UUU	F	GAA
CUG	L	UAG	UUG	L
CUA	L		UUA	L	UAA
CGC	R		UGC	C
CGU	R		UGU	C	GCA
CGG	R	UCG	UGG	W
CGA	R		UGA	W	UCA
CAC	H		UAC	Y
CAU	H	GUG	UAU	Y	GUA
CAG	Q		UAG	Ter	-
CAA	Q	UUG	UAA	Ter	-
GCC	A		ACC	T
GCU	A		ACU	T
GCG	A	UGC	ACG	T	UGU
GCA	A		ACA	T
GUC	V		AUC	I
GUU	V		AUU	I	GAU
GUG	V	UAC	AUG	M
GUA	V		AUA	M	CAU
GGC	G		AGC	S
GGU	G		AGU	S	GCU
GGG	G	UCC	AGG	Ter	-
GGA	G		AGA	Ter	-
GAC	D		AAC	N
GAU	D	GUC	AAU	N	GUU
GAG	E		AAG	K
GAA	E	UUC	AAA	K	UUU

Table 4: Inequalities derived in the Early and in the Eukaryotic Genetic Code. The values of the parameters $c_{H}$ and $c_{V}$ are different in the two codes.

	Early Code		Eukaryotic Code
a.a.	Parameters	Inequalities	Parameters	Inequalities
Thr	-	$P_{C} > P_{G}$	$|c_{H}| < 3 |c_{V}|$	$P_{C} > P_{U}$
			$|c_{H}| > 3 |c_{V}|$	$P_{C} > P_{G}$
Arg	-	-	$|c_{V}| < |c_{H}| < 2 |c_{V}|$	$P_{C} > P_{G}$
			$|c_{H}| < |c_{V}|$	$P_{C} > P_{U}$
Pro	$|c_{V}| < c_{H}$	$P_{A} > P_{U}$	$|c_{V}| < \frac{1}{4} |c_{H}|$	$P_{C} > P_{U}$
	$|c_{V}| > c_{H}$	$P_{A} > P_{G}$	$|c_{V}| > \frac{1}{4} |c_{H}|$	$P_{U} > P_{C}$ , $P_{A} > P_{G}$
Leu	-	$P_{G} > P_{C}$	$|c_{H}| < \frac{2}{3} |c_{V}|$	$P_{U} > P_{C}$
			$|c_{H}| > \frac{2}{3} |c_{V}|$	$P_{C} > P_{U}$ , $P_{G} > P_{A}$
Ala	$|c_{V}| < c_{H}$	$P_{U} > P_{A}$ , $P_{C} > P_{G}$	-	$P_{C} > P_{U}$
	$|c_{V}| > c_{H}$	$P_{U} > P_{C}$ , $P_{A} > P_{G}$
Gly	-	-	$|c_{H}| < |c_{V}|$	$P_{C} > P_{G}$
			$|c_{H}| > |c_{V}|$	$P_{G} > P_{C}$ , $P_{A} > P_{U}$
Val	$|c_{V}| < 4 c_{H}$	$P_{C} > P_{G}$ , $P_{U} > P_{A}$	$|c_{H}| < 3 |c_{V}|$	$P_{C} > P_{U}$ , $P_{G} > P_{A}$
	$|c_{V}| > 4 c_{H}$	$P_{C} > P_{U}$ , $P_{G} > P_{A}$	$|c_{H}| > 3 |c_{V}|$	$P_{C} > P_{G}$ , $P_{U} > P_{A}$
Ser	$|c_{V}| < 3 c_{H}$	$P_{G} > P_{C}$ , $P_{A} > P_{U}$	$|c_{V}| < \frac{5}{4} |c_{H}|$	$P_{C} > P_{U}$
	$|c_{V}| > 3 c_{H}$	$P_{G} > P_{A}$ , $P_{C} > P_{U}$	$|c_{V}| > \frac{5}{4} |c_{H}|$	$P_{U} > P_{C}$ , $P_{A} > P_{G}$

Equations

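\begin{array}{ccc} & sl(2)_{H} & \\ C \equiv (+,+) & \leftrightarrow & U \equiv (-,+) \\ sl(2)_{V} \updownarrow & & \updownarrow sl(2)_{V} \\ G \equiv (+,-) & \leftrightarrow & A \equiv (-,-) \\ & sl(2)_{H} & \end{array}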

\displaystyle J_{-}(u\otimes v)=\left\{\begin{array}[]{ll}J_{-}u\otimes v&\exists\,n\geq 1\mbox{ such that }J_{-}^{n}u\neq 0\mbox{ and }J_{+}^{n}v=0\\ u\otimes J_{-}v&\mbox{otherwise}\\ \end{array}\right.

\displaystyle J_{+}(u\otimes v)=\left\{\begin{array}[]{ll}u\otimes J_{+}v&\exists\,n\geq 1\mbox{ such that }J_{+}^{n}v\neq 0\mbox{ and }J_{-}^{n}u=0\\ J_{+}u\otimes v&\mbox{otherwise}\\ \end{array}\right.

(\frac{1}{2}, \frac{1}{2}) \otimes (\frac{1}{2}, \frac{1}{2}) = (1, 1) \oplus (1, 0) \oplus (0, 1) \oplus (0, 0)

\begin{array}[]{lcccc}\to\,\,su(2)_{H}&(0,0)&(\mbox{CA})&\qquad\qquad(1,0)&(\begin{array}[]{ccc}\mbox{CG}&\mbox{UG}&\mbox{UA}\\ \end{array})\\ \downarrow\\ su(2)_{V}&(0,1)&\left(\begin{array}[]{c}\mbox{CU}\\ \mbox{GU}\\ \mbox{GA}\\ \end{array}\right)&\qquad\qquad(1,1)&\left(\begin{array}[]{ccc}\mbox{CC}&\mbox{UC}&\mbox{UU}\\ \mbox{GC}&\mbox{AC}&\mbox{AU}\\ \mbox{GG}&\mbox{AG}&\mbox{AA}\\ \end{array}\right)\end{array}

Q = J_{H, 3} + \frac{1}{4} C_{V} (J_{V, 3} + 1) - \frac{1}{4}

C_{\alpha} = (J_{\alpha,3})^{2} + \frac{1}{2} \sum_{n \in Z_{+}} \sum_{k=0}^{n} (J_{\alpha-})^{n-k} (J_{\alpha+})^{n} (J_{\alpha-})^{k}

(\frac{1}{2}, \frac{1}{2}) \otimes (\frac{1}{2}, \frac{1}{2}) \otimes (\frac{1}{2}, \frac{1}{2}) = (\frac{3}{2}, \frac{3}{2}) \oplus 2 (\frac{3}{2}, \frac{1}{2}) \oplus 2 (\frac{1}{2}, \frac{3}{2}) \oplus 4 (\frac{1}{2}, \frac{1}{2})

\displaystyle({\textstyle{\frac{3}{2}}},{\textstyle{\frac{3}{2}}})\equiv\left(\begin{array}[]{cccc}\mbox{CCC}&\mbox{UCC}&\mbox{UUC}&\mbox{UUU}\\ \mbox{GCC}&\mbox{ACC}&\mbox{AUC}&\mbox{AUU}\\ \mbox{GGC}&\mbox{AGC}&\mbox{AAC}&\mbox{AAU}\\ \mbox{GGG}&\mbox{AGG}&\mbox{AAG}&\mbox{AAA}\\ \end{array}\right)

\displaystyle({\textstyle{\frac{3}{2}}},{\textstyle{\frac{1}{2}}})^{1}\equiv\left(\begin{array}[]{cccc}\mbox{CCG}&\mbox{UCG}&\mbox{UUG}&\mbox{UUA}\\ \mbox{GCG}&\mbox{ACG}&\mbox{AUG}&\mbox{AUA}\\ \end{array}\right)

\displaystyle({\textstyle{\frac{3}{2}}},{\textstyle{\frac{1}{2}}})^{2}\equiv\left(\begin{array}[]{cccc}\mbox{CGC}&\mbox{UGC}&\mbox{UAC}&\mbox{UAU}\\ \mbox{CGG}&\mbox{UGG}&\mbox{UAG}&\mbox{UAA}\\ \end{array}\right)

\displaystyle({\textstyle{\frac{1}{2}}},{\textstyle{\frac{3}{2}}})^{1}\equiv\left(\begin{array}[]{cc}\mbox{CCU}&\mbox{UCU}\\ \mbox{GCU}&\mbox{ACU}\\ \mbox{GGU}&\mbox{AGU}\\ \mbox{GGA}&\mbox{AGA}\\ \end{array}\right)\qquad\qquad({\textstyle{\frac{1}{2}}},{\textstyle{\frac{3}{2}}})^{2}\equiv\left(\begin{array}[]{cc}\mbox{CUC}&\mbox{CUU}\\ \mbox{GUC}&\mbox{GUU}\\ \mbox{GAC}&\mbox{GAU}\\ \mbox{GAG}&\mbox{GAA}\\ \end{array}\right)

\displaystyle({\textstyle{\frac{1}{2}}},{\textstyle{\frac{1}{2}}})^{1}\equiv\left(\begin{array}[]{cc}\mbox{CCA}&\mbox{UCA}\\ \mbox{GCA}&\mbox{ACA}\\ \end{array}\right)\qquad\qquad({\textstyle{\frac{1}{2}}},{\textstyle{\frac{1}{2}}})^{2}\equiv\left(\begin{array}[]{cc}\mbox{CGU}&\mbox{UGU}\\ \mbox{CGA}&\mbox{UGA}\\ \end{array}\right)

\displaystyle({\textstyle{\frac{1}{2}}},{\textstyle{\frac{1}{2}}})^{3}\equiv\left(\begin{array}[]{cc}\mbox{CUG}&\mbox{CUA}\\ \mbox{GUG}&\mbox{GUA}\\ \end{array}\right)\qquad\qquad({\textstyle{\frac{1}{2}}},{\textstyle{\frac{1}{2}}})^{4}\equiv\left(\begin{array}[]{cc}\mbox{CAC}&\mbox{CAU}\\ \mbox{CAG}&\mbox{CAA}\\ \end{array}\right)

P(XZN) = \lim_{n_{tot} \to \infty} \frac{n_{XZN}}{n_{tot}}

P (X Z A) + P (X Z C) + P (X Z G) + P (X Z U) = 1

P (X Z N) = P (b . s .; J_{H}, J_{V}, J_{H, 3}, J_{V, 3})

P(XZN) = \rho^{XZ}(J_{H}, J_{V}, J_{H,3}, J_{V,3}) + f_{bs}^{XZ}(J_{H}, J_{V}, J_{H,3}, J_{V,3})

f_{b s}^{X Z} (J_{H}, J_{V}, J_{H, 3}, J_{V, 3}) \approx F_{b s}^{X Z} (J_{H}; J_{H, 3}) + G_{b s}^{X Z} (J_{V}; J_{V, 3})

P(NCC) + P(NCA) = \rho_{C+A}^{NC} + F_{bs}^{NC}(\frac{3}{2}; x) + G_{bs}^{NC}(\frac{3}{2}; y) + F_{bs}^{NC}(\frac{1}{2}; x') + G_{bs}^{NC}(\frac{1}{2}; y')

P(NCG) + P(NCU) = \rho_{G+U}^{NC} + F_{bs}^{NC}(\frac{3}{2}; x) + G_{bs}^{NC}(\frac{3}{2}; y) + F_{bs}^{NC}(\frac{1}{2}; x') + G_{bs}^{NC}(\frac{1}{2}; y')

P(NCC) + P(NCA) - P(NCG) - P(NCU) = \rho_{C+A}^{NC} - \rho_{G+U}^{NC} = \mbox{Const.}

P(WUC) + P(WUA) - P(WUG) - P(WUU)

P(CGC) + P(CGA) - P(CGG) - P(CGU)

P(GGC) + P(GGA) - P(GGG) - P(GGU)

P(XZC) + P(XZA) = \mbox{Const.} \qquad (XZ = NC, CU, GU, CG, GG)

A_{0} = \exp\left[ - \sum_{k} 4 \alpha_{c} C_{H}^{k} + 4 \beta_{c} C_{V}^{k} + 2 \gamma_{c} J_{3,H}^{k} \right]

T = 8 c_{H} J_{H}^{c} \cdot J_{H}^{a} + 8 c_{V} J_{V}^{c} \cdot J_{V}^{a}

J_{\alpha}^{c} \cdot J_{\alpha}^{a} = \frac{1}{2} \left\{ (J_{\alpha}^{c} \oplus J_{\alpha}^{a})^{2} - (J_{\alpha}^{c})^{2} - (J_{\alpha}^{a})^{2} \right\}

\langle XZN | T | N^{a} Z_{c}^{a} X_{c}^{a} \rangle = \langle XZN | (8 c_{H} J_{H}^{c} \cdot J_{H}^{a} + 8 c_{V} J_{V}^{c} \cdot J_{V}^{a}) \delta_{M^{a}, N_{c}^{a}} | M^{a} Z_{c}^{a} X_{c}^{a} \rangle

\Longrightarrow_{Evolution} \langle XZN | 8 c_{H} J_{H}^{c} \cdot J_{H}^{a} + 8 c_{V} J_{V}^{c} \cdot J_{V}^{a} | M^{a} Z_{c}^{a} X_{c}^{a} \rangle

c_{H}^{AAN} > 0 \;\Longrightarrow\; c_{H}^{AAN} < 0 \;\Longrightarrow\; \left\{ \begin{array}{c} c_{H}^{AAY} > 0 \\ c_{H}^{AAR} < 0 \end{array} \right.

Taxonomy

Topics: RNA and protein synthesis mechanisms · DNA and Nucleic Acid Chemistry · Bacteriophages and microbial interactions

Full text

**Symmetry and Minimum Principle at the Basis of the Genetic Code**

(Based on talks given at: BelBI2016 International Symposium, University of Belgrade, Serbia, and BIOMAT 2016 International Symposium, Nankai University, Tianjin, China.)

A. Sciarrino

*I.N.F.N., Sezione di Napoli

Complesso Universitario di Monte S. Angelo

Via Cinthia, I-80126 Napoli, Italy

[email protected]*

P. Sorba

*LAPTH, Laboratoire d’Annecy-le-Vieux de Physique Théorique CNRS

Université de Savoie

Chemin de Bellevue, BP 110,

F-74941 Annecy-le-Vieux, France

E-mail: [email protected]*

To appear in BIOMAT 2016, 326 - 362, 2017

Abstract

The importance of the notion of symmetry in physics is well established: could it also be the case for the genetic code? In this spirit, a model for the genetic code based on continuous symmetries, called the “Crystal Basis Model”, was proposed a few years ago. The present paper is a review of the model, of some of its first applications and of its recent developments. After a motivated presentation of our mathematical model, we illustrate its pertinence by applying it to the elaboration and verification of sum rules for codon usage probabilities, and to establishing relations between, and some predictions for, physical-chemical properties of amino-acids. Then, defining in this context a “bio-spin” structure for the nucleotides and codons, the interaction between a codon and its anticodon can simply be represented by a (bio) spin-spin potential. This approach constitutes the second part of the paper where, imposing the minimum energy principle, the evolution of the genetic code is analysed, in good agreement with the generally accepted scheme. A more precise study of this interaction model provides information on codon bias, consistent with data.

Keywords: crystal basis model, codon usage frequency, physical-chemical properties of amino acids, codon-anticodon interaction, evolution of the genetic code, codon bias

1 Introduction

The sciences of life offer an important domain of investigation for the physicist. About seventy years ago, Erwin Schrödinger already put forward in his book “What is Life?” [1] some ideas about the possible role of a “new physics” in this domain, imagining for example mutations to be directly linked to quantum leaps. As can be read there:

“living matter, while not eluding the ‘laws of physics’ as established up to date, is likely to involve ‘other laws of physics’ hitherto unknown, which however, once they have been revealed, will form just as integral a part of science as the former”.

Among the mathematical tools that played an essential role in theoretical physics in the second half of the twentieth century, and still do, in particular in particle physics, is group theory, a concept usually referred to in physics as symmetry, or invariance. This notion is at the basis of our model for describing the genetic code and developing a theoretical approach to its biological properties.

The idea of symmetry, or invariance, can be used in different ways; to illustrate the one we need here, let us take an example. Consider an electron $e^{-}$ . An important physical quantity attached to it is its spin. There are two states for the spin of the electron, called up and down, or + and - (or +1/2 and -1/2, depending on the notation; one says that the spin of $e^{-}$ is 1/2). Mathematically, these two states can be seen as orthogonal vectors of the 2-dim complex Euclidean space (such a space is called a 2-dim Hilbert space), on which acts the group of 2 by 2 unitary matrices of unit determinant, called $\mathcal{SU}(2)$ , transforming one state into another. It is a Lie group, and its Lie algebra contains an element, represented by a $2\times 2$ matrix, with eigenvalues +1/2 and -1/2 whose eigenvectors are the states up and down. For a vector boson (e.g. the $W$ boson, which mediates the weak interaction) there are three spin states, denoted +1, 0, -1, and the elements of the group $\mathcal{SU}(2)$ can be represented by $3\times 3$ matrices acting on a 3-dim Hilbert space. It is this notion that we will use to construct our model describing the genetic code.

At this point, let us mention two essential aspects of genetics to which we propose to apply our model: the DNA structure on one hand and the mechanism of polypeptide fixation from codons on the other hand. It is useful to start by recalling some essential features of the genetic code. First, as is well known, the DNA macromolecule is constituted by two chains of nucleotides wrapped in a double helix. There are four different nucleotides, characterized by their bases: adenine (A) and guanine (G), deriving from purine, and cytosine (C) and thymine (T), deriving from pyrimidine. An A (resp. T) base in one strand is connected by two hydrogen bonds to a T (resp. A) base in the other strand, while a C (resp. G) base is connected to a G (resp. C) base by three hydrogen bonds. The genetic information is transmitted to the cytoplasm via the messenger ribonucleic acid (mRNA). During this operation, called transcription, the A, G, C, T bases in the DNA are associated respectively to the U, C, G, A bases, U denoting the uracil base. Then, through a ribosome, a triplet of nucleotides, or codon, is related to an amino acid (a.a.). More precisely, a codon is defined as an ordered sequence of three nucleotides, e.g. AAG, ACG, etc., and one enumerates in this way $4\times 4\times 4=64$ different codons. In the universal eukaryotic code (see Table 1), 61 of these triplets are connected in an unambiguous way to the amino-acids; the three remaining triplets UAA, UAG and UGA are called non-sense or stop codons, and their role is to stop the biosynthesis. The genetic code is thus the association between codons and amino-acids. But since only 20 amino-acids are related to the 61 codons, the genetic code is degenerate. (The 20 amino-acids are: Alanine (Ala), Arginine (Arg), Asparagine (Asn), Aspartic acid (Asp), Cysteine (Cys), Glutamine (Gln), Glutamic acid (Glu), Glycine (Gly), Histidine (His), Isoleucine (Ile), Leucine (Leu), Lysine (Lys), Methionine (Met), Phenylalanine (Phe), Proline (Pro), Serine (Ser), Threonine (Thr), Tryptophan (Trp), Tyrosine (Tyr), Valine (Val).) Still considering the standard eukaryotic code, one observes sextets, quadruplets, a triplet, doublets and singlets of codons, each multiplet corresponding to a specific amino-acid.
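The counting in the paragraph above can be checked with a short sketch. The compact string `AMINO` is an assumed one-letter encoding of Table 1 (bases ordered U, C, A, G in each position, `*` for a stop codon); the variable names are illustrative and not part of the original paper.

```python
from collections import Counter
from itertools import product

# Assumed encoding of Table 1: one letter per codon, with the three
# positions running over U, C, A, G (third position fastest); '*' = stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 4 x 4 x 4 ordered triplets to its amino acid.
code = {"".join(triplet): aa for triplet, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of codons per amino acid, stop codons excluded.
multiplicity = Counter(aa for aa in code.values() if aa != "*")

# Number of amino acids encoded by a multiplet of each size.
multiplet_sizes = Counter(multiplicity.values())

print("codons:", len(code))
print("coding codons:", sum(multiplicity.values()))
print("amino acids:", len(multiplicity))
for size, n_amino in sorted(multiplet_sizes.items()):
    print(f"multiplets of {size} codon(s): {n_amino}")
```

Under this encoding the script counts three sextets, five quadruplets, one triplet, nine doublets and two singlets, which together account for the 61 coding codons.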

In the mathematical framework we have proposed [2], the codons appear as composite states of nucleotides. More precisely, the codons are obtained as tensor products of nucleotides, the four nucleotides being assigned to the fundamental representation of the quantum group ${\mathcal{U}_{q}}(sl(2)\oplus sl(2))$ in the limit of the deformation parameter $q\to 0$ . The use of a quantum group in the limit $q\to 0$ is essential to take into account the nucleotide ordering (see Table 1). The reader who is not interested in the mathematical aspects can skip them and focus on the biophysical results presented hereafter. For the reader who wishes to better understand our approach, we have included a rather detailed “tutorial” on group theory at the end of this review. The first part of this appendix deals with general notions and properties of Lie groups, while the second part shows explicitly how the codons are constructed as representation states of the above-mentioned quantum group.

We have distinguished two parts in this review.

The first one starts with a brief summary of the main aspects of our model, which we call the “Crystal Basis Model”. It is followed by two examples of applications. The first one concerns the derivation of sum rules for codon usage probabilities [3]: one deduces that the sum of usage probabilities of codons with C and A in the third position for the quartets and/or sextets is independent of the biological species for vertebrates. The second application deals with the physical-chemical properties of amino-acids, for which a set of relations has been derived and compared with the experimental data [4]. A prediction for the not yet measured thermodynamical parameters of three amino-acids is also proposed.

Another important notion in physics is the principle of minimal action, or of minimum energy. This is the second main idea, which we keep with us in the second part of this review, where a codon-anticodon interaction potential is proposed [5], still in the framework of the “Crystal Basis Model”. Such a study first allows us to determine the structure of the minimum set of 22 anticodons allowing the translational-transcription for the animal mitochondrial code. The results are in very good agreement with the observed anticodons. Then, the evolution of the genetic code is considered, with 20 amino-acids encoded from the beginning, from the viewpoint of codon-anticodon interaction. In the same spirit as above, the structure of the anticodons in the Ancient, Archetypal and Early Genetic codes is determined [6]. Most of our results agree with the generally accepted scheme. Finally, still relying on the minimization of our codon-anticodon interaction potential, codon bias is discussed, providing inequalities between codon usage probabilities for quartets of codons [7]. Performing this study separately for the Early and for the Eukaryotic genetic code, we observe that the results obtained are mutually consistent and in good agreement with the available data. Last but not least, we analyse the coherent change of sign, in the evolution from the Early to the Eukaryotic code, of the two parameters regulating our interaction potential.

Some general remarks are gathered in the conclusion, while, as already mentioned, a large appendix is devoted to the mathematical aspect of symmetry.

As this paper is essentially a review of the Crystal Basis Model, we have limited the references and only provided those directly connected to our approach. The interested reader can find the relevant bibliography in each quoted paper.

2 PART 1: Crystal basis model and application

2.1 A group theoretical model of the genetic code

We consider the four nucleotides as basic states of the $({\textstyle{\frac{1}{2}}},{\textstyle{\frac{1}{2}}})$ representation of the ${\mathcal{U}}_{q}(sl(2)\oplus sl(2))$ quantum enveloping algebra in the limit $q\to 0$ [2]. A triplet of nucleotides will then be obtained by constructing the tensor product of three such four-dimensional representations. Actually, this approach mimics the group theoretical classification of baryons made out of three quarks in elementary particle physics, the building blocks being here the A, C, G, T/U nucleotides. The main and essential difference lies in the property of a codon to be an ordered set of three nucleotides, which is not the case for a baryon.

Constructing such pure states is made possible in the framework of any algebra ${\mathcal{U}}_{q\to 0}({\mathcal{G}})$ with ${\mathcal{G}}$ being any (semi)-simple classical Lie algebra owing to the existence of a special basis, called crystal basis, in any (finite dimensional) representation of ${\mathcal{G}}$ . The algebra ${\mathcal{G}}=sl(2)\oplus sl(2)$ appears the most natural for our purpose. The complementarity rule in the DNA–mRNA transcription suggests assigning a quantum number with opposite values to the couples (A,T/U) and (C,G). The distinction between the purine bases (A,G) and the pyrimidine ones (C,T/U) can be algebraically represented in an analogous way. Thus, considering the fundamental representation $({\textstyle{\frac{1}{2}}},{\textstyle{\frac{1}{2}}})$ of $sl(2)\oplus sl(2)$ and denoting by $\pm$ the basis vector corresponding to the eigenvalues $\pm{\textstyle{\frac{1}{2}}}$ of the $J_{3}$ generator in either of the two $sl(2)$ algebras, we will assume the following “biological” spin structure:

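$$\begin{array}{ccc} & sl(2)_{H} & \\ C \equiv (+,+) & \leftrightarrow & U \equiv (-,+) \\ sl(2)_{V} \updownarrow & & \updownarrow sl(2)_{V} \\ G \equiv (+,-) & \leftrightarrow & A \equiv (-,-) \\ & sl(2)_{H} & \end{array}$$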

the subscripts $H$ (:= horizontal) and $V$ (:= vertical) being just added to specify the algebra.
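As a small illustration of this “biological” spin structure, the sketch below stores the sign of $J_{H,3}$ and $J_{V,3}$ for each nucleotide, as read off from the single-letter pattern of Table 2 (C = (+,+), U = (-,+), G = (+,-), A = (-,-)), and checks the two properties described above. The dictionary names are illustrative assumptions, not notation from the paper.

```python
# Signs of (J_{H,3}, J_{V,3}) in units of 1/2, as read off from Table 2.
SPIN = {"C": (+1, +1), "U": (-1, +1), "G": (+1, -1), "A": (-1, -1)}

COMPLEMENT = {"C": "G", "G": "C", "A": "U", "U": "A"}  # base pairing in RNA
PURINES = {"A", "G"}

for base, (h, v) in SPIN.items():
    kind = "purine" if base in PURINES else "pyrimidine"
    partner = COMPLEMENT[base]
    print(f"{base}: H={h:+d} V={v:+d} ({kind}), complement {partner} has "
          f"H={SPIN[partner][0]:+d} V={SPIN[partner][1]:+d}")

# The H sign is common to a complementary couple and opposite between
# the couples (C,G) and (A,U); the V sign separates purines from pyrimidines.
h_by_couple = {frozenset((b, COMPLEMENT[b])): SPIN[b][0] for b in SPIN}
v_by_kind = {b in PURINES: SPIN[b][1] for b in SPIN}
print(h_by_couple)
print(v_by_kind)
```

With this assignment, base pairing flips the $sl(2)_{V}$ sign and leaves the $sl(2)_{H}$ sign unchanged.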

Now, we consider the representations of ${\mathcal{U}}_{q}(sl(2))$ and more specifically the crystal bases obtained when $q\to 0$ . Introducing in ${\mathcal{U}}_{q\to 0}(sl(2))$ the operators $J_{+}$ and $J_{-}$ after modification of the corresponding simple root vectors of ${\mathcal{U}}_{q}(sl(2))$ , a particular kind of basis in a ${\mathcal{U}}_{q}(sl(2))$ -module can be defined. Such a basis is called a crystal basis and has the property of transforming in a particularly simple way under the action of the $J_{+}$ and $J_{-}$ operators: for example, for any pair of vectors $u,v$ in the crystal basis ${\bf B}$ , one gets $u=J_{+}v$ if and only if $v=J_{-}u$ . More interesting for our purpose is the crystal basis in the tensor product of two representations. Then the following theorem holds [8] (written here in the case of $sl(2)$ ):

Theorem 2.1: Let ${\bf B}_{1}$ and ${\bf B}_{2}$ be the crystal bases of the $M_{1}$ and $M_{2}$ ${\mathcal{U}}_{q\to 0}(sl(2))$ -modules respectively. Then for $u\in{\bf B}_{1}$ and $v\in{\bf B}_{2}$ , we have:

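$$J_{-}(u\otimes v)=\left\{\begin{array}{ll}J_{-}u\otimes v & \exists\,n\geq 1\mbox{ such that }J_{-}^{n}u\neq 0\mbox{ and }J_{+}^{n}v=0\\ u\otimes J_{-}v & \mbox{otherwise}\end{array}\right.$$

$$J_{+}(u\otimes v)=\left\{\begin{array}{ll}u\otimes J_{+}v & \exists\,n\geq 1\mbox{ such that }J_{+}^{n}v\neq 0\mbox{ and }J_{-}^{n}u=0\\ J_{+}u\otimes v & \mbox{otherwise}\end{array}\right.$$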

Note that the tensor product of two representations in the crystal basis is not commutative. In the case of our model, we only need to construct the $n$ -fold tensor product of the fundamental representation $({\textstyle{\frac{1}{2}}},{\textstyle{\frac{1}{2}}})$ of ${\mathcal{U}}_{q\to 0}(sl(2)\oplus sl(2))$ by itself.
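The tensor product rule of Theorem 2.1 can be sketched in a few lines for products of spin-1/2 factors. In this sketch a state is a tuple of `'+'`/`'-'` signs, `None` stands for the zero vector, and an $n$-fold product is treated as $(u)\otimes v$ with $v$ the last factor. The helper names (`lower`, `raise_`, `phi`, `eps`, `multiplets`) are illustrative assumptions. The existence condition of the theorem is rewritten as a comparison between the number of times $J_{-}$ can act on $u$ and the number of times $J_{+}$ can act on $v$, which is equivalent for finite multiplets.

```python
from itertools import product

def lower(state):
    """Apply J_- to a crystal basis state; None stands for 0."""
    if len(state) == 1:
        return ("-",) if state == ("+",) else None
    u, v = state[:-1], state[-1:]
    # "exists n >= 1 with J_-^n u != 0 and J_+^n v = 0"  <=>  phi(u) > eps(v)
    if phi(u) > eps(v):
        return lower(u) + v
    lowered_v = lower(v)
    return None if lowered_v is None else u + lowered_v

def raise_(state):
    """Apply J_+ to a crystal basis state; None stands for 0."""
    if len(state) == 1:
        return ("+",) if state == ("-",) else None
    u, v = state[:-1], state[-1:]
    # "exists n >= 1 with J_+^n v != 0 and J_-^n u = 0"  <=>  eps(v) > phi(u)
    if eps(v) > phi(u):
        return u + raise_(v)
    raised_u = raise_(u)
    return None if raised_u is None else raised_u + v

def phi(state):
    """Number of times J_- can be applied before reaching 0."""
    n, nxt = 0, lower(state)
    while nxt is not None:
        n, nxt = n + 1, lower(nxt)
    return n

def eps(state):
    """Number of times J_+ can be applied before reaching 0."""
    n, nxt = 0, raise_(state)
    while nxt is not None:
        n, nxt = n + 1, raise_(nxt)
    return n

def multiplets(n):
    """Group the 2^n sign strings by the highest-weight state they descend from."""
    groups = {}
    for state in product("+-", repeat=n):
        top = state
        while raise_(top) is not None:
            top = raise_(top)
        groups.setdefault("".join(top), []).append("".join(state))
    return groups

print(multiplets(2))
print(multiplets(3))
```

For two factors the string `+-` is a singlet while `-+` belongs to the triplet, which makes the non-commutativity of the crystal tensor product explicit: every multiplet consists of pure ordered strings rather than linear combinations.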

In Table 2 we report the assignments of the codons of the eukaryotic code (the upper label denotes different irreducible representations) and the amino-acid content of the $\otimes^{3}({\textstyle{\frac{1}{2}}},{\textstyle{\frac{1}{2}}})$ representations. The codon content of each of the irreducible representations obtained is also given at the end of this subsection.

Let us stress the choice of the crystal basis, which exists only in the limit $q\to 0$ . In a codon the order of the nucleotides is of fundamental importance (e.g. CCU $\to$ Pro, CUC $\to$ Leu, UCC $\to$ Ser). If we want to consider the codons as composite states of the (elementary) nucleotides, this surely cannot be done in the framework of Lie (super)algebras. Indeed, in Lie theory, the composite states are obtained by performing tensor products of the fundamental irreducible representations. They appear as linear combinations of the elementary states, with symmetry properties determined from the tensor product (i.e. for $sl(n)$ , by the structure of the corresponding Young tableaux).

On the contrary, the crystal basis provides us with the mathematical structure to build composite states as pure states, characterised by the order of the constituents. To have such a basis at our disposal, we need to consider the limit $q\to 0$ . Note that in this limit we no longer deal either with a Lie algebra or with a universal deformed enveloping algebra.

To represent a codon, we have to perform the tensor product of three $({\textstyle{\frac{1}{2}}},{\textstyle{\frac{1}{2}}})$ representations of $\mathcal{U}_{q\to 0}(sl(2)\oplus sl(2))$ . However, it is well known (see Table 1) that in a multiplet of codons relative to a specific amino-acid, the first two bases of a codon are “relatively stable”, the degeneracy being mainly generated by the third nucleotide. We consider first the tensor product:

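$$(\frac{1}{2}, \frac{1}{2}) \otimes (\frac{1}{2}, \frac{1}{2}) = (1, 1) \oplus (1, 0) \oplus (0, 1) \oplus (0, 0)$$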

where, inside the parentheses, each $sl(2)$ representation of dimension $2j+1=1,2,3$ is labelled by $j=0,{\textstyle{\frac{1}{2}}},1$ respectively. Using Theorem 2.1, we get the following tableau:

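$$\begin{array}{lcccc}\to\,\,su(2)_{H}&(0,0)&(\mbox{CA})&\qquad\qquad(1,0)&(\begin{array}{ccc}\mbox{CG}&\mbox{UG}&\mbox{UA}\end{array})\\ \downarrow\\ su(2)_{V}&(0,1)&\left(\begin{array}{c}\mbox{CU}\\ \mbox{GU}\\ \mbox{GA}\end{array}\right)&\qquad\qquad(1,1)&\left(\begin{array}{ccc}\mbox{CC}&\mbox{UC}&\mbox{UU}\\ \mbox{GC}&\mbox{AC}&\mbox{AU}\\ \mbox{GG}&\mbox{AG}&\mbox{AA}\end{array}\right)\end{array}$$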

From Table 2, the dinucleotide states formed by the first two nucleotides in a codon can be put in correspondence with quadruplets, doublets or singlets of codons relative to an amino-acid. Note that the sextets (resp. triplets) are viewed as the sum of a quadruplet and a doublet (resp. a doublet and a singlet). Let us define the “charge” $Q$ of a dinucleotide state by

$$Q = J_{H, 3} + \frac{1}{4} C_{V} (J_{V, 3} + 1) - \frac{1}{4}$$

$J_{\alpha,3}$ ( $\alpha=H,V$ ) stands for the diagonalised $sl(2)_{\alpha}$ generator. The operator $C_{\alpha}$ is a Casimir operator of ${\mathcal{U}}_{q\to 0}(sl(2)_{\alpha})$ in the crystal basis. It commutes with $J_{\alpha\pm}$ and $J_{\alpha,3}$ and its eigenvalue on any basis vector of an irreducible representation of highest weight $j$ is $j(j+1)$ , that is the same as for the undeformed standard second degree Casimir operator of $sl(2)$ . Its explicit expression is

[TABLE]

Note that for $sl(2)_{q\to 0}$ the Casimir operator is an infinite series of powers of $J_{\alpha\pm}$ . However in any finite irreducible representation only a finite number of terms gives a non-vanishing contribution.

The dinucleotide states are then split into two octets with respect to the charge $Q$ : the eight strong dinucleotides associated to the quadruplets (as well as those included in the sextets) of codons satisfy $Q>0$ , while the eight weak dinucleotides associated to the doublets (as well as those included in the triplets) and possibly to the singlets of codons satisfy $Q<0$ . Let us remark that under the exchange $C\leftrightarrow A$ and $U\leftrightarrow G$ , which is equivalent to changing the sign of $J_{\alpha,3}$ or to a reflection with respect to the diagonals of eq.(2.1), the 8 strong dinucleotides are transformed into weak ones and vice versa.
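The exchange property can be checked directly on the two octets. The sketch below is only an illustration: the strong and weak lists are read off from the signs of $Q$ quoted representation by representation in Section 2.3.1, and the names `STRONG`, `WEAK` and `exchange` are introduced here and are not part of the model.

```python
# Strong (Q > 0) and weak (Q < 0) dinucleotides, as listed in Section 2.3.1.
STRONG = {"CC", "UC", "GC", "AC", "GG", "CG", "CU", "GU"}
WEAK = {"UU", "AU", "AG", "AA", "UG", "UA", "GA", "CA"}

# The exchange C <-> A and U <-> G, applied to every nucleotide.
SWAP = str.maketrans("CAUG", "ACGU")


def exchange(dinucleotide: str) -> str:
    """Apply C <-> A, U <-> G to each nucleotide of the dinucleotide."""
    return dinucleotide.translate(SWAP)


strong_image = {exchange(d) for d in STRONG}
weak_image = {exchange(d) for d in WEAK}

# Both comparisons hold: each octet is mapped onto the other one.
print(strong_image == WEAK, weak_image == STRONG)
```

Since the exchange is an involution, checking one direction would be enough; both are printed for readability.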

If we consider the three-fold tensor product, the content into irreducible representations of ${\mathcal{U}}_{q\to 0}(sl(2)\oplus sl(2))$ is given by:

[TABLE]

The structure of the irreducible representations of the r.h.s. of eq. (12) is (the upper labels denote different irreducible representations):

[TABLE]
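The labels $(J_{H},J_{V})$ of the irreducible representation containing a given ordered word can be obtained with a few lines of code. The sketch below does not transcribe Theorem 2.1. It uses the standard "bracket" (signature) rule for tensor products of $sl(2)$ crystals, with a convention assumed here because it reproduces the assignments quoted in this review: each nucleotide carries a sign in $H$ and a sign in $V$, a $+$ followed by a $-$ forms a singlet and is removed, and $J$ is half the number of surviving spins. The sign tables and function names are illustrative, and the upper labels distinguishing equivalent representations are not computed.

```python
from fractions import Fraction
from itertools import product

# Assumed sign conventions for this sketch:
# H sign is + for C, G and - for U, A; V sign is + for C, U and - for G, A.
H_SIGN = {"C": +1, "G": +1, "U": -1, "A": -1}
V_SIGN = {"C": +1, "U": +1, "G": -1, "A": -1}


def highest_weight(signs):
    """Return J for an ordered word of spin-1/2 signs in the q -> 0 limit.

    Bracket rule: a '+' followed (after earlier cancellations) by a '-'
    forms a singlet and is dropped; J is half the number of survivors.
    """
    stack = []
    for s in signs:
        if s == -1 and stack and stack[-1] == +1:
            stack.pop()  # the pair '+ -' cancels
        else:
            stack.append(s)
    return Fraction(len(stack), 2)


def irrep(word):
    """(J_H, J_V) of the irreducible representation containing the word."""
    return (highest_weight([H_SIGN[n] for n in word]),
            highest_weight([V_SIGN[n] for n in word]))


# Group the 16 dinucleotides by representation.
by_irrep = {}
for pair in product("CUGA", repeat=2):
    word = "".join(pair)
    by_irrep.setdefault(irrep(word), []).append(word)

for (jh, jv), words in sorted(by_irrep.items()):
    print(f"({jh}, {jv}): {sorted(words)}")

# A few codons whose representations are quoted later in the text.
for codon in ["CAC", "CGC", "CGU", "UAC", "CUG", "GAG", "AAC"]:
    jh, jv = irrep(codon)
    print(codon, f"({jh}, {jv})")
```

With this convention the dinucleotides split into representations of sizes 1, 3, 3 and 9, and the order of the nucleotides matters: CU and UC, for instance, fall into different representations.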

2.2 Applications

2.2.1 Sum rules of codon usage probabilities

Let $XZN$ be a codon in a multiplet encoding an amino acid, where the labels $X,Z,N$ stand for any of the four bases $A,C,G,U/T$ . We define the relative frequency of usage of the codon $XZN$ as the ratio between the number of times $n_{XZN}$ the codon $XZN$ is used in the biosynthesis of the amino acid and the total number $n_{tot}$ of times that amino acid is synthesised. Then, the frequency of usage of a codon in a multiplet is connected, in the limit of very large $n_{tot}$ , to its probability of usage $P(XZN)$ :

[TABLE]

with the normalization

[TABLE]
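In practice the probabilities are estimated from finite counts. A minimal sketch of that estimate follows, assuming the coding sequence is already split into codons. The function name and the short codon list are purely illustrative and are not data used in the paper.

```python
from collections import Counter


def usage_frequencies(codons, multiplet):
    """Relative usage frequency n_XZN / n_tot of each codon of a multiplet.

    n_tot counts only the codons of the multiplet, i.e. the number of
    times the corresponding amino acid is synthesised.
    """
    counts = Counter(c for c in codons if c in multiplet)
    n_tot = sum(counts.values())
    if n_tot == 0:
        raise ValueError("no codon of this multiplet in the sequence")
    return {c: counts[c] / n_tot for c in multiplet}


# Illustrative input only: a made-up list of codons, not a real gene.
PRO = ["CCC", "CCU", "CCG", "CCA"]
toy_sequence = ["CCC", "CCU", "CCC", "GCA", "CCA", "CCC", "CCG", "CCU"]

freqs = usage_frequencies(toy_sequence, PRO)
print(freqs)
print(sum(freqs.values()))  # the frequencies of one multiplet add up to 1
```

The frequencies only approximate $P(XZN)$ when $n_{tot}$ is large, which is why the analysis is restricted to species with enough codon statistics.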

The pattern of codon usage varies between species and even among tissues within a species. Most analyses of codon usage frequencies have addressed the relative abundance of specified codons in different genes of the same biological species, or the relative abundance in the same gene for different biological species. To our knowledge, no attention has been paid to codon usage frequencies summed over all the available sequences in order to infer global correlations between different biological species.

The aim of the paper [3] was to investigate this aspect and to predict a general law which should be satisfied by all the biological species belonging to vertebrates.

From the definition of the usage probability for a codon $XZN$ , see eq. (19), it follows that our analysis and predictions hold for biological species with large enough statistics of codons. In the crystal basis model of the genetic code, each codon $XZN$ is described by a state belonging to an irreducible representation denoted $(J_{H},J_{V})^{\xi}$ ( $\xi$ specifying the representation) of the algebra ${\mathcal{U}_{q}}(sl(2)_{H}\oplus sl(2)_{V})$ in the limit $q\to 0$ . It is natural in this model to write the usage probability as a function of the biological species (b.s.), of the particular amino-acid and of the labels $J_{H}$ , $J_{V}$ , $J_{H,3}$ , $J_{V,3}$ describing the state $XZN$ .

Assuming the dependence on the amino acid to be completely determined by the set of labels $J$ , we write

[TABLE]

Let us now make the hypothesis that we can write the r.h.s. of eq. (21) as the sum of two contributions: a universal function $\rho$ , independent of the biological species at least for vertebrates, and a b.s.-dependent function $f_{bs}$ , i.e.

[TABLE]

From the analysis of the available data, we assume that the contribution of $f_{bs}$ is not negligible but could be smaller than the one due to $\rho$ . As each state describing a codon is labelled by the quantum labels of two commuting $sl(2)$ , it is reasonable, at first approximation, to assume

[TABLE]

Now, let us analyse in the light of the above considerations the usage probability for the quartets Ala, Gly, Pro, Thr and Val and for the quartet sub-part of the sextets Arg (i.e. the codons of the form CGN), Leu (i.e. CUN) and Ser (i.e. UCN).

For Thr, Pro, Ala and Ser we can write, using Table 2 and eqs. (21)-(23), with $N=A,C,G,U$ ,

[TABLE]

where we have denoted by $\rho_{C+A}^{NC}$ the sum of the contribution of the universal function (i.e. not depending on the biological species) $\rho$ relative to $NCC$ and $NCA$ , while the labels $x,y,x^{\prime},y^{\prime}$ depend on the nature of the first two nucleotides $NC$ , see Table 2. For the same amino acid we can also write

[TABLE]

Using the results of Table 2, we can remark that the difference between the two preceding equations is a quantity independent of the biological species,

[TABLE]

In the same way, considering the cases of Leu, Val, Arg and Gly, we obtain with $W=C,G$

[TABLE]

Since the probabilities for one quadruplet are normalised to one, from the two preceding relations we deduce that for all the eight amino acids the sum of probabilities of codon usage for codons with last A and C (or U and G) nucleotide is independent of the biological species, i.e.

[TABLE]
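To test the sum rule on data, one computes for each species the two partial sums of a quartet and compares them across species. The sketch below shows that bookkeeping only. The function name is hypothetical and the two "species" are synthetic placeholders, chosen to illustrate the input format, not measured codon usage.

```python
def sum_rule(probs, dinucleotide):
    """Return (P_{C+A}, P_{U+G}) for the quartet XZN with XZ = dinucleotide."""
    p = {n: probs[dinucleotide + n] for n in "ACGU"}
    if abs(sum(p.values()) - 1.0) > 1e-9:
        raise ValueError("quartet probabilities must be normalised to 1")
    return p["C"] + p["A"], p["U"] + p["G"]


# Synthetic placeholder numbers, NOT measured codon usage.
toy = {
    "species_1": {"CCC": 0.30, "CCA": 0.30, "CCU": 0.25, "CCG": 0.15},
    "species_2": {"CCC": 0.35, "CCA": 0.25, "CCU": 0.30, "CCG": 0.10},
}

for name, probs in toy.items():
    p_ca, p_ug = sum_rule(probs, "CC")
    print(name, round(p_ca, 3), round(p_ug, 3))
```

In the synthetic example the individual probabilities differ between the two entries while the sum over the codons ending in C and A is the same, which is the pattern the sum rule predicts for real vertebrate data.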

Moreover, assuming that for sextets the functions $F$ and $G$ really depend on the nature of the encoded amino acid rather than on the dinucleotide, we derive in a completely analogous way as above that for the amino acid Ser the sum $P^{\prime}_{C+A}(S)=P(UCA)+P(AGC)$ is independent of the biological species. Note that we normalize to 1 the probabilities of a quartet in a sextet.

A statistical discussion of the sum rules, in the more general context of correlations between the probabilities $P(XZN)$ , can be found in [10].

An analysis with more recent data for more biological species can be found in [11].

2.3 Physico-chemical properties of amino-acids: relations and predictions

It is a known observation that a relationship exists between the codons and the physical-chemical properties of the coded amino acids. The observed pattern is read either as a relic of some kind of interaction between the amino acids and the nucleotides at an early stage of evolution or as the existence of a mechanism relating the properties of codons with those of amino acids.

It is also observed that the relationship depends essentially on the nature of the second nucleotide in the codons and it holds when the second nucleotide is A, U, C, not when it is G. To our knowledge neither the anomalous behaviour of G nor the existence of a closer relationship between some of the amino acids is understood. In [4] we provided an explanation of both these facts in the framework of the crystal basis model of the genetic code.

2.3.1 Relationship between the physical-chemical properties of amino acids

We assume that some physical-chemical properties of a given amino acid are related to the nature of the codons; in particular, they depend on the following mathematical features, written in hierarchical order:

1. the irreducible representation of the dinucleotide formed by the first two nucleotides;

2. the sign of the charge $Q$ , eq.(10), on the dinucleotide state;

3. the value of the third component $J_{V,3}$ inside a fixed irreducible representation for the dinucleotides;

4. the upper label(s) of the codon irreducible representation(s).

Not all the physical-chemical properties are supposed to follow the scheme above; some of them are essentially given by the specific chemical structure of the amino acid itself. In the following, we analyse the physical-chemical properties of the amino acids in the light of the dinucleotide content of the irreducible representations of eq. (9).

– Representation (0,0): the codons of the form CAN (N = C, U, G, A) all belong to the irreducible representation $({\textstyle{\frac{1}{2}}},{\textstyle{\frac{1}{2}}})^{4}$ and code for His and Gln, both being coded by doublets and differing by the value of $J_{V,3}$ . Then we expect that the physical-chemical properties of His and Gln are very close.

– Representation (1,0): we analyse the dinucleotides CG ( $Q>0$ ), UG, UA (both $Q<0$ ). The codons CGS (S = C, G), resp. CGW (W = U, A), belonging to irreducible representation $(3/2,1/2)^{2}$ , resp. $(1/2,1/2)^{2}$ , all code for Arg, so we do not have any relation. The codons UGS, resp. UGW, belonging to irreducible representation $(3/2,1/2)^{2}$ , resp. $(1/2,1/2)^{2}$ , code for Cys and Trp, resp. the other Cys and Ter. So we expect some affinity between the physical-chemical properties of Cys and Trp, not very strong indeed as the former is encoded by a doublet and the latter by a singlet. The codons UAN, belonging to the irreducible representation $(3/2,1/2)^{2}$ , code for Tyr and Ter. So we expect some affinity between the amino acids coded by UGN and UAN, in particular between Cys and Tyr, both being coded by doublets.

– Representation (0,1): we analyse the dinucleotides CU, GU (both $Q>0$ ) and GA ( $Q<0$ ). The codons CUY and GUY (Y = C, U), resp. CUR and GUR (R = G, A), belonging to irreducible representation $(1/2,3/2)^{2}$ , resp. $(1/2,1/2)^{3}$ , code for Leu and Val. Therefore we do not have any relation between amino acids coded by the same dinucleotide, but we expect that the physical-chemical properties of Leu and Val are close since CU and GU both belong to the same irreducible representation and are both strong. The codons GAN belong to the irreducible representation $(1/2,3/2)^{2}$ and they code for Asp and Glu (both doublets). Then we expect the physical-chemical properties of Asp and Glu to be very close.

– Representation (1,1): the dinucleotide irreducible representation ( $1,1$ ) contains five states with $Q>0$ (CC, UC, GC, AC, GG). The codons CCN and UCN (resp. GCN and ACN) belong to four different irreducible representations and code for Pro and Ser (resp. Ala and Thr). We expect a strong affinity between the physical-chemical properties of Pro and Ser on the one hand and between the physical-chemical properties of Ala and Thr on the other hand. The codons GGN belong to two different irreducible representations and code for Gly, so we expect an affinity of physical-chemical properties of Gly with those of Pro, Ser, Ala, Thr. Now let us look at the four states with $Q<0$ (UU, AU, AG, AA). The codons UUN belong to two different irreducible representations and code for Leu, the doublet subpart of the sextet, and for Phe (doublet). An affinity is expected between the physical-chemical properties of these two amino acids. The codons AUN belong to two different irreducible representations and code for Ile (triplet) and Met (singlet) and, in fact, the values of physical-chemical properties of these two amino acids are not very different. The codons AGN belong to two different irreducible representations and code for Ser and Arg, the doublet subparts of the sextets, so an affinity between the physical-chemical properties of these amino acids is expected. The codons AAN belong to the same irreducible representation ( $3/2,3/2$ ) and code for Asn and Lys, so the values of the physical-chemical properties of these amino acids should be close.

Note that for the three sextets (Arg, Leu, Ser) the quartet (doublet) subpart is coded by a codon with a strong (weak) dinucleotide.

2.3.2 Discussion

We have compared our theoretical predictions with 10 physical-chemical properties:

• the Chou-Fasman conformational parameters $P_{\alpha}$ , $P_{\beta}$ and $P_{\tau}$ , which give a measure of the probability of the amino acids to form respectively a helix, a sheet and a turn. The sum $P_{\alpha}+P_{\beta}$ appears more appropriate to characterise the generic structure forming potential and the difference $P_{\alpha}-P_{\beta}$ the helix forming potential, this quantity depending more on the particular amino acid. So we compare with $P_{\alpha}$ + $P_{\beta}$ and $P_{\tau}$ ;

• the Grantham polarity $P_{G}$ ;

• the relative hydrophilicity $R_{f}$ ;

• the thermodynamic activation parameters at 298 K: $\Delta H$ (enthalpy, in kJ/mol), $\Delta G$ (free energy, in kJ/mol) and $\Delta S$ (entropy, in J/mol/K);

• the negative of the logarithm of the dissociation constants at 298 K: $pK_{a}$ for the $\alpha$ -COOH group and $pK_{b}$ for the $\alpha$ -NH ${}_{3}^{+}$ group;

• the isoelectric point $pI$ , i.e. the $pH$ value at which no electrophoresis occurs.

The comparison between the theoretical relations and the experimental values shows the following ( $\cong$ means strong affinity, $\approx$ affinity, $\sim$ weak affinity):

• His $\cong$ Gln – The agreement, except for $pI$ , is very good.

• Asp $\cong$ Glu – The agreement, except for $P_{\tau}$ , is very good.

• Asn $\cong$ Lys $\sim$ Arg, Ser – The agreement, except for $pI$ and $P_{\tau}$ , is very good. The comparison with the values of physical-chemical properties of Ser and Arg is satisfactory.

• Cys $\cong$ Tyr $\approx$ Trp – Except for $R_{f}$ , the agreement between the first two amino acids is very good, while with Trp it is satisfactory.

• Leu $\cong$ Val – The agreement is very good.

• Pro $\cong$ Ser $\approx$ Gly – The agreement is very good, except for $P_{\alpha}+P_{\beta}$ and $\Delta H$ , and with Gly more than satisfactory.

• Ala $\cong$ Thr $\approx$ Gly, Pro, Ser – The agreement is very good between the first two amino acids, except for $P_{\tau}$ , and satisfactory with the others, except for the conformational parameters.

• Ile $\cong$ Met $\approx$ Phe – The agreement is very good between the first two amino acids and satisfactory with Phe.

So we predict that for Asp and Glu, one should find $\Delta H\approx 60$ kJ/mol, $-\Delta S\approx 135$ J/mol/K and $\Delta G\approx 100$ kJ/mol.
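As a quick consistency check, added here under the assumption that the standard relation $\Delta G=\Delta H-T\Delta S$ applies to these activation parameters at $T=298$ K, and with $-\Delta S$ expressed as 0.135 kJ/mol/K (that is, 135 J/mol/K):

$$\Delta G=\Delta H-T\,\Delta S\approx 60\ \mathrm{kJ/mol}+298\ \mathrm{K}\times 0.135\ \mathrm{kJ/mol/K}\approx 60+40\ \mathrm{kJ/mol}\approx 100\ \mathrm{kJ/mol}.$$

The three predicted values are therefore mutually consistent once the entropy is read in J/mol/K, the unit used in the list of properties above.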

In conclusion, the values of physical-chemical properties show, with a few exceptions, a pattern of correlations which is expected from the assumptions of the crystal basis model. The observed property that the amino acids coded by codons whose second nucleotide is G do not share similarity in the physical-chemical properties with other amino acids does find an explanation in the model, as it is immediate to verify that there are no two states with G in second position which simultaneously belong to the same irreducible representation and are characterised by the same value of $Q$ .

More details and illustrative Tables can be found in [4].

3 PART 2: A “minimum” principle in the genetic code

3.1 A “minimum” principle in the mRNA editing

The “minimum” principles, in their different formulations, have played and still play a very relevant role in any mathematically formulated scientific theory. The key point of a “minimum” principle is to state that an event happens along the path that minimizes a suitable function. The mathematical formulation of a sequence in RNA or DNA in the crystal basis model allows one to investigate whether some “minimum” principle can be applied to the genetic code.

In [9], we have investigated the possibility of explaining the position of a nucleotide insertion in mRNA, the so-called mRNA editing. The deep mechanism which causes RNA editing is still unknown. The understanding of the event is complicated: from a thermodynamic point of view a change, i.e. C $\to$ U, takes place if it is favored by the change of enthalpy or entropy, but should this be the case, the change should appear in all organisms. Moreover from a microscopic (quantum mechanical) point of view, the change should occur in both directions, i.e. C $\leftrightarrow$ U. It seems that the primary aim of mRNA editing is the evolution and conservation of protein structures, creating a meaningful coding sequence specific for a particular amino acid sequence.

The purpose of the paper [9] was to propose an effective model to describe the RNA editing. Our model does not explain why, where and in which organisms editing happens, but it gives a framework to understand some specific features of the phenomenon.

A consequence of the crystal basis model is that any nucleotide sequence is characterized as an element of a vector space. Therefore, functions can be defined on this space and can be computed on the sequence of codons. In particular any codon is identified by a set of four half-integer labels and functions can be defined on the codons. We make the assumption that the location sites for the insertion of a nucleotide should minimize the following function for the mRNA or cDNA

[TABLE]

where the sum in $k$ is over all the codons in the edited sequence, $C_{H}^{k}$ ( $C_{V}^{k}$ ) and $J_{3,H}^{k}$ ( $J_{3,V}^{k}$ ), are the values of the Casimir operator, see eq.(11) and of the third component of the generator of the $sl(2)_{H}$ ( $sl(2)_{V}$ ), in the irreducible representation to which the $k$ -th codon belongs, see Table 2. In (29) the simplified assumption that the dependence of ${\mathcal{A}}_{0}$ on the irreducible representation to which the codon belongs is given only by the values of the Casimir operators has been made. The parameters $\alpha_{c},\beta_{c},\gamma_{c}$ are constants, depending on the biological species.

The minimum of $\mathcal{A}_{0}$ has to be computed over the whole set of configurations satisfying the constraints: i) the starting point should be the mtDNA and ii) the final peptide chain should not be modified. It is obvious that the global minimization of expression eq.(29) is ensured if $\mathcal{A}_{0}$ takes the smallest value locally, i.e. in the neighborhood of each insertion site. The form of the function $\mathcal{A}_{0}$ is rather arbitrary; one of the reasons for this choice is that the chosen expression is computationally tractable. If the parameters $\alpha_{c},\beta_{c},\gamma_{c}$ are strictly positive with $\gamma_{c}/6>\beta_{c}>\alpha_{c}$ , the minimization of eq.(29) explains the observed configurations in almost all the considered cases; for more details see [9].

3.2 A “minimum” principle in the interaction codon-anticodon

Given a codon $XYZ$ ( $X,Y,Z\in\{C,A,G,U\}$ ), we conjecture that an anticodon $\,X^{a}Y^{a}Z^{a}$ , where $\,Y^{a}Z^{a}=Y_{c}X_{c}$ , $N_{c}$ denoting the nucleotide complementary to the nucleotide $N$ according to the Watson-Crick pairing rule, pairs to the codon $XYZ$ , i.e. it is most used to “read” the codon $XYZ$ , if it minimizes the operator ${\mathcal{T}}$ , explicitly written in eq.(30) and computed between the “states”, which can be read from Table 3, describing the codon and anticodon in the “crystal basis model”. (In the paper we use the notation $N=C,A,G,U$ ; $R=G,A$ (purine); $Y=C,U$ (pyrimidine). The Watson-Crick property is verified in most, but not in all, of the observed cases; to simplify we shall assume it.) We write both codons (c) and anticodons (a) in the $5'\to 3'$ direction. As an anticodon is antiparallel to the codon, the 1st nucleotide (respectively the 3rd nucleotide) of the anticodon is paired to the 3rd (respectively the 1st) nucleotide of the codon.

[TABLE]

where:

• $c_{H},c_{V}$ are constants depending on the “biological species” and weakly depending on the encoded a.a., as we will later specify.

• $J_{H}^{c},J_{V}^{c}$ (resp. $J_{H}^{a},J_{V}^{a}$ ) are the labels of ${\mathcal{U}_{q\to 0}}(su(2)_{H}\oplus su(2)_{V})$ specifying the state describing the codon $XYZ$ (resp. the anticodon $NY_{c}X_{c}$ pairing the codon $XYZ$ ).

• $\vec{J_{\alpha}^{c}}\cdot\vec{J_{\alpha}^{a}}$ ( $\alpha=H,V$ ) should be read as

[TABLE]

and $\vec{J_{\alpha}^{c}}\oplus\vec{J_{\alpha}^{a}}\equiv\vec{J_{\alpha}^{T}}$ stands for the irreducible representation which the codon-anticodon state under consideration belongs to, the tensor product of $\vec{J_{\alpha}^{c}}$ and $\vec{J_{\alpha}^{a}}$ being performed according to the rule of [8], choosing the codon as first vector and the anticodon as second vector. Note that $\vec{J_{\alpha}}^{2}$ should be read as the Casimir operator whose eigenvalues are given by $J_{\alpha}(J_{\alpha}+1)$ .
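The ingredients entering the codon-anticodon potential can be sketched in code. The block below is an illustration under stated assumptions, not the computation of [5]. It assumes the usual identity $\vec{J^{c}}\cdot\vec{J^{a}}=\frac{1}{2}\left[(\vec{J^{T}})^{2}-(\vec{J^{c}})^{2}-(\vec{J^{a}})^{2}\right]$ with Casimir eigenvalues $J(J+1)$, and it obtains $J^{c}$, $J^{a}$ and $J^{T}$ from a bracket rule applied to the codon, the anticodon and the word "codon followed by anticodon". The sign tables, the bracket convention and all function names are assumptions of this sketch, and the couplings $c_{H}$, $c_{V}$ of eq.(30) are deliberately left out.

```python
from fractions import Fraction

# Assumed sign conventions (illustrative): H is + for C, G; V is + for C, U.
H_SIGN = {"C": +1, "G": +1, "U": -1, "A": -1}
V_SIGN = {"C": +1, "U": +1, "G": -1, "A": -1}
WATSON_CRICK = {"C": "G", "G": "C", "A": "U", "U": "A"}


def highest_weight(signs):
    """J of an ordered word of spin-1/2 signs: a '+ -' pair cancels."""
    stack = []
    for s in signs:
        if s == -1 and stack and stack[-1] == +1:
            stack.pop()
        else:
            stack.append(s)
    return Fraction(len(stack), 2)


def casimir(j):
    """Eigenvalue J(J+1) of the Casimir operator."""
    return j * (j + 1)


def candidate_anticodons(codon):
    """The four anticodons N Y_c X_c (written 5'->3') for the codon XYZ."""
    x, y, _ = codon
    return [n + WATSON_CRICK[y] + WATSON_CRICK[x] for n in "CAGU"]


def spin_spin(codon, anticodon, sign):
    """J^c . J^a = [C(J^T) - C(J^c) - C(J^a)] / 2 for one sl(2) factor."""
    jc = highest_weight([sign[n] for n in codon])
    ja = highest_weight([sign[n] for n in anticodon])
    jt = highest_weight([sign[n] for n in codon + anticodon])  # codon first
    return (casimir(jt) - casimir(jc) - casimir(ja)) / 2


for anticodon in candidate_anticodons("CCC"):
    print(anticodon,
          spin_spin("CCC", anticodon, H_SIGN),
          spin_spin("CCC", anticodon, V_SIGN))
```

Selecting the preferred anticodon would then amount to weighting the two printed numbers with the couplings of eq.(30) and averaging over the codons of the multiplet, which this sketch does not attempt.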

As we are interested in finding the composition of the 22 anticodons, the minimum number needed to ensure a faithful translation, we shall assume that the anticodon used for each quartet and each doublet is the one which minimizes the averaged value of the operator given in eq.(30), the average being performed over the 4 (2) codons for quadruplets (doublets), see next section.

In the paper [5] we have addressed the problem of finding the structure of the minimum set of anticodons and, to that end, we have used a very simple form for the operator ${\mathcal{T}}$ . We have not discussed at all the possible appearance of any other anticodon, which would require a more quantitative discussion. For such an analysis, as well as for the eukaryotic code, the situation may be different and more than one anticodon may pair to a quartet.

The pattern, which in the general case may show up, is undoubtedly more complicated, depending on the biological species and on the concerned biosynthesis process, but it is natural to argue that the usage of anticodons exhibits the general feature to assure an “efficient” translation process by a number of anticodons, minimum with respect to the involved constraints. A more refined and quantitative analysis, which should require more data, depends on the value of these constants.

However our analysis strongly suggests that the minimum number of anticodons should be 32 (3 for the sextets, 2 for quadruplets and triplet and 1 for doublets and singlets).
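The count can be reproduced from the multiplet structure of the eukaryotic code of Table 1. The sketch below simply applies the stated rule (3 anticodons per sextet, 2 per quadruplet and triplet, 1 per doublet and singlet); the dictionary names are introduced here for illustration.

```python
# Multiplet structure of the standard (eukaryotic) code.
MULTIPLETS = {
    "sextet": ["Arg", "Leu", "Ser"],
    "quadruplet": ["Ala", "Gly", "Pro", "Thr", "Val"],
    "triplet": ["Ile"],
    "doublet": ["Asn", "Asp", "Cys", "Gln", "Glu", "His", "Lys", "Phe", "Tyr"],
    "singlet": ["Met", "Trp"],
}
CODONS_PER = {"sextet": 6, "quadruplet": 4, "triplet": 3, "doublet": 2, "singlet": 1}
ANTICODONS_PER = {"sextet": 3, "quadruplet": 2, "triplet": 2, "doublet": 1, "singlet": 1}

n_amino_acids = sum(len(names) for names in MULTIPLETS.values())
n_codons = sum(CODONS_PER[k] * len(names) for k, names in MULTIPLETS.items())
n_anticodons = sum(ANTICODONS_PER[k] * len(names) for k, names in MULTIPLETS.items())

print(n_amino_acids, n_codons, n_anticodons)  # 20 amino acids, 61 sense codons, 32 anticodons
```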

In conclusion, we have found that the anticodons minimizing the conjectured operator ${\mathcal{T}}$ given in eq.(30), averaged over the concerned multiplets, are in very good agreement, the results depending only on the signs of the two coupling constants, with the observed ones, even if we have made comparison with a limited database.

The fact that the crystal basis model is able to explain, in a relatively simple way, the observed anticodon-codon pairing, which has its roots in the stereochemical properties of nucleotides, strongly suggests that our modelisation is able to incorporate some crucial features of the complex physico-chemical structure of the genetic code. Incidentally let us remark that the model explains the codon-anticodon symmetry that has been remarked. Let us stress that our modelisation has a very peculiar feature which makes it very different from the standard 4-letter alphabet, used to identify the nucleotides, as well as from the usual modelisation of a nucleotide chain as a spin chain. Indeed the identification of the nucleotides with the fundamental irrep. of ${\mathcal{U}}_{q}(su(2)_{H}\oplus su(2)_{V})$ introduces a sort of double “bio-spin”, which allows the description of any ordered sequence of $n$ nucleotides as a state of an irrep. and allows interactions to be described using the standard powerful mathematical language used in physical spin models.

3.3 The “minimum” principle in the evolution of genetic code

Using the minimum principle stated in Subsection 3.2, we have analyzed and mathematically modelled in [6] the evolution of the genetic code in the framework of the so-called “codon capture theory”.

Let us briefly summarize and comment on our results.

We determine the structure of the anticodons in the Ancient, Archetypal and Early Genetic codes, that are all reconciled in a unique frame. Most of our results agree with the generally accepted scheme. Moreover the pattern of the model is surprisingly coherent.

The pattern of the Ancient Code can be summarized by saying that in this primordial code the a.a., which would be encoded by a doublet of the type XZY are encoded by a codon ending with a C. Similarly the a.a. which would be encoded by a doublet of the type XZR are encoded by a codon ending with a G.

Indeed, in the Ancient Genetic Code, the sign of $c_{V}$ for the weak dinucleotides is undetermined, i.e. the minimization does not depend on the sign of $c_{V}$ . In our model, this means that there is no distinction between C (U) and G (A). This is coherent since at this stage there is not yet a distinction between the doublet XZR and XZY. On the contrary for strong dinucleotides, for which the role of XZR and XZY is the same up to the Standard Genetic Code, the sign is fixed and it does not change during the evolution. For strong dinucleotides and almost half of the weak ones there is a change in $c_{H}$ just when the codon degeneracy appears, that is going from the Ancient to the Archetypal code, and the “wobble mechanism” is called in (let us remark that the sign of $c_{H}$ does not change for the weak dinucleotides which have $J_{3,H}=0$ ). For all weak dinucleotides, the sign of $c_{V}$ is now determined and there is a further change in the sign of $c_{H}$ and of $c_{V}$ when the correspondence between doublets and a.a. is fixed.

Let us remark that:

• for each codon there are at least two anticodons with the same value of ${\mathcal{T}}$ and vice versa. This degeneracy can be removed by further terms of the interaction, not yet taken into account, but it can also be read as the “codon disappearance” before a new readjustment of the code.

• the presently less used (on average) codons, for the a.a. encoded by doublets, are those with last nucleotide G or U, while the most used are those with last nucleotide A or C. So it is natural to ask the question: why are most of the ancestral codons encoding a.a. in the Ancient Code now repressed? Naively, one should expect that the ancestral codon should be the most used one.

• in our model the sign of $c_{H}$ for a.a. encoded by XZY in the Early Code is the same as the one in the Ancient Code, while for a.a. encoded by XZR it is the opposite. Cys is an exception, but in this case the anticodon is different in the two codes. This kind of argument cannot be immediately applied to a.a. encoded by quartets because in most cases the anticodon in the Archetypal, Early or Mitochondrial Code is not the same as the one appearing in the Ancient Code and, moreover, there is an important effect due to the averaging over four codons.

Analogous analysis of the codon usage frequencies for species following the Standard Code confirms generally such a pattern, but the presence of anticodons in the Standard Code is more complicated, so we do not want to refer to these data.

Moreover, in our model the anticodon with first nucleotide A naturally never appears, in good agreement with the observed data.

In our model we can express the evolution of the genetic code through the following pattern of the codon-anticodon interaction as

[TABLE]

In the first row of eq.(32), the presence of the Kronecker delta $\delta_{M^{a},N^{a}_{c}}$ enforces the Watson-Crick coupling mechanism implying $M^{a}=N^{a}_{c}$ , while in the second row $M^{a}$ can be any nucleotide and the selection is implemented by the value of the operator ${\mathcal{T}}$ , computed between the concerned states and, possibly, averaged over the multiplet taking into account the codon usage probabilities. As an example of the typical behavior of the constant $c_{H}$ for weak dinucleotides, we consider the case of the AA dinucleotide:

[TABLE]

The change of the sign in the coupling constants is a mathematical description to frame the modification of the interaction codon-anticodon due to the change of the molecular structure of the nucleotides in the anticodons and of the (non local) structure of the tRNA.

Of course we have to assume that the constants $c_{H}$ and $c_{V}$ depend on the “time” even if, at this stage, only the change of the sign in the coupling constants has been considered.

Presumably the genetic code has not evolved along one path. It could be that multiple branching points showed up in the course of the evolution with the advent of different genetic codes, and then the standard genetic code would have emerged as the one exhibiting selective advantages. For example, one can imagine that not all the changes of the signs of $c_{H}$ and $c_{V}$ would have occurred at the same time, and, therefore, that several intermediate codes would have arisen between, say, the Ancient and the Archetypal Code. As a consequence, we believe it is more reasonable not to write a time-dependent evolution equation, but to verify that the existing genetic code, that is the branching point which has survived, satisfies the required optimality conditions.

3.4 The “minimum” principle to explain the codon bias

As already stated the genetic code is degenerate in the sense that a multiplet is used to encode most of the amino-acids. Some codons in the multiplets are used much more frequently than others to encode a particular amino-acid, i.e. there is a “codon usage bias”. The non-uniform usage of synonymous codons is a widespread phenomenon and it is experimentally observed that the pattern of codon usage varies between species.

The main reasons for the codon usage biases are believed to be: the genetic coding error minimization, the CG content, the abundance of specific anticodons in the tRNA. No clear indication emerges of one or more factors which universally engender the codon bias; on the contrary, the role of some factors is controversial.

In paper [7] we have analyzed possible effects of the codon-anticodon interactions defined by the operator given in eq.(30) on the codon bias, according to the approach introduced in [5], and proposed semi-quantitative predictions of the codon bias. Moreover we briefly analyzed the codon usage bias variation along the evolution of the genetic code on the basis of the model developed in [6]. In the following, we will be concerned with amino acids encoded by quartets. For the ones encoded by a sextet, that we consider as the sum of a quartet and a doublet, only the quartet will be considered. The method we developed is essentially based on the determination of the minimum values of an operator which can be seen as an interaction potential between a codon and its corresponding anticodon. A possible general pattern of the bias is searched for by deriving inequalities for the codon usage probabilities.

With reference to Subsection 3.2 we have to minimize an expression of the type:

[TABLE]

Let us recall that the expression $<N^{\prime}Y^{\prime\prime}X^{\prime\prime}|\mathcal{T}|XYN>$ has to be read as

[TABLE]

where we have used the correspondence

[TABLE]

and $\lambda$ is the eigenvalue of $\mathcal{T}$ on the state $|J_{H}^{c},J_{V}^{c};J_{H,3}^{c},J_{V,3}^{c}>\otimes\,|J_{H}^{a},J_{V}^{a};J_{H,3}^{a},J_{V,3}^{a}>$ , see [5] for more details. As the $P_{XYN}$ have to satisfy, in addition to eq.(20), a set of unknown constraints, we cannot impose the minimization condition in a rigorous manner, so we proceed by a heuristic method. Using the results of Subsection 2.2.1, we are left with only two probabilities in eq.(37), and we try to argue which of the two remaining $P_{XYN}$ is enhanced with respect to the other one. To this end we compare the two probabilities which appear after the substitution of the other two, using eq.(28), that have the greatest coefficient. In this way we get in our expression a constant term, depending on $c_{H}$ , $c_{V}$ and, generally, on $K$ ( $K$ being the constant which appears in the r.h.s. of eq.(28)), which has the highest possible value, without, possibly, any specific assumption on the value of the parameters, except for the assumed sign. Then, in order to minimize the expression, it is reasonable to require that the probability with the lowest coefficient has a higher value than the other one. Next, in some cases, we can derive another inequality for the complementary probabilities, according to whether $K>0.5$ or $K<0.5$ .
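The elimination step can be written schematically. Assume, as a simplification introduced here and not taken from [7], that the quantity to be minimized for a quartet is linear in the four usage probabilities, $\sum_{N}a_{N}P_{XYN}$ , with hypothetical coefficients $a_{N}$ built from the eigenvalues $\lambda$ . The sum rule of Subsection 2.2.1 and the normalization give $P_{XYA}=K-P_{XYC}$ and $P_{XYG}=1-K-P_{XYU}$ , so that

$$\sum_{N}a_{N}P_{XYN}=\left[a_{A}K+a_{G}(1-K)\right]+(a_{C}-a_{A})\,P_{XYC}+(a_{U}-a_{G})\,P_{XYU}.$$

The bracket is a constant and only two probabilities remain. Within this schematic form, lowering the expression favours a larger $P_{XYC}$ (hence a smaller $P_{XYA}$ ) when $a_{C}<a_{A}$ , and a larger $P_{XYU}$ when $a_{U}<a_{G}$ , which is the kind of inequality between usage probabilities that the heuristic argument produces.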

For a more detailed discussion of the difference between the minimization procedure for the Early Genetic Code and the Eukaryotic Genetic Code, as well as on the assumed behavior of the coefficient $c_{H}$ and $c_{V}$ we refer to [7].

The outcomes derived, which we summarize in Table 4, are in remarkable agreement with the observed data, despite the over-simplifying assumptions of our theoretical scheme and despite the fact that in the real world the number of operating anticodons is greater than the minimum number 31, which implies that the matching of an a.a. encoded by a quartet is done by more than two anticodons. Moreover, let us remark that the results found in the Early Genetic Code survive in the Eukaryotic Genetic Code, suggesting that we have caught some feature of a very relevant mechanism. So we argue that the codon-anticodon interaction plays a relevant role in the codon usage bias. Moreover, it seems that, in spite of its apparently fragmented behavior, the codon bias exhibits a sort of universal feature that our approach and the Crystal Basis Model are able to take into account. Let us remark that, in general, for plants and bacteria and, to some extent, for invertebrates, the agreement is less satisfactory. This likely reflects the fact that the choice of the specimens for these species is too rough: the experimental data should be taken from smaller, suitably chosen, subsets of the species.

Our model seems to support the idea that the codon usage bias reflects two aspects of the tRNA population: firstly, where there are multiple species of tRNAs with different anticodons, the codons translated by the most abundant tRNA species are preferred; secondly, when a tRNA can translate more than one codon, the codon best recognized by the anticodon is preferred. In our language, the most abundant tRNA and the best recognized codon are the ones which minimize the ${\mathcal{T}}$ operator. The good agreement of our results with the data suggests that it may be interesting to perform a more detailed analysis of the two parameters controlling our codon-anticodon interaction potential, in a general context taking into account the complexity and the evolution of species. In particular, what does it mean when similar conditions on the parameters $c_{H}$ and $c_{V}$ allow a good prediction of codon usage for different biological sets? For example, for Pro and Ser, it seems that, on the one hand, vertebrates and invertebrates share similar conditions on $c_{H}$ and $c_{V}$ and, on the other hand, plants (and bacteria for Pro) do the same. Does the species-dependent codon bias depend on the time of appearance of the amino acids?

Finally, there is much debate on the exact reasons for the selection of translationally optimal codons: is it to increase the translational efficiency or the accuracy of the translation? At the moment our model does not give any indication of the favoured mechanism. Possibly some hints can be obtained by a study of the mutation bias along with the minimization of the codon-anticodon interaction.

4 Conclusions

The model presented above for the genetic code can be seen as an attempt towards a theoretical approach in the complex domain of the life sciences. Two main notions, already contained in the title of this review, have been used: Symmetry and Minimum Energy Principle. The first one had a particularly spectacular development during the twentieth century, in mathematics under the general label of “Group Theory” as well as in several domains of fundamental physics such as Relativity, Quantum Physics and High Energy Physics. The second one, generally called the “principle of least action”, appeared earlier with the works of Leibniz, Fermat and Euler, and is generally attributed to Maupertuis, who, during the Siècle des Lumières, felt that “Nature is thrifty in all its actions”.

As developed in this contribution, the Group Theory approach we proposed seems well adapted to represent the constituent actors in the genetic code and to describe some of their effects: in other words, the Crystal Basis Model allows a rather powerful “parametrization” of our problem. On the other hand, the least action principle is a more widely accepted and used notion in the different domains of physics, maybe owing to its naturalness. In this spirit, the spin-spin like potential we have built to describe the codon-anticodon interaction can also appeal owing to its conceptual simplicity.

For these reasons, it seems to us worthwhile to pursue investigations in the framework of our model. Such developments could be carried out on the mathematical side as well as on the phenomenological one. Let us for example note the construction of a distance between two sequences of DNA or RNA [13], still in the framework of our model: this work deserves to be developed and applied. In the context of the evolution of the Genetic Code, it looks worthwhile to study in more detail the behavior of the parameters $c_{H}$ and $c_{V}$ , the role of which is determinant. More generally, a refinement of the interaction potential deserves to be considered.

Finally, let us end this contribution by emphasizing that, among the important questions which deserve to be considered, the adaptability of the code to the increasing complexity of the organisms is a crucial one. It will be worthwhile to see to what extent our methods can be used for such a problem. In [14] a mathematical model, still within our framework, has been presented in which the main features (number of encoded a.a., dimensions and structure of the synonymous codon multiplets) are obtained by requiring stability of the genetic code against mutations modeled by suitable operators.

Thus, our scheme appears rather well adapted to reproduce the features of the codon capture theory as well as to provide a mathematical framework for a more quantitative and detailed description of the theory.

5 Appendix: A short tutorial on Group Theory

The aim of this short lesson on symmetry is to provide the necessary information for the reader who wishes to follow the mathematical aspects of group theory used in the crystal basis model developed in this review. The first part deals with the general notion of a Lie group, while in the second part the notions of quantum group and of the so-called crystal basis are presented.

5.1 A I: Symmetry and Group Theory

Definition 1: A group ${\mathcal{G}}$ is a set of elements together with a composition law - we denote it by “ . ” - such that:

1. $\forall\,x,y\in{\mathcal{G}}\,\quad\qquad x.y\in{\mathcal{G}}\qquad\qquad\qquad(internal\,law)$

2. $\forall\,x,y,z\in{\mathcal{G}}\qquad(x.y).z=x.(y.z)\qquad\;(associativity)$

3. $\exists\,e\in{\mathcal{G}}\;\mid\;\forall\,x\in{\mathcal{G}}\qquad\;\;x.e=e.x=x\qquad(e\equiv identity)$

4. $\forall\,x\in{\mathcal{G}},\;\exists\,x^{-1}\in{\mathcal{G}}\;\mid\;x.x^{-1}=x^{-1}.x=e\qquad(x^{-1}\,inverse\,of\,x)$

Examples: ${\mathbf{Z}}=\{n\}$ , $n$ integer, or ${\mathbf{R}}=\{real\,numbers\}$ , with “ + ” as internal law; $\mathbf{P_{n}}$ : the group of permutations of $n$ objects; the set of rotations in the plane around an origin.
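The four axioms of Definition 1 can be checked by brute force on a small example. The following minimal sketch does so for $\mathbf{P_{3}}$ , the permutations of three objects; the encoding of a permutation as a tuple of images and the convention "apply the right factor first" are choices made here for illustration only.

```python
from itertools import permutations

# Elements of P_3: a tuple p where p[i] is the image of the object i.
ELEMENTS = list(permutations(range(3)))
IDENTITY = (0, 1, 2)


def compose(x, y):
    """Composition law x.y, read as 'apply y first, then x'."""
    return tuple(x[y[i]] for i in range(len(y)))


def inverse(x):
    """Permutation that undoes x."""
    inv = [0] * len(x)
    for i, image in enumerate(x):
        inv[image] = i
    return tuple(inv)


closed = all(compose(x, y) in ELEMENTS for x in ELEMENTS for y in ELEMENTS)
associative = all(
    compose(compose(x, y), z) == compose(x, compose(y, z))
    for x in ELEMENTS for y in ELEMENTS for z in ELEMENTS
)
has_identity = all(compose(x, IDENTITY) == x == compose(IDENTITY, x) for x in ELEMENTS)
has_inverses = all(
    compose(x, inverse(x)) == IDENTITY == compose(inverse(x), x) for x in ELEMENTS
)
# Unlike Z or R with "+", this group is not commutative.
commutative = all(compose(x, y) == compose(y, x) for x in ELEMENTS for y in ELEMENTS)

print(closed, associative, has_identity, has_inverses, commutative)
```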

Actually, in Physics, groups are never considered abstractly, but through their action on some set $\mathbf{S}$ : we will talk about groups of transformations. More precisely:

Definition 2: Let ${\mathcal{G}}$ be a group, $\mathbf{S}$ a set. An action of ${\mathcal{G}}$ on $\mathbf{S}$ is a map ${\mathcal{G}}\times\mathbf{S}\rightarrow\mathbf{S}$ , that is $(g,s)\rightarrow g(s)$ with $g\in{\mathcal{G}}$ and $s\in\mathbf{S}$ , such that, $\forall\,g,g\prime\in{\mathcal{G}}$ and $\forall\,s\in\mathbf{S}$ : $g(g\prime(s))=(g.g\prime)(s)$ and $e(s)=s$ .

A (transformation) group needs to be “represented”.

Definition 3: A linear representation of a group ${\mathcal{G}}$ in a vector space $\mathbf{V}$ (itself defined on ${\mathbf{R}}$ or on ${\mathbf{C}}$ ) is a homomorphism $D$ of ${\mathcal{G}}$ into the group of linear and invertible operators on $\mathbf{V}$ , that is: $\forall\,g\in{\mathcal{G}}\rightarrow D(g)$ such that: $\forall g,g\prime\,\in{\mathcal{G}}:D(g).D(g\prime)=D(g.g\prime)$ .

We note that $D(g)$ is an $n\,\times\,n$ matrix if $\mathbf{V}$ is of (finite) dimension $n$ .
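As a concrete instance of Definition 3, one can represent the permutations of three objects by $3\times 3$ permutation matrices and verify the homomorphism property $D(g).D(g\prime)=D(g.g\prime)$ on all pairs. This is only a sketch; the matrix convention (entry $(i,j)$ equal to 1 when $g$ sends $j$ to $i$) is an assumption chosen so that matrix multiplication matches the composition "apply $g\prime$ first, then $g$".

```python
from itertools import permutations


def compose(g, h):
    """Group law g.h: apply h first, then g."""
    return tuple(g[h[i]] for i in range(len(h)))


def D(g):
    """Permutation matrix of g: entry (i, j) is 1 when g sends j to i."""
    n = len(g)
    return [[1 if i == g[j] else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


group = list(permutations(range(3)))
is_homomorphism = all(
    matmul(D(g), D(h)) == D(compose(g, h)) for g in group for h in group
)
print(is_homomorphism)
```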

We like to say that “Nature is full of symmetries $\ldots$ and of symmetry breaking”. For instance, let us imagine the three dimensional real space $\mathbf{R_{3}}$ as a homogeneous isotropic space and let us put (in an idealized way) an electron $e^{-}$ at the point O. Then the interaction of $e^{-}$ with a second charged particle $f$ will only depend on the distance between $e^{-}$ and $f$ : in other words, the physics will be invariant under the three dimensional group of rotations (we denote it $\mathcal{SO}(3)$ , see below) around O. Now, let us introduce a magnetic field $\vec{B}$ pointing in a certain direction $z$ : the interaction of $e^{-}$ with $\vec{B}$ will no longer be invariant under the whole $\mathcal{SO}(3)$ group, but only under a “subgroup” of it consisting of the rotations in the plane perpendicular to the $z$ axis. So, we have performed a “breaking” of the original symmetry, and only a part of the previous set of symmetry transformations will remain as good symmetries. This phenomenon is general and it is worthwhile to know the remaining symmetries. Whence the importance of the following - and natural - definition:

Definition 4: A subgroup $\mathcal{H}$ of a group $\mathcal{G}$ is a (non-empty) part $\mathcal{H}\subset\mathcal{G}$ , which is a group with the composition law induced by $\mathcal{G}$ . $\mathcal{H}$ is a proper subgroup of $\mathcal{G}$ if $\mathcal{H}\neq\mathcal{G}$ and $\mathcal{H}\neq\{e\}$ .

Types of (symmetry) groups in physics:

• with a finite number of elements: case of crystallographic groups;

• with an infinite number of elements:

– discrete groups (number of elements in one-to-one correspondence with ${\mathbf{Z}}$ , i.e. the set of integers);

– continuous groups: Lie groups for gauge theories, classification of particles, spin group, group of symmetry of space-time (Poincaré group, ...).

An example of Lie group:

Group of rotations in the real plane around an origin O. It is then defined by $2\times 2$ orthogonal matrices (with real entries):

[TABLE]

such a matrix transforming the 2 dim. real vector $\vec{X}=(x,y)$ into $R(\theta)\vec{X}=\vec{X\prime}$ with components:

[TABLE]

such that :

[TABLE]

Let us recall that:

[TABLE]

leading, after a simple computation, to :

[TABLE]

where

[TABLE]

$\mathbf{M}$ is called the infinitesimal generator of the rotation group in two dimensions, itself usually denoted by $\mathcal{O}(2)$ (more precisely $\mathcal{SO}(2)$ , since $\mathcal{O}(2)$ also contains the reflections). $\theta$ is called the parameter of $\mathcal{O}(2)$ . Actually, we have just considered one of the simplest Lie groups: it has one and only one continuous parameter ( $\theta$ determines completely the angle of rotation around the origin for the group element $R(\theta)$ ).
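The exponential form of a plane rotation can be checked numerically. The sketch below assumes the common sign convention $R(\theta)=\begin{pmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{pmatrix}$ with generator $\mathbf{M}=\begin{pmatrix}0&-1\\ 1&0\end{pmatrix}$ ; the opposite convention only changes the sign of $\mathbf{M}$ . It sums the series for $\exp(\theta\mathbf{M})$ and also verifies the group law $R(\theta)R(\theta\prime)=R(\theta+\theta\prime)$ .

```python
import math

# Assumed sign convention for the generator (see the lead-in).
M = [[0.0, -1.0], [1.0, 0.0]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def rotation(theta):
    """Rotation matrix R(theta) in the assumed convention."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def exp_generator(theta, terms=30):
    """Partial sum of the series 1 + (theta M) + (theta M)^2/2! + ..."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        step = [[theta * M[i][j] / n for j in range(2)] for i in range(2)]
        term = matmul(term, step)  # now equal to (theta M)^n / n!
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result


def close(a, b, tol=1e-12):
    """Entry-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


print(close(exp_generator(0.7), rotation(0.7)))
print(close(matmul(rotation(0.3), rotation(0.4)), rotation(0.7)))
```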

Let us make things more complicated, and consider now the group of two by two unitary matrices, that is the group of $2\times 2$ complex matrices $\mathcal{U}$ satisfying

[TABLE]

but with the components of $\vec{X}$ being complex numbers: $x=a+ib,y=c+id$ and the scalar product $(\vec{X},\vec{X})$ being defined by:

[TABLE]

Then, the condition $(\mathcal{U}\vec{X},\mathcal{U}\vec{X})=(\vec{X},\vec{X})$ will be satisfied, for any $\vec{X}$ , if and only if: $\mathcal{U}^{{\dagger}}=\mathcal{U}^{-1}$ with $\mathcal{U}^{{\dagger}}$ obtained by replacing each entry of $\mathcal{U}$ by its complex conjugate and transposing the matrix (with respect to the main diagonal) and $\mathcal{U}^{-1}$ denoting the inverse of $\mathcal{U}$ ( $\mathcal{U}\mathcal{U}^{-1}=\mathbf{1}$ , where $\mathbf{1}$ is the $2\times 2$ identity matrix).
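The equivalence between $\mathcal{U}^{{\dagger}}=\mathcal{U}^{-1}$ and the conservation of the scalar product can be illustrated on one matrix. The matrix and the vector used below are hypothetical examples chosen for this sketch, not the ones of the text.

```python
import math


def dagger(u):
    """Conjugate transpose of a square matrix stored as nested lists."""
    n = len(u)
    return [[u[j][i].conjugate() for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def apply(u, x):
    """Action of the matrix u on the vector x."""
    return [sum(u[i][k] * x[k] for k in range(len(x))) for i in range(len(u))]


def scalar_product(x, y):
    """Hermitian scalar product: sum of conj(x_i) * y_i."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


s = 1 / math.sqrt(2)
U = [[s, 1j * s], [1j * s, s]]   # hypothetical example of a unitary matrix
X = [1 + 2j, 3 - 1j]             # hypothetical complex vector

product = matmul(dagger(U), U)
is_unitary = all(
    abs(product[i][j] - (1 if i == j else 0)) < 1e-12 for i in range(2) for j in range(2)
)
UX = apply(U, X)
norm_preserved = abs(scalar_product(UX, UX) - scalar_product(X, X)) < 1e-12
print(is_unitary, norm_preserved)
```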

Example:

[TABLE]

The usual notation for the group of unitary $2\times 2$ matrices is $\mathcal{U}(2)$ , and $\mathcal{SU}(2)$ for its subgroup made of the elements $\mathcal{U}$ of $\mathcal{U}(2)$ satisfying $\det(\mathcal{U})=1$ . As for the $\mathcal{O}(2)$ case, an exponential expression can be obtained for any $\mathcal{U}$ in $\mathcal{SU}(2)$ :

[TABLE]

where the three $2\times 2$ $\sigma$ matrices are the so called Pauli matrices defined as follows:

[TABLE]

and satisfying the commutation relations:

[TABLE]

In $\mathcal{SU}(2)$ there are three real continuous parameters, namely a, b and c and then three infinitesimal generators: $\sigma_{1},\sigma_{2},\sigma_{3}$ .
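As a worked check of the commutation relations, one can multiply two Pauli matrices explicitly. The computation below assumes the standard form of the Pauli matrices; the remaining commutators follow by cyclic permutation of the indices.

```latex
\sigma_{1}=\begin{pmatrix}0&1\\ 1&0\end{pmatrix},\qquad
\sigma_{2}=\begin{pmatrix}0&-i\\ i&0\end{pmatrix},\qquad
\sigma_{3}=\begin{pmatrix}1&0\\ 0&-1\end{pmatrix}

\sigma_{1}\sigma_{2}=\begin{pmatrix}i&0\\ 0&-i\end{pmatrix}=i\,\sigma_{3},\qquad
\sigma_{2}\sigma_{1}=\begin{pmatrix}-i&0\\ 0&i\end{pmatrix}=-i\,\sigma_{3}

[\sigma_{1},\sigma_{2}]=\sigma_{1}\sigma_{2}-\sigma_{2}\sigma_{1}=2i\,\sigma_{3}
```

With $J_{i}=\sigma_{i}/2$ this reads $[J_{1},J_{2}]=iJ_{3}$ .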

In any Lie group, the infinitesimal generators form a basis of the Lie algebra of the corresponding Lie group.

Let us give the general definition of a Lie algebra A:

Definition 5: A Lie algebra is an algebra, that is first a linear vector space on $\bf{R}$ (or on $\bf{C}\dots$ ) with a second internal law @ satisfying (if we denote + the first law of the vector space):

[TABLE]

Moreover, this second law satisfies:

[TABLE]

One can easily note that, in our case, the @ internal law is just the commutator, denoted by [., .]:

[TABLE]

The property that an element of a Lie group (or continuous group of transformations satisfying some further analytical properties that we will not consider here, in order not to overload this first introductory lesson) can be written as an exponential of an element of its Lie algebra is particularly useful. Indeed, it is then possible to work most of the time (at least when topological questions are not at stake) with the Lie algebra, that is, to replace tedious and enormous computations on the group by computations involving mainly linear algebra! That is particularly precious for constructing and studying representations of Lie groups.

Actually, the $\mathcal{SU}(2)$ group is also known as the “spin” group in elementary particle physics.

Now, let us consider a little more the notion of representation, as mathematically defined by Definition 3. Indeed, the same group can act non trivially on spaces of different dimensions. The $\mathcal{SU}(2)$ group is defined by the set of $2\times 2$ unitary matrices of determinant 1: then its natural representation space is the 2-dim. complex space. We can say that the “fundamental” representation of $\mathcal{SU}(2)$ is given by the $2\times 2$ unitary matrices of determinant = 1, acting on the 2 dimensional complex space ${\bf C}_{2}$ , which we call the “representation” space.

We will now construct other $\mathcal{SU}(2)$ representations. But first, let us remark that any element $\mathcal{U}=\exp(i(a\sigma_{1}+b\sigma_{2}+c\sigma_{3}))$ in $\mathcal{SU}(2)$ , with $a$ , $b$ , $c$ real, can be transformed into another element of $\mathcal{SU}(2)$ , $V=U\prime\;U\;(U\prime)^{-1}$ , which is diagonal. Indeed, the matrix $H=a\sigma_{1}+b\sigma_{2}+c\sigma_{3}$ is Hermitian, that is $H=H^{{\dagger}}$ (see definition above), and so can be diagonalized by a unitary matrix $U\prime$ , whence $V$ diagonal. Note that $U$ and $V$ are mathematically equivalent in the sense that there is a change of basis in ${\bf C}_{2}$ - actually given by $U\prime$ - which allows one to see $U$ as a diagonal matrix. And we will have:

[TABLE]

At the Lie algebra level, the diagonal generator $\frac{1}{2}\sigma_{3}$ has two eigenvalues, +1/2 and -1/2, associated to the two eigenvectors (1,0) and (0,1): for a physicist, these are the two spin states $\uparrow$ and $\downarrow$ . Moreover, one can see that the matrices $\sigma_{\pm}=\sigma_{1}\pm i\sigma_{2}$ transform, up to a normalization factor, the vector (0,1) into (1,0) and (1,0) into the null vector (0,0) (resp. (1,0) into (0,1) and (0,1) into (0,0)); they are called “raising and lowering operators”. We also note the commutation relations:

[TABLE]
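The action of the raising and lowering operators on the two spin states can be verified directly. The sketch below assumes the standard Pauli matrices and the normalization $\sigma_{\pm}=(\sigma_{1}\pm i\sigma_{2})/2$ , for which one finds $[\sigma_{3},\sigma_{\pm}]=\pm 2\sigma_{\pm}$ and $[\sigma_{+},\sigma_{-}]=\sigma_{3}$ ; another normalization only rescales the right-hand sides.

```python
S1 = [[0, 1], [1, 0]]
S2 = [[0, -1j], [1j, 0]]
S3 = [[1, 0], [0, -1]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def commutator(a, b):
    """[a, b] = a.b - b.a"""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]


def apply(m, v):
    """Action of a 2x2 matrix on a 2-component vector."""
    return [m[i][0] * v[0] + m[i][1] * v[1] for i in range(2)]


def scaled(m, factor):
    """Multiply every entry of m by factor."""
    return [[factor * m[i][j] for j in range(2)] for i in range(2)]


# Assumed normalization: sigma_pm = (sigma_1 +/- i sigma_2) / 2.
RAISING = [[(S1[i][j] + 1j * S2[i][j]) / 2 for j in range(2)] for i in range(2)]
LOWERING = [[(S1[i][j] - 1j * S2[i][j]) / 2 for j in range(2)] for i in range(2)]

UP, DOWN = [1, 0], [0, 1]

print(apply(RAISING, DOWN) == UP, apply(RAISING, UP) == [0, 0])
print(apply(LOWERING, UP) == DOWN, apply(LOWERING, DOWN) == [0, 0])
print(commutator(S3, RAISING) == scaled(RAISING, 2))
print(commutator(RAISING, LOWERING) == S3)
```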

But one knows that there are not only particles of spin 1/2: there are also particles of spin 0, 1, 3/2, 2, ... They will lie in other representations of the group $\mathcal{SU}(2)$ . Let us see how to construct them. For such a purpose we need to define the tensor product of two vector spaces.

Definition 6: Let $\bf{V}$ and $\bf{V\prime}$ be two vector spaces of respective dimensions $n$ and $n\prime$ , and respective bases ( $\vec{e_{1}},\ldots,\vec{e_{n}}$ ) and ( $\vec{e_{1}}\prime,\ldots,\vec{e_{n\prime}}\prime$ ). We define the tensor product ${\bf V}\otimes{\bf V\prime}$ as the vector space of dimension $n\times n\prime$ with elements: $\sum_{(i,j)}\,\alpha(i,j)[\vec{e_{i}}\otimes\vec{e_{j}}\prime]$ with $i=1,\ldots,n$ and $j=1,\ldots,n\prime$ and $\alpha(i,j)\,\in{\bf R}$ (or ${\bf C}$ ). If the group $\mathcal{G}$ acts on ${\bf V}$ via the representation $D$ and on $\bf{V\prime}$ via the representation $D\prime$ , such that $\forall\,g\in\mathcal{G}\rightarrow D(g)$ acts on $\bf{V}$ and $\forall\,g\in\mathcal{G}\rightarrow D\prime(g)$ acts on $\bf{V\prime}$ , then on $\bf{V}\otimes\bf{V\prime}$ one can define the $\mathcal{G}$ -representation $D\otimes D\prime$ such that: $\forall\,g\in\mathcal{G}\rightarrow(D\otimes D\prime)(g)=D(g)\otimes D\prime(g)$ .

Now, let us take as V the 2-dimensional complex vector space ${\bf C}_{2}$ and consider the action of $\mathcal{G}=\mathcal{SU}(2)$ on ${\bf V}\otimes{\bf V}$ . Then any $U\,\in\mathcal{G}$ acts on any vector $v\otimes v\prime$ of ${\bf V}\otimes{\bf V}$ as:

[TABLE]

But we know that we can rewrite U as:

[TABLE]

Then, infinitesimally, we will have:

[TABLE]

therefore, infinitesimally - or in other words: at the Lie algebra level, we will have:

[TABLE]

So, let us start with the vector: $\uparrow\otimes\uparrow$ . The $\sigma_{-}$ -action will give:

[TABLE]

But we know that we have four vectors in the tensor space under consideration. After some simple computation, one can see that the vector $(\uparrow\otimes\downarrow-\downarrow\otimes\uparrow)$ is such that the action of $\sigma_{-}$ as well as the action of $\sigma_{+}$ on it gives 0. Thus, the four dimensional space ${\bf C}_{2}\otimes{\bf C}_{2}$ is indeed a good representation space for $\mathcal{G}=\mathcal{SU}(2)$ , but it splits into two subspaces, one of dimension 3 and one of dimension 1, each of them being a good representation of $\mathcal{G}$ . Actually, each of these two subspaces of ${\bf C}_{2}\otimes{\bf C}_{2}$ is an “invariant subspace under $\mathcal{G}$ ”, and we have obtained what are called “irreducible representations of $\mathcal{G}$ ”. Let us define these objects properly:

Definition 7: Let D be a representation of the group $\mathcal{G}$ in V. The subspace $\bf{E}\subset\bf{V}$ is an invariant subspace of V under D if:

[TABLE]

Definition 8: The representation D of $\mathcal{G}$ in V is irreducible if there is no invariant subspace except the trivial ones (i.e. $\{0\}$ and V itself). Otherwise D is said to be reducible.

In the case just considered, the representation ${\bf C}_{2}\otimes{\bf C}_{2}$ of $\mathcal{SU}(2)$ is reducible and decomposes into two separate (we have a “partition” of the space) irreducible $\mathcal{SU}(2)$ representations, of dimension 3 and 1 respectively.

Remark: The basis $\sigma_{+}$ , $\sigma_{-}$ and $\sigma_{3}$ is more naturally associated to the Lie algebra of the group $\mathcal{S}l(2,R)$ , often - and incorrectly - written $\mathcal{S}l(2)$ , and defined as the group of $2\times 2$ real matrices with determinant = 1. It is a non-compact form of $\mathcal{SU}(2)$ ; both $\mathcal{S}l(2)$ and $\mathcal{SU}(2)$ possess the same set of irreducible finite dimensional representations.

Conclusions from the above discussion:

• from the $2$ -dimensional representation of $\mathcal{SU}(2)$ we have constructed the (irreducible) $3$ -dimensional representation and the 1-dimensional (or trivial) one. The first one corresponds to the “spin 1” representation, with three states of eigenvalue 1, 0 and -1, while the second is the “spin 0” representation with only one state, of eigenvalue 0;

• we have also “appreciated” the power of a Lie group possessing a Lie algebra, which allows easier computations (a Lie algebra being in particular a linear vector space).
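The splitting of ${\bf C}_{2}\otimes{\bf C}_{2}$ into a triplet and a singlet can be reproduced with a few lines of code. In this sketch a state is stored as a dictionary mapping strings such as "ud" (for $\uparrow\otimes\downarrow$ ) to coefficients, and the generators act as $\sigma\otimes\mathbf{1}+\mathbf{1}\otimes\sigma$ ; the lowering operator is normalized so that it sends $\uparrow$ to $\downarrow$ with coefficient 1, which is an assumption of the sketch.

```python
def act(state, flip_from, flip_to):
    """Apply sigma (x) 1 + 1 (x) sigma, where sigma turns flip_from into flip_to
    and annihilates the other spin state."""
    result = {}
    for spins, coeff in state.items():
        for pos, spin in enumerate(spins):
            if spin == flip_from:
                new = spins[:pos] + flip_to + spins[pos + 1:]
                result[new] = result.get(new, 0) + coeff
    # Drop the terms whose coefficients cancelled.
    return {k: v for k, v in result.items() if v != 0}


def lower(state):
    """Action of sigma_minus on a two-spin state."""
    return act(state, "u", "d")


def raise_(state):
    """Action of sigma_plus on a two-spin state."""
    return act(state, "d", "u")


top = {"uu": 1}
middle = lower(top)          # up-down plus down-up
bottom = lower(middle)       # proportional to down-down
singlet = {"ud": 1, "du": -1}

print(middle, bottom, lower(bottom))
print(lower(singlet), raise_(singlet))
```

The chain starting from $\uparrow\otimes\uparrow$ contains three states and then stops, while the antisymmetric combination is annihilated by both operators.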

More on (finite dimensional) representations of the $\mathcal{SU}(2)$ group: Any irreducible finite dimensional representation of $\mathcal{SU}(2)$ is usually denoted $D(j)$ , with $j$ being a non-negative integer or half-integer. The $D(j)$ representation contains $(2j+1)$ states, each state being an eigenstate of the generator corresponding to $\frac{1}{2}\sigma_{3}$ with eigenvalue $j,j-1,\ldots,-j+1,-j$ respectively. The “spin 1” representation discussed above is then $D(1)$ , with three states associated to the eigenvalues $+1,0,-1$ respectively, while the “spin 1/2” representation $D(1/2)$ contains two states with eigenvalues +1/2 and -1/2. Let us add that the product $D(j)\otimes D(j\prime)$ of the two representations $D(j)$ and $D(j\prime)$ of $\mathcal{SU}(2)$ decomposes as the sum of the irreducible representations:

$D(j)\otimes D(j\prime)=D(j+j\prime)\oplus D(j+j\prime-1)\oplus\ldots\oplus D(|j-j\prime|)$
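The content of such a product is easy to enumerate: the spins run from $j+j\prime$ down to $|j-j\prime|$ in unit steps, and the dimensions must add up to $(2j+1)(2j\prime+1)$ . A minimal sketch, using exact fractions for the half-integer spins (the function names are chosen here for illustration):

```python
from fractions import Fraction


def decompose(j1, j2):
    """Spins J appearing in D(j1) x D(j2): j1+j2, j1+j2-1, ..., |j1-j2|."""
    j1, j2 = Fraction(j1), Fraction(j2)
    top, bottom = j1 + j2, abs(j1 - j2)
    count = int(top - bottom) + 1
    return [top - k for k in range(count)]


def dimension(j):
    """Number of states 2j+1 of D(j)."""
    return int(2 * Fraction(j) + 1)


spins = decompose(Fraction(3, 2), 1)
print([str(s) for s in spins])
print(sum(dimension(s) for s in spins) == dimension(Fraction(3, 2)) * dimension(1))
```

For $j=j\prime=1/2$ the same function returns the spins 1 and 0, that is the triplet and the singlet found above.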

Finally, let us mention that the main Lie groups, at least those with the simplest properties, are the following (they are called “simple groups”, but we will not overload our text with the definition of a simple group):

• $\mathcal{SO}(n)$ : orthogonal groups in $n$ -dimensional real space (i.e. groups of real $n\times n$ orthogonal matrices of determinant 1);

• $\mathcal{SU}(n)$ : unitary groups in $n$ -dimensional complex space (i.e. groups of $n\times n$ complex unitary matrices of determinant 1);

• $\mathcal{S}p(n)$ : groups of $2n\times 2n$ symplectic matrices;

• there are also 5 “exceptional” groups: the word exceptional is used because they do not belong to infinite series like the ones above.

More details and complete definitions on the material of this section can be found in [12].

5.2 A II: Quantum Groups and Crystal basis

It is of course not our purpose to develop in detail the theory of quantum groups as it appeared in the works of V. Drinfeld on the one hand and of M. Jimbo on the other hand in the mid-eighties, but to provide the minimum of definitions and properties of a quantum group $\mathcal{U}_{q}(g)$ , $g$ being the Lie algebra of a Lie group $\mathcal{G}$ and $q$ denoting the corresponding deformation parameter. We have noticed in the first part of the Appendix that the Lie algebra $g$ of a Lie group $\mathcal{G}$ has the structure of a linear vector space. Let us now consider the universal enveloping algebra of this Lie algebra, i.e. the space of polynomials and formal power series in the elements of $g$ , on which we impose the commutation relations appropriate for that Lie algebra. Then the quantum group $\mathcal{U}_{q}(g)$ will be a deformation, relative to the parameter $q$ , of the universal enveloping algebra of $g$ . More explicitly, let us consider the example of the $\mathcal{U}_{q}(Sl(2))$ quantum group and let us denote by $J_{+},J_{-}$ and $J_{3}$ the generators corresponding to $\sigma_{+}$ , $\sigma_{-}$ and $\sigma_{3}$ in the $2$ -dimensional representation space; then we have:

[TABLE]

We remark that when the parameter $q\to 1$ , one recovers the $\mathcal{S}l(2)$ commutation relations (the commutation relations of eq.(56) follow from eq.(50) on defining $J_{i}=\sigma_{i}/2$ , $i=1,2,3$ ):

[TABLE]
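The limit $q\to 1$ can be made concrete with the so-called q-number. The sketch below assumes the standard definition $[x]_{q}=(q^{x}-q^{-x})/(q-q^{-1})$ , which is the quantity usually appearing on the right-hand side of the deformed commutator $[J_{+},J_{-}]=[2J_{3}]_{q}$ of $\mathcal{U}_{q}(Sl(2))$ ; since the deformed relations are not reproduced here, this form is an assumption. Numerically, $[x]_{q}$ tends to $x$ when $q$ tends to 1, so that the deformed commutator reduces to the undeformed one, $2J_{3}$ .

```python
def q_number(x, q):
    """Standard q-number [x]_q = (q^x - q^-x) / (q - 1/q), defined for q != 1."""
    return (q ** x - q ** (-x)) / (q - 1 / q)


# [2]_q = q + 1/q, checked at q = 2.
print(q_number(2, 2.0))

# As q approaches 1, [x]_q approaches x.
for q in (1.5, 1.1, 1.01, 1.0001):
    print(q, q_number(3, q))

# On the spin 1/2 states 2*J_3 takes the values +1 and -1, and
# [+1]_q = +1, [-1]_q = -1 for every q.
print(q_number(1, 1.7), q_number(-1, 1.7))
```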

Another important limit is the one corresponding to $q\to 0$ . A detailed study of this case was first carried out by M. Kashiwara [8], who found a particularly well-behaved basis called the “crystal basis”. Particularly interesting for our purpose is the rule providing the product of two irreducible representations of $\mathcal{U}_{q}(g)$ when $q\to 0$ . A remarkable property of such a rule is that the elements of the representation spaces arising from the product of two representations are not linear combinations of states, as in the case of a usual group such as $\mathcal{S}l(2)$ (see the example considered in Appendix A I), but are each made of a single product of the form $u\prime\otimes v\prime$ with $u\prime\in\,B_{1}$ and $v\prime\in\,B_{2}$ : see Theorem 2.1 in Subsection 2.1.

We can make more explicit the way of computing such states by considering the example of the product $D(3/2)\otimes D(1)$ which decomposes as - see above in Appendix A I:

$D(3/2)\otimes D(1)=D(5/2)\oplus D(3/2)\oplus D(1/2)$

In Fig.1, we have represented by black points on a horizontal line the four $J_{3}$ eigenstates $3/2,1/2,-1/2,-3/2$ of $D(3/2)$ and on a vertical line the three $J_{3}$ eigenstates $1,0,-1$ of $D(1)$ . Using the above theorem, the six states of the resulting representation $D(5/2)$ appear as the black points in the upper elbow constituted by two segments, one horizontal and one vertical; the same holds for $D(3/2)$ in the lower elbow, while finally the two states of the $D(1/2)$ representation show up in the small horizontal segment below.
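The elbow-shaped chains can be generated explicitly. The sketch below uses the crystal tensor product rule in the form commonly attributed to Kashiwara: the lowering operator acts on the first factor of $u\otimes v$ when $u$ can still be lowered more times than $v$ can be raised, and on the second factor otherwise. Theorem 2.1 is not reproduced in this section, so this formulation, as well as the function names, should be read as assumptions of the sketch.

```python
from fractions import Fraction


def weights(j):
    """J_3 eigenvalues j, j-1, ..., -j of D(j), highest first."""
    j = Fraction(j)
    return [j - k for k in range(int(2 * j) + 1)]


def lower(u, v, j1, j2):
    """Crystal lowering operator on the state u (x) v of D(j1) x D(j2).
    Returns the new pair of J_3 values, or None when the result vanishes."""
    phi = int(u + j1)   # number of times u can still be lowered
    eps = int(j2 - v)   # number of times v can be raised
    if phi > eps:
        return (u - 1, v)
    if v - 1 >= -j2:
        return (u, v - 1)
    return None


def chains(j1, j2):
    """Split the product states into chains generated by the lowering operator."""
    j1, j2 = Fraction(j1), Fraction(j2)
    states = [(u, v) for u in weights(j1) for v in weights(j2)]
    images = {lower(u, v, j1, j2) for u, v in states}
    heads = [s for s in states if s not in images]   # states reached from no other one
    result = []
    for head in heads:
        chain, current = [], head
        while current is not None:
            chain.append(current)
            current = lower(*current, j1, j2)
        result.append(chain)
    return result


found = chains(Fraction(3, 2), 1)
for chain in found:
    print(len(chain), [(str(u), str(v)) for u, v in chain])
```

Each chain first moves horizontally (the first label decreases) and then vertically (the second label decreases), and the chains contain 6, 4 and 2 states, as expected for $D(5/2)$ , $D(3/2)$ and $D(1/2)$ . Every state is a single product, never a linear combination.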

It is this property which is intensively used in our symmetry approach of the genetic code. In our model, the quantum group used is $\mathcal{U}_{q}(Sl(2)\oplus Sl(2))$ , $q\to 0$ . Let us make more explicit the construction of dinucleotides from the product:

[TABLE]

and following the prescription given in the scheme (1) in Subsection 2.1. Starting from the state CC, the action of $J_{-}$ in $Sl(2)_{H}$ will provide UC and UU successively, while the action of $J_{-}$ in $Sl(2)_{V}$ gives GC and GG from CC, AC and AG from UC, and finally AU and AA from UU. Using once more the diagrammatic rule above - that is the Kashiwara theorem - one gets CU as a singlet of $Sl(2)_{H}$ but member of a triplet of $Sl(2)_{V}$ , in the same way CG as a singlet of $Sl(2)_{V}$ but member of a triplet of $Sl(2)_{H}$ , and finally CA as a singlet of both $Sl(2)$ .
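The dinucleotide multiplets listed above can be regenerated mechanically. The sketch below assumes the labels C = (+1/2,+1/2), U = (-1/2,+1/2), G = (+1/2,-1/2), A = (-1/2,-1/2) for the $J_{3}$ eigenvalues in ( $Sl(2)_{H}$ , $Sl(2)_{V}$ ), which are the ones consistent with the chains quoted in the text, and the same form of the crystal lowering rule as for a single $Sl(2)$ : act on the first nucleotide when it can be lowered more times than the second can be raised, otherwise on the second. Both are assumptions of this illustration.

```python
# Assumed labels: signs of J_3 in (Sl(2)_H, Sl(2)_V), in units of 1/2.
LABEL = {"C": (1, 1), "U": (-1, 1), "G": (1, -1), "A": (-1, -1)}
NAME = {value: key for key, value in LABEL.items()}
H, V = 0, 1


def lower_one(nucleotide, axis):
    """Lower a single nucleotide along the given axis; None if already lowest."""
    label = list(LABEL[nucleotide])
    if label[axis] < 0:
        return None
    label[axis] = -1
    return NAME[tuple(label)]


def lower_pair(pair, axis):
    """Crystal lowering operator on the dinucleotide XY, read as X (x) Y."""
    x, y = pair
    phi = 1 if LABEL[x][axis] > 0 else 0   # times X can be lowered
    eps = 1 if LABEL[y][axis] < 0 else 0   # times Y can be raised
    if phi > eps:
        return lower_one(x, axis) + y
    lowered = lower_one(y, axis)
    return None if lowered is None else x + lowered


def multiplet(top, axis):
    """States obtained from 'top' by repeated lowering along one axis."""
    chain = []
    while top is not None:
        chain.append(top)
        top = lower_pair(top, axis)
    return chain


print(multiplet("CC", H))                      # horizontal triplet from CC
print([multiplet(s, V) for s in ("CC", "UC", "UU")])
print(multiplet("CU", H), multiplet("CU", V))
print(multiplet("CG", H), multiplet("CG", V))
print(multiplet("CA", H), multiplet("CA", V))
```

The function `multiplet` only follows the lowering operator from a given starting state; that CU, CG and CA are genuine highest states (annihilated by the corresponding raising operators) is taken from the text rather than checked by the code.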

6 Acknowledgments

P.S. would like to express his gratitude to Professor R. Mondaini for his kind invitation to present our results at the BIOMAT 2016 International Symposium, Tianjin, China, and for encouraging us to write a developed review on our model.

He is also indebted to Professor B. Dragovich for his warm invitation as a speaker to the BelBI2016 International Symposium, Belgrade, and also for his constant and friendly support in the development of our model.

Bibliography


[1] E. Schrödinger, What is life?, Cambridge University Press (1944).
[2] L. Frappat, A. Sciarrino and P. Sorba, Phys. Lett. A 250, 214 (1998).
[3] L. Frappat, A. Sciarrino and P. Sorba, Phys. Lett. A 311, 264 (2003).
[4] L. Frappat, A. Sciarrino and P. Sorba, J. Biol. Phys. 28, 17 (2002).
[5] A. Sciarrino and P. Sorba, Bio Systems 107, 113 (2012).
[6] A. Sciarrino and P. Sorba, Bio Systems 111, 175 (2013).
[7] A. Sciarrino and P. Sorba, Bio Systems 141, 20 (2016).
[8] M. Kashiwara, Commun. Math. Phys. 133, 249 (1990).