Kullback-Leibler Potential for Non-Ergodic Replication Dynamics: An Information-Theoretic Second Law

Tatsuaki Tsuruyama

arXiv:2508.19894·math-ph·November 20, 2025


TL;DR

This paper introduces an information-theoretic second law based on the Kullback-Leibler divergence to quantify information degradation during replication, with implications for biological processes like DNA replication.

Contribution

It develops a mathematical model using Markov processes and Gaussian convolution to analyze information degradation and extends the framework to biological information processes.

Findings

01. The KLD potential (the minimal KLD to the reachable steady set) decreases monotonically during replication, acting as a Lyapunov function.

02. The framework links free energy to information degradation in biological systems.

03. It provides a unified thermodynamic view of information replication and repair.

Abstract

This study aims to quantify and visualize the degradation of fidelity (information degradation) that inevitably accompanies the replication of information within the framework of information thermodynamics and to propose an information-theoretic formulation of the second law based on this phenomenon. While previous research in information thermodynamics has focused on the thermodynamic costs associated with information “erasure” or “measurement” through concepts such as Landauer's principle and mutual information, little systematic discussion has addressed the inherently irreversible nature of “replication” itself and the accompanying degradation of information structure. In this study, we construct a mathematical model of information replication using a discrete Markov model and Gaussian convolution, and quantify changes in information at each replication step: Shannon entropy,…


Equations

$$\mathcal{X}=\bigsqcup_{j=1}^{m}\mathcal{X}_{j},$$

$$\Delta(\mathcal{X}):=\{p\in\mathbb{R}^{d}:\ p\geq 0,\ \mathbf{1}^{\top}p=1\},$$

$$(pT)(y)=\sum_{x\in\mathcal{X}}p(x)\,T(x,y),\qquad p_{n+1}=p_{n}T,$$

$$x\in\mathcal{X}_{j}\ \Longrightarrow\ T(x,\mathcal{X}_{j})=1\qquad(j=1,\dots,m),$$

$$w_{j}(p_{n+1})=w_{j}(p_{n}),\qquad (Tp)^{(j)}=T_{j}\,p^{(j)}.$$

$$w_{j}(p):=p(\mathcal{X}_{j})=\sum_{x\in\mathcal{X}_{j}}p(x),\qquad p^{(j)}(x):=\begin{cases}p(x)/w_{j}(p), & x\in\mathcal{X}_{j},\\ 0, & x\notin\mathcal{X}_{j}.\end{cases}$$

$$p=\sum_{j=1}^{m}w_{j}(p)\,p^{(j)}.$$

$$\mathcal{I}_{j}:=\{\pi\in\mathcal{P}(\mathcal{X}_{j}):\ \pi T_{j}=\pi\}.$$

$$\Pi(p_{0}):=\Bigl\{\pi=\sum_{j=1}^{m}w_{j}(p_{0})\,\pi_{j}:\ \pi_{j}\in\mathcal{I}_{j}\Bigr\},$$

$$\exists\,\nu\in\Delta(\mathcal{X}_{j}),\ \varepsilon>0\ \text{s.t.}\ K(x,\cdot)\geq\varepsilon\,\nu(\cdot)\quad\forall x\in\mathcal{X}_{j},$$

$$\delta(K):=1-\min_{x,x^{\prime}}\sum_{y\in\mathcal{X}_{j}}\min\{K(x,y),K(x^{\prime},y)\},$$

$$w_{j}(p_{+})=w_{j}(p),$$

$$(Tp)^{(j)}=T_{j}\,p^{(j)}.$$

$$(Tp)^{(j)}(A)=\frac{\sum_{x\in\mathcal{X}_{j}}p(x)\,T_{j}(x,A)}{w_{j}(p)}=\sum_{x\in\mathcal{X}_{j}}\frac{p(x)}{w_{j}(p)}\,T_{j}(x,A)=(T_{j}p^{(j)})(A).$$

$$D_{\mathrm{KL}}(p\,\|\,\pi)=\sum_{j=1}^{m}w_{j}(p)\,D_{\mathrm{KL}}(p^{(j)}\,\|\,\pi_{j})+\sum_{j=1}^{m}w_{j}(p)\log\frac{w_{j}(p)}{w_{j}}.$$

$$\begin{aligned}D_{\mathrm{KL}}(p\,\|\,\pi)&=\sum_{j=1}^{m}\sum_{x\in\mathcal{X}_{j}}w_{j}(p)\,p^{(j)}(x)\Bigl[\log\frac{p^{(j)}(x)}{\pi_{j}(x)}+\log\frac{w_{j}(p)}{w_{j}}\Bigr]\\&=\sum_{j=1}^{m}w_{j}(p)\,D_{\mathrm{KL}}(p^{(j)}\,\|\,\pi_{j})+\sum_{j=1}^{m}w_{j}(p)\log\frac{w_{j}(p)}{w_{j}}.\end{aligned}$$

$$V(p_{n+1})\leq V(p_{n}).$$

$$V(p_{n})=\sum_{j=1}^{m}w_{j}(p_{0})\,D_{\mathrm{KL}}\bigl(p_{n}^{(j)}\,\|\,\pi_{j}^{\ast}\bigr)\xrightarrow[n\to\infty]{}0.$$

$$D_{\mathrm{KL}}(Tp\,\|\,\pi)=\sum_{j=1}^{m}w_{j}(p_{0})\,D_{\mathrm{KL}}(T_{j}p^{(j)}\,\|\,\pi_{j}).$$

$$D_{\mathrm{KL}}(T_{j}r\,\|\,\pi_{j})\leq D_{\mathrm{KL}}(r\,\|\,\pi_{j})\qquad(\pi_{j}T_{j}=\pi_{j}),$$

$$D_{\mathrm{KL}}(Tp\,\|\,\pi)\leq\sum_{j=1}^{m}w_{j}(p_{0})\,D_{\mathrm{KL}}(p^{(j)}\,\|\,\pi_{j})=D_{\mathrm{KL}}(p\,\|\,\pi).$$

$$\Pi_{\mathcal{P},\delta}(p_{0}):=\Bigl\{\pi\in\Delta(\mathcal{X}):\ \forall j,\ \Bigl|\sum_{i\in\mathcal{X}_{j}}\pi_{i}-w_{j}(p_{0})\Bigr|\leq\delta\Bigr\},\qquad V_{\delta}(p):=\inf_{\pi\in\Pi_{\mathcal{P},\delta}(p_{0})}D_{\mathrm{KL}}(p\,\|\,\pi).$$

$$w_{j}^{\star}=\Bigl[\frac{w_{j}(p)}{\tau^{\star}}\Bigr]_{[a_{j},b_{j}]},\qquad a_{j}:=w_{j}(p_{0})-\delta,\quad b_{j}:=w_{j}(p_{0})+\delta,$$

$$V_{\delta}(p)=\sum_{j=1}^{m}w_{j}(p)\log\frac{w_{j}(p)}{w_{j}^{\star}},\qquad\sum_{j}w_{j}^{\star}=1.$$

$$T_{\sigma}(u,v):=\frac{\exp\bigl(-(u^{2}+v^{2})/(2\sigma^{2})\bigr)}{\sum_{a,b\in\mathbb{Z}}\exp\bigl(-(a^{2}+b^{2})/(2\sigma^{2})\bigr)},\qquad(u,v)\in\mathbb{Z}^{2},\ \sigma>0,$$

$$(I*T_{\sigma})(x,y):=\sum_{u,v\in\mathbb{Z}}I(x-u,y-v)\,T_{\sigma}(u,v),$$

$$q_{n}=p_{n}T_{\sigma},\qquad p_{n+1}=q_{n},$$


Full text

Kullback–Leibler Divergence Potential for Non-Ergodic Replication Dynamics: An Information-Theoretic Second Law

Tatsuaki Tsuruyama

Department of Physics, Tohoku University, Sendai 980-8578, Japan

Department of Drug Discovery Medicine, Kyoto University, Kyoto 606-8501, Japan

Department of Clinical Laboratory, Kyoto Tachibana University, Kyoto 607-8175, Japan

Abstract

While previous research in information thermodynamics has focused on the thermodynamic costs associated with information “erasure” or “measurement” through concepts such as Landauer's principle and mutual information, little systematic discussion has addressed the inherently irreversible nature of “replication” itself and the accompanying degradation of information structure. In this study, we construct a mathematical model of information replication using a discrete Markov model and Gaussian convolution, and quantify changes in information at each replication step: Shannon entropy, cross-entropy, and the Kullback–Leibler divergence (KLD). The KLD potential $V$ (the minimal KLD to a reachable steady set) decreases monotonically, i.e., it has a Lyapunov-like property, and can be interpreted as a potential analogous to the free energy in the process by which a nonequilibrium system converges to a particular steady state. Furthermore, we discuss the applicability of this framework to biological information processes such as DNA replication, showing that the free energy required for degradation and repair can be expressed in terms of the KLD. This contributes to building a unified information-thermodynamic framework for operations such as replication, transmission, and repair of information.

1 General framework and main theorem

1.1 Block invariance and reachable steady set

The information replication operations we study (image blurring, base substitution with proofreading, etc.) are inherently local: one step only rearranges probability mass within a constrained neighborhood (spatial or compositional) and does not freely mix all microscopic states. As a result, certain coarse observables are effectively conserved by every replication step, e.g., the total mass inside a spatial region of an image, or the masses of the AT-rich (adenine, thymine) and GC-rich (guanine, cytosine) blocks of a DNA template. Empirical mutation processes are asymmetric between the AT and GC blocks. First, in various bacteria, the spectrum of spontaneous mutation is generally biased toward AT (C/G $\to$ T/A transitions dominate), implying a baseline GC $\to$ AT pressure [1]. Second, cytosines in CpG dinucleotides (often methylated as 5mC) are hypermutable via deamination, which elevates C $\to$ T transitions and locally accelerates GC $\to$ AT decay [2, 3]. Third, germline substitution rates vary strongly with the local sequence context; in humans, the 7-mer sequence context explains most of the variability in polymorphism levels [4]. In contrast to this AT-directed mutational bias, recombination-associated GC-biased gene conversion (gBGC) favors the fixation of GC alleles and can increase GC content in high-recombination regions without invoking fitness differences [5, 6, 7].

These well-documented asymmetries motivate modeling replication on a block partition that separates AT- from GC-rich components. In our framework, block invariance captures slow exchange between AT and GC pools while allowing within-block relaxation; consequently, the KLD potential $V$ measures the departure from the reachable steady set under realistic composition-dependent mutation/repair biases. When such conserved coarse variables exist, the Markov dynamics does not converge to a single global invariant distribution; instead, the state space decomposes into components that evolve independently. We call this situation non-ergodic in the sense that the long-term behavior retains memory of the initial coarse structure through the conserved weights.

(i) In the image model, a blockwise Gaussian convolution forbids smoothing across block boundaries; the total intensity in each block is preserved at every step. (ii) In the DNA model, a block-diagonal substitution kernel reflects biochemical or sequence constraints (e.g., AT- vs. GC-rich segments); block masses (AT vs. GC) remain fixed while within-block compositions relax. (iii) More generally, locality, topological separation, or routing restrictions in copying channels induce invariant “blocks” that prevent global mixing. Because replication preserves these coarse variables, the appropriate equilibrium notion is not a single point but a set of steady states obtained by fully relaxing within each invariant component while keeping the initial coarse weights fixed. This motivates modeling the state space as a disjoint union of blocks and defining a reachable steady set against which we will measure the distance using a KL-based potential. We emphasise that non-trivial block partitions $\mathcal{X}=\bigsqcup_{j=1}^{m}\mathcal{X}_{j}$ with $m>1$ are meaningful precisely when the dynamics itself possesses non-mixing components, so that probability mass cannot freely move between all microscopic states. In the fully mixing (ergodic) case, the only natural choice is the trivial partition with a single block, and our potential $V$ reduces to the standard KLD to a unique invariant steady state.

Framework and notation (matrix semantics made explicit)

We consider a finite state space partitioned into $m\in\mathbb{N}$ disjoint blocks

$$\mathcal{X}=\bigsqcup_{j=1}^{m}\mathcal{X}_{j},$$

with $|\mathcal{X}|=d<\infty$ . The probability simplex over $\mathcal{X}$ is

$$\Delta(\mathcal{X}):=\{p\in\mathbb{R}^{d}:\ p\geq 0,\ \mathbf{1}^{\top}p=1\},$$

and we write $p=(p_{x})_{x\in\mathcal{X}}\in\Delta(\mathcal{X})$ .

A one-step Markov kernel $T$ acts on the right by

$$(pT)(y)=\sum_{x\in\mathcal{X}}p(x)\,T(x,y),\qquad p_{n+1}=p_{n}T,$$

so $T$ is a $d\times d$ row-stochastic matrix ($T(x,y)\geq 0$ and $\sum_{y}T(x,y)=1$ for each row $x$), with $d=|\mathcal{X}|$. Rows index the current state and columns the next state. Restricting $T$ to a block $\mathcal{X}_{j}$ yields a subkernel $T_{j}$ of size $|\mathcal{X}_{j}|\times|\mathcal{X}_{j}|$, which updates the conditional $p^{(j)}$ while preserving the block mass $w_{j}(p_{0})$. This row-vector/right-action convention is used throughout: $p_{n+1}=p_{n}T$. Where the text writes $Tp$ or $T_{j}p^{(j)}$, this denotes the same right action $pT$ or $p^{(j)}T_{j}$.

Block-invariant dynamics.

The dynamics is block-invariant if the transitions never cross the block boundaries:

$$x\in\mathcal{X}_{j}\ \Longrightarrow\ T(x,\mathcal{X}_{j})=1\qquad(j=1,\dots,m),\qquad(2)$$

equivalently $T(x,y)=0$ for $y\notin\mathcal{X}_{j}$ whenever $x\in\mathcal{X}_{j}$ . Matrix-wise, $T$ is block diagonal: $T=\mathrm{diag}(T_{1},\dots,T_{m})$ . Under (2), block masses are conserved and conditionals evolve block-wise:

$$w_{j}(p_{n+1})=w_{j}(p_{n}),\qquad (Tp)^{(j)}=T_{j}\,p^{(j)}.$$

Coarse variables (block masses) and within-block conditionals.

For $p\in\mathcal{P}(\mathcal{X})$ define

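$$w_{j}(p):=p(\mathcal{X}_{j})=\sum_{x\in\mathcal{X}_{j}}p(x),\qquad p^{(j)}(x):=\begin{cases}p(x)/w_{j}(p), & x\in\mathcal{X}_{j},\\ 0, & x\notin\mathcal{X}_{j}.\end{cases}$$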

Then $p$ decomposes as a convex mixture of its within-block conditionals:

$$p=\sum_{j=1}^{m}w_{j}(p)\,p^{(j)}.$$

For each block, the set of invariant distributions is

$$\mathcal{I}_{j}:=\{\pi\in\mathcal{P}(\mathcal{X}_{j}):\ \pi T_{j}=\pi\}.$$

Existence. Since every $T_{j}$ is a finite row-stochastic matrix, it admits at least one invariant distribution (e.g. by Brouwer’s fixed point theorem on the simplex or Perron–Frobenius for $T_{j}^{\top}$ ), hence $\mathcal{I}_{j}\neq\emptyset$ for all $j$ . Given an initial distribution $p_{0}$ , the reachable steady set is

$$\Pi(p_{0}):=\Bigl\{\pi=\sum_{j=1}^{m}w_{j}(p_{0})\,\pi_{j}:\ \pi_{j}\in\mathcal{I}_{j}\Bigr\},$$

which is nonempty. Intuitively, $\Pi(p_{0})$ collects all steady states that (i) fully relax within each block and (ii) preserve the initial coarse masses $w_{j}(p_{0})$ fixed by non-ergodic constraints.
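As a minimal sketch (not the author's code), the snippet below assembles one element of $\Pi(p_{0})$ when every block is primitive: it finds each invariant $\pi_{j}$ by power iteration under the row-vector convention $p\mapsto pT$ and weights it by the conserved mass $w_{j}(p_{0})$. Power iteration is an assumed method that converges only for primitive blocks; the helper names and the two-block kernels are hypothetical.

```python
def step(p, T):
    """Right action p -> pT of a row-stochastic matrix T (list of rows)."""
    return [sum(p[x] * T[x][y] for x in range(len(p))) for y in range(len(T[0]))]


def block_invariant(Tj, tol=1e-13, max_iter=100_000):
    """Invariant distribution of a primitive block kernel by power iteration."""
    pi = [1.0 / len(Tj)] * len(Tj)
    for _ in range(max_iter):
        nxt = step(pi, Tj)
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("no convergence; is the block primitive?")


def reachable_steady_state(p0, blocks, kernels):
    """pi = sum_j w_j(p0) pi_j with pi_j invariant for T_j (the element of Pi(p0))."""
    pi = [0.0] * len(p0)
    for idx, Tj in zip(blocks, kernels):
        wj = sum(p0[x] for x in idx)  # coarse mass w_j(p0), conserved by the dynamics
        for x, v in zip(idx, block_invariant(Tj)):
            pi[x] = wj * v
    return pi


# Hypothetical example: states {0, 1} form block 1 and {2, 3} form block 2.
blocks = [[0, 1], [2, 3]]
kernels = [[[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.3, 0.7]]]
p0 = [0.4, 0.2, 0.1, 0.3]
pi = reachable_steady_state(p0, blocks, kernels)
```

Here the block masses are 0.6 and 0.4 and the block invariants are (2/3, 1/3) and (1/2, 1/2), so `pi` is approximately [0.4, 0.2, 0.2, 0.2].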

Block primitivity.

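Primitivity of a block kernel $T_{j}$ (irreducible and aperiodic; equivalently, $(T_{j})^{n}>0$ entrywise for some $n$) is invoked only to conclude that the within-block conditionals converge to a unique invariant $\pi_{j}^{\ast}\in\mathcal{I}_{j}$. A Doeblin (minorization) condition for a block kernel $K$ with constant $\varepsilon$ means

$$\exists\,\nu\in\Delta(\mathcal{X}_{j}),\ \varepsilon>0\ \text{s.t.}\ K(x,\cdot)\geq\varepsilon\,\nu(\cdot)\quad\forall x\in\mathcal{X}_{j},$$

which implies total-variation contraction by a factor of at most $1-\varepsilon$ per step. We also define the Dobrushin coefficient

$$\delta(K):=1-\min_{x,x^{\prime}}\sum_{y\in\mathcal{X}_{j}}\min\{K(x,y),K(x^{\prime},y)\},$$

which bounds the one-step total-variation contraction: $\|pK-qK\|_{\mathrm{TV}}\leq\delta(K)\,\|p-q\|_{\mathrm{TV}}$.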
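A small sketch, assuming a block kernel stored as a list of rows, that certifies a minorization constant via the column-minimum construction $\varepsilon=\sum_{y}\min_{x}K(x,y)$, $\nu(y)=\min_{x}K(x,y)/\varepsilon$ (one valid choice, not necessarily the one used for the figures), evaluates $\delta(K)$, and checks the contraction bound on one pair of distributions. Because any two rows overlap by at least $\sum_{y}\min_{x}K(x,y)$, this construction always gives $\delta(K)\leq 1-\varepsilon$. The kernel values are hypothetical.

```python
def doeblin_epsilon(K):
    """Column-minimum minorization: K(x, .) >= eps * nu(.) with
    eps = sum_y min_x K(x, y) and nu(y) = min_x K(x, y) / eps."""
    return sum(min(row[y] for row in K) for y in range(len(K[0])))


def dobrushin_delta(K):
    """delta(K) = 1 - min over row pairs (x, x') of sum_y min(K(x, y), K(x', y))."""
    n = len(K)
    overlap = min(
        sum(min(a, b) for a, b in zip(K[x], K[xp]))
        for x in range(n)
        for xp in range(n)
    )
    return 1.0 - overlap


def tv(p, q):
    """Total-variation distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def step(p, K):
    """Right action p -> pK."""
    return [sum(p[x] * K[x][y] for x in range(len(p))) for y in range(len(K[0]))]


# Hypothetical 3-state block kernel (each row sums to 1).
K = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]]
eps = doeblin_epsilon(K)
delta = dobrushin_delta(K)
p, q = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
contracted = tv(step(p, K), step(q, K))
```

For this kernel both $\varepsilon$ and $\delta(K)$ equal 0.5 (the worst pair is rows 0 and 2), so the two point masses, initially at TV distance 1, are at most 0.5 apart after one step.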

1.2 Key Lemmas

Lemma 1 (Preservation of block mass and conditional evolution).

Let $p_{+}=Tp$ with block-invariant $T$ as in (2). Then for all $j=1,\dots,m$ ,

$$w_{j}(p_{+})=w_{j}(p),\qquad(9)$$

and, whenever $w_{j}(p)>0$ ,

$$(Tp)^{(j)}=T_{j}\,p^{(j)}.\qquad(10)$$

Proof.

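By block invariance, $T(x,\mathcal{X}_{j})=1$ for $x\in\mathcal{X}_{j}$ and $T(x,\mathcal{X}_{j})=0$ otherwise. Hence $w_{j}(p_{+})=(Tp)(\mathcal{X}_{j})=\sum_{x\in\mathcal{X}}p(x)\,T(x,\mathcal{X}_{j})=\sum_{x\in\mathcal{X}_{j}}p(x)\cdot 1+\sum_{x\notin\mathcal{X}_{j}}p(x)\cdot 0=\sum_{x\in\mathcal{X}_{j}}p(x)=w_{j}(p)$, which proves (9). For any $A\subseteq\mathcal{X}_{j}$, using $T(x,A)=0$ for $x\notin\mathcal{X}_{j}$ and $w_{j}(p_{+})=w_{j}(p)$,

$$(Tp)^{(j)}(A)=\frac{\sum_{x\in\mathcal{X}_{j}}p(x)\,T_{j}(x,A)}{w_{j}(p)}=\sum_{x\in\mathcal{X}_{j}}\frac{p(x)}{w_{j}(p)}\,T_{j}(x,A)=(T_{j}p^{(j)})(A),$$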

establishing (10). ∎
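Both identities of Lemma 1 are easy to confirm numerically. The sketch below uses a hypothetical block-diagonal kernel on five states split as {0, 1} | {2, 3, 4}, applies one step $p\mapsto pT$, and compares the block masses and the second block's conditional with $p^{(2)}T_{2}$.

```python
def step(p, T):
    """Right action p -> pT (row-vector convention)."""
    return [sum(p[x] * T[x][y] for x in range(len(p))) for y in range(len(T[0]))]


def block_mass(p, idx):
    """Coarse variable w_j(p)."""
    return sum(p[x] for x in idx)


def conditional(p, idx):
    """Within-block conditional p^(j), assuming w_j(p) > 0."""
    w = block_mass(p, idx)
    return [p[x] / w for x in idx]


# Hypothetical block-diagonal kernel T = diag(T1, T2).
T1 = [[0.8, 0.2], [0.5, 0.5]]
T2 = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]]
T = [row + [0.0] * 3 for row in T1] + [[0.0] * 2 + row for row in T2]
blocks = [[0, 1], [2, 3, 4]]

p = [0.1, 0.3, 0.2, 0.25, 0.15]
p_next = step(p, T)

masses_before = [block_mass(p, idx) for idx in blocks]
masses_after = [block_mass(p_next, idx) for idx in blocks]  # Eq. (9): unchanged
lhs = conditional(p_next, blocks[1])                        # (pT)^(2)
rhs = step(conditional(p, blocks[1]), T2)                   # p^(2) T_2, Eq. (10)
```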

Lemma 2 (Block decomposition of KLD).

Let $\pi=\sum_{j=1}^{m}w_{j}\,\pi_{j}$ with $w_{j}>0$ and $\pi_{j}\in\mathcal{P}(\mathcal{X}_{j})$ supported in disjoint blocks $\mathcal{X}_{j}$ . Then, for any $p\in\mathcal{P}(\mathcal{X})$ ,

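$$D_{\mathrm{KL}}(p\,\|\,\pi)=\sum_{j=1}^{m}w_{j}(p)\,D_{\mathrm{KL}}(p^{(j)}\,\|\,\pi_{j})+\sum_{j=1}^{m}w_{j}(p)\log\frac{w_{j}(p)}{w_{j}}.\qquad(12)$$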

Proof.

Using the disjointness $\mathcal{X}=\bigsqcup_{j}\mathcal{X}_{j}$ and writing $p(x)=w_{j}(p)\,p^{(j)}(x)$ for $x\in\mathcal{X}_{j}$ ,

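$$\begin{aligned}D_{\mathrm{KL}}(p\,\|\,\pi)&=\sum_{j=1}^{m}\sum_{x\in\mathcal{X}_{j}}w_{j}(p)\,p^{(j)}(x)\Bigl[\log\frac{p^{(j)}(x)}{\pi_{j}(x)}+\log\frac{w_{j}(p)}{w_{j}}\Bigr]\\&=\sum_{j=1}^{m}w_{j}(p)\,D_{\mathrm{KL}}(p^{(j)}\,\|\,\pi_{j})+\sum_{j=1}^{m}w_{j}(p)\log\frac{w_{j}(p)}{w_{j}},\end{aligned}$$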

which is (12). ∎
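A numerical check of the decomposition (12), assuming hypothetical reference masses $w_{j}$ and conditionals $\pi_{j}$ chosen so that the coarse-mass term is nonzero:

```python
import math


def kl(p, q):
    """KLD in nats; terms with p(x) = 0 contribute 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


blocks = [[0, 1], [2, 3, 4]]
p = [0.1, 0.3, 0.2, 0.25, 0.15]             # block masses w_j(p) = 0.4 and 0.6
w = [0.5, 0.5]                               # reference masses w_j (hypothetical)
pi_blocks = [[0.6, 0.4], [0.2, 0.5, 0.3]]    # reference conditionals pi_j (hypothetical)

pi = [0.0] * len(p)
for wj, idx, pij in zip(w, blocks, pi_blocks):
    for x, v in zip(idx, pij):
        pi[x] = wj * v

lhs = kl(p, pi)
within, coarse = 0.0, 0.0
for wj, idx, pij in zip(w, blocks, pi_blocks):
    wp = sum(p[x] for x in idx)
    within += wp * kl([p[x] / wp for x in idx], pij)  # within-block divergences
    coarse += wp * math.log(wp / wj)                   # coarse-mass mismatch
rhs = within + coarse
```

The coarse term is itself a KLD between the mass vectors (0.4, 0.6) and (0.5, 0.5), which is why it is nonnegative here.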

Equations (9)–(10) show that the dynamics preserves coarse variables $\{w_{j}\}$ and evolves conditionals within each block, while (12) cleanly separates within-block divergences from a coarse-mass mismatch term. These identities will be the backbone for the Lyapunov property established in Sec. 1.3.

1.3 Lyapunov property of the KLD potential

Statement. Let $p_{n+1}=Tp_{n}$ with a block-invariant kernel $T=\mathrm{diag}(T_{1},\dots,T_{m})$ (Sec. 1.1).

Theorem 1 (KLD potential is Lyapunov under block invariance).

For all $n\geq 0$ ,

$$V(p_{n+1})\leq V(p_{n}).\qquad(14)$$

If, in addition, each block kernel $T_{j}$ is primitive (irreducible and aperiodic) with unique invariant $\pi_{j}^{\ast}\in\mathcal{I}_{j}$ , then

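$$V(p_{n})=\sum_{j=1}^{m}w_{j}(p_{0})\,D_{\mathrm{KL}}\bigl(p_{n}^{(j)}\,\|\,\pi_{j}^{\ast}\bigr)\xrightarrow[n\to\infty]{}0.\qquad(15)$$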

Proof.

Fix $\pi\in\Pi(p_{0})$ and write $\pi=\sum_{j=1}^{m}w_{j}(p_{0})\,\pi_{j}$ with $\pi_{j}\in\mathcal{I}_{j}$ . Block masses are conserved along the trajectory (Lemma 1), hence $w_{j}(Tp)=w_{j}(p)=w_{j}(p_{0})$ . By the KL block decomposition (Lemma 2),

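$$D_{\mathrm{KL}}(Tp\,\|\,\pi)=\sum_{j=1}^{m}w_{j}(p_{0})\,D_{\mathrm{KL}}(T_{j}p^{(j)}\,\|\,\pi_{j}).\qquad(16)$$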

Applying the data-processing inequality (DPI) blockwise,

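$$D_{\mathrm{KL}}(T_{j}r\,\|\,\pi_{j})\leq D_{\mathrm{KL}}(r\,\|\,\pi_{j})\qquad(\pi_{j}T_{j}=\pi_{j},\ r\in\mathcal{P}(\mathcal{X}_{j})),\qquad(17)$$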

and inserting (17) into (16) yields

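$$D_{\mathrm{KL}}(Tp\,\|\,\pi)\leq\sum_{j=1}^{m}w_{j}(p_{0})\,D_{\mathrm{KL}}(p^{(j)}\,\|\,\pi_{j})=D_{\mathrm{KL}}(p\,\|\,\pi).$$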

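Taking the infimum over $\pi\in\Pi(p_{0})$ proves (14). If each $T_{j}$ is primitive with invariant $\pi_{j}^{\ast}$, then $\Pi(p_{0})$ reduces to the single point $\sum_{j}w_{j}(p_{0})\,\pi_{j}^{\ast}$, the coarse-mass term in (12) vanishes because $w_{j}(p_{n})=w_{j}(p_{0})$, and $p_{n}^{(j)}\to\pi_{j}^{\ast}$; since $\pi_{j}^{\ast}$ has full support on $\mathcal{X}_{j}$, continuity of the KLD gives (15). ∎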
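A minimal simulation of Theorem 1, assuming two hypothetical primitive blocks whose invariants are obtained by power iteration. It tracks $V(p_{n})$ in the form (15), which is valid here because the reachable steady set is a single point, and the test below checks monotonicity and decay.

```python
import math


def step(p, T):
    """Right action p -> pT."""
    return [sum(p[x] * T[x][y] for x in range(len(p))) for y in range(len(T[0]))]


def kl(p, q):
    """KLD in nats; terms with p(x) = 0 contribute 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def invariant(Tj, n_iter=2000):
    """Invariant law of a block by power iteration (assumes the block is primitive)."""
    pi = [1.0 / len(Tj)] * len(Tj)
    for _ in range(n_iter):
        pi = step(pi, Tj)
    return pi


def potential(p, blocks, invariants, w0):
    """V(p) = sum_j w_j(p0) KL(p^(j) || pi_j*) for primitive blocks, Eq. (15)."""
    total = 0.0
    for idx, pij, wj in zip(blocks, invariants, w0):
        cond = [p[x] / wj for x in idx]  # uses w_j(p) = w_j(p0) (Lemma 1)
        total += wj * kl(cond, pij)
    return total


T1 = [[0.8, 0.2], [0.5, 0.5]]
T2 = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]]
T = [row + [0.0] * 3 for row in T1] + [[0.0] * 2 + row for row in T2]
blocks = [[0, 1], [2, 3, 4]]

p = [0.4, 0.0, 0.6, 0.0, 0.0]  # each block starts concentrated on one state
w0 = [sum(p[x] for x in idx) for idx in blocks]
invariants = [invariant(T1), invariant(T2)]

values = []
for n in range(30):
    values.append(potential(p, blocks, invariants, w0))
    p = step(p, T)
```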

Robustness to weak inter-block leakage and re-partitioning

Motivation.

Secs. 1.2–1.3 establish a Lyapunov principle for $V$ under exact block invariance. To examine robustness to weak violations (small inter-block leakage or re-partitioning), we introduce a leakage-tolerant potential $V_{\delta}$ that continuously extends $V$ and preserves its qualitative behavior up to $O(\delta)$ slack.

Leakage-tolerant admissible set and closed form (corrected).

When leakage is small over the observation window, relax coarse masses by $\pm\delta$ and define

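$$\Pi_{\mathcal{P},\delta}(p_{0}):=\Bigl\{\pi\in\Delta(\mathcal{X}):\ \forall j,\ \Bigl|\sum_{i\in\mathcal{X}_{j}}\pi_{i}-w_{j}(p_{0})\Bigr|\leq\delta\Bigr\},\qquad V_{\delta}(p):=\inf_{\pi\in\Pi_{\mathcal{P},\delta}(p_{0})}D_{\mathrm{KL}}(p\,\|\,\pi).$$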

By KL block decomposition and KKT optimality, the optimizer has the form

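$$w_{j}^{\star}=\Bigl[\frac{w_{j}(p)}{\tau^{\star}}\Bigr]_{[a_{j},b_{j}]},\qquad a_{j}:=w_{j}(p_{0})-\delta,\quad b_{j}:=w_{j}(p_{0})+\delta,$$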

where $[z]_{[a,b]}:=\min\{\max\{z,a\},b\}$ denotes clipping to $[a,b]$ and $\tau^{\star}>0$ is chosen so that $\sum_{j}w_{j}^{\star}=1$. Hence

$$V_{\delta}(p)=\sum_{j=1}^{m}w_{j}(p)\log\frac{w_{j}(p)}{w_{j}^{\star}},\qquad\sum_{j}w_{j}^{\star}=1.$$

Since $f(\tau):=\sum_{j}\bigl[w_{j}(p)/\tau\bigr]_{[a_{j},b_{j}]}$ is continuous and nonincreasing in $\tau>0$ (strictly decreasing wherever at least one term is unclipped), $\tau^{\star}$ is found by a short bisection on $(0,\infty)$.
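The bisection for $\tau^{\star}$ can be sketched as follows. This is an illustration, not the author's implementation: it assumes every $w_{j}(p)>0$, clips the lower bounds $a_{j}$ at 0 as an added safeguard, and bisects in $\log\tau$, which is a convenient way to search all of $(0,\infty)$. The function name and the example masses are hypothetical.

```python
import math


def clip(z, lo, hi):
    """[z]_[lo, hi] = min(max(z, lo), hi)."""
    return min(max(z, lo), hi)


def v_delta(w_p, w_p0, delta, n_iter=200):
    """Closed-form V_delta: w_j* = clip(w_j(p) / tau, a_j, b_j) with sum_j w_j* = 1."""
    a = [max(w - delta, 0.0) for w in w_p0]
    b = [w + delta for w in w_p0]

    def f(tau):
        return sum(clip(wp / tau, lo, hi) for wp, lo, hi in zip(w_p, a, b))

    # Bracket: f is close to sum(b) >= 1 near 0 and to sum(a) <= 1 near infinity.
    lo_tau, hi_tau = 1e-12, 1e12
    for _ in range(n_iter):
        mid = math.sqrt(lo_tau * hi_tau)  # midpoint in log(tau)
        if f(mid) > 1.0:
            lo_tau = mid  # f is nonincreasing, so the root lies above mid
        else:
            hi_tau = mid
    tau = math.sqrt(lo_tau * hi_tau)
    w_star = [clip(wp / tau, lo, hi) for wp, lo, hi in zip(w_p, a, b)]
    value = sum(wp * math.log(wp / ws) for wp, ws in zip(w_p, w_star))
    return value, w_star


# Hypothetical example: leakage moved block masses from (0.5, 0.5) to (0.6, 0.4).
value, w_star = v_delta([0.6, 0.4], [0.5, 0.5], delta=0.05)
```

With $\delta=0.05$ the upper bound 0.55 binds for the first block, so $w^{\star}=(0.55,0.45)$ and $V_{\delta}\approx 0.0051$ nats, compared with the unrelaxed mismatch $D_{\mathrm{KL}}((0.6,0.4)\|(0.5,0.5))\approx 0.0201$ nats.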

A self-contained derivation and a sufficient threshold for strict decrease under leakage are provided in Appendix A.

Continuity and small-leakage persistence.

As $\delta\to 0$ , $V_{\delta}(p)\to V(p)$ . Hence, under sufficiently small leakage, the qualitative properties proved for $V$ (monotonicity along the dynamics, long-time limit, and the thermodynamic reading $k_{\mathrm{B}}T\,V_{\delta}$ per refresh) persist up to an $O(\delta)$ relaxation.

Coarsening by merging leaking blocks.

If mutually leaking blocks are merged to form a coarser partition $\mathcal{Q}\succeq\mathcal{P}$ , data processing implies $V_{\mathcal{Q}}(p)\leq V_{\mathcal{P}}(p)$ . Coarsening restores exact block invariance at the partition level, at the price of a more conservative (smaller) KL-based potential and a weaker maintenance-cost bound.

2 Instantiations

2.1 Image replication via Gaussian convolution

Block-patterned model.

First, we model an image with an explicit block (mosaic) structure: the pixel grid is partitioned into $m$ disjoint rectangular regions that remain isolated in the non-ergodic variant (no cross-region mass transfer). Figure 2 uses three blockings of the same $256\times 256$ image: $2\times 2$ (top), $4\times 4$ (middle), and $128\times 128$ (bottom). The ergodic variant corresponds to the same update without any blocking (global mixing).

We model a single replication step as a Markov smoothing map $T_{\sigma}$ implemented by a discrete Gaussian convolution on $I:\mathbb{Z}^{2}\to[0,1]$ followed by re-quantization; the same symbol $T_{\sigma}$ denotes the convolution kernel. The discrete, renormalized kernel is

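$$T_{\sigma}(u,v):=\frac{\exp\bigl(-(u^{2}+v^{2})/(2\sigma^{2})\bigr)}{\sum_{a,b\in\mathbb{Z}}\exp\bigl(-(a^{2}+b^{2})/(2\sigma^{2})\bigr)},\qquad(u,v)\in\mathbb{Z}^{2},\ \sigma>0,$$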

so that $\sum_{u,v}T_{\sigma}(u,v)=1$ . The (global) discrete convolution is

$$(I*T_{\sigma})(x,y):=\sum_{u,v\in\mathbb{Z}}I(x-u,y-v)\,T_{\sigma}(u,v),\qquad(21)$$

and in the non-ergodic (blockwise) variant the sum is restricted to pixels belonging to the same predefined region, which makes the induced Markov map $T_{\sigma}$ block diagonal.

Let $p_{n}$ denote the normalized histogram of the mosaic (block-patterned) image before step $n$. The convolution (21), followed by the same binning, induces a one-step Markov channel on this distribution, again denoted $T_{\sigma}$, so that

$$q_{n}=p_{n}T_{\sigma},\qquad p_{n+1}=q_{n},$$

with row vectors acting on the right.
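The blockwise variant can be illustrated at the pixel level. The sketch below assumes a small 8×8 image split into four 4×4 regions and builds a row-stochastic kernel in which each pixel spreads its intensity with weights $\exp(-r^{2}/(2\sigma^{2}))$ over pixels of its own region only, renormalized per source pixel. This scatter-style normalization is our assumption, chosen so that block-mass conservation is exact; the paper's boundary handling, image size, and re-quantization step are not reproduced, and all names are hypothetical.

```python
import math


def gaussian_block_kernel(h, w, bh, bw, sigma):
    """Row-stochastic kernel on h*w pixels (row-major order): pixel (i, j) sends
    mass to pixels (k, l) in the same bh x bw block with Gaussian weights."""
    pixels = [(i, j) for i in range(h) for j in range(w)]
    K = []
    for i, j in pixels:
        row = [
            math.exp(-((i - k) ** 2 + (j - l) ** 2) / (2 * sigma ** 2))
            if (i // bh, j // bw) == (k // bh, l // bw) else 0.0
            for k, l in pixels
        ]
        total = sum(row)
        K.append([v / total for v in row])  # renormalize within the block
    return K


def replicate(flat, K):
    """One replication step: row vector times kernel (mass-preserving)."""
    n = len(flat)
    return [sum(flat[x] * K[x][y] for x in range(n)) for y in range(n)]


def block_sums(flat, h, w, bh, bw):
    """Total intensity per block, the conserved coarse variables."""
    sums = {}
    for i in range(h):
        for j in range(w):
            key = (i // bh, j // bw)
            sums[key] = sums.get(key, 0.0) + flat[i * w + j]
    return sums


h = w = 8
image = [1.0 if (i + j) % 3 == 0 else 0.0 for i in range(h) for j in range(w)]
K = gaussian_block_kernel(h, w, 4, 4, sigma=1.0)
copied = image
for _ in range(5):
    copied = replicate(copied, K)
before = block_sums(image, h, w, 4, 4)
after = block_sums(copied, h, w, 4, 4)
```

Within each region the pattern is smoothed toward a flat profile, while the four block sums stay fixed, which is the non-ergodic behavior described above.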

Recorded metrics.

We view the histogram on a finite state space $\mathcal{X}$ (intensity bins).

[TABLE]

For Fig. 2 (Appendix B), values are shown in bits (base 2). Each panel annotates a certified $\varepsilon$ and the corresponding $\delta(T_{\sigma})$ for its (blockwise) kernel.

Changing minimizers.

When a block admits multiple invariant measures ( $\mathcal{I}_{j}$ not a singleton, e.g. reducible blocks), the minimizer $\pi^{\star}\!\in\!\Pi(p_{0})$ of $V(p_{n})$ can change along the trajectory even though $V$ remains nonincreasing. A concrete three-state reducible example with an explicit switching of the KLD minimizer is given in Appendix G.

2.2 DNA replication as a block-diagonal substitution process

Biophysical context.

Second, we model DNA replication with a proofreading/repair mechanism that acts after copying. DNA polymerases move along the template stochastically, with forward/backward steps, proofreading, and context-dependent kinetics [8, 9, 10, 11, 12]. To capture non-ergodicity induced by compositional structure (e.g. AT- vs. GC-rich segments), we model replication as a block-diagonal Markov process on nucleotides.

Modeling stance.

The AT/GC split is a biologically motivated, stylized coarse-graining: it captures slow exchange between compositionally distinct pools. Small AT $\leftrightarrow$ GC exchange can be handled by the leakage-tolerant potential $V_{\delta}$ or by coarsening the partition (see “Robustness to weak inter-block leakage and re-partitioning”).

Block-diagonal dynamics.

Let $\mathcal{X}=\{\mathrm{A},\mathrm{T},\mathrm{C},\mathrm{G}\}$ with two blocks $\mathcal{X}_{1}=\{\mathrm{A},\mathrm{T}\}$ and $\mathcal{X}_{2}=\{\mathrm{C},\mathrm{G}\}$ . A one-step substitution kernel (copying) with block invariance (2) is

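$$T=\begin{pmatrix}1-\alpha&\alpha&0&0\\ \beta&1-\beta&0&0\\ 0&0&1-\gamma&\gamma\\ 0&0&\delta&1-\delta\end{pmatrix}\qquad(\text{states ordered }\mathrm{A},\mathrm{T},\mathrm{C},\mathrm{G}),$$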

with $\alpha,\beta,\gamma,\delta\in(0,1)$ encoding effective substitution tendencies per step (arising from selectivity, context, and post-replicative processing). The block masses $w_{1}=p_{\mathrm{A}}+p_{\mathrm{T}}$ and $w_{2}=p_{\mathrm{C}}+p_{\mathrm{G}}$ are conserved by Lemma 1.

Proofreading/repair channel.

To incorporate proofreading/repair, we add a blockwise operator $\mathcal{R}=\mathrm{diag}(R_{1},R_{2})$ that is invoked with probability $\rho\in[0,1]$ after extension:

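$$\mathcal{R}=\begin{pmatrix}1-\alpha^{\prime}&\alpha^{\prime}&0&0\\ \beta^{\prime}&1-\beta^{\prime}&0&0\\ 0&0&1-\gamma^{\prime}&\gamma^{\prime}\\ 0&0&\delta^{\prime}&1-\delta^{\prime}\end{pmatrix},$$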

with, typically, $\alpha^{\prime}<\alpha$, $\beta^{\prime}<\beta$, $\gamma^{\prime}<\gamma$, $\delta^{\prime}<\delta$ (improved selectivity). The effective one-step kernel is the convex mixture

$$\widetilde{T}=(1-\rho)\,T+\rho\,\mathcal{R},$$

which remains block diagonal and therefore satisfies (2). The replication update reads

$$p_{n+1}=p_{n}\widetilde{T}.$$

Primitivity in this setting. For a two-state block $[[1-a,a],[b,1-b]]$, primitivity is equivalent to $a>0$ and $b>0$ with $(a,b)\neq(1,1)$; in particular, $a,b\in(0,1)$ suffices. Under the mixture $\widetilde{T}=(1-\rho)T+\rho\mathcal{R}$, if every within-block entry of $\mathcal{R}$ is strictly positive, then any $\rho>0$ makes every within-block entry of $\widetilde{T}$ positive; hence each block becomes primitive and $V(p_{n})\to 0$.
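The mixture argument can be checked directly. In the sketch below all rates are hypothetical; the AT copying block uses $\alpha=\beta=1$, which lies outside the interval $(0,1)$ assumed in the text and is chosen deliberately as the period-2 boundary case. Primitivity is tested with Wielandt's bound: a $k\times k$ nonnegative matrix $B$ is primitive iff $B^{(k-1)^{2}+1}>0$ entrywise.

```python
def two_state(a, b):
    """Two-state block [[1 - a, a], [b, 1 - b]]."""
    return [[1 - a, a], [b, 1 - b]]


def block_diag(B1, B2):
    """diag(B1, B2) as a list of rows."""
    n1, n2 = len(B1), len(B2)
    return [row + [0.0] * n2 for row in B1] + [[0.0] * n1 + row for row in B2]


def mix(T, R, rho):
    """Effective kernel (1 - rho) T + rho R."""
    return [[(1 - rho) * t + rho * r for t, r in zip(tr, rr)] for tr, rr in zip(T, R)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def is_primitive(B):
    """Wielandt: a k x k nonnegative B is primitive iff B^((k-1)^2 + 1) > 0 entrywise."""
    P = B
    for _ in range((len(B) - 1) ** 2):
        P = matmul(P, B)
    return all(v > 0 for row in P for v in row)


# Hypothetical rates; states ordered A, T | C, G.
copy_at = two_state(1.0, 1.0)      # pure A <-> T swap: irreducible but period 2
copy_gc = two_state(0.2, 0.3)
repair_at = two_state(0.05, 0.1)   # strictly positive repair kernel
repair_gc = two_state(0.02, 0.03)
rho = 0.2

T_eff = mix(block_diag(copy_at, copy_gc), block_diag(repair_at, repair_gc), rho)
eff_at = [row[:2] for row in T_eff[:2]]
eff_gc = [row[2:] for row in T_eff[2:]]
```

The copying block alone is not primitive, but after mixing with $\rho=0.2$ both effective blocks have strictly positive entries, and the off-block entries of $\widetilde{T}$ remain exactly zero.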

KLD potential with unique block invariants.

Assume that each block chain admits a unique invariant $\pi_{j}^{\ast}\in\mathcal{I}_{j}$ . Then, combining (1) with (12) gives

$$V(p_{n})=\sum_{j=1}^{2}w_{j}(p_{0})\,D_{\mathrm{KL}}\bigl(p_{n}^{(j)}\,\|\,\pi_{j}^{\ast}\bigr).$$

By Theorem 1, $V$ is a Lyapunov function for the blockwise dynamics: $V(p_{n+1})\leq V(p_{n})$ for all $n$ , and $V(p_{n})\to 0$ as $n\to\infty$ .

Local detailed balance derivation.

Within each two-state block, write the undriven rates as

[TABLE]

At equilibrium (no chemical drive), detailed balance with the block’s stationary distribution $\pi^{\ast}$ gives

[TABLE]

When a proofreading cycle provides a chemical affinity $\Delta\mu>0$ per round-trip, the local detailed balance (cycle thermodynamics) shifts log rate odds by $\Delta\mu/k_{\mathrm{B}}T$ :

[TABLE]

Equivalently, in odds form,

[TABLE]

Thus, the drive fixes odds (ratios) but not individual rates; pinning $\alpha^{\prime},\beta^{\prime}$ (or $\gamma^{\prime},\delta^{\prime}$ ) separately requires an additional timescale constraint (for example, keeping $\alpha^{\prime}+\beta^{\prime}$ fixed), which we do not assume here. As is standard in stochastic thermodynamics, the exponential tilt of rate odds by the chemical cycle affinity follows from network cycle thermodynamics. Equations (32)–(33) encode local detailed balance for rate odds; they shape steady odds and relaxation speed but do not by themselves imply primitivity, which is a structural (irreducible+aperiodic) property; cf. Theorem 2.

Note. The exponential tilt in Eq. (33) biases odds but does not by itself specify the steady entropy production rate; the latter can remain positive even when $V\to 0$ under primitive blocks.

Energetic bias (phenomenology).

In kinetic proofreading a chemical drive provides an affinity $\Delta\mu$ per cycle (e.g., via nucleotide hydrolysis). Within each invariant block, this drive tilts the two-state odds toward the proofreading-favored direction according to Eq. (33), sharpening the steady composition. Because $V$ decomposes over blocks, a stronger drive typically steepens the within-block basins (reducing the Bernoulli variance around $\pi_{j}^{\ast}$ ) and thus accelerates the monotone decay of $V$ guaranteed by Theorem 1. Crucially, the existence and monotonicity of $V$ do not require a drive; the drive only modifies the speed and the steady odds.

Primitivity of $2\times 2$ blocks and relation to Eq. (35).

For the two-state blocks in Example 2, a practical sufficient condition for block primitivity is strict positivity of every entry of the effective $2\times 2$ transition kernel: $\alpha_{\rm eff},\beta_{\rm eff},\gamma_{\rm eff},\delta_{\rm eff}\in(0,1)$. Then each block is irreducible and aperiodic (hence primitive), and (31) implies $V(p_{n})\to 0$. If $\alpha_{\rm eff}=0$ or $\beta_{\rm eff}=0$, irreducibility fails; if $\alpha_{\rm eff}=\beta_{\rm eff}=1$ (or $\gamma_{\rm eff}=\delta_{\rm eff}=1$ in the GC block), period-2 oscillations arise unless additional repair/proofreading steps restore positive diagonal entries. Equation (33) (Eq. 35) expresses how $\Delta\mu$ biases rate ratios; it determines the steady odds and relaxation speed but is not, by itself, a primitivity condition, which depends on positivity and aperiodicity of the effective kernels.

Notes. The sign of $\Delta\mu$ sets the favored direction; reversing the operating bias corresponds to $\Delta\mu\!\to\!-\Delta\mu$ . At stationarity the block-wise odds inherit the exponential tilt: $\pi_{A}/\pi_{T}=\beta^{\prime}/\alpha^{\prime}$ and $\pi_{C}/\pi_{G}=\delta^{\prime}/\gamma^{\prime}$ , consistent with the proofreading picture of Hopfield [13]. This connects proofreading accuracy to energetic cost in line with kinetic proofreading thermodynamics [13, 14].
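The stationary odds quoted above follow from the invariant law of a two-state block $[[1-a,a],[b,1-b]]$, namely $\pi=(b,a)/(a+b)$, so that $\pi_{1}/\pi_{2}=b/a$. A short check, with hypothetical repair rates $\gamma^{\prime}=0.02$ and $\delta^{\prime}=0.03$ for the GC block:

```python
def stationary_two_state(a, b):
    """Invariant law of [[1 - a, a], [b, 1 - b]] with a + b > 0: (b, a) / (a + b)."""
    return b / (a + b), a / (a + b)


def is_invariant(pi, a, b, tol=1e-12):
    """Check pi K = pi for K = [[1 - a, a], [b, 1 - b]]."""
    first = pi[0] * (1 - a) + pi[1] * b
    second = pi[0] * a + pi[1] * (1 - b)
    return abs(first - pi[0]) < tol and abs(second - pi[1]) < tol


# Hypothetical repair rates for the GC block: gamma' = 0.02, delta' = 0.03.
pi_c, pi_g = stationary_two_state(0.02, 0.03)
odds = pi_c / pi_g  # delta' / gamma'
```

Any pair with $\delta^{\prime}/\gamma^{\prime}=1.5$ gives $\pi_{\mathrm{C}}=0.6$, the same within-block value as the $\pi_{2}^{\ast}(\mathrm{C})$ quoted for Fig. 3; the specific pair used here is illustrative only.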

Recorded metrics and bookkeeping.

At each step, we record $H(q_{n})$, $H_{\times}(n)=H(p_{n},q_{n})$, $D_{\mathrm{KL}}(p_{n}\|q_{n})$, and $V(p_{n})$, with monotone $V$ ensured by Theorem 1. Because $\widetilde{T}$ is block diagonal by construction, coarse transitions between $\{\mathrm{AT},\mathrm{GC}\}$ vanish at every step. Block-level confusion rates under small inter-block leakage are analyzed in the leakage section.

Readouts and figures.

The trajectories in Fig. 2 show an increase in $H$ and $H_{\times}$, a decrease in $D_{\mathrm{KL}}(p_{n}\|q_{n})$, and a monotone decay of $V(p_{n})$, in agreement with Theorem 1. Figure 3 shows the potential landscape $V(p)=w_{1}V_{1}(x)+w_{2}V_{2}(y)$ on $(x,y)$, where $x$ is the A fraction in $\{\mathrm{A},\mathrm{T}\}$ and $y$ the C fraction in $\{\mathrm{C},\mathrm{G}\}$ (Appendix D).

The contour plot shows the KLD potential $V(p)=w_{1}D_{\mathrm{KL}}\!\bigl(p^{(1)}\!\parallel\!\pi_{1}^{\ast}\bigr)+w_{2}D_{\mathrm{KL}}\!\bigl(p^{(2)}\!\parallel\!\pi_{2}^{\ast}\bigr)$ on $(x,y)\in[0,1]^{2}$ , where $x$ is the fraction A in the AT block and $y$ is the fraction C in the GC block. The unique minimum (blue marker) occurs at $(x^{\ast},y^{\ast})=(\pi_{1}^{\ast}(\mathrm{A}),\pi_{2}^{\ast}(\mathrm{C}))\approx(0.338,0.600)$ , i.e., at the block-wise invariants determined by the effective rates. Because $V$ is a sum of blockwise KL terms, the level sets are nearly axis-aligned ellipses (here close to circular since $w_{1}=w_{2}=0.5$ and the curvatures are similar). A second-order expansion around $(x^{\ast},y^{\ast})$ gives $V(x,y)\approx\tfrac{w_{1}}{2\,\mu(1-\mu)}(x-\mu)^{2}+\tfrac{w_{2}}{2\,\nu(1-\nu)}(y-\nu)^{2}$ with $\mu=\pi_{1}^{\ast}(\mathrm{A})\approx 0.338$ and $\nu=\pi_{2}^{\ast}(\mathrm{C})=0.600$ , yielding coefficients $\simeq 1.12$ and $\simeq 1.04$ (nats), respectively—consistent with the nearly isotropic contours. Increasing proofreading bias decreases $\mu(1-\mu)$ or $\nu(1-\nu)$ , steepening the basin and accelerating the monotonic decay of $V(p_{n})$ (Theorem 1). The separable, convex landscape highlights that, under block invariance, the non-ergodic relaxation proceeds independently within blocks and converges to the reachable steady point $(x^{\ast},y^{\ast})$ (Fig. 3).
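The curvature coefficients quoted above follow from the second-order expansion $D_{\mathrm{KL}}(x\|\mu)\approx(x-\mu)^{2}/(2\mu(1-\mu))$ of a binary KLD. A short check using only the values stated in the text ($w_{1}=w_{2}=0.5$, $\mu\approx 0.338$, $\nu=0.600$); the function names are illustrative.

```python
import math


def bkl(x, m):
    """Binary KLD D(Bern(x) || Bern(m)) in nats, with 0 log 0 = 0."""
    total = 0.0
    for a, b in ((x, m), (1 - x, 1 - m)):
        if a > 0:
            total += a * math.log(a / b)
    return total


w1 = w2 = 0.5           # block masses used for Fig. 3
mu, nu = 0.338, 0.600   # within-block invariants quoted in the text


def V(x, y):
    """Separable landscape V = w1 KL(x || mu) + w2 KL(y || nu)."""
    return w1 * bkl(x, mu) + w2 * bkl(y, nu)


# Curvature coefficients of the quadratic approximation around (mu, nu).
c1 = w1 / (2 * mu * (1 - mu))
c2 = w2 / (2 * nu * (1 - nu))
```

Rounded to two decimals, `c1` and `c2` give 1.12 and 1.04, and for a small offset such as $h=10^{-3}$ the exact `V(mu + h, nu)` agrees with `c1 * h**2` to well under 1%.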

Note on parameter sets. Figure 2 uses a symmetric GC block ( $\gamma_{\!e}=\delta_{\!e}$ ; $\pi_{2}^{\ast}=(0.5,0.5)$ ), whereas Figure 3 uses an asymmetric GC block leading to $\pi_{2}^{\ast}=(0.60,0.40)$ . The two figures illustrate, respectively, symmetric and biased within-block odds.

An information-theoretic second law

Standing assumptions.

We consider a discrete-time Markov evolution $p_{n+1}=p_{n}T$ on a finite state space, with a block-invariant kernel $T=\mathrm{diag}(T_{1},\dots,T_{m})$ as in Sect. 1. Let $q_{n}:=p_{n}T$ denote the one-step image of $p_{n}$ (so $q_{n}=p_{n+1}$). The KLD potential is

$$V(p):=\inf_{\pi\in\Pi(p_{0})}D_{\mathrm{KL}}(p\,\|\,\pi),$$

where $\Pi(p_{0})$ is the reachable steady set (Def. (1)). By Theorem 1, $V(p_{n+1})\leq V(p_{n})$ for all $n$ .

Step variables and their roles.

Define the potential drop and the one-step mismatch by

$$\Delta V_{n}:=V(p_{n})-V(p_{n+1}),\qquad D_{n}:=D_{\mathrm{KL}}(p_{n}\,\|\,q_{n}).$$

Nonnegativity holds as follows: (i) $\Delta V_{n}\geq 0$ by the Lyapunov monotonicity of $V$ ; (ii) $D_{n}\geq 0$ because KLD is nonnegative (and $D_{n}=0$ iff $p_{n}=q_{n}$ ).

Decomposition of the step production.

Define the dimensionless step production

$$\mathcal{S}_{n}:=D_{n}+\Delta V_{n}.$$

It gathers the irreversible loss within one update ( $D_{n}$ ) and the progress toward the reachable steady set ( $\Delta V_{n}$ ).

Proposition 1 (Nonnegativity and telescoping sum).

For any block-invariant trajectory $\{p_{n}\}_{n=0}^{N}$ ,

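$$\mathcal{S}_{n}\geq 0\quad\text{for all }n,\qquad\sum_{n=0}^{N-1}\mathcal{S}_{n}=\sum_{n=0}^{N-1}D_{n}+V(p_{0})-V(p_{N}).$$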

Proof.

By (34) and Theorem 1, $\Delta V_{n}\geq 0$ ; KLD nonnegativity gives $D_{n}\geq 0$ , hence $\mathcal{S}_{n}\geq 0$ . Summing yields $\sum_{n}\Delta V_{n}=V(p_{0})-V(p_{N})$ . ∎
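The bookkeeping of Proposition 1 can be traced on a small example. The sketch below assumes the same kind of hypothetical two-block primitive kernel as before, so that $\Pi(p_{0})$ is a single point $\pi$ and $V(p)=D_{\mathrm{KL}}(p\|\pi)$; it records $D_{n}$, $\Delta V_{n}$, and $\mathcal{S}_{n}$ along a trajectory that starts with full support.

```python
import math


def step(p, T):
    """Right action p -> pT."""
    return [sum(p[x] * T[x][y] for x in range(len(p))) for y in range(len(T[0]))]


def kl(p, q):
    """KLD in nats; terms with p(x) = 0 contribute 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def invariant(Tj, n_iter=2000):
    """Invariant law of a block by power iteration (assumes the block is primitive)."""
    pi = [1.0 / len(Tj)] * len(Tj)
    for _ in range(n_iter):
        pi = step(pi, Tj)
    return pi


T1 = [[0.8, 0.2], [0.5, 0.5]]
T2 = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]]
T = [row + [0.0] * 3 for row in T1] + [[0.0] * 2 + row for row in T2]
blocks = [[0, 1], [2, 3, 4]]
p0 = [0.3, 0.1, 0.4, 0.1, 0.1]
w0 = [sum(p0[x] for x in idx) for idx in blocks]

# Reachable steady state: a single point because both blocks are primitive.
pi = [0.0] * len(p0)
for idx, Tj, wj in zip(blocks, (T1, T2), w0):
    for x, v in zip(idx, invariant(Tj)):
        pi[x] = wj * v


def V(p):
    """KLD potential; Pi(p0) = {pi} in this example."""
    return kl(p, pi)


p, D, dV = p0, [], []
for n in range(20):
    q = step(p, T)            # q_n = p_n T = p_{n+1}
    D.append(kl(p, q))        # one-step mismatch D_n
    dV.append(V(p) - V(q))    # potential drop Delta V_n
    p = q
S = [d + v for d, v in zip(D, dV)]  # step production S_n
```

Both parts are nonnegative at every step, their sum telescopes as in Proposition 1, and both shrink as the trajectory approaches the reachable steady state.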

Equation (36) states that for any horizon $N$ ,

$$\mathcal{S}^{[0:N)}:=\sum_{n=0}^{N-1}\mathcal{S}_{n}=\sum_{n=0}^{N-1}D_{n}+V(p_{0})-V(p_{N})\geq 0.$$

Thus the accumulated one-step mismatch and the state-function decrease are jointly nonnegative—an information-theoretic second-law inequality under non-ergodic (block-invariant) dynamics. The key departure from the classical ergodic setting is that $V$ references the reachable steady set rather than a single invariant distribution.

Theorem 2 (Primitivity of block kernels).

Let $T$ be a block-diagonal transition kernel. If each block is primitive (irreducible and aperiodic), then each block admits a unique invariant distribution $\pi_{j}^{\ast}$, each within-block conditional $p_{n}^{(j)}$ converges to it, and hence $p_{n}\to\sum_{j}w_{j}(p_{0})\,\pi_{j}^{\ast}$.

Theorem 3 (Second law for non-ergodic replication: minimal form).

Equivalently to Proposition 1, for all $n$ and $N\geq 1$ ,

$$\mathcal{S}_{n}\geq 0,\qquad\mathcal{S}^{[0:N)}=\sum_{n=0}^{N-1}D_{n}+V(p_{0})-V(p_{N})\geq 0.$$

In physical units, set $\sigma_{\rm rep}^{(n)}:=k_{\mathrm{B}}T\,\mathcal{S}_{n}$ and $\Sigma_{\rm rep}^{(N)}:=k_{\mathrm{B}}T\,\mathcal{S}^{[0:N)}$ .

Proof.

Immediate from Proposition 1. ∎

Remark 1 (Tightness conditions).

$\mathcal{S}_{n}=0$ iff $D_{n}=\Delta V_{n}=0$, i.e., (i) $p_{n}=q_{n}$ ($p_{n}$ is a fixed point of $T$) and (ii) $V(p_{n+1})=V(p_{n})$, which requires equality in the blockwise DPI. Since $q_{n}=p_{n+1}$, (i) already implies (ii). Otherwise, $\mathcal{S}_{n}>0$.

Thermodynamic reading.

$V$ measures the distance to the reachable steady set; its drop $k_{\mathrm{B}}T\,\Delta V_{n}$ plays the role of an informational free-energy decrease, while $k_{\mathrm{B}}T\,D_{n}$ quantifies stepwise dissipative loss of distinguishability. Hence $\mathcal{S}_{n}$ acts as a nonnegative production:

[TABLE]

The monotonicity of $V$ uses Lemma 2 and the DPI applied blockwise, $D_{\mathrm{KL}}(T_{j}r\|\pi_{j})\leq D_{\mathrm{KL}}(r\|\pi_{j})$ with $\pi_{j}T_{j}=\pi_{j}$ . If blocks leak, $\Pi(p_{0})$ ceases to align with the one-step map and $V$ may increase in a single step; leaking blocks are treated in Appendix A.

Extensions (time-dependent/continuous-time).

(1) Time-dependent kernels. If every $T_{n}$ shares the same block partition and leaves the same blockwise invariants $\pi_{j}$ fixed, then $V(p_{n+1})\leq V(p_{n})$ still holds and the theorem remains valid with $q_{n}=p_{n}T_{n}$ . (2) Continuous time (CTMC). For generator $Q$ and propagator $K(h)=e^{hQ}$ , small- $h$ expansions give $D_{\mathrm{KL}}(p\|pK(h))=\tfrac{h^{2}}{2}\,\dot{\mathcal{I}}(p;Q)+O(h^{3})\geq 0$ , while $V(p(t+h))\leq V(p(t))$ for all $h>0$ (Theorem 2 ensures positivity/primitivity). Integrating yields the continuous-time analogue:

[TABLE]
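The small- $h$ expansion can be checked numerically. The sketch assumes that $\dot{\mathcal{I}}(p;Q)$ is the second-order coefficient obtained by expanding the KLD around a full-support $p$ , namely $\sum_{x}(pQ)_{x}^{2}/p_{x}$ ; the 3-state generator and $p$ are hypothetical. The printed ratio should approach 1 as $h\to 0$ .

```python
import numpy as np
from scipy.linalg import expm

def kl(p, q):
    """KL divergence in nats (all entries assumed strictly positive)."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical 3-state generator: nonnegative off-diagonal rates, rows summing to zero.
Q = np.array([[-0.5, 0.3, 0.2],
              [0.1, -0.4, 0.3],
              [0.2, 0.2, -0.4]])
p = np.array([0.5, 0.3, 0.2])

# Second-order coefficient of KL around p (assumed to play the role of I-dot(p; Q)):
# D_KL(p || p K(h)) = (h^2 / 2) * sum_x (pQ)_x^2 / p_x + O(h^3).
coef = float(np.sum((p @ Q) ** 2 / p))

hs = [1e-1, 1e-2, 1e-3]
Ds, ratios = [], []
for h in hs:
    D = kl(p, p @ expm(h * Q))      # one propagator step K(h) = exp(hQ)
    Ds.append(D)
    ratios.append(D / (0.5 * h**2 * coef))
for h, D, r in zip(hs, Ds, ratios):
    print(f"h={h:g}  D={D:.3e}  ratio={r:.4f}")
```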

Summary.

We (i) defined $D_{n}$ and $\Delta V_{n}$ , (ii) separated their sources of nonnegativity (KLD vs. Lyapunov monotonicity), (iii) derived the telescoping-form inequality, and (iv) characterized equality, interpretation, and extensions. This mirrors Hatano–Sasa/Esposito–Van den Broeck–type formulations for Markov jump processes while crucially referencing a set of reachable steady states rather than a single invariant measure.

3 Discussion

KLD potential as informational free energy.

Under block invariance, the coarse variables $\{w_{j}\}$ are conserved (Lemma 1) while the within-block conditionals relax toward block-wise invariants. The potential $V$ in (1) therefore measures the nonequilibrium information retained beyond the conserved coarse structure. Its monotone decay (14) provides an information-theoretic second law for non-ergodic replication: fine-scale distinguishability is irreversibly lost, whereas coarse composition is preserved.

Per-step bookkeeping and physical units.

The minimal second law above decomposes the step production into two nonnegative parts, $D_{n}$ and $\Delta V_{n}$ , cf. (45). In physical units, the potential drop defines an informational free-energy change,

[TABLE]

while $k_{\mathrm{B}}T\,D_{n}$ quantifies the stepwise dissipated “informational heat.” Together they account for the net degradation incurred by one replication step.

Ergodic vs. non-ergodic limits.

If the reachable steady set $\Pi(p_{0})$ collapses to a single invariant distribution (ergodic limit), $V$ reduces to the standard KL-to-steady Lyapunov function. Under non-ergodic block-invariant constraints, $V$ is the minimal KLD to the steady set $\Pi(p_{0})$ , so its decay quantifies relaxation under conserved coarse masses. This distinction is essential for replication processes where structural constraints inhibit global mixing.

Tightness and equality cases.

The equality in (14) requires blockwise tightness of the data processing inequality (Remark 1); operationally, this corresponds to being in (or effectively in) a blockwise steady state. Away from such fixed points, the decrease is strict, and $V$ serves as a sensitive progress variable for replication-driven relaxation.

Thermodynamic meaning of a non-zero asymptotic $V$ .

At long times, our KLD potential $V$ quantifies the distance to the reachable steady set that preserves the coarse variables (block masses). In a driven NESS with primitive blocks, $V(p_{n})\!\to\!0$ even though steady dissipation (housekeeping heat) can remain positive; $V$ is a state function, not a direct measure of instantaneous entropy production. A strictly positive long-time value emerges only when the trajectory is persistently kept away from the reachable set, e.g., by weak inter-block leakage, non-primitive blocks, or active enforcement of coarse masses. In that case, choosing $\pi_{j}=p^{(j)}$ yields

[TABLE]

so $k_{\mathrm{B}}T\,V(p)$ is the minimal informational free-energy cost to refresh the state back to the admissible coarse constraint. If refreshes occur with frequency $f$ , the corresponding maintenance power satisfies $\mathcal{P}_{\min}\geq f\,k_{\mathrm{B}}T\,V_{\!*}$ , where $V_{\!*}$ is the stationary value under the leak/drive. Thus, a non-zero asymptotic $V$ can be read as a lower bound on the energetic cost of maintaining a coarse-grained steady state while microscopic erasure/copying continuously occurs [15, 16].
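For orientation, the bound $\mathcal{P}_{\min}\geq f\,k_{\mathrm{B}}T\,V_{\!*}$ is simple arithmetic once $V_{\!*}$ is expressed in nats. The refresh rate, temperature, and $V_{\!*}$ below are hypothetical illustration values, not results from the paper.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K (exact SI value)

def bits_to_nats(v_bits):
    """Convert an information value from bits to nats."""
    return v_bits * math.log(2)

def min_maintenance_power(f_hz, temperature_k, v_star_nats):
    """Lower bound P_min >= f * k_B * T * V_* in watts; V_* must be in nats."""
    return f_hz * K_B * temperature_k * v_star_nats

# Hypothetical values: 1 kHz refresh rate, T = 310 K, V_* = 0.05 nats.
P = min_maintenance_power(1e3, 310.0, 0.05)
print(f"P_min >= {P:.3e} W")   # P_min >= 2.140e-19 W
```

A $V_{\!*}$ reported in bits (as in the figures) should first be passed through `bits_to_nats`.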

4 Conclusion and Outlook

We formulate replication as a discrete Markov map under block invariance and prove that the minimal KLD to the reachable steady set,

[TABLE]

is a Lyapunov function (Theorem 1). This yields an information-theoretic second law for non-ergodic replication, complementary to Landauer-type results for erasure. Instantiations in Gaussian image copying and DNA block-diagonal substitution with proofreading exhibit the predicted behavior: monotone $V$ , increasing $H$ and cross-entropy, and decreasing $D_{\mathrm{KL}}(p_{n}\|q_{n})$ . Future directions include (i) force-resolved single-molecule tests of potential drops and no-free-copying bounds, (ii) extensions to multiscale block hierarchies and heterogeneous kinetics, and (iii) quantum/continuous-state analogues where contractive metrics (e.g. quantum $f$ -divergences) may provide replication second laws under conserved coarse observables.

Appendix A: Derivation and leakage threshold

Closed-form expression for $V_{\delta}$ (corrected).

Write $\pi=\sum_{j=1}^{m}w_{j}\,\pi_{j}$ with $w\in[0,1]^{m}$ , $\sum_{j=1}^{m}w_{j}=1$ and $\pi_{j}\in\Delta(\mathcal{X}_{j})$ . By the block decomposition of KL,

[TABLE]

For fixed $w$ , the inner minimization is attained at $\pi_{j}=p^{(j)}$ . Thus

[TABLE]

with $a_{j}:=w_{j}(p_{0})-\delta$ , $b_{j}:=w_{j}(p_{0})+\delta$ . This is a convex program. The Lagrangian with multipliers $\tau\in\mathbb{R}$ (simplex) and $\mu_{j}^{\pm}\geq 0$ (box) is

[TABLE]

KKT conditions (necessary and sufficient here) yield, for an optimum $(w^{\star},\tau^{\star},\mu^{\pm\star})$ ,

[TABLE]

Hence interior coordinates ( $a_{j}<w_{j}^{\star}<b_{j}$ ) satisfy $w_{j}^{\star}=w_{j}(p)/\tau^{\star}$ ; active coordinates stick to the nearest bound. Equivalently, the solution is the scaled-and-clipped vector that also satisfies the simplex constraint:

[TABLE]

where $\tau^{\star}>0$ is uniquely determined by the normalization. Substituting back gives

[TABLE]

The associated minimizer is $\pi_{j}^{\star}=p^{(j)}$ on each block, i.e. $\pi^{\star}\!\restriction_{\mathcal{X}_{j}}=w_{j}^{\star}\,p^{(j)}$ .
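The scaled-and-clipped solution can be computed by a one-dimensional search on $\tau$ , since the clipped mass $\sum_{j}\mathrm{clip}(w_{j}(p)/\tau,a_{j},b_{j})$ is nonincreasing in $\tau$ . The sketch below uses bisection on $\log\tau$ . Intersecting the box with $[0,1]$ and requiring $w_{j}(p)>0$ are assumptions made for the illustration, and `coarse_projection` and `V_delta` are hypothetical helper names. Only the coarse term is returned, since the within-block terms vanish at $\pi_{j}=p^{(j)}$ .

```python
import numpy as np

def coarse_projection(w_p, w_p0, delta, iters=200):
    """Scaled-and-clipped minimizer w* of sum_j w_j(p) log(w_j(p) / w_j)
    over the simplex intersected with the box [a_j, b_j]."""
    w_p = np.asarray(w_p, dtype=float)
    w_p0 = np.asarray(w_p0, dtype=float)
    if np.any(w_p <= 0):
        raise ValueError("sketch assumes strictly positive coarse masses w_j(p)")
    a = np.clip(w_p0 - delta, 0.0, 1.0)   # a_j = w_j(p0) - delta, kept inside [0, 1]
    b = np.clip(w_p0 + delta, 0.0, 1.0)   # b_j = w_j(p0) + delta, kept inside [0, 1]

    def mass(tau):
        # Total mass of the clipped vector; nonincreasing in tau.
        return np.clip(w_p / tau, a, b).sum()

    lo, hi = 1e-12, 1e12                  # mass(lo) = sum(b) >= 1 >= mass(hi)
    for _ in range(iters):                # bisection on log(tau)
        mid = np.sqrt(lo * hi)
        if mass(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    tau = np.sqrt(lo * hi)
    return np.clip(w_p / tau, a, b), tau

def V_delta(w_p, w_p0, delta):
    """Leakage-tolerant potential: coarse KL from w(p) to the projected w*."""
    w_p = np.asarray(w_p, dtype=float)
    w_star, _ = coarse_projection(w_p, w_p0, delta)
    return float(np.sum(w_p * np.log(w_p / w_star))), w_star

w_star, tau = coarse_projection([0.7, 0.2, 0.1], [0.5, 0.3, 0.2], 0.05)
```

In this example the first coordinate sticks to its upper bound $0.55$ , the third to its lower bound $0.15$ , and the second stays interior at $0.2/\tau^{\star}=0.3$ . With $\delta$ large the box is inactive and $w^{\star}=w(p)$ , so $V_{\delta}=0$ ; with $\delta=0$ the sketch returns $w^{\star}=w(p_{0})$ .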

Continuity.

As $\delta\to 0$ , the box $[a_{j},b_{j}]$ collapses to $w_{j}(p_{0})$ , so $w^{\star}\to w(p_{0})$ and $V_{\delta}(p)\to V(p)$ .

Sufficient threshold for strict decrease under leakage.

Let $w_{n}=(w_{1}(p_{n}),\dots,w_{m}(p_{n}))$ and $\Delta w:=w_{n+1}-w_{n}$ . Define

[TABLE]

Using the KL block decomposition between steps,

[TABLE]

the second line is $-\kappa_{n}$ . By convexity of $x\mapsto x\log x$ and $w_{j}\geq w_{\min}$ , the first line is bounded by $\tfrac{1}{2w_{\min}}\lVert\Delta w\rVert_{1}^{2}$ . Therefore

[TABLE]

and a sufficient condition for a strict one-step decrease is

[TABLE]

Appendix B: Simulation protocol for Fig. 1 (Gaussian-copy model)

Software and environment. All results in Fig. 1 were generated in Python (NumPy/SciPy/Matplotlib). Gaussian smoothing uses scipy.ndimage.gaussian_filter with boundary condition mode="reflect".

Domain and initialization. We use three types of pixel grid. The image domain is partitioned into a regular $B_{x}\times B_{y}$ tiling of rectangular blocks (Fig. 1 uses $B_{x}=B_{y}=4$ ; we also verified $2\times 2$ , $4\times 4$ and $128\times 128$ ). Inside each block, a binary pattern is drawn i.i.d. from a Bernoulli law with a checkerboard success probability: even $(i{+}j)$ blocks use $0.8$ , odd $(i{+}j)$ blocks use $0.2$ . The random seed is fixed to $7$ . The resulting binary image $I_{0}$ is normalized to a probability mass function (pmf) $p_{0}=I_{0}/\!\sum_{x,y}I_{0}(x,y)$ .

Replication step ( $n\!\to\!n{+}1$ ). Let $p_{n}$ denote the current pmf on pixels. We set $\sigma=1.5$ pixels and apply Gaussian smoothing, followed by renormalization so that $\sum_{x,y}q_{n}(x,y)=1$ . Two cases are considered:

•

Ergodic (global): Gaussian smoothing is applied to the entire image to produce $q_{n}$ .

•

Non-ergodic (blockwise): The image is split into blocks; the same Gaussian is applied independently within each block (no cross-block smoothing). The blocks are then stitched back together to form $q_{n}$ . This preserves the total mass inside each block at every step.

In both cases, we update $p_{n+1}=q_{n}$ . We run $n_{\text{steps}}=50$ and store snapshots at $n\in\{0,10,20,30,40,50\}$ .
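A compact sketch of this protocol follows. The grid size (a $4\times 4$ tiling of $16\times 16$ -pixel blocks) is a hypothetical choice, and seeding via `numpy.random.default_rng(7)` is an assumption, so the exact pixels need not match Fig. 1. Blockwise smoothing calls `scipy.ndimage.gaussian_filter` on each tile separately with `mode="reflect"`.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

N_BLOCKS, BLOCK_PX = 4, 16   # hypothetical: 4 x 4 tiling of 16 x 16-pixel blocks
SIGMA, N_STEPS = 1.5, 50

def make_initial_pmf(seed=7):
    """Checkerboard of Bernoulli blocks (0.8 on even i+j, 0.2 on odd), as a pmf."""
    rng = np.random.default_rng(seed)
    size = N_BLOCKS * BLOCK_PX
    I0 = np.zeros((size, size))
    for i in range(N_BLOCKS):
        for j in range(N_BLOCKS):
            prob = 0.8 if (i + j) % 2 == 0 else 0.2
            rows = slice(i * BLOCK_PX, (i + 1) * BLOCK_PX)
            cols = slice(j * BLOCK_PX, (j + 1) * BLOCK_PX)
            I0[rows, cols] = rng.random((BLOCK_PX, BLOCK_PX)) < prob
    return I0 / I0.sum()

def replicate(p, blockwise):
    """One copy step: global (ergodic) or per-block (non-ergodic) Gaussian smoothing."""
    if not blockwise:
        q = gaussian_filter(p, sigma=SIGMA, mode="reflect")
    else:
        q = np.empty_like(p)
        for r in range(0, p.shape[0], BLOCK_PX):
            for c in range(0, p.shape[1], BLOCK_PX):
                tile = p[r:r + BLOCK_PX, c:c + BLOCK_PX]
                q[r:r + BLOCK_PX, c:c + BLOCK_PX] = gaussian_filter(
                    tile, sigma=SIGMA, mode="reflect")
    return q / q.sum()   # renormalize so that sum_{x,y} q_n(x, y) = 1

def block_masses(p):
    """Total probability mass in each block, as an N_BLOCKS x N_BLOCKS array."""
    return p.reshape(N_BLOCKS, BLOCK_PX, N_BLOCKS, BLOCK_PX).sum(axis=(1, 3))

p0 = make_initial_pmf()
snapshots = {}
for blockwise in (False, True):
    p = p0.copy()
    snaps = {0: p}
    for n in range(1, N_STEPS + 1):
        p = replicate(p, blockwise)      # p_{n+1} = q_n
        if n % 10 == 0:
            snaps[n] = p
    snapshots["blockwise" if blockwise else "global"] = snaps
```

Because the reflected Gaussian kernel is symmetric and normalized, smoothing a tile on its own leaves that tile's mass unchanged (as long as the kernel radius is smaller than the tile), which is why the block masses stay fixed in the non-ergodic case.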

Information measures (base 2, in bits). At each step we compute

[TABLE]

where terms with $p_{n}=0$ or $q_{n}=0$ are omitted in the sum to avoid $\log 0$ . (To convert bits to nats multiply by $\ln 2$ .)
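These measures can be computed as below. The elided display is assumed to define $H(p_{n})$ , the cross-entropy $H(p_{n},q_{n})$ , and $D_{\mathrm{KL}}(p_{n}\|q_{n})$ , the quantities plotted in Fig. 1; `info_measures_bits` is a hypothetical helper name.

```python
import numpy as np

def info_measures_bits(p, q):
    """Return H(p), H(p, q) and D_KL(p || q) in bits.

    Terms with p == 0 or q == 0 are skipped, as in the protocol above."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    sp = p > 0                      # support of p
    m = sp & (q > 0)                # terms kept in the cross-entropy and KLD
    H = -np.sum(p[sp] * np.log2(p[sp]))
    H_cross = -np.sum(p[m] * np.log2(q[m]))
    D = np.sum(p[m] * np.log2(p[m] / q[m]))
    return float(H), float(H_cross), float(D)

H, Hx, D = info_measures_bits([0.5, 0.5], [0.25, 0.75])
D_nats = D * np.log(2)              # bits -> nats
```

Whenever $q_{n}>0$ on the support of $p_{n}$ , the three values satisfy $H(p_{n},q_{n})=H(p_{n})+D_{\mathrm{KL}}(p_{n}\|q_{n})$ , which is a convenient consistency check on recorded time series.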

KLD potential $V(p_{n})$ . We use the Lyapunov-type potential consistent with the main text:

•

Ergodic (global) case: $V(p_{n})=D_{\mathrm{KL}}(p_{n}\|\pi^{\ast})$ , where $\pi^{\ast}$ is the unique invariant histogram under global convolution dynamics (approximated numerically by iterating $T_{\sigma}$ to stationarity).

•

Non-ergodic (blockwise) case: Let $b$ index blocks, $w_{b}=\sum_{(x,y)\in b}p_{n}(x,y)$ , $p_{b}$ the normalized histogram within the block, and $u_{b}$ the uniform histogram over block $b$ . Then

[TABLE]

which coincides with the distance to the reachable steady set with fixed block masses.
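The block formula is elided in this copy. Assuming it reads $V(p_{n})=\sum_{b}w_{b}\,D_{\mathrm{KL}}(p_{b}\|u_{b})$ , which follows from the definitions in the bullet above, each term simplifies to $w_{b}\,(\log_{2}|b|-H(p_{b}))$ . A minimal sketch for a square image tiled by square blocks (`V_blockwise_bits` is a hypothetical name):

```python
import numpy as np

def V_blockwise_bits(p, block_px):
    """Assumed form V(p) = sum_b w_b * D_KL(p_b || u_b), in bits,
    where u_b is uniform on block b (square image, square blocks)."""
    n = p.shape[0] // block_px
    tiles = (p.reshape(n, block_px, n, block_px)
              .transpose(0, 2, 1, 3)
              .reshape(n * n, block_px * block_px))   # one row per block
    V = 0.0
    for tile in tiles:
        w_b = tile.sum()
        if w_b == 0:
            continue                                   # empty block contributes nothing
        p_b = tile[tile > 0] / w_b                     # normalized in-block histogram
        # D_KL(p_b || u_b) = log2|b| - H(p_b)
        V += w_b * (np.log2(tile.size) + np.sum(p_b * np.log2(p_b)))
    return float(V)
```

A pmf that is uniform inside every block gives $V=0$ regardless of the block masses, reflecting that $V$ ignores the conserved coarse structure.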

Visualization and units. Time series plots show $H$ , $H(p_{n},q_{n})$ , $D_{\mathrm{KL}}(p_{n}\|q_{n})$ , and $V(p_{n})$ versus the replication step $n$ . Snapshots at the specified steps use a common grayscale colorbar. All values are reported in bits.

Sanity condition on $\sigma$ . If the Gaussian width exceeds approximately one-quarter of the smallest block edge, boundary effects may dominate. In Fig. 1 (block size $128{\times}128$ , $\sigma=1.5$ ) this issue does not arise. The same qualitative trends were observed for $2{\times}2$ and $4{\times}4$ partitions.

Appendix C: Simulation details for Fig. 2 (time series)

Model reference & parameters (Fig. 2).

We use the two-block Markov model of App. C.2 (block-diagonal kernel; one-step mixture $\widetilde{T}=(1-\rho)T+\rho\mathcal{R}$ ; effective rates as in Eq. (43) and blockwise invariants $\pi_{j}^{\ast}$ therein). For Fig. 2 we set (nats)

[TABLE]

giving

[TABLE]

and invariants

[TABLE]

Initial condition and conserved masses.

We initialize $p_{0}=(p_{A},p_{T},p_{C},p_{G})=(0.6,0.1,0.2,0.1)$ and normalize. Block masses

[TABLE]

are conserved by block invariance; we store $w_{1}(p_{0})$ and $w_{2}(p_{0})$ for the potential below.

Recorded quantities (natural logs; nats).

At step $n$ we set $q_{n}:=p_{n}\,\widetilde{T}$ (for $n<N$ ) and record

[TABLE]

The KLD potential (Def. (1)) reduces to a sum over blocks:

[TABLE]

with $p_{n}^{(1)}=(p_{A},p_{T})/w_{1}(p_{n})$ and $p_{n}^{(2)}=(p_{C},p_{G})/w_{2}(p_{n})$ .

Iteration.

We iterate

[TABLE]

with $N=50$ . At each $n$ we first record $V(p_{n})$ ; then, if $n<N$ , we compute and record $H(q_{n})$ , $H_{\times}(n)$ , and $D_{\mathrm{KL}}(p_{n}\|q_{n})$ , and finally set $p_{n+1}=q_{n}$ .
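The iteration can be sketched as follows, using the effective rates of App. D.2 and the initial condition above. Reading $H_{\times}(n)$ as the cross-entropy $-\sum_{x}p_{n}(x)\ln q_{n}(x)$ is an assumption, since the recorded-quantities display is elided here.

```python
import numpy as np

rho = 0.30
# Effective rates of the mixed kernel (1 - rho) T + rho R (values from App. D.2).
ae = (1 - rho) * 0.020 + rho * 0.005    # A -> T, about 0.0155
be = (1 - rho) * 0.010 + rho * 0.003    # T -> A, about 0.0079
ge = (1 - rho) * 0.014 + rho * 0.004    # C -> G, about 0.0110
de = (1 - rho) * 0.021 + rho * 0.006    # G -> C, about 0.0165

def two_state(a, b):
    """Row-stochastic 2x2 block [[1-a, a], [b, 1-b]]."""
    return np.array([[1 - a, a], [b, 1 - b]])

T_eff = np.zeros((4, 4))                # states ordered (A, T, C, G)
T_eff[:2, :2] = two_state(ae, be)       # AT block
T_eff[2:, 2:] = two_state(ge, de)       # GC block; no cross-block entries
pi1 = np.array([be, ae]) / (ae + be)    # invariant of the AT block
pi2 = np.array([de, ge]) / (ge + de)    # invariant of the GC block

def kl(p, q):
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def entropy(p):
    m = p > 0
    return float(-np.sum(p[m] * np.log(p[m])))

p = np.array([0.6, 0.1, 0.2, 0.1])
p = p / p.sum()
w1, w2 = p[:2].sum(), p[2:].sum()       # conserved masses w_1(p_0), w_2(p_0)

N = 50
record = {"V": [], "H_q": [], "H_cross": [], "D": []}
for n in range(N + 1):
    # V(p_n) = w_1(p_0) KL(p_n^(1) || pi_1*) + w_2(p_0) KL(p_n^(2) || pi_2*)
    record["V"].append(w1 * kl(p[:2] / p[:2].sum(), pi1)
                       + w2 * kl(p[2:] / p[2:].sum(), pi2))
    if n < N:
        q = p @ T_eff                   # q_n = p_n T~
        record["H_q"].append(entropy(q))
        record["H_cross"].append(float(-np.sum(p * np.log(q))))
        record["D"].append(kl(p, q))
        p = q                           # p_{n+1} = q_n
```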

Appendix D: Simulation details for Fig. 3 (potential landscape)

D.1 Model and objective.

We consider two invariant blocks $\mathcal{X}_{1}=\{\mathrm{A},\mathrm{T}\}$ and $\mathcal{X}_{2}=\{\mathrm{C},\mathrm{G}\}$ . On the grid $(x,y)\in[0,1]^{2}$ , we parameterize the block-wise distribution as

[TABLE]

where $x$ is the A fraction in the AT block and $y$ is the C fraction in the GC block. The KLD potential (Def. (1)) reduces to the separable sum

[TABLE]

in nats. We plot $V$ over $[0,1]^{2}$ and mark the minimum $(x^{\ast},y^{\ast})=\bigl(\pi_{1}^{\ast}(\mathrm{A}),\,\pi_{2}^{\ast}(\mathrm{C})\bigr)$ .

D.2 Effective rates and steady states.

One step is a convex mixture of an extension channel $T$ and a proofreading channel $\mathcal{R}$ , with mixing weight $\rho=0.30$ on $\mathcal{R}$ . Each $2\times 2$ block is

[TABLE]

so the effective rates are

[TABLE]

AT block (Fig. 2): $\alpha=0.020,\ \beta=0.010,\ \alpha^{\prime}=0.005,\ \beta^{\prime}=0.003$ $\Rightarrow\ \alpha_{\!e}=0.0155,\ \beta_{\!e}=0.0079$ . GC block: $\gamma=0.014,\ \delta=0.021,\ \gamma^{\prime}=0.004,\ \delta^{\prime}=0.006$ $\Rightarrow\ \gamma_{\!e}=0.0110,\ \delta_{\!e}=0.0165$ . The blockwise invariants (for $[[1-a,a],[b,1-b]]$ ) are

[TABLE]

We take $(w_{1},w_{2})=(0.5,0.5)$ unless otherwise noted. The blue marker in Fig. 3 is placed at $(x^{\ast},y^{\ast})=(\pi_{1}^{\ast}(\mathrm{A}),\pi_{2}^{\ast}(\mathrm{C}))\approx(0.338,0.600)$ .

D.3 Grid and evaluation.

We use a $101\times 101$ grid on $[0,1]^{2}$ . For Bernoulli pairs we evaluate

[TABLE]

with safe clipping to avoid $\log 0$ . The color scale shows $V$ in nats (“viridis” colormap). Axes: $x$ = A fraction (AT), $y$ = C fraction (GC).

D.4 Minimal Python excerpt (reproducibility).

```python
import numpy as np

# mixture rate
rho = 0.30

# AT block (same as Fig. 2)
alpha, beta = 0.020, 0.010
alpha_p, beta_p = 0.005, 0.003
ae = (1 - rho) * alpha + rho * alpha_p  # 0.0155
be = (1 - rho) * beta + rho * beta_p    # 0.0079

# GC block (asymmetric, to place y* ≈ 0.60)
gamma, delta = 0.014, 0.021
gamma_p, delta_p = 0.004, 0.006
ge = (1 - rho) * gamma + rho * gamma_p  # 0.0110
de = (1 - rho) * delta + rho * delta_p  # 0.0165

# block weights
w1, w2 = 0.5, 0.5

# invariants for [[1-a, a], [b, 1-b]]
pi1_A = be / (ae + be); pi1_T = ae / (ae + be)  # ≈ (0.338, 0.662)
pi2_C = de / (ge + de); pi2_G = ge / (ge + de)  # (0.600, 0.400)

def kl_bernoulli(p, q, eps=1e-12):
    """Bernoulli KL divergence D(p || q) in nats, clipped to avoid log 0."""
    p = np.clip(p, eps, 1.0 - eps)
    q = np.clip(q, eps, 1.0 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

# grid and potential values (nats)
N = 101
xs = np.linspace(0.0, 1.0, N)  # A-fraction in AT
ys = np.linspace(0.0, 1.0, N)  # C-fraction in GC
X, Y = np.meshgrid(xs, ys, indexing="xy")
V = w1 * kl_bernoulli(X, pi1_A) + w2 * kl_bernoulli(Y, pi2_C)

# minimum location for the blue marker
x_star, y_star = pi1_A, pi2_C
```

This excerpt computes the potential field $V(p)$ on the grid and the minimum $(x^{\ast},y^{\ast})$ . The plotted Fig. 3 uses a filled contour of $V$ , the “viridis” colormap, and annotates $(x^{\ast},y^{\ast})$ with a blue marker.

Appendix E: Block primitivity in the DNA example and convergence rate

Necessary and sufficient conditions for two-state blocks.

For $T=\begin{pmatrix}1-a&a\\ b&1-b\end{pmatrix}$ , primitive $\iff a>0,\ b>0$ and $(a,b)\neq(1,1)$ . Hence $a,b\in(0,1)$ suffices.

Effect of proofreading/repair mixing.

If $\mathcal{R}$ has strictly positive entries and $\rho>0$ , then all effective entries of $\widetilde{T}=(1-\rho)T+\rho\mathcal{R}$ are strictly positive, implying primitivity even when $T$ is period-2 ( $a=b=1$ ).

Relation to Eq. (33) and detailed balance.

Eq. (33) encodes local detailed balance on the enlarged network; the coarse two-state block remains reversible with invariant $\pi^{\ast}$ , so Eq. (35) sets steady odds and speeds but is not the primitivity criterion.

Convergence rate (sketch).

The second eigenvalue of a two-state block is $\lambda_{2}=1-(a+b)$ ; thus total variation decays geometrically with factor $|\lambda_{2}|$ .
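Both statements are easy to check numerically. The sketch tests primitivity by looking for an entrywise-positive power (for a $d$ -state matrix, powers up to $(d-1)^{2}+1$ suffice, by Wielandt's bound) and compares the per-step decay of the total-variation distance with $|\lambda_{2}|$ , using the AT effective rates of App. D.2. The uniform repair matrix `R` is a hypothetical choice.

```python
import numpy as np

def two_state(a, b):
    """Row-stochastic 2x2 block [[1-a, a], [b, 1-b]]."""
    return np.array([[1 - a, a], [b, 1 - b]])

def is_primitive(K):
    """True if some power K^k (k <= (d-1)^2 + 1, Wielandt) is entrywise positive."""
    d = K.shape[0]
    M = np.eye(d)
    for _ in range((d - 1) ** 2 + 1):
        M = M @ K
        if np.all(M > 0):
            return True
    return False

# Period-2 extension kernel (a = b = 1) mixed with a hypothetical positive repair channel.
R = np.full((2, 2), 0.5)
mixed = 0.7 * two_state(1.0, 1.0) + 0.3 * R
print(is_primitive(two_state(1.0, 1.0)), is_primitive(mixed))   # False True

# Geometric decay with factor |lambda_2| = |1 - (a + b)|, AT effective rates from App. D.2.
a, b = 0.0155, 0.0079
K = two_state(a, b)
lam2 = 1 - (a + b)
eig = np.sort(np.linalg.eigvals(K).real)         # [lambda_2, 1]
pi = np.array([b, a]) / (a + b)
p = np.array([1.0, 0.0])                          # start in state A
tv = []
for n in range(20):
    tv.append(0.5 * np.abs(p - pi).sum())         # total variation to pi
    p = p @ K
ratios = np.array(tv[1:]) / np.array(tv[:-1])     # each ratio equals |lambda_2|
```

For a two-state block the decay is exactly geometric, because $p_{n}-\pi^{\ast}=\lambda_{2}^{\,n}(p_{0}-\pi^{\ast})$ ; with these rates $|\lambda_{2}|=0.9766$ , so relaxation is slow.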

Appendix F: A minimal non-block-invariant counterexample and a leakage threshold

Symbols and definitions.

State space and blocks:

[TABLE]

Initial distribution $p_{0}(a_{1},a_{2},b)=(0.6,0.3,0.1)$ . Block (coarse) masses:

[TABLE]

Reachable steady set (coarse masses fixed).

[TABLE]

One-step update with a non-block-invariant kernel.

[TABLE]

Cross-block entries are $T_{a_{1}\to b}=0.1$ and $T_{a_{2}\to b}=0.2$ . Define the net leak

[TABLE]

(Numerical values can be read off directly from $p_{1}$ .)

Block KL decomposition and consequence.

For any $\pi=\sum_{j}w_{j}(p_{0})\,\pi_{j}\in\Pi(p_{0})$ ,

[TABLE]

Since $\pi_{1}$ is free, the minimizer sets $\pi_{1}=p^{(1)}$ ; for the singleton block, $p^{(2)}=\pi_{2}=\delta_{b}$ . Therefore

[TABLE]

(Thus $V$ measures the coarse-mass mismatch when leakage is present.)

A sufficient leakage threshold for one-step decrease.

Let $w_{n}=(w_{1}(p_{n}),\dots,w_{m}(p_{n}))$ and $\Delta w:=w_{n+1}-w_{n}$ . Define

[TABLE]

A convenient sufficient condition for a strict one-step decrease is

[TABLE]

Proof sketch / origin of the constants.

By the block KL decomposition,

[TABLE]

The quadratic term is upper bounded by $\tfrac{1}{2w_{\min}}\lVert\Delta w\rVert_{1}^{2}$ via convexity of $x\mapsto x\log x$ and $w_{j}\geq w_{\min}$ . The linear term is at most $-\,\kappa_{n}$ by definition of $\kappa_{n}$ (within-block contraction toward $\pi_{j}^{\ast}$ ). Combining the two gives

[TABLE]

which yields the threshold (44).

Appendix G: Non-primitive block with a changing minimizer

Notation. Here $K$ denotes a within-block abstract Markov kernel (not the image-induced $T_{\sigma}$ of Sec. 2.1).

Consider a single block $\mathcal{X}_{j}=\{u,v,t\}$ with a reducible kernel

[TABLE]

States $u$ and $v$ are absorbing; $t$ is transient and jumps to $u$ or $v$ in one step with probabilities $\varepsilon$ and $1-\varepsilon$ , respectively. Hence the invariant set is

[TABLE]

Let $p^{(j)}_{n}$ denote the within-block law started from $p^{(j)}_{0}=(0,0,1)$ . Then

[TABLE]

For any $\lambda\in[0,1]$ ,

[TABLE]

which is minimized at $\lambda=\varepsilon$ .
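A grid search confirms the minimizer for $n\geq 1$ . The branching probability $\varepsilon=0.3$ is a hypothetical value, and the KLD is evaluated on the support of $p_{1}^{(j)}=(\varepsilon,1-\varepsilon,0)$ , where every $\lambda\delta_{u}+(1-\lambda)\delta_{v}$ is positive for $\lambda\in(0,1)$ .

```python
import numpy as np

eps = 0.3                       # hypothetical branching probability t -> u
# Reducible kernel on (u, v, t): u, v absorbing; t jumps to u w.p. eps, to v w.p. 1 - eps.
K = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [eps, 1 - eps, 0.0]])

p0 = np.array([0.0, 0.0, 1.0])
p1 = p0 @ K                     # (eps, 1 - eps, 0); p_n = p_1 for all n >= 1

def kl(p, q):
    """KL divergence in nats over the support of p."""
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

lams = np.linspace(0.001, 0.999, 999)
vals = [kl(p1, np.array([lam, 1 - lam, 0.0])) for lam in lams]
lam_star = lams[int(np.argmin(vals))]   # grid minimizer, close to eps
```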

Nonincrease of $V$ and minimizer switching.

If the global partition weight on this block is $w_{j}(p_{n})$ and other blocks are fixed, the contribution of block $j$ to $V$ is $w_{j}(p_{0})\,D_{\mathrm{KL}}\!\bigl(p_{n}^{(j)}\|\pi_{j}\bigr)$ for some $\pi_{j}\in\mathcal{I}_{j}$ . From (47)–(48), for $n\geq 1$ the minimizer is $\pi_{j}^{\star}=\varepsilon\,\delta_{u}+(1-\varepsilon)\,\delta_{v}$ and

[TABLE]

If we perturb the initial condition to $p^{(j)}_{0}=(0,0,1-\eta)+(\eta,0,0)$ with a small $\eta>0$ , then at time $n=0$ the minimizer is $\lambda^{\star}_{0}>\varepsilon$ , whereas for $n\geq 1$ it becomes $\lambda^{\star}_{1}=\varepsilon$ . Thus the blockwise minimizer changes between steps, yet $V$ remains nonincreasing by Theorem 1. This shows that minimizer switching is generic whenever $\mathcal{I}_{j}$ is not a singleton (reducible blocks).

Bibliography

[1] R. Hershberg and D. A. Petrov. Evidence That Mutation Is Universally Biased Towards AT in Bacteria. PLoS Genet., 6(9):e1001115, 2010. doi: 10.1371/journal.pgen.1001115.
[2] K. J. Fryxell and W. J. Moon. CpG Mutation Rates in the Human Genome Are Highly Dependent on Local GC Content. Mol. Biol. Evol., 22(3):650–658, 2005. doi: 10.1093/molbev/msi043.
[3] D. N. Cooper, M. Mort, P. D. Stenson, E. V. Ball, and N. A. Chuzhanova. Methylation-mediated deamination of 5-methylcytosine appears to give rise to mutations causing human inherited disease in CpNpG trinucleotides, as well as in CpG dinucleotides. Hum. Genomics, 4(6):406, 2010. doi: 10.1186/1479-7364-4-6-406.
[4] V. Aggarwala and B. F. Voight. An expanded sequence context model broadly explains variability in polymorphism levels across the human genome. Nat. Genet., 48(4):349–355, 2016. doi: 10.1038/ng.3511.
[5] L. Duret and N. Galtier. Biased gene conversion and the evolution of mammalian genomic landscapes. Annu. Rev. Genom. Hum. Genet., 10:285–311, 2009. doi: 10.1146/annurev-genom-082908-150001.
[6] J. A. Capra, M. J. Hubisz, D. Kostka, K. S. Pollard, and A. Siepel. A Model-Based Analysis of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes. PLoS Genet., 9(8):e1003684, 2013. doi: 10.1371/journal.pgen.1003684.
[7] C. C. Weber, B. Boussau, J. Romiguier, E. D. Jarvis, and H. Ellegren. Evidence for GC-biased gene conversion as a driver of between-lineage differences in avian base composition. Genome Biol., 15(12):549, 2014. doi: 10.1186/s13059-014-0549-1.
[8] G. J. L. Wuite, S. B. Smith, M. Young, D. Keller, and C. Bustamante. Single-Molecule Studies of the Effect of Template Tension on T7 DNA Polymerase Activity. Nature, 404(6773):103–106, 2000. doi: 10.1038/35003614.
