The thermodynamics of quasi-deterministic digital computers
Dominique Chu

TL;DR
This paper explores the thermodynamic costs of finite-time deterministic computation, proposing a Markovian stochastic process model that allows quasi-deterministic computation with manageable energy consumption.
Contribution
It introduces a new model based on Markov processes to analyze the thermodynamics of finite-time deterministic computation, showing energy costs are manageable.
Findings
Energy scaling with accuracy is benign in the proposed model.
Quasi-deterministic computation is achievable at modest energy costs.
The model bridges the gap between ideal zero-energy limits and practical finite-time computation.
Abstract
It is now well established that there is no lower bound for the energy dissipated during a computation. The relevance of the zero-energy limit is unclear, however, because it entails computations that are unreliable or infinitely slow, or both. In contrast, the thermodynamic costs of deterministic computations that complete in finite time are less well understood. We propose a model of universal computation based on Markovian stochastic processes. While strictly deterministic computation is not possible in such systems, we show that the scaling of the energy consumption in relation to the accuracy of the computation is benign. This enables quasi-deterministic computation that completes within finite time at a modest cost in energy.
The thermodynamics of quasi-deterministic digital computers.
Dominique Chu
School of Computing, University of Kent, CT2 7NF, Canterbury, UK
Abstract
A central result of stochastic thermodynamics is that irreversible state transitions of Markovian systems entail a cost in terms of an infinite entropy production. A corollary of this is that strictly deterministic computation is not possible. Using a thermodynamically consistent model, we show that quasi-deterministic computation can be achieved at finite, and indeed modest, cost with accuracies that are indistinguishable from deterministic behaviour for all practical purposes. Concretely, we consider the entropy production of stochastic (Markovian) systems that behave like AND and NOT gates. Combinations of these gates can implement any logical function. We require that these gates return the correct result with a probability that is very close to 1, and additionally, that they do so within finite time. The central component of the model is a machine that can read and write binary tapes. We find that the error probability of the computation of these gates falls with the power of the system size, whereas the cost only increases linearly with the system size.
information thermodynamics, entropy, universal computation
I Introduction
There is now a renewed interest in the statistical mechanics of information processing infothermreview . Research in the area of information thermodynamics focusses on individual processes such as copying eule ; perscopy , feedback processes feedback , information engines mandal ; mcgrath , but also computational processes in chemical systems government1 ; myinterfacepaper ; wlan . What has received much less attention is universal computation, that is, processes that can implement arbitrary algorithms, although there have been some efforts in modelling Turing machines (for example brandes ), and references wolpertextending ; wiese ; crutchmandal propose general limits on computation but without explicitly relating to specific models of theoretical computer science.
Interest in the physics of computation is not new. A key result in the field goes back to the 1980s, stating, somewhat surprisingly, that there is no minimal energy dissipation required during a computation leffrex ; bennetthermo . According to this, computation can be done in principle at zero energy usage. In practice this zero-energy limit is unappealing because it usually requires quasi-static processes, resulting in an infinite computation time, or it entails an ultra-sensitivity to initial conditions, as for example in the billiard ball computer billard . Complementary to this is a more recent result coming out of stochastic thermodynamics stating that irreversible state transitions in stochastic systems entail an infinite entropy production. An implication of this is that models of computation that postulate irreversible state transitions, such as Turing machines or finite state automata, are physically implausible.
Real world computing machines must inhabit a regime in between the infinite dissipation of strictly deterministic machines and the zero-energy limit. Consistent with this, in biological systems one routinely observes trade-offs between the speed, accuracy and energy usage of cellular information processing myinterfacepaper ; wlan ; myperformancelimitspaper . Yet, at the same time, deterministic computing machines with finite energy dissipation rates do exist. Their existence does not contradict stochastic thermodynamics because in reality these machines are not truly deterministic; they operate at extremely low, practically negligible error rates. This seems to be sufficient to allow finite, even small, energy dissipation rates of these machines.
In this contribution we will present a thermodynamically consistent model of deterministic computation. By this we mean a computation that (i) returns the “correct” result with a probability that is indistinguishable from 1 for all practical purposes, (ii) does so within finite time, and (iii) is universal. By the latter condition, we mean that the model can be extended so as to implement arbitrary computational functions. (iv) Finally, we also assume that the model is based on stochastic (Markovian) dynamics.
We will focus here on digital, or more specifically, binary computing. Determinism in analogue computers requires taking the thermodynamic limit, which leads to poor scaling of cost, accuracy and speed myperformancelimitspaper . More benign scaling can be achieved with digital computation, whereby the state space of the computing machine is partitioned into two equivalence classes. Rather than setting the computer into a specific state, it is only necessary to ensure that the machine is in one of the states of the equivalence class. Thermodynamically, this is much cheaper to achieve.
The core elements of the model presented here are binary tapes. Each tape encodes a single bit, corresponding to the majority of its symbols. The idea here is that the tape represents the record of several attempts to transmit a bit value, whereby each transmission was only successful with some probability . A stochastic reading machine is used to determine reliably the bit value represented by the tape. Variations of such reading machines can mimic NOT and AND gates, and can therefore be combined into arbitrary logical circuits, thus enabling universal computation. Using this model we will probe the costs of deterministic computation, both in terms of entropy production and computation time. We will find that the scaling of cost and accuracy is benign, conducive to arbitrarily accurate computation at a finite energy expense. When run in reverse, the reading machine can be used to write tapes, while drawing work from an external work reservoir.
II Results
II.1 The reading machine
The central element in the model we propose is the reading machine, which is based on a machine introduced by Barato and Seifert baratodl1 and thus ultimately on the Mandal-Jarzynski device mandal . The prima facie function of the machine is to decode a simple repetition error correction code and to set the input to the computation accurately. The machine thus performs proofreading on unreliable input. As will become clear below, the machine has two further functions: (i) it is an information processor for the computational circuits and (ii) it also mediates the extraction of free energy from a “heat reservoir” to power the computation. We will first describe how the reading machine works, then determine its accuracy, entropy production and the time its operation takes. Following that, we will show how the reading machine can be used to implement a universal set of logic gates.
The machine interacts with two binary random access tapes, and , acting as input and output respectively. By “random access tape” we mean that the symbols on the tape are not spatially organised. Each reading event results in a random tape element being accessed. The input tape is of length and contains copies of the symbol 1 and copies of the symbol 0. The second tape is of length 1, i.e. it is a single bit and will act as the output to the machine. The device also has internal states ; here is an auxiliary index, indicating the value of .
So as to function as a decoder, the machine is to output “1” if the majority of bits on the input tape are 1, and “0” if the majority of bits are symbols of type 0. We do not require that this will work reliably when the input tape has a slight bias only. However, the machine must output the correct bit with probabilities close to 1 for as long as for some fixed value . This can be achieved by a machine that works according to the following stochastic rules:
- At any one time the reading head of the machine accesses (reads) a symbol of .
- With rate the reading head accesses a new symbol of .
- When the reading head accesses a symbol 1 and the internal state is () then with rate the internal state transitions to and the reading head overwrites the current symbol with a 0.
- When the reading head accesses a symbol 0 and the internal state is () then with rate the internal state transitions to and the reading head overwrites the current symbol with a 1.
- When takes the value 0 and the internal state is then with rate the machine writes onto and transitions into internal state .
- When takes the value 1 and the internal state is then with rate the machine writes [math] onto and transitions into internal state .
In order to simplify the notation, we will define a -tape with respect to as a tape of a given length that, when used as input to , yields with a steady state probability . Here we define as the set of states where the output tape is in state 1. Analogously, a -tape is a tape that, when provided as input to , outputs 0 with probability . The parameter is a user-defined confidence indicator with . There may be many tapes that are neither -tapes nor -tapes with respect to a given .
The behaviour of the machine can be modelled as a random walk characterised by the rate of interaction with tape-elements , forward rates and backwards rates , where and . The random walk can be visualised as follows:
[TABLE]
The left-most column indicates the number of 1s on the input tape, the second column indicates the internal state and the final column illustrates the transitions. For mathematical convenience, but without limiting the generality of our argument, we can assume that the reading head is in a quasi-steady-state with the tape, i.e. for all . In this case, the machine is simplified to a 1D random walk on sites:
[TABLE]
The superscript indicates the value of . The transition rates between states and are and for the backwards and forward direction respectively, but the rates between and are in both directions. Assuming that is of the order of the other rates or faster, we can approximate the dynamics of by cutting out these two sites and connecting directly with , leading to a random walk on sites. As will become clear below, for the parameters of interest, the system spends a vanishing fraction of time on these two sites and the error made by removing them is minimal. We then end up with the final model, which is a random walk over the states:
[TABLE]
Here, the states have been relabelled so that corresponds to and analogously for other states. The index specifies the number of 1s that have been consumed from in order to reach the specified state. In the following we will predominantly be interested in the probability of a particular tape to be recognised as :
[TABLE]
The model only makes sense if .
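The reduced random walk can be explored with a small stochastic simulation. The following Python sketch is a Gillespie-style simulation under assumptions that are not taken from the paper's equations: the walk has `2 * n` sites, site `i` means that `i` symbols 1 have been consumed from a tape of length `L` that initially held `ones` symbols 1, the forward rate from site `i` is `k_plus * (ones - i) / L`, the backward rate is `k_minus * (L - ones + i) / L`, and the output bit is taken to be 1 on the upper `n` sites. All names are illustrative.

```python
import random


def simulate_reader(L, ones, n, t_max, k_plus=1.0, k_minus=1.0, seed=0):
    """Fraction of time the output bit is 1 in a simulated reading machine.

    Assumed model (not the paper's exact equations): a walk on sites
    0..2n-1, where site i means i symbols '1' have been consumed.
    """
    rng = random.Random(seed)
    site = 0                 # worst case for a tape with mostly 1s
    t = 0.0
    time_output_one = 0.0
    while t < t_max:
        # Forward step needs the head to sit on a 1, backward step on a 0.
        forward = k_plus * (ones - site) / L if site < 2 * n - 1 else 0.0
        backward = k_minus * (L - ones + site) / L if site > 0 else 0.0
        total = forward + backward
        if total == 0.0:
            dwell = t_max - t                      # no transition possible
        else:
            dwell = min(rng.expovariate(total), t_max - t)
        if site >= n:                              # assumed: output is 1 here
            time_output_one += dwell
        t += dwell
        if t >= t_max:
            break
        # Choose the direction in proportion to the two rates.
        if rng.random() * total < forward:
            site += 1
        else:
            site -= 1
    return time_output_one / t_max


if __name__ == "__main__":
    print(simulate_reader(L=200, ones=150, n=5, t_max=2000.0))
    print(simulate_reader(L=200, ones=50, n=5, t_max=2000.0))
```

With a tape that is mostly 1s the walker should spend nearly all of its time on the upper sites, and with a tape that is mostly 0s nearly none.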
II.2 Accuracy and resource usage of the reading machine
In this section we analyse the resource usage of the reading machine. We will find that the “computation” time and the entropy production scale linearly with the number of internal states , whereas the error probability scales with the power .
II.2.1 Accuracy
We could now formulate a master equation for the probability that the system is in state at time ; we are however more interested in the corresponding steady-state probability . Due to detailed balance the steady-state probabilities obey
[TABLE]
Solving this for yields
[TABLE]
This leads to an expression of in terms of statistical weights
[TABLE]
where and .
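The detailed-balance recursion can be evaluated numerically for a finite tape. The Python sketch below does so under assumed rate forms that are reconstructed from the stochastic rules rather than copied from the paper: a walk on sites `0..2n-1`, forward rate `k_plus * (ones - i) / L` out of site `i`, backward rate `k_minus * (L - ones + i + 1) / L` out of site `i + 1`, and output 1 on the upper `n` sites. The names `steady_state` and `prob_output_one` are hypothetical.

```python
def steady_state(L, ones, n, k_plus=1.0, k_minus=1.0):
    """Steady-state distribution of the assumed walk on sites 0..2n-1.

    Detailed balance gives P[i+1] / P[i] = forward(i) / backward(i+1),
    so the statistical weights follow from a running product.
    """
    weights = [1.0]
    for i in range(2 * n - 1):
        forward = k_plus * (ones - i) / L             # head must sit on a 1
        backward = k_minus * (L - ones + i + 1) / L   # head must sit on a 0
        weights.append(weights[-1] * forward / backward)
    total = sum(weights)
    return [w / total for w in weights]


def prob_output_one(L, ones, n, k_plus=1.0, k_minus=1.0):
    """Probability that the output bit is 1 (assumed: upper n sites)."""
    return sum(steady_state(L, ones, n, k_plus, k_minus)[n:])


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, prob_output_one(L=1000, ones=700, n=n))
```

Because the ratio of forward to backward rate shrinks as 1s are consumed, this finite-tape version also shows the depletion effect that limits the useful number of internal states.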
There is no useful analytical expression for this probability, but for long tapes, when , the rates of the random walk are approximately uniform and can be replaced by a fixed fraction . Remembering that we can now write the steady-state probability for as
[TABLE]
This result is exact in the limit . An important special case for this equation is and where the exact and the approximate solutions coincide also for finite and the probability to transmit the correct bit becomes . This means, that in this case the machine does not improve on the accuracy of the tape, i.e. it does not perform any proofreading. For moderately large and eq. 3 can be further approximated to obtain an estimate for the error probability
[TABLE]
The error falls with the power of . This means that with a probability that is arbitrarily close to 1 the machine can recognise tapes correctly even if is only marginally above . When then the accuracy of recognition is only limited by . For finite the accuracy increases with and approaches eq. 3 asymptotically; the accuracy also increases with , up to an optimal beyond which the internal mechanism of the machine deprives the tape of too many 1 symbols and significantly lowers their proportion, which interferes with a proper functioning of the machine.
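The long-tape limit is easy to check numerically. The following Python sketch assumes a walk on `2 * n` sites with a uniform forward-to-backward rate ratio `r = pi / (1 - pi)`, where `pi` stands for the fraction of 1s on the tape, and assumes that the output is 1 on the upper `n` sites; these choices and all names are illustrative. For such a chain the summed weights reduce to `pi**n / (pi**n + (1 - pi)**n)`, which is a property of the assumed chain and may differ in form from the paper's eq. 3.

```python
def _weights(pi, n):
    """Statistical weights r**i of the assumed uniform walk on 2n sites."""
    r = pi / (1.0 - pi)
    return [r ** i for i in range(2 * n)]


def recognition_probability(pi, n):
    """Probability that the output is 1 (assumed: upper n sites)."""
    w = _weights(pi, n)
    return sum(w[n:]) / sum(w)


def error_probability(pi, n):
    """Probability that a tape with pi > 1/2 is read as 0."""
    w = _weights(pi, n)
    return sum(w[:n]) / sum(w)


def closed_form(pi, n):
    """Closed form of recognition_probability for the assumed chain."""
    return pi ** n / (pi ** n + (1.0 - pi) ** n)


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, recognition_probability(0.6, n), error_probability(0.6, n))
```

For `n = 1` the sketch returns `pi` itself, so a single step gives no proofreading, and for larger `n` the error shrinks roughly like `((1 - pi) / pi) ** n`.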
II.2.2 Entropy production
The operation of the reading machine is accompanied by entropy production. Inserting a tape takes the machine out of equilibrium and initiates a relaxation back to equilibrium. The entropy production ceases on average once equilibrium is reached. Using the standard ansatz of stochastic thermodynamics stochtherm , the entropy export associated with a transition from state to state is . The second component of the entropy is the system or “Shannon” entropy, which works out as the difference between the logarithm of the probability of the initial and the final state, . The total entropy production is the average over all initial and final states. In the case of reading a -tape the greatest amount of entropy is produced when the initial state is , because in this case the greatest number of entropy-producing steps is necessary in order to drive the system to its equilibrium, which is a narrow distribution around . In this case the entropy production becomes:
[TABLE]
In the limit of an analytic expression for can be obtained.
[TABLE]
This shows that the entropy production is linear in .
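The linear growth can be illustrated with a short calculation. The Python sketch below assumes a uniform walk on `2 * n` sites with rate ratio `r = pi / (1 - pi)`, a start on site 0, entropy measured in units of the Boltzmann constant, an exported entropy of `ln r` per forward step, and a system entropy change of `-ln P_i` when the walk ends on site `i`. These modelling choices and the function name are assumptions for illustration, not the paper's expression.

```python
import math


def entropy_production(pi, n):
    """Average total entropy production for relaxation from site 0.

    Assumed model: uniform walk on sites 0..2n-1 with rate ratio r.
    Reaching site i exports i * ln(r); the system term is -ln(P_i)
    because the initial state has probability 1.
    """
    r = pi / (1.0 - pi)
    weights = [r ** i for i in range(2 * n)]
    total = sum(weights)
    probs = [w / total for w in weights]
    ln_r = math.log(r)
    exported = sum(p * i * ln_r for i, p in enumerate(probs))
    shannon = -sum(p * math.log(p) for p in probs if p > 0.0)
    return exported + shannon


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, entropy_production(0.8, n))
```

Under these assumptions the sum equals minus the logarithm of the steady-state probability of the starting site, so each additional unit of `n` adds close to `2 * ln r`.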
II.2.3 Computing time
The second resource consumed by the machine is the time to reach equilibrium. While the relaxation time is infinite in a strict mathematical sense, a time scale for relaxation can be identified with the mean first passage time (MFPT) to reach state from some initial state. (Footnote 1: We could equally have chosen any site with , without altering the conclusions materially.) In the worst case, this initial state is in which case the MFPT is given by garten ; mylimitedpaper :
[TABLE]
In general, this formula needs to be evaluated numerically. A compact, albeit approximate analytical expression can be obtained in the case and .
[TABLE]
For large and this equation can be approximated to , where . This shows that the computing time is linear in . Note that for finite the linearity regime is limited to . For larger the time to compute increases exponentially as grows. Again, the exponential increase is due to the deprivation of the tape of 1 symbols, as .
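A mean first passage time of this kind can be evaluated numerically with a standard recursion for one-dimensional walks with a reflecting boundary at the start site. The Python sketch below uses that recursion, which is not necessarily the form of the expression used in the paper; the function name and the rate lists are hypothetical inputs chosen for illustration.

```python
def mean_first_passage_time(forward, backward):
    """Mean time to first reach site N from site 0 on a 1D walk.

    forward[k]  : rate of the step k -> k+1, for k = 0..N-1
    backward[k] : rate of the step k -> k-1 (backward[0] is ignored,
                  since site 0 is a reflecting boundary)

    Uses t_k = 1/f_k + (b_k/f_k) * t_{k-1}, where t_k is the mean time
    to go from site k to site k+1 for the first time.
    """
    total = 0.0
    step_time = 0.0
    for f, b in zip(forward, backward):
        step_time = 1.0 / f + (b / f) * step_time
        total += step_time
    return total


if __name__ == "__main__":
    # Uniform rates: the time grows roughly like N / (f - b) for f > b.
    for n_sites in (5, 10, 20, 40):
        print(n_sites, mean_first_passage_time([0.8] * n_sites, [0.2] * n_sites))
```

With uniform rates and a forward bias the result grows linearly in the number of sites, while passing site-dependent rates that decay along the chain reproduces the steep slowdown caused by tape depletion.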
In summary, the reading machine can determine whether a given input is a -tape or a -tape. By adjusting the parameters of the machine, it is possible to make this decision with arbitrary accuracy at a finite cost and within finite time.
II.3 Logic gates
The reading machine can be used as a basic component to build AND and NOT gates, which in turn can be combined to build arbitrary computational circuits. A NOT gate is obtained from the basic reading machine by swapping the state labels of . This does not affect the properties of the machine, such as the computing time or the entropy production.
The AND gate is more involved. It requires two inputs, and respectively. We therefore require an extended reading machine that accepts two input tapes. Its output is, as in the standard reading machine, a single element output tape . Each of the inputs and can be either a -tape or a -tape, each of length . Initially is set up as the combination of two independent, non-interacting, reading machines .
The computation of an AND gate proceeds in two separate steps. (i) Set the input to the gate. First, the inputs and are set by providing each of the independent reading machines and with their respective inputs and letting them reach their equilibrium states. The internal states of the combined machine can then be written as . Tape drives state transitions of type , and drives interactions of type . The superscript of the internal state label indicates the bit value of . It changes from [math] to during the transition , for a fixed threshold and arbitrary .
(ii) Start the computation proper after a time of order has passed. The inputs and are then disconnected and the internal state reservoirs are allowed to interact by enabling the state transitions . The backwards and forward rates should be equal and independent of and .
There are choices for the parameters of the reading machines and the threshold such that the output tape behaves like a quasi-deterministic AND gate. Define as the sum of the indices of after the inputs have been set, but before the internal states are connected. This sum is distributed according to
[TABLE]
If both inputs to the gate are -tapes then before the internal states are connected the state labels of and will be the same on average with . After the computation step this will not change on average. The state will therefore be , i.e. the output of the gate is 1 quasi-deterministically as long as was sufficiently high on the original tape, and is small enough in comparison to . A similar argument applies to the case where both inputs are zero.
The accuracy of the gate is limited by the probability to get the correct output for mixed input, i.e. a -tape and a -tape as and . A correct computation must yield the output 0. Yet, the average state label of after initialisation will be close to , whereas the average state label of after initialisation will be close to [math]. The precise probability distributions for the two cases are given by eq. 2. After the computation step, both state labels will be about . In order for the output to be correct, the threshold needs to be chosen such that the label of state never fluctuates to or beyond for the mixed input. Given a set of parameters for the writing machines, there is an optimal choice for , namely the index that minimises ; see fig. 3. Here is the probability that the state of the machine is given that the input and are -tapes.
This case of mixed input also leads to an additional entropy production , which is a result of the two internal states being connected and relaxing to a joint equilibrium. Assume that after setting the input, the machine was in state and . After the computation step, the state is and the state label follows a binomial distribution , where . This reflects the fact that the internal states have “equilibrated” with one another. At the beginning of the computation step, the system is out of equilibrium with Shannon entropies that are distributed according to . Hence, the change in entropy is
[TABLE]
Here, is calculated according to eq. 1. This entropy production is a direct consequence of the logical irreversibility of the AND gate, and is related to Landauer’s limit. The NOT gate, which is logically reversible, does not have such an extra dissipative component. The entropy production is also the reason why the setting of the input and the computation must be separate processes. Otherwise, there would be an ongoing competition between computation and initialisation, with an ongoing need for energy input. Note that in the case the entropy production will be very small, i.e. .
II.4 The writing machine
A computational cycle is closed by writing the output of the computation to a tape. It is possible to run the reading machine in reverse in order to write a -tape or a -tape. The following modifications are necessary: the input tape is a tape of length 1, the output tape is of length and the machine has internal states, . The transition rules of the writing machine are as follows:
- When the internal state is and , then with rate , the internal state goes into state while writing [math] onto . The reverse transition happens with the same rate.
- With rate the machine gets in contact with a new symbol from . If the machine is in state then with probability it will be in contact with symbol , and with probability it will be in contact with symbol 0.
- With rate the machine writes a [math] onto the tape and goes into state (provided ).
- With rate the machine writes a onto the tape and goes into state (provided ).
Altogether, the machine, when in state overwrites the current symbol of with a with rate and correspondingly writes a [math] with rate . The system can be modelled as a biased random walk of the state label . In the long-term limit the average state label has a simple closed form (see SM).
[TABLE]
where . The accuracy of the writing machine is the probability to find a particular tape element of in the correct state, i.e. a in the case of a -tape or 0 in the case of a -tape. The average work required to write a tape with accuracy is:
[TABLE]
We can now also relate the error probability of a reading machine to the cost of reconstituting the tape. From eq. 9, the proportion of correct symbols written by the writing machine is ; together with eq. 4 the probability that the reading machine fails, then scales like so
[TABLE]
Finally, the MFPT to write the full tape can be evaluated along the same lines as eq. 7.
[TABLE]
where and are shorthand for the forward and backwards rates of the random walk respectively. No useful closed form expression exists for and it needs to be evaluated numerically (see SM). Note that the computation time cannot be expressed solely in terms of the ratio , but depends on the absolute scale of the rates. This reflects the fact that the system can be made arbitrarily fast at no additional cost by scaling the reaction rates.
The most cost efficient way to write a tape of a particular type (i.e. a -tape or a -tape) with a given accuracy is to start from a relaxed tape of length with, on average, symbols of type 1. To convert this tape into, say, a -tape with (almost) only symbols of 0, only about half the symbols need to be modified. The average cost of this writing procedure is with the tape initially in state (see SM). Writing a -tape is entirely analogous, but requires a special writing machine for -tapes.
Additional costs arise when the bit to be recorded to tape is unknown, which is typically the case at the end of a computational cycle. One protocol to deal with this case is as follows:
1. Prepare a writing machine that outputs -tapes with internal states by resetting its internal state to .
2. Initialise the input tape of the writing machine with , i.e. the output tape from the preceding computation.
3. Initialise with a -tape.
4. Wait for a time of the order and then remove .
If then the machine would not have modified the output tape , no extra cost arises here. Otherwise, the -tape output would have been overwritten to be a -tape at a cost proportional to rather than , i.e. twice the work to write a -tape directly from an initially relaxed tape. In both cases the cost to write the original -tape accrues and is .
An additional cost comes from the reset during step 1. If the machine were not reset, it would initially be in a random internal state , where the state label is uniformly distributed across all possible states with average . This is a source of error, because if the machine is initially in state it would write 1s onto the tape, irrespective of the input. When then this could significantly degrade the quality of the output tape . In the case of , this would not be harmful though.
The reset of the writing machine comes at the average cost of . Altogether, therefore, the cost of writing a -tape is , which makes the average cost .
This result is not a fundamental lower limit for the writing of the output, which can only be reached using quasi-static protocols; also see SM section for alternative ways to copy a tape.
III Discussion
In the model presented here, all computational processes complete within a finite time. The error probability of the computation falls much more rapidly to zero than the entropy production increases, which makes it possible to achieve quasi-deterministic computations at finite cost in finite time. More specifically, the entropy production associated with setting the input diverges linearly with the ability to correct, which is parametrised by the number of internal states . This parameter also determines the cost of re-constituting the input tape after the computation. Since in this model the tape serves a dual role as a power source and information storage, the reconstitution cost is the actual cost of the computation. Note that it is normally not necessary to write tapes de-novo at a cost because a computation only overwrites at most symbols on the input tape. An additional cost arises when executing the AND gate. This cost is a consequence of the logical irreversibility of the operation. It too scales linearly with . In contrast to the linear scaling of the cost, the probability that the computer returns the wrong results follows (see eq. 4).
The benign scaling of this machine is re-assuring vis-à-vis the existence of real world deterministic computing machines, which are in reality only quasi-deterministic, i.e. stochastic with a very low error probability. Indeed, determinism in electronic circuits is achieved by using principles that are formally not too dissimilar from the model presented here. Bit values are represented as voltage spikes. If the amplitude of a spike exceeds a certain voltage threshold, then it is interpreted as a 1. The probability of an error can be reduced arbitrarily by choosing the correct threshold value in relation to the average voltage peak and typical fluctuations.
All this raises the question of why biological systems do not, at least not universally, use a similar route to deterministic computation. Unlike electronic machines, in vivo computation is inherently stochastic and subject to performance trade-offs. Part of the explanation may be that cellular computing is analogue, rather than digital, and does not admit such benign scaling. Another reason could be that the infrastructure required to perform digital computation cheaply is itself not cheap to maintain. Here, we have not included the maintenance cost of the reading machine, whereas a biological cell has to produce and maintain, or “compute”, the reading machine itself. This may not be worthwhile.
As a final remark, we note that the model used here is but an application of the idea of non-confusable subset coding from information theory mckayinfo . One may wonder whether more advanced block coding schemes could be used to get an even better performance. We conjecture that the computational cost of decoding puts a limit to the use of error correction codes. The computation necessary during the decoding step would itself require energy and likely render the energy-accuracy balance unfavourable.
Acknowledgements.
The author thanks Thomas Ouldridge for valuable comments and discussions on early drafts of this manuscript.
- (1) J. Parrondo, J. Horowitz, and T. Sagawa. Thermodynamics of information. Nature Physics, 11(2):131–139, Feb 2015.
- (2) T. Ouldridge, C. Govern, and P. ten Wolde. Thermodynamics of computational copying in biochemical systems. Physical Review X, 7(2), Apr 2017.
- (3) T. Ouldridge and P. ten Wolde. Fundamental costs in the production and destruction of persistent polymer copies. Physical Review Letters, 118(15), Apr 2017.
- (4) T. Sagawa and M. Ueda. Nonequilibrium thermodynamics of feedback control. Physical Review E, 85(2), Feb 2012.
- (5) D. Mandal and C. Jarzynski. Work and information processing in a solvable model of Maxwell’s demon. Proceedings of the National Academy of Sciences, 109(29):11641–11645, Jul 2012.
- (6) T. McGrath, N. Jones, P. ten Wolde, and T. Ouldridge. Biochemical machines for the interconversion of mutual information and work. Physical Review Letters, 118(2), Jan 2017.
- (7) C. Govern and P. ten Wolde. Energy dissipation and noise correlations in biochemical sensing. Physical Review Letters, 113(25):258102, Dec 2014.
- (8) N. Zabet and D. Chu. Computational limits to binary genes. Journal of the Royal Society Interface, 7(47):945–954, Jun 2010.
