A Physical Framework for Algorithmic Entropy

Jeff Edmonds

PMC · DOI:10.3390/e28010061·January 4, 2026


Open Access

TL;DR

This paper introduces a physical framework to better understand the relationship between entropy and algorithmic complexity, using intuitive models to explain abstract concepts.

Contribution

The paper explicitly identifies the complexity of a probability distribution with the physical complexity of a macrostate.

Findings

(1) The 'Not Alone' principle is shown to naturally arise from the physical framework.

(2) Algorithmic information imposes structural constraints on physical systems.

(3) Apparent paradoxes in physics are resolved through the lens of this framework.

Abstract

This paper does not aim to prove new mathematical theorems or claim a fundamental unification of physics and information, but rather to provide a new pedagogical framework for interpreting foundational results in algorithmic information theory. Our focus is on understanding the profound connection between entropy and Kolmogorov complexity. We achieve this by applying these concepts to a physical model. Our work is centered on the distinction, first articulated by Boltzmann, between observable low-complexity macrostates and unobservable high-complexity microstates. We re-examine the known relationships linking complexity and probability, as detailed in works like Li and Vitányi’s An Introduction to Kolmogorov Complexity and Its Applications. Our contribution is to explicitly identify the abstract complexity of a probability distribution K(ρ) with the concrete physical complexity of a…


Funding

Natural Sciences and Engineering Research Council of Canada (NSERC)

Keywords

Kolmogorov complexity; entropy; macrostate; microstate; Levin’s Coding Theorem; phase space; determinism; coarse-graining; Liouville’s Theorem; gravity; foundational principles


Taxonomy

Topics: Computability, Logic, AI Algorithms · Statistical Mechanics and Entropy · Advanced Thermodynamics and Statistical Mechanics

Full text

1. Introduction

Boltzmann’s entropy $[eqn]$ was as important to the Industrial Age as Shannon’s is to the Information Age [1,2]. Both quantify uncertainty and the spread of information. Parallel to this, Kolmogorov complexity $[eqn]$ provides a measure of the information content of an individual object as the length of the shortest computer program required to describe it [3]. The deep interconnections between these frameworks are comprehensively detailed in landmark texts, most notably Li and Vitányi’s An Introduction to Kolmogorov Complexity and Its Applications [4].

The results in algorithmic information theory are often powerfully abstract and mathematically dense. This paper aims to provide a new pedagogical interpretation of these foundational results by explicitly applying them to the physical framework of thermodynamics. We emphasize that this implies a structural analogy to aid intuition, rather than a claim that continuous physical reality is identical to discrete algorithmic strings. Our central thesis revolves around the distinction, first articulated by Boltzmann, between observable low-complexity macrostates (like the temperature of a gas) and the unobservable high-complexity microstates (the precise configuration of all particles) that comprise them.

We must be clear about our contribution. The mathematical results we discuss, including the Levin–Chaitin bound [5,6] $[eqn]$ and the “Not Alone” principle (cf. [4], Thm. 2.1.3), are known and foundational. The novelty of this paper lies not in proving new theorems but in synthesizing them into a single intuitive framework. Our central thesis is that the abstract complexity of a distribution $[eqn]$ can be identified with the concrete physical complexity of an observable macrostate $[eqn]$ . Though we say that a macrostate represents a collection of properties/constraints, we mean this only for intuition. Any attempt to map these properties to probabilities $[eqn]$ is well beyond the scope of this paper. In this framework, we explicitly identify the macrostate M with the code/program that generates its probability distribution $[eqn]$ . Thus, the complexity of the macrostate $[eqn]$ is effectively defined as the complexity of the distribution $[eqn]$ .

This physical framework allows us to re-derive and re-interpret these known results, demonstrating how they naturally emerge as structural constraints on physical systems. Our main application is exploring the “Not Alone” principle: we show how a microstate’s high complexity must be balanced by the size of the cluster it inhabits, where the cluster is defined by any simple computable property.

It is important to distinguish our approach from the seminal work of Zurek [7], who also integrated algorithmic randomness with physical entropy. While Zurek focused on the thermodynamic cost of measurement and the operation of heat engines, our work aims to provide a structural framework connecting the abstract properties of Kolmogorov complexity directly to the definition of macrostates and their cluster sizes.

While our framework highlights the deep structural connections between informational entropy and thermodynamic entropy, we acknowledge the ongoing debate regarding whether they are truly identical or merely analogous. Researchers such as Elitzur [8] and Meyer [9] have argued that information (or complexity) and thermodynamic “order” are distinct characteristics that should not be conflated. Elitzur posits that thermodynamic entropy measures the dispersal of energy, which is distinct from the informational content related to structure and complexity. A crystal, for example, has low entropy and low informational complexity, whereas a living organism has low entropy but high informational complexity. Meyer reinforces this by arguing that information is a measurable physical quantity distinct from thermodynamic entropy, essential for understanding biological organization. Our work takes the perspective that treating them as unified provides powerful pedagogical insights for statistical mechanics, but we respect the distinction emphasized in these works regarding biological and functional complexity.

Our contributions can be summarized as follows:

(1) A Physical Framework: An intuitive model for pedagogical purposes that explicitly identifies the abstract complexity of a distribution $[eqn]$ with the concrete complexity of a physical macrostate $[eqn]$ .
(2) A New Interpretation of the Not Alone Principle: We use this framework to show how the known “Not Alone” result (cf. [4], Thm. 2.1.3) arises as a natural structural constraint linking a microstate’s high complexity to the size of the cluster defined by any of its simple properties.
(3) A Unified View: We demonstrate how this physical interpretation connects to other foundational results including the Sandwich Theorem ( $[eqn]$ ) and the bounded variance of complexity in uniform distributions.
(4) Macro- vs. Microstates in Physics: This section explores how macro-level concepts in physics are derived from the micro-level properties of particles. With this, we are able to resolve many of the field’s apparent paradoxes. This perspective reveals that the foundational laws of thermodynamics are not arbitrary but are the necessary statistical consequences of simple rules applied to a complex world, all governed by the universal logic of information.

**The paper is organized as follows:** Section 2 defines the key terms used in the theorems and illustrates each with intuition and concrete examples. Section 3 explores illustrative examples. Section 4 states the formal theorems and provides their proofs. Section 5 explores the connection to physics in more detail. Section 6 concludes. I end by acknowledging Paul Vitányi, Ming Li, and my AI collaborators.

2. Intuition About the Definitions and Results

In this section, we lay out the key concepts and intuitions that underlie the results of this paper, supported by concrete examples. Our goal is to offer an accessible and computationally grounded explanation of the connection between entropy and Kolmogorov complexity. Central to our perspective is the distinction between observable low-complexity macrostates and unobservable high-complexity microstates. We do not know to what extent this intuition was part of Levin’s original motivation for his Coding Theorem; however, we believe that the following explanation is an important contribution of this paper.

The Physical World: Boltzmann defined the macro- and microstates of a physical system like a gas [1].

Microstate α is a finite binary string encoding a physical system’s complete unobservable high-complexity configuration (e.g., it could encode the positions, velocities, and masses of all ≈ $[eqn]$ particles).

Properties P(α) = p: Let P be any computable property of a microstate. For a given $[eqn]$ , let $[eqn]$ be the value of that property. The set $[eqn]$ is the cluster of all microstates sharing that property value (e.g., the temperature of the gas).

Macrostate M represents the observable low-complexity collection of properties of the physical system (e.g., the temperature and pressure).

Probability ρ(α) denotes the probability $[eqn]$ of the system being in microstate $[eqn]$ given that we know that it is in macrostate $[eqn]$ (e.g., in a system influenced by gravity, microstates representing molecules higher up have lower probabilities $[eqn]$ ).

$S_M$ is the set of microstates consistent with this macrostate, namely $[eqn]$ .

Program $[eqn]$ for the macrostate is assumed to operate in two modes:
An approximation mode $[eqn]$ , which outputs an $[eqn]$ -approximation of the real-valued probability $[eqn]$ .
A decision mode $[eqn]$ , which returns $[eqn]$ to decide membership in $[eqn]$ . (It might be undecidable even with $[eqn]$ to know whether $[eqn]$ 0 because this probability might be arbitrarily small.)
As noted above, we explicitly identify the macrostate $[eqn]$ with the code/program that generates its probability distribution $[eqn]$ .

Entropy $[eqn]$ : Boltzmann’s thermodynamics-motivated entropy [1] is defined to be the logarithm of the number of microstates consistent with the observed macrostate, namely

[eqn]

(Physicists often use $[eqn]$ , where $[eqn]$ is Boltzmann’s constant. We use base-2 logarithms, measuring entropy in bits, in line with Shannon and Kolmogorov.) For example, for a gas with $[eqn]$ particles in a room of volume $[eqn]$ , $[eqn]$ , so $[eqn]$ .

Shannon [2] shifted the idea of entropy to informational uncertainty defining

[eqn]

measuring the expected number of bits of information needed to specify a randomly chosen microstate $[eqn]$ given that it is already known that $[eqn]$ . The intuition is that if all microstates $[eqn]$ had the same probability $[eqn]$ , their number would be $[eqn]$ , and the optimal code length for transmitting $[eqn]$ would be $[eqn]$ bits.

Kolmogorov Complexity: Unlike entropy, which measures the information content of a distribution of objects, Kolmogorov Complexity measures the information content of a single object.

Micro-Complexity K(α): The Kolmogorov Complexity $[eqn]$ of microstate $[eqn]$ is defined as the length of the shortest prefix-free program $[eqn]$ that outputs $[eqn]$ . Let $[eqn]$ be a fixed universal Turing machine. A program $[eqn]$ is a binary string such that $[eqn]$ . This complexity is assumed to be enormous.

Prefix-Free: No valid program is a prefix of another. This allows programs to be concatenated and unambiguously separated.

K(P) and K(M) denote the length of the shortest program $[eqn]$ that computes property $[eqn]$ and both the probability $[eqn]$ for macrostate $[eqn]$ and membership $[eqn]$ in $[eqn]$ . For example, a very simple program $[eqn]$ computes the temperature of a gas from the velocity of each of its particles.

K(p) denotes the length of the shortest program $[eqn]$ that outputs the value $[eqn]$ . Note that some numbers like $[eqn]$ have short programs relative to their values. For example, the number of gas particles $[eqn]$ could be represented by the size $[eqn]$ of its binary description or, even better, by $[eqn]$ bits. The latter program, for example, could just output the number 27, specifying $[eqn]$ .

Macro-Complexity K(M, P, p) denotes the length of the shortest program that does all three. This is assumed to be small.

Results Presented: Though they have very different formulations, the entropy $[eqn]$ of a macrostate and the Kolmogorov complexity $[eqn]$ of its microstates are closely related.
Levin’s Coding Theorem 1: $[eqn]$
Sandwich Theorem 2: $[eqn]$
Uniform Case Theorem 3: $[eqn]$
$[eqn]$
The Not Alone Theorem 4: $[eqn]$
See Section 4.1, Section 4.2, Section 4.3 and Section 4.4.

Natural: In natural physical scenarios, we assume the micro-complexity $[eqn]$ is enormous while the total macro-complexity $[eqn]$ is small.

c ≈ 1000, $[eqn]$ , and $[eqn]$ are assumed to be “constants”.

Tight: This assumption is what makes the Sandwich Theorem $[eqn]$ , for example, tight.

3. Extreme Cases

The following examples illustrate the results in extreme regimes:

Counting Strings of a Given Complexity, $[eqn]$ :
The number of microstates with complexity $[eqn]$ is at most $[eqn]$ . This is because each such microstate requires a unique prefix-free program $[eqn]$ of length $[eqn]$ and there are at most $[eqn]$ such programs available.
Random Strings: An $[eqn]$ -bit string $[eqn]$ is considered random if it is incompressible, i.e., $[eqn]$ . Most strings are random in this sense. The fraction of $[eqn]$ -bit strings that can be compressed by more than $[eqn]$ bits is at most $[eqn]$ because there are fewer than $[eqn]$ prefix-free programs of length less than $[eqn]$ .
Strings with 49% Zeros: Let $[eqn]$ be the macrostate of all $[eqn]$ -bit strings containing exactly 49% zeros for some fixed value of $[eqn]$ . Then, by Stirling’s approximation,

[eqn]

where $[eqn]$ is the binary entropy function. Because a short program can check the 49%-zero condition, we know that $[eqn]$ is small. Given a particular such string $[eqn]$ , it is not clear how one would write a short program that outputs it. Levin’s Coding Theorem 1, however, gives such a program of length $[eqn]$ . This is only an upper bound, as some strings in $[eqn]$ may be highly compressible (e.g., a string of $[eqn]$ zeros followed by $[eqn]$ ones).

All Microstates: Let $[eqn]$ be the macrostate of all microstates $[eqn]$ of length $[eqn]$ . The number of such strings $[eqn]$ is $[eqn]$ , so $[eqn]$ . Most such strings have complexity $[eqn]$ . The complexity $[eqn]$ of the macrostate itself is the size of the program that checks if $[eqn]$ has length $[eqn]$ . This is at most $[eqn]$ , as it only needs to encode the value $[eqn]$ . Here, Theorem 1 is tight in the natural extreme.

[eqn]

Single-Element Macrostate: Let $[eqn]$ be the macrostate that accepts only one microstate $[eqn]$ . Then, $[eqn]$ . Any program to check membership in $[eqn]$ must effectively encode $[eqn]$ , so $[eqn]$ . In this case, Theorem 1 is tight in the unnatural extreme case. The inequality $[eqn]$ becomes

[eqn]
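Two of the examples in this section can be checked numerically. The sketch below is illustrative only: the function names and the values of n and k are assumptions made here, not part of the paper. It evaluates the standard counting bound behind the Random Strings example (there are only 2^(n-k) - 1 binary strings shorter than n - k bits, so fewer than a 2^(-k) fraction of n-bit strings can be compressed by more than k bits) and compares the exact value of log2 C(n, 0.49n) with the Stirling estimate n·H(0.49) from the 49%-zeros example.

```python
from fractions import Fraction
from math import comb, log2


def compressible_fraction_bound(n: int, k: int) -> Fraction:
    """Upper bound on the fraction of n-bit strings describable in fewer than n - k bits.

    Only 2**(n - k) - 1 binary strings have length below n - k, and each
    candidate program can output at most one string.
    """
    return Fraction(2 ** (n - k) - 1, 2 ** n)


def binary_entropy(p: float) -> float:
    """Binary entropy function H(p) in bits, for 0 < p < 1."""
    return -p * log2(p) - (1 - p) * log2(1 - p)


def log2_cluster_size(n: int, zeros: int) -> float:
    """Exact entropy (in bits) of the macrostate 'n-bit strings with this many zeros'."""
    return log2(comb(n, zeros))


if __name__ == "__main__":
    n, k = 10_000, 20
    print("compressible fraction bound:", float(compressible_fraction_bound(n, k)))
    exact = log2_cluster_size(n, 4_900)
    estimate = n * binary_entropy(0.49)
    print("exact log2 count:", exact, " n*H(0.49):", estimate)
```

The gap between the exact count and n·H(0.49) is only logarithmic in n, which is why the entropy of this macrostate is treated as n·H(0.49) to leading order.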

4. Theorems and Proofs

4.1. Levin’s Coding Theorem 1

Levin provides a foundational (though confusing) result in algorithmic information theory. It links the complexity of a string $[eqn]$ to its universal probability $[eqn]$ —the probability that a universal Turing machine with random input will output $[eqn]$ [6]. Moreover, Chaitin [5] shows that this universal probability $[eqn]$ multiplicatively dominates all other computable probabilities like our distribution $[eqn]$ . Together, the theorem states

[eqn]

where the constant $[eqn]$ depends only on the choice of the universal Turing machine, and $[eqn]$ also depends on the machine computing the distribution $[eqn]$ . The key point is that neither depends on the string $[eqn]$ . Examining the proof, one can see that $[eqn]$ , where $[eqn]$ is precisely the length of the program $[eqn]$ needed to approximate the probabilities $[eqn]$ and $[eqn]$ is perhaps 1000. This gives the revised statement:

Theorem 1 (Levin’s Coding). For any distribution $[eqn]$ and microstate $[eqn]$ , we have $[eqn]$ .

Qualitatively, this theorem establishes a “conservation of complexity” relative to probability. Because it bounds complexity from above by improbability, it states that an object cannot be both complex (high $[eqn]$ ) and probable (high $[eqn]$ ) unless the distribution itself is complex. In physical terms, if a microstate is highly probable, it must have a relatively short description.

A quick proof sketch would go as follows: By definition, we give this bound on $[eqn]$ by giving a program $[eqn]$ that outputs $[eqn]$ described with $[eqn]$ bits. To $[eqn]$ bits of precision, our program is given the cumulative probability $[eqn]$ . With $[eqn]$ bits, it is given the program $[eqn]$ needed to approximate the probabilities $[eqn]$ . The remaining 1000 bits describe our program that enumerates through the strings $[eqn]$ computing this sum until the target $[eqn]$ is reached. What remains is to ensure that all the values are accurate enough so that this works. The more detailed proof is as follows.

Proof. The proof is by construction. We will describe a program that generates $[eqn]$ and its length will serve as the required upper bound on $[eqn]$ .

1. The Description of α: Our description of α consists of two parts which are fed to a fixed universal search program:
A Program for the Macrostate M: A program $[eqn]$ that for any given $[eqn]$ and a precision parameter $[eqn]$ computes its probability $[eqn]$ . The length of the shortest such program is by definition $[eqn]$ .
An Identifying Target $[eqn]$ : A binary string that uniquely identifies $[eqn]$ . This string represents a rational number defined as our “target” which is a multiple of $[eqn]$ . We define this precision $[eqn]$ as

[eqn]

This choice ensures that $[eqn]$ . The target $[eqn]$ is provided to the search program as a binary string of length $[eqn]$ bits.

2. The Search Algorithm: The Turing Machine $[eqn]$ is a fixed universal program (of length, say, 1000 bits). It takes the program $[eqn]$ and the identifier $[eqn]$ as input. It then performs the following steps:

It iterates through all microstates $[eqn]$ in lexicographical order.
For each $[eqn]$ , it uses the provided program $[eqn]$ to compute an approximation $[eqn]$ of its probability. The required precision $[eqn]$ ensures that $[eqn]$ .
It maintains a running sum of these approximate probabilities.
The algorithm halts and outputs the current microstate $[eqn]$ at the exact moment this running sum surpasses the target value represented by $[eqn]$ . We claim this procedure outputs our intended microstate $[eqn]$ .

3. Correctness of the Search: We now prove that this search algorithm correctly and uniquely identifies $[eqn]$ .
Interval Width: The interval corresponding to our target microstate $[eqn]$ has width $[eqn]$ . We prove this is at least $[eqn]$ .

[eqn]

We know $[eqn]$ or else our theorem is trivially true.

Defining the Target $[eqn]$ : Imagine placing markers on the real number line at every integer multiple of $[eqn]$ . Because the interval for $[eqn]$ has a width greater than $[eqn]$ , it is guaranteed to contain at least one such marker. We define $[eqn]$ to be the value of one such marker. When the search algorithm’s running sum surpasses this value, it must have just finished adding $[eqn]$ and it correctly outputs $[eqn]$ .
Bounding $|q_\alpha|$: Recall $[eqn]$ is a binary string representing a rational number. The number of bits to the right of the binary point is at most its precision $[eqn]$ . The number of bits on the left is one because the value $[eqn]$ is less than 2, namely $[eqn]$ . The sum of probabilities $[eqn]$ is at most 1. The sum of the errors is bounded: $[eqn]$ .
4. Conclusion: The program $[eqn]$ that produces $[eqn]$ is the fixed search program $[eqn]$ with $[eqn]$ and $[eqn]$ hard-wired in. Hence, $[eqn]$ $[eqn]$ $[eqn]$ $[eqn]$ $[eqn]$ .
□
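The search procedure in this proof can be sketched in a few lines. The following is a simplified illustration under stated assumptions, not the construction itself: it uses exact rational probabilities, so the precision parameter and the error accounting of the proof are omitted, and the toy distribution `rho` and all function names are hypothetical. The encoder picks a grid marker inside the target's cumulative-probability interval, and the decoder recovers the target by summing probabilities in enumeration order until the sum passes that marker.

```python
from fractions import Fraction
from itertools import count, product
from math import ceil


def microstates():
    """Yield every binary string, shortest first, lexicographically within a length."""
    yield ""
    for n in count(1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def rho(alpha: str) -> Fraction:
    """Hypothetical toy distribution: rho(alpha) = 2^-(2|alpha|+1), which sums to 1."""
    return Fraction(1, 2 ** (2 * len(alpha) + 1))


def encode(alpha: str) -> tuple[int, int]:
    """Return (k, m) such that the target q = m / 2**k lies in alpha's interval."""
    p = rho(alpha)
    k = 0
    while Fraction(1, 2 ** k) > p:      # smallest k with grid spacing 2^-k <= rho(alpha)
        k += 1
    low = Fraction(0)                   # total probability of strings enumerated before alpha
    for s in microstates():
        if s == alpha:
            break
        low += rho(s)
    m = ceil(low * 2 ** k)              # first grid marker at or above `low`
    return k, m


def decode(k: int, m: int) -> str:
    """Search step: return the string whose probability pushes the running sum past q."""
    q = Fraction(m, 2 ** k)
    running = Fraction(0)
    for s in microstates():
        running += rho(s)
        if running > q:
            return s


if __name__ == "__main__":
    k, m = encode("0110")
    print(k, m, decode(k, m))
```

In this toy setting the target needs k = log2(1/rho(alpha)) bits, matching the leading term of the bound; the remaining terms in the theorem pay for describing the distribution and the fixed search program.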

4.2. The Sandwich Theorem 2

We now explain and prove the Sandwich Theorem stating that the expected Kolmogorov complexity $[eqn]$ of a randomly chosen microstate $[eqn]$ is tightly “sandwiched” by the entropy $[eqn]$ of the macrostate.

Theorem 2 (Sandwich Theorem). $[eqn]$ where $[eqn]$ .

In simple terms, this theorem confirms that the “typical” complexity of a microstate matches the entropy of its macrostate. This aligns with the physical intuition that for a gas in equilibrium, the complexity of a snapshot of the system is effectively determined by the volume of the phase space (entropy).

Proof. (Upper Bound) We move from the result at the micro level back to the macro level by taking the weighted sum of Levin’s Coding Theorem 1 with respect to $[eqn]$ giving

[eqn]

(Lower Bound) We compare the expected code lengths for two methods of assigning codewords to each microstate $[eqn]$ . The first method uses the shortest program $[eqn]$ for each microstate $[eqn]$ as its codeword. Recall that we required such programs to be prefix-free. By definition, the expected code length for this method is $[eqn]$ . The second method is Shannon’s code, which assigns to each microstate $[eqn]$ a codeword of the ideal length $[eqn]$ . By definition, its expected code length is $[eqn]$ . Because Shannon’s method provides the optimal expected code length [2], the expected length of the first code must be greater than or equal to the second. □
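The lower-bound argument rests on the fact that no prefix-free code has expected length below the entropy. The sketch below is an illustration added here, not taken from the paper. Since Kolmogorov complexity is uncomputable, it uses integer Shannon code lengths, ⌈log2(1/ρ)⌉, as a stand-in prefix-free code on a small hypothetical distribution, and checks both the Kraft inequality and that the expected length falls between the entropy and the entropy plus one bit. All names are assumptions.

```python
from math import ceil, log2


def entropy(probs: list[float]) -> float:
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)


def shannon_lengths(probs: list[float]) -> list[int]:
    """Integer codeword lengths ceil(log2(1/p)), one per outcome."""
    return [ceil(log2(1 / p)) for p in probs]


def kraft_sum(lengths: list[int]) -> float:
    """Sum of 2^-length; a prefix-free code with these lengths exists iff this is <= 1."""
    return sum(2.0 ** -length for length in lengths)


def expected_length(probs: list[float], lengths: list[int]) -> float:
    """Average codeword length under the given distribution."""
    return sum(p * length for p, length in zip(probs, lengths))


if __name__ == "__main__":
    probs = [0.4, 0.3, 0.2, 0.1]          # hypothetical distribution over four microstates
    lengths = shannon_lengths(probs)
    print("lengths:", lengths, "Kraft sum:", kraft_sum(lengths))
    print("entropy:", entropy(probs), "expected length:", expected_length(probs, lengths))
```

When every probability is a power of two, the rounding disappears and the expected length equals the entropy exactly, which is the sense in which the ideal lengths log(1/ρ) are optimal.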

Kolmogorov-Based Conditional Entropy: Rearranging Theorems 1 and 2 gives the difference $[eqn]$ , which is reminiscent of conditional entropy. Hence, we define it to be $[eqn]$ .

We denoted Shannon’s entropy of a macrostate by $[eqn]$ to emphasize it is a function of the macrostate’s distribution. Shannon himself might prefer the conditional entropy notation $[eqn]$ , viewing it as the expected bits needed to specify a randomly chosen $[eqn]$ given $[eqn]$ . He might express this as the difference between the information needed for $[eqn]$ and that for $[eqn]$ :

[eqn]

Conditional entropy is normally defined as $[eqn]$ . Here, we associate $[eqn]$ with $[eqn]$ and $[eqn]$ with $[eqn]$ . We assume learning $[eqn]$ determines $[eqn]$ , so $[eqn]$ . Our notation $[eqn]$ is an intuitive stand-in for the standard entropy $[eqn]$ .

Replacing $[eqn]$ with $[eqn]$ gives

[eqn]

Theorem 2 then becomes

[eqn]

This definition aligns with the standard AIT chain rule. Since the macrostate $[eqn]$ is computable from $[eqn]$ , $[eqn]$ . The symmetry of information states $[eqn]$ . Since $[eqn]$ , this yields $[eqn]$ , matching our definition.

4.3. The Uniform Case Theorem 3

The following are fun improvements when the distribution is uniform: the maximum and average complexities are close and the variance of complexity is tightly bounded by a small constant. This is in stark contrast to the non-uniform case. There exist “natural” macrostates with small $[eqn]$ and small average complexity $[eqn]$ , but infinite variance. For every $[eqn]$ , define $[eqn]$ and for every $[eqn]$ -bit string $[eqn]$ , define $[eqn]$ . Then, $[eqn]$ and $[eqn]$ $[eqn]$ . Part A of the theorem below is also key for our Not Alone Theorem 4.

Theorem 3 (Uniform). Consider a uniform macrostate $[eqn]$ .

A: $[eqn]$

B: $[eqn]$

Proof. (A) Levin’s Theorem 1 states $[eqn]$ . Replace $[eqn]$ with $[eqn]$ by applying the theorem to a microstate $[eqn]$ of maximum complexity. Because the distribution is uniform, $[eqn]$ must be finite. Uniformity also gives $[eqn]$ . This gives the required $[eqn]$ . Theorem 2 states $[eqn]$ , which, because the distribution is uniform, is the same as $[eqn]$ .

(B) Partition $[eqn]$ based on their complexity relative to $[eqn]$ . Let $[eqn]$ be the set of microstates with complexity $[eqn]$ . Let $[eqn]$ be those with $[eqn]$ . For each integer $[eqn]$ , let $[eqn]$ be those with complexity exactly $[eqn]$ . Let $[eqn]$ denote this complexity value. From Theorem 3.A, we know $[eqn]$ and hence $[eqn]$ is empty. The contribution to the variance from the “middle” set is $[eqn]$ . This leaves $[eqn]$ . Note the total coefficient of $[eqn]$ is $[eqn]$ , which is equal to one. We can now consider the $[eqn]$ contribution. There are $[eqn]$ microstates with this complexity and Theorem 3.A gives that $[eqn]$ . Hence, the probability of encountering such a microstate is at most $[eqn]$ . The remaining variance is then bounded by $[eqn]$ , giving the result. □

4.4. The Not Alone Theorem 4

This section presents Theorem 2.1.3 in the book [4] in this light. We prove that observable low-complexity macrostates cannot contain lone unobservable high-complexity microstates because otherwise any program for $[eqn]$ would have to effectively “name” them, forcing $[eqn]$ . To avoid this, $[eqn]$ must be “hidden” in a large crowd of similar microstates all sharing a simple collective pattern, namely

[eqn]

By “similar”, we mean sharing the exact same value for any given low-complexity property $[eqn]$ . Namely, the cluster whose size we bound is defined to be $[eqn]$ where $[eqn]$ .

Theorem 4 (Not Alone). If not empty, $[eqn]$ , where $[eqn]$ and $[eqn]$ are the average and max complexities over $[eqn]$ , and $[eqn]$ .

This result formalizes the intuition that “complex things cannot exist in isolation.” If a microstate is complex, it is not special; it is just one of many similar states. Structurally, this implies that high-complexity states form large, homogeneous clusters (macrostates), while unique, isolated states must be simple enough to be described individually.

Example: Any microstate $[eqn]$ can be clustered with all those of exactly the same length $[eqn]$ , the same temperature, the same pressure, the same complexity $[eqn]$ , and the same probability $[eqn]$ simply by considering $[eqn]$ $[eqn]$ $[eqn]$ $[eqn]$ $[eqn]$ . Theorem 4 proves that this cluster has a size of at least $[eqn]$ . Note that this agrees with the first example in Section 3 that proves that the number of microstates with complexity $[eqn]$ is at most $[eqn]$ and, hence, this set is a constant fraction of these.

Here, $[eqn]$ is defined to be the time-bounded version of Kolmogorov Complexity that requires the program that outputs $[eqn]$ to halt within a specified time, e.g., $[eqn]$ steps. Unlike $[eqn]$ , the value $[eqn]$ can be computed by a short program that simulates all $[eqn]$ -bit programs for the specified time limit $[eqn]$ . This means that the complexity $[eqn]$ of the property is small. The set of microstates with unbounded time complexity $[eqn]$ is a superset of these.

What might be large is the complexity $[eqn]$ of outputting the value $[eqn]$ $[eqn]$ $[eqn]$ $[eqn]$ $[eqn]$ . The first value $[eqn]$ can be denoted $[eqn]$ . The next three are at most that. Hence, these values can be outputted by a program of size at most $[eqn]$ bits. The fifth value $[eqn]$ for natural macrostates, we will assume, is at least $[eqn]$ . Outputting it exactly would require too many bits. However, if a factor-of-two approximation of the probability is acceptable, then $[eqn]$ bits suffice. For a non-natural example, consider a macrostate with probability $[eqn]$ , where $[eqn]$ is the integer value of $[eqn]$ . To know the probability, a program must essentially know the entire string. Therefore, the complexity of the probability value is $[eqn]$ . The Not Alone Theorem bound $[eqn]$ correctly predicts that the cluster size is at most 1.

Empty Example: In the previous example, we chose some $[eqn]$ and then formed the cluster $[eqn]$ with similar properties. This construction ensures that the cluster contains at least $[eqn]$ itself. If, instead, we define the cluster based on some chosen property $[eqn]$ , then the cluster $[eqn]$ might be empty. In this case, the theorem is not broken because it only applies when $[eqn]$ is not empty. Alternatively, if property $[eqn]$ narrows the cluster to one microstate $[eqn]$ , then both $[eqn]$ and $[eqn]$ equal the complexity $[eqn]$ , hence the lower bound $[eqn]$ as needed.

Proof. Theorem 3.A applied to macrostate $[eqn]$ that is defined to be uniform over the cluster $[eqn]$ states $[eqn]$ . Simply exponentiating gives the result. □

As noted at the start of this section, this result is the same as Theorem 2.1.3 in the book [4].

Theorem 5(Book Theorem 2.1.3 [4]). Let $[eqn]$ be recursively enumerable and let $[eqn]$ . Suppose $[eqn]$ is finite. Then, for some constant $[eqn]$ depending only on $[eqn]$ for all $[eqn]$ in $[eqn]$ , we have $[eqn]$ .

In our setting, $[eqn]$ , $[eqn]$ , $[eqn]$ , and $[eqn]$ .

5. Macro- vs. Microstates in Physics

This section explores how macro-level concepts in physics are derived from the micro-level properties of particles. With this, we are able to resolve some of the field’s apparent paradoxes. This perspective reveals that the foundational laws of thermodynamics are not arbitrary but are the necessary statistical consequences of simple rules applied to a complex world, all governed by the universal logic of information.

5.1. The Tension Between Discrete and Continuous Physics

We acknowledge a fundamental tension in applying algorithmic complexity (defined on discrete binary strings) to classical physics (defined on continuous variables). Our approach relies on discretization—converting continuous phase space into discrete cells. While this is a standard pedagogical tool, it has limitations. For instance, the “shape” of phase space regions (e.g., fractal strange attractors in chaotic systems) affects how entropy scales with precision in ways that simple box-counting may obscure.

Furthermore, we recognize that Quantum Mechanics offers a more rigorous interface between physics and information, where states are vectors in Hilbert space rather than discrete strings. However, our goal here is not to replace the quantum description but to provide a pedagogical bridge using the accessible language of classical statistical mechanics and algorithmic information theory.

5.2. The Precision of Phase Space Approximation

In physics, a microstate $[eqn]$ specifies the locations and momenta of all $[eqn]$ particles. With three position and three momentum real-valued coordinates per particle, $[eqn]$ is a point in $[eqn]$ space. This is called phase space. A macrostate $[eqn]$ carves out a subset of this space.

To make this discrete, let $[eqn]$ denote the set of infinitesimal cubes of volume $[eqn]$ that $[eqn]$ might be in, and let $[eqn]$ denote a set of corresponding unit-volume cubes. A probability distribution is defined using a probability density function $[eqn]$ , where $[eqn]$ is the probability of the microstate being within $[eqn]$ ’s infinitesimal cube.

Shannon’s entropy, which is the expected number of bits of information needed to specify a randomly chosen microstate $[eqn]$ , would be infinite if one had to specify which of the $[eqn]$ infinitesimal cubes $[eqn]$ is in. However, it is finite if we only need to name which unit-volume cube it is contained in (giving $[eqn]$ for a uniform distribution). We compute the continuous entropy (or differential entropy) $[eqn]$ as follows.

Choose a random $[eqn]$ according to the distribution $[eqn]$ . The probability of $[eqn]$ being in a specific unit cube is approximately $[eqn]$ (assuming $[eqn]$ is roughly constant within that cube). If all unit cubes had this same probability density, then the “number” of such cubes would be effectively $[eqn]$ , and the number of bits needed to specify $[eqn]$ ’s unit cube would be $[eqn]$ . This is the code length allocated to microstate $[eqn]$ . The expected number of bits needed is the continuous-case integral:

[eqn]
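The link between the unit-cube code length and the differential entropy can be checked numerically. The following is a minimal sketch, assuming a one-dimensional Gaussian density with standard deviation `sigma` as a stand-in for the phase-space density; the Gaussian, the function names, and the truncation at 12 standard deviations are illustrative assumptions, not part of the construction above. It sums the code length over cells of a chosen width, treating the density as constant within each cell.

```python
import math

def gaussian_pdf(x, sigma):
    """Density of a zero-mean Gaussian (an assumed toy density)."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def discretized_entropy_bits(sigma, cell_width, span_sigmas=12.0):
    """Expected bits needed to name which cell of width cell_width contains x."""
    half_cells = int(span_sigmas * sigma / cell_width)
    total = 0.0
    for k in range(-half_cells, half_cells + 1):
        # Probability of the cell: density at its centre times its width.
        p = gaussian_pdf(k * cell_width, sigma) * cell_width
        if p > 0.0:
            total -= p * math.log2(p)
    return total

def differential_entropy_bits(sigma):
    """Closed-form differential entropy of a Gaussian, in bits."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma * sigma)

unit_cells = discretized_entropy_bits(sigma=10.0, cell_width=1.0)
half_cells = discretized_entropy_bits(sigma=10.0, cell_width=0.5)
print(unit_cells, differential_entropy_bits(10.0), half_cells - unit_cells)
```

With unit-width cells the sum should agree closely with the closed-form differential entropy, and halving the cell width should cost about one extra bit. That steady growth is why the entropy diverges as the cubes become infinitesimal.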

5.3. Scaling to N-Particle Systems (Algebra of Macrostates)

We apply these theorems to a system with $[eqn]$ particles by analyzing them one at a time. If we assume the particles are (mostly) independent, we can model the total macrostate $[eqn]$ as a cross product of $[eqn]$ individual macrostates: $[eqn]$ , where $[eqn]$ is the macrostate for the $[eqn]$ -th particle. This “Algebra of Macrostates” shows our theorems scale correctly.

Independent Cross Product: For $[eqn]$ , all key quantities (entropy $[eqn]$ , complexity $[eqn]$ , and average/max microstate complexity $[eqn]$ ) are additive (up to a small constant). The Not Alone Theorem’s bound $[eqn]$ remains consistent: the cluster size $[eqn]$ becomes multiplicative ( $[eqn]$ ), and since the terms in the exponent are all additive, the theorem correctly predicts this product.

Dependent Cross Product: We can also define a dependent cross product $[eqn]$ , where $[eqn]$ is a simple, computable function (e.g., $[eqn]$ calculates a particle’s velocity from its position). Here, complexities increase only by the small $[eqn]$ . Both the cluster size and the theorem’s bound $[eqn]$ remain essentially unchanged.

Example (The “Complex Property” Limit): This framework also handles the extreme case. Let $[eqn]$ be a function that gives $[eqn]$ a unique complex property $[eqn]$ . For example, $[eqn]$ outputs the integer $[eqn]$ (the value of $[eqn]$ ), and we define our property as $[eqn]$ .

For any microstate, its cluster $[eqn]$ is a singleton ( $[eqn]$ ). The Not Alone Theorem correctly predicts this. Since knowing the property $[eqn]$ is the same as knowing $[eqn]$ itself, the complexity of the property is enormous: $[eqn]$ . The theorem’s bound becomes

[eqn]

This confirms that a microstate can be “alone” in its cluster, but only if the property defining that cluster is just as complex as the microstate itself (and thus not a “simple” macrostate property).

Scaling of $[eqn]$ : We can address the scaling of $[eqn]$ for standard physical properties. If $[eqn]$ represents a global quantity like Total Energy $[eqn]$ in an $[eqn]$ -particle system, the value $[eqn]$ scales with $[eqn]$ . However, the number of bits required to describe this value is only $[eqn]$ . Since the microstate complexity $[eqn]$ scales linearly with $[eqn]$ (i.e., ≈ $[eqn]$ bits), the cost of describing the property (≈ $[eqn]$ $[eqn]$ bits) is negligible. Thus, even for extensive physical properties, the “constant” $[eqn]$ remains small relative to the system size, and the “Not Alone” bound remains non-trivial and physically meaningful.
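The counting behind this algebra can be illustrated with toy finite macrostates. The sketch below assumes each macrostate is a uniform distribution over a small set of integer labels, so its entropy is simply the base-2 logarithm of the set size; the sets, the function names, and the rule `2 * x + 1` are hypothetical choices made only for illustration. It does not compute Kolmogorov complexity, which is uncomputable, only the set sizes that the additive and unchanged entropies refer to.

```python
import math
from itertools import product

def entropy_bits(macrostate):
    """Entropy of a uniform macrostate: log2 of the number of microstates."""
    return math.log2(len(macrostate))

def independent_product(m1, m2):
    """Every microstate of m1 paired with every microstate of m2."""
    return set(product(m1, m2))

def dependent_product(m1, f):
    """Second coordinate is a simple computable function of the first."""
    return {(a, f(a)) for a in m1}

def bits_to_write(value):
    """Bits needed to write a non-negative integer value in binary."""
    return max(1, value.bit_length())

particle_1 = set(range(8))     # 8 toy microstates, 3 bits
particle_2 = set(range(32))    # 32 toy microstates, 5 bits

joint = independent_product(particle_1, particle_2)
tied = dependent_product(particle_1, lambda x: 2 * x + 1)

print(entropy_bits(joint))      # sizes multiply, so entropies add: 3 + 5
print(entropy_bits(tied))       # the dependent coordinate adds no microstates
print(bits_to_write(10**23))    # a quantity of size ~N costs only ~log2(N) bits
```

The last line reflects the scaling argument: writing down a number as large as the particle count takes a few dozen bits, which is negligible next to a microstate description that grows linearly with the number of particles.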

5.4. Newtonian Determinism and Coarse-Graining

The Second Law of Thermodynamics states the entropy of a closed system can only increase over time. Let $[eqn]$ be the macrostate with all gas particles initially in a box of volume $[eqn]$ , and let $[eqn]$ be the macrostate at time $[eqn]$ , after they have dispersed into a room of volume $[eqn]$ . Boltzmann would argue that entropy increases by $[eqn]$ because the number of microstates $[eqn]$ increases.

First, let us be clear that the disorder measured by entropy does not arise because there are many particles doing independent things, but because there is uncertainty about what they are doing. Section 5.3 argues that instead of considering a system with $[eqn]$ particles all at once, we can analyze them one at a time. If we assume the particles are (mostly) independent, we can model the total macrostate $[eqn]$ as a cross product of $[eqn]$ individual macrostates: $[eqn]$ , where $[eqn]$ is the macrostate for the $[eqn]$ -th particle. This “Algebra of Macrostates” shows our theorems scale correctly.

An apparent paradox arises in a deterministic, reversible, closed physical system. Because the laws of physics are reversible, no information is gained or lost. The entropy should not change, in violation of the Second Law.

This is best explained by seeing that the Kolmogorov complexity of the microstate $[eqn]$ , and hence the entropy $[eqn]$ , changes by at most a small additive constant $[eqn]$ . This is proved by describing a small program $[eqn]$ that outputs $[eqn]$ , where $[eqn]$ outputs $[eqn]$ , $[eqn]$ outputs $[eqn]$ , and $[eqn]$ encodes the laws of physics. Because the laws are reversible, $[eqn]$ gives the other direction. The argument requires that one both know these laws of physics and be able to express them.

Liouville’s fundamental Theorem takes this same $[eqn]$ paradox argument a step further. A microstate $[eqn]$ (describing the position and momentum of all $[eqn]$ particles) is a point in the continuous $[eqn]$ phase space. The macrostate $[eqn]$ is the “accessible subregion” of this phase space (e.g., all particles in the box $[eqn]$ with some energy). As time passes, this region evolves to $[eqn]$ . Even though the particles spread out to fill the larger room $[eqn]$ , Liouville’s Theorem proves the total volume of the accessible phase space does not change: $[eqn]$ . See Figure 1a,b.

How? Chaos stretches and folds the accessible region $[eqn]$ into a long, skinny, filament-like region that twists and turns throughout the entire phase space, but its total $[eqn]$ -dimensional volume remains unchanged. Because the differential entropy $[eqn]$ is directly related to this volume, it also does not change: $[eqn]$ .

The way this paradox is resolved and entropy is seen to increase is through coarse-graining. See Figure 1c. Any real measurement has limited precision. We cannot distinguish between points in $[eqn]$ and nearby points that are not in $[eqn]$ . We must “blur” our vision. This noise can be modeled as follows:

Thermal Noise: Boltzmann’s “dust” being knocked around by particle collisions.
Measurement Noise: Adding Gaussian noise to each value in the final microstate $[eqn]$ .
Coarse-Graining: Making the probability distribution locally uniform within each “measurement cube” (e.g., $[eqn]$ ).

If the true accessible phase space $[eqn]$ is a long, skinny filament that twists through the entire room, any of these blurring effects will make it indistinguishable from a uniform distribution over the entire room. This larger, coarse-grained volume is what we perceive as the new, higher-entropy macrostate. This noise also helps restore the assumption of independence between particles, which is lost after they collide.
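The contrast between conserved fine-grained volume and growing coarse-grained entropy can be seen in a toy model. The sketch below is an illustration under stated assumptions, not the physical dynamics discussed above: it replaces Newtonian motion with the Arnold cat map on a 64 by 64 grid (a standard reversible, area-preserving, chaotic map), and it models a "measurement cube" as an 8 by 8 block of grid cells. The names `N`, `BLOCK`, `step`, and `step_back` are hypothetical.

```python
import math
from collections import Counter

N = 64       # the fine grid is N x N cells (toy stand-in for phase space)
BLOCK = 8    # a "measurement cube" is BLOCK x BLOCK fine cells

def step(cell):
    """Arnold cat map: a reversible, area-preserving shuffle of the grid."""
    x, y = cell
    return ((2 * x + y) % N, (x + y) % N)

def step_back(cell):
    """Exact inverse of step, so the dynamics lose no information."""
    x, y = cell
    return ((x - y) % N, (2 * y - x) % N)

def evolve(region, steps, rule=step):
    """Apply the rule to every cell of the region, steps times."""
    for _ in range(steps):
        region = {rule(c) for c in region}
    return region

def fine_entropy_bits(region):
    """log2 of the number of occupied fine cells (the accessible 'volume')."""
    return math.log2(len(region))

def coarse_entropy_bits(region):
    """Entropy after spreading each block's probability uniformly over the block."""
    counts = Counter((x // BLOCK, y // BLOCK) for x, y in region)
    total = len(region)
    block_entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return block_entropy + math.log2(BLOCK * BLOCK)

box = {(x, y) for x in range(BLOCK) for y in range(BLOCK)}  # gas confined to one block
later = evolve(box, 6)

print(fine_entropy_bits(box), fine_entropy_bits(later))      # fine-grained volume is conserved
print(coarse_entropy_bits(box), coarse_entropy_bits(later))  # blurred entropy grows
print(evolve(later, 6, step_back) == box)                    # running time backwards recovers the start
```

In this toy setting the number of occupied fine cells never changes, which plays the role of Liouville's Theorem, while the entropy seen through the blocks rises once the region has been stretched across several blocks.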

5.5. How Falling Particles Increase Entropy

Let $[eqn]$ denote the macrostate at time 0, representing a gas diffuse in a thin spherical shell of radius $[eqn]$ and area $[eqn]$ around the earth. Let $[eqn]$ represent the same gas after the particles have fallen under gravity to a smaller shell of radius $[eqn]$ and area $[eqn]$ . Being smaller, one might suspect that the volume $[eqn]$ of accessible phase space decreases, decreasing entropy, and breaking the Second Law of Thermodynamics that $[eqn]$ .

The paradox seems even worse if we focus on a single particle. Suppose the particle starts with a known velocity of zero, a known height $[eqn]$ , and a completely unknown location in the spherical shell. As the particle drops, its height and radial velocity remain known, but its location is known only to lie within the shrinking spherical shell. Hence, it takes fewer and fewer bits to communicate the missing location information, implying entropy is decreasing. Knowing such information precisely, however, is unnatural. Liouville’s Theorem requires an initial volume $[eqn]$ , which is zero unless every value has at least some infinitesimal uncertainty.

The resolution comes from Liouville’s Theorem, which states that the volume of the 6D phase space (position and momentum) is conserved. It is useful to first consider a simpler case: if gravity were a fixed, parallel acceleration $[eqn]$ , it would apply an additive change to each microstate’s velocity ( $[eqn]$ ) and position ( $[eqn]$ ). This merely shifts or shears the phase space volume $[eqn]$ , but does not change its volume. The force of gravity, however, is not a parallel force; it radiates towards the center. This radial force introduces a multiplicative scaling, which is resolved by the conservation of angular momentum. This is the same principle as a spinning ice skater: when they pull their arms in (decreasing $[eqn]$ ), they spin faster (increasing angular velocity).

Let us consider the particle’s phase space in polar coordinates. Let $[eqn]$ and $[eqn]$ denote the location (angle) and $[eqn]$ and $[eqn]$ denote the angular velocities in the two directions within the spherical shell.

Position Space Shrinks: When the radius $[eqn]$ shrinks (e.g., by a factor of 2), the area of the shell shrinks. The range of possible locations, $[eqn]$ and $[eqn]$ , also shrinks by this factor.
Momentum Space Expands: By the conservation of angular momentum ( $[eqn]$ ), as $[eqn]$ shrinks by a factor of 2, the angular velocity $[eqn]$ must double. This causes the range of possible velocity variables, $[eqn]$ and $[eqn]$ , to grow by that same factor.

Suppose the initial volume of this 4D slice of phase space is $[eqn]$ . Then, after falling, the new volume is $[eqn]$ . As promised by Liouville’s Theorem, the multiplicative factors cancel perfectly. The accessible phase space volume $[eqn]$ does not change. Hence, the entropy does not either. The Second Law is not violated. The increase in entropy, which we observe in reality, only occurs after the fact due to coarse-graining the probability distribution. See Section 5.4.
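Both cases in this section can be checked by computing the Jacobian determinant of the map from initial to final phase-space coordinates, since a determinant of 1 means volume is preserved. The sketch below is a minimal illustration under assumptions of our own: the uniform-gravity case uses height and vertical momentum, and the shell case uses one tangential arc-length coordinate paired with its tangential momentum, with the momentum scaling as the inverse of the radius ratio as a consequence of conserved angular momentum. The function names, parameter values, and the finite-difference step `h` are hypothetical.

```python
def uniform_gravity(q, p, t=3.0, g=9.8, m=1.0):
    """Height q and vertical momentum p after falling for time t under constant g."""
    return q + (p / m) * t - 0.5 * g * t * t, p - m * g * t

def shell_contraction(s, p, ratio=0.5):
    """Arc-length position shrinks by ratio; tangential momentum grows by 1/ratio."""
    return s * ratio, p / ratio

def jacobian_det_2d(flow, q, p, h=1e-6):
    """Determinant of d(q', p')/d(q, p), estimated by central differences."""
    q_hi, p_hi = flow(q + h, p)
    q_lo, p_lo = flow(q - h, p)
    dq_dq, dp_dq = (q_hi - q_lo) / (2 * h), (p_hi - p_lo) / (2 * h)
    q_hi, p_hi = flow(q, p + h)
    q_lo, p_lo = flow(q, p - h)
    dq_dp, dp_dp = (q_hi - q_lo) / (2 * h), (p_hi - p_lo) / (2 * h)
    return dq_dq * dp_dp - dq_dp * dp_dq

shear_det = jacobian_det_2d(uniform_gravity, q=2.0, p=0.5)
scale_det = jacobian_det_2d(shell_contraction, q=2.0, p=0.5)

print(shear_det)          # a shear of phase space: volume factor close to 1
print(scale_det ** 2)     # two tangential directions: the 4D volume factor, also close to 1
```

The 4D factor is the product of the two identical 2D factors because, under the assumptions above, the two tangential directions transform independently.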

5.6. Momentum, Energy, and Temperature from First Principles

Fundamental physical laws and definitions are linked to statistical realities. We can derive macro-level concepts like momentum, energy, pressure, and temperature from the simple, micro-level properties of particles.

Temperature vs. Particle Velocities: A key property of the microstate is the velocity $[eqn]$ and speed $[eqn]$ of each particle. The macro-parameter temperature is defined to be the average kinetic energy of these particles, namely $[eqn]$ scaled by the Boltzmann constant $[eqn]$ . When the particles of a gas are at equilibrium, $[eqn]$ follows the Maxwell–Boltzmann distribution. Formally, this is derived by finding the function that maximizes the system’s entropy (assuming entropy would otherwise increase). A strong physical intuition, however, comes from the Central Limit Theorem. Because it is the result of the “sum” of a vast number of random collisions, each component of the velocity $[eqn]$ is drawn independently from a Gaussian (normal) distribution:

[eqn]

A similar factor $[eqn]$ involving the potential energy of the state is added to the distribution on the height of a particle when there is a force of gravity, making it exponentially unlikely for a particle to fly very far up.

Momentum: Why is momentum defined as $[eqn]$ ? Because this is the quantity that is conserved in collisions. When two particles collide, Newton’s third law states they apply equal and opposite forces ( $[eqn]$ and $[eqn]$ ). Since $[eqn]$ , the two changes in momentum sum to zero. Therefore, the total momentum of the system $[eqn]$ remains a constant property of the macrostate.

Angular Momentum: Kepler noted that a planet sweeps out equal areas in equal times. This is the conservation of angular momentum. It means that when a spinning ice skater pulls in their arms, they spin faster. Newton (and Richard Feynman) have a fantastic proof involving the area of triangles.

Energy: It is reasonable to define potential energy to be $[eqn]$ as it should be linear in the force and distance the object has been pushed. Newton defines kinetic energy to be $[eqn]$ and not $[eqn]$ because this is the definition that allows for the conservation of energy. For a falling object (constant force $[eqn]$ ): The change in potential energy is $[eqn]$ . The change in kinetic energy is $[eqn]$ . Thus, $[eqn]$ , and the total energy $[eqn]$ is conserved. The total kinetic energy of the system $[eqn]$ is its internal thermal energy.

Pressure P: Pressure is the force per unit area from particles hitting the container wall. This force is the rate of change of momentum ( $[eqn]$ ). Perhaps surprisingly, this depends on $[eqn]$ instead of on $[eqn]$ . The reason is that there are two complementary effects:

The frequency with which a particle hits a wall is proportional to its speed $[eqn]$ because the ones that are twice as fast reach the wall in half the time, and hence hit the wall with twice the frequency. Consider the sub-volume with area $[eqn]$ against the container and infinitesimal height $[eqn]$ . If the particles are always uniformly distributed, the expected number in it is $[eqn]$ . To avoid worrying about collisions between particles, assume this expected number is much less than one. If there is such a particle moving more or less in the right direction, then the time until collision with the container is $[eqn]$ and the “rate” of collisions per second is $[eqn]$ .
The momentum transferred per hit is also proportional to $[eqn]$ . On collision, the perpendicular component of the momentum $[eqn]$ is transferred (times two because the particle bounces).

Concluding, the rate of momentum transfer is $[eqn]$ , as needed. This gives that the total force is proportional to $[eqn]$ . This is convenient because it directly links pressure to the average kinetic energy per unit volume. The exact relation is $[eqn]$ , where $[eqn]$ is the average kinetic energy of one particle. Recall $[eqn]$ . This gives the Ideal Gas Law $[eqn]$ .
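The chain from Gaussian velocity components to the Ideal Gas Law can be checked with a small Monte Carlo sketch. The assumptions here are ours and purely illustrative: each velocity component is drawn from a Gaussian whose variance is the Boltzmann constant times temperature divided by particle mass, the wall pressure is taken as number density times mass times the mean squared velocity component perpendicular to the wall, and the mass, temperature, number density, sample size, and seed are hypothetical values roughly matching nitrogen at room conditions.

```python
import math
import random

K_B = 1.380649e-23   # Boltzmann constant in J/K

def sample_velocities(n, mass, temperature, rng):
    """Draw n velocity vectors with independent Gaussian components."""
    sigma = math.sqrt(K_B * temperature / mass)
    return [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for _ in range(n)]

def mean_kinetic_energy(velocities, mass):
    """Average of (1/2) m |v|^2 over the sample."""
    total = sum(0.5 * mass * (vx * vx + vy * vy + vz * vz) for vx, vy, vz in velocities)
    return total / len(velocities)

def wall_pressure(velocities, mass, number_density):
    """Momentum flux onto a wall perpendicular to x: density * m * <v_x^2>."""
    mean_vx2 = sum(vx * vx for vx, _, _ in velocities) / len(velocities)
    return number_density * mass * mean_vx2

rng = random.Random(0)        # fixed seed so the sketch is repeatable
mass = 4.65e-26               # kg, assumed (about one nitrogen molecule)
temperature = 300.0           # K, assumed
number_density = 2.5e25       # particles per cubic metre, assumed

velocities = sample_velocities(100_000, mass, temperature, rng)
kinetic = mean_kinetic_energy(velocities, mass)
pressure = wall_pressure(velocities, mass, number_density)

print(kinetic / (K_B * temperature))                     # should be near 3/2
print(pressure / (number_density * K_B * temperature))   # should be near 1
```

Both printed ratios are sample estimates, so they fluctuate slightly around their expected values; a larger sample narrows the spread.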

6. Conclusions

This paper has provided an accessible and computationally grounded framework for understanding the deep connection between entropy and Kolmogorov complexity. Our central contribution is to show that this connection is not merely a mathematical abstraction but a direct consequence of the structural constraints governing how information is described.

Our constructive proof of a tighter Levin’s Coding Theorem reveals the explicit computational cost of specifying a microstate within a macrostate. This naturally leads to the “Not Alone” principle: a simple macrostate cannot contain an isolated complex microstate without its own description becoming complex. Together, these results demonstrate that the statistical properties of a system, such as its entropy, are fundamentally constrained by the algorithmic properties of its individual constituents. They provide a clear intuitive mechanism for why high-complexity states must appear in organized “families” within low-complexity observable systems. We end by exploring concrete properties in physics, resolving a few apparent paradoxes, and revealing how these laws are the statistical consequences of simple rules.

Ultimately, our work reinforces the view that the laws of thermodynamics and information theory are two sides of the same coin, both governed by the fundamental rules of computation and description.

Scope and Limitations

We acknowledge that this framework relies on the standard AIT assumption of a fixed optimal Universal Turing Machine, which introduces additive constants ( $[eqn]$ ) that are negligible only for sufficiently large systems. Furthermore, the mapping of continuous physical systems onto discrete strings requires coarse-graining, the specifics of which can influence the calculated complexity. This work is intended as an interpretive framework to build intuition, rather than a replacement for the rigorous formalisms of statistical mechanics or quantum information theory.

Bibliography


1. Boltzmann, L. Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen. Sitzungsberichte der Kaiserlichen Akademie der Wissenschaften in Wien; Wien, Austria, 1872; Volume 66, pp. 275–370.
2. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–424. doi:10.1002/j.1538-7305.1948.tb01338.x
3. Kolmogorov, A.N. Three approaches to the quantitative definition of information. Probl. Inf. Transm. 1965, 1, 1–7. doi:10.1080/00207166808803030
4. Li, M.; Vitányi, P.M.B. An Introduction to Kolmogorov Complexity and Its Applications, 4th ed.; Springer: Berlin, Germany, 2019.
5. Chaitin, G.J. A theory of program size formally identical to information theory. J. ACM 1975, 22, 329–340. doi:10.1145/321892.321894
6. Levin, L.A. Laws of information conservation (non-growth) and aspects of the foundation of probability theory. Probl. Inf. Transm. 1974, 10, 206–210.
7. Zurek, W.H. Algorithmic randomness and physical entropy. Phys. Rev. A 1989, 40, 4731–4751. doi:10.1103/PhysRevA.40.4731. PMID: 9902721
8. Elitzur, A.C. Let there be life: Thermodynamic reflections on biogenesis and evolution. J. Theor. Biol. 1994, 168, 429–459. doi:10.1006/jtbi.1994.1123. PMID: 8072301