Universal Encryption of Individual Sequences Under Maximal Information Leakage
Neri Merhav

Abstract
We consider the Shannon cipher system in the framework of individual sequences and finite-state encrypters under the metric of maximal information leakage. A lower bound and an asymptotically matching upper bound on the leakage are derived, which lead to the conclusion that asymptotically minimum leakage can be attained by Lempel–Ziv compression followed by one-time pad encryption of the compressed bitstream.
1. Introduction
The information-theoretic approach that combines individual-sequence modeling with finite-state encoders and decoders has been extensively developed, representing a notable departure from the conventional reliance on probabilistic models traditionally used in source and channel modeling. This paradigm shift has gained traction across multiple areas of information theory, including lossless and lossy source coding [1,2,3,4,5,6]; source/channel simulation [7]; hypothesis testing [8,9]; prediction and decision making [10,11,12]; filtering [13]; and even error correction coding [14,15,16]. A concise overview of this expanding body of work can be found in [17], though these citations represent only a small fraction of the broader literature. By sharp contrast, the domain of information-theoretic security has remained largely anchored in probabilistic methods from Shannon’s foundational contributions [18] to more contemporary developments [19,20,21,22,23]. While these examples are far from exhaustive, they underscore the field’s persistent adherence to probabilistic frameworks.
To the author’s knowledge, only two significant departures from the dominant probabilistic approach in information-theoretic security exist: one is an unpublished technical report by Ziv [24] and the other is a subsequent development documented in [25]. In his report, Ziv proposes a novel framework in which the plaintext, destined for encryption using a secret key, is modeled as an arbitrary individual sequence. Within this setup, the encrypter functions as a general block encoder, while the eavesdropper is equipped with a finite-state machine (FSM) designed to distinguish between potential candidates for estimating the plaintext. A central aspect of Ziv’s model is the assumption that the eavesdropper possesses partial prior knowledge about the plaintext, formalized as a set of “acceptable messages”, which he defines as the acceptance set. Before intercepting the ciphertext, the eavesdropper’s uncertainty is characterized by the possibility that the plaintext could be any member of this set. Encryption is deemed perfectly secure if the ciphertext provides no additional information, i.e., if it does not reduce the size of the acceptance set, and thus renders the eavesdropper’s uncertainty unchanged. Accordingly, the size of the acceptance set quantifies uncertainty: the larger the set, the less the eavesdropper knows. The FSM attempts to rule out unacceptable sequences by testing them across different key sequences. Perfect secrecy is thus defined by the ciphertext’s inability to eliminate any members of the acceptance set. Ziv’s key result is that the minimum asymptotic key rate required for perfect secrecy, under this definition, is lower-bounded by the Lempel–Ziv (LZ) complexity of the plaintext sequence [6]. Furthermore, this lower bound is asymptotically tight. Perfect security can be achieved by applying a one-time pad (bitwise XOR with key bits) to the LZ-compressed version of the plaintext. 
This mirrors Shannon’s classical finding that the key rate must match the entropy rate of the source. More recent work [26] has expanded and sharpened Ziv’s original ideas in several important respects.
The subsequent work [25] offers a different take on the modeling approach and on achieving perfect secrecy for individual sequences. Instead of focusing on a finite-state eavesdropper with predefined knowledge, this approach models the encrypter itself as a finite-state machine (FSM), which processes both the plaintext and a stream of random key bits in a sequential manner. Central to this framework is the introduction of a new notion called finite-state encryptability, in the footsteps of finite-state compressibility introduced in [6]. Finite-state encryptability is defined as the minimum key rate required by any FSM-based encrypter to ensure that a particular measure of normalized empirical mutual information between the plaintext and ciphertext converges to zero as the block length increases. One of the main theoretical results in [25] asserts that the finite-state encryptability of a given individual sequence is lower-bounded by its finite-state compressibility. Stated differently, no finite-state encrypter can use a key rate below the sequence compressibility without compromising security, as defined in this setting. Once again, this lower bound is not merely theoretical; it can be asymptotically achieved through the same two-step process that was mentioned above: first compressing the plaintext using the Lempel–Ziv (LZ) algorithm and then applying one-time pad encryption to the resulting compressed bitstream. This mirrors earlier results in both compression and security, illustrating a deep connection between the two domains.
In this paper, we adopt the same model setting as in [25], but with a different security metric: the maximum leakage of information, which was first introduced by Issa, Wagner, and Kamath in [27] and then further explored in several more recent works, including [28,29,30,31,32], among others. This metric is closely related to, and similarly motivated by, the earlier security measure proposed in [33], which defines security as a scenario where the correct decoding exponent of the plaintext is not improved by the availability of the ciphertext, compared to that of blind guessing. For more details, see the last paragraph of Section 2.2.1. The maximum leakage metric is defined in a more general form and has a relatively straightforward expression, as demonstrated in [27] and further clarified in the following sections. As will be discussed in the sequel, the maximum leakage metric is particularly well suited for the individual-sequence setting considered here, as it is weakly dependent on the probability distribution of the plaintext, depending only on its support.
We derive both a lower bound and an asymptotically matching upper bound on the leakage, leading yet again to the conclusion that asymptotically optimal performance can be achieved by applying LZ compression followed by one-time pad encryption of the compressed bitstream. Thus, in view of the above-mentioned earlier works [24,25,26], one of the messages of this work is that one-time pad encryption on top of LZ compression forms an asymptotically optimal cipher system in many respects. That said, we believe that the deeper and more interesting contribution of this work is the converse theorem (Theorem 1 in the sequel) and its proof, asserting that the key rate that must be consumed to encrypt an individual sequence cannot be much smaller than the LZ complexity of the sequence minus the allowed normalized maximal information leakage.
This paper is structured as follows: In Section 2, we establish notation conventions, provide some necessary background, and formulate the problem studied in this work. In Section 3, we assert the main results and discuss them. Finally, in Section 4, we prove Theorem 1, which is the converse theorem.
2. Notation Conventions, Background, and Problem Formulation
2.1. Notation Conventions
In this paper, we adopt the following notation rules: Scalar random variables (RVs) are represented using uppercase letters, while their realizations (sample values) are denoted by the corresponding lowercase letters. The sets of possible values (alphabets) for these variables are indicated using calligraphic letters. This notation extends naturally to random vectors and their realizations. Specifically, an n-dimensional random vector will be denoted by appending a superscript indicating the dimension to the scalar symbol. For instance, $X^n$ (n being a positive integer) refers to the random vector $(X_1,\ldots,X_n)$, and $x^n=(x_1,\ldots,x_n)$ denotes a particular instance of this vector, belonging to the set $\mathcal{X}^n$, the n-fold Cartesian product of the alphabet $\mathcal{X}$. Segment notations, such as $x_i^j$ and $X_i^j$, are used to represent substrings $(x_i,x_{i+1},\ldots,x_j)$ and $(X_i,X_{i+1},\ldots,X_j)$, respectively, for integers $i\le j$. When $i=1$, the index is dropped for brevity. If $i>j$, these notations are interpreted as representing the empty string. Additionally, for any real number u, the notation $[u]_+$ denotes $\max\{0,u\}$. Unless otherwise noted, all logarithms and exponential functions throughout this paper are taken to base 2.
Throughout this article, information sources and channels will be generically represented by the letter P, following standard textbook notation. Subscripts will indicate the relevant random variables and any sort of conditioning, when applicable. For example, $P_{X^n}(x^n)$ denotes the probability mass function of the random vector $X^n$ evaluated at $x^n$, while $P_{Y^n|X^n}(y^n|x^n)$ represents the conditional probability of $Y^n=y^n$ given $X^n=x^n$. These subscripts may be omitted when the meaning is clear from context. Information-theoretic functionals such as entropy, mutual information, and related quantities will be expressed using standard symbols and conventions widely adopted in the information theory literature. In the remainder of this work, the symbol $x^n=(x_1,\ldots,x_n)$ will refer to a specific input sequence intended for encryption. Each element $x_i$, $i=1,\ldots,n$, belongs to a finite input alphabet $\mathcal{X}$, whose size is denoted by $\alpha$.
2.2. Background
Before showing the main results and their proofs, let us revisit key terms and details related to the notion of maximal information leakage and the 1978 version of the LZ algorithm, also known as the LZ78 algorithm [6], which is the central building block in this work.
2.2.1. Maximal Leakage of Information
As mentioned in the introduction, we adopt the maximal leakage [27] as our secrecy metric. For a probabilistic plaintext source, the maximal leakage from a secret random variable X, distributed according to $P_X$, to another random variable Y, available to an adversary, and which is conditionally distributed given $X$ according to $P_{Y|X}$, is defined as
$$\mathcal{L}(X\to Y)=\sup_{U-X-Y-\hat{U}}\log\frac{\Pr\{U=\hat{U}\}}{\max_{u}P_U(u)},\tag{1}$$
where the supremum is over all finite-alphabet random variables U and $\hat{U}$, with the Markov structure $U-X-Y-\hat{U}$. In other words, it is the maximum possible difference between the logarithm of the probability of guessing correctly some (possibly randomized) function of X based on Y and the logarithm of the probability of guessing it correctly blindly.
In Theorem 1 of [27], it was asserted and proved that the leakage can be calculated relatively easily using the following formula:
$$\mathcal{L}(X\to Y)=\log\sum_{y\in\mathcal{Y}}\max_{x:\,P_X(x)>0}P_{Y|X}(y|x).\tag{2}$$
Clearly, if $P_{Y|X}(y|x)$ is independent of x for all $y\in\mathcal{Y}$, then $\mathcal{L}(X\to Y)=0$, which is the case of perfect secrecy. In general, the smaller $\mathcal{L}(X\to Y)$ is, the more secure the system is. In [27], it is shown that the maximal leakage has many interesting properties; one of them is that it satisfies a data processing inequality (see Lemma 1 of [27]). It is also shown in Section III of [27] that the maximal leakage has several additional operative meanings in addition to the original one explained above.
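As a small numerical illustration, the closed-form expression from Theorem 1 of [27] (the logarithm of the sum over y of the largest P(y|x) among the x values in the support) can be evaluated directly for small channels. The sketch below is only illustrative: the function name `maximal_leakage` and the three toy channels are assumptions introduced here, not taken from the paper.

```python
import math


def maximal_leakage(channel, support=None):
    """Maximal leakage in bits: log2 of the sum over y of max over x of P(y|x).

    channel[x][y] holds P(y|x); each row is assumed to sum to 1.
    support lists the x indices with positive probability (default: all rows).
    """
    rows = range(len(channel)) if support is None else support
    num_outputs = len(channel[0])
    total = sum(max(channel[x][y] for x in rows) for y in range(num_outputs))
    return math.log2(total)


# One-time pad on a single bit: the ciphertext is uniform whatever x is.
one_time_pad = [[0.5, 0.5],
                [0.5, 0.5]]

# No encryption: the ciphertext equals the plaintext.
identity = [[1.0, 0.0],
            [0.0, 1.0]]

# Binary symmetric channel with crossover probability 0.25.
noisy = [[0.75, 0.25],
         [0.25, 0.75]]

print(maximal_leakage(one_time_pad))  # 0.0
print(maximal_leakage(identity))      # 1.0
print(maximal_leakage(noisy))         # log2(1.5), roughly 0.585
```

The one-time pad row gives zero leakage because P(y|x) does not depend on x, while the unencrypted bit leaks one full bit; the noisy channel falls in between.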
Note that the dependence of $\mathcal{L}(X\to Y)$ on the distribution of the secret random variable, $P_X$, is rather weak, as it depends only on its support. When passing from single variables to vectors of length n, $\mathcal{L}(X^n\to Y^n)$ is defined in the same manner, except that x, y, and the remaining scalar quantities are replaced by their n-dimensional counterparts, $x^n$, $y^n$, and so on. In this case, the weak dependence of $\mathcal{L}(X^n\to Y^n)$ on $P_{X^n}$ makes it natural to use when $P_{X^n}$ is uncertain, or completely unknown, or even non-existent, such as in the individual sequence setting considered here. Accordingly, we adopt the simple definition
$$\mathcal{L}(X^n\to Y^n)=\log\sum_{y^n}\max_{x^n\in\mathcal{X}^n}P(y^n|x^n),\tag{3}$$
corresponding to the full support for $P_{X^n}$, which accounts for a worst-case approach. The operational significance of maximal information leakage in our setting can then be understood in two ways: (i) Considering the definition (1), it allows arbitrary probability distributions (without any assumed structure) on $\mathcal{X}^n$, including those that place almost all their mass on a single (unknown) arbitrary sequence, which is in the spirit of the individual-sequence setting considered here. (ii) Referring to Formula (2), it is evident that the leakage vanishes whenever $P(y^n|x^n)$ is independent of $x^n$, which is an indisputable characterization of perfect secrecy in the individual-sequence setting too, where no distribution is assumed on $x^n$.
As mentioned in the introduction, a somewhat different security metric was proposed in [33], but it is intimately related to the maximal information leakage considered here. In [33], the idea was to define a system as secure if the probability of guessing X correctly is essentially the same if Y is present or absent. More precisely, if X and Y are random vectors of dimension n, then a system is considered secure if the correct decoding exponent of X in the presence of Y is the same as if Y is absent. Specifically, the correct decoding probability of X based on Y is
which is closely related to
where is understood to designate the uniform distribution across ; accordingly, stands for the probability of correct decoding of a uniformly distributed X by an informed observer, namely one that has access to Y, whereas denotes the probability of correctly blind guessing the value of X (in the absence of Y).
2.2.2. Lempel–Ziv Parsing
The incremental parsing process of the LZ78 algorithm is a sequential method applied to an input vector over a finite alphabet. In this process, each new phrase is the shortest substring that has not appeared before as a complete parsed phrase, except possibly for the final (incomplete) phrase. For instance, if one applies incremental parsing to the sequence , the outcome is . Let designate the total number of phrases formed from by the incremental parsing process (in the example above, ). Also, let stand for the length of the LZ78 binary compressed representation for . Theorem 2 of [6] easily leads to the following inequality:
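$$LZ(x^n)\le c(x^n)\log c(x^n)+n\cdot\epsilon(n),\tag{6}$$
where $LZ(x^n)$ is the LZ78 code-length of $x^n$, $c(x^n)$ is the number of parsed phrases, and $\epsilon(n)$ tends to zero as $n\to\infty$. In other words, the LZ code-length for $x^n$ cannot exceed an expression whose dominant term is $c(x^n)\log c(x^n)$. On the other hand, it turns out that $c(x^n)\log c(x^n)$ is also the dominant term of a lower bound (see Theorem 1 of [6]) to the minimum code-length attainable by any information lossless finite-state encoder with no more than s states, provided that $\log(s^2)$ is very small compared to $\log c(x^n)$. In view of these facts, we will be referring to $c(x^n)\log c(x^n)$ as the unnormalized LZ complexity of $x^n$, whereas the normalized LZ complexity will be defined as
$$\rho_{LZ}(x^n)=\frac{c(x^n)\log c(x^n)}{n}.\tag{7}$$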
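To make the incremental parsing rule concrete, here is a minimal sketch of LZ78 parsing together with the phrase count and the normalized quantity (number of phrases times its base-2 logarithm, divided by the sequence length). The function names and the input string are hypothetical choices made for illustration only.

```python
import math


def lz78_parse(sequence):
    """Incremental (LZ78) parsing: each phrase is the shortest prefix of the
    remaining input that has not yet appeared as a complete phrase."""
    seen = set()
    phrases = []
    current = ""
    for symbol in sequence:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # The final phrase may be incomplete, i.e., repeat an earlier phrase.
        phrases.append(current)
    return phrases


def normalized_lz_complexity(sequence):
    """c * log2(c) / n, where c is the number of parsed phrases."""
    if not sequence:
        return 0.0
    c = len(lz78_parse(sequence))
    return c * math.log2(c) / len(sequence)


example = "abbabaabbaaabaa"  # hypothetical input string
phrases = lz78_parse(example)
print(phrases)       # ['a', 'b', 'ba', 'baa', 'bb', 'aa', 'ab', 'aa']
print(len(phrases))  # 8
print(normalized_lz_complexity(example))
```

Note that the last phrase, `aa`, repeats an earlier one, which is exactly the "final (incomplete) phrase" exception in the parsing rule.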
2.3. Problem Formulation
Following the approach in [25], we adopt a finite-state model for encryption, described by the sextuple
where is a finite input alphabet with cardinality , is a finite collection of binary strings of variable length, possibly including the empty string (with zero length), is a finite set representing the internal states of the encrypter, is the output function, is the state transition function, and indicates the number of key bits used at each step. The encrypter E processes two infinite input sequences: a plaintext sequence , where each , and a key sequence , where each . Given these inputs, the encrypter generates an infinite ciphertext sequence , with each , while simultaneously transitioning through a corresponding state sequence , where each . The evolution of these sequences is governed by a set of recursive equations, applied iteratively for each time step
Here, the initial state of the encrypter, , is fixed and will be referred to as throughout. It is understood that when , the encrypter uses no key bits at step i. In this case, the key fragment is defined to be the empty string . Similarly, if the output , then no output is generated at that step; the system effectively idles, meaning only the internal state changes in response to the input. More specifically, at each time step i, given the current state and the incoming plaintext symbol , the encrypter consumes the next unused bits from the key sequence u to form . It then produces an output (which may be empty) and transitions to the next state based on the function g. In summary, the operation at time i proceeds as follows: The encrypter is in state , it receives input symbol ; it consumes key bits from u, forming ; it generates output ; and it updates its state to .
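The step-by-step operation described above can be sketched as a short driver loop. This is only an illustration under stated assumptions: the names `run_encrypter`, `f`, `g`, and `delta` are hypothetical, the key-consumption function is assumed to take the current state and input symbol as arguments, and the two toy machines over the alphabet {0, 1} are invented for the example.

```python
def run_encrypter(f, g, delta, initial_state, plaintext, key_bits):
    """Drive a finite-state encrypter over a plaintext string.

    f(state, symbol, key_fragment) -> output string (may be empty)
    g(state, symbol)               -> next state (does not see the key)
    delta(state, symbol)           -> number of key bits consumed at this step
    Returns the concatenated ciphertext and the number of key bits used.
    """
    state = initial_state
    position = 0  # index of the next unused key bit
    outputs = []
    for symbol in plaintext:
        used = delta(state, symbol)
        fragment = key_bits[position:position + used]
        if len(fragment) < used:
            raise ValueError("key sequence exhausted")
        position += used
        outputs.append(f(state, symbol, fragment))
        state = g(state, symbol)
    return "".join(outputs), position


def xor_or_pass(state, symbol, fragment):
    """XOR the symbol with the key bit if one was consumed, else pass it through."""
    return str(int(symbol) ^ int(fragment)) if fragment else symbol


# Toy machine 1: a single state that pads every bit (one key bit per symbol).
cipher1, used1 = run_encrypter(
    xor_or_pass, lambda z, x: z, lambda z, x: 1, 0, "1011", "0110")

# Toy machine 2: two alternating states; only state 0 consumes a key bit.
cipher2, used2 = run_encrypter(
    xor_or_pass, lambda z, x: 1 - z, lambda z, x: 1 if z == 0 else 0,
    0, "1011", "01")

print(cipher1, used1 / 4)  # 1101 1.0
print(cipher2, used2 / 4)  # 1001 0.5
```

The second machine consumes key bits at half the rate of the first but leaves every other plaintext bit exposed, which illustrates the trade-off between key rate and leakage that the paper quantifies.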
Remark 1. Note that the evolution of the state variable depends solely on the source inputs and is independent of the key bits. This design choice reflects the intended role of , which is to retain memory of the source sequence , allowing the encrypter to exploit empirical correlations and repetitive patterns within the plaintext. In contrast, maintaining memory of past key bits—which are assumed to be independent and identically distributed (i.i.d.)—offers no practical benefit and is therefore omitted. Moreover, the model can be naturally extended to include two state variables: one that evolves based only on the source sequence (as in the current setup) and another that evolves based on both and the consumed key bits . In such a framework, the first state variable would continue to govern the update of the index , while the second could influence the output function, allowing for more expressive or adaptive encryption mechanisms.
An encrypter with s states, or an s-state encrypter, E, is one with . It is assumed that the plaintext sequence x is deterministic (i.e., an individual sequence), whereas the key sequence u is purely random, i.e., for every positive integer n, .
A few additional notation conventions will be convenient: By , ( ) we refer to the vector produced by E in response to the inputs and when the initial state is . Similarly, the notation will mean that the state and will designate under the same circumstances.
As explained in Section 2.2, we adopt the maximal leakage of information as our security metric, given by
An encryption system E is said to be perfectly secure if for every positive integer n, . If as , we say that the encryption system is asymptotically secure.
An encrypter is referred to as information lossless (IL) if for every , every sufficiently large n and all pairs , the quadruple uniquely determines . Given an encrypter E and an input string , the encryption key rate of with respect to E is defined as
where is the length of the binary string and is the total length of .
Remark 2. It is worth noting that the definition of information losslessness used here is more relaxed and, thus, more general than the one given in [6]. In [6], the requirement must hold for every positive integer n, whereas in the present context, it is only required to hold for all sufficiently large n. The absence of information losslessness in the stricter sense of [6] does not contradict the ability of the legitimate decoder to reconstruct the source. Rather, it implies that reconstructing may require more than just the tuple ; for example, some additional data from times later than may be needed.
The set of all perfectly secure, IL encrypters with no more than s states will be denoted by . The minimum of over all encrypters in will be denoted by , i.e.,
Finally, let
and define the finite-state encryptability of x as
Our purpose is to characterize these quantities and to point out how they can be achieved in principle.
3. Main Results
Our converse theorem, whose proof appears in Section 4, is the following:
Theorem 1. For every information lossless encrypter E with no more than s states,
where for every fixed s. Equivalently, if for some given constant , then for every and every information lossless encrypter ,
As for achievability, consider first an arbitrary lossless compression scheme that compresses at a compression ratio of and then applies one-time pad encryption to compressed bits. Let denote the resulting (partially) encrypted compressed representation of . Since the key bits are purely random, the probability of any that can be obtained from some is exactly and zero if cannot be obtained from any . In other words, . Obviously, the length of , denoted as , is equal to . Therefore, by denoting , we have the following:
and, therefore,
If , then the dominant term is clearly .
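The compress-then-pad construction can be sketched in a few lines. The sketch below rests on several assumptions: it uses Python's `zlib` (DEFLATE, an LZ77-based scheme) purely as a readily available stand-in for the LZ78 compressor discussed in the paper, it pads every compressed bit rather than only some of them, and the function names and the test message are hypothetical.

```python
import secrets
import zlib


def encrypt(plaintext: bytes):
    """Compress, then XOR every compressed byte with a fresh random key byte."""
    compressed = zlib.compress(plaintext, 9)
    key = secrets.token_bytes(len(compressed))  # one-time pad, never reused
    ciphertext = bytes(c ^ k for c, k in zip(compressed, key))
    return ciphertext, key


def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """Remove the pad, then decompress."""
    compressed = bytes(c ^ k for c, k in zip(ciphertext, key))
    return zlib.decompress(compressed)


message = b"abababababababababababababababab" * 8  # highly compressible input
ciphertext, key = encrypt(message)

print(decrypt(ciphertext, key) == message)  # True
print(len(message) * 8, len(key) * 8)       # plaintext bits vs. key bits used
```

The number of key bits equals the compressed length, so a compressible sequence consumes far fewer key bits than padding the raw plaintext would.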
Remark 3. The condition that is always easy to satisfy via a minor modification of any given compression scheme (if it does not satisfy the condition in the first place). First, test whether or . If , add a header bit ‘0’ before the compressed representation of ; otherwise, add a header bit ‘1’ and then add the uncompressed binary representation of using bits. The resulting code-length would then be bits.
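The header-bit modification in Remark 3 can be sketched as follows. Because the exact threshold expressions are not legible in the text above, the sketch assumes the comparison is between the compressed length and the raw fixed-length representation of n symbols at ⌈log2 α⌉ bits each; the function name and arguments are hypothetical.

```python
import math


def encode_with_fallback(compressed_bits, symbols, alphabet):
    """Prefix '0' to the compressed bits if they are no longer than the raw
    representation; otherwise prefix '1' to the raw fixed-length encoding."""
    # Assumption: raw encoding uses ceil(log2(alphabet size)) bits per symbol.
    bits_per_symbol = max(1, math.ceil(math.log2(len(alphabet))))
    raw_length = len(symbols) * bits_per_symbol
    if len(compressed_bits) <= raw_length:
        return "0" + compressed_bits
    index = {a: i for i, a in enumerate(alphabet)}
    raw = "".join(format(index[s], f"0{bits_per_symbol}b") for s in symbols)
    return "1" + raw


print(encode_with_fallback("0101", "abab", "ab"))    # 00101
print(encode_with_fallback("010101", "abab", "ab"))  # 10101
```

In both branches the output is at most one bit longer than the shorter of the two candidate representations, so the decoder can read the header bit and pick the matching decoding rule.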
If the compression scheme is chosen to be the LZ78 algorithm, then
which essentially meets the converse bound (16). We have therefore proved the following direct theorem:
Theorem 2. Given , there exists a universal encrypter that satisfies
and for every ,
Discussion.
A few comments are now in order.
We established both a lower bound and an asymptotically matching upper bound on the information leakage, leading once again to the conclusion that asymptotically optimal performance can be achieved by applying Lempel–Ziv (LZ) compression followed by one-time pad encryption of the compressed bitstream. As discussed in the introduction, together with earlier works, such as [24,25,26], this reinforces the message that one-time pad encryption applied after LZ compression yields an asymptotically optimal cipher system in several important respects. That said, we believe the deeper and more significant contribution of this work lies in the converse theorem (Theorem 1), which shows that the key rate required to securely encrypt an individual sequence cannot be substantially smaller than its LZ complexity minus the permitted normalized maximal information leakage, no matter what encryption strategy is employed.
As in [6], there is formally a certain gap between the converse theorem and the achievability scheme in its basic form, when examined from the viewpoint of the number of states, s, relative to n. While s should be small relative to n for the lower bound to be essentially $c(x^n)\log c(x^n)$ (see Section 2.2 above), the number of states actually needed to implement LZ78 compression for a sequence of length n is basically exponential in n. In [6], the gap is closed in the limit of $s\to\infty$ (after taking the limit $n\to\infty$) by subdividing the sequence into blocks and restarting the LZ algorithm at the beginning of every block. A similar comment applies here too in the double limit of achieving the finite-state encryptability.
As discussed in [26] in a somewhat different context, as an alternative to the LZ78 algorithm, asymptotically optimum performance can also be attained by a universal compression scheme for the class of k-th order Markov sources, where k is chosen to be sufficiently large. In this case, the normalized LZ complexity in Theorems 1 and 2 should be replaced by the k-th order empirical entropy, and some redundancy terms should be modified. However, one of these redundancy terms is $(\log s)/k$, which means that in order to compete with the best encrypter with s states, k must be chosen to be significantly larger than $\log s$, so as to make this term reasonably small.
We speculate that it may not be difficult to extend our findings in several directions, including lossy reconstruction, the presence of side information at either party, the combination of both, and successive refinement systems in accordance with [32]. Other potentially interesting extensions lie in broadening the scope of the FSM model to larger classes of machines, including FSMs with counters, shift-register machines with counters, and periodically time-varying FSMs with counters, as was carried out in Section III of [26]. Some of these directions will be explored in future studies.
4. Proof of Theorem 1
First, observe that
where is the joint empirical distribution of derived from . It is therefore seen that depends on only via . Accordingly, in the sequel, we will also use the alternative notation when we wish to emphasize the dependence on . Let denote the set of , which, together with their associated state sequences, share the same empirical PMF as that of along with its state sequence. Similarly as with , we also denote it by . In the sequel, we will make use of the inequality
where as for a fixed s at the rate of . The proof of Equation (23), which appears in various forms and variations in earlier papers (see, e.g., [34]), is provided in Appendix A for the sake of completeness (see also the related Ziv inequality in Lemma 13.5.5 of [35]).
For later use, we also define the following sets:
where
In other words, is the set of all type classes of plaintext sequences for which some members can be mapped to by some key bit strings, whereas is the set of ciphertext sequences which can be obtained by some member of type class and some key bit string. Now, observe that
where the last inequality follows from the following consideration: Let exhaust all members of . For each such , let . Now, for every , let denote the subset of for which , and we have already defined to denote the set of corresponding output sequences, . Obviously, since form a partition of , then for some , . Therefore,
where the equality holds since the mapping between and is one-to-one given that , , and by the postulated information losslessness, provided that n is sufficiently large as required. Now, let denote the set of all for which for some . Then,
where, in (a), we used the fact that implies (because implies that there is at least one such that and the probability of each such is ), and where is the number of different type classes, , which is upper-bounded by . In (b), we used Equation (27); in (c), we used Equation (23). Finally, the operator that appears in the assertion of Theorem 1 is due to the additional trivial lower bound . This completes the proof of Theorem 1.
- [1] Kieffer, J.C.; Yang, E.-H. Sequential Codes, Lossless Compression of Individual Sequences, and Kolmogorov Complexity; Technical Report 1993-3; Information Theory Research Group, University of Minnesota: Minneapolis, MN, USA, 1993.
- [2] Yang, E.-H.; Kieffer, J.C. Simple universal lossy data compression schemes derived from the Lempel–Ziv algorithm. IEEE Trans. Inform. Theory 1996, 42, 239–245. doi:10.1109/18.481794
- [3] Ziv, J. Coding theorems for individual sequences. IEEE Trans. Inform. Theory 1978, 24, 405–412. doi:10.1109/TIT.1978.1055911
- [4] Ziv, J. Distortion–rate theory for individual sequences. IEEE Trans. Inform. Theory 1980, 26, 137–143. doi:10.1109/TIT.1980.1056164
- [5] Ziv, J. Fixed-rate encoding of individual sequences with side information. IEEE Trans. Inf. Theory 1984, 30, 348–452. doi:10.1109/TIT.1984.1056878
- [6] Ziv, J.; Lempel, A. Compression of individual sequences via variable-rate coding. IEEE Trans. Inform. Theory 1978, 24, 530–536. doi:10.1109/TIT.1978.1055934
- [7] Seroussi, G. On universal types. IEEE Trans. Inform. Theory 2006, 52, 171–189. doi:10.1109/TIT.2005.860437
- [8] Ziv, J. Compression, tests for randomness, and estimating the statistical model of an individual sequence. In Sequences: Combinatorics, Compression, Security, and Transmission; Capocelli, R.M., Ed.; Springer Verlag: New York, NY, USA, 1990; pp. 366–373.
