Property Directed Self Composition
Ron Shemer, Arie Gurfinkel, Sharon Shoham, Yakir Vizel

TL;DR
This paper introduces a property-directed inference algorithm for automatically inferring semantic self composition functions and invariants to verify k-safety properties, improving over existing tools.
Contribution
It presents a novel algorithm that infers self composition functions and invariants simultaneously, enhancing verification capabilities for k-safety properties.
Findings
Successfully infers complex self compositions beyond existing tools.
Demonstrates effectiveness on various verification benchmarks.
Improves automation in verifying k-safety properties.
Abstract
We address the problem of verifying k-safety properties: properties that refer to k-interacting executions of a program. A prominent way to verify k-safety properties is by self composition. In this approach, the problem of checking k-safety over the original program is reduced to checking an "ordinary" safety property over a program that executes k copies of the original program in some order. The way in which the copies are composed determines how complicated it is to verify the composed program. We view this composition as provided by a semantic self composition function that maps each state of the composed program to the copies that make a move. Since the "quality" of a self composition function is measured by the ability to verify the safety of the composed program, we formulate the problem of inferring a self composition function together with the inductive invariant needed to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
\newfloatcommand
capbtabboxtable[][\FBwidth]
11institutetext: Tel Aviv University 22institutetext: University of Waterloo 33institutetext: The Technion
Property Directed Self Composition
Ron Shemer 11
Arie Gurfinkel 22
Sharon Shoham 11
Yakir Vizel 22
Abstract
We address the problem of verifying -safety properties: properties that refer to interacting executions of a program. A prominent way to verify -safety properties is by self composition. In this approach, the problem of checking -safety over the original program is reduced to checking an “ordinary” safety property over a program that executes copies of the original program in some order. The way in which the copies are composed determines how complicated it is to verify the composed program. We view this composition as provided by a semantic self composition function that maps each state of the composed program to the copies that make a move. Since the “quality” of a self composition function is measured by the ability to verify the safety of the composed program, we formulate the problem of inferring a self composition function together with the inductive invariant needed to verify safety of the composed program, where both are restricted to a given language. We develop a property-directed inference algorithm that, given a set of predicates, infers composition-invariant pairs expressed by Boolean combinations of the given predicates, or determines that no such pair exists. We implemented our algorithm and demonstrate that it is able to find self compositions that are beyond reach of existing tools.
1 Introduction
Many relational properties, such as noninterference [12], determinism [21], service level agreements [9], and more, can be reduced to the problem of -safety. Namely, reasoning about different traces of a program simultaneously. A common approach to verifying -safety properties is by means of self composition, where the program is composed with copies of itself [4, 31]. A state of the composed program consists of the states of each copy, and a trace naturally corresponds to traces of the original program. Therefore, -safety properties of the original program become ordinary safety properties of the composition, hence reducing -safety verification to ordinary safety. This enables reasoning about -safety properties using any of the existing techniques for safety verification such as Hoare logic [20] or model checking [7].
While self composition is sound and complete for -safety, its applicability is questionable for two main reasons:
(i) considering several copies of the program greatly increases the state space; and
(ii) the way in which the different copies are composed when reducing the problem to safety verification affects the complexity of the resulting self composed program, and as such affects the complexity of verifying it.
Improving the applicability of self composition has been the topic of many works [2, 29, 14, 18, 26, 32]. However, most efforts are focused on compositions that are pre-defined, or only depend on syntactic similarities.
In this paper, we take a different approach; we build upon the observation that by choosing the “right” composition, the verification can be greatly simplified by leveraging “simple” correlations between the executions. To that end, we propose an algorithm, called Pdsc, for inferring a property directed self composition. Our approach uses a dynamic composition, where the composition of the different copies can change during verification, directed at simplifying the verification of the composed program.
Compositions considered in previous work differ in the order in which the copies of the program execute: either synchronously, asynchronously, or in some mix of the two [33, 3, 14]. To allow general compositions, we define a composition function that maps every state of the composed program to the set of copies that are scheduled in the next step. This determines the order of execution for the different copies, and thus induces the self composed program. Unlike most previous works where the composition is pre-defined based on syntactic rules only, our composition is semantic as it is defined over the state of the composed program.
To capture the difficulty of verifying the composed program, we consider verification by means of inferring an inductive invariant, parameterized by a language for expressing the inductive invariant. Intuitively, the more expressive the language needs to be, the more difficult the verification task is. We then define the problem of inferring a composition function together with an inductive invariant for verifying the safety of the composed program, where both are restricted to a given language. Note that for a fixed language , an inductive invariant may exist for some composition function but not for another 111See Appendix 0.B for an example that requires a non-linear inductive invariant with a composition that is based on the control structure but has a linear invariant with another. . Thus, the restriction to defines a target for the inference algorithm, which is now directed at finding a composition that admits an inductive invariant in .
Example 1
To demonstrate our approach, consider the program in Figure 1. The program inserts a new value into an array. We assume that the array and its length are “low”-security variables, while the inserted value is “high”-security. The first loop finds the location in which will be inserted. Note that the number of iterations depends on the value of . Due to that, the second loop executes to ensure that the output (which corresponds to the number of iterations) does not leak sensitive data. As an example, we emphasize that without the second loop, could leak the location of in . To express the property that does not leak sensitive data, we use the 2-safety property that in any two executions, if the inputs and are the same, so is the output .
To verify the 2-safety property, consider two copies of the program. Let the language for verifying the self composition be defined by the predicates depicted in Figure 1. The most natural self composition to consider is a lock-step composition, where the copies execute synchronously. However, for such a composition the composed program may reach a state where, for example, . This occurs when the first copy exists the first loop, while the second copy is still executing it. Since the language cannot express this correlation between the two copies, no inductive invariant suffices to verify that when the program terminates.
In contrast, when verifying the 2-safety property, Pdsc directs its search towards a composition function for which an inductive invariant in does exist. As such, it infers the composition function depicted in Figure 1, as well as an inductive invariant in . The invariant for this composition implies that at every state.
As demonstrated by the example, Pdsc focuses on logical languages based on predicate abstraction [17], where inductive invariants can be inferred by model checking. In order to infer a composition function that admits an inductive invariant in , Pdsc starts from a default composition function, and modifies its definition based on the reasoning performed by the model checker during verification. As the composition function is part of the verified model (recall that it is defined over the program state), different compositions are part of the state space explored by the model checker. As a result, a key ingredient of Pdsc is identifying “bad” compositions that prevent it from finding an inductive invariant in . It is important to note that a naive algorithm that tries all possible composition functions has a time complexity , where is the set of predicates considered. However, integrating the search for a composition function into the model checking algorithm allows us to reduce the time complexity of the algorithm to , where we show that the problem is in fact PSPACE-hard.
We implemented Pdsc using SeaHorn [19], Z3 [25] and Spacer [22] and evaluated it on examples that demonstrate the need for nontrivial semantic compositions. Our results clearly show that Pdsc can solve complex examples by inferring the required composition, while other tools cannot verify these examples. We emphasize that for these particular examples, lock-step composition is not sufficient. We also evaluated Pdsc on the examples from [29, 26] that are proven with the trivial lock-step composition. On these examples, Pdsc is comparable to state of the art tools.
1.0.1 Related work.
This paper addresses the problem of verifying k-safety properties (also called hyperproperties [8]) by means of self composition. Other approaches tackle the problem without self-composition, and often focus on more specific properties, most noticeably the -safety noninterference property (e.g. [1, 32]). Below we focus on works that use self-composition.
Previous work such as [4, 2, 3, 15, 31, 14] considered self composition (also called product programs) where the composition function is constant and set a-priori, using syntax-based hints. While useful in general, such self compositions may sometimes result in programs that are too complex to verify. This is in contrast to our approach, where the composition function is evolving during verification, and is adapted to the capabilities of the model checker.
The work most closely related to ours is [29] which introduces Cartesian Hoare Logic (CHL) for verification of -safety properties, and designs a verification framework for this logic. This work is further improved in [26]. These works search for a proof in CHL, and in doing so, implicitly modify the composition. Our work infers the composition explicitly and can use off-the-shelf model checking tools. More importantly, when loops are involved both [29] and [26] use lock-step composition and align loops syntactically. Our algorithm, in contrast, does not rely on syntactic similarities, and can handle loops that cannot be aligned trivially.
There have been several results in the context of harnessing Constraint Horn Clauses (CHC) solvers for verification of relational properties [11, 24]. Given several copies of a CHC system, a product CHC system that synchronizes the different copies is created by a syntactical analysis of the rules in the CHC system. These works restrict the synchronization points to CHC predicates (i.e., program locations), and consider only one synchronization (obtained via transformations of the system of CHCs). On the other hand, our algorithm iteratively searches for a good synchronization (composition), and considers synchronizations that depend on program state.
Equivalence checking and regression verification.
Equivalence checking is another closely related research field, where a composition of several programs is considered. As an example, equivalence checking is applied to verify the correctness of compiler optimizations [33, 28, 10, 18]. In [28] the composition is determined by a brute-force search for possible synchronization points. While this brute-force search resembles our approach for finding the correct composition, it is not guided by the verification process. The works in [10, 18] identify possible synchronization points syntactically, and try to match them during the construction of a simulation relation between programs.
Regression verification also requires the ability to show equivalence between different versions of a program [15, 16, 30]. The problem of synchronizing unbalanced loops appears in [30] in the form of unbalanced recursive function calls. To allow synchronization in such cases, the user can specify different unrolling parameters for the different copies. In contrast, our approach relies only on user supplied predicates that are needed to establish correctness, while synchronization is handled automatically.
2 Preliminaries
In this paper we reason about programs by means of the transition systems defining their semantics. A transition system is a tuple , where is a set of states, is a transition relation that specifies the steps in an execution of the program, and is a set of terminal states such that every terminal state has an outgoing transition to itself and no additional transitions (terminal states allow us to reason about pre/post specifications of programs). An execution or trace is a (finite or infinite) sequence of states such that for every , . The execution is terminating if there exists such that . In this case, the suffix of the execution is of the form and we say that ends at .
As usual, we represent transition systems using logical formulas over a set of variables, corresponding to the program variables. We denote the set of variables by . The set of terminal states is represented by a formula over and the transition relation is represented by a formula over , where represents the pre-state of a transition and represents its post-state. In the sequel, we use sets of states and their symbolic representation via formulas interchangeably.
Safety and inductive invariants.
We consider safety properties defined via pre/post conditions.222Our results can be extended to arbitrary safety (and -safety) properties by introducing “observable” states to which the property may refer. A safety property is a pair where are formulas over , representing subsets of , denoting the pre- and post-condition, respectively. satisfies , denoted , if every terminating execution of that starts in a state such that ends in a state such that . In other words, for every state that is reachable in from a state in pre we have that .
A prominent way to verify safety properties is by finding an inductive invariant. An inductive invariant for a transition system and a safety property is a formula such that
(1) (initiation),
(2) (consecution), and
(3) (safety),
where denotes the validity of , and denotes , i.e., the formula obtained after substituting every by the corresponding . If there exists such an inductive invariant, then .
-safety.
A -safety property refers to interacting executions of . Similarly to an ordinary property, it is defined by , except that pre and post are defined over where denotes the th copy of the program variables. As such, pre and post represent sets of -tuples of program states (-states for short): for a -tuple of states and a formula over , we say that if is satisfied when for each , the assignment of is determined by . We say that satisfies , denoted , if for every terminating executions of that start in states , respectively, such that , it holds that they end in states , respectively, such that .
For example, the non interference property may be specified by the following -safety property:
[TABLE]
where and denote subsets of the program inputs, resp. outputs, that are considered “low security” and the rest are classified as “high security”. This property asserts that every terminating executions that start in states that agree on the “low security” inputs end in states that agree on the low security outputs, i.e., the outcome does not depend on any “high security” input and, hence, does not leak secure information.
Checking -safety properties reduces to checking ordinary safety properties by creating a self composed program that consists of copies of the transition system, each with its own copy of the variables, that run in parallel in some way. Thus, the self composed program is defined over variables , where denotes the variables associated with the th copy. For example, a common composition is a lock-step composition in which the copies execute simultaneously. The resulting composed transition system is defined such that , and . Note that is defined over (as usual). Then, the -safety property is satisfied by if and only if an ordinary safety property is satisfied by . More general notions of self composition are investigated in Section 3.
3 Inferring Self Compositions for Restricted Languages of Inductive Invariants
Any self-composition is sufficient for reducing -safety to safety, e.g., lock-step, sequential, synchronous, asynchronous, etc. However, the choice of the self-composition used determines the difficulty of the resulting safety problem. Different self composed programs would require different inductive invariants, some of which cannot be expressed in a given logical language.
In this section, we formulate the problem of inferring a self composition function such that the obtained self composed program may be verified with a given language of inductive invariants. We are, therefore, interested in inferring both the self composition function and the inductive invariant for verifying the resulting self composed program. We start by formulating the kind of self compositions that we consider.
In the sequel, we fix a transition system with a set of variables .
3.1 Semantic Self Composition
Roughly speaking, a self composition of consists of copies of that execute together in some order, where steps may interleave or be performed simultaneously. The order is determined by a self composition function, which may also be viewed as a scheduler that is responsible for scheduling a subset of the copies in each step. We consider semantic compositions in which the order may depend on the states of the different copies, as well as the correlations between them (as opposed to syntactic compositions that only depend on the control locations of the copies, but may not depend on the values of other variables):
Definition 1 (Semantic Self Composition Function)
A semantic self composition function (-composition function for short) is a function , mapping each -state to a nonempty set of copies that are to participate in the next step of the self composed program333We consider memoryless composition functions. Compositions that depend on the history of the (joint) execution are supported via ghost state added to the program to track the history..
We represent a -composition function by a set of logical conditions, with a condition for every nonempty subset of the copies. For each such , the condition is defined over , and hence it represents a set of -states, with the meaning that all the -states that satisfy are mapped to by :
[TABLE]
To ensure that the function is well defined, we require that , which ensures that every -state satisfies at least one of the conditions. We also require that for every , , hence every -state satisfies at most one condition. Together these requirements ensure that the conditions induce a partition of the set of all -states. In the sequel, we identify a -composition function with its symbolic representation via conditions and use them interchangeably.
Definition 2 (Composed Program)
Given a -composition function , represented via conditions for every nonempty set , we define the self composition of to be the transition system over variables defined as follows: , where , and
[TABLE]
Thus, in , the set of states consists of -states (), the terminal states are -states in which all the individual states are terminal, and the transition relation includes a transition from to if and only if and
[TABLE]
That is, every transition of corresponds to a simultaneous transition of a subset of the copies of , where the subset is determined by the self composition function . If , then for every we say that is scheduled in .
Example 2
A self composition that runs the copies of sequentially, one after the other, corresponds to a -composition function defined by where is the minimal index of a non-terminal state in . If all states in are terminal then (or any other index). This is encoded as follows: for every , , and for every other .
Example 3
The lock-step composition that runs the copies of synchronously corresponds to a -self composition function defined by , and encoded by and for every other .
In order to ensure soundness of a reduction of -safety to safety via self composition, one has to require that the self composition function does not “starve” any copy of the transition system that is about to terminate if it continues to execute. We refer to this requirement as fairness.
Definition 3 (Fairness)
A -self composition function is fair if for every terminating executions of there exists an execution of such that for every copy , the projection of to is .
Note that by the definition of the terminal states of , as above is guaranteed to be terminating. We say that the th copy terminates in if contains a -state such that . Fairness may be enforced in a straightforward way by requiring that whenever , the set includes no index for which , unless all have terminated. Since we assume that terminal states may only transition to themselves, a weaker requirement that suffices to ensure fairness is that includes at least one index for which , unless there is no such index.
The following claim is now straightforward:
Lemma 1
Let be a transition system, a -safety property, and a fair -composition function for and . Then
[TABLE]
Proof (sketch)
Every terminating execution of corresponds to terminating executions of . Fairness of ensures that the converse also holds.
To demonstrate the necessity of the fairness requirement, consider a (non-fair) self composition function that maps every state to . Then, regardless of what the actual transition system does, the resulting self composition satisfies every pre-post specification vacuously, as it never reaches a terminal state.
Remark 1
While we require the conditions defining a self composition function to induce a partition of in order to ensure that is well defined as a (total) function, the requirement may be relaxed in two ways. First, we may allow and to overlap. This will add more transitions and may make the task of verifying the composed program more difficult, but it maintains the soundness of the reduction. Second, it suffices that the conditions cover the set of reachable states of the composed program rather than the entire state space. These relaxations do not damage soundness. Technically, this means that represented by the conditions is a relation rather than a function. We still refer to it as a function and write to indicate that , not excluding the possibility that for as well. We note that as long as the language used to describe compositions is closed under Boolean operations, we can always extract from the conditions a function . This is done as follows:
- •
To prevent the overlap between conditions, determine an arbitrary total order on the sets and set .
- •
To ensure that the conditions cover the entire state space, set .
It is easy to verify that defined by is a total self composition function and that if is fair, then so is .
3.2 The Problem of Inferring Self Composition with Inductive Invariant
Lemma 1 states the soundness of the reduction of -safety to ordinary safety. Together with the ability to verify safety by means of an inductive invariant, this leads to a verification procedure. However, while soundness of the reduction holds for any self composition, an inductive invariant in a given language may exist for the composed program resulting from some compositions but not from others. We therefore consider the self composition function and the inductive invariant together, as a pair, leading to the following definition.
Definition 4
Let be a transition system and a safety property. For a formula over and a self composition function represented by conditions , we say that is a composition-invariant pair for and if the following conditions hold:
- •
(initiation of ),
- •
for every , (consecution of for ),
- •
\mathit{Inv}\Rightarrow\big{(}(\bigwedge_{j=1}^{k}F^{j})\rightarrow\textit{post}\big{)} (safety of ),
- •
( covers the reachable states),
- •
for every , ( is fair).
As commented in Remark 1, we relax the requirement that to , thus ensuring that the conditions cover all the reachable states. Since the reachable states of are determined by (which define ), this reveals the interplay between the self composition function and the inductive invariant. Furthermore, we do not require that for , hence a -state may satisfy multiple conditions. As explained earlier, these relaxations do not damage soundness. Furthermore, if we construct from a self composition function as described in Remark 1, would be an inductive invariant for as well.
Lemma 2
If there exists a composition-invariant pair for and , then .
Proof (sketch)
If is a composition-invariant pair, then is an inductive invariant for , where is a fair composition function defined as in Remark 1. From Lemma 1 we conclude that .
If we do not restrict the language in which and are specified, then the converse also holds. However, in the sequel we are interested in the ability to verify -safety with a given language, e.g., one for which the conditions of Definition 4 belong to a decidable fragment of logic and hence can be discharged automatically.
Definition 5 (Inference in )
Let be a logical language. The problem of inferring a composition-invariant pair in is defined as follows. The input is a transition system and a -safety property . The output is a composition-invariant pair for and (as defined in Definition 4), where and is represented by conditions such that for every . If no such pair exists, the output is “no solution”.
When no solution exists, it does not necessarily mean that . Instead, it may be that the language is simply not expressive enough. Unfortunately, for expressive languages (e.g., quantified formulas or even quantifier free linear integer arithmetic), the problem of inferring an inductive invariant alone is already undecidable, making the problem of inferring a composition-invariant pair undecidable as well:
Lemma 3
Let be closed under Boolean operations and under substitution of a variable with a value, and include equalities of the form , where is a variable and is a value (of the same sort). If the problem of inferring an inductive invariant in is undecidable, then so is the problem of inferring a composition-invariant pair in .
Proof
We show a reduction from the ordinary invariant inference problem in to the problem of inferring a composition-invariant pair in . Given a transition system and an ordinary safety property the reduction constructs a transition system over , where is a new Boolean variable such that when the original transitions are taken and when the systems remains in the same state, which is also added to the set of terminal states. Formally, for every , let be an arbitrary fixed value in the domain of . For example, if is Boolean, . The reduction constructs
[TABLE]
and the following -safety property:
[TABLE]
That is, the first copy is “initialized” with and with the original pre-condition and is required to terminate in a state that satisfies the original post-condition, while the second copy is initialized with , and with the value for each original variable, and is required to terminate in the same state. Clearly, if has an inductive invariant for , then is a composition-invariant pair for and , where is defined by and for any other , which is clearly in . For the converse direction, if has a composition-invariant pair for then obtained by substituting each positive occurrence of in by false, each negative occurrence of by true and each occurrence of by is an inductive invariant for and . ∎
For example, linear integer arithmetic satisfies the conditions of the lemma. This motivates us to restrict the languages of inductive invariants. Specifically, we consider languages defined by a finite set of predicates. We consider relational predicates, defined over . For a finite set of predicates , we define to be the set of all formulas obtained by Boolean combinations of the predicates in .
Definition 6 (Inference using predicate abstraction)
The problem of inferring a predicate-based composition-invariant pair is defined as follows. The input is a transition system , a -safety property , and a finite set of predicates . The output is the solution to the problem of inferring a composition-invariant pair for and in .
Remark 2
It is possible to decouple the language used for expressing the self composition function from the language used to express the inductive invariant. Clearly, different sets of predicates (and hence languages) can be assigned to the self composition function and to the inductive invariant. However, since inductiveness is defined with respect to the transitions of the composed system, which are in turn defined by the self composition function, if the language defining is not included in the language defining , the conditions themselves would be over-approximated when checking the requirements of Definition 4 and therefore would incur a precision loss. For this reason, we use the same language for both.
Since the problem of invariant inference in is PSPACE-hard [23], a reduction from the problem of inferring inductive invariants to the problem of inferring composition-invariant pairs (similar to the one used in the proof of Lemma 3) shows that composition-invariant inference in is also PSPACE-hard:
Theorem 3.1
Inferring a predicate-based composition-invariant pair is PSPACE-hard.
4 Algorithm for Inferring Composition-Invariant Pairs
In this section, we present Property Directed Self-Composition, Pdsc for short — our algorithm for tackling the composition-invariant inference problem for languages of predicates (Definition 6). Namely, given a transition system , a -safety property and a finite set of predicates , we address the problem of finding a pair ), where is a self composition function and is an inductive invariant for the composed transition system obtained from , and both of them are in , i.e., defined by Boolean combinations of the predicates in .
We rely on the property that a transition system (in our case ) has an inductive invariant in if and only if its abstraction obtained using is safe. This is because, the set of reachable abstract states is the strongest set expressible in that satisfies initiation and consecution. Given , this allows us to use predicate abstraction to either obtain an inductive invariant in for (if the abstraction of is safe) or determine that no such inductive invariant exists (if an abstract counterexample trace is obtained). The latter indicates that a different self composition function needs to be considered. A naive realization of this idea gives rise to an iterative algorithm that starts from an arbitrary initial composition function and in each iteration computes a new composition function. At the worst case such an algorithm enumerates all self composition functions defined in , i.e., has time complexity . Importantly, we observe that, when no inductive invariant exists for some composition function, we can use the abstract counterexample trace returned in this case to (i) generalize and eliminate multiple composition functions, and (ii) identify that some abstract states must be unreachable if there is to be a composition-invariant pair, i.e., we “block” states in the spirit of property directed reachability [5, 13]. This leads to the algorithm depicted in Algorithm 1 whose worst case time complexity is . Next, we explain the algorithm in detail.
Finding an inductive invariant for a given composition function using predicate abstraction.
We use predicate abstraction [17, 27] to check if a given candidate composition function has a corresponding inductive invariant. This is done as follows. The abstraction of using , denoted , is a transition system defined over variables , where (we omit the terminal states). , i.e., each abstract state corresponds to a valuation of the Boolean variables representing . An abstract state represents the following set of states of :
[TABLE]
We extend to sets of states and to formulas representing sets of states in the usual way. The abstract transition relation is defined as usual:
[TABLE]
Note that the set of abstract states in does not depend on .
Notation
We sometimes refer to an abstract state as the formula . For a formula , we denote by the result of substituting each in by the corresponding Boolean variable . For the opposite direction, given a formula over , we denote by the formula in resulting from substituting each in by . Therefore, is a symbolic representation of .
Every set defined by a formula is precisely represented by in the sense that is equal to the set of states defined by , i.e., is a precise abstraction of . For simplicity, we assume that the termination conditions as well as the pre/post specification can be expressed precisely using the abstraction, in the following sense:
Definition 7
is adequate for and if there exist such that , and (for every copy ).
The following lemma provides the foundation for our algorithm:
Lemma 4
Let be a transition system, a safety property, and a finite set of predicates adequate for and . For a self composition function defined via conditions in , there exists an inductive invariant in such that is a composition-invariant pair for and if and only if the following three conditions hold:
- S1
All reachable states of from satisfy ,
- S2
All reachable states of from satisfy , and
- S3
For every , .
Furthermore, if the conditions hold, then the symbolic representation of the set of abstract states of reachable from is a formula over such that is a composition-invariant pair for and .
Proof
The proof relies on the following statement, denoted by : for a formula in and an abstract state , for every it holds that (which follows by induction on the structure of a formula in , relying on the definition of ). In particular, this implies that for a formula over , it holds that whenever .
() Let , and be as described, and let () be a composition-invariant pair for and in . We first show that every (abstract) state that is reachable from in satisfies . Let be such a reachable state. Then there exists an abstract trace such that , and for every . Consider a concrete state of such that , then and from we get . From the definition of a composition-invariant pair (Definition 4) we get that (initiation). Since is in we get from that also . For , the next state in the abstract trace, it also holds that : since , we know that there exist some and such that , using we get that , the consecution of implies and from we get . By induction over the length of the abstract trace we get that . We now turn to show that conditions S1–S3 hold. First, the safety of for together with adequacy of and imply that \mathit{Inv}(\mathcal{B})\Rightarrow\big{(}(\bigwedge_{j=1}^{k}F^{j}(\mathcal{B}))\rightarrow\textit{post}(\mathcal{B})\big{)}, and since all the reachable states of satisfy , S1 follows. Similarly, the covering requirement of together with the property that is in for every and together with imply S2. Finally, S3 is implied directly from the fairness of (Definition 4).
() Assume that for , , and some composition function as described, conditions S1–S3 hold. Condition S1 ensures that satisfies the safety property , when we augment with a set of terminal states given by the formula . Hence, there exists an inductive invariant over for and . Furthermore, condition S2 ensures that there exists such for which (for example, such may be obtained by conjoining the inductive invariant ensured by S1 with another inductive invariant that establishes S2). To conclude the proof we show that () is a composition-invariant pair for and , as defined in Definition 4. First, initiation and safety of with respect to and , imply initiation and safety (respectively) of with respect to and due to and adequacy of . As for consecution of : for a pair of states in such that , if and , then . Therefore, if then (according to ), and from consecution of in also , and from we get and conclude the consecution of in . Similarly, for covering of : recall that , hence by , , i.e., covers the states satisfying . Finally, the fairness of follows from S3. ∎
Algorithm 1 starts from the lock-step self composition function (Algorithm 1), which is fair444Any fair self composition can be chosen as the initial one; we chose lock-step since it is a good starting point in many applications., and constructs the next candidate such that condition S3 in Lemma 4 always holds (see discussion of Modify_SC). Thus, condition S3 need not be checked explicitly.
Algorithm 1 checks whether conditions S1 and S2 hold for a given candidate composition function by calling Abs_Reach (Algorithm 1) – both checks are performed via a (non-)reachability check in , checking whether a state violating or is reachable from . Algorithm 1 maintains the abstract states that are not in by the formula Unreach defined over , which is initialized to false (as the lock-step composition function is defined for every state) and is updated in each iteration of Algorithm 1 to include the abstract states violating . If no abstract state violating S1 or S2 is reachable, i.e., the conditions hold, then Abs_Reach returns the (potentially overapproximated) set of reachable abstract states, represented by a formula over . In this case, by Lemma 4, is a composition-invariant pair (Algorithm 1). Otherwise, an abstract counterexample trace is obtained. (We can of course apply bounded model checking to check if the counterexample is real; we omit this check as our focus is on the case where the system is safe.)
Remark 3
In practice, we do not construct explicitly. Instead, we use the implicit predicate abstraction approach [6].
Eliminating self composition candidates based on abstract counterexamples.
An abstract counterexample to conditions S1 or S2 indicates that the candidate composition function has no corresponding . Violation of S1 can only be resolved by changing such that the abstract trace is no longer feasible. Violation of S2 may, in principle, also be resolved by extending the definition of such that it is defined for all the abstract states in the counterexample trace.
However, to prevent the need to explore both options, our algorithm maintains the following invariant for every candidate self composition function that it constructs:
Claim
Every abstract state that is not in is not reachable w.r.t. the abstract composed program of any composition function that is part of a composition-invariant pair for and .
This property clearly holds for the lock-step composition function, which the algorithm starts with, since for this composition, . As we explain in Corollary 2, it continues to hold throughout the algorithm.
As a result of this property, whenever a candidate composition function does not satisfy condition S1 or S2, it is never the case that needs to be extended to allow the abstract states in to be reachable. Instead, the abstract counterexample obtained in violation of the conditions needs to be eliminated by modifying .
Let be an abstract counterexample of such that and (violating S1) or (violating S2). Any self composition that agrees with on the states in for every that appears in has the same transitions in and, hence, the same transitions in . It, therefore, exhibits the same abstract counterexample in . Hence, it violates S1 or S2 and is not part of any composition-invariant pair.
Notation
Recall that is defined via conditions . This ensures that for every abstract state , is defined in the same way for all the states in . We denote the value of on the states in by (in particular, may be undefined). We get that if and only if .
Using this notation, to eliminate the abstract counterexample , one needs to eliminate at least one of the transitions in by changing the definition of for some . For a new candidate function this may be encoded by the disjunctive constraint . However, we observe that a stronger requirement may be derived from based on the following lemma:
Lemma 5
Let be a self composition function and a counterexample trace in such that but or . Then for any self composition function such that , if is reachable in from , then a counterexample trace to S1 or S2 exists.
Proof
Suppose that is reachable in from . Then there exists a trace in such that and . Since , the outgoing transitions of are the same in both and . In particular, the transition from also exists in . Therefore, is a trace to in . If , then is a counterexample to S1 in as well. Consider the case where . By the construction of Unreach, this indicates that has an outgoing abstract trace that leads to violation of S1 or S2 with every non-starving self composition function, and in particular in . ∎
Corollary 1
If there exists a composition-invariant pair , then there is also one where .
Proof
If , then by Lemma 5, is necessarily unreachable in from . Therefore, if we change , all the requirements of Lemma 4 will still hold. If no alternative value that admits the fairness requirement exists, then can remain undefined. ∎
Therefore, we require that in the next self composition candidates the abstract state must not be mapped to its current value in , i.e., , where 555If the conditions defining may overlap, we consider the condition by which the transition from to was defined..
Algorithm 1 accumulates these constraints in the set (Algorithm 1). Formally, the constraint asserts that must imply , and hence .
Identifying abstract states that must be unreachable.
A new candidate self composition is constructed such that it satisfies all the constraints in (thus ensuring that no abstract counterexample will re-appear). In the construction, we make sure to satisfy S3 (fairness). Therefore, for every abstract state , we choose a value that satisfies the constraints in and is non-starving: a value is starving for if but , i.e., some of the copies have not terminated in but none of the non-terminating copies is scheduled. (Due to adequacy, a value is starving for if and only if it is starving for every .)
If for some abstract state , all the non-starving values have already been excluded (i.e., for every non-starving ), we conclude that there is no such that is reachable in and is part of a composition-invariant pair:
Lemma 6
Let be an abstract state such that for every either is starving for or . Then, for every that satisfies S3, if satisfies S1 and S2, then is unreachable in .
Proof
If satisfies S3 and satisfies S1 and S2, then according to Lemma 4 is a part of some composition-invariant pair for . Furthermore, as shown in the proof of Lemma 4, every (abstract) state that is reachable from in satisfies . Assume to the contrary that is reachable in . Then . According to Definition 4, must be defined for , thus for some . Since is fair (satisfies S3) it must be the case that . According to the algorithm, at some iteration there was a composition with that caused adding to , i.e., there was a counterexample to S1 or S2 in in the form of a trace to . Then Lemma 5 implies that there is also a counterexample to S1 or S2 in because . This contradicts the assumption that satisfies S1 and S2. ∎
Corollary 2
If there exists a composition-invariant pair , then is unreachable in .
This is because no matter how the self composition function would be defined, is guaranteed to have an outgoing abstract counterexample trace in .
We, therefore, turn to be undefined. As a result, condition S2 of Lemma 4 requires that will be unreachable in . In Algorithm 1, this is enforced by adding to Unreach (Algorithm 1).
Every abstract state that is added to Unreach is a strengthening of the safety property by an additional constraint that needs to be obeyed in any composition-invariant pair, where obtaining a composition-invariant pair is the target of the algorithm. This makes our algorithm property directed.
If an abstract state that satisfies is added to Unreach, then Algorithm 1 determines that no solution exists (Algorithm 1). Otherwise, it generates a new constraint for based on the abstract state preceding in the abstract counterexample (Algorithm 1).
Constructing the next candidate self composition function.
Given the set of constraints in and the formula Unreach, Modify_SC (Algorithm 1) generates the next candidate composition function by (i) taking a constraint such that (typically the one that was added last), (ii) selecting a non-starving value for (such a value must exist, otherwise would have been added to Unreach), and (iii) updating the conditions defining as follows:
[TABLE]
The conditions of other values remain as before. This definition is facilitated by the fact that the same set of predicates is used both for defining and for defining the abstract states (by which is obtained). Note that in practice we do not explicitly turn to be undefined for . However, these definitions are ignored. The definition ensures that is non-starving (satisfying condition S3) and that no two conditions overlap. While the latter is not required, it also does not restrict the generality of the approach (since the language we consider is closed under Boolean operations).
Theorem 4.1
Let be a transition system, a -safety property and a set of predicates over . If Algorithm 1 returns “ no solution” then there is no composition-invariant pair for and in . Otherwise, returned by Algorithm 1 is a composition-invariant pair in , and thus .
Proof
Algorithm 1 returns “ no solution” when is satisfiable. This means that there is an abstract state that satisfies but also satisfies Unreach. By the construction of Unreach, this means that must be unreachable from in any such that a composition-invariant pair in (see Corollary 2). Hence, no such exists. Conversely, Algorithm 1 returns when all the conditions listed in Lemma 4 are met, thus is a composition-invariant pair. ∎
Complexity.
Each iteration of Algorithm 1 adds at least one constraint to , excluding a potential value for over some abstract state . An excluded values is never re-used. Hence, the number of iterations is at most the number of abstract states, , multiplied by the number of potential values for each abstract state, . Altogether, the number of iterations is at most . Each iteration makes one call to Abs_Reach which checks reachability via predicate abstraction, hence, assuming that satisfiability checks in the original logic are at most exponential, its complexity is . Therefore, the overall complexity of the algorithm is . Typically, is a small constant, hence the complexity is dominated by .
5 Evaluation and Conclusion
Implementation.
We implemented Pdsc (Algorithm 1) in Python on top of Z3 [25]. Its input is a transition system encoded by Constrained Horn Clauses (CHC) in SMT2 format, a -safety property and a set of predicates. The abstraction is implicitly encoded using the approach of [6], and is parameterized by a composition function that is modified in each iteration. For reachability checks (Abs_Reach) we use Spacer [22], which supports LRA and arrays. For the set of predicates used by Pdsc, we implemented an automatic procedure that mines these predicates from the CHC. Additional predicates may be added manually.
Experiments.
To evaluate Pdsc, we compare it to Synonym [26], the current state of the art in -safety verification.
To show the effectiveness of Pdsc, we consider examples that require a nontrivial composition (these examples are detailed in Appendix 0.A). We emphasize that the motivation for these example is originated in real-life scenarios. For example, Figure 1 follows a pattern of constant-time execution. The results of these experiments are summarized in Figure 3. Pdsc is able to find the right composition function and prove all of the examples, while Synonym cannot verify any of them. We emphasize that for these examples, lock-step composition is not sufficient. However, Pdsc infers a composition that depends on the programs’ state (variable values), rather than just program locations.
Next we consider Java programs from [26, 29], which we manually converted to C, and then converted to CHC using SeaHorn [19]. For all but 3 examples, only 2 types of predicates, which we mined automatically, were sufficient for verification: (i) relational predicates derived from the pre- and post-conditions, and (ii) for simple loops that have an index variable (e.g., for iterating over an array), an equality predicate between the copies of the indices. These predicates were sufficient since we used a large-step encoding of the transition relation, hence the abstraction via predicates takes effect only at cut-points. For the remaining 3 examples, we manually added 2–4 predicates. With the exception of 1 example where a timeout of 10 seconds was reached, all examples were solved with a lock-step composition function. Yet, we include them to show that on examples with simple compositions Pdsc performs similarly to Synonym. This can be seen in Figure 3.
5.0.1 Conclusion and Future Work.
This work formulates the problem of inferring a self composition function together with an inductive invariant for the composed program, thus capturing the interplay between the self composition and the difficulty of verifying the resulting composed program. To address this problem we present Pdsc– an algorithm for inferring a semantic self composition, directed at verifying the composed program with a given language of predicates. We show that Pdsc manages to find nontrivial self compositions that are beyond reach of existing tools. In future work, we are interested in further improving Pdsc by extending it with additional (possibly lazy) predicate discovery abilities. This has the potential to both improve performance and verify properties over wider range of programs. Additionally, we consider exploring further generalization techniques during the inference procedure.
Acknowledgements
This publication is part of a project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No [759102-SVIS]). The research was partially supported by Len Blavatnik and the Blavatnik Family foundation, the Blavatnik Interdisciplinary Cyber Research Center, Tel Aviv University, the Israel Science Foundation (ISF) under grant No. 1810/18 and the United States-Israel Binational Science Foundation (BSF) grant No. 2016260.
Appendix 0.A Benchmarks Used in the Evaluation
In this section, we elaborate on the examples from Figure 3.
0.A.1 DoubleSquareNI
Figure 4 depicts a non-interference problem (a 2-safety problem) where is the low input and is the high input. Taint analysis methods cannot prove non-interference for this program, and no proof exists when the product program presented in [14] is applied (see Appendix 0.B). However, using the language of predicates presented (also in Figure 4), our algorithm infers a composition-invariant pair that proves non-interference for the program.
0.A.2 HalfSquareNI
In the program presented at Figure 5 we consider the non-interference property, with pre-condition (low input) and post-condition (non-interference). The high input has no constraints as implied from the pre-condition. Intuitively, the difficulty of proving non-interference for this program is the need to "skip" the statement between the two loops in order to keep the outputs of the copies equal in every composed state along the execution. The suggested composition aligns the computations such that they proceed simultaniously only when both are at either loops, which makes true for every state of the self composed program.
0.A.3 ArrayIntMod
The example in Figure 6 is a comparator based on a Java comparator from the evaluation comparator programs. The comparator was modified to have loop that might perform two steps in a single iteration. The 2-safety property to prove for the comparator is anti-symmetry, i.e. the pre-condition is and the post-condition is . The figure also describes a composition that aligns the loops according to the value of . This yields a composed program that has an invariant that proves the desired property in the predicates language from Figure 6.
0.A.4 SquaresSum
For the program described in Figure 7 we consider the monotonicity property - a 2-safety property with pre-condition and post-condition . Considering a composition that aligns the computations to start together and run simultaniously, it is easy to see that for unbounded number iterations. However, in Figure 7 we see a composition that eases the task of finding a proof by scheduling the copies such that holds from the first iteration of copy 2 and to the end of both computations.
0.A.5 ArrayInsert
The program with a detailed explanation of its proof using a composition-invariant pair are presented in Section 1.
Appendix 0.B Demonstrating the Interplay Between Self Composition and Inductive Invariants
We illustrate the effect of the self composition function on the difficulty of verifying the obtained composed program, as well as the need for a semantic self composition function on the simple example depicted in Figure 4. The program receives as input an integer and a secret bit , and outputs . The desired specification is that the output does not depend on , which is indeed the case. Formally, this is a -safety property with pre-condition and post-condition , requiring that in any two terminating executions that start with the same values for , the final value of is the same.
As explained earlier, any fair self composition function can be soundly used to reduce the -safety problem to an ordinary safety problem. This is because the variables of the two copies of the program are completely disjoint, making the states completely independent. Therefore, the output of each copy does not depend on the actual interleaving of the two copies. As a result, if some interleaving (a fair self composition function) violates the postcondition, all of them will. That is, the actual interleaving does not affect the soundness of the reduction to traditional safety. However, when we turn to verifying the safety of the composed program by finding an inductive invariant in a given language, the specific self composition function used plays a significant role. For example, consider a composition that “synchronizes” the two copies in each control structure (e.g. [14]). Such a composition runs the two copies of the loop in parallel until one copy exits the loop, and then continues to run the other copy. The self composed program obtained by this composition is displayed in Figure 8.
We show that for this composition, there exists no inductive invariant in quantifier free linear integer arithmetic (QFLIA) that is sufficient for establishing safety of the composed program.
Proof
If we examine the set of the reachable states of the composed program at the exit point we see that it includes (for every natural ):
[TABLE]
(We omit the second copy of since both copies are equal in all the reachable states – a fact that is also expressible in QFLIA – and similarly, we omit and .)
Clearly, an inductive invariant must be satisfied by all of these states, since all of them are reachable. However, we show that any QFLIA formula that is satisfied by all of these states is also satisfied by a state that reaches a bad state (i.e., a state where ), thus if it is safe, it necessarily violates the consecution requirement, which means it is not an inductive invariant.
Let be a QFLIA formula, written in DNF form, where each is a cube (conjunction of literals). Define such that includes all states in that satisfy . We show that there exists such that is also satisfied by a state that reaches a bad state.
includes infinitely many “points” of the form where is an even number. Therefore, since there are finitely many ’s that together cover , there exists such that also includes infinitely many such points. Take two such points and in where . Then is a state (all values are integers) in the convex hull of . In particular, it must satisfy ( is a cube in LIA that is satisfied by all states in , hence it is also satisfied by all states in its convex hull).
However, when executing the while loop starting from the state , the outcome is the state , where , hence safety is violated.
This means that is not an inductive invariant strong enough to establish safety of the composed program, in contradiction. ∎
In contrast, with the composition function inferred by Pdsc (see Figure 4), the composed program has an inductive invariant in QFLIA.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Antonopoulos, T., Gazzillo, P., Hicks, M., Koskinen, E., Terauchi, T., Wei, S.: Decomposition instead of self-composition for proving the absence of timing channels. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017. pp. 362–375 (2017). https://doi.org/10.1145/3062341.3062378, https://doi.org/10.1145/3062341.3062378 · doi ↗
- 2[2] Barthe, G., Crespo, J.M., Kunz, C.: Relational verification using product programs. In: FM 2011: Formal Methods - 17th International Symposium on Formal Methods, Limerick, Ireland, June 20-24, 2011. Proceedings. pp. 200–214 (2011). https://doi.org/10.1007/978-3-642-21437-0_17, https://doi.org/10.1007/978-3-642-21437-0_17 · doi ↗
- 3[3] Barthe, G., Crespo, J.M., Kunz, C.: Beyond 2-safety: Asymmetric product programs for relational program verification. In: Logical Foundations of Computer Science, International Symposium, LFCS 2013, San Diego, CA, USA, January 6-8, 2013. Proceedings. pp. 29–43 (2013). https://doi.org/10.1007/978-3-642-35722-0_3, https://doi.org/10.1007/978-3-642-35722-0_3 · doi ↗
- 4[4] Barthe, G., D’Argenio, P.R., Rezk, T.: Secure information flow by self-composition. In: 17th IEEE Computer Security Foundations Workshop, (CSFW-17 2004), 28-30 June 2004, Pacific Grove, CA, USA. pp. 100–114 (2004). https://doi.org/10.1109/CSFW.2004.17, http://doi.ieeecomputersociety.org/10.1109/CSFW.2004.17
- 5[5] Bradley, A.R.: Sat-based model checking without unrolling. In: Verification, Model Checking, and Abstract Interpretation - 12th International Conference, VMCAI 2011, Austin, TX, USA, January 23-25, 2011. Proceedings. pp. 70–87 (2011). https://doi.org/10.1007/978-3-642-18275-4_7, https://doi.org/10.1007/978-3-642-18275-4_7 · doi ↗
- 6[6] Cimatti, A., Griggio, A., Mover, S., Tonetta, S.: IC 3 modulo theories via implicit predicate abstraction. In: Tools and Algorithms for the Construction and Analysis of Systems - 20th International Conference, TACAS 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, April 5-13, 2014. Proceedings. pp. 46–61 (2014). https://doi.org/10.1007/978-3-642-54862-8_4, https://doi.org/10.1007/978-3-642-54862-8_4 · doi ↗
- 7[7] Clarke, E.M., Henzinger, T.A., Veith, H., Bloem, R. (eds.): Handbook of Model Checking. Springer (2018)
- 8[8] Clarkson, M.R., Schneider, F.B.: Hyperproperties. In: Proceedings of the 21st IEEE Computer Security Foundations Symposium, CSF 2008, Pittsburgh, Pennsylvania, USA, 23-25 June 2008. pp. 51–65 (2008). https://doi.org/10.1109/CSF.2008.7, https://doi.org/10.1109/CSF.2008.7 · doi ↗
