Property Directed Self Composition

Ron Shemer; Arie Gurfinkel; Sharon Shoham; Yakir Vizel

arXiv:1905.07705·cs.PL·May 28, 2019


TL;DR

This paper introduces a property-directed inference algorithm for automatically inferring semantic self composition functions and invariants to verify k-safety properties, improving over existing tools.

Contribution

It presents a novel algorithm that infers self composition functions and invariants simultaneously, enhancing verification capabilities for k-safety properties.

Findings

(1) Successfully infers complex self compositions beyond existing tools.

(2) Demonstrates effectiveness on various verification benchmarks.

(3) Improves automation in verifying k-safety properties.


Equations

$\textit{pre}=\bigwedge_{v\in\mathrm{LowIn}}v^{1}=v^{2}\qquad\textit{post}=\bigwedge_{v\in\mathrm{LowOut}}v^{1}=v^{2}$

$f(s_{1},\ldots,s_{k})=M$ if and only if $(s_{1},\ldots,s_{k})\models C_{M}$.

${R^{f}}=\bigvee_{\emptyset\neq M\subseteq\{1..k\}}(C_{M}\wedge\varphi_{M})$ where $\varphi_{M}=\bigwedge_{j\in M}R(\mathcal{V}^{j},{\mathcal{V}^{j}}^{\prime})\wedge\bigwedge_{j\not\in M}\mathcal{V}^{j}={\mathcal{V}^{j}}^{\prime}$

$(\forall i\in M.\ (s_{i},s_{i}^{\prime})\in R)\wedge(\forall i\not\in M.\ s_{i}=s_{i}^{\prime})$

$T\models^{k}(\textit{pre},\textit{post})$ iff ${T^{f}}\models(\textit{pre},\textit{post})$.

$R^{*}$

$\textit{pre}^{*}$

$\gamma(\hat{s})=\{{s^{\|}}\in{S^{\|k}}\mid\forall p\in\mathcal{P}.\ {s^{\|}}\models p\Leftrightarrow\hat{s}(b_{p})=1\}$

$\hat{R}=\{(\hat{s}_{1},\hat{s}_{2})\mid\exists{s^{\|}_{1}}\in\gamma(\hat{s}_{1})\ \exists{s^{\|}_{2}}\in\gamma(\hat{s}_{2}).\ ({s^{\|}_{1}},{s^{\|}_{2}})\in{R^{f}}\}$

$C_{M}^{\prime}$

$(x,y_{1},z_{1},y_{2},z_{2})\mapsto$

$(n,n,2n-1,n,n-1)$

$\dots$

$(n,kn,2n-k,kn,n-k)$

$\dots$

$(n,n^{2},n,n^{2},0)$

$(n,n^{2}+n,n-1,n^{2},0)$

$\dots$

$(n,n^{2}+kn,n-k,n^{2},0)$

$\dots$

$(n,2n^{2},0,n^{2},0)$


Ron Shemer (1), Arie Gurfinkel (2), Sharon Shoham (1), Yakir Vizel (2)

(1) Tel Aviv University; (2) University of Waterloo; (3) The Technion

Abstract

We address the problem of verifying $k$ -safety properties: properties that refer to $k$ interacting executions of a program. A prominent way to verify $k$ -safety properties is by self composition. In this approach, the problem of checking $k$ -safety over the original program is reduced to checking an “ordinary” safety property over a program that executes $k$ copies of the original program in some order. The way in which the copies are composed determines how complicated it is to verify the composed program. We view this composition as provided by a semantic self composition function that maps each state of the composed program to the copies that make a move. Since the “quality” of a self composition function is measured by the ability to verify the safety of the composed program, we formulate the problem of inferring a self composition function together with the inductive invariant needed to verify safety of the composed program, where both are restricted to a given language. We develop a property-directed inference algorithm that, given a set of predicates, infers composition-invariant pairs expressed by Boolean combinations of the given predicates, or determines that no such pair exists. We implemented our algorithm and demonstrate that it is able to find self compositions that are beyond reach of existing tools.

1 Introduction

Many relational properties, such as noninterference [12], determinism [21], service level agreements [9], and more, can be reduced to the problem of $k$-safety, namely, reasoning about $k$ different traces of a program simultaneously. A common approach to verifying $k$-safety properties is by means of self composition, where the program is composed with $k$ copies of itself [4, 31]. A state of the composed program consists of the states of each copy, and a trace naturally corresponds to $k$ traces of the original program. Therefore, $k$-safety properties of the original program become ordinary safety properties of the composition, hence reducing $k$-safety verification to ordinary safety. This enables reasoning about $k$-safety properties using any of the existing techniques for safety verification such as Hoare logic [20] or model checking [7].

While self composition is sound and complete for $k$ -safety, its applicability is questionable for two main reasons:

(i) considering several copies of the program greatly increases the state space; and

(ii) the way in which the different copies are composed when reducing the problem to safety verification affects the complexity of the resulting self composed program, and as such affects the complexity of verifying it.

Improving the applicability of self composition has been the topic of many works [2, 29, 14, 18, 26, 32]. However, most efforts are focused on compositions that are pre-defined, or only depend on syntactic similarities.

In this paper, we take a different approach; we build upon the observation that by choosing the “right” composition, the verification can be greatly simplified by leveraging “simple” correlations between the executions. To that end, we propose an algorithm, called Pdsc, for inferring a property directed self composition. Our approach uses a dynamic composition, where the composition of the different copies can change during verification, directed at simplifying the verification of the composed program.

Compositions considered in previous work differ in the order in which the copies of the program execute: either synchronously, asynchronously, or in some mix of the two [33, 3, 14]. To allow general compositions, we define a composition function that maps every state of the composed program to the set of copies that are scheduled in the next step. This determines the order of execution for the different copies, and thus induces the self composed program. Unlike most previous works where the composition is pre-defined based on syntactic rules only, our composition is semantic as it is defined over the state of the composed program.

To capture the difficulty of verifying the composed program, we consider verification by means of inferring an inductive invariant, parameterized by a language for expressing the inductive invariant. Intuitively, the more expressive the language needs to be, the more difficult the verification task is. We then define the problem of inferring a composition function together with an inductive invariant for verifying the safety of the composed program, where both are restricted to a given language. Note that for a fixed language $\mathcal{L}$, an inductive invariant may exist for some composition function but not for another. (Footnote 1: See Appendix B for an example that requires a non-linear inductive invariant with a composition that is based on the control structure but has a linear invariant with another.) Thus, the restriction to $\mathcal{L}$ defines a target for the inference algorithm, which is now directed at finding a composition that admits an inductive invariant in $\mathcal{L}$.

Example 1

To demonstrate our approach, consider the program in Figure 1. The program inserts a new value into an array. We assume that the array $A$ and its length $len$ are “low”-security variables, while the inserted value $h$ is “high”-security. The first loop finds the location in which $h$ will be inserted. Note that the number of iterations depends on the value of $h$ . Due to that, the second loop executes to ensure that the output $i$ (which corresponds to the number of iterations) does not leak sensitive data. As an example, we emphasize that without the second loop, $i$ could leak the location of $h$ in $A$ . To express the property that $i$ does not leak sensitive data, we use the 2-safety property that in any two executions, if the inputs $A$ and $len$ are the same, so is the output $i$ .

To verify the 2-safety property, consider two copies of the program. Let the language $\mathcal{L}$ for verifying the self composition be defined by the predicates depicted in Figure 1. The most natural self composition to consider is a lock-step composition, where the copies execute synchronously. However, for such a composition the composed program may reach a state where, for example, $i_{1}=i_{2}+1$. This occurs when the first copy exits the first loop, while the second copy is still executing it. Since the language cannot express this correlation between the two copies, no inductive invariant in $\mathcal{L}$ suffices to verify that $i_{1}=i_{2}$ when the program terminates.

In contrast, when verifying the 2-safety property, Pdsc directs its search towards a composition function for which an inductive invariant in $\mathcal{L}$ does exist. As such, it infers the composition function depicted in Figure 1, as well as an inductive invariant in $\mathcal{L}$ . The invariant for this composition implies that $i_{1}=i_{2}$ at every state.

As demonstrated by the example, Pdsc focuses on logical languages based on predicate abstraction [17], where inductive invariants can be inferred by model checking. In order to infer a composition function that admits an inductive invariant in $\mathcal{L}$, Pdsc starts from a default composition function, and modifies its definition based on the reasoning performed by the model checker during verification. As the composition function is part of the verified model (recall that it is defined over the program state), different compositions are part of the state space explored by the model checker. As a result, a key ingredient of Pdsc is identifying “bad” compositions that prevent it from finding an inductive invariant in $\mathcal{L}$. It is important to note that a naive algorithm that tries all possible composition functions has a time complexity $O(2^{2^{|\mathcal{P}|}})$, where $\mathcal{P}$ is the set of predicates considered. However, integrating the search for a composition function into the model checking algorithm allows us to reduce the time complexity of the algorithm to $2^{O(|\mathcal{P}|)}$. We also show that the problem is in fact PSPACE-hard.

We implemented Pdsc using SeaHorn [19], Z3 [25] and Spacer [22] and evaluated it on examples that demonstrate the need for nontrivial semantic compositions. Our results clearly show that Pdsc can solve complex examples by inferring the required composition, while other tools cannot verify these examples. We emphasize that for these particular examples, lock-step composition is not sufficient. We also evaluated Pdsc on the examples from [29, 26] that are proven with the trivial lock-step composition. On these examples, Pdsc is comparable to state of the art tools.

Related work.

This paper addresses the problem of verifying $k$-safety properties (also called hyperproperties [8]) by means of self composition. Other approaches tackle the problem without self composition, and often focus on more specific properties, most notably the $2$-safety noninterference property (e.g. [1, 32]). Below we focus on works that use self composition.

Previous work such as [4, 2, 3, 15, 31, 14] considered self composition (also called product programs) where the composition function is constant and set a priori, using syntax-based hints. While useful in general, such self compositions may sometimes result in programs that are too complex to verify. This is in contrast to our approach, where the composition function evolves during verification and is adapted to the capabilities of the model checker.

The work most closely related to ours is [29] which introduces Cartesian Hoare Logic (CHL) for verification of $k$ -safety properties, and designs a verification framework for this logic. This work is further improved in [26]. These works search for a proof in CHL, and in doing so, implicitly modify the composition. Our work infers the composition explicitly and can use off-the-shelf model checking tools. More importantly, when loops are involved both [29] and [26] use lock-step composition and align loops syntactically. Our algorithm, in contrast, does not rely on syntactic similarities, and can handle loops that cannot be aligned trivially.

There have been several results in the context of harnessing Constrained Horn Clauses (CHC) solvers for verification of relational properties [11, 24]. Given several copies of a CHC system, a product CHC system that synchronizes the different copies is created by a syntactic analysis of the rules in the CHC system. These works restrict the synchronization points to CHC predicates (i.e., program locations), and consider only one synchronization (obtained via transformations of the system of CHCs). In contrast, our algorithm iteratively searches for a good synchronization (composition), and considers synchronizations that depend on program state.

Equivalence checking and regression verification.

Equivalence checking is another closely related research field, where a composition of several programs is considered. As an example, equivalence checking is applied to verify the correctness of compiler optimizations [33, 28, 10, 18]. In [28] the composition is determined by a brute-force search for possible synchronization points. While this brute-force search resembles our approach for finding the correct composition, it is not guided by the verification process. The works in [10, 18] identify possible synchronization points syntactically, and try to match them during the construction of a simulation relation between programs.

Regression verification also requires the ability to show equivalence between different versions of a program [15, 16, 30]. The problem of synchronizing unbalanced loops appears in [30] in the form of unbalanced recursive function calls. To allow synchronization in such cases, the user can specify different unrolling parameters for the different copies. In contrast, our approach relies only on user supplied predicates that are needed to establish correctness, while synchronization is handled automatically.

2 Preliminaries

In this paper we reason about programs by means of the transition systems defining their semantics. A transition system is a tuple $T=(S,R,F)$, where $S$ is a set of states, $R\subseteq S\times S$ is a transition relation that specifies the steps in an execution of the program, and $F\subseteq S$ is a set of terminal states such that every terminal state $s\in F$ has an outgoing transition to itself and no additional transitions (terminal states allow us to reason about pre/post specifications of programs). An execution or trace $\pi=s_{0},s_{1},\ldots$ is a (finite or infinite) sequence of states such that for every $i\geq 0$, $(s_{i},s_{i+1})\in R$. The execution is terminating if there exists $0\leq i\leq|\pi|$ such that $s_{i}\in F$. In this case, the suffix of the execution is of the form $s_{i},s_{i},\ldots$ and we say that $\pi$ ends at $s_{i}$.
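As an illustration only, the definition can be mirrored for a finite, explicit-state system. The following Python sketch is not from the paper: the class name `TransitionSystem`, its field names, and the `counter` example are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitionSystem:
    states: frozenset    # S
    trans: frozenset     # R, a set of (s, s') pairs
    terminal: frozenset  # F

    def well_formed(self) -> bool:
        """Every terminal state has exactly one outgoing transition: a self loop."""
        return all(
            {t for (s, t) in self.trans if s == f} == {f} for f in self.terminal
        )

    def is_trace(self, pi) -> bool:
        """Consecutive states of pi must be related by R."""
        return all((a, b) in self.trans for a, b in zip(pi, pi[1:]))

    def is_terminating(self, pi) -> bool:
        """A trace is terminating if it visits some terminal state."""
        return self.is_trace(pi) and any(s in self.terminal for s in pi)


# Hypothetical example: a counter that steps 0 -> 1 -> 2 and stops at 2.
counter = TransitionSystem(
    states=frozenset({0, 1, 2}),
    trans=frozenset({(0, 1), (1, 2), (2, 2)}),
    terminal=frozenset({2}),
)
```

Here the trace `[0, 1, 2, 2]` is terminating and ends at state `2`, while `[0, 2]` is not a trace because `(0, 2)` is not in `R`.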

As usual, we represent transition systems using logical formulas over a set of variables, corresponding to the program variables. We denote the set of variables by $\mathcal{V}$ . The set of terminal states is represented by a formula over $\mathcal{V}$ and the transition relation is represented by a formula over $\mathcal{V}\uplus\mathcal{V}^{\prime}$ , where $\mathcal{V}$ represents the pre-state of a transition and $\mathcal{V}^{\prime}=\{v^{\prime}\mid v\in\mathcal{V}\}$ represents its post-state. In the sequel, we use sets of states and their symbolic representation via formulas interchangeably.

Safety and inductive invariants.

We consider safety properties defined via pre/post conditions. (Footnote 2: Our results can be extended to arbitrary safety (and $k$-safety) properties by introducing “observable” states to which the property may refer.) A safety property is a pair $(\textit{pre},\textit{post})$ where $\textit{pre},\textit{post}$ are formulas over $\mathcal{V}$, representing subsets of $S$, denoting the pre- and post-condition, respectively. $T$ satisfies $(\textit{pre},\textit{post})$, denoted $T\models(\textit{pre},\textit{post})$, if every terminating execution $\pi$ of $T$ that starts in a state $s_{0}$ such that $s_{0}\models\textit{pre}$ ends in a state $s$ such that $s\models\textit{post}$. In other words, for every state $s$ that is reachable in $T$ from a state in $\textit{pre}$ we have that $s\models F\rightarrow\textit{post}$.

A prominent way to verify safety properties is by finding an inductive invariant. An inductive invariant for a transition system $T$ and a safety property $(\textit{pre},\textit{post})$ is a formula $\mathit{Inv}$ such that

(1) $\textit{pre}\Rightarrow\mathit{Inv}$ (initiation),

(2) $\mathit{Inv}\wedge R\Rightarrow\mathit{Inv}^{\prime}$ (consecution), and

(3) $\mathit{Inv}\Rightarrow(F\rightarrow\textit{post})$ (safety),

where $\varphi\Rightarrow\psi$ denotes the validity of $\varphi\to\psi$, and $\varphi^{\prime}$ denotes $\varphi(\mathcal{V}^{\prime})$, i.e., the formula obtained after substituting every $v\in\mathcal{V}$ by the corresponding $v^{\prime}\in\mathcal{V}^{\prime}$. If there exists such an inductive invariant, then $T\models(\textit{pre},\textit{post})$.
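For a finite system given by explicit sets of states, the three conditions can be checked by enumeration. The sketch below is an illustration under that assumption and not the symbolic check used in the paper. The function name `is_inductive_invariant` and the small example system are hypothetical.

```python
def is_inductive_invariant(inv, pre, post, trans, terminal):
    """Check initiation, consecution and safety on explicit sets of states.

    inv, pre, post and terminal are sets of states; trans is a set of (s, s') pairs.
    """
    initiation = pre <= inv                                      # pre => Inv
    consecution = all(t in inv for (s, t) in trans if s in inv)  # Inv and R => Inv'
    safety = all(s in post for s in inv if s in terminal)        # Inv => (F -> post)
    return initiation and consecution and safety


# Hypothetical system: one component counts 0..3 and stops at 3,
# another (unreachable from pre) moves from 5 to 6 and stops at 6.
trans = {(0, 1), (1, 2), (2, 3), (3, 3), (5, 6), (6, 6)}
terminal = {3, 6}
pre, post = {0}, {3}
```

With these sets, `{0, 1, 2, 3}` is an inductive invariant, `{0, 1}` fails consecution, and `{0, 1, 2, 3, 5, 6}` fails safety because the terminal state `6` is not in `post`.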

$k$ -safety.

A $k$ -safety property refers to $k$ interacting executions of $T$ . Similarly to an ordinary property, it is defined by $(\textit{pre},\textit{post})$ , except that pre and post are defined over $\mathcal{V}^{1}\uplus\ldots\uplus\mathcal{V}^{k}$ where $\mathcal{V}^{i}=\{v^{i}\mid v\in\mathcal{V}\}$ denotes the $i$ th copy of the program variables. As such, pre and post represent sets of $k$ -tuples of program states ( $k$ -states for short): for a $k$ -tuple $(s_{1},\ldots,s_{k})$ of states and a formula $\varphi$ over $\mathcal{V}^{1}\uplus\ldots\uplus\mathcal{V}^{k}$ , we say that $(s_{1},\ldots,s_{k})\models\varphi$ if $\varphi$ is satisfied when for each $i$ , the assignment of $\mathcal{V}^{i}$ is determined by $s_{i}$ . We say that $T$ satisfies $(\textit{pre},\textit{post})$ , denoted $T\models^{k}(\textit{pre},\textit{post})$ , if for every $k$ terminating executions $\pi^{1},\ldots,\pi^{k}$ of $T$ that start in states $s_{1},\ldots,s_{k}$ , respectively, such that $(s_{1},\ldots,s_{k})\models\textit{pre}$ , it holds that they end in states $t_{1},\ldots,t_{k}$ , respectively, such that $(t_{1},\ldots,t_{k})\models\textit{post}$ .

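For example, the noninterference property may be specified by the following $2$-safety property:

$\textit{pre}=\bigwedge_{v\in\mathrm{LowIn}}v^{1}=v^{2}\qquad\textit{post}=\bigwedge_{v\in\mathrm{LowOut}}v^{1}=v^{2}$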

where $\mathrm{LowIn}$ and $\mathrm{LowOut}$ denote subsets of the program inputs, resp. outputs, that are considered “low security” and the rest are classified as “high security”. This property asserts that every two terminating executions that start in states that agree on the “low security” inputs end in states that agree on the “low security” outputs, i.e., the outcome does not depend on any “high security” input and, hence, does not leak secure information.
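The noninterference property can be made concrete with a brute-force check over a small input domain. This is a sketch under simplifying assumptions: the program is modeled as a terminating Python function from one low and one high input to one low output, and the names `satisfies_noninterference`, `leaky` and `secure` are hypothetical.

```python
from itertools import product


def satisfies_noninterference(program, lows, highs):
    """Brute-force 2-safety check: equal low inputs imply equal low outputs."""
    for low, h1, h2 in product(lows, highs, highs):
        # Two executions that agree on the low input but may differ on the high one.
        if program(low, h1) != program(low, h2):
            return False
    return True


def leaky(low, high):
    # The output depends on the sign of the high input.
    return low + (1 if high > 0 else 0)


def secure(low, high):
    # The high input is read but does not influence the output.
    _ = high * 2
    return low + 1
```

Enumerating pairs of executions like this scales poorly, which is one reason the reduction to ordinary safety via self composition is attractive.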

Checking $k$-safety properties reduces to checking ordinary safety properties by creating a self composed program that consists of $k$ copies of the transition system, each with its own copy of the variables, that run in parallel in some way. Thus, the self composed program is defined over variables ${\mathcal{V}^{\|k}}=\mathcal{V}^{1}\uplus\ldots\uplus\mathcal{V}^{k}$, where $\mathcal{V}^{i}=\{v^{i}\mid v\in\mathcal{V}\}$ denotes the variables associated with the $i$th copy. For example, a common composition is a lock-step composition in which the copies execute simultaneously. The resulting composed transition system ${T^{\|k}}=({S^{\|k}},{R^{\|k}},{F^{\|k}})$ is defined such that ${S^{\|k}}=S\times\ldots\times S$, ${F^{\|k}}=\bigwedge_{i=1}^{k}F(\mathcal{V}^{i})$ and ${R^{\|k}}=\bigwedge_{i=1}^{k}R(\mathcal{V}^{i},{\mathcal{V}^{i}}^{\prime})$. Note that ${R^{\|k}}$ is defined over ${\mathcal{V}^{\|k}}\uplus{{\mathcal{V}^{\|k}}}^{\prime}$ (as usual). Then, the $k$-safety property $(\textit{pre},\textit{post})$ is satisfied by $T$ if and only if an ordinary safety property $(\textit{pre},\textit{post})$ is satisfied by ${T^{\|k}}$. More general notions of self composition are investigated in Section 3.
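The lock-step construction can be sketched for an explicit-state system, where the conjunctions over copies become Cartesian products. This is an illustrative sketch only. The function name `lock_step` and the example relation are assumptions, and the code assumes every state has at least one outgoing transition.

```python
from itertools import product


def lock_step(trans, terminal, k):
    """Build the lock-step transition relation and terminal k-states.

    trans: set of (s, s') pairs; terminal: set of states.
    In every composed transition, each of the k copies takes one step of R.
    """
    succ = {}
    for s, t in trans:
        succ.setdefault(s, set()).add(t)
    comp_trans = set()
    for ks in product(succ, repeat=k):
        # Choose one successor per copy.
        for kt in product(*(succ[s] for s in ks)):
            comp_trans.add((ks, kt))
    # A k-state is terminal when all of its components are terminal.
    comp_terminal = set(product(terminal, repeat=k))
    return comp_trans, comp_terminal


# Hypothetical T: 0 -> 1 -> 2, with 2 terminal.
R2, F2 = lock_step({(0, 1), (1, 2), (2, 2)}, {2}, k=2)
```

In `R2`, the 2-state `(0, 1)` steps to `(1, 2)`: both copies move at once, and no transition moves only one of them.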

3 Inferring Self Compositions for Restricted Languages of Inductive Invariants

Any self-composition is sufficient for reducing $k$ -safety to safety, e.g., lock-step, sequential, synchronous, asynchronous, etc. However, the choice of the self-composition used determines the difficulty of the resulting safety problem. Different self composed programs would require different inductive invariants, some of which cannot be expressed in a given logical language.

In this section, we formulate the problem of inferring a self composition function such that the obtained self composed program may be verified with a given language of inductive invariants. We are, therefore, interested in inferring both the self composition function and the inductive invariant for verifying the resulting self composed program. We start by formulating the kind of self compositions that we consider.

In the sequel, we fix a transition system $T=(S,R,F)$ with a set of variables $\mathcal{V}$ .

3.1 Semantic Self Composition

Roughly speaking, a $k$ self composition of $T$ consists of $k$ copies of $T$ that execute together in some order, where steps may interleave or be performed simultaneously. The order is determined by a self composition function, which may also be viewed as a scheduler that is responsible for scheduling a subset of the copies in each step. We consider semantic compositions in which the order may depend on the states of the different copies, as well as the correlations between them (as opposed to syntactic compositions that only depend on the control locations of the copies, but may not depend on the values of other variables):

Definition 1 (Semantic Self Composition Function)

A semantic $k$ self composition function ($k$-composition function for short) is a function $f:S^{k}\to\mathbb{P}(\{1..k\})$, mapping each $k$-state to a nonempty set of copies that are to participate in the next step of the self composed program. (Footnote 3: We consider memoryless composition functions. Compositions that depend on the history of the (joint) execution are supported via ghost state added to the program to track the history.)

We represent a $k$ -composition function $f$ by a set of logical conditions, with a condition $C_{M}$ for every nonempty subset $M\subseteq\{1..k\}$ of the copies. For each such $M\subseteq\{1..k\}$ , the condition $C_{M}$ is defined over ${\mathcal{V}^{\|k}}=\mathcal{V}^{1}\uplus\ldots\uplus\mathcal{V}^{k}$ , and hence it represents a set of $k$ -states, with the meaning that all the $k$ -states that satisfy $C_{M}$ are mapped to $M$ by $f$ :

$f(s_{1},\ldots,s_{k})=M$ if and only if $(s_{1},\ldots,s_{k})\models C_{M}$.

To ensure that the function is well defined, we require that $(\bigvee_{M}C_{M})\equiv\textit{true}$ , which ensures that every $k$ -state satisfies at least one of the conditions. We also require that for every $M_{1}\neq M_{2}$ , $C_{M_{1}}\wedge C_{M_{2}}\equiv\textit{false}$ , hence every $k$ -state satisfies at most one condition. Together these requirements ensure that the conditions induce a partition of the set of all $k$ -states. In the sequel, we identify a $k$ -composition function $f$ with its symbolic representation via conditions $\{C_{M}\}_{M}$ and use them interchangeably.
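The two well-definedness requirements can be checked by enumeration when the set of $k$-states is finite. The sketch below is illustrative: conditions are modeled as Python predicates, and the names `is_partition`, `to_function` and the "advance the copy that is behind" example are assumptions.

```python
from itertools import product


def is_partition(conditions, k_states):
    """conditions maps each nonempty frozenset M to a predicate C_M on k-states.

    Returns True if every k-state satisfies exactly one condition.
    """
    return all(
        sum(1 for c in conditions.values() if c(ks)) == 1 for ks in k_states
    )


def to_function(conditions):
    """Read the conditions as f, where f(ks) = M iff ks satisfies C_M."""
    def f(ks):
        (m,) = [M for M, c in conditions.items() if c(ks)]
        return m
    return f


# Hypothetical 2-composition over integer states: advance the copy that is
# behind, and advance both copies when they agree.
conds = {
    frozenset({1}): lambda ks: ks[0] < ks[1],
    frozenset({2}): lambda ks: ks[0] > ks[1],
    frozenset({1, 2}): lambda ks: ks[0] == ks[1],
}
k_states = list(product(range(4), repeat=2))
f = to_function(conds)
```

Replacing the first condition by `ks[0] <= ks[1]` would make it overlap with the third, and `is_partition` would then return `False`.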

Definition 2 (Composed Program)

Given a $k$ -composition function $f$ , represented via conditions $C_{M}$ for every nonempty set $M\subseteq\{1..k\}$ , we define the $k$ self composition of $T$ to be the transition system ${T^{f}}=({S^{\|k}},{R^{f}},{F^{\|k}})$ over variables ${\mathcal{V}^{\|k}}=\mathcal{V}^{1}\uplus\ldots\uplus\mathcal{V}^{k}$ defined as follows: ${F^{\|k}}=\bigwedge_{i=1}^{k}F^{i}$ , where $F^{i}=F(\mathcal{V}^{i})$ , and

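${R^{f}}=\bigvee_{\emptyset\neq M\subseteq\{1..k\}}(C_{M}\wedge\varphi_{M})$ where $\varphi_{M}=\bigwedge_{j\in M}R(\mathcal{V}^{j},{\mathcal{V}^{j}}^{\prime})\wedge\bigwedge_{j\not\in M}\mathcal{V}^{j}={\mathcal{V}^{j}}^{\prime}$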

Thus, in ${T^{f}}$ , the set of states consists of $k$ -states ( ${S^{\|k}}=S\times\ldots\times S$ ), the terminal states are $k$ -states in which all the individual states are terminal, and the transition relation includes a transition from $(s_{1},\ldots,s_{k})$ to $(s_{1}^{\prime},\ldots,s_{k}^{\prime})$ if and only if $f(s_{1},\ldots,s_{k})=M$ and

$(\forall i\in M.\ (s_{i},s_{i}^{\prime})\in R)\wedge(\forall i\not\in M.\ s_{i}=s_{i}^{\prime})$

That is, every transition of ${T^{f}}$ corresponds to a simultaneous transition of a subset $M$ of the $k$ copies of $T$ , where the subset is determined by the self composition function $f$ . If $f(s_{1},\ldots,s_{k})=M$ , then for every $i\in M$ we say that $i$ is scheduled in $(s_{1},\ldots,s_{k})$ .
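A single transition of the composed program can be sketched as follows, under the simplifying assumption that $T$ is deterministic so that $R$ is given by a step function. The names `compose` and `run`, and the counting example, are hypothetical and serve only as an illustration.

```python
def compose(step, f, k_state):
    """One transition of the composed program for a deterministic T.

    step: function s -> s' (terminal states map to themselves);
    f: composition function returning the set M of scheduled copies (1-based).
    """
    scheduled = f(k_state)
    return tuple(
        step(s) if i in scheduled else s   # unscheduled copies keep their state
        for i, s in enumerate(k_state, start=1)
    )


def run(step, is_terminal, f, k_state, max_steps=1000):
    """Execute until all copies are terminal; return the visited k-states."""
    trace = [k_state]
    while not all(is_terminal(s) for s in k_state) and len(trace) <= max_steps:
        k_state = compose(step, f, k_state)
        trace.append(k_state)
    return trace


# Hypothetical T: count up to 3. f schedules the copy that is behind, or both.
step = lambda s: min(s + 1, 3)
is_terminal = lambda s: s == 3
f = lambda ks: {1, 2} if ks[0] == ks[1] else ({1} if ks[0] < ks[1] else {2})
trace = run(step, is_terminal, f, (0, 2))
```

Starting from `(0, 2)`, this composition first lets copy 1 catch up and then advances both copies together, so the two components agree from the third 2-state onward.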

Example 2

A $k$ self composition that runs the $k$ copies of $T$ sequentially, one after the other, corresponds to a $k$ -composition function $f$ defined by $f(s_{1},\ldots,s_{k})=\{i\}$ where $i\in\{1..k\}$ is the minimal index of a non-terminal state in $\{s_{1},\ldots,s_{k}\}$ . If all states in $\{s_{1},\ldots,s_{k}\}$ are terminal then $i=k$ (or any other index). This is encoded as follows: for every $1\leq i<k$ , $C_{\{i\}}=\neg F^{i}\wedge\bigwedge_{j<i}F^{j}$ , $C_{\{k\}}=\bigwedge_{j<k}F^{j}$ and $C_{M}=\textit{false}$ for every other $M\subseteq\{1..k\}$ .

Example 3

The lock-step composition that runs the $k$ copies of $T$ synchronously corresponds to a $k$ -self composition function $f$ defined by $f(s_{1},\ldots,s_{k})=\{1,\ldots,k\}$ , and encoded by $C_{\{1,\ldots,k\}}=\textit{true}$ and $C_{M}=\textit{false}$ for every other $M\subseteq\{1..k\}$ .
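Examples 2 and 3 can be written directly as functions on $k$-states. This is a minimal sketch in which `sequential` and `lock_step` are hypothetical names, and a predicate `is_terminal` stands in for $F$.

```python
def sequential(is_terminal, k):
    """f(s_1..s_k) = {i} for the minimal non-terminal index i, or {k} if none."""
    def f(k_state):
        for i, s in enumerate(k_state, start=1):
            if not is_terminal(s):
                return {i}
        return {k}
    return f


def lock_step(k):
    """f(s_1..s_k) = {1..k} for every k-state."""
    return lambda k_state: set(range(1, k + 1))


# Hypothetical terminal predicate: a copy is done when its state equals 3.
is_terminal = lambda s: s == 3
seq3 = sequential(is_terminal, 3)
```

For instance, `seq3((3, 1, 0))` schedules copy 2, since copy 1 has already terminated.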

In order to ensure soundness of a reduction of $k$ -safety to safety via self composition, one has to require that the self composition function does not “starve” any copy of the transition system that is about to terminate if it continues to execute. We refer to this requirement as fairness.

Definition 3 (Fairness)

A $k$ -self composition function $f$ is fair if for every $k$ terminating executions $\pi^{1},\ldots,\pi^{k}$ of $T$ there exists an execution ${\pi^{\|}}$ of ${T^{f}}$ such that for every copy $i\in\{1..k\}$ , the projection of ${\pi^{\|}}$ to $i$ is $\pi^{i}$ .

Note that by the definition of the terminal states of ${T^{f}}$ , ${\pi^{\|}}$ as above is guaranteed to be terminating. We say that the $i$ th copy terminates in ${\pi^{\|}}$ if ${\pi^{\|}}$ contains a $k$ -state $(s_{1},\ldots,s_{k})$ such that $s_{i}\in F$ . Fairness may be enforced in a straightforward way by requiring that whenever $f(s_{1},\ldots,s_{k})=M$ , the set $M$ includes no index $i$ for which $s_{i}\in F$ , unless all have terminated. Since we assume that terminal states may only transition to themselves, a weaker requirement that suffices to ensure fairness is that $M$ includes at least one index $i$ for which $s_{i}\not\in F$ , unless there is no such index.
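The weaker sufficient condition for fairness can be checked by enumeration over a finite set of $k$-states. In this sketch the function name `satisfies_fairness_condition` and the three example composition functions are assumptions made for illustration.

```python
from itertools import product


def satisfies_fairness_condition(f, is_terminal, k_states):
    """If some copy is non-terminal, a scheduled copy must be non-terminal."""
    for ks in k_states:
        unfinished = {i for i, s in enumerate(ks, start=1) if not is_terminal(s)}
        if unfinished and not (f(ks) & unfinished):
            return False
    return True


# Hypothetical setting: states 0..2 with terminal state 2, and k = 2.
is_terminal = lambda s: s == 2
k_states = list(product(range(3), repeat=2))

only_first = lambda ks: {1}        # always schedules copy 1
both = lambda ks: {1, 2}           # lock-step
first_unfinished = lambda ks: next(
    ({i} for i, s in enumerate(ks, start=1) if not is_terminal(s)), {2}
)                                  # sequential
```

Lock-step and sequential composition pass the check, while `only_first` fails it at the 2-state `(2, 0)`, where copy 1 has terminated but copy 2 is never scheduled.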

The following claim is now straightforward:

Lemma 1

Let $T$ be a transition system, $(\textit{pre},\textit{post})$ a $k$ -safety property, and $f$ a fair $k$ -composition function for $T$ and $(\textit{pre},\textit{post})$ . Then

$T\models^{k}(\textit{pre},\textit{post})$ iff ${T^{f}}\models(\textit{pre},\textit{post})$.

Proof (sketch)

Every terminating execution of ${T^{f}}$ corresponds to $k$ terminating executions of $T$ . Fairness of $f$ ensures that the converse also holds.

To demonstrate the necessity of the fairness requirement, consider a (non-fair) self composition function $f$ that maps every state to $\{1\}$ . Then, regardless of what the actual transition system $T$ does, the resulting self composition ${T^{f}}$ satisfies every pre-post specification vacuously, as it never reaches a terminal state.
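The vacuity caused by the non-fair function can be observed on a small example. The sketch assumes a deterministic, finite $T$. The name `reachable` and the three-state system are hypothetical.

```python
def reachable(step, f, init):
    """All k-states reachable in the composed program from init."""
    seen, frontier = set(), [init]
    while frontier:
        ks = frontier.pop()
        if ks in seen:
            continue
        seen.add(ks)
        scheduled = f(ks)
        frontier.append(tuple(
            step(s) if i in scheduled else s for i, s in enumerate(ks, start=1)
        ))
    return seen


step = lambda s: min(s + 1, 2)   # hypothetical T: 0 -> 1 -> 2, terminal at 2
is_terminal = lambda s: s == 2
only_first = lambda ks: {1}      # the non-fair function from the text

states = reachable(step, only_first, (0, 0))
terminal_k_states = {ks for ks in states if all(is_terminal(s) for s in ks)}
```

Since copy 2 never moves, no reachable 2-state has both components terminal, so `terminal_k_states` is empty and every post-condition holds vacuously.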

Remark 1

While we require the conditions $\{C_{M}\}_{M}$ defining a self composition function $f$ to induce a partition of ${S^{\|k}}$ in order to ensure that $f$ is well defined as a (total) function, the requirement may be relaxed in two ways. First, we may allow $C_{M_{1}}$ and $C_{M_{2}}$ to overlap. This will add more transitions and may make the task of verifying the composed program more difficult, but it maintains the soundness of the reduction. Second, it suffices that the conditions cover the set of reachable states of the composed program rather than the entire state space. These relaxations do not damage soundness. Technically, this means that $f$ represented by the conditions is a relation rather than a function. We still refer to it as a function and write $f(s_{1},\ldots,s_{k})=M$ to indicate that $(s_{1},\ldots,s_{k})\models C_{M}$, not excluding the possibility that $(s_{1},\ldots,s_{k})\models C_{M^{\prime}}$ for $M^{\prime}\neq M$ as well. We note that as long as the language used to describe compositions is closed under Boolean operations, we can always extract from the conditions $\{C_{M}\}_{M}$ a function $f^{\prime}$. This is done as follows:

• To prevent the overlap between conditions, determine an arbitrary total order $<$ on the sets $M\subseteq\{1..k\}$ and set $C_{M}^{\prime}:=C_{M}\wedge\bigwedge_{N<M}\neg C_{N}$.

• To ensure that the conditions cover the entire state space, set $C_{\{1..k\}}^{\prime}:=C_{\{1..k\}}^{\prime}\vee\neg(\bigvee_{M}C_{M})$.

It is easy to verify that $f^{\prime}$ defined by $\{C^{\prime}_{M}\}_{M}$ is a total self composition function and that if $f$ is fair, then so is $f^{\prime}$ .
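The two steps of Remark 1 can be sketched with conditions modeled as Python predicates. This is an illustration, not the paper's implementation. The name `make_total_function`, the representation of the total order as a list, and the example conditions are assumptions, and the set $\{1..k\}$ is assumed to appear in the given order.

```python
from itertools import product


def make_total_function(conditions, order, k):
    """Derive non-overlapping, covering conditions from the given ones.

    conditions: dict from frozenset M to predicate C_M on k-states;
    order: list of the sets M, fixing an arbitrary total order.
    """
    everyone = frozenset(range(1, k + 1))

    def primed(M):
        earlier = order[: order.index(M)]

        def c(ks):
            # Step 1: C_M holds and no earlier condition C_N (N < M) holds.
            hit = conditions[M](ks) and not any(conditions[N](ks) for N in earlier)
            if M == everyone:
                # Step 2: {1..k} also takes k-states covered by no condition.
                hit = hit or not any(cond(ks) for cond in conditions.values())
            return hit

        return c

    return {M: primed(M) for M in order}


# Hypothetical overlapping conditions that also leave (3, 3) uncovered.
conds = {
    frozenset({1}): lambda ks: ks[0] <= ks[1] and ks[0] < 3,
    frozenset({2}): lambda ks: ks[0] >= ks[1] and ks[1] < 3,
    frozenset({1, 2}): lambda ks: False,
}
order = [frozenset({1}), frozenset({2}), frozenset({1, 2})]
total = make_total_function(conds, order, k=2)
k_states = list(product(range(4), repeat=2))
```

After the transformation, `(1, 1)` is assigned only to `{1}` (the earlier set in the order), and the previously uncovered `(3, 3)` is assigned to `{1, 2}`.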

3.2 The Problem of Inferring Self Composition with Inductive Invariant

Lemma 1 states the soundness of the reduction of $k$ -safety to ordinary safety. Together with the ability to verify safety by means of an inductive invariant, this leads to a verification procedure. However, while soundness of the reduction holds for any self composition, an inductive invariant in a given language may exist for the composed program resulting from some compositions but not from others. We therefore consider the self composition function and the inductive invariant together, as a pair, leading to the following definition.

Definition 4

Let $T$ be a transition system and $(\textit{pre},\textit{post})$ a $k$-safety property. For a formula $\mathit{Inv}$ over ${\mathcal{V}^{\|k}}$ and a self composition function $f$ represented by conditions $\{C_{M}\}_{M}$, we say that $(f,\mathit{Inv})$ is a composition-invariant pair for $T$ and $(\textit{pre},\textit{post})$ if the following conditions hold:

• $\textit{pre}\Rightarrow\mathit{Inv}$ (initiation of $\mathit{Inv}$ ),

• for every $\emptyset\neq M\subseteq\{1..k\}$ , $\mathit{Inv}\wedge C_{M}\wedge\varphi_{M}\Rightarrow\mathit{Inv}^{\prime}$ (consecution of $\mathit{Inv}$ for ${R^{f}}$ ),

• $\mathit{Inv}\Rightarrow\big{(}(\bigwedge_{j=1}^{k}F^{j})\rightarrow\textit{post}\big{)}$ (safety of $\mathit{Inv}$ ),

• $\mathit{Inv}\Rightarrow\bigvee_{M}C_{M}$ ( $f$ covers the reachable states),

• for every $\emptyset\neq M\subseteq\{1..k\}$ , $C_{M}\wedge(\bigvee_{j=1}^{k}\neg F^{j})\Rightarrow\bigvee_{j\in M}\neg F^{j}$ ( $f$ is fair).

As commented in Remark 1, we relax the requirement that $(\bigvee_{M}C_{M})\equiv\textit{true}$ to $\mathit{Inv}\Rightarrow\bigvee_{M}C_{M}$ , thus ensuring that the conditions cover all the reachable states. Since the reachable states of ${T^{f}}$ are determined by $\{C_{M}\}_{M}$ (which define $f$ ), this reveals the interplay between the self composition function and the inductive invariant. Furthermore, we do not require that $C_{M_{1}}\wedge C_{M_{2}}\equiv\textit{false}$ for $M_{1}\neq M_{2}$ , hence a $k$ -state may satisfy multiple conditions. As explained earlier, these relaxations do not damage soundness. Furthermore, if we construct from $f$ a self composition function $f^{\prime}$ as described in Remark 1, $\mathit{Inv}$ would be an inductive invariant for ${T^{f^{\prime}}}$ as well.
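For a finite, explicitly enumerated transition system, the five conditions of Definition 4 can be checked by brute force. The sketch below is only an illustration under assumptions that are not taken from the text: a step of $\varphi_{M}$ is modeled as every copy in $M$ taking one step of $R$ while the other copies stay put, terminal states carry self-loops, and all function names are hypothetical.

```python
from itertools import product

def successors(state, M, R):
    """Composed successors when exactly the copies in M (1-based) take an R-step."""
    options = [
        [t for (s, t) in R if s == state[j]] if (j + 1) in M else [state[j]]
        for j in range(len(state))
    ]
    return set(product(*options))

def is_composition_invariant_pair(S, R, F, k, pre, post, conds, inv):
    """Brute-force check of Definition 4; conds maps M (frozenset) to a predicate."""
    copies = range(1, k + 1)
    for s in product(S, repeat=k):
        all_done = all(s[j - 1] in F for j in copies)
        if pre(s) and not inv(s):
            return False                                   # initiation
        if inv(s) and all_done and not post(s):
            return False                                   # safety
        if inv(s) and not any(c(s) for c in conds.values()):
            return False                                   # covering
        for M, c in conds.items():
            if not c(s):
                continue
            if inv(s) and any(not inv(t) for t in successors(s, M, R)):
                return False                               # consecution
            if not all_done and all(s[j - 1] in F for j in M):
                return False                               # fairness
    return True

# Toy instance: a counter 0 -> 1 -> 2 -> 3 with terminal state 3 (self-loop).
S = range(4)
R = {(0, 1), (1, 2), (2, 3), (3, 3)}
F = {3}
pre = lambda s: s == (0, 0)
post = lambda s: s[0] == s[1]
lock_step = {frozenset({1, 2}): lambda s: True}
left_only = {frozenset({1}): lambda s: True}
equal = lambda s: s[0] == s[1]
```

With the lock-step composition and the invariant `equal`, all five conditions hold. The composition `left_only` fails the fairness condition, since it keeps scheduling the first copy after that copy has terminated.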

Lemma 2

If there exists a composition-invariant pair $(f,\mathit{Inv})$ for $T$ and $(\textit{pre},\textit{post})$ , then $T\models^{k}(\textit{pre},\textit{post})$ .

Proof (sketch)

If $(f,\mathit{Inv})$ is a composition-invariant pair, then $\mathit{Inv}$ is an inductive invariant for ${T^{f^{\prime}}}$ , where $f^{\prime}$ is a fair composition function defined as in Remark 1. From Lemma 1 we conclude that $T\models^{k}(\textit{pre},\textit{post})$ .

If we do not restrict the language in which $f$ and $\mathit{Inv}$ are specified, then the converse also holds. However, in the sequel we are interested in the ability to verify $k$ -safety with a given language, e.g., one for which the conditions of Definition 4 belong to a decidable fragment of logic and hence can be discharged automatically.

Definition 5 (Inference in $\mathcal{L}$ )

Let $\mathcal{L}$ be a logical language. The problem of inferring a composition-invariant pair in $\mathcal{L}$ is defined as follows. The input is a transition system $T$ and a $k$ -safety property $(\textit{pre},\textit{post})$ . The output is a composition-invariant pair $(f,\mathit{Inv})$ for $T$ and $(\textit{pre},\textit{post})$ (as defined in Definition 4), where $\mathit{Inv}\in\mathcal{L}$ and $f$ is represented by conditions $\{C_{M}\}_{M}$ such that $C_{M}\in\mathcal{L}$ for every $\emptyset\neq M\subseteq\{1..k\}$ . If no such pair exists, the output is “no solution”.

When no solution exists, it does not necessarily mean that $T\not\models^{k}(\textit{pre},\textit{post})$ . Instead, it may be that the language $\mathcal{L}$ is simply not expressive enough. Unfortunately, for expressive languages (e.g., quantified formulas or even quantifier free linear integer arithmetic), the problem of inferring an inductive invariant alone is already undecidable, making the problem of inferring a composition-invariant pair undecidable as well:

Lemma 3

Let $\mathcal{L}$ be closed under Boolean operations and under substitution of a variable with a value, and include equalities of the form $v=a$ , where $v$ is a variable and $a$ is a value (of the same sort). If the problem of inferring an inductive invariant in $\mathcal{L}$ is undecidable, then so is the problem of inferring a composition-invariant pair in $\mathcal{L}$ .

Proof

We show a reduction from the ordinary invariant inference problem in $\mathcal{L}$ to the problem of inferring a composition-invariant pair in $\mathcal{L}$ . Given a transition system $T$ and an ordinary safety property $(\textit{pre},\textit{post})$ , the reduction constructs a transition system $T^{*}=(S^{*},R^{*},F^{*})$ over $\mathcal{V}^{*}=\mathcal{V}\uplus\{b\}$ , where $b$ is a new Boolean variable such that when $b=\textit{true}$ the original transitions are taken and when $b=\textit{false}$ the system remains in the same state, which is also added to the set of terminal states. Formally, for every $v\in\mathcal{V}$ , let $a_{v}$ be an arbitrary fixed value in the domain of $v$ . For example, if $v$ is Boolean, $a_{v}=\textit{false}$ . The reduction constructs

[TABLE]

and the following $2$ -safety property:

[TABLE]

That is, the first copy is “initialized” with $b=\textit{true}$ and with the original pre-condition and is required to terminate in a state that satisfies the original post-condition, while the second copy is initialized with $b=\textit{false}$ , and with the value $a_{v}$ for each original variable, and is required to terminate in the same state. Clearly, if $T$ has an inductive invariant $\mathit{Inv}$ for $(\textit{pre},\textit{post})$ , then $(f,b^{1}\wedge\mathit{Inv}(\mathcal{V}^{1})\wedge\neg b^{2}\wedge\bigwedge_{v\in\mathcal{V}}v^{2}=a_{v})$ is a composition-invariant pair for $T^{*}$ and $(\textit{pre}^{*},\textit{post}^{*})$ , where $f$ is defined by $C_{\{1,2\}}=\textit{true}$ and $C_{M}=\textit{false}$ for any other $M$ , which is clearly in $\mathcal{L}$ . For the converse direction, if $T^{*}$ has a composition-invariant pair $(f,\mathit{Inv}^{*})$ for $(\textit{pre}^{*},\textit{post}^{*})$ then $\mathit{Inv}$ obtained by substituting each positive occurrence of $b^{2}$ in $\mathit{Inv}^{*}$ by false, each negative occurrence of $b^{2}$ by true and each occurrence of $v^{2}$ by $a_{v}$ is an inductive invariant for $T$ and $(\textit{pre},\textit{post})$ . ∎

For example, linear integer arithmetic satisfies the conditions of the lemma. This motivates us to restrict the languages of inductive invariants. Specifically, we consider languages defined by a finite set of predicates. We consider relational predicates, defined over ${\mathcal{V}^{\|k}}=\mathcal{V}^{1}\uplus\ldots\uplus\mathcal{V}^{k}$ . For a finite set of predicates $\mathcal{P}$ , we define $\mathcal{L}_{\mathcal{P}}$ to be the set of all formulas obtained by Boolean combinations of the predicates in $\mathcal{P}$ .

Definition 6 (Inference using predicate abstraction)

The problem of inferring a predicate-based composition-invariant pair is defined as follows. The input is a transition system $T$ , a $k$ -safety property $(\textit{pre},\textit{post})$ , and a finite set of predicates $\mathcal{P}$ . The output is the solution to the problem of inferring a composition-invariant pair for $T$ and $(\textit{pre},\textit{post})$ in $\mathcal{L}_{\mathcal{P}}$ .

Remark 2

It is possible to decouple the language used for expressing the self composition function from the language used to express the inductive invariant. Clearly, different sets of predicates (and hence languages) can be assigned to the self composition function and to the inductive invariant. However, since inductiveness is defined with respect to the transitions of the composed system, which are in turn defined by the self composition function, if the language defining $f$ is not included in the language defining $\mathit{Inv}$ , the conditions $C_{M}$ themselves would be over-approximated when checking the requirements of Definition 4 and therefore would incur a precision loss. For this reason, we use the same language for both.

Since the problem of invariant inference in $\mathcal{L}_{\mathcal{P}}$ is PSPACE-hard [23], a reduction from the problem of inferring inductive invariants to the problem of inferring composition-invariant pairs (similar to the one used in the proof of Lemma 3) shows that composition-invariant inference in $\mathcal{L}_{\mathcal{P}}$ is also PSPACE-hard:

Theorem 3.1

Inferring a predicate-based composition-invariant pair is PSPACE-hard.

4 Algorithm for Inferring Composition-Invariant Pairs

In this section, we present Property Directed Self-Composition (Pdsc for short), our algorithm for tackling the composition-invariant inference problem for languages of predicates (Definition 6). Namely, given a transition system $T$ , a $k$ -safety property $(\textit{pre},\textit{post})$ and a finite set of predicates $\mathcal{P}$ , we address the problem of finding a pair $(f,\mathit{Inv})$ , where $f$ is a self composition function and $\mathit{Inv}$ is an inductive invariant for the composed transition system ${T^{f}}$ obtained from $f$ , and both of them are in $\mathcal{L}_{\mathcal{P}}$ , i.e., defined by Boolean combinations of the predicates in $\mathcal{P}$ .

We rely on the property that a transition system (in our case ${T^{f}}$ ) has an inductive invariant in $\mathcal{L}_{\mathcal{P}}$ if and only if its abstraction obtained using $\mathcal{P}$ is safe. This is because the set of reachable abstract states is the strongest set expressible in $\mathcal{L}_{\mathcal{P}}$ that satisfies initiation and consecution. Given ${T^{f}}$ , this allows us to use predicate abstraction to either obtain an inductive invariant in $\mathcal{L}_{\mathcal{P}}$ for ${T^{f}}$ (if the abstraction of ${T^{f}}$ is safe) or determine that no such inductive invariant exists (if an abstract counterexample trace is obtained). The latter indicates that a different self composition function needs to be considered. A naive realization of this idea gives rise to an iterative algorithm that starts from an arbitrary initial composition function and in each iteration computes a new composition function. In the worst case, such an algorithm enumerates all self composition functions defined in $\mathcal{L}_{\mathcal{P}}$ , i.e., has time complexity $O(2^{2^{|\mathcal{P}|}})$ . Importantly, we observe that, when no inductive invariant exists for some composition function, we can use the abstract counterexample trace returned in this case to (i) generalize and eliminate multiple composition functions, and (ii) identify that some abstract states must be unreachable if there is to be a composition-invariant pair, i.e., we “block” states in the spirit of property directed reachability [5, 13]. This leads to the algorithm depicted in Algorithm 1, whose worst case time complexity is $2^{O(|\mathcal{P}|)}$ . Next, we explain the algorithm in detail.
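As a rough illustration of the gap between the two bounds, one can count the objects involved when $k$ is treated as a constant. The sketch below is an informal reading rather than the analysis given in the paper: it assumes that a candidate composition assigns one of the $2^{k}-1$ non-empty sets $M$ to each of the $2^{|\mathcal{P}|}$ abstract states, and that the counterexample-guided loop can record at most one constraint per (abstract state, $M$) pair. Both function names are hypothetical.

```python
def naive_candidates(num_preds, k):
    """Number of total maps from abstract states to non-empty subsets of {1..k}."""
    abstract_states = 2 ** num_preds
    return (2 ** k - 1) ** abstract_states

def constraint_bound(num_preds, k):
    """Number of distinct (abstract state, M) pairs that could ever be excluded."""
    return 2 ** num_preds * (2 ** k - 1)

# With three predicates and k = 2 there are 8 abstract states and 3 choices of M.
candidates = naive_candidates(3, 2)
pairs = constraint_bound(3, 2)
```

Already for this tiny instance the first count is in the thousands while the second is 24, which matches the doubly exponential versus singly exponential shape of the two bounds.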

Finding an inductive invariant for a given composition function using predicate abstraction.

We use predicate abstraction [17, 27] to check if a given candidate composition function has a corresponding inductive invariant. This is done as follows. The abstraction of ${T^{f}}$ using $\mathcal{P}$ , denoted $A_{\mathcal{P}}({T^{f}})$ , is a transition system $(\hat{S},\hat{R})$ defined over variables $\mathcal{B}$ , where $\mathcal{B}=\{b_{p}\mid p\in\mathcal{P}\}$ (we omit the terminal states). $\hat{S}=\{0,1\}^{\mathcal{B}}$ , i.e., each abstract state corresponds to a valuation of the Boolean variables representing $\mathcal{P}$ . An abstract state $\hat{s}\in\hat{S}$ represents the following set of states of ${T^{f}}$ :

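$\gamma(\hat{s})=\{{s^{\|}}\in{S^{\|k}}\mid\text{for every }p\in\mathcal{P},\ {s^{\|}}\models p\Leftrightarrow\hat{s}(b_{p})=1\}$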

We extend $\gamma$ to sets of states and to formulas representing sets of states in the usual way. The abstract transition relation is defined as usual:

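$\hat{R}=\{(\hat{s}_{1},\hat{s}_{2})\mid\exists{s^{\|}}_{1}\in\gamma(\hat{s}_{1}),\ {s^{\|}}_{2}\in\gamma(\hat{s}_{2}).\ ({s^{\|}}_{1},{s^{\|}}_{2})\in{R^{f}}\}$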

Note that the set of abstract states in $A_{\mathcal{P}}({T^{f}})$ does not depend on $f$ .
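For intuition, the abstraction can be computed explicitly when the composed system is small and finite. The following Python sketch is illustrative only: it represents an abstract state as the tuple of truth values of the predicates, and it builds the abstract transition relation existentially, i.e., there is an abstract edge whenever some concrete state of the source steps to some concrete state of the target. The names `alpha` and `abstract_transitions` and the toy system are assumptions made for the example.

```python
def alpha(state, preds):
    """Abstract state of a concrete composed state: one truth value per predicate."""
    return tuple(bool(p(state)) for p in preds)

def abstract_transitions(states, step, preds):
    """Existential abstraction of the concrete transition function `step`."""
    return {(alpha(s, preds), alpha(t, preds)) for s in states for t in step(s)}

# Toy composed system (k = 2): two counters incremented in lock-step, capped at 2.
states = [(x, y) for x in range(3) for y in range(3)]
step = lambda s: [(min(s[0] + 1, 2), min(s[1] + 1, 2))]
preds = [lambda s: s[0] == s[1], lambda s: s[0] == 2]
edges = abstract_transitions(states, step, preds)
```

In this toy system no abstract edge leaves the region where the first predicate (equality of the two counters) holds, so that region is closed under the abstract transitions.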

Notation

We sometimes refer to an abstract state $\hat{s}\in\hat{S}$ as the formula $\bigwedge_{\hat{s}(b_{p})=1}b_{p}\wedge\bigwedge_{\hat{s}(b_{p})=0}\neg b_{p}$ . For a formula $\psi\in\mathcal{L}_{\mathcal{P}}$ , we denote by $\psi(\mathcal{B})$ the result of substituting each $p\in\mathcal{P}$ in $\psi$ by the corresponding Boolean variable $b_{p}$ . For the opposite direction, given a formula $\psi$ over $\mathcal{B}$ , we denote by $\psi(\mathcal{P})$ the formula in $\mathcal{L}_{\mathcal{P}}$ resulting from substituting each $b_{p}\in\mathcal{B}$ in $\psi$ by $p$ . Therefore, $\psi(\mathcal{P})$ is a symbolic representation of $\gamma(\psi)$ .

Every set defined by a formula $\psi\in\mathcal{L}_{\mathcal{P}}$ is precisely represented by $\psi(\mathcal{B})$ in the sense that $\gamma(\psi(\mathcal{B}))$ is equal to the set of states defined by $\psi$ , i.e., $\psi(\mathcal{B})$ is a precise abstraction of $\psi$ . For simplicity, we assume that the termination conditions as well as the pre/post specification can be expressed precisely using the abstraction, in the following sense:

Definition 7

$\mathcal{P}$ is adequate for $T$ and $(\textit{pre},\textit{post})$ if there exist $\varphi_{\textit{pre}},\varphi_{\textit{post}},\varphi_{F^{i}}\in\mathcal{L}_{\mathcal{P}}$ such that $\varphi_{\textit{pre}}\equiv\textit{pre}$ , $\varphi_{\textit{post}}\equiv\textit{post}$ and $\varphi_{F^{i}}\equiv F^{i}$ (for every copy $i\in\{1..k\}$ ).

The following lemma provides the foundation for our algorithm:

Lemma 4

Let $T$ be a transition system, $(\textit{pre},\textit{post})$ a $k$ -safety property, and $\mathcal{P}$ a finite set of predicates adequate for $T$ and $(\textit{pre},\textit{post})$ . For a self composition function $f$ defined via conditions $\{C_{M}\}_{M}$ in $\mathcal{L}_{\mathcal{P}}$ , there exists an inductive invariant $\mathit{Inv}$ in $\mathcal{L}_{\mathcal{P}}$ such that $(f,\mathit{Inv})$ is a composition-invariant pair for $T$ and $(\textit{pre},\textit{post})$ if and only if the following three conditions hold:

S1

All reachable states of $A_{\mathcal{P}}({T^{f}})$ from $\varphi_{\textit{pre}}(\mathcal{B})$ satisfy $(\bigwedge_{i=1}^{k}\varphi_{F^{i}}(\mathcal{B}))\rightarrow\varphi_{\textit{post}}(\mathcal{B})$ ,

S2

All reachable states of $A_{\mathcal{P}}({T^{f}})$ from $\varphi_{\textit{pre}}(\mathcal{B})$ satisfy $\bigvee_{M}C_{M}(\mathcal{B})$ , and

S3

For every $\emptyset\neq M\subseteq\{1..k\}$ , $C_{M}(\mathcal{B})\wedge(\bigvee_{j=1}^{k}\neg\varphi_{F^{j}}(\mathcal{B}))\Rightarrow\bigvee_{j\in M}\neg\varphi_{F^{j}}(\mathcal{B})$ .

Furthermore, if the conditions hold, then the symbolic representation of the set of abstract states of $A_{\mathcal{P}}({T^{f}})$ reachable from $\varphi_{\textit{pre}}(\mathcal{B})$ is a formula $\mathit{Inv}$ over $\mathcal{B}$ such that $(f,\mathit{Inv}(\mathcal{P}))$ is a composition-invariant pair for $T$ and $(\textit{pre},\textit{post})$ .

Proof

The proof relies on the following statement, denoted by $(*)$ : for a formula $\varphi$ in $\mathcal{L}_{\mathcal{P}}$ and an abstract state $\hat{s}$ , for every ${s^{\|}}\in\gamma(\hat{s})$ it holds that ${s^{\|}}\models\varphi\Leftrightarrow\hat{s}\models\varphi(\mathcal{B})$ (which follows by induction on the structure of a formula in $\mathcal{L}_{\mathcal{P}}$ , relying on the definition of $\gamma(\hat{s})$ ). In particular, this implies that for a formula $\psi$ over $\mathcal{B}$ , it holds that ${s^{\|}}\models\psi(\mathcal{P})\Leftrightarrow\hat{s}\models{\psi}$ whenever ${s^{\|}}\in\gamma(\hat{s})$ .

( $\Rightarrow$ ) Let $T$ , $(\textit{pre},\textit{post})$ and $\mathcal{P}$ be as described, and let ( $f,\mathit{Inv}$ ) be a composition-invariant pair for $T$ and $(\textit{pre},\textit{post})$ in $\mathcal{L}_{\mathcal{P}}$ . We first show that every (abstract) state that is reachable from $\varphi_{\textit{pre}}(\mathcal{B})$ in $A_{\mathcal{P}}({T^{f}})$ satisfies $\mathit{Inv}(\mathcal{B})$ . Let $\hat{s}$ be such a reachable state. Then there exists an abstract trace $\hat{s}_{1},\ldots,\hat{s}_{m}$ such that $\hat{s}_{1}\models\varphi_{\textit{pre}}(\mathcal{B})$ , $\hat{s}_{m}=\hat{s}$ and $(\hat{s}_{i},\hat{s}_{i+1})\in\hat{R}$ for every $1\leq i<m$ . Consider a concrete state ${s^{\|}}_{1}$ of ${T^{f}}$ such that ${s^{\|}}_{1}\in\gamma(\hat{s}_{1})$ , then $\hat{s}_{1}\models\varphi_{\textit{pre}}(\mathcal{B})$ and from $(*)$ we get ${s^{\|}}_{1}\models\varphi_{\textit{pre}}$ . From the definition of a composition-invariant pair (Definition 4) we get that ${s^{\|}}_{1}\models\mathit{Inv}$ (initiation). Since $\mathit{Inv}$ is in $\mathcal{L}_{\mathcal{P}}$ we get from $(*)$ that also $\hat{s}_{1}\models\mathit{Inv}(\mathcal{B})$ . For $\hat{s}_{2}$ , the next state in the abstract trace, it also holds that $\hat{s}_{2}\models\mathit{Inv}(\mathcal{B})$ : since $(\hat{s}_{1},\hat{s}_{2})\in\hat{R}$ , we know that there exist some ${s^{\|}}_{a}\in\gamma(\hat{s}_{1})$ and ${s^{\|}}_{b}\in\gamma(\hat{s}_{2})$ such that $({s^{\|}}_{a},{s^{\|}}_{b})\in{R^{f}}$ , using $(*)$ we get that ${s^{\|}}_{a}\models Inv$ , the consecution of $\mathit{Inv}$ implies ${s^{\|}}_{b}\models\mathit{Inv}$ and from $(*)$ we get $\hat{s}_{2}\models\mathit{Inv}(\mathcal{B})$ . By induction over the length of the abstract trace we get that $\hat{s}\models\mathit{Inv}(\mathcal{B})$ . We now turn to show that conditions S1–S3 hold. 
First, the safety of $\mathit{Inv}$ for ${T^{f}}$ together with adequacy of $\mathcal{P}$ and $(*)$ imply that $\mathit{Inv}(\mathcal{B})\Rightarrow\big{(}(\bigwedge_{j=1}^{k}\varphi_{F^{j}}(\mathcal{B}))\rightarrow\varphi_{\textit{post}}(\mathcal{B})\big{)}$ , and since all the reachable states of $A_{\mathcal{P}}({T^{f}})$ satisfy $\mathit{Inv}(\mathcal{B})$ , S1 follows. Similarly, the covering requirement of $f$ together with the property that $C_{M}$ is in $\mathcal{L}_{\mathcal{P}}$ for every $M$ and together with $(*)$ imply S2. Finally, S3 is implied directly from the fairness of $f$ (Definition 4).

( $\Leftarrow$ ) Assume that for $T$ , $(\textit{pre},\textit{post})$ , $\mathcal{P}$ and some composition function $f$ as described, conditions S1–S3 hold. Condition S1 ensures that $A_{\mathcal{P}}({T^{f}})$ satisfies the safety property $(\varphi_{\textit{pre}}(\mathcal{B}),\varphi_{\textit{post}}(\mathcal{B}))$ , when we augment $A_{\mathcal{P}}({T^{f}})$ with a set of terminal states given by the formula $\bigwedge_{i=1}^{k}\varphi_{F^{i}}(\mathcal{B})$ . Hence, there exists an inductive invariant $\mathit{Inv}$ over $\mathcal{B}$ for $A_{\mathcal{P}}({T^{f}})$ and $(\varphi_{\textit{pre}}(\mathcal{B}),\varphi_{\textit{post}}(\mathcal{B}))$ . Furthermore, condition S2 ensures that there exists such $\mathit{Inv}$ for which $\mathit{Inv}\Rightarrow\bigvee_{M}C_{M}(\mathcal{B})$ (for example, such $\mathit{Inv}$ may be obtained by conjoining the inductive invariant ensured by S1 with another inductive invariant that establishes S2). To conclude the proof we show that ( $f,\mathit{Inv}(\mathcal{P})$ ) is a composition-invariant pair for $T$ and $(\textit{pre},\textit{post})$ , as defined in Definition 4. First, initiation and safety of $\mathit{Inv}$ with respect to $A_{\mathcal{P}}({T^{f}})$ and $(\varphi_{\textit{pre}}(\mathcal{B}),\varphi_{\textit{post}}(\mathcal{B}))$ , imply initiation and safety (respectively) of $\mathit{Inv}(\mathcal{P})$ with respect to $T$ and $(\varphi_{\textit{pre}},\varphi_{\textit{post}})$ due to $(*)$ and adequacy of $\mathcal{P}$ . As for consecution of $\mathit{Inv}(\mathcal{P})$ : for a pair of states ${s^{\|}}_{1},{s^{\|}}_{2}$ in ${T^{f}}$ such that $({s^{\|}}_{1},{s^{\|}}_{2})\in{R^{f}}$ , if ${s^{\|}}_{1}\in\gamma(\hat{s}_{1})$ and ${s^{\|}}_{2}\in\gamma(\hat{s}_{2})$ , then $(\hat{s}_{1},\hat{s}_{2})\in\hat{R}$ . 
Therefore, if ${s^{\|}}_{1}\models\mathit{Inv}(\mathcal{P})$ then $\hat{s}_{1}\models{\mathit{Inv}}$ (according to $(*)$ ), and from consecution of ${\mathit{Inv}}$ in $A_{\mathcal{P}}({T^{f}})$ also $\hat{s}_{2}\models{\mathit{Inv}}$ , and from $(*)$ we get ${s^{\|}}_{2}\models\mathit{Inv}(\mathcal{P})$ and conclude the consecution of $\mathit{Inv}(\mathcal{P})$ in ${T^{f}}$ . Similarly, for covering of $f$ : recall that $\mathit{Inv}\Rightarrow\bigvee_{M}C_{M}(\mathcal{B})$ , hence by $(*)$ , $\mathit{Inv}(\mathcal{P})\Rightarrow\bigvee_{M}{C_{M}}$ , i.e., $f$ covers the states satisfying $\mathit{Inv}(\mathcal{P})$ . Finally, the fairness of $f$ follows from S3. ∎

Algorithm 1 starts from the lock-step self composition function (Algorithm 1), which is fair (any fair self composition can be chosen as the initial one; we chose lock-step since it is a good starting point in many applications), and constructs the next candidate $f$ such that condition S3 in Lemma 4 always holds (see discussion of Modify_SC). Thus, condition S3 need not be checked explicitly.

Algorithm 1 checks whether conditions S1 and S2 hold for a given candidate composition function $f$ by calling Abs_Reach (Algorithm 1) – both checks are performed via a (non-)reachability check in $A_{\mathcal{P}}({T^{f}})$ , checking whether a state violating $(\bigwedge_{i=1}^{k}\varphi_{F^{i}}(\mathcal{B}))\rightarrow\varphi_{\textit{post}}(\mathcal{B})$ or $\bigvee_{M}C_{M}(\mathcal{B})$ is reachable from $\varphi_{\textit{pre}}(\mathcal{B})$ . Algorithm 1 maintains the abstract states that are not in $\bigvee_{M}C_{M}(\mathcal{B})$ by the formula Unreach defined over $\mathcal{B}$ , which is initialized to false (as the lock-step composition function is defined for every state) and is updated in each iteration of Algorithm 1 to include the abstract states violating $\bigvee_{M}C_{M}(\mathcal{B})$ . If no abstract state violating S1 or S2 is reachable, i.e., the conditions hold, then Abs_Reach returns the (potentially overapproximated) set of reachable abstract states, represented by a formula $\mathit{Inv}$ over $\mathcal{B}$ . In this case, by Lemma 4, $(f,\mathit{Inv}(\mathcal{P}))$ is a composition-invariant pair (Algorithm 1). Otherwise, an abstract counterexample trace is obtained. (We can of course apply bounded model checking to check if the counterexample is real; we omit this check as our focus is on the case where the system is safe.)
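The (non-)reachability check can be pictured as a plain graph search over the abstract states. The sketch below is a minimal explicit-state stand-in for Abs_Reach, written under assumptions: abstract states are hashable values, `succ` enumerates the abstract successors under the current candidate $f$ , and `is_bad` holds for abstract states that violate $(\bigwedge_{i=1}^{k}\varphi_{F^{i}}(\mathcal{B}))\rightarrow\varphi_{\textit{post}}(\mathcal{B})$ or satisfy Unreach. It returns the exact reachable set, whereas an actual implementation may return an overapproximation.

```python
from collections import deque

def abs_reach(init, succ, is_bad):
    """Breadth-first search over abstract states.

    Returns ("inv", reachable_states) if no bad state is reachable from init,
    and ("cex", trace) with a trace from an initial state to a bad state otherwise.
    """
    parent = {s: None for s in init}
    queue = deque(init)
    while queue:
        s = queue.popleft()
        if is_bad(s):
            trace = []
            while s is not None:          # follow parent pointers back to init
                trace.append(s)
                s = parent[s]
            return "cex", trace[::-1]
        for t in succ(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return "inv", set(parent)

# Toy abstract graph: 0 -> 1 -> 2 -> 2, and 3 -> 0 (state 3 is not reachable from 0).
graph = {0: [1], 1: [2], 2: [2], 3: [0]}
found = abs_reach([0], graph.__getitem__, lambda s: s == 2)
safe = abs_reach([0], graph.__getitem__, lambda s: s == 3)
```

In the safe case, the returned set plays the role of the formula $\mathit{Inv}$ over $\mathcal{B}$ ; in the unsafe case, the last two states of the trace correspond to $\hat{s}_{m}$ and $\hat{s}_{m+1}$ .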

Remark 3

In practice, we do not construct $A_{\mathcal{P}}({T^{f}})$ explicitly. Instead, we use the implicit predicate abstraction approach [6].

Eliminating self composition candidates based on abstract counterexamples.

An abstract counterexample to conditions S1 or S2 indicates that the candidate composition function $f$ has no corresponding $\mathit{Inv}$ . Violation of S1 can only be resolved by changing $f$ such that the abstract trace is no longer feasible. Violation of S2 may, in principle, also be resolved by extending the definition of $f$ such that it is defined for all the abstract states in the counterexample trace.

However, to prevent the need to explore both options, our algorithm maintains the following invariant for every candidate self composition function $f$ that it constructs:

Claim

Every abstract state that is not in $\bigvee_{M}C_{M}(\mathcal{B})$ is not reachable w.r.t. the abstract composed program of any composition function that is part of a composition-invariant pair for $T$ and $(\textit{pre},\textit{post})$ .

This property clearly holds for the lock-step composition function, which the algorithm starts with, since for this composition, $\bigvee_{M}C_{M}(\mathcal{B})\equiv\textit{true}$ . As we explain in Corollary 2, it continues to hold throughout the algorithm.

As a result of this property, whenever a candidate composition function $f$ does not satisfy condition S1 or S2, it is never the case that $\bigvee_{M}C_{M}(\mathcal{B})$ needs to be extended to allow the abstract states in $\mathit{cex}$ to be reachable. Instead, the abstract counterexample obtained in violation of the conditions needs to be eliminated by modifying $f$ .

Let $\mathit{cex}=\hat{s}_{1},\ldots,\hat{s}_{m+1}$ be an abstract counterexample of $A_{\mathcal{P}}({T^{f}})$ such that $\hat{s}_{1}\models\varphi_{\textit{pre}}(\mathcal{B})$ and $\hat{s}_{m+1}\models(\bigwedge_{i=1}^{k}\varphi_{F^{i}}(\mathcal{B}))\wedge\neg\varphi_{\textit{post}}(\mathcal{B})$ (violating S1) or $\hat{s}_{m+1}\models\textit{Unreach}$ (violating S2). Any self composition $f^{\prime}$ that agrees with $f$ on the states in $\gamma(\hat{s}_{i})$ for every $\hat{s}_{i}$ that appears in $\mathit{cex}$ has, from these states, the same transitions in ${R^{f^{\prime}}}$ as in ${R^{f}}$ and, hence, the same abstract transitions out of the states of $\mathit{cex}$ in $\hat{R}$ . It, therefore, exhibits the same abstract counterexample in $A_{\mathcal{P}}({T^{f^{\prime}}})$ . Hence, it violates S1 or S2 and is not part of any composition-invariant pair.

Notation

Recall that $f$ is defined via conditions $C_{M}\in\mathcal{L}_{\mathcal{P}}$ . This ensures that for every abstract state $\hat{s}$ , $f$ is defined in the same way for all the states in $\gamma(\hat{s})$ . We denote the value of $f$ on the states in $\gamma(\hat{s})$ by $f(\hat{s})$ (in particular, $f(\hat{s})$ may be undefined). We get that $f(\hat{s})=M$ if and only if $\hat{s}\models C_{M}(\mathcal{B})$ .

Using this notation, to eliminate the abstract counterexample $\mathit{cex}$ , one needs to eliminate at least one of the transitions in $\mathit{cex}$ by changing the definition of $f(\hat{s}_{i})$ for some $1\leq i\leq m$ . For a new candidate function $f^{\prime}$ this may be encoded by the disjunctive constraint $\bigvee_{i=1}^{m}f^{\prime}(\hat{s}_{i})\neq f(\hat{s}_{i})$ . However, we observe that a stronger requirement may be derived from $\mathit{cex}$ based on the following lemma:

Lemma 5

Let $f$ be a self composition function and $\mathit{cex}=\hat{s}_{1},\ldots,\hat{s}_{m+1}$ a counterexample trace in $A_{\mathcal{P}}({T^{f}})$ such that $\hat{s}_{1}\models\varphi_{\textit{pre}}(\mathcal{B})$ but $\hat{s}_{m+1}\models(\bigwedge_{i=1}^{k}\varphi_{F^{i}}(\mathcal{B}))\wedge\neg\varphi_{\textit{post}}(\mathcal{B})$ or $\hat{s}_{m+1}\models\textit{Unreach}$ . Then for any self composition function $f^{\prime}$ such that $f^{\prime}(\hat{s}_{m})=f(\hat{s}_{m})$ , if $\hat{s}_{m}$ is reachable in $A_{\mathcal{P}}({T^{f^{\prime}}})$ from $\varphi_{\textit{pre}}(\mathcal{B})$ , then a counterexample trace to S1 or S2 exists.

Proof

Suppose that $\hat{s}_{m}$ is reachable in $A_{\mathcal{P}}({T^{f^{\prime}}})$ from $\varphi_{\textit{pre}}(\mathcal{B})$ . Then there exists a trace $\hat{s}^{\prime}_{1},\ldots,\hat{s}^{\prime}_{m}$ in $A_{\mathcal{P}}({T^{f^{\prime}}})$ such that $\hat{s}^{\prime}_{1}\models\varphi_{\textit{pre}}(\mathcal{B})$ and $\hat{s}^{\prime}_{m}=\hat{s}_{m}$ . Since $f^{\prime}(\hat{s}_{m})=f(\hat{s}_{m})$ , the outgoing transitions of $\hat{s}_{m}$ are the same in both $A_{\mathcal{P}}({T^{f}})$ and $A_{\mathcal{P}}({T^{f^{\prime}}})$ . In particular, the transition $(\hat{s}_{m},\hat{s}_{m+1})$ from $A_{\mathcal{P}}({T^{f}})$ also exists in $A_{\mathcal{P}}({T^{f^{\prime}}})$ . Therefore, $\mathit{cex}^{\prime}=\hat{s}^{\prime}_{1},\ldots,\hat{s}^{\prime}_{m},\hat{s}_{m+1}$ is a trace to $\hat{s}_{m+1}$ in $A_{\mathcal{P}}({T^{f^{\prime}}})$ . If $\hat{s}_{m+1}\models(\bigwedge_{i=1}^{k}\varphi_{F^{i}}(\mathcal{B}))\wedge\neg\varphi_{\textit{post}}(\mathcal{B})$ , then $\mathit{cex}^{\prime}$ is a counterexample to S1 in $A_{\mathcal{P}}({T^{f^{\prime}}})$ as well. Consider the case where $\hat{s}_{m+1}\models\textit{Unreach}$ . By the construction of Unreach, this indicates that $\hat{s}_{m+1}$ has an outgoing abstract trace that leads to violation of S1 or S2 with every non-starving self composition function, and in particular in $A_{\mathcal{P}}({T^{f^{\prime}}})$ . ∎

Corollary 1

If there exists a composition-invariant pair $(f^{\prime},\mathit{Inv}^{\prime})$ , then there is also one where $f^{\prime}(\hat{s}_{m})\neq f(\hat{s}_{m})$ .

Proof

If $f^{\prime}(\hat{s}_{m})=f(\hat{s}_{m})$ , then by Lemma 5, $\hat{s}_{m}$ is necessarily unreachable in $A_{\mathcal{P}}({T^{f^{\prime}}})$ from $\varphi_{\textit{pre}}(\mathcal{B})$ . Therefore, if we change $f^{\prime}(\hat{s}_{m})$ , all the requirements of Lemma 4 will still hold. If no alternative value that admits the fairness requirement exists, then $f^{\prime}(\hat{s}_{m})$ can remain undefined. ∎

Therefore, we require that in the next self composition candidates the abstract state $\hat{s}_{m}$ must not be mapped to its current value in $f$ , i.e., $f^{\prime}(\hat{s}_{m})\neq M$ , where $f(\hat{s}_{m})=M$ . (If the conditions $\{C_{M}\}_{M}$ defining $f$ may overlap, we consider the condition $C_{M}$ by which the transition from $\hat{s}_{m}$ to $\hat{s}_{m+1}$ was defined.)

Algorithm 1 accumulates these constraints in the set $E$ (Algorithm 1). Formally, the constraint $(\hat{s},M)\in E$ asserts that $C_{M}^{\prime}$ must imply $\neg(\bigwedge_{\hat{s}(b_{p})=1}{p}\wedge\bigwedge_{\hat{s}(b_{p})=0}\neg{p})$ , and hence $f^{\prime}(\hat{s})\neq M$ .

Identifying abstract states that must be unreachable.

A new candidate self composition is constructed such that it satisfies all the constraints in $E$ (thus ensuring that no abstract counterexample will re-appear). In the construction, we make sure to satisfy S3 (fairness). Therefore, for every abstract state $\hat{s}$ , we choose a value $f^{\prime}(\hat{s})$ that satisfies the constraints in $E$ and is non-starving: a value $M$ is starving for $\hat{s}$ if $\hat{s}\models\bigvee_{j=1}^{k}\neg\varphi_{F^{j}}(\mathcal{B})$ but $\hat{s}\not\models\bigvee_{j\in M}\neg\varphi_{F^{j}}(\mathcal{B})$ , i.e., some of the copies have not terminated in $\hat{s}$ but none of the non-terminating copies is scheduled. (Due to adequacy, a value $M$ is starving for $\hat{s}$ if and only if it is starving for every ${s^{\|}}\in\gamma(\hat{s})$ .)
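The notion of a starving value is easy to state operationally. In the following sketch, which is an illustration with hypothetical names, an abstract state is summarized by the set `terminated` of copies $j$ for which $\varphi_{F^{j}}(\mathcal{B})$ holds, and `excluded` is the set of values $M$ with $(\hat{s},M)\in E$ .

```python
from itertools import combinations

def is_starving(M, terminated, k):
    """M is starving if some copy is still running but M schedules none of them."""
    running = set(range(1, k + 1)) - set(terminated)
    return bool(running) and not (set(M) & running)

def candidate_values(terminated, excluded, k):
    """Non-starving values M for an abstract state that E has not yet excluded."""
    subsets = [frozenset(c) for r in range(1, k + 1)
               for c in combinations(range(1, k + 1), r)]
    return [M for M in subsets
            if M not in excluded and not is_starving(M, terminated, k)]

# k = 2, copy 1 has terminated and copy 2 has not.
options = candidate_values({1}, {frozenset({1, 2})}, 2)
exhausted = candidate_values({1}, {frozenset({2}), frozenset({1, 2})}, 2)
```

An empty result, as in `exhausted`, is the situation in which all non-starving values have been excluded for the abstract state.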

If for some abstract state $\hat{s}$ , all the non-starving values have already been excluded (i.e., $(\hat{s},M)\in E$ for every non-starving $M$ ), we conclude that there is no $f^{\prime}$ such that $\hat{s}$ is reachable in $A_{\mathcal{P}}({T^{f^{\prime}}})$ and $f^{\prime}$ is part of a composition-invariant pair:

Lemma 6

Let $\hat{s}\in\hat{S}$ be an abstract state such that for every $\emptyset\neq M\subseteq\{1..k\}$ either $M$ is starving for $\hat{s}$ or $(\hat{s},M)\in E$ . Then, for every $f^{\prime}$ that satisfies S3, if $A_{\mathcal{P}}({T^{f^{\prime}}})$ satisfies S1 and S2, then $\hat{s}$ is unreachable in $A_{\mathcal{P}}({T^{f^{\prime}}})$ .

Proof

If $f^{\prime}$ satisfies S3 and $A_{\mathcal{P}}({T^{f^{\prime}}})$ satisfies S1 and S2, then according to Lemma 4 $f^{\prime}$ is a part of some composition-invariant pair $(f^{\prime},\mathit{Inv})$ for $T$ . Furthermore, as shown in the proof of Lemma 4, every (abstract) state that is reachable from $\varphi_{\textit{pre}}(\mathcal{B})$ in $A_{\mathcal{P}}({T^{f^{\prime}}})$ satisfies $\mathit{Inv}(\mathcal{B})$ . Assume to the contrary that $\hat{s}$ is reachable in $A_{\mathcal{P}}({T^{f^{\prime}}})$ . Then $\hat{s}\models\mathit{Inv}(\mathcal{B})$ . According to Definition 4, $f^{\prime}$ must be defined for $\hat{s}$ , thus $f^{\prime}(\hat{s})=M^{\prime}$ for some $\emptyset\neq M^{\prime}\subseteq\{1\ldots k\}$ . Since $f^{\prime}$ is fair (satisfies S3) it must be the case that $(\hat{s},M^{\prime})\in E$ . According to the algorithm, at some iteration there was a composition $f^{\prime\prime}$ with $f^{\prime\prime}(\hat{s})=M^{\prime}$ that caused adding $(\hat{s},M^{\prime})$ to $E$ , i.e., there was a counterexample to S1 or S2 in $A_{\mathcal{P}}({T^{f^{\prime\prime}}})$ in the form of a trace to $\hat{s}$ . Then Lemma 5 implies that there is also a counterexample to S1 or S2 in $A_{\mathcal{P}}({T^{f^{\prime}}})$ because $f^{\prime}(\hat{s})=f^{\prime\prime}(\hat{s})=M^{\prime}$ . This contradicts the assumption that $A_{\mathcal{P}}({T^{f^{\prime}}})$ satisfies S1 and S2. ∎

Corollary 2

If there exists a composition-invariant pair $(f^{\prime},\mathit{Inv}^{\prime})$ , then $\hat{s}$ is unreachable in $A_{\mathcal{P}}({T^{f^{\prime}}})$ .

This is because no matter how the self composition function $f^{\prime}$ would be defined, $\hat{s}$ is guaranteed to have an outgoing abstract counterexample trace in $A_{\mathcal{P}}({T^{f^{\prime}}})$ .

We therefore make $f^{\prime}(\hat{s})$ undefined. As a result, condition S2 of Lemma 4 requires $\hat{s}$ to be unreachable in $A_{\mathcal{P}}({T^{f^{\prime}}})$ . In Algorithm 1, this is enforced by adding $\hat{s}$ to Unreach (Algorithm 1).

Every abstract state $\hat{s}$ that is added to Unreach is a strengthening of the safety property by an additional constraint that needs to be obeyed in any composition-invariant pair, where obtaining a composition-invariant pair is the target of the algorithm. This makes our algorithm property directed.

If an abstract state that satisfies $\varphi_{\textit{pre}}(\mathcal{B})$ is added to Unreach, then Algorithm 1 determines that no solution exists (Algorithm 1). Otherwise, it generates a new constraint for $E$ based on the abstract state preceding $\hat{s}$ in the abstract counterexample (Algorithm 1).

Constructing the next candidate self composition function.

Given the set of constraints in $E$ and the formula Unreach, Modify_SC (Algorithm 1) generates the next candidate composition function by (i) taking a constraint $(\hat{s},M)$ such that $\hat{s}\not\models\textit{Unreach}$ (typically the one that was added last), (ii) selecting a non-starving value $M_{\text{new}}$ for $\hat{s}$ (such a value must exist, otherwise $\hat{s}$ would have been added to Unreach), and (iii) updating the conditions defining $f^{\prime}$ as follows:

[TABLE]

The conditions of other values remain as before. This definition is facilitated by the fact that the same set of predicates is used both for defining $f^{\prime}$ and for defining the abstract states $\hat{s}\in\hat{S}$ (by which $\mathit{Inv}$ is obtained). Note that in practice we do not explicitly make $f^{\prime}$ undefined for $\gamma(\textit{Unreach})$ ; instead, the values of $f^{\prime}$ on these states are simply ignored. The definition ensures that $f^{\prime}$ is non-starving (satisfying condition S3) and that no two conditions $C^{\prime}_{M_{1}}\neq C^{\prime}_{M_{2}}$ overlap. While the latter is not required, it also does not restrict the generality of the approach (since the language we consider is closed under Boolean operations).
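Steps (i) to (iii) can be sketched in Python under simplifying assumptions. The sketch below is not the implementation of Modify_SC: it represents $f^{\prime}$ as an explicit map from abstract states to values instead of as conditions $C^{\prime}_{M}$ , and the names `candidate_values`, `pick_new_value`, `modify_sc` and the callback `is_starving` are hypothetical. It also folds the "no value left" case into the same function for brevity.

```python
from itertools import combinations

def candidate_values(k):
    """All non-empty subsets of {1..k}, smallest subsets first."""
    ids = range(1, k + 1)
    return [frozenset(c) for r in range(1, k + 1) for c in combinations(ids, r)]

def pick_new_value(state, excluded, k, is_starving):
    """Step (ii): a value for `state` that is neither excluded by E nor starving."""
    for m in candidate_values(k):
        if (state, m) not in excluded and not is_starving(state, m):
            return m
    return None

def modify_sc(f, state, excluded, unreach, k, is_starving):
    """Step (iii) on an explicit map f: abstract state -> set of copies that move.

    `excluded` plays the role of E and `unreach` the role of Unreach.
    `is_starving(state, m)` is an assumed oracle for the fairness condition S3.
    """
    m_new = pick_new_value(state, excluded, k, is_starving)
    if m_new is None:
        # No admissible value remains: the state must be unreachable.
        unreach.add(state)
        f.pop(state, None)
    else:
        f[state] = m_new
    return m_new
```

Here an abstract state can be any hashable value, for example a tuple of truth values for the predicates in $\mathcal{P}$ .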

Theorem 4.1

Let $T$ be a transition system, $(\textit{pre},\textit{post})$ a $k$ -safety property and $\mathcal{P}$ a set of predicates over ${\mathcal{V}^{\|k}}$ . If Algorithm 1 returns “no solution” then there is no composition-invariant pair for $T$ and $(\textit{pre},\textit{post})$ in $\mathcal{L}_{\mathcal{P}}$ . Otherwise, $(f,\mathit{Inv}(\mathcal{P}))$ returned by Algorithm 1 is a composition-invariant pair in $\mathcal{L}_{\mathcal{P}}$ , and thus $T\models^{k}(\textit{pre},\textit{post})$ .

Proof

Algorithm 1 returns “no solution” when $\textit{Unreach}\wedge\varphi_{\textit{pre}}(\mathcal{B})$ is satisfiable. This means that there is an abstract state $\hat{s}$ that satisfies $\varphi_{\textit{pre}}(\mathcal{B})$ but also satisfies Unreach. By the construction of Unreach, this means that $\hat{s}$ must be unreachable from $\varphi_{\textit{pre}}(\mathcal{B})$ in any $A_{\mathcal{P}}({T^{f^{\prime}}})$ such that $(f^{\prime},\mathit{Inv}^{\prime})$ is a composition-invariant pair in $\mathcal{L}_{\mathcal{P}}$ (see Corollary 2). Since $\hat{s}$ satisfies $\varphi_{\textit{pre}}(\mathcal{B})$ , it is trivially reachable, hence no such $(f^{\prime},\mathit{Inv}^{\prime})$ exists. Conversely, Algorithm 1 returns $(f,\mathit{Inv}(\mathcal{P}))$ when all the conditions listed in Lemma 4 are met, thus $(f,\mathit{Inv}(\mathcal{P}))$ is a composition-invariant pair. ∎

Complexity.

Each iteration of Algorithm 1 adds at least one constraint to $E$ , excluding a potential value for $f$ over some abstract state $\hat{s}$ . An excluded value is never reused. Hence, the number of iterations is at most the number of abstract states, $2^{|\mathcal{P}|}$ , multiplied by the number of potential values for each abstract state, $2^{k}$ . Altogether, the number of iterations is $O(2^{|\mathcal{P}|}\cdot 2^{k})$ . Each iteration makes one call to Abs_Reach, which checks reachability via predicate abstraction; hence, assuming that satisfiability checks in the original logic are at most exponential, its complexity is $2^{O(|\mathcal{P}|)}$ . Therefore, the overall complexity of the algorithm is $2^{O(|\mathcal{P}|)+k}$ . Typically, $k$ is a small constant, hence the complexity is dominated by $2^{O(|\mathcal{P}|)}$ .
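To make the iteration bound concrete, the following small sketch evaluates the product $2^{|\mathcal{P}|}\cdot 2^{k}$ stated above for given parameters. The function name `max_iterations` is ours and is used only for illustration.

```python
def max_iterations(num_predicates: int, k: int) -> int:
    """Upper bound on iterations: (#abstract states) * (#values per state)."""
    abstract_states = 2 ** num_predicates   # one state per predicate valuation
    values_per_state = 2 ** k               # bound on subsets of {1..k}, as in the text
    return abstract_states * values_per_state
```

For instance, with 10 predicates and $k=2$ the bound is $2^{10}\cdot 2^{2}=4096$ iterations, while doubling the number of predicates to 20 raises it to $2^{22}$ , which illustrates why $|\mathcal{P}|$ rather than $k$ dominates.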

5 Evaluation and Conclusion

Implementation.

We implemented Pdsc (Algorithm 1) in Python on top of Z3 [25]. Its input is a transition system encoded by Constrained Horn Clauses (CHC) in SMT2 format, a $k$ -safety property and a set of predicates. The abstraction is implicitly encoded using the approach of [6], and is parameterized by a composition function that is modified in each iteration. For reachability checks (Abs_Reach) we use Spacer [22], which supports LRA and arrays. For the set of predicates used by Pdsc, we implemented an automatic procedure that mines these predicates from the CHC. Additional predicates may be added manually.

Experiments.

To evaluate Pdsc, we compare it to Synonym [26], the current state of the art in $k$ -safety verification.

To show the effectiveness of Pdsc, we consider examples that require a nontrivial composition (these examples are detailed in Appendix 0.A). We emphasize that these examples are motivated by real-life scenarios. For example, Figure 1 follows a pattern of constant-time execution. The results of these experiments are summarized in Figure 3. Pdsc is able to find the right composition function and prove all of the examples, while Synonym cannot verify any of them. We emphasize that for these examples, lock-step composition is not sufficient. However, Pdsc infers a composition that depends on the programs’ state (variable values), rather than just program locations.

Next we consider Java programs from [26, 29], which we manually converted to C, and then converted to CHC using SeaHorn [19]. For all but 3 examples, only 2 types of predicates, which we mined automatically, were sufficient for verification: (i) relational predicates derived from the pre- and post-conditions, and (ii) for simple loops that have an index variable (e.g., for iterating over an array), an equality predicate between the copies of the indices. These predicates were sufficient since we used a large-step encoding of the transition relation, hence the abstraction via predicates takes effect only at cut-points. For the remaining 3 examples, we manually added 2–4 predicates. With the exception of 1 example where a timeout of 10 seconds was reached, all examples were solved with a lock-step composition function. Yet, we include them to show that on examples with simple compositions Pdsc performs similarly to Synonym. This can be seen in Figure 3.

5.0.1 Conclusion and Future Work.

This work formulates the problem of inferring a self composition function together with an inductive invariant for the composed program, thus capturing the interplay between the self composition and the difficulty of verifying the resulting composed program. To address this problem we present Pdsc, an algorithm for inferring a semantic self composition, directed at verifying the composed program with a given language of predicates. We show that Pdsc manages to find nontrivial self compositions that are beyond the reach of existing tools. In future work, we are interested in further improving Pdsc by extending it with additional (possibly lazy) predicate discovery abilities. This has the potential both to improve performance and to verify properties over a wider range of programs. Additionally, we plan to explore further generalization techniques during the inference procedure.

Acknowledgements

This publication is part of a project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No [759102-SVIS]). The research was partially supported by Len Blavatnik and the Blavatnik Family foundation, the Blavatnik Interdisciplinary Cyber Research Center, Tel Aviv University, the Israel Science Foundation (ISF) under grant No. 1810/18 and the United States-Israel Binational Science Foundation (BSF) grant No. 2016260.

Appendix 0.A Benchmarks Used in the Evaluation

In this section, we elaborate on the examples from Figure 3.

0.A.1 DoubleSquareNI

Figure 4 depicts a non-interference problem (a 2-safety problem) where $x$ is the low input and $h$ is the high input. Taint analysis methods cannot prove non-interference for this program, and no proof exists when the product program presented in [14] is applied (see Appendix 0.B). However, using the language of predicates presented (also in Figure 4), our algorithm infers a composition-invariant pair that proves non-interference for the program.

0.A.2 HalfSquareNI

In the program presented in Figure 5 we consider the non-interference property, with pre-condition $low_{1}=low_{2}$ (low input) and post-condition $y_{1}=y_{2}$ (non-interference). The high input $h$ is left unconstrained by the pre-condition. Intuitively, the difficulty of proving non-interference for this program is the need to "skip" the statement between the two loops in order to keep the outputs of the copies equal in every composed state along the execution. The suggested composition aligns the computations such that they proceed simultaneously only when both are at either of the loops, which makes $i_{1}=i_{2}\wedge y_{1}=y_{2}$ true for every state of the self composed program.

0.A.3 ArrayIntMod

The example in Figure 6 is a comparator based on one of the Java comparator programs used in the evaluation. The comparator was modified to have a loop that might perform two steps in a single iteration. The 2-safety property to prove for the comparator is anti-symmetry, i.e., the pre-condition is $o1_{1}=o2_{2}\wedge o1_{2}=o2_{1}$ and the post-condition is $sign(compare(o1_{1},o2_{1}))=-sign(compare(o1_{2},o2_{2}))$ . The figure also describes a composition that aligns the loops according to the value of $flag$ . This yields a composed program that has an invariant that proves the desired property in the predicate language from Figure 6.
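The anti-symmetry property relates two runs whose arguments are swapped. The sketch below only tests this relation on sample inputs; it does not verify it, and it does not reproduce the program of Figure 6. The comparator `compare` is a hypothetical lexicographic comparator chosen for illustration, and `antisymmetric_on` is our own helper name.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def compare(o1, o2):
    """Hypothetical comparator: lexicographic order on equal-length int lists."""
    for a, b in zip(o1, o2):
        if a != b:
            return a - b
    return 0

def antisymmetric_on(compare_fn, pairs):
    """Check the post-condition on pairs of runs that satisfy the pre-condition."""
    for o1, o2 in pairs:
        run1 = compare_fn(o1, o2)  # copy 1 receives (o1, o2)
        run2 = compare_fn(o2, o1)  # copy 2 receives the swapped arguments
        if sign(run1) != -sign(run2):
            return False
    return True
```

Calling `antisymmetric_on(compare, [([1, 2], [1, 3]), ([0], [0])])` exercises the pre-condition $o1_{1}=o2_{2}\wedge o1_{2}=o2_{1}$ by construction, since the second run is always invoked with the swapped pair.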

0.A.4 SquaresSum

For the program described in Figure 7 we consider the monotonicity property, a 2-safety property with pre-condition $[a_{2},b_{2}]\subset[a_{1},b_{1}]$ and post-condition $c_{2}<c_{1}$ . Considering a composition that aligns the computations to start together and run simultaneously, it is easy to see that $c_{1}<c_{2}$ for an unbounded number of iterations. However, in Figure 7 we see a composition that eases the task of finding a proof by scheduling the copies such that $c_{2}<c_{1}$ holds from the first iteration of copy 2 until the end of both computations.

0.A.5 ArrayInsert

The program, together with a detailed explanation of its proof using a composition-invariant pair, is presented in Section 1.

Appendix 0.B Demonstrating the Interplay Between Self Composition and Inductive Invariants

We illustrate the effect of the self composition function on the difficulty of verifying the obtained composed program, as well as the need for a semantic self composition function, using the simple example depicted in Figure 4. The program receives as input an integer $x$ and a secret bit $h$ , and outputs $y=2x^{2}$ . The desired specification is that the output does not depend on $h$ , which is indeed the case. Formally, this is a $2$ -safety property with pre-condition $x_{1}=x_{2}$ and post-condition $y_{1}=y_{2}$ , requiring that in any two terminating executions that start with the same values for $x$ , the final value of $y$ is the same.

As explained earlier, any fair self composition function can be soundly used to reduce the $2$ -safety problem to an ordinary safety problem. This is because the variables of the two copies of the program are completely disjoint, making the states completely independent. Therefore, the output of each copy does not depend on the actual interleaving of the two copies. As a result, if some interleaving (a fair self composition function) violates the postcondition, all of them will. That is, the actual interleaving does not affect the soundness of the reduction to traditional safety. However, when we turn to verifying the safety of the composed program by finding an inductive invariant in a given language, the specific self composition function used plays a significant role. For example, consider a composition that “synchronizes” the two copies in each control structure (e.g. [14]). Such a composition runs the two copies of the loop in parallel until one copy exits the loop, and then continues to run the other copy. The self composed program obtained by this composition is displayed in Figure 8.
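The independence argument at the start of the preceding paragraph can be illustrated by running two copies of a program under different fair schedules and observing that the outputs coincide. The sketch below is only an illustration: `double_square` is a hypothetical program that matches the description "outputs $y=2x^{2}$ " for non-negative $x$ and is not claimed to be the exact program of Figure 4, and the scheduler names are ours.

```python
def double_square(h, x):
    """Hypothetical program computing 2*x*x; yields after each step."""
    z = 2 * x if h else x
    y = 0
    yield
    while z > 0:
        z -= 1
        y += x
        yield
    if not h:
        y *= 2
    return y

def compose(schedule, copies):
    """Run all copies to completion; schedule(step, live) picks who moves."""
    live = dict(enumerate(copies))
    results = {}
    step = 0
    while live:
        for i in schedule(step, sorted(live)):
            try:
                next(live[i])
            except StopIteration as done:
                results[i] = done.value  # final output of copy i
                del live[i]
        step += 1
    return [results[i] for i in sorted(results)]

def lock_step(step, live):
    return live                          # every live copy moves

def sequential(step, live):
    return live[:1]                      # finish one copy before the next

def round_robin(step, live):
    return [live[step % len(live)]]      # alternate between live copies
```

Each scheduler eventually runs every copy to termination, and for inputs with equal $x$ and different $h$ all three produce the same pair of outputs. What changes between them is only the set of intermediate composed states, which is what matters for finding an inductive invariant.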

We show that for this composition, there exists no inductive invariant in quantifier free linear integer arithmetic (QFLIA) that is sufficient for establishing safety of the composed program.

Proof

If we examine the set $R$ of the reachable states of the composed program at the exit point we see that it includes (for every natural $n$ ):

[TABLE]

(We omit the second copy of $x$ since both copies are equal in all the reachable states – a fact that is also expressible in QFLIA – and similarly, we omit $h_{1}$ and $h_{2}$ .)

Clearly, an inductive invariant must be satisfied by all of these states, since all of them are reachable. However, we show that any QFLIA formula that is satisfied by all of these states is also satisfied by a state that reaches a bad state (i.e., a state where $y_{1}\neq y_{2}$ ), thus if it is safe, it necessarily violates the consecution requirement, which means it is not an inductive invariant.

Let $\varphi=\varphi_{1}\lor\ldots\lor\varphi_{r}$ be a QFLIA formula, written in DNF, where each $\varphi_{i}$ is a cube (conjunction of literals). Define $R_{1},\ldots,R_{r}\subseteq R$ such that $R_{i}=\{s\in R\mid s\models\varphi_{i}\}$ includes all states in $R$ that satisfy $\varphi_{i}$ . We show that there exists $i$ such that $\varphi_{i}$ is also satisfied by a state that reaches a bad state.

$R$ includes infinitely many “points” of the form $(n,n^{2},n,n^{2},0)$ where $n$ is an even number. Therefore, since there are finitely many $R_{i}$ ’s that together cover $R$ , there exists $i$ such that $R_{i}$ also includes infinitely many such points. Take two such points $(n,n^{2},n,n^{2},0)$ and $(m,m^{2},m,m^{2},0)$ in $R_{i}$ where $n\neq m$ . Then $(\frac{1}{2}(n+m),\frac{1}{2}(n^{2}+m^{2}),\frac{1}{2}(n+m),\frac{1}{2}(n^{2}+m^{2}),0)$ is a state (all values are integers, since $n$ and $m$ are even) in the convex hull of $R_{i}$ . In particular, it must satisfy $\varphi_{i}$ ( $\varphi_{i}$ is a cube in LIA that is satisfied by all states in $R_{i}$ , hence it is also satisfied by all states in its convex hull).

However, when executing the while loop starting from the state $x\mapsto 1/2(n+m),y_{1}\mapsto 1/2(n^{2}+m^{2}),z_{1}\mapsto 1/2(n+m),y_{2}\mapsto 1/2(n^{2}+m^{2}),z_{2}\mapsto 0$ , the outcome is the state $x\mapsto 1/2(n+m),y_{1}\mapsto 1/2(n^{2}+m^{2})+1/4(n+m)^{2},z_{1}\mapsto 0,y_{2}\mapsto 1/2(n^{2}+m^{2}),z_{2}\mapsto 0$ , where $y_{1}\neq y_{2}$ , hence safety is violated.

This means that $\varphi$ is not an inductive invariant strong enough to establish safety of the composed program, in contradiction. ∎

In contrast, with the composition function inferred by Pdsc (see Figure 4), the composed program has an inductive invariant in QFLIA.
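The convexity step of the proof above can be checked numerically for concrete values. The sketch below assumes the state layout $(x,y_{1},z_{1},y_{2},z_{2})$ used in the proof and assumes that the remaining loop of copy 1 decrements $z_{1}$ and adds $x$ to $y_{1}$ until $z_{1}=0$ ; this loop body is inferred from the outcome state given in the proof, not copied from Figure 8. All function names are ours.

```python
def reachable_point(n):
    """A state (x, y1, z1, y2, z2) of the form listed in the proof."""
    return (n, n * n, n, n * n, 0)

def midpoint(p, q):
    """Componentwise average of two states; requires an integral result."""
    assert all((a + b) % 2 == 0 for a, b in zip(p, q))
    return tuple((a + b) // 2 for a, b in zip(p, q))

def run_remaining_loop(state):
    """Assumed loop of copy 1; copy 2 has already exited because z2 == 0."""
    x, y1, z1, y2, z2 = state
    while z1 > 0:
        z1 -= 1
        y1 += x
    return (x, y1, z1, y2, z2)

n, m = 2, 4
mid = midpoint(reachable_point(n), reachable_point(m))
final = run_remaining_loop(mid)
```

For $n=2$ and $m=4$ the midpoint is $(3,10,3,10,0)$ , which is not itself of the form $(n,n^{2},n,n^{2},0)$ , and the loop ends with $y_{1}=10+9=19$ and $y_{2}=10$ , matching the expression $\frac{1}{2}(n^{2}+m^{2})+\frac{1}{4}(n+m)^{2}$ from the proof.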

Bibliography

[1] Antonopoulos, T., Gazzillo, P., Hicks, M., Koskinen, E., Terauchi, T., Wei, S.: Decomposition instead of self-composition for proving the absence of timing channels. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017. pp. 362–375 (2017). https://doi.org/10.1145/3062341.3062378
[2] Barthe, G., Crespo, J.M., Kunz, C.: Relational verification using product programs. In: FM 2011: Formal Methods - 17th International Symposium on Formal Methods, Limerick, Ireland, June 20-24, 2011. Proceedings. pp. 200–214 (2011). https://doi.org/10.1007/978-3-642-21437-0_17
[3] Barthe, G., Crespo, J.M., Kunz, C.: Beyond 2-safety: Asymmetric product programs for relational program verification. In: Logical Foundations of Computer Science, International Symposium, LFCS 2013, San Diego, CA, USA, January 6-8, 2013. Proceedings. pp. 29–43 (2013). https://doi.org/10.1007/978-3-642-35722-0_3
[4] Barthe, G., D’Argenio, P.R., Rezk, T.: Secure information flow by self-composition. In: 17th IEEE Computer Security Foundations Workshop, (CSFW-17 2004), 28-30 June 2004, Pacific Grove, CA, USA. pp. 100–114 (2004). https://doi.org/10.1109/CSFW.2004.17
[5] Bradley, A.R.: SAT-based model checking without unrolling. In: Verification, Model Checking, and Abstract Interpretation - 12th International Conference, VMCAI 2011, Austin, TX, USA, January 23-25, 2011. Proceedings. pp. 70–87 (2011). https://doi.org/10.1007/978-3-642-18275-4_7
[6] Cimatti, A., Griggio, A., Mover, S., Tonetta, S.: IC3 modulo theories via implicit predicate abstraction. In: Tools and Algorithms for the Construction and Analysis of Systems - 20th International Conference, TACAS 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, April 5-13, 2014. Proceedings. pp. 46–61 (2014). https://doi.org/10.1007/978-3-642-54862-8_4
[7] Clarke, E.M., Henzinger, T.A., Veith, H., Bloem, R. (eds.): Handbook of Model Checking. Springer (2018)
[8] Clarkson, M.R., Schneider, F.B.: Hyperproperties. In: Proceedings of the 21st IEEE Computer Security Foundations Symposium, CSF 2008, Pittsburgh, Pennsylvania, USA, 23-25 June 2008. pp. 51–65 (2008). https://doi.org/10.1109/CSF.2008.7
