Relational Verification via Invariant-Guided Synchronization

Qi Zhou (Georgia Institute of Technology); David Heath (Georgia; Institute of Technology); William Harris (Galois Inc.)

arXiv:1907.03997·cs.PL·July 10, 2019·HCVS/PERR@ETAPS

Relational Verification via Invariant-Guided Synchronization

Qi Zhou (Georgia Institute of Technology), David Heath (Georgia, Institute of Technology), William Harris (Galois Inc.)

PDF

TL;DR

This paper introduces PEQUOD, a novel verification tool that simultaneously searches for synchronization points and synthesizes relational invariants, enabling the verification of complex relational properties in JVM bytecode that previous methods cannot handle.

Contribution

It proposes a new approach that integrates invariant synthesis with synchronization point search, improving the verification of relational properties over multiple program executions.

Findings

01

PEQUOD successfully verifies properties of JVM bytecode that existing tools cannot handle.

02

The approach effectively guides synchronization point selection using synthesized invariants.

03

PEQUOD outperforms current techniques on challenging verification problems.

Abstract

Relational properties describe relationships that hold over multiple executions of one or more programs, such as functional equivalence. Conventional approaches for automatically verifying such properties typically rely on syntax-based, heuristic strategies for finding synchronization points among the input programs. These synchronization points are then annotated with appropriate relational invariants to complete the proof. However, when suboptimal synchronization points are chosen the required invariants can be complicated or even inexpressible in the target theory. In this work, we propose a novel approach to verifying relational properties. This approach searches for synchronization points and synthesizes relational invariants simultaneously. Specifically, the approach uses synthesized invariants as a guide for finding proper synchronization points that lead to a complete proof.…

Equations14

\forall x, y . x > y \Rightarrow f (x) > f (y)

\forall x, y . x > y \Rightarrow f (x) > f (y)

{\mbox n_{_} 0 = \mbox n_{_} 1} \mbox t r i 0 \times \mbox t r i 1 {ret_{_} 0 = ret_{_} 1}

{\mbox n_{_} 0 = \mbox n_{_} 1} \mbox t r i 0 \times \mbox t r i 1 {ret_{_} 0 = ret_{_} 1}

{\mbox n_{_} 0 = \mbox x_{_} 1} \mbox t r i 0_{_} 0 \times \mbox t r i 1 A ux_{_} 0 {ret_{_} 0 + \mbox a cc_{_} 1 = ret_{_} 1}

{\mbox n_{_} 0 = \mbox x_{_} 1} \mbox t r i 0_{_} 0 \times \mbox t r i 1 A ux_{_} 0 {ret_{_} 0 + \mbox a cc_{_} 1 = ret_{_} 1}

Com ::= skip ∣ x := e ∣ if e then Com else Com fi ∣ x := N (x) ∣ Com; Com

Com ::= skip ∣ x := e ∣ if e then Com else Com fi ∣ x := N (x) ∣ Com; Com

LTable : N \to (p, Com, o)

LTable : N \to (p, Com, o)

\inference [\textsc E - P r o d]

\inference [\textsc E - P r o d]

T, Γ ⊢ {P} PCom {Q}

T, Γ ⊢ {P} PCom {Q}

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Relational Verification via

Invariant-Guided Synchronization

Qi Zhou Georgia Institute of Technology [email protected] Georgia Institute of TechnologyGalois Inc.

David Heath Georgia Institute of Technology [email protected] Galois Inc.

William Harris Galois Inc. [email protected]

Abstract

Relational properties describe relationships that hold over multiple executions of one or more programs, such as functional equivalence. Conventional approaches for automatically verifying such properties typically rely on syntax-based, heuristic strategies for finding synchronization points among the input programs. These synchronization points are then annotated with appropriate relational invariants to complete the proof. However, when suboptimal synchronization points are chosen the required invariants can be complicated or even inexpressible in the target theory.

In this work, we propose a novel approach to verifying relational properties. This approach searches for synchronization points and synthesizes relational invariants simultaneously. Specifically, the approach uses synthesized invariants as a guide for finding proper synchronization points that lead to a complete proof. We implemented our approach as a tool named Pequod, which targets Java Virtual Machine (JVM) bytecode. We evaluated Pequod by using it to solve verification challenges drawn from the from the research literature and by verifying properties of student-submitted solutions to online challenge problems. The results show that Pequod solve verification problems that cannot be addressed by current techniques.

1 Introduction

Relational properties characterize multiple executions of one or more programs [22]. One example of such a property is that a particular program $f$ over integers is monotonic; i.e.,

[TABLE]

This property is relational because it is defined over two arbitrary inputs of $f$ (named $x$ and $y$ , respectively). Relational properties can express important problems, such as the equivalence of two programs or the information-flow security of a single program. Therefore, a tool that could automatically verify relational properties would be highly valuable both to program developers and users.

Substantial effort has been directed toward constructing relational verifiers, which attempt to prove that given programs satisfy a given relational property. One effective approach attempts to synthesize proofs in Cartesian Hoare Logic [22, 20], which extends Hoare Logic from individual programs to tuples of programs. Such approaches consider the execution of input programs simultaneously, which enables the construction of relational invariants that describe relationships across the programs. By examining the programs together, a verifier can potentially find simpler invariants than if it had attempted to summarize each program and then compared the summaries.

However, synthesizing proofs in such a system adds a critical new dimension to a verifier’s design. In particular, a verifier must choose pairs of control locations to relate, in addition to synthesizing sufficiently strong invariants that relate the programs’ data when they reach related locations. Such pairs of locations are referred to as synchronization points [20]. Intuitively, certain synchronization points can be annotated with simple relational invariants to form proofs because the variables at the related points maintain similar data. Conversely, non-ideal synchronization points may relate locations for which sufficient invariants can be expressed using only complex formulas, or even formulas expressible only in complex theories and logics.

Selecting synchronization points is particularly difficult when a verifier must prove a relational property over programs with loops or recursive procedures. Specifically, finding an ideal set of synchronization points may require the verifier to consider different numbers of iterations for different loops. As an example, a suitable synchronization strategy might be to model two iterations of a loop in one program for every one iteration of a loop in a second program. This highlights that no straightforward solution, such as modeling each loop exactly once, is effective in general.

For the reasons given above, it is clear that finding effective strategies for selecting synchronization points is an important and difficult problem. The effectiveness of a selecting synchronization points depends on the data relationships in the programs and the property. However, existing approaches [22, 20] have relied on syntax-driven, heuristic strategies that first find synchronization points and then attempt to annotate the points with relational invariants to complete the proof. The effectiveness of these strategies usually heavily depends on the programs to which they are applied.

In this paper, we propose a general, automatic technique for synthesizing proofs of relational properties. The key feature of our approach is that it searches the spaces of potential synchronization points and their relational invariants simultaneously. Our approach iteratively operates on a sequence of bounded under-approximations of input programs; in each bounded under-approximation, each recursive procedure call is only allowed to execute a bounded number of times. In each iteration, our approach attempts to generate a set of proofs that the bounded programs satisfy the given relational property under all possible relevant choices of synchronization points. Our approach synthesizes this set of proofs by solving a single system of Constrained Horn Clauses. Then, our approach attempts to find some proof of the correctness of the bounded under-approximations that can be generalized to form a proof for the original, unbounded programs. If a valid proof is found, then the verifier has validated the given relational property. Otherwise, our approach continues by considering larger under-approximations of the input programs.

We have implemented our approach as an executable tool, named Pequod. Pequod targets Java Virtual Machine (JVM) bytecode and has been evaluated on $33$ benchmarks, consisting of verification challenge problems and student solutions submitted to online coding platforms. Our evaluation indicates that, in a significant set of practical cases, Pequod can efficiently verify relational properties beyond the scope of existing techniques.

The rest of this paper is organized as follows. § 2 provides an informal overview of our proof system and of Pequod, by example. § 3 reviews the technical foundations for our work, and § 4 presents the proof system and Pequod in detail. § 5 presents an empirical evaluation of Pequod. § 6 concludes by comparing our contribution to related work.

2 Overview

In this section, we illustrate our approach by example. We first introduce a pair of programs that compute the same function. We formalize a relational property that these two programs are equivalent as an extended Hoare Logic Triple. Next, we describe how Pequod finds a proof of this triple in § 2.1.

Fig. 1 contains two programs, named tri0 and tri1, that each compute the $n$ th triangle number: i.e., the sum of all natural numbers up to and including $n$ . tri0 computes this value by direct recursion while tri1 makes use of an auxilliary procedure, tri1Aux, which maintains an accumulator. Despite these differences, these two programs compute the same function.

To verify this equivalence, we can construct a relational property that shows that given equal parameters $n$ , tri0 and tri1 compute the same output. This property can be represented as a Hoare Logic Triple over a product command:

[TABLE]

The product command $\mbox{{tri0}}\times\mbox{{tri1}}$ can be understood as the command that executes tri0 and tri1 simultaneously. A detailed explanation of product commands is given in § 4.1. We annotate variables with subscripts [math] or $1$ to indicate which program they model. Variables $\mathsf{ret}_{\_}0$ and $\mathsf{ret}_{\_}1$ are used to model the output of the respective programs. We refer to the proposition $\mbox{{n}}_{\_}0=\mbox{{n}}_{\_}1$ as the pre-condition, and the proposition $\mathsf{ret}_{\_}0=\mathsf{ret}_{\_}1$ as the post-condition. The triple above states that if $\mbox{{n}}_{\_}0=\mbox{{n}}_{\_}1$ and both tri0 and tri1 are executed, then both programs will return the same value. A proof of this triple would prove the equivalence of the two programs.

2.1 Proving Equivalence Automatically

Pequod proves this example Hoare Triple in three steps. First, Pequod constructs bounded versions of tri0 and tri1 that respect an upper bound on the allowed number of recursive procedure calls. In this example, we set this upper bound to three.

Fig. 2 lists the bounded programs $\mbox{{tri0}}_{\_}0$ and $\mbox{{tri1}}_{\_}0$ . These two program are bounded because each has finitely many execution paths. These two programs under-approximate tri0 and tri1 respectively, because their execution paths are a subset of the execution paths in the original programs. $\mbox{{tri0}}_{\_}2$ and $\mbox{{tri1Aux}}_{\_}2$ are incomplete because they do not have else branches in their conditional statement. These branches are assumed to be unreachable in the current under-approximation.

Second, Pequod tries to synthesize a set of proofs for a corresponding Hoare Triple over these bounded programs: $\{\mbox{{n}}_{\_}0=\mbox{{n}}_{\_}1\}~{}\mbox{{tri0}}_{\_}0\times\mbox{{tri1}}_{\_}0~{}\{\mathsf{ret}_{\_}0=\mathsf{ret}_{\_}1\}$ . The key idea is that Pequod will find proofs for all possible orders of modeling the execution of $\mbox{{tri0}}_{\_}0$ and $\mbox{{tri1}}_{\_}0$ . The resulting proofs represent all possible choices of synchronization points of the bounded programs. For example, in a subset of the bounded proofs, Pequod arrives at the following intermediate goal:

[TABLE]

Pequod continues the proof by either stepping in $\mbox{{tri0}}_{\_}0$ or by stepping in $\mbox{{tri1}}_{\_}0$ . ‘Stepping’ through the product program corresponds to applying particular proof rules that result in new Hoare Triple goals. In other words, Pequod proves the triple over $\mbox{{tri0}}_{\_}0\times\mbox{{tri1Aux}}_{\_}0$ by proving a series of Hoare Triples, which we refer to as a proof path, with proper invariants.

The fact that $\mbox{{tri0}}_{\_}0$ and $\mbox{{tri1}}_{\_}0$ are bounded commands implies that there are finitely many possible proof paths that can be used to prove this bounded goal. Fig. 3 depicts all possible proof paths for a partial proof of this Hoare Triple. The depicted proof is partial because we omit proof goals corresponding to the programs’ false branches, for clarity. The upper-leftmost node is the Hoare Triple that Pequod must prove. Every proof path over the true branches eventually reaches the the product command $\mbox{{tri0}}_{\_}2\times\mbox{{tri1Aux}}_{\_}2$ (since $\mbox{{tri0}}_{\_}2$ and $\mbox{{tri1Aux}}_{\_}2$ are under-approximations that allow no further recursion). Pequod encodes all possible proof paths into a single set of Constrained Horn Clauses (CHCs), and uses known techniques for solving this system to synthesize proper invariants. In short, Pequod uses CHCs to synthesize a set of proofs for all possible proof paths of the bounded program. The method for converting an input Hoare Triple into a CHC system is described in § 4.2.2.

Third, Pequod attempts to prove the original, unbounded problem by searching for a bounded proof that can be generalized. A suitable approach for solving this example is to consider the synchronization point $\mbox{{tri0}}\times\mbox{{tri1Aux}}$ , since the data at these two points is highly related. Pequod finds this synchronization point automatically and annotates it with appropriate invariants by searching the set of proofs for the bounded programs, as depicted in Fig. 3. Pequod begins its search from the top-left node: $\mbox{{tri0}}_{\_}0\times\mbox{{tri1Aux}}_{\_}0$ . In order to find a generalizable proof path, Pequod must choose a proof path that passes through the node $\mbox{{tri0}}_{\_}1\times\mbox{{tri1Aux}}_{\_}1$ . Note that the pre-condition and post-condition of two Hoare Triples over these two product commands are the same. Furthermore, the two product commands represent the same unbounded command, $\mbox{{tri0}}\times\mbox{{tri1Aux}}$ , from the original program. Thus, Pequod can use the Hoare Triple over the first command as a hypothesis to prove the second. The details of the proof rule that allows this reasoning is given in § 4.1. By choosing this proof path, Pequod has also decided to synchronize the two procedures by executing each recursive procedure once. Therefore, Pequod finds the proof and synchronization points simultaneously.

Not all proof paths can be generalized to form proofs for the original programs. In fact, any proof path that does not pass through the node $\mbox{{tri0}}_{\_}1\times\mbox{{tri1Aux}}_{\_}1$ will fail to generalize because (1) the relational invariants along the path are inappropriate for use as a hypothesis further down the path and (2) the node $\mbox{{tri0}}_{\_}2\times\mbox{{tri1Aux}}_{\_}2$ cannot be used in the proof because it incompletely models the original programs. When Pequod cannot find a valid proof for the original programs, Pequod increases the bounding number and starts over. The algorithm that generalizes bounded proofs is described in § 4.2.3.

3 Background

In this section, we present the technical background for our approach. In § 3.1, we formalize the imperative target language. In § 3.2, we introduce Constrained Horn Clause (CHC) systems as a class of logic-programming problems.

3.1 Target Language

In this section, we give the formal definition of the target language: an imperative language with conditionals and (possibly recursive) procedures. In order to define a program, we first give the definition of a command. In the following, we use the metavariable $x$ to represent program variables, $e$ to represent program expressions, and $N$ to represent procedure names. The space of commands Com is defined inductively, as follows:

[TABLE]

That is, a command is either a skip command, an assignment, a conditional, a procedure call, or a sequence of other commands.

Bibliography25

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1]
2[2] Alejandro Aguirre, Gilles Barthe, Marco Gaboardi, Deepak Garg & Pierre-Yves Strub (2017): A Relational Logic for Higher-Order Programs . In: ICFP , 10.1145/3110265 . · doi ↗
3[3] Aws Albarghouthi & Kenneth L. Mc Millan (2013): Beautiful Interpolants . In: CAV , 10.1007/978-3-642-39799-822 . · doi ↗
4[4] David A. Naumann Anindya Banerjee & Mohammad Nikouei (2016): Relational Logic with Framing and Hypotheses . In: FSTTCS , 10.4230/LIP Ics.FSTTCS.2016.11 . · doi ↗
5[5] John D. Backes, Suzette Person, Neha Rungta & Oksana Tkachuk (2013): Regression Verification Using Impact Summaries . In: SPIN , 10.1007/978-3-642-39176-77 . · doi ↗
6[6] Gilles Barthe, Juan Manuel Crespo & César Kunz (2011): Relational Verification Using Product Programs . In: FM , 10.1007/978-3-642-21437-017 . · doi ↗
7[7] Gilles Barthe, Juan Manuel Crespo & César Kunz (2013): Beyond 2-Safety: Asymmetric Product Programs for Relational Program Verification . In: LNCS , 10.1007/978-3-642-35722-03 . · doi ↗
8[8] Gilles Barthe, Juan Manuel Crespo & César Kunz (2016): Product Programs and Relational Program Logics . In: JLAMP , 10.1016/j.jlamp.2016.05.004 . · doi ↗