Fully Composable and Adequate Verified Compilation with Direct   Refinements between Open Modules (Technical Report)

Ling Zhang; Yuting Wang; Jinhua Wu; J\'er\'emie Koenig; Zhong Shao

arXiv:2302.12990·cs.PL·November 21, 2023

Fully Composable and Adequate Verified Compilation with Direct Refinements between Open Modules (Technical Report)

Ling Zhang, Yuting Wang, Jinhua Wu, J\'er\'emie Koenig, Zhong Shao

PDF

TL;DR

This paper introduces a new verified compilation approach for open modules that avoids assumptions about compiler internals, enabling more practical and compositional correctness proofs for complex modular programs.

Contribution

It proposes a novel Kripke memory relation-based semantic interface that supports direct, compositional refinements in verified compilation of heterogeneous modules.

Findings

01

Applied to the full CompCert chain with Clight, demonstrating compositional correctness.

02

Enabled end-to-end verification of mutually recursive modules.

03

Reduced verification complexity for realistic compiler optimizations.

Abstract

Verified compilation of open modules (i.e., modules whose functionality depends on other modules) provides a foundation for end-to-end verification of modular programs ubiquitous in contemporary software. However, despite intensive investigation in this topic for decades, the proposed approaches are still difficult to use in practice as they rely on assumptions about the internal working of compilers which make it difficult for external users to apply the verification results. We propose an approach to verified compositional compilation without such assumptions in the setting of verifying compilation of heterogeneous modules written in first-order languages supporting global memory and pointers. Our approach is based on the memory model of CompCert and a new discovery that a Kripke relation with a notion of memory protection can serve as a uniform and composable semantic interface for…

Tables2

Table 1. Table 1 . Significant Passes of CompCert

Languages/Passes	Outgoing $↠$ Incoming
Clight	$𝒞 ↠ 𝒞$
Self-Sim	$ro \cdot c_{injp} ↠ ro \cdot c_{injp}$
SimplLocals	$c_{injp} ↠ c_{inj}$
Csharpminor	$𝒞 ↠ 𝒞$
Cminorgen	$c_{injp} ↠ c_{inj}$
Cminor	$𝒞 ↠ 𝒞$
Selection	$wt \cdot c_{ext} ↠ wt \cdot c_{ext}$
CminorSel	$𝒞 ↠ 𝒞$
RTLgen	$c_{ext} ↠ c_{ext}$
RTL	$𝒞 ↠ 𝒞$
Self-Sim	$c_{inj} ↠ c_{inj}$
Tailcall	$c_{ext} ↠ c_{ext}$
Inlining	$c_{injp} ↠ c_{inj}$
Self-Sim	$c_{injp} ↠ c_{injp}$

Table 2. Table 2 . Comparison between Work on VCC Based on CompCert

	CompComp	CompCertM	CompCertO	CompCertX	This Work
Direct Refinement	No	No	No	No	Yes
Vertical Composition	Yes	RUSC	Trivial	CAL	Yes
Horizontal Composition	Yes	RUSC	Yes	CAL	Yes
Adequacy	No	Yes	Yes	Yes	Yes
End-to-end Verification	No	Yes	Unknown	CAL	Yes
Free-form Heterogeneity	Yes	Yes	Yes	No	Yes
Behavior Refinement	No	Yes	No	Yes	No

Equations56

L_{1} ≼ L_{2} \Rightarrow L_{2} ≼ L_{3} \Rightarrow L_{1} ≼ L_{3}

L_{1} ≼ L_{2} \Rightarrow L_{2} ≼ L_{3} \Rightarrow L_{1} ≼ L_{3}

L_{1} ≼ L_{1}^{'} \Rightarrow L_{2} ≼ L_{2}^{'} \Rightarrow L_{1} \oplus L_{2} ≼ L_{1}^{'} \oplus L_{2}^{'}

[[M_{1} + M_{2}]] ≼ [[M_{1}]] \oplus [[M_{2}]]

[[client.s + server.s]] ≼_{ac} L_{CS}

[[client.s + server.s]] ≼_{ac} L_{CS}

L_{1} ⩽_{R_{12} ↠ S_{12}} L_{2} \Rightarrow L_{2} ⩽_{R_{23} ↠ S_{23}} L_{3} \Rightarrow L_{1} ⩽_{R_{12} \cdot R_{23} ↠ S_{12} \cdot S_{23}} L_{3} .

L_{1} ⩽_{R_{12} ↠ S_{12}} L_{2} \Rightarrow L_{2} ⩽_{R_{23} ↠ S_{23}} L_{3} \Rightarrow L_{1} ⩽_{R_{12} \cdot R_{23} ↠ S_{12} \cdot S_{23}} L_{3} .

\forall (M : Clight) (M^{'} : Asm), CompCert (M) = M^{'} \Rightarrow [[M]] ⩽_{C} [[M^{'}]] .

\forall (M : Clight) (M^{'} : Asm), CompCert (M) = M^{'} \Rightarrow [[M]] ⩽_{C} [[M^{'}]] .

m_{1} ↪_{m}^{j_{23} \cdot j_{12}} m_{3}

m_{1} ↪_{m}^{j_{23} \cdot j_{12}} m_{3}

\exists v_{2}, m_{1} [b_{1}, o_{1}] ↪_{v}^{j_{12}} v_{2} \land v_{2} ↪_{v}^{j_{23}} m_{3} [b_{3}, o_{3}] .

\exists v_{2}, m_{1} [b_{1}, o_{1}] ↪_{v}^{j_{12}} v_{2} \land v_{2} ↪_{v}^{j_{23}} m_{3} [b_{3}, o_{3}] .

\exists b_{2}^{'} o_{2}^{'}, j_{12} (b_{1}^{'}) = ⌊(b_{2}^{'}, o_{2}^{'} - o_{1}^{'})⌋ \land j_{23} (b_{2}^{'}) = ⌊(b_{3}^{'}, o_{3}^{'} - o_{2}^{'})⌋

\exists b_{2}^{'} o_{2}^{'}, j_{12} (b_{1}^{'}) = ⌊(b_{2}^{'}, o_{2}^{'} - o_{1}^{'})⌋ \land j_{23} (b_{2}^{'}) = ⌊(b_{3}^{'}, o_{3}^{'} - o_{2}^{'})⌋

(b_{3}, o_{3}) \in out-of-reach (j_{23} \cdot j_{12}, m_{1})

(b_{3}, o_{3}) \in out-of-reach (j_{23} \cdot j_{12}, m_{1})

pub-tgt-mem (j_{12}, m_{1}) \subseteq perm_{max} (m_{2}, NA)

pub-tgt-mem (j_{12}, m_{1}) \subseteq perm_{max} (m_{2}, NA)

(1) j_{12} \subseteq j_{12}^{'} (2) j_{23} \subseteq j_{23}^{'} (3) inject-sep (j_{12}, j_{12}^{'}, m_{1}, m_{2}) (4) inject-sep (j_{23}, j_{23}^{'}, m_{2}, m_{3})

(1) j_{12} \subseteq j_{12}^{'} (2) j_{23} \subseteq j_{23}^{'} (3) inject-sep (j_{12}, j_{12}^{'}, m_{1}, m_{2}) (4) inject-sep (j_{23}, j_{23}^{'}, m_{2}, m_{3})

(1) out-of-reach (j_{12}, m_{1}) \subseteq unchanged-on (m_{2}, m_{2}^{'}) (2) unmapped (j_{23}) \subseteq unchanged-on (m_{2}, m_{2}^{'})

(1) out-of-reach (j_{12}, m_{1}) \subseteq unchanged-on (m_{2}, m_{2}^{'}) (2) unmapped (j_{23}) \subseteq unchanged-on (m_{2}, m_{2}^{'})

max-perm-dec (m_{2}, m_{2}^{'})

max-perm-dec (m_{2}, m_{2}^{'})

ro-unchanged (m_{2}, m_{2}^{'})

ro-unchanged (m_{2}, m_{2}^{'})

mem-acc (m_{2}, m_{2}^{'})

mem-acc (m_{2}, m_{2}^{'})

(b_{2}, o_{2}) \in perm_{k} (m_{2}^{'}, p) \lor (b_{2}, o_{2}) \neq \in perm_{max} (m_{2}^{'}, NA) .

(b_{2}, o_{2}) \in perm_{k} (m_{2}^{'}, p) \lor (b_{2}, o_{2}) \neq \in perm_{max} (m_{2}^{'}, NA) .

(b_{2}, o_{2}) \in perm_{k} (m_{2}^{'}, p) \lor (b_{2}, o_{2}) \in / perm_{max} (m_{2}^{'}, NA)

(b_{2}, o_{2}) \in perm_{k} (m_{2}^{'}, p) \lor (b_{2}, o_{2}) \in / perm_{max} (m_{2}^{'}, NA)

unmapped (j_{12}) \subseteq unmapped (j_{13})

unmapped (j_{12}) \subseteq unmapped (j_{13})

out-of-reach (j_{23}, m_{2}) \subseteq out-of-reach (j_{23} \cdot j_{12}, m_{1})

out-of-reach (j_{23}, m_{2}) \subseteq out-of-reach (j_{23} \cdot j_{12}, m_{1})

\forall b_{2} o_{2}, j_{23} (b_{2}) = ⌊(b_{3}, o_{3})⌋ \Rightarrow (b_{2}, o_{2}) \in / perm_{max} (m_{2}, NA)

\forall b_{2} o_{2}, j_{23} (b_{2}) = ⌊(b_{3}, o_{3})⌋ \Rightarrow (b_{2}, o_{2}) \in / perm_{max} (m_{2}, NA)

unmapped (j_{13}) \subseteq unchanged-on (m_{1}, m_{1}^{'})

unmapped (j_{13}) \subseteq unchanged-on (m_{1}, m_{1}^{'})

out-of-reach (j_{13}, m_{1}) \subseteq unchanged-on (m_{3}, m_{3}^{'})

out-of-reach (j_{13}, m_{1}) \subseteq unchanged-on (m_{3}, m_{3}^{'})

inject-sep (j_{13}, j_{13}^{'}, m_{1}, m_{3})

inject-sep (j_{13}, j_{13}^{'}, m_{1}, m_{3})

j_{12} (b_{1}) = \emptyset and \exists b_{2}, j_{12}^{'} (b_{1}) = ⌊(b_{2}, o_{2} - o_{1})⌋ \land j_{23}^{'} (b_{2}) = ⌊(b_{3}, o_{3} - o_{2})⌋

j_{12} (b_{1}) = \emptyset and \exists b_{2}, j_{12}^{'} (b_{1}) = ⌊(b_{2}, o_{2} - o_{1})⌋ \land j_{23}^{'} (b_{2}) = ⌊(b_{3}, o_{3} - o_{2})⌋

L_{C A} ≽_{R_{e}} a.spec \oplus b.spec ≽_{R_{e}} [[M_{C}]] \oplus [[M_{A}]]

L_{C A} ≽_{R_{e}} a.spec \oplus b.spec ≽_{R_{e}} [[M_{C}]] \oplus [[M_{A}]]

Beh (L_{C A}) \supseteq Beh ([[CompCert (M_{C})]] + [[M_{A}]])

Beh (L_{C A}) \supseteq Beh ([[CompCert (M_{C})]] + [[M_{A}]])

L_{C A} ≽_{R_{e} \cup R_{c}} [[CompCert (M_{C})]] \oplus [[M_{A}]]

L_{C A} ≽_{R_{e} \cup R_{c}} [[CompCert (M_{C})]] \oplus [[M_{A}]]

L_{S} ⩽_{c_{injp} + c_{inj} + c_{ext} + c_{vainj} + c_{vaext}} L_{S}

L_{S} ⩽_{c_{injp} + c_{inj} + c_{ext} + c_{vainj} + c_{vaext}} L_{S}

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Fully Composable and Adequate Verified Compilation with Direct Refinements between Open Modules

Ling Zhang

0000-0001-7190-6983

John Hopcroft Center for Computer Science, School of Electronic Information and Electrical EngineeringShanghai Jiao Tong UniversityChina

[email protected]

,

Yuting Wang

0000-0003-3990-2418

John Hopcroft Center for Computer Science, School of Electronic Information and Electrical EngineeringShanghai Jiao Tong UniversityChina

[email protected]

,

Jinhua Wu

0000-0001-5812-053X

John Hopcroft Center for Computer Science, School of Electronic Information and Electrical EngineeringShanghai Jiao Tong UniversityChina

[email protected]

,

Jérémie Koenig

0000-0002-3168-5925

Yale UniversityUSA

[email protected]

and

Zhong Shao

0000-0001-8184-7649

Yale UniversityUSA

[email protected]

(2024; 2023-07-11; 2023-11-07)

Abstract.

Verified compilation of open modules (i.e., modules whose functionality depends on other modules) provides a foundation for end-to-end verification of modular programs ubiquitous in contemporary software. However, despite intensive investigation in this topic for decades, the proposed approaches are still difficult to use in practice as they rely on assumptions about the internal working of compilers which make it difficult for external users to apply the verification results. We propose an approach to verified compositional compilation without such assumptions in the setting of verifying compilation of heterogeneous modules written in first-order languages supporting global memory and pointers. Our approach is based on the memory model of CompCert and a new discovery that a Kripke relation with a notion of memory protection can serve as a uniform and composable semantic interface for the compiler passes. By absorbing the rely-guarantee conditions on memory evolution for all compiler passes into this Kripke Memory Relation and by piggybacking requirements on compiler optimizations onto it, we get compositional correctness theorems for realistic optimizing compilers as refinements that directly relate native semantics of open modules and that are ignorant of intermediate compilation processes. Such direct refinements support all the compositionality and adequacy properties essential for verified compilation of open modules. We have applied this approach to the full compilation chain of CompCert with its Clight source language and demonstrated that our compiler correctness theorem is open to composition and intuitive to use with reduced verification complexity through end-to-end verification of non-trivial heterogeneous modules that may freely invoke each other (e.g., mutually recursively).

Verified Compositional Compilation, Direct Refinements, Kripke Relations

††copyright: rightsretained††doi: 10.1145/3632914††journalyear: 2024††submissionid: popl24main-p380-p††journal: PACMPL††journalvolume: 8††journalnumber: POPL††article: 72††publicationmonth: 1††ccs: Software and its engineering Formal software verification††ccs: Software and its engineering Compilers††ccs: Theory of computation Program verification

1. Introduction

Verified compilation ensures that behaviors of source programs are faithfully transported to target code, a property desirable for end-to-end verification of software whose development involves compilation. As software is usually composed of modules independently developed and compiled, researchers have developed a wide range of techniques for verified compositional compilation or VCC that support modules invoking each other (i.e., open), being written in different languages (i.e., heterogeneous) and transformed by different compilers (Patterson and Ahmed, 2019).

We are concerned with VCC for first-order languages with global memory states and support of pointers (e.g., see Stewart et al. (2015); Song et al. (2020); Koenig and Shao (2021); Jiang et al. (2019); Wang et al. (2019); Gu et al. (2015)). As it stands now, the proposed approaches are inherently limited at supporting open modules (e.g. libraries) as they either deviate from the native semantics of modules or expose the semantics of intermediate representations for compilation, resulting in correctness theorems that are difficult to work with for external users. In this paper, we investigate an approach that eliminates these limitations while retaining the full benefits of VCC, i.e., obtaining correctness of compiling open modules that is fully composable, adequate, and extensional.

1.1. Full Compositionality and Adequacy in Verified Compilation

Correctness of compiling open modules is usually described as refinement between semantics of source and target modules. We shall write $L$ (possibly with subscripts) to denote semantics of open modules and write $L_{1}\preccurlyeq L_{2}$ to denote that $L_{1}$ is refined by $L_{2}$ . Therefore, the compilation of any module $M_{2}$ into $M_{1}$ is correct iff $[\![M_{1}]\!]\preccurlyeq[\![M_{2}]\!]$ where $[\![M_{i}]\!]$ denotes the semantics of $M_{i}$ .

To support the most general form of VCC, it is critical that the established refinements are fully composable, i.e., both horizontally and vertically composable, and adequate for native semantics:

[TABLE]

The first property states that refinements are transitive. It is essential for composing proofs for multi-pass compilers. The second property guarantees that refinements are preserved by semantic linking (denoted by $\oplus$ ). It is essential for composing correctness of compiling open modules (possibly through different compilers). The last one ensures that, given any modules, their semantic linking coincides with their syntactic linking (denoted by $+$ ). It ensures that linked semantics do not deviate from native semantics and is essential to propagate verified properties to final target programs.

We use the example in Fig. 1 to illustrate the importance of the above properties in VCC where heterogeneous modules are compiled through different compilation chains and linked into a final target module. In this example, a source C module a.c is compiled into an assembly module a.s through a multi-pass optimizing compiler like CompCert: it is first compiled to $\text{a.i}_{1}$ in an intermediate representation (IR) for optimization (e.g., the RTL language of CompCert) and then to $\text{a.i}_{2}$ in another IR for code generation (e.g., the Mach language of CompCert). Finally, it is linked with a library module b.s which is not compiled at all (an extreme case where the compilation chain is empty). The goal is to prove that the semantics of linked target assembly $\text{a.s}+\text{b.s}$ refines the combined source semantics $[\![\text{a.c}]\!]\oplus L_{b}$ where $L_{b}$ is the semantic specification of b.s, i.e., $[\![\text{a.s}+\text{b.s}]\!]\preccurlyeq[\![\text{a.c}]\!]\oplus L_{b}$ . The proof proceeds as follows:

(1)

Prove every pass respects refinement, from which $[\![\text{a.i}_{1}]\!]\preccurlyeq[\![\text{a.c}]\!]$ , $[\![\text{a.i}_{2}]\!]\preccurlyeq[\![\text{a.i}_{1}]\!]$ and $[\![\text{a.s}]\!]\preccurlyeq[\![\text{a.i}_{2}]\!]$ . Moreover, show b.s meets its specification, i.e., $[\![\text{b.s}]\!]\preccurlyeq L_{b}$ ; 2. (2)

By vertically composing the refinement relations for compiling a.c, we get $[\![\text{a.s}]\!]\preccurlyeq[\![\text{a.c}]\!]$ ; 3. (3)

By further horizontally composing with $[\![\text{b.s}]\!]\preccurlyeq L_{b}$ , we get $[\![\text{a.s}]\!]\oplus[\![\text{b.s}]\!]\preccurlyeq[\![\text{a.c}]\!]\oplus L_{b}$ ; 4. (4)

By adequacy for assembly and vertical composition, conclude $[\![\text{a.s}+\text{b.s}]\!]\preccurlyeq[\![\text{a.c}]\!]\oplus L_{b}$ .

1.2. Problems with the Existing Approaches to Refinements

Despite the simplicity of VCC at an intuitive level, full compositionality and adequacy are surprisingly difficult to prove for any non-trivial multi-pass compiler. First and foremost, the formal definitions must take into account the facts that each intermediate representation has different semantics and each pass may imply a different refinement relation. To facilitate the discussion below, we classify different open semantics by language interfaces (or simply interfaces) which formalize their interaction with environments. We write $L:\mathcal{I}$ to denote that $L$ has a language interface $\mathcal{I}$ . For instance, $[\![\text{a.c}]\!]:\mathcal{C}$ denotes that the semantics of a.c has the interface $\mathcal{C}$ which only allows for interaction with environments through function calls and returns in C. Similarly, $[\![\text{a.s}]\!]:\mathcal{A}$ denotes the semantics of a.s where $\mathcal{A}$ only allows for interaction at the assembly level. Note that the interface for a module may not match its native semantics. For example, $[\![\text{a.s}]\!]:\mathcal{C}$ asserts that $[\![\text{a.s}]\!]$ actually converts assembly level calls/returns to C function calls/returns for interacting with C environments (e.g., extracting arguments from registers and memory to form an argument list for C function calls). In this case, $[\![\text{a.s}]\!]$ deviates from the native semantics of a.s. When the interface of $[\![M]\!]$ is not explicitly given, it is implicitly the native interface of $M$ . We write $\preccurlyeq:\mathcal{I}_{1}\Leftrightarrow\mathcal{I}_{2}$ to denote a refinement between two semantics with interfaces ${\mathcal{I}_{1}}$ and ${\mathcal{I}_{2}}$ . For instance, given $\preccurlyeq_{\texttt{ac}}:\mathcal{A}\Leftrightarrow\mathcal{C}$ that relates open semantics at the C and assembly levels, $[\![\text{b.s}]\!]\preccurlyeq_{{\texttt{ac}}}L_{b}$ asserts that ${[\![\text{b.s}]\!]}$ is the native semantics of b.s and is refined by the C level specification $L_{b}$ .

For VCC, it is essential that variance of open semantics and refinements does not impede compositionality and adequacy. The existing approaches achieve this by imposing algebraic structures on refinements. We categorize them by their algebraic structures below, and explain the problems facing them via three well-known extensions of CompCert (Leroy, 2023) (the state-of-the-art verified C compiler) to support VCC, i.e., Compositional CompCert (CompComp) (Stewart et al., 2015), CompCertM (Song et al., 2020) and CompCertO (Koenig and Shao, 2021).

Constant Refinement

An obvious way to account for different semantics in VCC is to force every semantics to use the same language interface $\mathcal{I}$ and a constant refinement $\preccurlyeq_{\mathcal{I}}:\mathcal{I}\Leftrightarrow\mathcal{I}$ . CompComp adopts this “one-type-fits-all” approach by having every language of CompCert to use C function calls/returns for module-level interactions and using a uniform refinement relation $\preccurlyeq_{\mathcal{C}}:\mathcal{C}\Leftrightarrow\mathcal{C}$ known as structured simulation (Stewart et al., 2015). In this case, vertical and horizontal compositionality is established by proving transitivity of $\preccurlyeq_{\mathcal{C}}$ and symmetry of rely-guarantee conditions of $\preccurlyeq_{\mathcal{C}}$ . However, because the C interface is adopted for assembly semantics, adequacy at the target level is lost, making end-to-end compiler correctness not provable as shown in Fig. 2(a).

Sum of Refinements

A more relaxed approach allows users to choose language interfaces for different IRs from a finite collection $\{\mathcal{I}_{1},\ldots,\mathcal{I}_{m}\}$ and refinements for different passes from a finite set $\{\preccurlyeq_{1},\ldots,\preccurlyeq_{n}\}$ relating these interfaces, i.e., $\preccurlyeq_{i}:\mathcal{I}_{1}+\ldots+\mathcal{I}_{m}\Leftrightarrow\mathcal{I}_{1}+\ldots+\mathcal{I}_{m}$ . In essence, a constant refinement is split into a sum of refinements s.t. ${L}\;{\preccurlyeq_{1}+\ldots+\preccurlyeq_{n}}\;{L^{\prime}}$ holds if $L\preccurlyeq_{{}_{i}}L^{\prime}$ for some $1\leq i\leq n$ . Then, every compiler pass can use ${\preccurlyeq_{1}+\ldots+\preccurlyeq_{n}}$ as the uniform refinement relation, which is proven both composable and adequate under certain well-formedness constraints. Fig. 2(b) depicts such an example where semantics have both C and assembly interfaces (e.g., $[\![\text{a.s}]\!]:\mathcal{A}+\mathcal{C}$ ) and the refinement relations $\preccurlyeq_{i}:\mathcal{A}+\mathcal{C}\Leftrightarrow\mathcal{A}+\mathcal{C}(1\leq i\leq 5)$ are tailored for each pass. This is the approach adopted by CompCertM (Song et al., 2020). However, the top-level refinement ${\preccurlyeq_{1}+\ldots+\preccurlyeq_{n}}$ is difficult to use by a third party without introducing complicated dependency on intermediate results of compilation. For example, horizontal composition with ${\preccurlyeq_{1}+\ldots+\preccurlyeq_{n}}$ only works for modules self-related by all the refinements $\preccurlyeq_{i}(1\leq i\leq n)$ . Since ${\preccurlyeq_{i}}$ s are tailored for individual passes, they inevitably depend on the intermediate semantics used in compilation. Such dependency is only exacerbated as new languages, compilers and optimizations are introduced.

Product of Refinements

The previous approach effectively “flattens” the refinements for individual compiler passes into an end-to-end refinement. A different approach adopted by CompCertO (Koenig and Shao, 2021) is to “concatenate” the refinements for individual passes into a chain of refinements by a product operation $(\_\boldsymbol{\cdot}\_)$ such that ${L}\;{\preccurlyeq_{1}\boldsymbol{\cdot}\preccurlyeq_{2}}\;{L^{\prime\prime}}$ if $L\preccurlyeq_{1}L^{\prime}$ and $L^{\prime}\preccurlyeq_{2}L^{\prime\prime}$ for some $L^{\prime}$ . Fig. 2(c) illustrates how it works. Vertical composition is simply the concatenation of refinements. For example, composing refinements for compiling a.c results in ${[\![\text{a.s}]\!]}\;{\preccurlyeq_{3}\boldsymbol{\cdot}\preccurlyeq_{2}\boldsymbol{\cdot}\preccurlyeq_{1}}\;{[\![\text{a.c}]\!]}$ . Adequacy is trivially guaranteed with native interfaces. However, horizontal composition still depends on the intermediate semantics of compilation because of the concatenation. For example, in Fig. 2(c), to horizontally compose with ${[\![\text{a.s}]\!]}\;{\preccurlyeq_{3}\boldsymbol{\cdot}\preccurlyeq_{2}\boldsymbol{\cdot}\preccurlyeq_{1}}\;{[\![\text{a.c}]\!]}$ , it is necessary to show $L_{b}$ refines $[\![\text{b.s}]\!]$ via the same product, i.e., to construct intermediate semantics bridging $\preccurlyeq_{1}$ , $\preccurlyeq_{2}$ and $\preccurlyeq_{3}$ .

Summary

The existing approaches for VCC either lack adequacy because they force non-native language interfaces on semantics for open modules (e.g., CompComp) or lack compositionality that is truly extensional because they depend on intermediate semantics used in compilation (e.g., CompCertM and CompCertO). Such dependency makes their correctness theorems for compiling open modules (e.g., libraries) difficult to further compose with and incurs a high cost in verification.

1.3. Challenges for Direct Refinement of Open Modules

The ideal approach to VCC should produce refinements that directly relate the native semantics of source and target open modules without mentioning any intermediate semantics and support both vertical and horizontal composition. We shall call them direct refinements of open modules. For example, a direct refinement between a.c and a.s could be $\preccurlyeq_{\texttt{ac}}:\mathcal{A}\Leftrightarrow\mathcal{C}$ s.t. $[\![\text{a.s}]\!]\preccurlyeq_{{\texttt{ac}}}[\![\text{a.c}]\!]$ . It relates assembly and C without mentioning intermediate semantics, and could be further horizontally composed with $[\![\text{b.s}]\!]\preccurlyeq_{{\texttt{ac}}}L_{b}$ and vertically composed by adequacy to get $[\![\text{a.s}+\text{b.s}]\!]\preccurlyeq_{\texttt{ac}}[\![\text{a.c}]\!]\oplus L_{b}$ . Note that even the top-level refinement is still open to horizontal and vertical composition, making direct refinements effective for supporting VCC for open modules.

The main challenge in getting direct refinements is tied to their “real” vertical composition, i.e., given any direct refinements ${\preccurlyeq_{1}}$ and ${\preccurlyeq_{2}}$ , how to show $\preccurlyeq_{1}\boldsymbol{\cdot}\preccurlyeq_{2}$ is equivalent to a direct refinement $\preccurlyeq_{3}$ . This is considered very technical and involved (see Song et al. (2020); Neis et al. (2015); Patterson and Ahmed (2019); Hur et al. (2012b)) because of the difficulty in constructing interpolating program states for transitively relating evolving source and target states across external calls of open modules. This problem also manifests in proving transitivity for logical relations where construction of interpolating terms of higher-order types is not in general possible (Ahmed, 2006). In the setting of compiling first-order languages with global memory, all previous work avoids proving real vertical composition of direct refinements. Some produce refinement without adequacy by introducing intrusive changes to semantics to make construction of interpolating states possible. For example, CompComp instruments the semantics of languages with effect annotations to expose internal effects for this purpose. Some essentially restrict vertical composition to closed programs (e.g., CompCertM). Some leave the top-level refinement a combination of refinements that still exposes the intermediate steps of compilation (e.g., CompCertO). Finally, even if the problem of vertical composition was solved, it is not clear if the solution can support realistic optimizing compilers.

1.4. Our Contributions

In this paper, we propose an approach to direct refinements for VCC of imperative programs that addresses all of the above challenges. Our approach is based on the memory model of CompCert which supports first-order states and pointers. We show that in this memory model interpolating states for proving vertical compositionality of refinements can be constructed by exploiting the properties on memory invariants known as memory injections. The solution is based on a new discovery that a Kripke relation with memory protection can serve as a uniform and composable relation for characterizing the evolution of memory states across external calls. With this relation we successfully combined the correctness theorems of CompCert’s passes into a direct refinement between C and assembly modules. We summarize our technical contributions below:

•

We prove that injp—a Kripke Memory Relation with a notion of memory protection—is both uniform (i.e., memory transformation in every compiler pass respects this relation) and composable (i.e., transitive modulo an equivalence relation). The critical observation making this proof possible is that interpolating memory states can be constructed by exploiting memory protection inherent to memory injections and the functional nature of injections.

•

Based on the above observation, we show that a direct refinement from C to assembly can be derived by composing open refinements for all of CompCert’s passes starting from Clight. In particular, we show that compiler passes can use different Kripke relations sufficient for their proofs (which may be weaker than injp) and these relations will later be absorbed into injp via refinements of open semantics. Furthermore, we show that assumptions for compiler optimizations can be formalized as semantic invariants and, when piggybacked onto injp, can be transitively composed. Based on these techniques, we upgrade the proofs in CompCertO to get a direct refinement from C to assembly for the full CompCert, including all of its optimization passes. These experiments show that direct refinements can be obtained without fundamental changes to the verification framework of CompCert.

•

We demonstrate the simplicity and usefulness of direct refinements by applying it to end-to-end verification of several non-trivial examples with heterogeneous modules that mutually invoke each other. In particular, we observe that C level refinements can be absorbed into the direct refinement of CompCert by transitivity of injp. Combining direct refinements with full compositionality and adequacy, we derive end-to-end refinements from high-level source specifications to syntactically linked assembly modules in a straightforward manner.

The above developments are fully formalized in Coq based on the latest CompCertO which is in turn built on top of CompCert v3.10 (see the data-availability statement at the end of the paper for more details). While the formalisation of our approach is tied to CompCert’s block-based memory model (Leroy et al., 2012), and applied to its particular chain of compilation, we present evidence in §7 that variants of injp could be adapted for alternate memory models for first-order languages, and that it may be extended to support new optimizations. Therefore, this work provides a promising direction for further evolving the techniques for VCC.

1.5. Structure of the Paper

Below we first introduce the key ideas supporting this work in §2. We then introduce necessary background and discuss the technical challenges for building direct refinements in §3. We present our technical contributions in §4, §5 and §6. We discuss the generality and limitations of our approach in §7. We discuss evaluation and related work in §8 and finally conclude in §9.

2. Key Ideas

We introduce a running example with heterogeneous modules and callback functions to illustrate the key ideas of our work. This example is representative of mutual dependency between modules that often appears in practice and it shows how free-form invocation between modules can be supported by our approach. As we shall see in §6, our approach also handles more complicated programs with mutually recursive heterogeneity without any problem.

The example is given in Fig. 3. It consists of a client written in C (Fig. 3(a)) and an encryption server hand-written in x86 assembly by using CompCert’s assembly syntax where instruction names begin with P (Fig. 3(b)). For now, let us ignore Fig. 3(c) which illustrates how optimizations work in direct refinements. Users invoke request to initialize an encryption request. It is relayed to the function encrypt in the server with the prototype void encrypt(int i, void (p)(int)) which respects a calling convention placing the first and second arguments in registers RDI and RSI, respectively. The main job of the server is to encrypt i (RDI) by XORing it with an encryption key (stored in the global variable key) and invoke the callback function p (RSI). Finally, the client takes over and stores the encrypted value in the global variable result. The pseudo instruction Pallocframe m n o allocates a stack frame of m bytes and stores its address in register RSP. In this frame, a pointer to the caller’s stack frame is stored at the o-th byte and the return address is stored at the n-th byte. Note that Pallocframe 24 16 0 in encrypt reserves $8$ bytes on the stack from RSP + 8 to RSP + 16 for storing the encrypted value whose address is passed to the callback function p. Pfreeframe m n o frees the frame and restores RSP and the return address RA.

With the running example, our goal is to verify its end-to-end correctness by exploiting the direct refinement $\preccurlyeq_{\texttt{ac}}:\mathcal{A}\Leftrightarrow\mathcal{C}$ derived from CompCert’s compilation chain as shown in Fig. 4. The verification proceeds as follows. First, we establish $[\![\text{client.s}]\!]\preccurlyeq_{\texttt{ac}}[\![\text{client.c}]\!]$ by the correctness of compilation. Then, we prove $[\![\text{server.s}]\!]\preccurlyeq_{\texttt{ac}}L_{\texttt{S}}$ manually by providing a specification ${L_{\texttt{S}}}$ for the server that respects the direct refinement. At the source level, the combined semantics is further refined to a single top-level specification $L_{\texttt{CS}}$ . Finally, the source and target level refinements are absorbed into the direct refinement by vertical composition and adequacy, resulting in a single direct refinement between the top-level specification and the target program:

[TABLE]

The refinements of open modules discussed in our paper are based on forward simulations between small-step operational semantics (often in the form of labeled transition systems or LTS) which have been witnessed in a wide range of verification projects (Stewart et al., 2015; Song et al., 2020; Koenig and Shao, 2021; Jiang et al., 2019; Wang et al., 2019; Gu et al., 2015). Fig. 5(a) depicts a refinement $L_{2}\preccurlyeq L_{1}$ between two open semantics (LTS) $L_{1}$ and $L_{2}$ . The source (target) semantics $L_{1}$ ( $L_{2}$ ) is initialized with a query (i.e., function call) $q_{1}$ ( $q_{2}$ ) and may invoke an external call $q_{1}^{\prime}$ ( $q_{2}^{\prime}$ ) as the execution goes. The execution continues when $q_{1}^{\prime}$ ( $q_{2}^{\prime}$ ) returns with a reply $r_{1}^{\prime}$ ( $r_{2}^{\prime}$ ) and finishes with a reply $r_{1}$ ( $r_{2}$ ). For the refinement to hold, an invariant between the source and target program states must hold throughout the execution which is denoted by the vertical double arrows in Fig. 5(a). Furthermore, this refinement relies on external calls satisfying certain well-behavedness conditions (known as rely-conditions; e.g., external calls do not modify the private memory of callers). In turn, it guarantees the entire source and target execution satisfy some well-behavedness conditions (known as guarantee-conditions, e.g., they do not modify the private memory of their calling environments). The rely-guarantee conditions are essential for horizontal composition: two refinements $L_{1}\preccurlyeq L_{2}$ and $L_{1}^{\prime}\preccurlyeq L_{2}^{\prime}$ with complementary rely-guarantee conditions can be composed into a single refinement $L_{1}\oplus L_{2}\preccurlyeq L_{1}^{\prime}\oplus L_{2}^{\prime}$ . However, vertical composition of such refinements is difficult. A naive vertical composition of two refinements (one between $L_{1}$ and $L_{2}$ and another between $L_{2}$ and $L_{3}$ ) simply concatenates them together like Fig. 5(b), instead of generating a single refinement between $L_{1}$ and $L_{3}$ like Fig. 5(c). 111To simplify the presentation, we often elide the guarantee conditions in figures for simulation. This exposes the intermediate semantics (i.e., $L_{2}$ ) and imposes serious limitations on VCC as discussed in §1. Therefore, to the best of our knowledge, none of the existing approaches fully support the verification outlined in Fig. 4.

To address the above problem, we develop direct refinements with the following distinguishing features: 1) they always relate the semantics of modules at their native interfaces, thereby supporting adequacy; 2) they do not mention the intermediate process of compilation, thereby supporting heterogeneous modules and compilers; 3) they provide direct memory protection for source and target semantics via a Kripke relation, thereby enabling horizontal composition of refinements for heterogeneous modules; 4) most importantly, they are vertically composable. The first three features are manifested in the very definition of direct refinements, which we shall discuss in §2.1 below. We then discuss the vertical composition of direct refinements in §2.2, which relies on the discovery of the uniformity and transitivity of a Kripke relation for memory protection.

2.1. Refinement Supporting Adequacy, Heterogeneity and Horizontal Composition

To illustrate the key ideas, we use the top-level direct refinement $\preccurlyeq_{\text{ac}}$ in Fig. 4 as an example. In the remaining discussions we adopt the block-based memory model of CompCert (Leroy et al., 2012) where a memory state consists of a disjoint set of memory blocks. $\preccurlyeq_{\text{ac}}$ is a forward simulation that directly relates C and assembly modules with their native language interfaces. By the definition of these interfaces (See §3.1), a C query $q_{\mathcal{C}}=v_{f}[\mathit{sg}](\vec{v})@m$ is a function call to $v_{f}$ with signature $\mathit{sg}$ , a list of arguments $\vec{v}$ and a memory state $m$ ; a C reply $r_{\mathcal{C}}=v^{\prime}@m^{\prime}$ carries a return value $v^{\prime}$ and an updated memory state $m^{\prime}$ . An assembly query $q_{\mathcal{A}}=\mathit{rs}@m$ invokes a function with the current register set $\mathit{rs}$ and memory state $m$ . An assembly reply $r_{\mathcal{A}}=\mathit{rs}^{\prime}@m^{\prime}$ returns from a function with the updated registers $\mathit{rs}^{\prime}$ and memory $m^{\prime}$ . By definition, $L_{2}\preccurlyeq_{\text{ac}}L_{1}$ means that $L_{1}$ and $L_{2}$ behave like C and assembly programs at the boundary of modules, respectively. However, there is no restriction on how $L_{1}$ and $L_{2}$ are actually implemented internally, which enables specifications like $L_{\text{S}}$ in Fig. 4.

The rely and guarantee conditions imposed by $\preccurlyeq_{\text{ac}}$ are symmetric and bundled with the simulation invariants at the boundary of modules. They make assumptions about how C and assembly queries should be related at the call sites and provide conclusions about how the replies should be related after the calls return. Given any matching source and target queries $q_{\mathcal{C}}=v_{f}[\mathit{sg}](\vec{v})@m_{1}$ and $q_{\mathcal{A}}=\mathit{rs}@m_{2}$ , it is assumed that

(1)

The memory states are related by an invariant $j$ known as a memory injection function (Leroy et al., 2012), i.e., memory blocks in $m_{1}$ are projected by $j$ into those in $m_{2}$ ; 2. (2)

The function pointer $v_{f}$ is related to the program counter register in $\mathit{rs}$ ; 3. (3)

The source arguments $\vec{v}$ are projected either to registers in $\mathit{rs}$ or to outgoing argument slots in the stack frame RSP in $m_{2}$ according to the C calling convention; 4. (4)

The outgoing arguments on the target stack frame are freeable and not in the image of $j$ .

The first three requirements ensure that C arguments and memory are related to assembly registers and memory according to CompCert’s C calling convention. The last one ensures outgoing arguments are protected, thereby preserving the invariant of open simulation across external calls.

After the function calls return, the source and target queries $r_{\mathcal{C}}=\mathit{res}@m_{1}^{\prime}$ and $r_{\mathcal{A}}=\mathit{rs}^{\prime}@m_{2}^{\prime}$ must satisfy the following requirements:

(1)

The updated memory states $m_{1}^{\prime}$ and $m_{2}^{\prime}$ are related by an updated memory injection $j^{\prime}$ ; 2. (2)

The C-level return value $\mathit{res}$ is related to the value stored in the register for return value; 3. (3)

For any callee-saved register $\mathit{r}$ , $\mathit{rs}^{\prime}(\mathit{r})=\mathit{rs}(\mathit{r})$ ; 4. (4)

The stack pointer register and program counter are restored. 5. (5)

The access to memory during the function call is protected by a Kripke Memory Relation injp such that the private stack data for other function calls are not modified.

The first two requirements ensure that return values and memories are related according to the calling convention. The following two ensure that registers are correctly restored before returning. The last requirement plays a critical role in rely-guarantee reasoning and enables horizontal composition of direct refinements as we shall see soon.

2.1.1. Adequacy and Heterogeneity via Direct Refinement

By definition, $\preccurlyeq_{\text{ac}}$ is basically a formalized C calling convention for CompCert with direct relations between C and assembly operational semantics and with invariants for protecting register values and memory states. Adequacy is automatically guaranteed as syntactic linking coincides with semantics linking at the assembly level. That is, given any assembly modules a.s and b.s, ${[\![\text{a.s}+\text{b.s}]\!]}\preccurlyeq_{\text{id}}{[\![\text{a.s}]\!]\oplus[\![\text{b.s}]\!]}$ .

Moreover, $\preccurlyeq_{\text{ac}}$ does not mention anything about compilation. It works for any heterogeneous module and compilation chain that meet its requirements, even for hand-written assembly. Take the refinement of ${[\![\text{server.s}]\!]}\preccurlyeq_{\text{ac}}{L_{\texttt{S}}}$ in Fig. 4 as an example. The first few steps of the simulation are depicted in Fig. 6, where $L_{\texttt{S}}$ is an LTS hand-written by us and ${[\![\text{server.s}]\!]}$ is derived from the CompCert assembly semantics. Because $L_{\texttt{S}}$ is only required to respect the C interface, we choose a form easy to comprehend where its internal executions are in big steps. Now, suppose the environment calls encrypt with source and target queries initially related by CompCert’s calling convention s.t. $rs(\text{RDI})=i$ and $rs(\text{RSI})=p$ . After the initialization $I_{1}$ and $I_{2}$ , the execution enters internal states related by an invariant $R$ . Then, the target execution takes internal steps $K_{2}$ until reaching an external call. This corresponds to executing lines 5-13 in Fig. 3(b), which allocates the stack frame RSP, performs encryption by storing i XOR key at the address RSP+8, and calls back $p$ with RSP+8. At the source level, these steps correspond to one big-step execution $K_{1}$ which allocates a memory block $b_{0}$ , stores i XOR key at $b_{0}$ , and prepares to call $p$ with $b_{0}$ . Therefore, the memory injection in $R$ maps $b_{0}$ to RSP+8. The source and target execution continue with transitions $X_{1}$ and $X_{2}$ to the external calls to $p$ , return from $p$ and go on until they return from encrypt.

2.1.2. Horizontal Composition via Kripke Memory Relations

The Kripke Memory Relation (KMR) injp provides essential protection for private values on the stack, which ensures that simulations between heterogeneous modules can be established and their horizontal composition is feasible.

We illustrate these points via our running example. Assume that the environment calls request in the client with $11$ which in turn calls encrypt in the server to get the value $11\;\text{XOR}\;42=33$ whose address is passed back to the client by calling process. Fig. 7 depicts a snapshot of the memory states and the injection right after process is entered (i.e., at line 8 in Fig. 3(a)), where boxes denote allocated memory blocks, black arrows between blocks denote injections, and red arrows denote pointers. The source semantics allocates one block for each local variable ( $b_{i}$ for i, $b_{0}$ for the encrypted value $33$ and $b_{r}$ for r) while the target semantics stores their values in registers or stacks ( $11$ is stored in RDI while $33$ on the stack because its address is taken and may be modified by the callee). One stack frame is allocated for each function call which stores private data including pointers to previous frames ( $b_{\texttt{RSP}}$ ), return addresses (RA), and callee-saved registers (e.g., RBX).

injp is essential for proving simulation for open modules as it guarantees simulation can be re-established after external calls return. Informally, at every external call site, injp marks all memory regions outside the footprint (domain and image) of the current injection as private and does not allow the external call to modify those memory regions. From the perspective of server.s, when the snapshot in Fig. 7 is taken, the execution is inside the thick dashed line in Fig. 6 and protected by injp. Therefore, all the shaded memory in Fig. 7 are marked as private and protected against the callback to process. Indeed, they correspond to either memory values turned into temporary variables (e.g., $b_{i}$ ) or private stack data (e.g., $b_{\text{RSP}}$ , RBX and RA in block $b_{\texttt{RSP}_{1}}$ ) that should not be touched by process. Such protection ensures that when process returns, all the private values are still valid, thereby re-establishing the simulation invariant.

The role of injp is reversed for the incoming calls from the environment: it guarantees that the entire execution from the initial query to the final reply will not touch any private memory of the environment. Therefore, injp is used to impose a reliance on memory protection by external calls and to provide a symmetric guarantee of memory protection for the environment callers. Any simulations with compatible language interfaces that satisfy this rely-guarantee condition can be horizontally composed. For example, we can horizontally compose $[\![\text{client.s}]\!]\preccurlyeq_{\texttt{ac}}[\![\text{client.c}]\!]$ and $[\![\text{server.s}]\!]\preccurlyeq_{\texttt{ac}}L_{\text{S}}$ into $[\![\text{client.s}]\!]\oplus[\![\text{server.s}]\!]\preccurlyeq_{\texttt{ac}}[\![\text{client.c}]\!]\oplus L_{\text{S}}$ in Fig. 4.

2.2. Uniform and Transitive KMR for Vertical Composition of Direct Refinements

Direct refinements are only useful if they can be vertically composed, which is critical for composing refinements obtained from individual compiler passes into a single top-level refinement such as $\preccurlyeq_{\text{ac}}$ and for further composition with source-level refinements as shown in Fig. 4.

We discuss our approach for addressing this problem by using CompCert and CompCertO as the concrete platforms. It is based on the following two observations. First, injp in fact captures the rely-guarantee conditions for memory protection needed by every compiler pass in CompCert. At a high-level, it means that the rely-guarantee conditions as depicted in Fig. 5 can all be replaced by injp (modulo the details on language interfaces). Second, injp is transitively composable, i.e., any vertical pairing of injp can be proved equivalent to a single injp. It means that given two refinements $L_{2}\preccurlyeq_{12}L_{1}$ and $L_{3}\preccurlyeq_{23}L_{2}$ as depicted in Fig. 5(b), when their rely-guarantee conditions are uniformly represented by injp, they can be merged into the direct refinement $L_{3}\preccurlyeq_{13}L_{1}$ in Fig. 5(c) with a single injp as the rely-guarantee condition. We shall present the technical challenges leading to these observations in §3 and elaborate on the observations themselves in §4.

By the above observations, an obvious approach for applying direct refinements to realistic optimizing compilers is to prove open simulation for every compiler pass using injp, and vertically compose those simulations into a single simulation. However, for a non-trivial compiler like CompCert, it means we need to rewrite a significant part of its proofs. More importantly, optimization passes in CompCert need additional rely-guarantee conditions as they are based on value analysis. To address the first problem, we start from the refinement proofs with least restrictive KMRs for individual passes in CompCertO (Koenig and Shao, 2021), and exploit the properties that these KMRs can eventually be “absorbed” into injp in vertical composition to generate a direct refinement parameterized only by injp. To address the second problem, we propose a notion of semantic invariant that captures the rely-guarantee conditions for value analysis. When piggybacked onto injp, this semantic invariant can be transitively composed along with injp and eventually pushed to the C level. It then becomes a condition for enabling optimizations at the source level, e.g., for supporting the refinement of the optimized server in Fig. 3(c). We discuss those solutions in §5.

Finally, we observe that source-level refinements can also be parameterized by injp, which enables end-to-end program verification as depicted in Fig. 4 as we shall discuss in §6.

3. Background and Challenges

3.1. Background

We introduce necessary background, including the memory model, the framework for simulation-based refinement, and injp which is critical for direct refinements.

3.1.1. Block-based Memory Model

By Leroy et al. (2012), a memory state $m$ (of type mem) consists of a disjoint set of memory blocks with unique identifiers and linear address space. A memory address or pointer $(b,o)$ points to the $o$ -th byte in the block $b$ where $b$ has type block and $o$ has type Z (integers). The value at $(b,o)$ is denoted by $m[b,o]$ . Values (of type val) are either undefined (Vundef), 32- or 64-bit integers or floats, or pointers of the form $\text{Vptr}(b,o)$ . For simplicity, we often write $b$ for $\text{Vptr}(b,0)$ . The memory operations including allocation, free, read and write are provided and governed by permissions of cells. The permission of a memory cell is ordered from high to low as $\text{Freeable}\geqslant\text{Writable}\geqslant\text{Readable}\geqslant\text{NA}$ where Freeable enables all operations, Writable enables all but free, Readable enables only read, and NA enables none. If $p_{1}\geqslant p_{2}$ then any cell with permission $p_{1}$ also implicitly has permission $p_{2}$ . $\text{perm}(m,p)$ denotes the set of memory cells with at least permission $p$ . For example, $(b,o)\in\text{perm}(m,\text{Readable})$ iff the cell at $(b,o)$ in $m$ is readable. An address with no permission at all is not in the footprint of memory.

Transformations of memory states are captured via partial functions $j:\text{block}\to\lfloor\text{block}\times\text{Z}\rfloor$ called injection functions, s.t. $j(b)=\emptyset$ if $b$ is removed from memory and $j(b)=\lfloor(b^{\prime},o)\rfloor$ if $b$ is shifted (injected) to $(b^{\prime},o)$ in the target memory. We define $\text{meminj}=\text{block}\to\lfloor\text{block}\times\text{Z}\rfloor$ . $v_{1}$ and $v_{2}$ are related under $j$ (denoted by $v_{1}\hookrightarrow_{v}^{j}v_{2}$ ) if either $v_{1}$ is Vundef, or they are both equal scalar values, or pointers shifted according to $j$ , i.e., $v_{1}=\text{Vptr}(b,o)$ , $j(b)=\lfloor(b^{\prime},o^{\prime})\rfloor$ and $v_{2}=\text{Vptr}(b^{\prime},o+o^{\prime})$ .

Given this relation, there is a memory injection between the source memory state $m_{1}$ and the target state $m_{2}$ under $j$ (denoted by $m_{1}\hookrightarrow_{m}^{j}m_{2}$ ) if the following properties are satisfied which ensure preservation of permissions and values under injection:

[TABLE]

Memory injections are transitive and necessary for verifying compiler transformations of memory structures (e.g., merging local variables into stack-allocated data and generating a concrete stack frame). For the remaining passes, a simpler relation called memory extension is used instead, which employs an identity injection. Reasoning about permissions under refinements is a major source of complexity.

3.1.2. A Framework for Open Simulations

In CompCertO (Koenig and Shao, 2021), a language interface $A=\langle A^{q},A^{r}\rangle$ is a pair of sets $A^{q}$ and $A^{r}$ denoting acceptable queries and replies for open modules, respectively. Different interfaces may be used for different languages. The relevant ones for our discussion have been introduced in §2.1 and listed as follows:

[TABLE]

Open labeled transition systems (LTS) represent semantics of modules that may accept queries and provide replies at the incoming side and provide queries and accept replies at the outgoing side (i.e., calling external functions). An open LTS $L:A\twoheadrightarrow B$ is a tuple $\langle D,S,I,\to,F,X,Y\rangle$ where $A$ ( $B$ ) is the language interface for outgoing (incoming) queries and replies, $D\subseteq B^{q}$ a set of initial queries, $S$ a set of internal states, $I\subseteq D\times S$ ( $F\subseteq S\times B^{r}$ ) transition relations for incoming queries (replies), $X\subseteq S\times A^{q}$ ( $Y\subseteq S\times A^{r}\times S$ ) transitions for outgoing queries (replies), and $\to\subseteq S\times\mathcal{E}^{*}\times S$ internal transitions emitting events of type $\mathcal{E}$ . Note that $(s,q^{O})\in X$ iff an outgoing query $q^{O}$ happens at $s$ ; $(s,r^{O},s^{\prime})\in Y$ iff after $q^{O}$ returns with $r^{O}$ the execution continues with an updated state $s^{\prime}$ .

Kripke relations are used to describe evolution of program states in open simulations between LTSs. A Kripke relation $R:W\to\{S\;|\;S\subseteq A\times B\}$ is a family of relations indexed by a Kripke world $W$ ; for simplicity, we define $\mathcal{K}_{W}(A,B)=W\to\{S\;|\;S\subseteq A\times B\}$ . A simulation convention relating two language interfaces $A_{1}$ and $A_{2}$ is a tuple $\mathbb{R}=\langle W,\mathbb{R}^{q}:\mathcal{K}_{W}(A^{q}_{1},A^{q}_{2}),\mathbb{R}^{r}:\mathcal{K}_{W}(A^{r}_{1},A^{r}_{2})\rangle$ which we write as $\mathbb{R}:A_{1}\Leftrightarrow A_{2}$ . Simulation conventions serve as interfaces of open simulations by relating source and target language interfaces. For example, a C-level convention $\text{c}:\mathcal{C}\Leftrightarrow\mathcal{C}=\langle\text{meminj},\mathbb{R}_{\text{c}}^{q},\mathbb{R}_{\text{c}}^{r}\rangle$ relates C queries and replies as follows, where the Kripke world consists of injections and, in a given world $j$ , the values and memory in queries and replies are related by $j$ .

[TABLE]

Open forward simulations describe refinement between LTS. To establish an open (forward) simulation between $L_{1}:A_{1}\twoheadrightarrow B_{1}$ and $L_{2}:A_{2}\twoheadrightarrow B_{2}$ , one needs to find two simulation conventions $\mathbb{R}_{A}:A_{1}\Leftrightarrow A_{2}$ and $\mathbb{R}_{B}:B_{1}\Leftrightarrow B_{2}$ that connect queries and replies at the outgoing and incoming sides, and show the internal execution steps and external interactions of open modules are related by an invariant $R$ . This simulation is denoted by $L_{1}\leqslant_{\mathbb{R}_{A}\twoheadrightarrow\mathbb{R}_{B}}L_{2}$ and formally defined as follows (for simplicity, we shall write $L_{1}\leqslant_{\mathbb{R}}L_{2}$ to denote $L_{1}\leqslant_{\mathbb{R}\twoheadrightarrow\mathbb{R}}L_{2}$ ):

Definition 3.1.

Given $L_{1}:A_{1}\twoheadrightarrow B_{1}$ , $L_{2}:A_{2}\twoheadrightarrow B_{2}$ , $\mathbb{R}_{A}:A_{1}\Leftrightarrow A_{2}$ and $\mathbb{R}_{B}:B_{1}\Leftrightarrow B_{2}$ , $L_{1}\leqslant_{\mathbb{R}_{A}\twoheadrightarrow\mathbb{R}_{B}}L_{2}$ holds if there is some Kripke relation $R\in\mathcal{K}_{W_{B}}(S_{1},S_{2})$ that satisfies:

[TABLE]

Here, property (1) requires initial queries to match; (2) requires initial states to hold under the invariant $R$ ; (3) requires internal execution to preserve $R$ ; (4) requires $R$ to be preserved across external calls, and (5) requires final replies to match. According to these properties, a complete forward simulation looks like Fig. 8. From the above definition, it is easy to prove the horizontal and vertical compositionality of open simulations and adequacy for assembly modules, i.e., $\forall\;L_{1}\;L_{2}\;L_{1}^{\prime}\;L_{2}^{\prime},\;L_{1}\leqslant_{\mathbb{R}}L_{2}\Rightarrow L_{1}^{\prime}\leqslant_{\mathbb{R}}L_{2}^{\prime}\Rightarrow L_{1}\oplus L_{1}^{\prime}\leqslant_{\mathbb{R}}L_{2}\oplus L_{2}^{\prime}$ and $\forall\;(M_{1}\;M_{2}:\text{Asm}),\;[\![M_{1}]\!]\oplus[\![M_{2}]\!]\leqslant_{\text{id}}[\![M_{1}+M_{2}]\!]$ .

The Kripke worlds (e.g., memory injections) may evolve as the execution goes on. Rely-guarantee reasoning about such evolution is essential for horizontal composition of simulations. For illustration, the Kripke worlds at the boundary of modules are displayed in Fig. 8. The evolution of worlds across external calls is governed by an accessibility relation $w_{A}\leadsto_{A}w_{A}^{\prime}$ for describing the rely-condition. By assuming $w_{A}\leadsto_{A}w_{A}^{\prime}$ , one needs to prove the guarantee condition $w_{B}\leadsto_{B}w_{B}^{\prime}$ , i.e., the evolution of worlds in the whole execution respects $\leadsto_{B}$ . Simulations with symmetric rely-guarantee conditions can be horizontally composed, even with mutual calls between modules.

Note that the accessibility relation and evolution of worlds between queries and replies is not encoded explicitly in the definition of simulation conventions. Instead, they are implicit by assuming a modality operator $\Diamond$ is always applied to $\mathbb{R}^{r}$ s.t. $r\in{\Diamond\mathbb{R}^{r}(w)}\Leftrightarrow\exists\;w^{\prime},w\leadsto w^{\prime}\land r\in\mathbb{R}^{r}(w^{\prime})$ . For simplicity, we often ignore accessibility and modality when talking purely about simulation conventions in the remaining discussion.

Accessibility relations are mainly for describing evolution of memory states across external calls. For this, simulation conventions are parameterized by Kripke Memory Relations or KMR.

Definition 3.2.

A Kripke Memory Relation is a tuple $\langle W,f,\leadsto,R\rangle$ where $W$ is a set of worlds, $f:W\to\text{meminj}$ a function for extracting injections from worlds, $\leadsto\subseteq W\times W$ an accessibility relation between worlds and $R:\mathcal{K}_{W}(\text{mem},\text{mem})$ a Kripke relation over memory states that is compatible with the memory operations. We write ${w}\leadsto{w^{\prime}}$ for $(w,w^{\prime})\in\leadsto$ .

We write $\mathbb{R}_{K}$ to emphasize that a simulation convention $\mathbb{R}$ is parameterized by the KMR ${K}$ , meaning $\mathbb{R}_{K}$ shares the same type of worlds with $K$ and inherits its accessibility relation.

The most interesting KMR is injp as it provides protection on memory w.r.t. injections.

Definition 3.3 (Kripke Relation with Memory Protection).

$\text{injp}=\langle W_{\text{injp}},f_{\text{injp}},\leadsto_{\text{injp}},R_{\text{injp}}\rangle$ where $W_{\text{injp}}=(\text{meminj}\times\text{mem}\times\text{mem})$ , $f_{\text{injp}}(j,\_,\_)=j$ , $(m_{1},m_{2})\in R_{\text{injp}}(j,m_{1},m_{2})\Leftrightarrow m_{1}\hookrightarrow_{m}^{j}m_{2}$ and

[TABLE]

Here, $\text{mem-acc}(m,m^{\prime})$ denotes monotonicity of memory states such as valid blocks can only increase and read-only data does not change in value. $\text{unchanged-on}(m,m^{\prime})$ denotes memory cells whose permissions and values are not changed from $m$ to $m^{\prime}$ and

[TABLE]

By definition, a world $(j,m_{1},m_{2})$ evolves to $(j^{\prime},m_{1}^{\prime},m_{2}^{\prime})$ under injp only if $j^{\prime}$ is strictly larger than $j$ and any memory cells in $m_{1}$ and $m_{2}$ not in the domain (i.e., unmapped by $j$ ), or image of $j$ (i.e., out-of-reach by $j$ from $m_{1}$ ) will be protected, meaning their values and permissions are unchanged from $m_{1}$ ( $m_{2}$ ) to $m_{1}^{\prime}$ ( $m_{2}^{\prime}$ ). An example is shown in Fig. 9 where the shaded regions in $m_{1}$ are unmapped by $j$ and unchanged while those in $m_{2}$ are out-of-reach from $j$ and unchanged. $m_{1}^{\prime}$ and $m_{2}^{\prime}$ may contain newly allocated blocks which are not protected by injp. When injp is used at the outgoing side, it denotes that the simulation relies on knowing that the unmapped and out-of-reach regions at the call side are not modified by external calls. When injp is used at the incoming side, it denotes that the simulation guarantees such regions at initial queries are not modified by the simulation itself.

3.2. Challenges for Vertically Composing Open Simulations

As discussed in §2.2, the challenge for constructing direct refinements for multi-pass optimizing compilers lies in their vertical composition. The most basic vertical composition for open simulations is stated below which is easily proved by pairing of individual simulations (Koenig and Shao, 2021).

Theorem 3.4 (V. Comp).

Given $L_{1}:A_{1}\twoheadrightarrow B_{1}$ , $L_{2}:A_{2}\twoheadrightarrow B_{2}$ and $L_{3}:A_{3}\twoheadrightarrow B_{3}$ , and given $\mathbb{R}_{12}:A_{1}\Leftrightarrow A_{2}$ , $\mathbb{S}_{12}:B_{1}\Leftrightarrow B_{2}$ , $\mathbb{R}_{23}:A_{2}\Leftrightarrow A_{3}$ and $\mathbb{S}_{23}:B_{2}\Leftrightarrow B_{3}$ ,

[TABLE]

Here, $(\_\boldsymbol{\cdot}\_)$ is a composed simulation convention s.t. $\mathbb{R}\boldsymbol{\cdot}\mathbb{S}=\langle W_{\mathbb{R}}\times W_{\mathbb{S}},{\mathbb{R}^{q}}\boldsymbol{\cdot}{\mathbb{S}^{q}},{\mathbb{R}^{r}}\boldsymbol{\cdot}{\mathbb{S}^{r}}\rangle$ where for any $q_{1}$ and $q_{3}$ , ${(q_{1},q_{3})\in{\mathbb{R}^{q}}\boldsymbol{\cdot}{\mathbb{S}^{q}}}(w_{\mathbb{R}},w_{\mathbb{S}})\Leftrightarrow\exists q_{2},(q_{1},q_{2})\in{\mathbb{R}^{q}(w_{\mathbb{R}})}\land(q_{2},q_{3})\in{\mathbb{S}^{q}(w_{\mathbb{S}})}$ (similarly for ${{\mathbb{R}^{r}}\boldsymbol{\cdot}{\mathbb{S}^{r}}}$ ). Then, given any compiler with $N$ passes and their refinement relations $L_{1}\leqslant_{\mathbb{R}_{12}\twoheadrightarrow\mathbb{S}_{12}}L_{2},\ldots,L_{N}\leqslant_{\mathbb{R}_{N,N+1}\twoheadrightarrow\mathbb{S}_{N,N+1}}L_{N+1}$ , we get their concatenation $L_{1}\leqslant_{\mathbb{R}_{12}\boldsymbol{\cdot}\ldots\boldsymbol{\cdot}\mathbb{R}_{N,N+1}\twoheadrightarrow\mathbb{S}_{12}\boldsymbol{\cdot}\ldots\boldsymbol{\cdot}\mathbb{S}_{N,N+1}}L_{N+1}$ , which exposes internal compilation and weakens compositionality as we have discussed in §1.2.

The above problem may be solved if the composed simulation convention can be refined into a single convention directly relating source and target queries and replies. Given two simulation conventions $\mathbb{R},\mathbb{S}:A_{1}\Leftrightarrow A_{2}$ , $\mathbb{R}$ is refined by $\mathbb{S}$ if

[TABLE]

which we write as $\mathbb{R}\sqsubseteq\mathbb{S}$ . If both $\mathbb{R}\sqsubseteq\mathbb{S}$ and $\mathbb{S}\sqsubseteq\mathbb{R}$ , then $\mathbb{R}$ and $\mathbb{S}$ are equivalent and written as $\mathbb{R}\equiv\mathbb{S}$ . By definition, $\mathbb{R}\sqsubseteq\mathbb{S}$ indicates any query for ${\mathbb{S}}$ can be converted into a query for ${\mathbb{R}}$ and any reply resulting from the converted query can be converted back to a reply for ${\mathbb{S}}$ . By wrapping the incoming side of an open simulation with a more general convention and its outgoing side with a more specialized convention, one gets another valid open simulation (Koenig and Shao, 2021):

Theorem 3.5.

Given $L_{1}:A_{1}\twoheadrightarrow B_{1}$ and $L_{2}:A_{2}\twoheadrightarrow B_{2}$ , if $\mathbb{R}^{\prime}_{A}\sqsubseteq\mathbb{R}_{A}:A_{1}\Leftrightarrow A_{2}$ , $\mathbb{R}_{B}\sqsubseteq\mathbb{R}^{\prime}_{B}:B_{1}\Leftrightarrow B_{2}$ and $L_{1}\leqslant_{\mathbb{R}_{A}\twoheadrightarrow\mathbb{R}_{B}}L_{2}$ , then $L_{1}\leqslant_{\mathbb{R}^{\prime}_{A}\twoheadrightarrow\mathbb{R}^{\prime}_{B}}L_{2}$ .

Now, we would like to prove the “real” vertical composition generating direct refinements (simulations). Given any $L_{1}\leqslant_{\mathbb{R}_{12}\twoheadrightarrow\mathbb{R}_{12}}L_{2}$ and $L_{2}\leqslant_{\mathbb{R}_{23}\twoheadrightarrow\mathbb{R}_{23}}L_{3}$ , if we can show the existence of simulation conventions $\mathbb{R}_{13}$ directly relating source and target semantics s.t. $\mathbb{R}_{13}\equiv\mathbb{R}_{12}\boldsymbol{\cdot}\mathbb{R}_{23}$ , then $L_{1}\leqslant_{\mathbb{R}_{13}\twoheadrightarrow\mathbb{R}_{13}}L_{3}$ holds by Theorem 3.4 and Theorem 3.5, which is the desired direct refinement. This composition is illustrated in Fig. 10 where the parts enclosed by dashed boxes represent the concatenation of $L_{1}\leqslant_{\mathbb{R}_{12}\twoheadrightarrow\mathbb{R}_{12}}L_{2}$ and $L_{2}\leqslant_{\mathbb{R}_{23}\twoheadrightarrow\mathbb{R}_{23}}L_{3}$ . The direct queries and replies are split and merged for interaction with parallelly running simulations underlying the direct refinement.

Since simulation conventions are parameterized by KMRs, a major obstacle to the real vertical composition of open simulations is to prove KMRs for individual simulations can be composed into a single KMR. For this, one needs to define refinements between KMRs. Given any KMRs $K$ and $L$ , $K\sqsubseteq L$ (i.e., $K$ is refined by $L$ ) holds if the following is true:

[TABLE]

We write $K\equiv L$ to denote that $K$ and $L$ are equivalent, i.e., $K\sqsubseteq L$ and $L\sqsubseteq K$ .

Continue with the proof of real vertical composition, i.e., proving $\mathbb{R}_{13}\equiv\mathbb{R}_{12}\boldsymbol{\cdot}\mathbb{R}_{23}$ . Assume $\mathbb{R}_{i}$ is parameterized by KMR $K_{i}$ , showing the existence of $\mathbb{R}_{13}$ s.t. $\mathbb{R}_{13}\sqsubseteq\mathbb{R}_{12}\boldsymbol{\cdot}\mathbb{R}_{23}$ amounts to proving a parallel refinement over the parameterizing KMRs, i.e., there exists $K_{13}$ s.t. $K_{13}\sqsubseteq K_{12}\boldsymbol{\cdot}K_{23}$ where $K_{12}\boldsymbol{\cdot}K_{23}=\langle W_{12}\times W_{23},f_{12}\times f_{23},\leadsto_{12}\times\leadsto_{23},R_{12}\times R_{23}\rangle$ . A more intuitive interpretation is depicted in Fig. 11(a) where black symbols are $\forall$ -quantified (assumptions we know) and red ones are $\exists$ -quantified (conclusions we need to construct). Note that Fig. 11(a) exactly mirrors the refinement on the outgoing side in Fig. 10. For simplicity, we use $w_{i}$ not only to represent worlds, but also to denote $R_{i}(w_{i})$ (where $R_{i}$ is the Kripke relation given by KMR $K_{i}$ ) when it connects memory states through vertical lines. A dual property we need to prove for the incoming side is shown in Fig. 11(b).

In both cases in Fig. 11, we need to construct interpolating states for relating source and target memory (i.e., $m_{2}^{\prime}$ in Fig. 11(a) and $m_{2}$ in Fig. 11(b)). The construction of $m_{2}^{\prime}$ is especially challenging, for which we need to decompose the evolved world $w_{13}^{\prime}$ into $w_{12}^{\prime}$ and $w_{23}^{\prime}$ s.t. they are accessible from the original worlds $w_{12}$ and $w_{23}$ . It is not clear at all how this construction is possible because 1) $m_{2}^{\prime}$ may have many forms since Kripke relations are in general non-deterministic and 2) KMRs (e.g., injp) introduce memory protection for external calls which may not hold after the (de-)composition.

Because of the above difficulties, existing approaches either make substantial changes to semantics for constructing interpolating states, thereby destroying adequacy (Stewart et al., 2015), or do not even try to merge Kripke memory relations, but instead leave them as separate entities (Song et al., 2020; Koenig and Shao, 2021). As a result, direct refinements cannot be achieved.

4. A Uniform and Transitive Kripke Memory Relation

To overcome the challenge for vertically composing open simulations, we exploit the observation that injp in fact can be viewed as a most general KMR. Then, the compositionality of KMRs discussed in §3.2 is reduced to transitivity of injp, i.e., $\text{injp}\equiv\text{injp}\boldsymbol{\cdot}\text{injp}$ .

4.1. Uniformity of injp

We show that injp is both a reasonable guarantee condition and a reasonable rely condition for all the compiler passes in CompCert. It is based on the observation that a notion of private and public memory can be derived from injections and coincides with the protection provided by injp.

4.1.1. Public and Private Memory via Memory Injections

Definition 4.1.

Given $m_{1}\hookrightarrow_{m}^{j}m_{2}$ , the public memory regions in $m_{1}$ and $m_{2}$ are defined as follows:

[TABLE]

By definition, a cell $(b,o)$ is public in the source memory if it is in the domain of $j$ , and $(b,o)$ is public in the target memory if it is mapped by $j$ from some valid public source memory. Any memory not public with respect to $j$ is private. We can see that private memory corresponds exactly to unmapped and out-of-reach memory defined by injp, i.e., for any $b$ and $o$ , $(b,o)\in\text{pub-src-mem}(j)\Leftrightarrow(b,o)\not\in\text{unmapped}(j)$ and $(b,o)\in\text{pub-tgt-mem}(j,m)\Leftrightarrow(b,o)\not\in\text{out-of-reach}(j,m)$ .

With Definition 4.1 and the properties of memory injection (see §3.1.1), we can easily prove access of pointers in a readable and public source location gets back another public location.

Lemma 4.2.

Given $m_{1}\hookrightarrow_{m}^{j}m_{2}$ ,

[TABLE]

It implies that readable public memory regions form a “closure” such that the sequences of reads are bounded inside these regions, as shown in Fig. 12. The horizontal arrows indicates a pointer value $(b_{i+1},o_{i+1})$ is read from $(b_{i},o_{i})$ with possible adjustment with pointer arithmetic. Note that all memory cells at $(b_{i},o_{i})$ s and $(b_{i}^{\prime},o_{i}^{\prime})$ s have Readable permission. By Lemma 4.2, $(b_{i},o_{i})$ s are all in public regions. By Definition 4.1, the mirroring reads $(b_{i}^{\prime},o_{i}^{\prime})$ s are also in public regions.

4.1.2. injp as a Uniform Rely Condition

injp is adequate for preventing external calls from interfering with internal execution for all the compiler passes of CompCert. 222In fact, the properties in Definition 3.3 are exactly from CompCert’s assumptions on external calls. To illustrate this point, we discuss the effect of injp on two of CompCert’s passes using Fig. 13(a) as an example where g is an external function. The first pass is SimplLocals which converts local variables whose memory addresses are not taken into temporary ones. As shown in Fig. 13(b), x is turned into a temporary variable at the target level which is not visible to g. Therefore, x at the source level becomes private data as its block $b_{x}$ is unmapped by $j$ , thereby protected by injp and cannot be modified by g. The second pass is Stacking which expands the stack frames with private regions for return addresses, spilled registers, arguments, etc. Continuing with our example, the only public stack data in Fig. 13(c) is $y$ . All the private data is out-of-reach, thereby protected by injp.

4.1.3. injp as a Uniform Guarantee Condition

For injp to serve as a uniform guarantee condition, it suffices to show the private memory of the environment is protected between initial calls and final replies. During an open forward simulation, all incoming values and memories are related by some initial injection $j$ (e.g., $\vec{v_{1}}\hookrightarrow_{v}^{j}\vec{v_{2}}$ and $m_{1}\hookrightarrow_{m}^{j}m_{2}$ ). In particular, the pointers in them are related by $j$ . Therefore, any sequence of reads starting from pointers stored in the initial queries only inspect public memories in the source and target, as already shown in Fig. 12. The private (i.e., unmapped or out-of-reach) regions of the initial memories are not modified by internal execution. Moreover, because injection functions only grow bigger during execution but never change in value and the outgoing calls have injp as a rely-condition, the initially unmapped (out-of-reach) regions will stay unmapped (out-of-reach) and be protected during external calls. As a result, we conclude that injp is a reasonable guarantee condition for any open simulation.

4.2. Transitivity of injp

The goal is to show the two refinements in Fig. 11 hold when $K_{ij}=\text{injp}$ , i.e., $\text{injp}\equiv\text{injp}\boldsymbol{\cdot}\text{injp}$ . As discussed in §3.2 the critical step is to construct interpolating memory states that transitively relate source and target states. The construction is based on two observations: 1) the memory injections deterministically decide the value and permissions of public memory because they encode partial functional transformations on memory states, and 2) any memory not in the domain or range of the partial functions is protected (private) and unchanged throughout external calls. Although the proof is quite involved, the result can be reused for all compiler passes thanks to injp’s uniformity. The formal proof of transitivity of injp can be found in Appendix A.

4.2.1. $\text{injp}\sqsubseteq\text{injp}\boldsymbol{\cdot}\text{injp}$

By definition, we need to prove the following lemma:

Lemma 4.3.

$\text{injp}\sqsubseteq\text{injp}\boldsymbol{\cdot}\text{injp}$ * holds. That is,*

[TABLE]

This lemma conforms to the graphic representation in Fig. 11(a). To prove it, an obvious choice is to pick $j_{13}=j_{23}\boldsymbol{\cdot}j_{12}$ . Then, we are left to prove the existence of interpolating state $m_{2}^{\prime}$ and the memory and accessibility relations as shown in Fig. 14. By definition, $m_{2}^{\prime}$ consists of memory blocks newly allocated with respect to $m_{2}$ and blocks that already exist in $m_{2}$ . The latter can be further divided into public and private memory regions with respect to injections $j_{12}$ and $j_{23}$ . Then, $m_{2}^{\prime}$ is constructed following the ideas that 1) the public and newly allocated memory should be projected from the updated source memory $m_{1}^{\prime}$ by $j_{12}^{\prime}$ , and 2) the private memory is protected by injp and should be copied over from $m_{2}$ to $m_{2}^{\prime}$ .

We use the concrete example in Fig. 15 to motivate the construction of $m_{2}^{\prime}$ . Here, the white and green areas correspond to locations in $\text{perm}(\_,\text{NA})$ (with at least some permission) and in $\text{perm}(\_,\text{Readable})$ (with at least readable permission), respectively. Given $m_{1}\hookrightarrow_{m}^{j_{12}}m_{2}$ , $m_{2}\hookrightarrow_{m}^{j_{23}}m_{3}$ and $(j_{23}\boldsymbol{\cdot}j_{12},m_{1},m_{3})\leadsto_{\text{injp}}(j_{13}^{\prime},m_{1}^{\prime},m_{3}^{\prime})$ , we need to define $j_{12}^{\prime}$ and $j_{23}^{\prime}$ and build $m_{2}^{\prime}$ satisfying $m_{1}^{\prime}\hookrightarrow_{m}^{j_{12}^{\prime}}m_{2}^{\prime}$ , $m_{2}^{\prime}\hookrightarrow_{m}^{j_{23}^{\prime}}m_{3}^{\prime}$ , $(j_{12},m_{1},m_{2})\leadsto_{\text{injp}}(j_{12}^{\prime},m_{1}^{\prime},m_{2}^{\prime})$ , and $(j_{23},m_{2},m_{3})\leadsto_{\text{injp}}(j_{23}^{\prime},m_{2}^{\prime},m_{3}^{\prime})$ . $m_{1}^{\prime}$ and $m_{3}^{\prime}$ are expansions of $m_{1}$ and $m_{3}$ with new blocks and possible modification to the public regions of $m_{1}$ and $m_{3}$ . Here, $m_{1}^{\prime}$ has a new block $b_{1}^{4}$ and $m_{3}^{\prime}$ has two new block $b_{3}^{3}$ and $b_{3}^{4}$ .

We first fix $j_{12}^{\prime}$ , $j_{23}^{\prime}$ and the shape of blocks in $m_{2}^{\prime}$ . We begin with $m_{2}$ and introduce a newly allocated block $b_{2}^{4}$ whose shape matches $b_{1}^{4}$ in $m_{1}^{\prime}$ . Then, $j_{12}^{\prime}$ is obtained by expanding $j_{12}$ with identity mapping from $b_{1}^{4}$ to $b_{2}^{4}$ . Furthermore, $j_{23}^{\prime}$ is also expanded with a mapping from $b_{2}^{4}$ to a block in $m_{3}^{\prime}$ ; this mapping is determined by $j_{13}^{\prime}$ .

We then set the values and permissions for memory cells in $m_{2}^{\prime}$ so that it satisfies injection and the unchanged-on properties for readable memory regions implied by $(j_{12},m_{1},m_{2})\leadsto_{\text{injp}}(j_{12}^{\prime},m_{1}^{\prime},m_{2}^{\prime})$ and $(j_{23},m_{2},m_{3})\leadsto_{\text{injp}}(j_{23}^{\prime},m_{2}^{\prime},m_{3}^{\prime})$ . The values and permissions for newly allocated blocks are obviously mapped from $m_{1}^{\prime}$ by $j_{12}^{\prime}$ . Those for old blocks are fixed as follows. By memory protection provided in $(j_{23}\boldsymbol{\cdot}j_{12},m_{1},m_{3})\leadsto_{\text{injp}}(j_{13}^{\prime},m_{1}^{\prime},m_{3}^{\prime})$ , the only memory cells in $m_{1}$ that may have been modified in $m_{1}^{\prime}$ are those mapped all the way to $m_{3}$ by $j_{23}\boldsymbol{\cdot}j_{12}$ , while the cells in $m_{3}$ that may be modified in $m_{3}^{\prime}$ must be in the image of $j_{23}\boldsymbol{\cdot}j_{12}$ . To match this fact, the only old memory regions in $m_{2}^{\prime}$ whose values and permissions may be modified are those both in the image of $j_{12}$ and the domain of $j_{23}$ . Those are the public memory with respect to $j_{12}$ and $j_{23}$ and displayed as the gray areas in Fig. 15(b). Following idea 1) above, the values and permissions in those regions are projected from $m_{1}^{\prime}$ by applying the injection function $j_{12}$ . Note that there is an exception: values in read-only public regions are copied over from $m_{2}$ . Following idea 2) above, the remaining old memory regions are private with respect to $j_{12}$ and $j_{23}$ and should have the same values and permissions as in $m_{2}$ .

Note that the accessibility relations $(j_{12},m_{1},m_{2})\leadsto_{\text{injp}}(j_{12}^{\prime},m_{1}^{\prime},m_{2}^{\prime})$ and $(j_{23},m_{2},m_{3})\leadsto_{\text{injp}}(j_{23}^{\prime},m_{2}^{\prime},m_{3}^{\prime})$ can be derived from $(j_{23}\boldsymbol{\cdot}j_{12},m_{1},m_{3})\leadsto_{\text{injp}}(j_{13}^{\prime},m_{1}^{\prime},m_{3}^{\prime})$ because the latter enforces stronger protection than the former. This is due to unmapped and out-of-reach regions getting bigger as memory injections get composed. For example, in Fig. 15, $b_{1}^{1}$ is mapped by $j_{12}$ but becomes unmapped by $j_{23}\boldsymbol{\cdot}j_{12}$ ; the image of $b_{2}^{1}$ in $b_{3}^{1}$ is in reach by $j_{23}$ but becomes out-of-reach by $j_{23}\boldsymbol{\cdot}j_{12}$ .

4.2.2. $\text{injp}\boldsymbol{\cdot}\text{injp}\sqsubseteq\text{injp}$

By definition, we need to prove:

Lemma 4.4.

$\text{injp}\boldsymbol{\cdot}\text{injp}\sqsubseteq\text{injp}$ * holds. That is,*

[TABLE]

This lemma conforms to Fig. 11(b). To prove it, we pick $j_{12}$ to be an partial identity injection ( $j_{12}(b)=\lfloor b,0\rfloor$ when $j_{13}(b)\neq\emptyset$ ) , $j_{23}=j_{13}$ and $m_{2}=m_{1}$ . Then the lemma is reduced to proving the existence of $j_{13}^{\prime}$ that satisfies $(j_{13},m_{1},m_{3})\leadsto_{\text{injp}}(j_{13}^{\prime},m_{1}^{\prime},m_{3}^{\prime})$ and $m_{1}^{\prime}\hookrightarrow_{m}^{j_{13}^{\prime}}m_{3}^{\prime}$ . By picking $j_{13}^{\prime}=j_{12}^{\prime}\boldsymbol{\cdot}j_{23}^{\prime}$ , we can easily prove these properties by exploiting the properties of injp.

5. Derivation of the Direct Refinement for CompCert

In this section, we discuss the proofs and composition of open simulations for the compiler passes of CompCert into the direct refinement $\preccurlyeq_{\texttt{ac}}$ following the ideas discussed in §2.2. CompCert compiles Clight programs into Asm programs through 19 passes (Leroy, 2023), including several optimization passes working on the RTL intermediate language. First, we prove the open simulations for all these passes with appropriate simulation conventions. In particular, we directly reuse the proofs of non-optimizing passes in CompCertO and update the proofs of optimizing passes with semantic invariants. Second, we prove a collection of properties for refining simulation conventions in preparation for vertical composition. Those properties enable absorption of KMRs into injp and composition of semantic invariants. They rely critically on transitivity of injp. Finally, we vertically compose the simulations and refine the incoming and outgoing simulation conventions into a single simulation convention $\mathbb{C}$ , thereby establishing $\leqslant_{\mathbb{C}}$ as the top-level refinement $\preccurlyeq_{\texttt{ac}}$ .

5.1. Open Simulation of Individual Passes

We list the compiler passes and their simulation types in Table 1 (passes on the right follow the passes on the left) together with their source and target languages and interfaces (in bold fonts). The passes in black are reused from CompCertO, while those in red are reproved optimizing passes. The passes in blue are self-simulating passes we inserted; they will be used in §5.3 for refining composed simulation conventions. Note that we have omitted passes with the identity simulation convention (i.e., simulations of the form $L_{1}\leqslant_{\text{id}}L_{2}$ ) in Table 1 as they do not affect the proofs. 333The omitted passes are Cshmgen, Renumber, Linearize, CleanupLabels and Debugvar.

5.1.1. Simulation Conventions and Semantic Invariants

We first introduce relevant simulation conventions and semantic invariants shown in Table 1. The simulation conventions $\text{c}_{K}:\mathcal{C}\Leftrightarrow\mathcal{C}$ , $\text{ltl}_{K}:\mathcal{L}\Leftrightarrow\mathcal{L}$ , $\text{mach}_{K}:\mathcal{M}\Leftrightarrow\mathcal{M}$ , and $\text{asm}_{K}:\mathcal{A}\Leftrightarrow\mathcal{A}$ relate the same language interfaces with queries and replies native to the associated intermediate languages. They are parameterized by a KMR $K$ to allow different compiler passes to have different assumptions on memory evolution. Conceptually, this parameterization is unnecessary as we can simply use injp for every pass due to its uniformity (as discussed in §4.1). Nevertheless, it is useful because the compiler proofs become simpler and more natural with the least restrictive KMRs which may be weaker than injp. CompCertO defines several KMRs weaker than injp: id is used when memory is unchanged; ext is used when the source and target memory share the same structure; inj is a simplified version of injp without its memory protection. The simulation conventions $\text{CL}:\mathcal{C}\Leftrightarrow\mathcal{L}$ , $\text{LM}:\mathcal{L}\Leftrightarrow\mathcal{M}$ and $\text{MA}:\mathcal{M}\Leftrightarrow\mathcal{A}$ capture the calling convention of CompCert: CL relates C-level queries and replies to those in the LTL language where the arguments are distributed to abstract stack slots; LM further relates abstract stack slots with states on an architecture independent machine; MA relates this state to registers and memory in the assembly language (X86 assembly in our case). As discussed before, some refinements rely on invariants on the source semantics. The semantic invariant wt enforces that arguments and return values of function calls respect function signatures. ro is critical for ensuring the correctness of optimizations, which will be discussed next.

5.1.2. Open Simulation of Optimizations

The optimizing passes Constprop, CSE and Deadcode perform constant propagation, common subexpression elimination and dead code elimination, respectively. They make use of a static value analysis algorithm for collecting information of variables during the execution. For each function, this algorithm starts with the known initial values of read-only (constant) global variables. It simulates the function execution to analyze the values of global or local variables after executing each instruction. In particular, for global constant variables, their references at any point should have the initial values of constants. For local variables stored on the stack, their references may have initial values or may not if interfered by other function calls. When the analysis encounters a call to another function, it checks whether the address of current stack frame is leaked to the callee directly through arguments or indirectly through pointers in memory. If not, then the stack frame is considered unreachable from its callee. Consequently, the references to local variables on unreachable stack frames after function calls remain to be their initial values. Based on this analysis, the three passes then identify and perform optimizations.

Most of the proofs of closed simulations for those passes can be adapted to open simulation straightforwardly. The only and main difficulty is to prove that information derived from static analysis is consistent with the dynamic memory states in incoming queries and after external calls return. We introduce the semantic invariant ro and combine it with injp to ensure this consistency. The above optimization passes all use $\text{ro}\boldsymbol{\cdot}\text{c}_{\text{injp}}$ as their simulation conventions (because RTL conforms to the C interface). The adaptation of optimization proofs for those passes is similar. As an example, we only discuss constant propagation whose correctness theorem is stated as follows:

Lemma 5.1.

$\forall(M\;M^{\prime}:{\text{RTL}}),\text{Constprop}(M)=M^{\prime}\Rightarrow[\![M]\!]\leqslant_{\text{ro}\boldsymbol{\cdot}\text{c}_{\text{injp}}}[\![M^{\prime}]\!].$ **

Instead of presenting its proof, we illustrate how ro and injp help establish the open simulation for Constprop through a concrete example as depicted in Fig. 16. This example covers optimization for both global constants (e.g., key) and local variables (e.g., a). By static analysis of Fig. 16(a), *1) *key contains $42$ at line 4 because key is a constant global variable and, *2) * both key and a contain $42$ after the external call to foo returns to line 6. Here, the analysis confirms key has the value $42$ because foo (if well-behaved) will not modify a constant global variable. Furthermore, a has the value $42$ because it resides in the stack frame of double_key which is unreachable from foo (in fact, a is the only variable in the frame). As a result, the source program is optimized into Fig. 16(b).

We first show that ro guarantees the dynamic values of global constants are consistent with static analysis. That is, global variables are correct in incoming memory and are protected during external calls. ro is defined as follows:

Definition 5.2.

$\text{ro}:\mathcal{C}\Leftrightarrow\mathcal{C}=\langle W_{\text{ro}},\mathbb{R}_{\text{ro}}^{q},\mathbb{R}_{\text{ro}}^{r}\rangle$ where $W_{\text{ro}}=(\text{symtbl}\times\text{mem})$ and

[TABLE]

Note that although ro takes the form of a simulation convention, it only relates the same queries and replies, i.e., it only enforces invariants on the source side. This kind of simulation conventions are what we called semantic invariants. A symbol table $se$ (of type symtbl) is provided together with memory, so that the semantics can locate memory blocks of global definitions and find the initial values of global variables. $\text{ro-valid}(se,m)$ states that the values of global constant variables in the incoming memory $m$ are the same as their initial values. Therefore, the optimization of key into $42$ at line 4 of Fig. 16(a) is correct. For the external call to foo, monotonicity $\text{mem-acc}(m,m^{\prime})$ ensures that read-only values in memory are unchanged, therefore the above property is preserved from external queries to replies (i.e., $\text{ro-valid}(se,m)\Rightarrow\text{mem-acc}(m,m^{\prime})\Rightarrow\text{ro-valid}(se,m^{\prime})$ ). As a result, replacing key with $42$ at line 6 makes sense.

We then show that injp guarantees the dynamic values of unreachable local variables are consistent with static analysis. That is, unreachable stack values are unchanged by external calls. This protection is realized by injp with shrinking memory injections. Fig. 17 shows the protection of a when calling foo. Before the external call to foo, the source blocks $b_{\text{a}}$ and $b_{\text{key}}$ are mapped to target blocks by the current injection $j$ . The analysis determines that the argument and memory passed to foo do not contain any pointer to $b_{\text{a}}$ . Therefore, we can simply remove $b_{\text{a}}$ from $j$ to get a shrunk yet valid memory injection $j_{1}$ . Then, $b_{\text{a}}$ is protected during the call to foo. $b_{\text{a}}$ is added back to the injection after foo returns and the simulation continues.

Finally, Unusedglob which removes unused static global variables is verified by assuming that global symbols remain the same throughout the compilation and with a weaker KMR inj.

5.2. Properties for Refining Simulation Conventions

We present properties necessary for refining the composed simulation conventions in Table 1.

5.2.1. Commutativity of KMRs and Structural Conventions.

Lemma 5.3.

For $\text{Z}\in\{\text{CL},\text{LM},\text{MA}\}$ and $K\in\{\text{ext},\text{inj},\text{injp}\}$ we have $\text{X}_{K}\boldsymbol{\cdot}\text{Z}\sqsubseteq\text{Z}\boldsymbol{\cdot}\text{Y}_{K}$ .

This lemma is provided by CompCertO (Koenig and Shao, 2021). X and Y denote the simulation conventions for the source and target languages of Z, respectively (e.g., $\text{X}=\text{c}$ and $\text{Y}=\text{ltl}$ when $\text{Z}=\text{CL}$ ). If $K=\text{injp}$ we get $\text{c}_{\text{injp}}\boldsymbol{\cdot}\text{CL}\sqsubseteq\text{CL}\boldsymbol{\cdot}\text{ltl}_{\text{injp}}$ . This lemma indicates at the outgoing (incoming) side a convention lower (higher) than CL, LM and MA may be lifted over them to a higher position (pushed down to a lower position).

5.2.2. Absorption of KMRs into injp

The lemma below is needed for absorbing KMRs into injp:

Lemma 5.4.

For any $\mathbb{R}$ , $(1)\mathbb{R}_{\text{injp}}\boldsymbol{\cdot}\mathbb{R}_{\text{injp}}\equiv\mathbb{R}_{\text{injp}}$ $\;(2)\mathbb{R}_{\text{injp}}\sqsubseteq\mathbb{R}_{\text{inj}}$ $\;(3)\mathbb{R}_{\text{injp}}\boldsymbol{\cdot}\mathbb{R}_{\text{inj}}\boldsymbol{\cdot}\mathbb{R}_{\text{injp}}\sqsubseteq\mathbb{R}_{\text{injp}}$

[TABLE]

The simulation convention $\mathbb{R}$ is parameterized over a KMR. Property (1) is a direct consequence of $\text{injp}\boldsymbol{\cdot}\text{injp}\equiv\text{injp}$ , which is critical for merging simulations using injp. The remaining ones either depend on transitivity of injp, or trivially hold as shown by Koenig and Shao (2021).

5.2.3. Composition of Semantic Invariants

Lastly, we also need to handle the two semantic invariants ro and wt. They cannot be absorbed into injp because their assumptions are fundamentally different. Therefore, our goal is to permute them to the top-level and merge any duplicated copies. The following lemmas enable elimination and permutation of wt:

Lemma 5.5.

For any $\mathbb{R}_{K}:\mathcal{C}\Leftrightarrow\mathcal{C}$ , we have $(1)\mathbb{R}_{K}\boldsymbol{\cdot}\text{wt}\equiv\text{wt}\boldsymbol{\cdot}\mathbb{R}_{K}\boldsymbol{\cdot}\text{wt}$ and $(2)\mathbb{R}_{K}\boldsymbol{\cdot}\text{wt}\equiv\text{wt}\boldsymbol{\cdot}\mathbb{R}_{K}$ .

ro is more difficult to handle as it does not commute with arbitrary simulation conventions. To eliminate redundant ro, we piggyback ro onto injp and prove the following transitivity property:

Lemma 5.6.

$\text{ro}\boldsymbol{\cdot}\text{c}_{\text{injp}}\equiv\text{ro}\boldsymbol{\cdot}\text{c}_{\text{injp}}\boldsymbol{\cdot}\text{ro}\boldsymbol{\cdot}\text{c}_{\text{injp}}$ **

Its proof follows the same steps for proving $\text{c}_{\text{injp}}\equiv\text{c}_{\text{injp}}\boldsymbol{\cdot}\text{c}_{\text{injp}}$ with additional reasoning for establishing properties of ro. A graphic presentation of the proof is given in Fig. 18 which mirrors Fig. 11. We focus on explaining the additional reasoning and have omitted the $\leadsto_{\text{injp}}$ relations and the worlds for injp in Fig. 18. Note that by definition the worlds $(se,m)$ for ro do not evolve like those for injp. A red circle around a memory state $m$ indicates it is required to prove ro-valid in $\mathbb{R}_{\text{ro}}^{q}$ holds for $m$ . The mem-acc relations over dashed arrows are the properties over replies in $\mathbb{R}_{\text{ro}}^{r}$ and must also be verified.

The above additional properties are proved based on two observations. First, the properties for queries (i.e., ro-valid) are propagated in refinement along with copying of memory states. For example, to prove the refinement in Fig. 18(a), we are given $\text{ro-valid}(se_{1},m_{1}^{q})$ and $\text{ro-valid}(se_{2},m_{2}^{q})$ according to the initial $\mathbb{R}_{\text{ro}}^{q}$ relations. By choosing $(se_{1},m_{1}^{q})$ to be the world for the composed $\mathbb{R}_{\text{ro}}^{q}$ , $\text{ro-valid}(se_{1},m_{1}^{q})$ holds trivially for $m_{1}^{q}$ in the circle. To prove the refinement in Fig. 18(b), we need to prove that the interpolating memory state after the initial decomposition satisfies $\mathbb{R}_{\text{ro}}^{q}$ . By choosing $m_{1}^{q}$ to be this state (in the middle circle in Fig. 18(b) and according to the proof of Lemma 4.4), $\text{ro-valid}(se_{1},m_{1}^{q})$ follows directly from the initial assumption. Second, the properties for replies (i.e., mem-acc) have already been encoded into $\leadsto_{\text{injp}}$ by Definition 3.3. For example, $m_{2}^{r}$ in Fig. 18(a) is constructed by following exactly Lemma 4.3. Therefore, $\text{mem-acc}(m_{2}^{q},m_{2}^{r})$ trivially holds.

Finally, at the top level, we need ro and wt to commute which is straightforward to prove:

Lemma 5.7.

$\text{ro}\boldsymbol{\cdot}\text{wt}\equiv\text{wt}\boldsymbol{\cdot}\text{ro}$ **

5.3. Proving the Direct Open Simulation for CompCert

We first insert self-simulations into the compiler passes, as shown in Table 1. This is to supply extra $\mathbb{R}_{\text{inj}}$ , $\mathbb{R}_{\text{injp}}$ , and ro for absorbing $\mathbb{R}_{\text{ext}}$ ( $\mathbb{R}_{\text{inj}}$ ) into $\mathbb{R}_{\text{inj}}$ ( $\mathbb{R}_{\text{injp}}$ ) by properties in Lemma 5.4 and for transitive composition of ro. Self-simulations are obtained by the following lemma:

Theorem 5.8.

If $p$ is a program written in Clight or RTL and $\mathbb{R}\in\{\text{ro},\text{c}_{\text{ext}},\text{c}_{\text{inj}},\text{c}_{\text{injp}}\}$ , or $p$ is written in Asm and $\mathbb{R}\in\{\text{asm}_{\text{ext}},\text{asm}_{\text{inj}},\text{asm}_{\text{injp}}\}$ , then $[\![p]\!]\leqslant_{\mathbb{R}\twoheadrightarrow\mathbb{R}}[\![p]\!]$ holds.

We unify the conventions at the incoming and outgoing sides. We start with the simulation $L_{1}\leqslant_{\mathbb{R}\twoheadrightarrow\mathbb{S}}L_{2}$ which is the transitive composition of compiler passes in Table 1 where

[TABLE]

We then find two sequences of refinements $\mathbb{C}\sqsubseteq\mathbb{R}_{n}\sqsubseteq\ldots\sqsubseteq\mathbb{R}_{1}\sqsubseteq\mathbb{R}$ and $\mathbb{S}\sqsubseteq\mathbb{S}_{1}\sqsubseteq\ldots\sqsubseteq\mathbb{S}_{m}\sqsubseteq\mathbb{C}$ , by which and Theorem 3.5 we get the simulation $L_{1}\leqslant_{\mathbb{C}\twoheadrightarrow\mathbb{C}}L_{2}$ . The direct simulation convention is $\mathbb{C}=\text{ro}\boldsymbol{\cdot}\text{wt}\boldsymbol{\cdot}\text{CAinjp}\boldsymbol{\cdot}\text{asm}_{\text{injp}}$ . ro enables optimizations at C level while wt ensures well-typedness. The definition of CAinjp has already been discussed informally in §2.1; its formal definition is given as follows. Note that, to simplify the presentation, we have omitted minor constraints such as function values should not be undefined, stack pointers must have a pointer type, etc. Interested readers should consult the our artifact for a complete definition.

Definition 5.9.

$\text{CAinjp}:\mathcal{C}\Leftrightarrow\mathcal{A}=\langle W_{\text{CAinjp}},\mathbb{R}_{\text{CAinjp}}^{q},\mathbb{R}_{\text{CAinjp}}^{r}\rangle$ where $W_{\text{CAinjp}}=(W_{\text{injp}}\times\text{sig}\times\text{regset})$ and ${\mathbb{R}_{\text{CAinjp}}^{q}:\mathcal{K}_{W_{\text{CAinjp}}}(\mathcal{C}^{q},\mathcal{A}^{q})}$ and ${\mathbb{R}_{\text{CAinjp}}^{r}:\mathcal{K}_{W_{\text{CAinjp}}}(\mathcal{C}^{r},\mathcal{A}^{r})}$ are defined as:

•

$(v_{f}[\mathit{sg}](\vec{v})@m_{1},\mathit{rs}@m_{2})\in\mathbb{R}_{\text{CAinjp}}^{q}((j,m_{1},m_{2}),\mathit{sg},\mathit{rs})$ if

[TABLE]

${\text{get-args}(\mathit{sg},\mathit{rs}(\text{RSP}),\mathit{rs},m_{2})}$ is a list of values for arguments at the assembly level obtained by inspecting locations for arguments in $\mathit{rs}$ and $m_{2}$ corresponding to the signature $\mathit{sg}$ which are determined by CompCert’s calling convention. $\text{outgoing-arguments}\allowbreak(\mathit{sg},\mathit{rs}(\text{RSP}))$ is a set of addresses on the stack frame for outgoing function arguments computed from the given signature $\mathit{sg}$ and the value of stack pointer.

•

$(r@m_{1}^{\prime},\mathit{rs}^{\prime}@m_{2}^{\prime})\in\mathbb{R}_{\text{CAinjp}}^{r}((j,m_{1},m_{2}),\mathit{sg},\mathit{rs})$ if there is a $j^{\prime}$ s.t.

[TABLE]

${\text{get-result}(\mathit{sg},\mathit{rs}^{\prime})}$ is the return value stored in a register designated by CompCert’s calling convention for the given signature $\mathit{sg}$ . callee-save-regs is the set of callee-save registers.

The last $\text{asm}_{\text{injp}}$ is irrelevant as assembly code is self-simulating by Theorem 5.8. The final correctness theorem is shown below:

Theorem 5.10.

Compilation in CompCert is correct in terms of open simulations,

[TABLE]

We explain how the refinements are carried out at both sides. The following is the sequence of refined simulation conventions $\mathbb{C}\sqsubseteq\mathbb{R}_{n}\sqsubseteq\ldots\sqsubseteq\mathbb{R}_{1}\sqsubseteq\mathbb{R}$ at the outgoing side. It begins with $\mathbb{R}$ and ends with $\mathbb{C}$ .

[TABLE]

In each line, the letters in red are simulation conventions transformed by the refinement operation at that step. In step (1), we merge consecutive simulation conventions by applying property (1) in Lemma 5.4 for $\text{c}_{\text{injp}}$ and properties (5-7) to compose $\text{c}_{\text{ext}}$ and absorb it into $\text{c}_{\text{inj}}$ . We also apply Lemma 5.6 to merge consecutive $\text{ro}\boldsymbol{\cdot}\text{c}_{\text{injp}}$ . In steps (2-3), we move and eliminate wt by Lemmas 5.5 and 5.7. In step (4), we lift conventions over CL, LM and MA to higher positions by Lemma 5.3. In step (5), we absorb $\text{c}_{\text{ext}}$ into $\text{c}_{\text{inj}}$ again and further turns $\text{c}_{\text{inj}}$ into $\text{c}_{\text{injp}}$ by applying $\text{c}_{\text{injp}}\sqsubseteq\text{c}_{\text{inj}}$ (property (2) in Lemma 5.4). In step (6), we compose $\text{c}_{\text{injp}}$ by applying $\text{c}_{\text{injp}}\equiv{\text{c}_{\text{injp}}}\boldsymbol{\cdot}{\text{c}_{\text{injp}}}$ . In step (7), we apply Lemma 5.6 again to eliminate the second $\text{ro}\boldsymbol{\cdot}\text{c}_{\text{injp}}$ . In step (8), we commute the two semantic invariants of the source semantics by Lemma 5.7. Finally, we merge $\text{c}_{\text{injp}}$ with ${\text{CL}\boldsymbol{\cdot}\text{LM}\boldsymbol{\cdot}\text{MA}}$ into CAinjp in step (9).

The original simulation conventions at the incoming side are parameterized by inj which does not have memory protection as in injp. One can modify the proofs of CompCert to make injp an incoming convention. However, we show that this is unnecessary: with the inserted self-simulations over injp, conventions over inj may be absorbed into them. The following is the refinement sequence $\mathbb{S}\sqsubseteq\mathbb{S}_{1}\sqsubseteq\ldots\sqsubseteq\mathbb{S}_{m}\sqsubseteq\mathbb{C}$ that realizes this idea.

[TABLE]

Steps (1-3) are the same as for the outgoing side except for using property (4) in Lemma 5.4. In step (4), we split $\text{c}_{\text{injp}}$ into two, one will be used to absorb the $\text{asm}_{\text{inj}}$ at the target level. In step (5), we push all simulation conventions parameterized over KMRs starting with the second split $\text{c}_{\text{injp}}$ to target level by Lemma 5.3. In step (6), we absorb $\text{asm}_{\text{ext}}$ into $\text{asm}_{\text{inj}}$ by properties (5-7) in Lemma 5.4. In step (7), we compose the consecutive $\text{c}_{\text{inj}}$ and $\text{asm}_{\text{inj}}$ by $\mathbb{R}_{\text{inj}}\boldsymbol{\cdot}\mathbb{R}_{\text{inj}}\sqsubseteq\mathbb{R}_{\text{inj}}$ ( property (4) in Lemma 5.4). In step (8), we absorb inj into injp at both levels by property (3) in Lemma 5.4. In step (9), we eliminate a redundant $\text{ro}\boldsymbol{\cdot}\text{c}_{\text{injp}}$ by Lemma 5.6. The last two steps are the same as above.

6. End-to-End Verification of Heterogeneous Modules

In this section, we give a formal account of end-to-end verification of heterogeneous modules based on direct refinements. The discussion focuses on the running example in Fig. 4 and its variants. More detailed development of those examples can be found in Appendix B. We also develop an additional example adapted from CompCertM in Appendix C.

6.1. Refinement for the Hand-written Server

We use server_opt.s instead of server.s to illustrate how optimizations are enabled by ro. The proof for the unoptimized server is similar with only minor adjustments. A formal definition of LTS for $L_{\texttt{S}}$ is given below and its transition diagram is given in Fig. 19(a).

Definition 6.1.

LTS of $L_{\texttt{S}}$ :

[TABLE]

The LTS has four internal states as depicted in Fig. 19(a). Initialization is encoded in $I_{S}$ . If the incoming query $q^{I}$ contains a function pointer $\text{Vptr}(b_{e},0)$ which points to encrypt, $L_{\texttt{S}}$ enters $\text{Calle}\;i\;v_{f}\;m$ where $i$ and $v_{f}$ are its arguments. The first internal transition allocates the stack frame $\mathit{sp}$ and stores the result of encryption $i\;\text{XOR}\;m[b_{k}]$ in $\mathit{sp}$ where $b_{k}$ contains key. Then, it enters Callp which is the state before calling process. If the pointer $v_{f}=\text{Vptr}(b_{p},0)$ of the current state points to an external function, $L_{\texttt{S}}$ issues an outgoing C query $q^{O}$ with a pointer to its stack frame as its argument. After the external call, $Y_{S}$ updates the memory with the reply and enters Retp. The second internal transition frees $sp$ and enters Rete and finally returns. Note that complete semantics of $L_{\texttt{S}}$ is accompanied by a local symbol table which determines the initial value of global variables (key) and asserts that it is a constant (read-only). The only difference between the specifications for server_opt.s and server.s is whether key is a constant in the symbol table. The semantics of assembly module $[\![\text{server\_opt.s}]\!]$ is given by CompCertO whose transition diagram is shown in Fig. 19(b). All the states, including queries and replies, are composed of register sets and memories.

Now, we need to prove the following forward simulation. The most important points of the proof are how ro enables optimizations and how injp preserves memory across external calls.

Theorem 6.2.

${L_{\texttt{S}}}\leqslant_{\mathbb{C}}[\![\text{server\_opt.s}]\!]$ .

At the top level, we expand $\mathbb{C}$ to $\text{ro}\boldsymbol{\cdot}\text{wt}\boldsymbol{\cdot}\text{CAinjp}\boldsymbol{\cdot}\text{asm}_{\text{injp}}$ and switch the order of ro and wt by Lemma 5.7. By the vertical compositionality (Theorem 3.4), we first establish $L_{\texttt{S}}\leqslant_{\text{wt}}L_{\texttt{S}}$ with the well-typed outgoing arguments and return value. $[\![\text{server\_opt.s}]\!]\leqslant_{\text{asm}_{\text{injp}}}[\![\text{server\_opt.s}]\!]$ is proved by Theorem 5.8.

We are left with proving ${L_{\texttt{S}}}\leqslant_{\text{ro}\boldsymbol{\cdot}\text{CAinjp}}[\![\text{server\_opt.s}]\!]$ . That is, we need to show the simulation diagram in Fig. 20 holds where $\mathbb{S}=\text{ro}\boldsymbol{\cdot}\text{CAinjp}$ (which mirrors Fig. 6). Here, the given assumptions and the conclusions to be proved are represented as black and red arrows, respectively. For the proof, we need an invariant $R\in\mathcal{K}_{W_{\text{ro}\boldsymbol{\cdot}\text{CAinjp}}}(S_{S},\text{regset}\times\text{mem})$ . The most important point is that ro and injp play essential roles in establishing the invariant. First, ro-valid is propagated from the initial source query $q_{\mathcal{C}}^{I}$ to internal program states. This guarantees that the value of key read from the source memory states is always $42$ , hence matching the constant in Pxori 42 RDI in server_opt.s. Second, injp is essential for deriving that memory locations in the target stack frame with offset $o$ ( $o<8$ or $16\leq o$ ) are unchanged since they are designated out-of-reach by $R$ . Therefore, the private stack values of the server are protected. For the unoptimized server, the only difference is that we decompose ${L_{\texttt{S}}}\leqslant_{\text{ro}\boldsymbol{\cdot}\text{CAinjp}}[\![\text{server.s}]\!]$ into ${L_{\texttt{S}}}\leqslant_{\text{ro}}{L_{\texttt{S}}}$ which trivially holds and ${L_{\texttt{S}}}\leqslant_{\text{CAinjp}}[\![\text{server.s}]\!]$ which can be proved without the help of ro.

6.2. End-to-end Correctness Theorem

We first prove the following source-level refinement where ${L_{\texttt{CS}}}$ is the top-level specification. Its proof follows the same pattern as Theorem 6.2 but is considerably simpler because the source and target semantics share the same $\mathcal{C}$ interface.

Lemma 6.3.

${L_{\texttt{CS}}}\leqslant_{\text{ro}\boldsymbol{\cdot}\text{wt}\boldsymbol{\cdot}\text{c}_{\text{injp}}}[\![\text{client.c}]\!]\oplus{L_{\texttt{S}}}$ .

We then prove the forward simulation between the top-level specification and the linked assembly as depicted in Fig. 4, which is immediate from the horizontal compositionality and adequacy for assembly described in §3.1.2, Theorem 5.10 and Theorem 6.2:

Lemma 6.4.

$[\![\text{client.c}]\!]\oplus L_{\texttt{S}}\leqslant_{\mathbb{C}}[\![\text{client.s}+\text{server\_opt.s}]\!]$ .

For end-to-end direct refinement, we need to absorb Lemma 6.3 into Lemma 6.4. The following theorem is easily derived by applying Lemmas 5.5, 5.6 and 5.7.

Lemma 6.5.

$\mathbb{C}\equiv\text{ro}\boldsymbol{\cdot}\text{wt}\boldsymbol{\cdot}\text{c}_{\text{injp}}\boldsymbol{\cdot}\mathbb{C}$ .

The final end-to-end simulation is immediate by vertically composing Lemma 6.3, Lemma 6.4 and refining the simulation convention using Lemma 6.5.

Theorem 6.6.

${L_{\texttt{CS}}}\leqslant_{\mathbb{C}}[\![\text{client.s}+\text{server\_opt.s}]\!]$ .

6.3. Verification of the Mutually Recursive Client and Server

We introduce a variant of the running example with mutual recursion in Fig. 23. The server remains the same while the client is changed. request itself is passed as a callback function to encrypt, resulting in recursive calls to encrypt for encrypting and storing an array of values. To perform the same end-to-end verification for this example, we only need to define a new top-level specification ${L_{\texttt{CS}}}^{\prime}$ and prove ${L_{\texttt{CS}}}^{\prime}\leqslant_{\text{ro}\boldsymbol{\cdot}\text{wt}\boldsymbol{\cdot}\text{c}_{\text{injp}}}[\![\text{client.c}]\!]\oplus{L_{\texttt{S}}}$ . Other proofs are either unchanged (e.g., the refinement of the server) or can be derived from Theorem 5.10, full compositionality and adequacy. More detailed proofs can be found in Appendix B.

7. Generality and Limitations of Our Approach

We explain how our approach may be generalized to support other memory models, compilers and optimizations for first-order languages. We also discuss the limitations of our approach.

7.1. Supporting Different Memory Models and Compilers

At a high level, injp is simply a general and transitive relation on evolving functional memory invariants (represented as injections) enhanced with memory protection to guard against modification to private memory by external calls. Many other first-order memory models can be viewed as employing either a richer or a simplified version of injection as memory invariants and equipped with a similar notion of memory protection. For example, the memory model of CompCertS (Besson et al., 2015) extends injections to map symbolic values. The memory refinements in the $CH_{2}O$ memory model (Krebbers, 2016) function like injections except that pointer offsets are represented as abstract paths pointing into aggregated data structures. The memory model defined by Kang et al. (2015) explicitly divides a memory state into public and private memory. Its memory invariant is an equivalence relation between public source and target memory which is essentially an identity injection. Therefore, uniform KMRs may be defined for those memory models as variants of injp.

To prove the transitivity of these KMRs, the key is the construction of interpolating memory states after external calls as described in §4.2. This construction is based on the following general ideas: 1) as KMRs are transitively composed, more memory gets protected, 2) the private memory should be identical to the initial memory, and 3) the public memory should be projected from the updated source memory via memory invariants. As we can see, these ideas are applicable to any memory model with functional memory invariants and a notion of private memory. Therefore, our approach should work for the aforementioned memory models and compilers based on them.

7.2. Supporting Additional Optimization Passes

Given any new optimization pass whose additional rely-guarantee condition can be represented as a semantics invariant $I$ , we may piggyback $I$ onto an enriched injp to achieve direct refinement. To see that, note that any $I$ consists of two parts: a condition for initial queries (e.g., ro-valid in ro) and a condition for replies (e.g., mem-acc in ro). By extending injp to include the latter (just like that Definition 3.3 includes mem-acc), if the enriched injp is still transitive, then we can easily prove the following proposition which is generalized from Lemma 5.6.

Proposition 7.1.

For any $I:\mathcal{C}\Leftrightarrow\mathcal{C}$ and enriched injp, $I\boldsymbol{\cdot}\text{c}_{\text{injp}}\equiv I\boldsymbol{\cdot}\text{c}_{\text{injp}}\boldsymbol{\cdot}I\boldsymbol{\cdot}\text{c}_{\text{injp}}$ .

The proof follows exactly the steps for proving Lemma 5.6. It is based on the two observations we made near the end of §5.2.3, i.e., 1) the properties for initial queries of $I$ hold along with copying of memory states, and 2) the properties for replies trivially hold as they are part of the enriched injp.

7.3. Limitations

We discuss the limitations of our approach and possible solutions. First, it does not yet support behavior refinements for whole programs in CompCert (Leroy, 2023). This is a technical limitation and can be solved by reducing open simulations into closed simulations in CompCert. Second, the proof for Unusedglob assumes that global symbols for removed definitions are preserved as CompCertO’s simulation framework requires the same set of global symbols throughout compilation. We need to weaken this requirement to enable removal of global symbols by compilation. Third, open simulations assume given any input injection $j$ , the execution outputs some injection $j^{\prime}$ related to $j$ by injp. This may not work for memory models with fixed injection functions (Wang et al., 2022). A possible solution is to enrich injp to account for this fixed definition. Finally, given a new optimization, if its rely-guarantee condition cannot be described as a semantic invariant $I$ or if injp enriched with $I$ becomes intransitive, then direct refinements may not be derivable. In this case, we may need stronger restrictions on this optimization for our approach to work.

8. Evaluation and Related Work

Our Coq development took about 7 person-months and 18.3k lines of code (LOC) on top of CompCertO. We added 3.7k LOC to prove the transitivity of injp, 3k LOC to verify the compiler passes as discussed in §5.1, 1.2k LOC for composing simulation conventions as described in the rest of §5 and 7.3k LOC for the Client-Server examples. We also ported CompCertM’s example on mutually recursive summation (Song et al., 2020), which adds 3.1k LOC (Zhang et al., 2023b). For now, the cost of examples is relatively high. However, we observe that a lot of low-level proofs such as pointer arithmetic can be automated by proof scripts, many proofs with predictable patterns can be directly derived from the program structures, and a lot of duplicated lemmas in the examples can be eliminated. We will carry out those exercises in the future which should simplify the proofs significantly. Below we compare our work with other frameworks for VCC and program verification.

8.1. Verified Compositional Compilation for First-Order Languages

In this work, we are concerned with VCC of first-order imperative programs with global memory states and support of pointers. A majority of the work in this setting is based on CompCert. We compare them from the perspectives listed in the first column of Table 2. An answer that is not a simple “Yes” or “No” denotes that special constraints are enforced to support the given feature.

Compositional CompCert

CompComp supports VCC based on interaction semantics which is a specialized version of open semantics with C interfaces (Stewart et al., 2015). We have already talked about its merits and limitations in §1.2. It is interesting to note that CompComp can also be obtained based on our approach by adopting $\text{c}_{\text{injp}}$ for every compiler pass and exploiting the transitivity of $\text{c}_{\text{injp}}$ , which does not require the instrumentation of semantics in CompComp.

CompCertM

CompCertM supports adequacy and end-to-end verification of mixed C and assembly programs. A distinguishing feature of CompCertM is Refinement Under Self-related Contexts or RUSC (Song et al., 2020). A RUSC relation is a fixed collection of simulation relations. By exploiting contexts that are self-relating under all of these simulation relations, horizontal and vertical compositionality are achieved. However, refinements based on RUSC relations can be difficult to use as they are not extensional. For example, the complete open refinement relation $\leqslant_{R_{1}+\ldots+R_{9}}$ in CompCertM carries 9 RUSC relations $R_{1},\ldots,R_{9}$ (6 for compiler passes and 3 for source-level verification). To establish the refinement between a.s and its specification $L_{S}$ , one needs to prove $L_{S}$ are self-simulating over all 9 simulation relations. This can quickly get out of hand as more modules and more compiler passes are introduced. By contrast, we only need to prove direct refinement for once and the refinement is open to further horizontal or vertical composition. On the other hand, CompCertM supports behavior refinement of closed programs which we do not yet (See §7.3).

CompCertO

Vertical composition is a trivial pairing of simulations in CompCertO, which exposes internal compilation steps. CompCertO tries to alleviate this problem via ad-hoc refinement of simulation conventions. The resulting top-level convention is $\mathbb{C}_{\text{CCO}}=\mathcal{R}^{*}\cdot\text{wt}\cdot\text{CL}\cdot\text{LM}\cdot\text{MA}\cdot\text{asm}_{\text{vainj}}$ where $\mathcal{R}=\text{c}_{\text{injp}}+\text{c}_{\text{inj}}+\text{c}_{\text{ext}}+\text{c}_{\text{vainj}}+\text{c}_{\text{vaext}}$ is a sum of conventions parameterized over KMRs. In particular, $\text{c}_{\text{vaext}}$ is an ad-hoc combination of KMR and internal invariants for optimizations. $\mathcal{R}^{*}$ means that $\mathcal{R}$ may be repeated for an arbitrary number of times. Since the top-level summation of KMRs is similar to that in CompCertM, we need to go through a reasoning process similar to CompCertM, only more complicated because of the need to reason about internal invariants of optimizations in $\text{c}_{\text{vaext}}$ and indefinitely repeated combination of all the KMRs by $\mathcal{R}^{*}$ . Therefore, it is unknown if the correctness theorem of CompCertO suffices for end-to-end program verification.

CompCertX

CompCertX (Gu et al., 2015; Wang et al., 2019) realizes a weaker form of VCC that only allows assembly contexts to invoke C programs, but not the other way around. Therefore, it does not support horizontal composition of modules with mutual recursions. The compositionality and program verification are delegated to Certified Abstraction Layers (CAL) (Gu et al., 2015, 2018). Furthermore, CompCertX does not support stack-allocated data (e.g., our server example). However, its top-level semantic interface is similar to our interface, albeit not carrying a symmetric rely-guarantee condition. This indicates that our work is a natural evolution of CompCertX.

VCC for Concurrent Programs

VCC for concurrent programs needs to deal with multiple threads and their linking. CASCompCert is an extension of CompComp that supports compositional compilation of concurrency with no (or benign) data races (Jiang et al., 2019). To make CompComp’s approach to VCC work in a concurrent setting, CASCompCert imposes some restrictions including not supporting stack-allocated data and allowing only nondeterminism in scheduling threads. A recent advancement based on CASCompCert is about verifying concurrent programs (Zha et al., 2022) running on weak memory models using the promising semantics (Kang et al., 2017; Lee et al., 2020). We believe the ideas in CASCompCert are complementary to this work and can be combined with our approach to achieve VCC for concurrency with cleaner interface and less restrictions.

8.2. Verified Compositional Compilation for Higher-Order Languages

Another class of work on VCC focuses on compilation of higher-order languages. In this setting, the main difficulty comes from complex language features together with higher-order states. A prominent example is the Pilsner compiler (Neis et al., 2015) that compiles a higher-order language into some form of assembly programs. The technique Pilsner adopts is called parametric simulations that evolves from earlier work on reasoning about program equivalence via bisimulation (Hur et al., 2012a). Another line of work is multi-language semantics (Patterson and Ahmed, 2019; Patterson et al., 2017; Perconti and Ahmed, 2014; Scherer et al., 2018) where a language combining all source, intermediate and target languages is used to formalize semantics. Compiler correctness is stated as contextual equivalence or logical relations. It seems that our techniques are not directly applicable to those work because relations on higher-order states cannot deterministically fix the interpolating states. A possible solution is to divide the higher-order memory into a first-order and a higher-order part such that the former does not contain pointers to the latter (forming a closure). By encapsulating higher-order programs inside first-order states, we may be able to apply our approach.

The high-level ideas for constructing interpolating states for proving transitivity of injp can also be found in some of the work on program equivalence (Hur et al., 2012b; Ahmed, 2006). To the best of our knowledge, our approach is the first concrete implementation of these ideas that works for a realistic optimizing compiler for imperative languages with non-trivial memory models.

8.3. Frameworks for Compositional Program Verification

Researchers have proposed frameworks for compositional program verification based on novel semantics, refinements and separation logics (Gu et al., 2015, 2018; Xia et al., 2019; He et al., 2021; Chappe et al., 2023; Sammler et al., 2023; Song et al., 2023). These frameworks aim at broader program verification and may be combined with our approach to generate more flexible end-to-end verification techniques. For example, to support more flexible certified abstraction layers, we may combine our approach with data abstraction in CAL and extend horizontal linking to work with abstraction layers. It is not entirely clear whether their solutions can be successfully applied to or combined with VCC of realistic optimizing compilers like CompCert. However, comparing with these frameworks is still meaningful as it provides different perspectives and potential directions for improving our work. We discuss representative frameworks in these categories below.

DimSum

DimSum (Sammler et al., 2023) is a framework for multi-language program verification. Program semantics are defined as LTSs which emit events to communicate with the environment. The concept of events in DimSum is similar to the language interfaces in CompCertO and in our work. For verification of heterogeneous programs, it uses wrappers to relate the events between two languages (a C-like language called Rec and assembly in its paper), which is similar to simulation conventions. The rely-guarantee protocol is expressed in the wrappers by angelic non-determinism and memory protection is expressed in separation logic. On one hand, it is unclear if their framework can scale to realistic languages or compilers like CompCert. For example, it is interesting to investigate if their wrappers can support more complicated languages and compiler optimizations which can be handled by our framework and refinement relations. On the other hand, DimSim allows assembly modules that exploit a flat memory model. Therefore, their framework supports refinements between semantics using different memory models, which we do not support yet.

Conditional Contextual Refinement

Conditional Contextual Refinement (CCR) is a framework which combines contextual refinement and separation logics to achieve both conditional and composable verification of program semantics (Song et al., 2023). CCR employs separation logics to constrain the behavior of open modules to achieve horizontal and vertical composition of refinements, such separation logic wrapper plays a similar role as simulation conventions in this paper. On one hand, separation logics provide more fine-grained control of shared resources. On the other hand, they have specific requirements of contexts unlike the open protocols encoded in our direct refinements. The horizontal composition of two refinements requires specific knowledge of specifications of each other to control interaction. It is interesting to investigate if the program specific conditions imposed by CCR can be handled or piggybacked upon our framework.

9. Conclusion and Future Work

We have proposed an approach to compositional compiler correctness for first-order languages via direct refinements between source and target semantics at their native interfaces, which overcomes the limitations of the existing approaches on compositionality, adequacy and other important criteria for VCC. In the future, we plan to support behavior (trace) refinement for closed programs by reducing our open simulation into the whole-program correctness theorem for the original CompCert. We also plan to combine our work with refinement-based program verification like certified abstraction layers to support more substantial applications. Another research direction is to apply our approach to different memory models and compilers for first-order and higher-order languages, which will better test the limit of our approach and the usefulness of our discoveries.

Data-Availability Statement

The Coq artifact containing the formal developments described in this paper is available on Zenodo (Zhang et al., 2023a).

Acknowledgements.

We would like to thank our shepherd Yannick Zakowski and the anonymous referees for their helpful feedback which improved this paper significantly. This work is supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 62002217 and 62372290, and by the Natural Science Foundation of the United States (NSF) under Grant No. 1763399, 2019285, and 2313433. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

Appendix A Transitivity of injp

A.1. More complete definitions of memory injection and injp accessibility

We have used a simplified version of definitions of perm, $\hookrightarrow_{m}$ and $\leadsto_{\text{injp}}$ in Sec. 4. To present a more detailed proof of the KMR with memory protection (injp), we present full definitions of perm and $\leadsto_{\text{injp}}$ . We also present a more complete definition of $\hookrightarrow_{m}$ which is still not 100% complete because we ignore two properties for simplicity. They are about alignment and range of size $\delta$ in mapping $j(b)=\lfloor(b^{\prime},\delta)\rfloor$ . They are not essential for this proof as preconditions and can be proved similarly as other properties of $\hookrightarrow_{m}$ . Readers interested in these details can find them in our artifact.

By the definition of CompCert memory model, a memory cell has both maximum and current permissions such that $\text{perm}_{\text{cur}}(m,p)\subseteq\text{perm}_{\text{max}}(m,p)$ . During the execution of a program, the current permission of a memory cell may be lowered or raised by an external call. However, the maximum permission can only decrease in both internal and external calls. This invariant was defined in CompCert as:

[TABLE]

Definition A.1.

Definition of memory injection $\hookrightarrow_{m}$ .

[TABLE]

Definition A.2.

Definition of memory accessibility mem-acc.

[TABLE]

For the complete definition of $\leadsto_{\text{injp}}$ , we further define the separation property for injection as:

[TABLE]

This invariant states that when we start from $m_{1}\hookrightarrow_{m}^{j}m_{2}$ , after executing on source and target semantics, the future injection $j^{\prime}$ only increases from $j$ by relating newly allocated blocks. Note that we write $b\in m$ for $b\in\text{validblock}(m)$ .

Definition A.3.

Accessibility relation of injp

[TABLE]

A.2. Auxiliary Properties

In this section we present several lemmas about properties of memory injection and injp accessibility. These lemmas are used in the proof of injp refinement.

Firstly, the memory injections are composable.

Lemma A.4.

Given $m_{1}\hookrightarrow_{m}^{j_{12}}m_{2}$ and $m_{2}\hookrightarrow_{m}^{j_{23}}m_{3}$ , we have

[TABLE]

This property is proved and used in CompCert, we do not repeat the proof here.

Lemma A.5.

Given $m_{1}\hookrightarrow_{m}^{j_{23}\cdot j_{12}}m_{3}$ , $(b_{1},o_{1})\in\text{perm}_{\text{cur}}(m_{1},\text{Readable})$ and $j_{23}\cdot j_{12}(b_{1})=\lfloor(b_{3},o_{3}-o_{1})\rfloor$ , then

[TABLE]

Note that $j_{23}\cdot j_{12}(b_{1})=\lfloor(b_{3},o_{3}-o_{1})\rfloor$ iff $\exists\;b_{2}\;o_{2},j_{12}(b_{1})=\lfloor(b_{2},o_{2}-o_{1})\rfloor\land j_{23}(b_{2})=\lfloor(b_{3},o_{3}-o_{2})\rfloor$ .

Proof.

According to property (2) in Definition A.1, we know that $m_{1}[b_{1},o_{1}]\hookrightarrow_{v}^{j_{23}\cdot j_{12}}m_{3}[b_{3},o_{3}]$ . We divide the value $m_{1}[b_{1},o_{1}]$ into:

•

If $m_{1}[b_{1},o_{1}]=\text{Vundef}$ , we take $v_{2}=\text{Vundef}$ . Then $\text{Vundef}\hookrightarrow_{v}^{j_{12}}\text{Vundef}\land\text{Vundef}\hookrightarrow_{v}^{j_{23}}m_{3}[b_{3},o_{3}]$ trivially holds.

•

If $m_{1}[b_{1},o_{1}]$ is a concrete value, we take $v_{2}=m_{1}[b_{1},o_{1}]$ . In such case we have $m_{1}[b_{1},o_{1}]=v_{2}=m_{3}[b_{3},o_{3}]$ .

•

If $m_{1}[b_{1},o_{1}]=\text{Vptr}(b_{1}^{\prime},o_{1}^{\prime})$ , we can derive that $\text{Vptr}(b_{1}^{\prime},o_{1}^{\prime})\hookrightarrow_{v}^{j_{23}\cdot j_{12}}m_{3}[b_{3},o_{3}]$ implies $\exists b_{3}^{\prime}\ o_{3}^{\prime},s.t.m_{3}[b_{3},o_{3}]=\text{Vptr}(b_{3}^{\prime},o_{3}^{\prime})$ and $j_{23}\cdot j_{12}(b_{1}^{\prime})=\lfloor(b_{3}^{\prime},o_{3}^{\prime}-o_{1}^{\prime})\rfloor$ . Therefore

[TABLE]

We take $v_{2}=\text{Vptr}(b_{2}^{\prime},o_{2}^{\prime})$ and $m_{1}[b_{1},o_{1}]\hookrightarrow_{v}^{j_{12}}v_{2}\land v_{2}\hookrightarrow_{v}^{j_{23}}m_{3}[b_{3},o_{3}]$ can be derived from the formula above.

∎

Lemma A.6.

Given $m_{1}\hookrightarrow_{m}^{j_{12}}m_{2}$ , $m_{2}\hookrightarrow_{m}^{j_{23}}m_{3}$ and $j_{23}(b_{2})=\lfloor(b_{3},o_{3}-o_{2})\rfloor$ . If $(b_{2},o_{2})\in\text{out-of-reach}(j_{12},m_{1})$ and $(b_{2},o_{2})\in\text{perm}_{\text{max}}(m_{2},\text{NA})$ , then

[TABLE]

Proof.

According to the definition of out-of-reach, If $j_{12}(b_{1})=\lfloor(b_{2}^{\prime},o_{2}^{\prime}-o_{1})\rfloor$ and $j_{23}(b_{2}^{\prime})=\lfloor(b_{3},o_{3}-o_{1})\rfloor$ , we need to prove that $(b_{1},o_{1})\notin\text{perm}_{\text{max}}(m_{1},\text{NA})$ . If $b_{2}=b_{2}^{\prime}$ , from $(b_{2},o_{2})\in\text{out-of-reach}(j_{12},m_{1})$ we can directly prove $(b_{1},o_{1})\notin\text{perm}_{\text{max}}(m_{1},\text{NA})$ .

If $b_{2}\neq b_{2}^{\prime}$ , we assume that $(b_{1},o_{1})\in\text{perm}_{\text{max}}(m_{1},\text{NA})$ , by property (1) of $m_{1}\hookrightarrow_{m}^{j_{12}}m_{2}$ we get $(b_{2}^{\prime},o_{2}^{\prime})\in\text{perm}_{\text{max}}(m_{2},\text{NA})$ . Now $(b_{2},o_{2})$ and $(b_{2}^{\prime},o_{2}^{\prime})$ are two different positions in $m_{2}$ which are mapped to the same position $(m_{3},o_{3})$ in $m_{3}$ . This scenario is prohibited by the non-overlapping property (5) of $m_{2}\hookrightarrow_{m}^{j_{23}}m_{3}$ . So $(b_{1},o_{1})\notin\text{perm}_{\text{max}}(m_{1},\text{NA})$ . ∎

A.3. Proof of Lemma 4.3

Based on definitions and lemmas before, we prove Lemma 4.3 in this section:

[TABLE]

Given $m_{1}\hookrightarrow_{m}^{j_{12}}m_{2}$ and $m_{2}\hookrightarrow_{m}^{j_{23}}m_{3}$ . We take $j_{13}=j_{23}\cdot j_{12}$ , from Lemma A.4 we can prove $m_{1}\hookrightarrow_{m}^{j_{13}}m_{3}$ . After the external call, given $(j_{13},m_{1},m_{3})\leadsto_{\text{injp}}(j_{13}^{\prime},m_{1}^{\prime},m_{3}^{\prime})$ and $m_{1}^{\prime}\hookrightarrow_{m}^{j_{13}^{\prime}}m_{3}^{\prime}$ .

We present the construction and properties of $j_{12}^{\prime},j_{23}^{\prime}$ and $m_{2}^{\prime}$ in Sec. A.3.1. Then the proof reduce to prove $m_{1}^{\prime}\hookrightarrow_{m}^{j_{12}^{\prime}}m_{2}^{\prime}$ , $m_{2}^{\prime}\hookrightarrow_{m}^{j_{23}^{\prime}}m_{3}^{\prime}$ , $(j_{12},m_{1},m_{2})\leadsto_{\text{injp}}(j_{12}^{\prime},m_{1}^{\prime},m_{2}^{\prime})$ and $(j_{23},m_{2},m_{3})\allowbreak\leadsto_{\text{injp}}(j_{23}^{\prime},m_{2}^{\prime},m_{3}^{\prime})$ , they are proved in Sec. A.3.2

A.3.1. Construction and properties of $j_{12}^{\prime}$ , $j_{23}^{\prime}$ and $m_{2}^{\prime}$

Definition A.7.

We construct the memory state $m_{2}^{\prime}$ by the following three steps, $j_{12}^{\prime}$ and $j_{23}^{\prime}$ are constructed in step (1).

(1)

We first extend $m_{2}$ by allocating new blocks, at the same time we extend $j_{12},j_{23}$ to get $j_{12}^{\prime}$ and $j_{23}^{\prime}$ such that $j_{13}^{\prime}=j_{23}^{\prime}\cdot j_{12}^{\prime}$ . Specifically, for each new block $b_{1}$ in $m_{1}^{\prime}$ relative to $m_{1}$ which is mapped by $j_{13}^{\prime}$ as $j_{13}^{\prime}(b_{1})=\lfloor(b_{3},\delta)\rfloor$ , we allocate a new memory block $b_{2}$ from $m_{2}$ and add new mappings $(b_{1},(b_{2},0))$ and $(b_{2},(b_{3},\delta))$ to $j_{12}$ and $j_{23}$ , respectively. 2. (2)

We then copy the contents of new blocks in $m_{1}^{\prime}$ into corresponding new blocks in $m_{2}^{\prime}$ as follows. For each mapped new block $b_{1}$ in $m_{1}^{\prime}$ where $j_{12}^{\prime}(b_{1})=\lfloor(b_{2},0)\rfloor$ , we enumerate all positions $(b_{1},o_{1})\in\text{perm}_{\text{max}}(m_{1}^{\prime},\text{NA})$ and copy the permission of $(b_{1},o_{1})$ in $m_{1}^{\prime}$ to $(b_{2},o_{1})$ in $m_{2}^{\prime}$ . If $(b_{1},o_{1})\in\text{perm}_{\text{cur}}(m_{1}^{\prime},\text{Readable})$ , we further set $m_{2}^{\prime}[b_{2},o_{1}]$ to $v_{2}$ where $m_{1}^{\prime}[b_{1},o_{1}]\hookrightarrow_{m}^{j_{12^{\prime}}}v_{2}$ . The existence of $v_{2}$ here is provided by Lemma A.5 with preconditions $m_{1}^{\prime}\hookrightarrow_{m}^{j_{13}^{\prime}}m_{3}^{\prime}$ , $(b_{1},o_{1})\in\text{perm}_{\text{cur}}(m_{1}^{\prime},\text{Readable})$ and $j_{13}^{\prime}(b_{1})=\lfloor(b_{3},\delta)\rfloor$ (because $b_{1}$ is a new block chosen in step (1)). 3. (3)

Finally, we update the old blocks of $m_{2}$ . If a position $(b_{2},o_{2})\in\text{pub-tgt-mem}(j_{12},m_{1})\cap\text{pub-src-mem}(j_{23})$ , the permission and value of this position in $m_{2}^{\prime}$ should comes from the corresponding position $(b_{1},o_{1})$ in $m_{1}^{\prime}$ as depicted in Fig. 15(b). Note that the values are changed only if the position is not read-only in $m_{2}$ . Other positions just remain unchanged from $m_{2}$ to $m_{2}^{\prime}$ . To complete the construction, we have to enumerate the set $\text{pub-tgt-mem}(j_{12},m_{1})\cap\text{pub-src-mem}(j_{23})$ . We state that

[TABLE]

where $\text{perm}_{\text{max}}(m_{2},\text{NA})$ is enumerable. Note that $(b_{2},o_{2})\in\text{pub-tgt-mem}(j_{12},m_{1})\allowbreak\Leftrightarrow(b_{2},o_{2})\notin\text{out-of-reach}(j_{12},m_{1})$ by definition. If $(b_{2},o_{2})\in\text{pub-tgt-mem}(j_{12},m_{1})$ , then there exists $(b_{1},o_{1})\in\text{perm}_{\text{max}}(m_{1},\text{NA})$ such that $j_{12}(b_{1})=\lfloor(b_{2},o_{2}-o_{1})\rfloor$ . The property (1) of $m_{1}\hookrightarrow_{m}^{j_{12}}m_{2}$ ensures that $(b_{2},o_{2})\in\text{perm}_{\text{max}}(m_{2},\text{NA})$ .

The concrete algorithm can be described as follows. For $(b_{2},o_{2})\in\text{perm}_{\text{max}}(m_{2},\text{None-}\allowbreak\text{empty})$ , we can enumerate $\text{perm}_{\text{max}}(m_{1},\text{NA})$ to find whether there exists a corresponding position $(b_{1},o_{1})\in\text{perm}_{\text{max}}(m_{1},\text{NA})$ such that $j_{12}(b_{1})=\lfloor(b_{2},o_{2}-o_{1})\rfloor$ . Note that the property (3) of $m_{1}\hookrightarrow_{m}^{j_{12}}m_{2}$ ensures that we cannot find more than one of such position. If there exists such $(b_{1},o_{1})$ and $j_{23}(b_{2})\neq\lfloor(b_{3},o_{3})\rfloor$ , We copy the permission of position $(b_{1},o_{1})$ in $m_{1}^{\prime}$ to $(b_{2},o_{2})$ . If $(b_{1},o_{1})\in\text{perm}_{\text{cur}}(m_{1}^{\prime},\text{Readable})$ and $(b_{2},o_{2})\in\text{perm}_{\text{max}}(m_{2},\text{Writable})$ , we further set $m_{2}^{\prime}[b_{2},o_{2}]$ to $v_{2}$ where $m_{1}^{\prime}[b_{1},o_{1}]\hookrightarrow_{v}^{j_{12^{\prime}}}v_{2}$ .

We present several lemmas about $j_{12}^{\prime},j_{23}^{\prime}$ and $m_{2}^{\prime}$ according to Definition A.7 as follows.

Lemma A.8.

[TABLE]

Proof.

Directly from the construction step (1) ∎

Lemma A.9.

[TABLE]

Proof.

For each changed position $(b_{2},o_{2})$ from $m_{2}$ to $m_{2}^{\prime}$ in step (3), we enforce that $\exists b_{1},j_{12}(b_{1})=\lfloor(b_{2},o_{2}-o_{1})\rfloor\land(b_{1},o_{1})\in\text{perm}_{\text{max}}(m_{1},\text{NA})$ and $(b_{2},o_{2})\notin\text{unmapped}(j_{23})$ . Thus, if $(b_{2},o_{2})\in\text{out-of-reach}(j_{12},m_{1})$ or $(b_{2},o_{2})\in\text{unmapped}(j_{23})$ , then $(b_{2},o_{2})\in\text{unchanged-}\allowbreak\text{on}(m_{2},m_{2}^{\prime})$ .

∎

Lemma A.10.

[TABLE]

Proof.

For unchanged position $(b_{2},o_{2})$ in $m_{2}$ , we trivially have $(b_{2},o_{2})\in\text{perm}_{\text{max}}(m_{2}^{\prime},p)\Leftrightarrow(b_{2},o_{2})\in\text{perm}_{\text{max}}(m_{2},p)$ . If $(b_{2},o_{2})$ is changed in step (3), then the permission of $(b_{2},o_{2})$ in $m_{2}^{\prime}$ is copied from some corresponding position $(b_{1},o_{1})$ in $m_{1}^{\prime}$ ( $j_{12}(b_{1})=\lfloor(b_{2},o_{2}-o_{1})\rfloor$ ). Given $(b_{2},o_{2})\in\text{perm}_{\text{max}}(m_{2}^{\prime},p)$ , we get $(b_{1},o_{1})\in\text{perm}_{\text{max}}(m_{1}^{\prime},p)$ . Form $\text{max-perm-dec}(m_{1},m_{1}^{\prime})$ we can further derive that $(b_{1},o_{1})\in\text{perm}_{\text{max}}(m_{1},p)$ . Finally, by property (1) of $m_{1}\hookrightarrow_{m}^{j_{12}}m_{2}$ we can conclude that $(b_{2},o_{2})\in\text{perm}_{\text{max}}(m_{2},p)$ . ∎

Lemma A.11.

[TABLE]

Proof.

For each position $(b_{2},o_{2})$ which has changed value from $m_{2}$ to $m_{2}^{\prime}$ in step (3). We enforce that it is not read-only in $m_{2}$ . ∎

Lemma A.12.

[TABLE]

Proof.

From step(1) we have $m_{2}\subseteq m_{2}^{\prime}$ . Together with Lemma A.10 and Lemma A.11 we can derive this lemma. ∎

A.3.2. Proof of remaining formulas

Recall that we are still proving Lemma 4.3, we have constructed $j_{12}^{\prime},j_{23}^{\prime}$ and $m_{2}^{\prime}$ . Based on the construction and properties of them presented above, we present complete proofs of last four formulas separately in this section.

Lemma A.13.

$m_{1}^{\prime}\hookrightarrow_{m}^{j_{12}^{\prime}}m_{2}^{\prime}$ **

Proof.

We check the properties in Definition A.1 as follows:

(1)

Given $j_{12}^{\prime}(b_{1})=\lfloor(b_{2},o_{2}-o_{1})\rfloor\land(b_{1},o_{1})\in\text{perm}_{k}(m_{1}^{\prime},p)$ . We prove $(b_{2},o_{2})\in\text{perm}_{k}(m_{2}^{\prime},p)$ by cases of $j_{12}(b_{1})$ . Note that $j_{12}(b_{1})$ is either $\emptyset$ or the same as $j^{\prime}_{12}(b_{1})$ because of $j_{12}\subseteq j_{12}^{\prime}$ .

•

If $j_{12}(b_{1})=\emptyset$ , the mapping $j_{12}^{\prime}(b_{1})=\lfloor(b_{2},o_{2}-o_{1})\rfloor$ is added in step (1). As a result, we know $\exists b_{3}\;\delta,j_{13}^{\prime}(b_{1})=\lfloor b_{3},\delta\rfloor$ . Since $\text{perm}_{k}(m_{1}^{\prime},p)\subseteq\text{perm}_{\text{max}}(m_{1}^{\prime},\text{NA})$ , we know $(b_{1},o_{1})\in\text{perm}_{\text{max}}(m_{1}^{\prime},\text{NA})$ and the permission of $(b_{1},o_{1})$ in $m_{1}^{\prime}$ is copied to $(m_{2},o_{2})$ in $m_{2}^{\prime}$ in step (2). Therefore $(b_{2},o_{2})\in\text{perm}_{k}(m_{2}^{\prime},p)$ .

•

If $j_{12}(b_{1})=\lfloor(b_{2},o_{2})\rfloor$ , we further divide whether $(b_{2},o_{2})$ is a public position by $j_{23}(b_{2})$

–

If $j_{23}(b_{2})=\emptyset$ , i.e. $(b_{2},o_{2})\in\text{unmapped}(j_{23})$ , According to Lemma A.9, we know $(b_{2},o_{2})\in\text{unchanged-on}(m_{2},m_{2}^{\prime})$ . At the same time, we also get $(b_{1},o_{1})\in\text{unmapped}(j_{13})$ because of $j_{13}=j_{23}\cdot j_{12}$ . Together with $(j_{13},m_{1},m_{3})\leadsto_{\text{injp}}(j_{13}^{\prime},m_{1}^{\prime},m_{3}^{\prime})$ , we can conclude that $(b_{1},o_{1})\in\text{unchanged-}\allowbreak\text{on}(m_{1},m_{1}^{\prime})$ .

Therefore, we get $(b_{1},o_{1})\in\text{perm}_{k}(m_{1},p)$ . Using property (1) of $m_{1}\hookrightarrow_{m}^{j_{12}}m_{2}$ we get $(b_{2},o_{2})\in\text{perm}_{k}(m_{2},p)$ . Since $(b_{2},o_{2})$ is also unchanged between $m_{2}$ and $m_{2}^{\prime}$ , $(b_{2},o_{2})\in\text{perm}_{k}(m_{2}^{\prime},p)$ .

–

If $j_{23}(b_{2})=\lfloor(b_{3},o_{3}-o_{2})\rfloor$ , the permission of $(b_{2},o_{2})$ in $m_{2}^{\prime}$ is set as the same as $(b_{1},o_{1})$ in $m_{1}^{\prime}$ in step (3). So $(b_{2},o_{2})\in\text{perm}_{k}(m_{2}^{\prime},p)$ holds trivially. 2. (2)

Given $j_{12}^{\prime}(b_{1})=\lfloor(b_{2},o_{2}-o_{1})\rfloor\land(b_{1},o_{1})\in\text{perm}_{\text{cur}}(m_{1}^{\prime},\text{Readable})$ , following the method in (1) we can prove $m_{1}^{\prime}[b_{1},o_{1}]\hookrightarrow_{v}^{j_{12}^{\prime}}m_{2}^{\prime}[b_{2},o_{2}]$ . Note that if $(b_{2},o_{2})$ is read-only in $m_{2}$ , from property (6) of $m_{1}\hookrightarrow_{m}^{j_{12}}m_{2}$ we can derive that $(b_{1},o_{1})$ is also read-only in $m_{1}$ . Thus the values related by $j$ are both unchanged in $m_{1}^{\prime}$ and $m_{2}^{\prime}$ thus can be related by $j^{\prime}$ where $j\subseteq j^{\prime}$ . 3. (3)

Given $b_{1}\notin m_{1}^{\prime}$ , we know $b_{1}\notin m_{1}$ , therefore $j_{12}(b_{1})=\emptyset$ . Since $b_{1}$ cannot be added to $j_{12}^{\prime}$ in step (1), we can conclude that $j_{12}^{\prime}(b_{1})=\emptyset$ . 4. (4)

Given $j_{12}^{\prime}(b_{1})=\lfloor(b_{2},\delta)\rfloor$ , It is easy to show $b_{2}$ is either old block in $m_{2}$ ( $j_{12}(b_{1})=\lfloor(b_{2},\delta)\rfloor$ ) or newly allocated block( $j_{12}(b_{1})=\emptyset$ ), therefore $b_{2}\in m_{2}^{\prime}$ . 5. (5)

Given $j_{12}^{\prime}(b_{1})=\lfloor(b_{2},o_{2}-o_{1})\rfloor\land(b_{1},o_{1})\in\text{perm}_{\text{max}}(m_{1}^{\prime},\text{NA})$ and $j_{12}^{\prime}(b_{1}^{\prime})=\lfloor(b_{2}^{\prime},o_{2}^{\prime}-o_{1}^{\prime})\rfloor\land(b_{1}^{\prime},o_{1}^{\prime})\in\text{perm}_{\text{max}}(m_{1}^{\prime},\text{NA})$ where $b_{1}\neq b_{1}^{\prime}$ . We need to prove these two positions do not overlap ( $(b_{2},o_{2})\neq(b_{2}^{\prime},o_{2}^{\prime})$ ) by cases of whether $b_{1}$ and $b_{1}^{\prime}$ are mapped by old injection $j_{12}$ . Note that $j_{12}\subseteq j_{12}^{\prime}$ , so $j_{12}(b)$ is either $\emptyset$ or the same as $j_{12}^{\prime}(b)$ .

•

$j_{12}(b_{1})=j_{12}(b_{1}^{\prime})=\emptyset$ . The $j_{12}^{\prime}$ mappings of them are added in step (1). It is obvious that newly added mappings in step (1) never map different blocks in $m_{1}^{\prime}$ into the same block in $m_{2}^{\prime}$ . Therefore $b_{2}\neq b_{2}^{\prime}$ .

•

$j_{12}(b_{1})=\lfloor(b_{2},o_{2}-o_{1})\rfloor,j_{23}(b_{1}^{\prime})=\emptyset$ . we can derive that $b_{2}\in m_{2}$ by property (4) of $m_{1}\hookrightarrow_{m}^{j_{12}}m_{2}$ . While $b_{2}^{\prime}$ is newly allocated from $m_{2}$ in step (1). Therefore $b_{2}\neq b_{2}^{\prime}$ .

•

$j_{12}(b_{1})=\emptyset,j_{12}(b_{1}^{\prime})=\lfloor(b_{2}^{\prime},o_{2}^{\prime}-o_{1}^{\prime})\rfloor$ . Similarly we have $b_{2}\neq b_{2}^{\prime}$ .

•

$j_{12}(b_{1})=\lfloor(b_{2},o_{2}-o_{1})\rfloor,j_{12}(b_{1}^{\prime})=\lfloor(b_{2}^{\prime},o_{2}^{\prime}-o_{1}^{\prime})\rfloor$ . We can prove $(b_{2},o_{2})\neq(b_{2},o_{2}^{\prime})$ using the property (5) in $m_{1}\hookrightarrow_{m}^{j_{12}}m_{2}$ by showing $(b_{1},o_{1})\in\text{perm}_{\text{max}}(m_{1},\text{NA})$ and $(b_{1}^{\prime},o_{1}^{\prime})\in\text{perm}_{\text{max}}(m_{1},\text{NA})$ . This follows from $\text{max-perm-dec}(m_{1},m_{1}^{\prime})$ in $(j_{13},m_{1},m_{3})\leadsto_{\text{injp}}(j_{13}^{\prime},m_{1}^{\prime},m_{3}^{\prime})$ . 6. (6)

Given $j_{12}^{\prime}(b_{1})=\lfloor(b_{2},o_{2}-o_{1})\rfloor\land(b_{2},o_{2})\in\text{perm}_{k}(m_{2}^{\prime},p)$ . Similarly we prove $(b_{1},o_{1})\in\text{perm}_{k}(m_{1}^{\prime},p)$ or $(b_{1},o_{1})\not\in\text{perm}_{\text{max}}(m_{1}^{\prime},\text{NA})$ by cases of $j_{12}(b_{1})$ :

•

If $j_{12}(b_{1})=\emptyset$ , then $b_{1}$ and $b_{2}$ are new blocks by $\text{inject-sep}(j_{12},j_{12}^{\prime},m_{1},m_{2})$ . According to the construction steps , every nonempty permission of $(b_{2},o_{2})$ in $m_{2}^{\prime}$ is copied from $(b_{1},o_{1})$ in $m_{1}^{\prime}$ . Therefore $(b_{1},o_{1})\in\text{perm}_{k}(m_{1}^{\prime},p)$ .

•

If $j_{12}(b_{1})=\lfloor(b_{2},o_{2}-o_{1})\rfloor$ , then $b_{1}$ and $b_{2}$ are old blocks. We further divide $j_{23}(b_{2})$ into two cases:

–

$j_{23}(b_{2})=\emptyset$ . In this case we have $(b_{1},o_{1})\in\text{unchanged-on}(m_{1},m_{1}^{\prime})$ and $(b_{2},o_{2})\in\text{unchanged-on}(m_{2},m_{2}^{\prime})$ (same as (1)). We can derive $(b_{2},o_{2})\in\text{perm}_{k}(m_{2},p)$ , then $(b_{1},o_{1})\in\text{perm}_{k}(m_{1},p)\lor(b_{1},o_{1})\not\in\text{perm}_{\text{max}}(m_{1},\text{NA})$ by property (6) of $m_{1}\hookrightarrow_{m}^{j_{12}}m_{2}$ . Finally $(b_{1},o_{1})\in\text{perm}_{k}(m_{1}^{\prime},p)\lor(b_{1},o_{1})\not\in\text{perm}_{\text{max}}(m_{1}^{\prime},\text{NA})$ by $(b_{1},o_{1})\in\text{unchanged-on}(m_{1},m_{1}^{\prime})$ .

–

$j_{23}(b_{2})=\lfloor(b_{3},o_{3})\rfloor$ . We assume that $(b_{1},o_{1})\in\text{perm}_{\text{max}}(m_{1}^{\prime},\text{NA})$ (other-wise the conclusion holds trivially), by $\text{max-perm-dec}(m_{1},m_{1}^{\prime})$ we can derive that $(b_{1},o_{1})\in\text{perm}_{\text{max}}(m_{1},\text{NA})$ . Therefore $(b_{2},o_{2})\in\text{pub-tgt-}\allowbreak\text{mem}(j_{12},m_{1})\cap\text{pub-src-mem}(j_{23})$ is copied from $m_{1}^{\prime}$ in step (3). As a result, we get $(b_{1},o_{1})\in\text{perm}_{k}(m_{1}^{\prime},p)$ from $(b_{2},o_{2})\in\text{perm}_{k}(m_{2}^{\prime},p)$ .

∎

Lemma A.14.

$m_{2}^{\prime}\hookrightarrow_{m}^{j_{23}^{\prime}}m_{3}^{\prime}$ **

Proof.

(1)

Given $j_{23}^{\prime}(b_{2})=\lfloor(b_{3},o_{3}-o_{2})\rfloor\land(b_{2},o_{2})\in\text{perm}_{k}(m_{2}^{\prime},p)$ . We prove $(b_{3},o_{3})\in\text{perm}_{k}(m_{3}^{\prime},p)$ by cases of whether $b_{2}\in m_{2}$ .

•

If $b_{2}\notin m_{2}$ is a new block relative to $m_{2}$ , then $(b_{2},o_{2})\in\text{perm}_{k}(m_{2}^{\prime},p)$ is copied from $m_{1}^{\prime}$ in step (2). Therefore we get $(b_{1},o_{1})\in\text{perm}_{k}(m_{1}^{\prime},p)$ and $j_{23}^{\prime}\cdot j_{12}^{\prime}(b_{1})=\lfloor(b_{3},o_{3}-o_{1})\rfloor$ according to step (1). From property (1) of $m_{1}^{\prime}\hookrightarrow_{m}^{j_{13}^{\prime}}m_{3}^{\prime}$ we get $(b_{3},o_{3})\in\text{perm}_{k}(m_{3}^{\prime},p)$ ;

•

If $b_{2}\in m_{2}$ , then $j_{23}(b_{2})=\lfloor(b_{3},o_{3})\rfloor$ from $\text{inject-sep}(j_{23},j^{\prime}_{23},m_{2},m_{3})$ . We further divide whether $(b_{2},o_{2})\in\text{out-of-reach}(j_{12},m_{1})$ using the same algorithm in step (2).

–

If $(b_{2},o_{2})\in\text{out-of-reach}(j_{12},m_{1})$ . According to Lemma A.9, we can derive $(b_{2},o_{2})\allowbreak\in\text{unchanged-on}(m_{2},m_{2}^{\prime})$ and $(b_{2},o_{2})\in\text{perm}_{k}(m_{2},p)$ . From $m_{2}\hookrightarrow_{m}^{j_{23}}m_{3}$ we can derive $(b_{3},o_{3})\in\text{perm}_{k}(m_{3},p)$ . By Lemma A.6, $(b_{3},o_{3})\in\text{out-of-reach}(\allowbreak j_{13},m_{1})$ . Therefore $(b_{3},o_{3})\in\text{unchanged-on}(m_{3},m_{3}^{\prime})$ and $(b_{3},o_{3})\in\text{perm}_{k}(\allowbreak m_{3}^{\prime},p)$ .

–

If $(b_{2},o_{2})\notin\text{out-of-reach}(j_{12},m_{1})$ , the permission of public position $(b_{2},o_{2})$ in $m_{2}^{\prime}$ is copied from $m_{1}^{\prime}$ in step (3). Thus $(b_{1},o_{1})\in\text{perm}_{k}(m_{1}^{\prime},p)$ and $j_{13}^{\prime}(b_{1})=\lfloor(b_{3},o_{3}-o_{1})\rfloor$ . From property (1) of $m_{1}^{\prime}\hookrightarrow_{m}^{j_{13}^{\prime}}m_{3}^{\prime}$ we get $(b_{3},o_{3})\in\text{perm}_{k}(m_{3}^{\prime},p)$ . 2. (2)

The proof is similar to (1). Lemma A.5 ensures that the constructed value $v_{2}$ in $m_{2}^{\prime}$ can be related to the value in $m_{3}^{\prime}$ as $v_{2}\hookrightarrow_{v}^{j_{23}^{\prime}}m_{3}^{\prime}[b_{3},o_{3}]$ . Note that if $(b_{2},o_{2})$ is read-only in $m_{2}$ , the property (1) of $m_{2}\hookrightarrow_{m}^{j_{23}}m_{3}$ provides that mapped position $(b_{3},o_{3})$ is also read-only in $m_{3}$ . 3. (3)

Given $b_{2}\notin m_{2}^{\prime}$ , we have $b_{2}\notin m_{2}$ and $j_{23}(b_{2})=\emptyset$ . Also $b_{2}$ is not added into the domain of $j_{23}^{\prime}$ in step (1), so $j_{23}^{\prime}(b_{2})=\emptyset$ . 4. (4)

Given $j^{\prime}_{23}(b_{2})=\lfloor(b_{3},o_{3})\rfloor$ . Similarly $b_{3}$ is either an old block in $m_{3}$ ( $j_{23}(b_{2})=\lfloor(b_{3},o_{3})\rfloor$ ) or a new block in $m_{3}^{\prime}$ ( $j_{23}(b_{2})=\emptyset$ ). Therefore $b_{3}\in m_{3}^{\prime}$ . 5. (5)

Given $j_{23}^{\prime}(b_{2})=\lfloor(b_{3},o_{3}-o_{2})\rfloor\land(b_{2},o_{2})\in\text{perm}_{\text{max}}(m_{2}^{\prime},\text{NA})$ and $j_{23}^{\prime}(b_{2}^{\prime})=\lfloor(b_{3}^{\prime},o_{3}^{\prime}-o_{2}^{\prime})\rfloor\land(b_{2}^{\prime},o_{2}^{\prime})\in\text{perm}_{\text{max}}(m_{2}^{\prime},\text{NA})$ where $b_{2}\neq b_{2}^{\prime}$ . We need to prove that $(b_{3},o_{3})\neq(b_{3}^{\prime},o_{3}^{\prime})$ by cases of whether $b_{2}$ and $b_{2}^{\prime}$ are mapped by old injection $j_{23}$ . Note that $j_{23}\subseteq j_{23}^{\prime}$ , so $j_{23}(b)$ is either $\emptyset$ or the same as $j_{23}^{\prime}(b)$ .

•

$j_{23}(b_{2})=j_{23}(b_{2}^{\prime})=\emptyset$ . The $j_{23}^{\prime}$ mappings of them are added in step (1). It is obvious that newly added mappings in $j_{23}^{\prime}$ never map different blocks in $m_{2}^{\prime}$ into the same block in $m_{3}^{\prime}$ . Therefore $b_{3}\neq b_{3}^{\prime}$ .

•

$j_{23}(b_{2})=\lfloor(b_{3},o_{3}-o_{2})\rfloor,j_{23}(b_{2}^{\prime})=\emptyset$ . we can derive that $b_{3}\in m_{3}$ By property (4) of $m_{2}\hookrightarrow_{m}^{j_{23}}m_{3}$ . While $b_{3}^{\prime}\notin m_{3}$ can be derived from $\text{inject-sep}(j_{23},j_{23}^{\prime},m_{2},m_{3})$ . Therefore $b_{3}\neq b_{3}^{\prime}$ .

•

$j_{23}(b_{2})=\emptyset,j_{23}(b_{2}^{\prime})=\lfloor(b_{3}^{\prime},o_{3}^{\prime}-o_{3}^{\prime})\rfloor$ . Similarly we have $b_{3}\neq b_{3}^{\prime}$ .

•

$j_{23}(b_{2})=\lfloor(b_{3},o_{3}-o_{2})\rfloor,j_{23}(b_{2}^{\prime})=\lfloor(b_{3}^{\prime},o_{3}^{\prime}-o_{2}^{\prime})\rfloor$ . We can prove $(b_{3},o_{3})\neq(b_{3},o_{3}^{\prime})$ using the property (5) in $m_{2}\hookrightarrow_{m}^{j_{23}}m_{3}$ by showing $(b_{2},o_{2})\in\text{perm}_{\text{max}}(m_{2},\text{NA})$ and $(b_{2}^{\prime},o_{2}^{\prime})\in\text{perm}_{\text{max}}(m_{2},\text{NA})$ . This follows from $\text{max-perm-dec}(m_{2},m_{2}^{\prime})$ (Lemma A.10). 6. (6)

Given $j_{23}^{\prime}(b_{2})=\lfloor(b_{3},o_{3}-o_{2})\rfloor\land(b_{3},o_{3})\in\text{perm}_{k}(m_{3}^{\prime},p)$ . Similarly we prove $(b_{2},o_{2})\in\text{perm}_{k}(m_{2}^{\prime},p)$ or $(b_{2},o_{2})\not\in\text{perm}_{\text{max}}(m_{2}^{\prime},\text{NA})$ by cases of $j_{23}(b_{2})$ :

•

If $j_{23}(b_{2})=\emptyset$ , then $b_{2}$ and $b_{3}$ are new blocks by $\text{inject-sep}(j_{23},j_{23}^{\prime},m_{2},m_{3})$ . According to step (1), we know that $\exists b_{1}\;o_{1},j_{13}^{\prime}(b_{1})=\lfloor(b_{3},o_{3}-o_{1})\rfloor$ . At the same time, we also know that the permission of $(b_{2},o_{2})$ in new block of $m_{2}^{\prime}$ is copied from $(b_{1},o_{1})$ in $m_{1}^{\prime}$ . Now from property (6) of $m_{1}^{\prime}\hookrightarrow_{m}^{j_{13}^{\prime}}m_{3}^{\prime}$ we can derive that $(b_{1},o_{1})\in\text{perm}_{k}(m_{1}^{\prime},p)\lor(b_{1},o_{1})\notin\text{perm}_{\text{max}}(m_{1}^{\prime},\text{NA})$ , therefore $(b_{2},o_{2})\in\text{perm}_{k}(m_{2}^{\prime},p)\lor(b_{2},o_{2})\notin\text{perm}_{\text{max}}(m_{2}^{\prime},\text{NA})$ .

•

If $j_{23}(b_{2})=\lfloor(b_{3},o_{3}-o_{2})\rfloor$ , then $b_{2}$ and $b_{3}$ are old blocks. We further divide $b_{2}$ into two cases:

–

If $(b_{2},o_{2})\in\text{out-of-reach}(j_{12},m_{1})$ , we have $(b_{2},o_{2})\in\text{unchanged-on}(m_{2},m_{2}^{\prime})$ (Lemma A.9). If $(b_{2},o_{2})\in\text{perm}_{\text{max}}(m_{2}^{\prime},\text{NA})$ (otherwise the conclusion holds trivially), then $(b_{2},o_{2})\in\text{perm}_{\text{max}}(m_{2},\text{NA})$ holds ( $\text{max-perm-dec}\allowbreak(m_{2},m_{2}^{\prime})$ ). According to Lemma A.6, we get $(b_{3},o_{3})\in\text{unchanged-on}(m_{3},m_{3}^{\prime})$ and $(b_{3},o_{3})\in\text{perm}_{k}(m_{3},p)$ . Then we can derive that $(b_{2},o_{2})\in\text{perm}_{k}(m_{2},p)\lor(b_{2},o_{2})\not\in\text{perm}_{\text{max}}(m_{2},\text{NA})$ by property (6) of $m_{2}\hookrightarrow_{m}^{j_{23}}m_{3}$ . Finally we can prove that

[TABLE]

–

If $(b_{2},o_{2})\notin\text{out-of-reach}(j_{12},m_{1})$ , we know that $\exists b_{1},j_{13}^{\prime}(b_{1})=\lfloor b_{3},o_{3}-o_{1}\rfloor$ . From $m_{1}^{\prime}\hookrightarrow_{m}^{j_{13}^{\prime}}m_{3}^{\prime}$ we can derive that $(b_{1},o_{1})\in\text{perm}_{k}(m_{1}^{\prime},p)\lor(b_{1},o_{1})\notin\text{perm}_{\text{max}}(m_{1}^{\prime},p)$ . Meanwhile, the permission of $(b_{2},o_{2})\in\text{pub-tgt-mem}(j_{12},m_{1})\allowbreak\cap\text{pub-src-mem}(j_{23})$ is copied from $m_{1}^{\prime}$ in step (3). Therefore

[TABLE]

∎

Lemma A.15.

$(j_{12},m_{1},m_{2})\leadsto_{\text{injp}}(j_{12}^{\prime},m_{1}^{\prime},m_{2}^{\prime})$ **

Proof.

According to Definition A.3, most of the properties of $(j_{12},m_{1},m_{2})\leadsto_{\text{injp}}(j_{12}^{\prime},m_{1}^{\prime},m_{2}^{\prime})$ have been proved in Lemma A.8, Lemma A.9 and Lemma A.12. From $(j_{13},m_{1},m_{3})\leadsto_{\text{injp}}(j_{13}^{\prime},m_{1}^{\prime},m_{3}^{\prime})$ we can get $\text{mem-acc}(m_{1},m_{1}^{\prime})$ and $\text{unmapped}(j_{13})\subseteq\text{unchanged-on}(m_{1},m_{1}^{\prime})$ . To get the last leaving property $\text{unmapped}(j_{12})\subseteq\text{unchanged-on}(m_{1},m_{1}^{\prime})$ we only need to show

[TABLE]

where $j_{13}=j_{23}\cdot j_{12}$ . This relations holds simply because of $\forall b,j_{12}(b)=\emptyset\Rightarrow j_{23}\cdot j_{12}(b)=\emptyset$ . In other word, more regions in $m_{1}$ is protected in $(j_{13},m_{1},m_{3})\leadsto_{\text{injp}}(j_{13}^{\prime},m_{1}^{\prime},m_{3}^{\prime})$ than in $(j_{12},m_{1},m_{2})\leadsto_{\text{injp}}(j_{12}^{\prime},m_{1}^{\prime},m_{2}^{\prime})$ . ∎

Lemma A.16.

$(j_{23},m_{2},m_{3})\leadsto_{\text{injp}}(j_{23}^{\prime},m_{2}^{\prime},m_{3}^{\prime})$ **

Proof.

Similarly, we only need to show

[TABLE]

Given $(b_{3},o_{3})\in\text{out-of-reach}(j_{23},m_{2})$ , i.e.

[TABLE]

We need to prove $(b_{3},o_{3})\in\text{out-of-reach}(j_{23}\cdot j_{12},m_{1})$ . as follows. If $j_{23}\cdot j_{12}(b_{1})=\lfloor(b_{3},o_{3})\rfloor$ ,i.e. $\exists b_{2},j_{12}(b_{1})=\lfloor(b_{2},o_{2})\rfloor\land j_{23}(b_{2})=\lfloor(b_{3},o_{3})\rfloor$ , we can derive that $(b_{2},o_{2})\notin\text{perm}_{\text{max}}(m_{2},\text{NA})$ . By property (1) of $m_{1}\hookrightarrow_{m}^{j_{12}}m_{2}$ , we can get $(b_{1},o_{1})\notin\text{perm}_{\text{max}}(m_{1},\text{NA})$ . Therefore $(b_{3},o_{3})\in\text{out-of-reach}(j_{23}\cdot j_{12},m_{1})$ . ∎

Since we have proved all 4 required properties ( Lemma A.13 to Lemma A.16) of the constructed memory state $m_{2}^{\prime}$ , Lemma 4.3 is proved.

A.4. Proof of Lemma 4.4

We prove Lemma 4.4 in this section:

[TABLE]

Proof.

Given $m_{1}\hookrightarrow_{m}^{j_{13}}m_{3}$ , take $j_{12}=\{(b,(b,0))|j_{13}(b)\neq\emptyset\}$ , $j_{23}=j_{13}$ and $m_{2}=m_{1}$ . As a result, $m_{2}\hookrightarrow_{m}^{j_{23}}m_{3}$ holds trivially. We show $m_{1}\hookrightarrow_{m}^{j_{12}}m_{1}$ as follows:

(1)

Given $j_{12}(b_{1})=\lfloor(b_{2},o_{2}-o_{1})\rfloor\land(b_{1},o_{1})\in\text{perm}_{k}(m_{1},p)$ , according to the definition of $j_{12}$ we know that $b_{2}=b_{1}$ and $o_{2}=o_{1}$ . Therefore $(b_{2},o_{2})\in\text{perm}_{k}(m_{1},p)$ . 2. (2)

Given $j_{12}(b_{1})=\lfloor(b_{2},o_{2}-o_{1})\rfloor\land(b_{1},o_{1})\in\text{perm}_{\text{cur}}(m_{1},p)$ , similar to (1) we know $b_{2}=b_{1}$ and $o_{2}=o_{1}$ . Therefore $m_{1}[b_{1},o_{1}]=m_{1}[b_{2},o_{2}]$ . If $m_{1}[b_{1},o_{1}]$ is not in the form of $\text{Vptr}(b_{1}^{\prime},o_{1}^{\prime})$ , $m_{1}[b_{1},o_{1}]\hookrightarrow_{v}^{j_{12}}m_{1}[b_{2},o_{2}]$ holds trivially.

If $m_{1}[b_{1},o_{1}]=\text{Vptr}(b_{1}^{\prime},o_{1}^{\prime})$ , from $j_{12}(b_{1})=\lfloor(b_{1},0)\rfloor$ we get $j_{13}(b_{1})\neq\emptyset$ . According to property (2) of $m_{1}\hookrightarrow_{m}^{j_{13}}m_{3}$ , $\exists v_{3},\text{Vptr}(b_{1}^{\prime},o_{1}^{\prime})\hookrightarrow_{m}^{j_{13}}v_{3}$ . Which means that $j_{12}(b_{1}^{\prime})=\lfloor(b_{1}^{\prime},0)\rfloor$ , therefore $\text{Vptr}(b_{1}^{\prime},o_{1}^{\prime})\hookrightarrow_{v}^{j_{12}}\text{Vptr}(b_{1}^{\prime},o_{1}^{\prime})$ . 3. (3)

Given $b_{1}\notin m_{1}$ , we can derive that $j_{13}(b_{1})=\emptyset$ by $m_{1}\hookrightarrow_{m}^{j_{13}}m_{3}$ . Therefore $j_{12}(b_{1})=\emptyset$ holds by definition. 4. (4)

Given $j_{12}(b_{1})=\lfloor(b_{2},\delta)\rfloor$ , we know that $j_{13}(b_{1})\neq\emptyset$ . Therefore $b_{1}\in m_{1}$ by $m_{1}\hookrightarrow_{m}^{j_{13}}m_{3}.$ Since $b_{1}=b_{2}$ , $b_{2}\in m_{1}$ . 5. (5)

Given $b_{1}\neq b_{1}^{\prime}$ , $j_{12}(b_{1})=\lfloor b_{2},o_{2}-o_{1}\rfloor$ and $j_{12}(b_{1}^{\prime})=\lfloor b_{2}^{\prime},o_{2}^{\prime}-o_{1}^{\prime}\rfloor$ . It is straightforward that $b_{2}=b_{1}$ , $b_{2}^{\prime}=b_{1}^{\prime}$ therefore $b_{2}\neq b_{2}^{\prime}$ . 6. (6)

Given $j_{12}(b_{1})=\lfloor b_{2},o_{2}-o_{1}\rfloor$ and $(b_{2},o_{2})\in\text{perm}_{k}(m_{1},p)$ . Similarly we have $b_{2}=b_{1}$ , $o_{2}=o_{1}$ and $(b_{1},o_{1})\in\text{perm}_{k}(m_{1},p)$ .

After external calls, given preconditions $(j_{12},m_{1},m_{2})\leadsto_{\text{injp}}(j_{12}^{\prime},m_{1}^{\prime},m_{2}^{\prime})$ , $(j_{23},m_{2},m_{3})\leadsto_{\text{injp}}(j_{23}^{\prime},m_{2}^{\prime},m_{3}^{\prime})$ , $m_{1}^{\prime}\hookrightarrow_{m}^{j_{12}^{\prime}}m_{2}^{\prime}$ and $m_{2}^{\prime}\hookrightarrow_{m}^{j_{23}^{\prime}}m_{3}^{\prime}$ . We can get $m_{1}^{\prime}\hookrightarrow_{m}^{j_{13}^{\prime}}m_{3}^{\prime}$ directly by Lemma A.4. For $(j_{13},m_{1},m_{3})\leadsto_{\text{injp}}(j_{13}^{\prime},m_{1}^{\prime},m_{3}^{\prime})$ ,

(1)

We can easily show $j_{13}=j_{23}\cdot j_{12}$ by the definition of $j_{12}$ . Since $j_{12}\subseteq j_{12^{\prime}}$ , $j_{23}\subseteq j_{23}^{\prime}$ , we can conclude that $j_{23}\cdot j_{12}\subseteq j_{23}^{\prime}\cdot j_{12}^{\prime}$ , i.e. $j_{13}\subseteq j_{13}^{\prime}$ . 2. (2)

[TABLE]

By definition of $j_{12}$ , we have $\text{unmapped}(j_{12})=\text{unmapped}(j_{13})$ . Therefore the result comes directly from $(j_{12},m_{1},m_{2})\leadsto_{\text{injp}}(j_{12}^{\prime},m_{1}^{\prime},m_{2}^{\prime})$ . 3. (3)

[TABLE]

Since $j_{23}=j_{13}$ and $m_{2}=m_{1}$ , the result comes directly from $(j_{23},m_{2},m_{3})\leadsto_{\text{injp}}(j_{23}^{\prime},m_{2}^{\prime},m_{3}^{\prime})$ . 4. (4)

$\text{mem-acc}(m_{1},m_{1}^{\prime})$ comes from $(j_{12},m_{1},m_{2})\leadsto_{\text{injp}}(j_{12}^{\prime},m_{1}^{\prime},m_{2}^{\prime})$ . 5. (5)

$\text{mem-acc}(m_{3},m_{3}^{\prime})$ comes from $(j_{23},m_{2},m_{3})\leadsto_{\text{injp}}(j_{23}^{\prime},m_{2}^{\prime},m_{3}^{\prime})$ . 6. (6)

[TABLE]

If $j_{13}(b_{1})=\emptyset$ and $j_{13}^{\prime}(b_{1})=\lfloor(b_{3},o_{3}-o_{1})\rfloor$ , we get

[TABLE]

by $\text{inject-sep}(j_{12},j_{12}^{\prime},m_{1},m_{2})$ we get $b_{1}\notin m_{1}$ and $b_{2}\notin m_{2}$ . By property (3) of $m_{2}\hookrightarrow_{m}^{j_{23}}m_{3}$ we can derive that $j_{23}(b_{2})=\emptyset$ . Finally we get $b_{3}\notin m_{3}$ by $\text{inject-sep}(j_{23},j_{23}^{\prime},m_{2},m_{3})$ .

∎

Appendix B Verification of the Encryption Server and Client Example

B.1. Refinement of the Hand-written Server

The following is the proof for Theorem 6.2.

Proof.

At the top level, $\mathbb{C}$ is expanded to $\text{ro}\cdot\text{wt}\cdot\text{CAinjp}\cdot\text{asm}_{\text{injp}}$ . As the invariant ro and wt in $\mathcal{C}$ level are commutative, i.e., $\text{ro}\cdot\text{wt}\equiv\text{wt}\cdot\text{ro}$ as stated in Lemma 5.7, we can change their order in $\mathbb{C}$ . By the vertical compositionality, we first prove $L_{\texttt{S}}\leqslant_{\text{wt}}L_{\texttt{S}}$ and $[\![\text{server\_opt.s}]\!]\leqslant_{\text{asm}_{\text{injp}}}[\![\text{server\_opt.s}]\!]$ , which are both self simulation and straightforward (the latter one is provided by the adequacy theorem). Since the relation between source and target programs involves an optimization of constant propagation of the variable key, we need to use ro together with CAinjp to establish the simulation ${L_{\texttt{S}}}\leqslant_{\text{ro}\cdot\text{CAinjp}}[\![\text{server\_opt.s}]\!]$ . Note that for the unoptimized version we can prove ${L_{\texttt{S}}}\leqslant_{\text{CAinjp}}[\![\text{server.s}]\!]$ and prove ro using self simulation like wt.

The key of this proof is to establish a relation $R\in\mathcal{K}_{W_{\text{ro}\cdot\text{CAinjp}}}(S_{S},\text{regset}\times\text{mem})$ satisfying the simulation diagram Fig. 20. Given $w\in W_{\text{ro}}\times W_{\text{CAinjp}}=((se,m_{0}),((j,m,\mathit{tm}),\mathit{sg},\mathit{rs}))$ , if $\mathit{sg}\neq\text{int}\to\text{ptr}\to\text{void}\lor m_{0}\neq m$ then $R(w)=\emptyset$ . Assume $\mathit{sg}=\text{int}\to\text{ptr}\to\text{void}\land m_{0}=m$ (these conditions are provided by $I$ of $L_{\texttt{S}}$ and related incoming queries), then $R(w)$ is defined as follows:

[TABLE]

By definition the relation between internal states of ${L_{\texttt{S}}}$ and assembly states evolve in four stages:

(a)

Right after the initial call to encrypt, (a.1) indicates that the argument $i$ and function pointer are stored in RDI and RSI, the program counter is at $(b_{e},0)$ (pointing to the first assembly instruction of server_opt.s in Fig. 3(c)). The $\text{ro-valid}(se,m)$ comes from $\mathbb{R}_{\text{ro}}^{q}$ and ensures that value of key in $m_{1}$ is $42$ . 2. (b)

Right before the external call, (b.1) indicates that the argument is stored in RDI, which is a pointer $\text{Vptr}(b_{s},8)$ . Here $b_{s}$ is the stack block of the target assembly and $sp$ is injected to $\text{Vptr}(b_{s},8)$ as depicted in Fig. 7. (b.2) indicates that the return address is set to the 5th assembly instruction in Fig. 3(c) (right after Pcall RSI) and the function pointer of p $\text{Vptr}(b_{p},0)$ is related to PC by $j$ . (b.3) indicates that callee-save registers are not modified since the initial call. (b.4) maintains the injp accessibility and ro-valid for external call. (b.5), (b.6) and (b.7) indicate that the stored values (return address and previous stack block) on the stack are frame unchanged, protected and freeable. 3. (c)

Right after the external call, we keep the necessary conditions from (b), except that the program counter PC now points to the value in RA before the external call. Note that the injection function is updated to $j^{\prime\prime}$ by the external call. 4. (d)

Right before returning from encrypt, (d.1) indicates that the stack pointer is restored and the return address is set. (d.2) indicates all callee-save registers are restored and (d.3) indicates that guarantee condition injp is met.

To prove $R$ is indeed an invariant to establish the simulation, we follow the diagram in Fig. 20. The most important points of the above proof is that ro and injp play essential roles in establishing the invariant (relevant conditions are displayed in red and blue in the invariant, respectively). Initially, the target semantics enters the function from $\mathit{rs}(\text{PC})$ which is related to the function pointer in $q_{\mathcal{C}}^{I}$ as mentioned in $\mathbb{R}_{\text{injp}}^{q}$ . The condition (a) follows from $\mathbb{R}_{\text{injp}}^{q}$ and $\mathbb{R}_{\text{ro}}^{q}$ and hence holds at the initial states. Right before the execution calls, (b) holds by execution of the instructions from Pallocframe to Pcall. Note that $\text{ro-valid}(se,m)$ obtained from ro in (b.5) is essential for proving that the value of $key$ read from $m$ is $42$ , thus matches the constant in server_opt.s. Then, we need to show (c) holds after the source and target execution perform the external call and returns. This is the most interesting part where the memory protection provided by injp is essential. It is achieved by combining properties (b.1–7) with the rely-condition provided by CAinjp of the external call. For example, because we know the protected regions of the stack frame $b_{s}$ is out-of-reach before the call, by the protection enforced by injp in $\mathbb{R}_{\text{CAinjp}}^{r}$ , all values in $\mathit{tm}_{1}[b_{s},o]\ s.t.\ 0\leq o<8\vee 16\leq o<24$ are unchanged, therefore if $\mathit{tm}_{1}[b_{s},0]=\mathit{rs}(\text{RSP})$ and $\mathit{tm}_{1}[b_{s},16]=\mathit{rs}(\text{RA})$ (condition (b.5)) holds before the call, they also hold after it (condition (c.5) holds). Besides using injp, we can derive (c.3) from (b.3) by the protection over callee-save registers enforced in $\mathbb{R}_{\text{CAinjp}}^{r}$ . $\mathit{rs}_{2}(\text{PC})=(b_{g},5)$ in (c.2) is derived from $\mathit{rs}_{1}(\text{RA})=(b_{g},5)$ in (b.2) via the relation between PC and RA stated in $\mathbb{R}_{\text{CAinjp}}^{r}$ . After the external call, condition (d) can be derived from (c) by following internal execution. Since $L_{\texttt{S}}$ frees $sp$ , $[\![\text{server\_opt.s}]\!]$ can free the corresponding region $(b_{s},8)$ to $(b_{s},16)$ . The remaining part are also freeable by condition (c.7). Finally, the target semantics returns after executing Pret and condition (d) provides the updated injp world $(j^{\prime\prime},m_{3},\mathit{tm}_{3})$ with the accessibility from the initial world $w$ and other properties needed by $\mathbb{R}_{\text{CAinjp}}^{r}$ . The injp accessibility also implies $\text{mem-acc}(m,m_{3})$ for $\mathbb{R}_{\text{ro}}^{r}$ . Therefore, we are able to establish the guarantee condition and prove that ${L_{\texttt{S}}}\leqslant_{\mathbb{C}}[\![\text{server\_opt.s}]\!]$ .

∎

B.2. End-to-end Correctness Theorem

The top-level specification ${L_{\texttt{CS}}}$ is defined as follows:

Definition B.1.

LTS of ${L_{\texttt{CS}}}$ :

[TABLE]

There are four internal states in ${L_{\texttt{CS}}}$ as depicted in its transition diagram Fig. 22, among which three states correspond to the three functions (request, process and encrypt). We use $\mathit{flag}$ in Calle (Callp) to indicate whether it is called internally by request (encrypt) or called by the environment. The $\mathit{retv}$ is the return value which is either an integer when ${L_{\texttt{CS}}}$ is invoked by request or empty by other functions. $I_{T}$ contains three possible initial states corresponding to calling the three entry functions. $\to_{T}$ describes the big steps starting from each call states to another call state (e.g., Callr to Calle) or the return state. If the stack block $\mathit{sp}$ is allocated by encrypt, it should be freed in the Return state. There are two final states in $F_{T}$ : $r_{e}$ returning from request with an integer as the return value and $r_{\mathit{rp}}$ returning from request or process with no return value. Note that ${L_{\texttt{CS}}}$ can only be called but cannot perform external calls.

The remaining proof follows §6. We have shown how end-to-end refinement is derived using the optimized server. For unoptimized server, the proof is almost the same. The only difference is that the symbol table accompanying ${L_{\texttt{S}}}$ does not mark key as read-only, and the simulation invariant for Theorem 6.2 does not contain ro-valid conditions as they play no role without optimizations.

B.3. Verification of Mutually Recursive Client and Server

We introduce an variant of the running example with mutual recursions in Fig. 23. The server remains the same while client.c is modified so that its function request is also the callback function and a sequence of encrypted values are stored in a global array.

To perform the same end-to-end verification for this example, we only need to define a new top-level specification ${L_{\texttt{CS}}}^{\prime}$ to capture the semantics of the multi-step encryption and prove ${L_{\texttt{CS}}}^{\prime}\leqslant_{\text{ro}\boldsymbol{\cdot}\text{wt}\boldsymbol{\cdot}\text{c}_{\text{injp}}}[\![\text{client}]\!]\oplus{L_{\texttt{S}}}$ . Other proofs are either unchanged (e.g., the refinement of server) or can be derived from the verified compiler and the horizontal compositionality. In the following paragraphs, we briefly talk about this new top-level specification and the updated proofs.

Definition B.2.

The LTS of ${L_{\texttt{CS}}}^{\prime}$ :

[TABLE]

As we remove the function process, the new ${L_{\texttt{CS}}}$ has only two call states and one return state as depicted in Fig. 24. sps in Callr and Calle is a list of blocks, each of which stores an encrypted result. We record these blocks in the program states because we need to de-allocate them before returning. As described in $\to_{T}$ , there are three internal transitions for Callr, corresponding to three conditional branches in the source code. The transitions from Callr to Calle perform different memory operations depending on the value of i according to the code presented in Fig. 23. The transition from Callr to Return will de-allocate all the stack blocks in sps. The transition from Calle to Callr allocate a new block sp to store the encrypted result and add it to sps.

Given this new top-level specification, we need to prove Theorem 6.2 where the ${L_{\texttt{CS}}}$ is replaced by ${L_{\texttt{CS}}}^{\prime}$ . The key of this proof is that the simulation invariant must relate the call stack in the target LTS (i.e., the semantics linking of client.c and ${L_{\texttt{S}}}$ ) and sps, because each element in sps is allocated by a call toencrypt and stores the result of encryption. The complete proofs can be found in our Coq development.

Appendix C A Mutual Recursive Example for Summation

In this section, we present the application of our method to an example borrowed from CompCertM — two programs that mutually invoke each other to finish a summation task.

It consists of a Clight module $M_{C}$ and a hand-written assembly module $M_{A}$ . The code of $M_{A}$ and $M_{C}$ is shown in Fig. 25. Note that we have also shown a version of $M_{A}$ at the C level for reference and given its function the name g; this program do not actually exist in our example. We note that f and g collaborate to implement the summation from [math] to $i$ given an integer $i$ . We shall use $\text{int}\to\text{int}$ to denote their signature. f perform caching of results for any $i$ in a global array while g only caches for the most recent $i$ . When they need to compute a fresh result, they mutually recursively call each other with a smaller argument. The assembly program uses pseudo X86 assembly instructions defined in CompCert where every instruction begins with a letter P. The only real pseudo instructions are Pallocframe and Pfreeframe. Pallocframe 24 16 0 allocates a stack block $b_{s}$ of 24 bytes (3 integers on 64-bit x86), saves RSP and RA to $(b_{s},0)$ and $(b_{s},16)$ and set RSP to $\text{Vptr}(b_{s},0)$ . Pfreeframe 24 16 0 recovers RSP and RA from $(b_{s},0)$ and $(b_{s},16)$ and frees the stack block $b_{s}$ . By the calling convention and the signature of g, RDI is used to pass the only argument $i$ . RBX is a callee-saved register that stores $i$ during internal execution. It is saved to $(b_{s},8)$ at the beginning of g and restored at the end. Therefore, the sole purpose of $b_{s}$ is to save and restore RSP, RA and RBX.

The outline of the verification is presented in Fig. 26. It is similar to Fig. 4 except for the additional $L_{C}$ which will be discussed soon. Firstly, as what we do in the client-server example, we write down the specification for the assembly module $M_{A}$ which is called $L_{A}$ defined in Definition C.1 and prove the simulation between $L_{A}$ and $[\![M_{A}]\!]$ which is declared in Theorem C.2. Secondly, we define a top-level specification $L_{CA}$ to abstract the semantics of the composition of $M_{C}$ and $M_{A}$ , which is shown and Definition C.5. Intuitively, $L_{CA}$ says that the output is the summation from zero to the input. In this example, we additionally define a C-level specification for $M_{C}$ called $L_{C}$ (defined in Definition C.3) and prove $L_{C}\leqslant_{\text{ro}\cdot\text{wt}\cdot\text{c}_{\text{injp}}}[\![M_{C}]\!]$ (in Theorem C.4). We can compose this proof with the compiler correctness by utilizing Theorem 6.5 to prove $L_{C}\leqslant_{\mathbb{C}}[\![\text{CompCert}(M_{C})]\!]$ . With $L_{C}$ , it is simpler to prove the source refinement declared in Theorem C.6. Finally, we combine these proofs to obtain the single refinement between top-level specification and the target linked program as declared in Theorem C.7.

Definition C.1.

The open LTS of $L_{A}$ is defined as follows:

[TABLE]

By this definition, there are four kinds of internal states: Callg is at right after the initial call to g; Callf is right before the external call to f; Returnf is right after returning from f; and Returng is right before returning from g. The definitions of transition relations directly match the C-level version of g in Fig. 25, albeit in a big-step style. Note that when transiting internally from $\text{Callg}\;i\;m$ to $\text{Callf}\;v_{f}\;i\;m$ , find-func-pointer is used to query the global symbol table for the function pointer to f. Also note that in $L_{A}$ the memory state $m$ is not changed from Callg to Callf, while in the assembly code $M_{A}$ a new stack frame is allocated by Pallocframe. This indicates the stack frame is out-of-reach at the external call to f and should be protected during its execution. This point is also manifested in the proof below.

Theorem C.2.

$L_{A}\leqslant_{\mathbb{C}}[\![M_{A}]\!]$ **

Proof.

The key is to identify a relation $R\in\mathcal{K}_{W_{\text{CAinjp}}}(S_{A},\text{regset}\times\text{mem})$ satisfying all the properties in Definition 3.1. Given $w\in W_{\text{CAinjp}}=((j,m_{1},m_{2}),\mathit{sg},\mathit{rs})$ , if $\mathit{sg}\neq\text{int}\to\text{int}$ then $R(w)=\emptyset$ . Assume $\mathit{sg}=\text{int}\to\text{int}$ , then $R(w)$ is defined as follows:

[TABLE]

By definition the relation between internal states of $L_{A}$ and assembly states evolve in four stages:

(a)

Right after the initial call to g, (a.1) indicates that the argument $i$ is stored in RDI and the program counter is at $(b_{g},0)$ (pointing to the first assembly instruction in Fig. 25); 2. (b)

Right before the external call to f, (b.1) indicates $i$ is stored in RBX, the return address is set to the 13th assembly instruction in Fig. 25 (right after Pcall f) and $v_{f}$ matches with the program counter. (b.2) indicates callee saved registers—except for RBX–are not modified since the initial call. (b.3) indicates the entire stack frame $b_{s}$ is out-of-reach. (b.4) maintains properties in injp. (b.5) indicates values on the stack frame is not modified since the initial call. 3. (c)

Right after the external call to f, we have almost the same conditions as above, except that the program counter points to the return address set at the call to f. 4. (d)

Right before returning from g, (d.1) indicates the return value is in RAX, the stack pointer is restored and the return address is set. (d.2) indicates all callee-saved registers are restored and (d.3) indicates the guarantee condition injp is met.

To prove $R$ is indeed an invariant, we first show that condition (a) holds at the initial state. We then show by internal execution we can prove (b) holds right before the call to f. Now, the source and target execution proceed by calling and returning from f, after which we need to shown (c) holds. This is the most interesting part: it is achieved by combining properties (b.1–5) with the rely-condition provided by CAinjp for calling f. For example, because we know the stack frame $b_{s}$ is out-of-reach at the call to f, by the accessibility enforced by injp in $\mathbb{R}_{\text{CAinjp}}^{r}$ in Definition 5.9, all values in $m_{2}^{\prime}[b_{s},o]$ are unchanged, therefore if $m_{2}^{\prime}[b_{s},0]=\mathit{rs}(\text{RSP})\land m_{2}^{\prime}[b_{s},8]=\mathit{rs}(\text{RBX})\land m_{2}^{\prime}[b_{s},16]=\mathit{rs}(\text{RA})$ (condition (b.5)) holds before calling f, they also hold after (hence condition (c.5) holds). Similarly, we can derive (c.2) from (b.2) by the protection over callee-saved registers enforced in $\mathbb{R}_{\text{CAinjp}}^{r}$ . Moreover, $({\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}\text{PC}})=(b_{g},13)$ in (c.1) is derived from $({\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}\text{RA}})=(b_{g},13)$ in (b.1) via the relation between PC and RA stated in $\mathbb{R}_{\text{CAinjp}}^{r}$ . After the external call, we show condition (d) can be derived from (c) by following internal execution. We note that condition (d) provides exactly the guarantee-condition needed by $\mathbb{R}_{\text{CAinjp}}^{r}$ for the incoming call to g. Therefore, we successfully show $L_{A}\leqslant_{\mathbb{C}}[\![M_{A}]\!]$ indeed holds.

∎

Definition C.3.

The C-level specification $L_{C}$ is defined as follows:

[TABLE]

The four internal states capture the execution of function f in $M_{C}$ . From the initial state Callf, depending on the value of $i$ and the cached value in $sum[b_{m},4*i]$ where $b_{m}$ is the memory block of memoized, Callf may return 0 if i is equal to 0, may return the cached value if it is not zero, and may enter Callg state to invoke function g. At Callg state, it emits the query to the environment and when it receives the reply which contains the summation of $i$ , it would enter Returng state. Finally, Returng state would calculate the sum of $i$ , cache it and enter the final state Returnf.

Theorem C.4.

$L_{C}\leqslant_{\text{ro}\cdot\text{wt}\cdot\text{c}_{\text{injp}}}[\![M_{C}]\!]$ **

By decomposing $\text{ro}\cdot\text{wt}\cdot\text{c}_{\text{injp}}$ , the key is to prove that $L_{C}\leqslant_{\text{c}_{\text{injp}}}[\![M_{C}]\!]$ . Because we do not have to concern about the interleaving execution of $L_{C}$ and its environment, the proof is straightforward.

Definition C.5.

The top-level specification $L_{CA}$ is defined below:

[TABLE]

The specification contains two call states representing invocation of function f and g, and one return state. The internal transitions are big step of the execution, which omit the details of mutual recursion. The return value contained in the return state is determined by $sum(i,m)$ , meaning that the return value depends on the contents of the memory, i.e., the contents of the initial contents of memorized in $M_{C}$ and s in $M_{A}$ . The memory of the return state is updated by $\mathit{cache}$ , which cached the values generated during the execution to memorized and s.

Theorem C.6.

$L_{CA}\leqslant_{\text{ro}\cdot\text{wt}\cdot\text{c}_{\text{injp}}}L_{C}\oplus L_{A}$ **

The key of this proof is to relate the memory operations of $\mathit{cache}$ and the operations of $L_{C}\oplus L_{A}$ . We achieve this by the induction of the input value $i$ . Detailed proofs can be found in our supplementary code.

Theorem C.7.

$L_{CA}\leqslant_{\mathbb{C}}[\![\text{CompCert}(M_{C})+M_{A}]\!]$ **

Proof.

Firstly, applying vertical compositionality, we decompose the proof to $L_{CA}\leqslant_{\text{ro}\cdot\text{wt}\cdot\text{c}_{\text{injp}}}L_{C}\oplus L_{A}$ and $L_{C}\oplus L_{A}\leqslant_{\mathbb{C}}[\![\text{CompCert}(M_{C})+M_{A}]\!]$ by expanding $\mathbb{C}$ to $\text{ro}\cdot\text{wt}\cdot\text{c}_{\text{injp}}\cdot\mathbb{C}$ with Lemma 6.5. The former one is proved in Theorem C.6. For the latter one, we first apply the horizontal compositionality to verify it modularly. The verification of $L_{A}\leqslant_{\mathbb{C}}[\![M_{A}]\!]$ is proved in Theorem C.2. The verification of $L_{C}\leqslant_{\mathbb{C}}\text{CompCert}(M_{C})$ is proved by applying Lemma 6.5, Theorem C.4 and the compiler correctness theorem. ∎

Appendix D Comparison using the Running Example

In this section, we carefully compare the difference between our direct refinement with other refinements in existing approaches to VCC using the running example. The comparison is mainly based on the (estimated) effort required to prove the running example. We also compare their results in the form of final theorems of semantics preservation.

Since CompComp does not support open semantics using $\mathcal{A}$ interface and the adequacy theorem for assembly code, we only introduce details about approaches using sum of refinements (CompCertM) and product of refinements (CompCertO) for further comparison here.

D.1. CompCertM

CompCertM uses Refinement Under Self-related Contexts (RUSC) to achieve vertical and horizontal composition of refinements. We roughly describe the framework of RUSC for presenting their verification and comparing it with ours (Appendix C).

Their open simulations are parameterized by different memory relations which mirror our use of KMRs. They do not need to lift memory relations to different simulation conventions because the semantics of all languages from $\mathcal{C}$ to $\mathcal{A}$ can perform C-style calls and returns. The RUSC refinement is parameterized over a fixed set of memory relations $\mathcal{R}=\{R_{1},R_{2},...,R_{n}\}$ . $p\succcurlyeq_{\mathcal{R}}p^{\prime}$ is defined as for any context program $c$ , it $c$ is self-related by all memory relations $R\in\mathcal{R}$ , then $\text{Beh}(c\oplus p)\supseteq\text{Beh}(c\oplus p^{\prime})$ . Note that the behavior Beh() is only defined for closed program. Thanks to this definition, RUSC refinements under a fixed $\mathcal{R}$ can be easily composed vertically and horizontally if the end modules are self-related by all memory relations in $\mathcal{R}$ . However, the disadvantage of RUSC-based approach is exactly the fixed $\mathcal{R}$ and the requirement that end programs are self-related by $\mathcal{R}$ . Next, we demonstrate the impact of this restriction on VCC through the mutual summation example.

In CompCertM, they use the example of mutual summation to illustrate the source-level verification using RUSC. The structure of the proof is the same as depicted in Fig. 26. However, they actually use a different set of memory relations ( $\mathcal{R}_{e}$ ) for source level verification with the set for compiler correctness ( $\mathcal{R}_{c}$ ). At top level, they prove that

[TABLE]

Note that the verification here is slightly different with ours. The memory relations in $\mathcal{R}_{e}$ here include specific invariant which ensures that the variables memorized1 and memorized2 have desired values in the incoming memories. However, the values of these static variables cannot be determined at the invocation of a function. In other words, they describe and verify the behavior of this program in an ideal (instead of arbitrary) memory environment, i.e. the program always reads the correct summation from input $i$ . On the other hand, we prove the preservation of open semantics for all possible memories. The return value of the program is determined by both the input $i$ and the incoming memory as denoted by $\mathit{sum}(i,m)$ in Definition C.5. Given these differences, CompCertM use 5764 LoC to define the top-level memory relations, specification and complete the proof. We use 3124 LoC to complete our proof and link it with the result of VCC to achieve end-to-end direct refinement.

For further comparison, we now discuss how to compose the source verification with compiler correctness and adequacy of assembly linking within the framework of CompCertM. If $M_{C}$ is compiled to $\text{CompCert}(M_{C})$ , then the correctness of CompCertM states that $[\![M_{C}]\!]\succcurlyeq_{\mathcal{R}_{c}}[\![\text{CompCert}(M_{C})]\!]$ The adequacy property for linking assembly modules is stated as $\text{Beh}([\![\text{CompCert}(M_{C})]\!]\oplus[\![M_{A}]\!])\supseteq\text{Beh}([\![\text{CompCert}(M_{C})]\!]+[\![M_{A}]\!])$ There are two approaches to the end-to-end semantics preservation. Note that these two approaches are speculative because CompCertM did not present such composition.

Firstly, we can compose the three parts vertically in the form of behavior refinement. However, this approach only makes sense when the composed modules have closed semantics.

[TABLE]

Since the behavior refinement is transitively composable, we need to show $\text{Beh}([\![M_{C}\oplus M_{A}]\!])\supseteq\text{Beh}([\![\text{CompCert}(M_{C})]\!]\oplus[\![M_{A}]\!])$ . According to the definition of $\succcurlyeq_{\mathcal{R}_{c}}$ , it suffices to prove that $[\![M_{A}]\!]$ is self-related by $\mathcal{R}_{c}$ which consists of six different memory relations, In CompCertM, all Clight and assembly modules can be self-related by all the relations in $\mathcal{R}_{c}$ . Thus the result above can be easily obtained. However, we find that it is confusing to claim that any hand-written assembly program, even those do not obey the calling convention of CompCert, can satisfy $\mathcal{R}_{c}$ and safely be linked with programs compiled by CompCertM. The ability of CompCertM to prove such a property may come from its open semantics of assembly programs which can perform C-style calls and returns. The interaction between open modules through external calls in assembly-style is also limited by their “repaired semantics”. In this regard, our direct refinement can better describe the desired properties of the assembly modules to be safely linked with compiled modules.

Secondly, for open semantics preservation in the form of RUSC refinement, one need to union the memory relations as $\mathcal{R}_{e}\cup\mathcal{R}_{c}$ . Since the adequacy theorem is provided only in the form of behavior refinement, the conclusion is

[TABLE]

To achieve this refinement, one need to show that the end modules are self-related by each $R\in\mathcal{R}_{e}\cup\mathcal{R}_{c}$ . Excluding for the parts that have already been proved, we still need to show that 1) $L_{CA}$ is self-related by $\mathcal{R}_{c}$ and 2) $[\![\text{CompCert}(M_{C})]\!]$ and $[\![M_{A}]\!]$ are self-related by $\mathcal{R}_{e}$ . These conditions are not demonstrated in CompCertM and we do not know whether they hold or not.

In other words, one need to show that 1) the specification of source program satisfies all memory relations used in the compilation and 2) the assembly modules satisfy the memory relations used for the source level verification. These limitations can increase the difficulty and reduce the extensionality of the proofs. Our direct refinement approach overcomes these obstacles.

D.2. CompCertO

In CompCertO, the simulation convention of the overall simulation for the compiler is stated as $\mathbb{C}_{\text{CCO}}=\mathcal{R}^{*}\cdot\text{wt}\cdot\text{CL}\cdot\text{LM}\cdot\text{MA}\cdot\text{asm}_{\text{vainj}}$ where $\mathcal{R}=\text{c}_{\text{injp}}+\text{c}_{\text{inj}}+\text{c}_{\text{ext}}+\text{c}_{\text{vainj}}+\text{c}_{\text{vaext}}$ is a set of simulation conventions parameterized over used KMRs which is similar to CompCertM. $\mathcal{R}^{*}$ basically means that $\mathcal{R}$ can be used for zero or arbitrary times. Which means that the source verification can also be absorbed into $\mathbb{C}_{\text{CCO}}$ as what we do in §6.

As discussed in §1.2, the main difficulty in proving the running example using CompCertO is that the simulation convention is dependent on the details of compilation. We take server.s as an example for hand-written assembly and try to link it with $L_{\texttt{S}}$ using $\mathbb{C}_{\text{CCO}}$ .

Firstly, for $\mathcal{R}^{*}$ we need to prove that $L_{\texttt{S}}$ is self-related using $\mathcal{R}$ , thus the self-simulation can be duplicated for arbitrary times.

[TABLE]

This simulation means that $L_{s}$ can take queries related by each of the KMRs, and choose one of them for its external calls. vainj and vaext are used to capture the consistency between static analyzer and the dynamic memories are we mentioned in §5.1. This simulation is conceptually correct but quite complex to prove.

Moreover, we need to define two extra specifications $L_{\mathcal{L}}$ and $L_{\mathcal{M}}$ for intermediate language interfaces and prove $L_{\texttt{S}}\leqslant_{\text{CL}}L_{\mathcal{L}}$ , $L_{\mathcal{L}}\leqslant_{\text{LM}}L_{\mathcal{M}}$ and $L_{\mathcal{M}}\leqslant_{\text{MA}}[\![\text{server.s}]\!]$ . This approach not only requires a significant amount of effort but also presents a technical challenge which is how to achieve the protection of memory and registers for the target program. In Fig. 7, we use injp together with the structural simulation convention $\text{CA}\equiv\text{CL}\cdot\text{LM}\cdot\text{MA}$ . For example, the saved values of registers are protected as out-of-reach in the memory (injp) such that these registers can be correctly restored to satisfy the calling convention. In $\mathbb{C}_{\text{CCO}}$ , LM requires that callee-save registers are protected, MA requires that RSP and RA are protected. While they do not provide any memory protection for saved values of these registers on the stack. This makes it extremely challenging to establish a simulation between $L_{\texttt{S}}$ and $[\![\text{server.s}]\!]$ through intermediate specifications. In fact, we came up with the idea of direct refinement during our attempts to prove this.

Bibliography32

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1(1)
2Ahmed (2006) Amal J. Ahmed. 2006. Step-Indexed Syntactic Logical Relations for Recursive and Quantified Types. In Proc. 15th European Symposium on Programming (ESOP’06) (LNCS, Vol. 3924) , Peter Sestoft (Ed.). Springer, Cham, 69–83. https://doi.org/10.1007/11693024_6 · doi ↗
3Besson et al . (2015) Frédéric Besson, Sandrine Blazy, and Pierre Wilke. 2015. A Concrete Memory Model for Comp Cert. In Proc. 6th Interactive Theorem Proving (ITP’15) (LNCS, Vol. 9236) , Christian Urban and Xingyuan Zhang (Eds.). Springer, Cham, 67–83. https://doi.org/10.1007/978-3-319-22102-1_5 · doi ↗
4Chappe et al . (2023) Nicolas Chappe, Paul He, Ludovic Henrio, Yannick Zakowski, and Steve Zdancewic. 2023. Choice Trees: Representing Nondeterministic, Recursive, and Impure Programs in Coq. Proc. ACM Program. Lang. 7, POPL, Article 61 (January 2023), 31 pages. https://doi.org/10.1145/3571254 · doi ↗
5Gu et al . (2015) Ronghui Gu, Jérémie Koenig, Tahina Ramananandro, Zhong Shao, Xiongnan(Newman) Wu, Shu-Chun Weng, Haozhong Zhang, and Yu Guo. 2015. Deep Specifications and Certified Abstraction Layers. In Proc. 42nd ACM Symposium on Principles of Programming Languages (POPL’15) , Sriram K. Rajamani and David Walker (Eds.). ACM, New York, NY, USA, 595–608. https://doi.org/10.1145/2775051.2676975 · doi ↗
6Gu et al . (2018) Ronghui Gu, Zhong Shao, Jieung Kim, Xiongnan (Newman) Wu, Jérémie Koenig, Vilhelm Sjober, Hao Chen, David Costanzo, and Tahnia Ramananandro. 2018. Certified Concurrent Abstraction Layers. In Proc. 2018 ACM Conference on Programming Language Design and Implementation (PLDI’18) , Jeffrey S. Foster and Dan Grossman (Eds.). ACM, New York, NY, USA, 646–661. https://doi.org/10.1145/3192366.3192381 · doi ↗
7He et al . (2021) Paul He, Eddy Westbrook, Brent Carmer, Chris Phifer, Valentin Robert, Karl Smeltzer, Andrei Ştefănescu, Aaron Tomb, Adam Wick, Matthew Yacavone, and Steve Zdancewic. 2021. A Type System for Extracting Functional Specifications from Memory-Safe Imperative Programs. Proc. ACM Program. Lang. 5, OOPSLA, Article 135 (October 2021), 29 pages. https://doi.org/10.1145/3485512 · doi ↗
8Hur et al . (2012 a) Chung-Kil Hur, Derek Dreyer, Georg Neis, and Viktor Vafeiadis. 2012 a. The Marriage of Bisimulations and Kripke Logical Relations. In Proc. 39th ACM Symposium on Principles of Programming Languages (POPL’12) , John Field and Michael Hicks (Eds.). ACM, New York, NY, USA, 59–72. https://doi.org/10.1145/2103656.2103666 · doi ↗

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Fully Composable and Adequate Verified Compilation with Direct Refinements between Open Modules

Abstract.

1. Introduction

1.1. Full Compositionality and Adequacy in Verified Compilation

1.2. Problems with the Existing Approaches to Refinements

Constant Refinement

Sum of Refinements

Product of Refinements

Summary

1.3. Challenges for Direct Refinement of Open Modules

1.4. Our Contributions

1.5. Structure of the Paper

2. Key Ideas

2.1. Refinement Supporting Adequacy, Heterogeneity and Horizontal Composition

2.1.1. Adequacy and Heterogeneity via Direct Refinement

2.1.2. Horizontal Composition via Kripke Memory Relations

2.2. Uniform and Transitive KMR for Vertical Composition of Direct Refinements

3. Background and Challenges

3.1. Background

3.1.1. Block-based Memory Model

3.1.2. A Framework for Open Simulations

Definition 3.1.

Definition 3.2.

Definition 3.3 (Kripke Relation with Memory Protection).

3.2. Challenges for Vertically Composing Open Simulations

Theorem 3.4 (V. Comp).

Theorem 3.5.

4. A Uniform and Transitive Kripke Memory Relation

4.1. Uniformity of injp

4.1.1. Public and Private Memory via Memory Injections

Definition 4.1.

Lemma 4.2.

4.1.2. injp as a Uniform Rely Condition

4.1.3. injp as a Uniform Guarantee Condition

4.2. Transitivity of injp

4.2.1. injp⊑injp⋅injp\text{injp}\sqsubseteq\text{injp}\boldsymbol{\cdot}\text{injp}injp⊑injp⋅injp

Lemma 4.3.

4.2.2. injp⋅injp⊑injp\text{injp}\boldsymbol{\cdot}\text{injp}\sqsubseteq\text{injp}injp⋅injp⊑injp

Lemma 4.4.

5. Derivation of the Direct Refinement for CompCert

5.1. Open Simulation of Individual Passes

5.1.1. Simulation Conventions and Semantic Invariants

5.1.2. Open Simulation of Optimizations

Lemma 5.1.

Definition 5.2.

5.2. Properties for Refining Simulation Conventions

5.2.1. Commutativity of KMRs and Structural Conventions.

Lemma 5.3.

5.2.2. Absorption of KMRs into injp

Lemma 5.4.

5.2.3. Composition of Semantic Invariants

Lemma 5.5.

Lemma 5.6.

Lemma 5.7.

5.3. Proving the Direct Open Simulation for CompCert

Theorem 5.8.

Definition 5.9.

Theorem 5.10.

6. End-to-End Verification of Heterogeneous Modules

6.1. Refinement for the Hand-written Server

Definition 6.1.

Theorem 6.2.

6.2. End-to-end Correctness Theorem

Lemma 6.3.

Lemma 6.4.

Lemma 6.5.

Theorem 6.6.

6.3. Verification of the Mutually Recursive Client and Server

7. Generality and Limitations of Our Approach

7.1. Supporting Different Memory Models and Compilers

7.2. Supporting Additional Optimization Passes

Proposition 7.1.

7.3. Limitations

8. Evaluation and Related Work

4.2.1. $\text{injp}\sqsubseteq\text{injp}\boldsymbol{\cdot}\text{injp}$

4.2.2. $\text{injp}\boldsymbol{\cdot}\text{injp}\sqsubseteq\text{injp}$

A.3.1. Construction and properties of $j_{12}^{\prime}$ , $j_{23}^{\prime}$ and $m_{2}^{\prime}$