Concurrent Computing with Shared Replicated Memory

Klaus-Dieter Schewe; Andreas Prinz; Egon B\"orger

arXiv:1902.04789·cs.DC·February 14, 2019

Concurrent Computing with Shared Replicated Memory

Klaus-Dieter Schewe, Andreas Prinz, Egon B\"orger

PDF

TL;DR

This paper models concurrent systems with shared replicated memory using concurrent ASM, analyzing different replication policies and messaging to understand their impact on system consistency.

Contribution

It introduces a method to specify and analyze replicated memory management in concurrent systems using refined concurrent ASM models.

Findings

01

Different replication policies affect system consistency.

02

Messaging between data centers influences system behavior.

03

Refined models help in understanding trade-offs in replication strategies.

Abstract

The behavioural theory of concurrent systems states that any concurrent system can be captured by a behaviourally equivalent concurrent Abstract State Machine (cASM). While the theory in general assumes shared locations, it remains valid, if different agents can only interact via messages, i.e. sharing is restricted to mailboxes. There may even be a strict separation between memory managing agents and other agents that can only access the shared memory by sending query and update requests to the memory agents. This article is dedicated to an investigation of replicated data that is maintained by a memory management subsystem, whereas the replication neither appears in the requests nor in the corresponding answers. We show how the behaviour of a concurrent system with such a memory management can be specified using concurrent communicating ASMs. We provide several refinements of a…

Equations21

x := 1 ∣ y := 1 ∣ P r in t (x); P r in t (y) ∣ P r in t (y); P r in t (x)

x := 1 ∣ y := 1 ∣ P r in t (x); P r in t (y) ∣ P r in t (y); P r in t (x)

v a l_{S} (f (t_{1}, \dots, t_{a r_{f}})) = f_{S} (v a l_{S} (t_{1}), \dots, v a l_{S} (t_{a r_{f}})) .

v a l_{S} (f (t_{1}, \dots, t_{a r_{f}})) = f_{S} (v a l_{S} (t_{1}), \dots, v a l_{S} (t_{a r_{f}})) .

{v ∣ \exists j, d^{'}, j^{'} .\exists t . p_{i, j, d^{'}, j^{'}} (k) = (v, t)}

{v ∣ \exists j, d^{'}, j^{'} .\exists t . p_{i, j, d^{'}, j^{'}} (k) = (v, t)}

local \/ \equiv \exists d .\forall d^{'}, j^{'} . (d^{'}, j^{'}) \in G_{i, j} \Rightarrow d^{'} = d

local \/ \equiv \exists d .\forall d^{'}, j^{'} . (d^{'}, j^{'}) \in G_{i, j} \Rightarrow d^{'} = d

sufficient \/ (\textsc A l l)

sufficient \/ (\textsc A l l)

with γ_{i, j} = ∣ {(d^{'}, j^{'}) ∣ copy \/ (i, j, d^{'}, j^{'})} ∣

sufficient \/ (\textsc O n e)

sufficient \/ (\textsc T w o)

sufficient \/ (\textsc T h r ee)

sufficient \/ (\textsc Q u or u m (q))

sufficient \/ (\textsc E a c h_Q u or u m (q))

with δ_{i, j, d} = ∣ {j^{'} ∣ copy \/ (i, j, d, j^{'})} ∣

sufficient \/ (\textsc L oc a l_Q u or u m (q, d))

sufficient \/ (\textsc L oc a l_O n e (d))

Δ_{ℓ} = {((p_{i, j, d, j^{'}}, k), (v, t_{ℓ})) ∣ ((p_{i}, k), v) \in Δ_{ℓ}^{'} \land copy \/ (i, j, d, j^{'}) \land h_{i} (k) \in range \/_{j}}

Δ_{ℓ} = {((p_{i, j, d, j^{'}}, k), (v, t_{ℓ})) ∣ ((p_{i}, k), v) \in Δ_{ℓ}^{'} \land copy \/ (i, j, d, j^{'}) \land h_{i} (k) \in range \/_{j}}

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

11institutetext: Zhejiang University, UIUC Institute, China, 11email: [email protected] 22institutetext: University of Agder, Department of ICT, Agder, Norway, 22email: [email protected] 33institutetext: Università di Pisa, Dipartimento di Informatica, Pisa, Italy, 33email: [email protected]

Concurrent Computing with Shared Replicated Memory††thanks: The research reported in this article was partly supported by the Austrian Science Fund for the project Behavioural Theory and Logics for Distributed Adaptive Systems (FWF: [P26452-N15]).

Rigorous Specification and Analysis Using Concurrent Communicating Abstract State Machines

Klaus-Dieter Schewe 11

Andreas Prinz 22

Egon Börger 33

Abstract

The behavioural theory of concurrent systems states that any concurrent system can be captured by a behaviourally equivalent concurrent Abstract State Machine (cASM). While the theory in general assumes shared locations, it remains valid, if different agents can only interact via messages, i.e. sharing is restricted to mailboxes. There may even be a strict separation between memory managing agents and other agents that can only access the shared memory by sending query and update requests to the memory agents. This article is dedicated to an investigation of replicated data that is maintained by a memory management subsystem, whereas the replication neither appears in the requests nor in the corresponding answers. We show how the behaviour of a concurrent system with such a memory management can be specified using concurrent communicating ASMs. We provide several refinements of a high-level ground model addressing different replication policies and internal messaging between data centres. For all these refinements we analyse their effects on the runs such that decisions concerning the degree of consistency can be consciously made.

1 Introduction

Abstract State Machines (ASMs) have been used since their discovery in the 1990s to model sequential, parallel and concurrent systems (see [10, Ch. 6, 9]). For sequential systems the celebrated sequential ASM thesis [14] provides an elegant theoretical underpinning showing that every sequential algorithm as stipulated by three simple, intuitive postulates are captured by sequential ASMs. This was generalised in [11] for (synchronous) parallel systems111The behavioural theory proven by Ferrarotti et al. simplifies the previously developed parallel ASM thesis [4, 5] by exploiting the idea of multiset comprehension terms in bounded exploration witnesses, which was stimulated by previous research on non-deterministic database transformations [23, 24]., and in [7] for asynchronous concurrent systems222This closed the gap in the behavioural theory of concurrent systems, as the definition of partially ordered runs in [13] was generally considered to be insufficient. There are examples of concurrent systems satisfying the intuitively clear property of sequential consistency [16] without having partially ordered runs., in which a concurrent system is defined by a family of algorithms assigned to agents that is subject to a concurrency postulate333Though the proof of the concurrent ASM thesis was first only conducted for families of sequential algorithms, the generalisation to families of parallel algorithms does not cause serious difficulties as sketched in [22]..

This characterisation can be applied to many different models of concurrent computation (see e.g. [1, 3, 12, 18, 25, 26]). While the thesis has been grounded on the assumption that shared locations are used, it was shown in [8] that the theory remains valid, if different agents can only interact via messages, i.e. sharing is restricted to mailboxes. This includes the case of a strict separation between memory managing agents and other agents that can only access the shared memory by sending query and update requests to the memory agents. Naturally, the expectation of an agent $a$ sending a request to a memory management agent $b$ is that the result is the same as if there had been a concurrent run with agent $a$ executing its request directly on the shared locations.

However, as observed in [20] this expectation can be violated by relaxed shared memory management, in particular in combination with conflicting updates or data replication. Regarding conflicting updates, if agents $a$ and $a^{\prime}$ try simultaneously to update the value at a location $\ell$ to some new values $v$ and $v^{\prime}$ , respectively, with $v\neq v^{\prime}$ , a concurrent run would be discarded. Instead of this, a memory management agent $b$ receiving update requests from $a$ and $a^{\prime}$ might randomly choose one of the possible values $v$ or $v^{\prime}$ , so for one of the agents there would be a lost update and its latest step would not have been executed properly.

Therefore, in [7] it is requested that “if the effects of such techniques are behaviourally relevant for the problem …, then this should be described by a precise model through which a programmer can understand, check and justify that the code does what it is supposed to do”. For the random selection such a precise model can be easily defined by a memory agent $m$ with the following rule444Here a shared (set-valued) function symbol $\Delta$ denoting an update set is used.:

[TABLE]

If this behaviour is known, one can specify what an agent should do in case of possibly lost updates, e.g. use strict transactions to avoid the situation to arise555See e.g. [9] for an ASM specification of concurrent systems with transactional behaviour., or integrate algorithms for mutual exclusion [2, 17], or ensure that an agent only continues after the memory agent $m$ (considered as part of the environment [10]) has terminated its step, etc.

In this article we investigate the case of replicated data maintained by a memory management subsystem, where the replication should not appear in the requests nor in the corresponding answers, which is a standard requirement in distributed databases with replication [19, Chap.13, pp.459ff.]. Consider first an example of undesirable behaviour with four agents $a_{1},\ldots,a_{4}$ having the following respective programs (where ; denotes sequential execution, and $Print(x)$ means to read the value $x$ and to copy it to some output):

[TABLE]

Then there is no concurrent run where (1) initially $x=y=0$ , (2) each agent makes once each possible move, (3) $a_{3}$ prints $x=1,y=0$ , and (4) $a_{4}$ prints $x=0,y=1$ . However, if $x$ and $y$ are replicated, say that there are always two copies, and an update by the programs $a_{1}$ or $a_{2}$ affects only a single copy, such a behaviour will indeed be enabled.

Therefore, our objective is that such behaviour must be understandable from the specification so that the developer of the system can see, whether the consequences of such behaviour can be tolerated or additional consistency assurance measures have to be included. We assume a rather simple model, where shared data is logically organised in relations with primary keys, and data can only be accessed by means of the primary key values. We further assume relations to be horizontally fragmented according to values of a hash function on the primary key values, and these fragments are replicated. Replicas are assigned to different nodes, and several nodes together form a data centre, i.e. that are handled by one dedicated data management agent666This is similar to the data organisation in the noSQL database system Cassandra [21], but otherwise the Cassandra system is of no importance for this article. We may think of nodes as storage units and of data centres as physical machines managing them..

In addition, values in replicas carry logical timestamps set by the data centres and stored in the records in the nodes of the memory management subsystem. We further adopt Lamport’s simple approach for the maintenance of timestamps [15], which basically advances a clock, if its data centre receives a record with a future timestamp. This allows us to formulate and investigate policies that guarantee certain to-be-specified levels of consistency (as known from replication in relational databases).

For retrieval of a set of records a specified number of replicas has to be read, and for each record always the one with the latest timestamp will be returned. Depending on how many replicas are accessed the returned records may be (in the strict sense) outdated or not. Likewise, for the update of a set of records timestamps will be created, and a specified number of replicas of the records will be stored. Success of retrieval or update will be returned according to specified read- and write-policies.

In Section 2 we will first specify the behaviour of a concurrent system with shared data requiring that all agents interact with this subsystem for data retrieval and updates using appropriate SEND and RECEIVE actions. The memory management subsystem will be specified by a separate collection of agents. In Section 3 we investigate a refinement concerning policies how many replicas are to be read or updated, respectively. We show that some combinations of replication policies enable view compatibility, i.e. data consistency, which formalises the expectation above. In Section 4 we refine our specification taking the communication between data centres into account, and address the enforcement of the read and write policies. We obtain a complete, though not necessarily correct refinement, and as a consequence view compatibility cannot be guaranteed anymore. In fact, we even show that view compatibility implies view serialisability. That is, without exploiting the possibility of transactions—at least for single read or write requests—consistency cannot be preserved. Finally, we conclude with a brief summary and outlook.

2 Ground Model for Shared Memory Management with Replication

We assume some familiarity with Abstract State Machines (ASMs), which can be understood as a form of pseudo-code with well-founded semantics777Here we do not repeat the formal definition of the semantics of ASMs—detailed definitions can be found in the textbook [10, Sect.2.2/4].. The signature $\Sigma$ of an ASM is a finite set of function symbols $f$ , each associated with an arity $ar_{f}$ . A state $S$ is a set of functions $f_{S}$ of arity $ar_{f}$ over some fixed base set $B$ , given by interpretations of the corresponding function symbol $f$ . Each pair $(f,(v_{1},\dots,v_{ar_{f}}))$ comprising a function symbol and arguments $v_{i}\in B$ is called a location, and each pair $(\ell,v)$ of a location $\ell$ and a value $v\in B$ is called an update. A set of updates is called an update set. The evaluation of terms is defined as usual by

[TABLE]

We say that $f_{S}(val_{S}(t_{1}),\dots,val_{S}(t_{ar_{f}}))$ is the value at location $(f,(val_{S}(t_{1}),\dots,val_{S}(t_{ar_{f}})))$ in state $S$ . ASM rules $r$ are composed using

assignments.

$f(t_{1},\dots,t_{ar_{f}}):=t_{0}$ (with terms $t_{i}$ built over $\Sigma$ ),

branching.

IF $\varphi$ THEN $r_{+}$ ELSE $r_{-}$ ,

parallel composition.

FORALL $x$ WITH $\varphi(x)$ $r(x)$ ,

bounded parallel composition.

$r_{1}\dots r_{n}$ ,

choice.

CHOOSE $x$ WITH $\varphi(x)$ IN $r(x)$ , and

let.

LET $x=t$ IN $r(x)$ .

Each rule yields an update set $\Delta(S)$ in state $S$ . If this update set is consistent, i.e. it does not contain two updates $(\ell,v),(\ell,v^{\prime})$ with the same location $\ell$ and different values $v\neq v^{\prime}$ , then applying this update set defines a successor state $S+\Delta(S)$ .

Regarding the function symbols in $\Sigma$ we further distinguish static, dynamic and derived functions. Only dynamic function symbols can appear as the outermost symbol on the left hand side of an assignment.

2.1 Concurrent Communicating Abstract State Machines

A concurrent ASM (cASM) $\mathcal{CM}$ is defined as a family $\{(a,\textit{asm\/}_{a})\}_{a\in\mathcal{A}}$ of pairs consisting of an agent $a$ and an ASM $\textit{asm\/}_{a}$ .

Let $\Sigma_{a}$ denote the signature of the ASM $\textit{asm\/}_{a}$ . Taking the union $\Sigma=\bigcup_{a\in\mathcal{A}}\Sigma_{a}$ we distinguish between $\mathcal{CM}$ -states built over $\Sigma$ and local states for agent $a$ built over $\Sigma_{a}$ ; the latter ones are simply projections of the former ones on the subsignature.

Definition 1

A concurrent $\mathcal{CM}$ -run of a concurrent ASM $\{(a,\textit{asm\/}_{a})\}_{a\in\mathcal{A}}$ is a sequence $S_{0},S_{1},S_{2},\dots$ of $\mathcal{CM}$ -states, such that for each $n\geq 0$ there is a finite set $A_{n}\subseteq\mathcal{A}$ of agents such that $S_{n+1}$ results from simultaneously applying update sets $\Delta_{a}(S_{j(a)})$ for all agents $a\in A_{n}$ that have been built by $\textit{asm\/}_{a}$ in some preceding state $S_{j(a)}$ ( $j(a)\leq n$ depending on $a$ ), i.e. $S_{n+1}=S_{n}+\bigcup_{a\in A_{n}}\Delta_{a}(S_{j(a)})$ and $a\notin\bigcup_{i=j(a)}^{n-1}A_{i}$ .

Dynamic functions in $\Sigma_{a}$ can further be private or shared. In the latter case they can be updated also by other agents and thus appear in at least one other signature $\Sigma_{a^{\prime}}$ .

In order to isolate agents responsible for a memory management subsystem we exploit communicating concurrent ASMs (ccASM) [8]. In a ccASM the only shared function symbols take the form of mailboxes. Sending of a message $m$ from $a$ to $b$ means to update the out-mailbox of $a$ by inserting $m$ into it. This mailbox is a set-valued shared location with the restriction that only the sender can insert messages into it and only the environment—in this case understood as the message processing system—can read and delete them. The message processing system will move the message $m$ to the in-mailbox of the receiver $b$ . Receiving a message $m$ by $b$ means in particular that $b$ removes $m$ from its in-mailbox and performs some local operation on $m$ .

Therefore, in ccASMs the language of ASM rules above is enriched by the following constructs888As explained in [8], instead of describing the details of the local RECEIVE action of an agent we only use the corresponding RECEIVED $(m)$ predicate; it means that $m$ is in the mailbox of the agent who (or in general whose mailbox manager) RECEIVED it (when the message processing system has inserted $m$ into $b$ ’s in-mailbox and deleted $m$ from $a$ ’s out-mailbox).:

Send.

SEND( $\langle$ message $\rangle$ , from: $\langle$ sender $\rangle$ , to: $\langle$ receiver $\rangle$ ),

Receive.

RECEIVE( $\langle$ message $\rangle$ , from: $\langle$ sender $\rangle$ , to: $\langle$ receiver $\rangle$ ),

Received.

RECEIVED( $\langle$ message $\rangle$ , from: $\langle$ sender $\rangle$ , to: $\langle$ receiver $\rangle$ ), and

Consume.

CONSUME( $\langle$ message $\rangle$ , from: $\langle$ sender $\rangle$ , to: $\langle$ receiver $\rangle$ ).

Let us consider the situation, where all shared data is organised in relations with a unique primary key. We can model this by a set of function symbols $\Sigma_{\text{mem}}=\{p_{1},\dots,p_{k}\}$ , where each $p_{i}$ has a fixed arity $a_{i}$ , and a fixed “co-arity” $c_{i}$ , such that in each state $S$ we obtain partial functions $p_{i}^{S}:B^{a_{i}}\rightarrow B^{c_{i}}$ , which are almost everywhere undefined.

When dealing with a memory management subsystem we have to specify a subsystem with separate agents that maintain the locations $(p_{i},\vec{k})$ . A read access by an agent $a\in\mathcal{A}$ aims at receiving a subset of relation $p_{i}$ containing those records with key values satisfying a condition $\varphi$ , i.e. the subsystem has to evaluate a term of the form $p_{i}[\varphi]\;=\;\{(\vec{k},\vec{v})\mid\varphi(\vec{k})\wedge\vec{v}\neq\textit{undef\/}\wedge p_{i}(\vec{k})=\vec{v}\}$ . As $p_{i}$ is not in the signature $\Sigma_{a}$ , the agent $a$ , instead of using the term $p_{i}[\varphi]$ in a rule, must send a read-request and wait for a response, i.e. it executes

[TABLE]

(the message just contains the name $p_{i}$ of the function symbol and the formula $\varphi$ , and $\textit{home\/}(a)$ denotes a not further specified receiver, which for agent $a$ represents the memory management subsystem), and waits until, once RECEIVED( $\text{answer}(p_{i},\varphi)$ ,from:home( $a$ ),to: $a$ ) becomes true, it can

[TABLE]

the requested value from the memory management subsystem. We abstract from the details of the communication but assume the communication to be reliable (no message gets lost or damaged). Where clear from the context for reasons of succinctness we notationally omit the sender and receiver parameters in SEND and RECEIVE.

Naturally, the received answer corresponds to a unique previously sent read-request. As in the sequel we concentrate on the handling of single message by the memory management subsystem, the correct association of possibly several answers to several requests is of minor importance for us.

The answer in the message must be a relation of arity $a_{i}+c_{i}$ satisfying the key property above. The agent can store such an answer using a non-shared function $p_{i}^{a}$ or process it in any other way, e.g. aggregate the received values. This is part of the ASM rule in $\textit{asm\/}_{a}$ , which we do not consider any further.

In the SEND and RECEIVE rules we use a fixed agent home( $a$ ), with which the agent $a$ communicates. It will be unknown to the agent $a$ , whether this agent home( $a$ ) processes the read-request or whether it communicates with other agents to produce the answer.

Analogously, for bulk write access an agent $a$ may want to execute the operation $p_{i}\;\mbox{:\&}\;p$ to update all records999Note that in this way we capture a deletion of a record in $p_{i}$ with key $\vec{k}$ by having $(\vec{k},\textit{undef\/})\in p$ . Also insertions are subsumed by this operation: if $(\vec{k},\vec{v})\in p$ holds, but $val_{S}(p_{i},\vec{k})=\textit{undef\/}$ , then a new record in $p_{i}$ with key $\vec{k}$ is inserted. with a key defined in $p$ to the new values given by $p$ . While this would correspond to the ASM rule FORALL $(\vec{k},\vec{v})\in p$ $p_{i}(\vec{k}):=\vec{v}$ , the agent $a$ must send a write-request and wait for a response, i.e. it executes

[TABLE]

(again, the message only contains the name of the function symbol $p_{i}$ and a relation $p$ ), and waits to receive an acknowledgement to the write-request101010Naturally, the remark above concerning the association of an answer to a unique previously sent request, extends analogously to write requests., i.e. to

[TABLE]

We use the notation $\mathcal{CM}_{0}=\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(db,\textit{asm\/}_{db})\}$ for the concurrent communicating ASM. Here for the sake of completeness we may think of a single memory agent $db$ —in particular, we have $\textit{home\/}(a)=db$ for all $a\in\mathcal{A}$ —that receives read and write requests and executes them in one step111111Note that the answer to a read request is a set, which may be empty.. Thus, the rule of $\textit{asm\/}_{db}$ looks as follows:

[TABLE]

2.2 Memory Organisation with Replication

In the previous subsection we assumed that logically (from the users’ point of view) all data is organised in relations $p_{i}$ with a unique primary key. However, as we want to emphasise replication, instead of a location $(p_{i},\vec{k})$ there will always be several replicas, and at each replica we may have a different value. We use the notion cluster to refer to all (replicated) locations associated with a logical relation $p_{i}$ .

Each cluster is associated with several data centres, and each data centre comprises several nodes. The nodes are used for data storage, and data centres correspond to physical machines maintaining several such storage locations. So let $\mathcal{D}$ denote the set of data centres, and let $\mathcal{D}_{i}$ ( $i=1,\dots,k$ ) be the sets of data centres for maintaining the relations $p_{i}$ , i.e. $\mathcal{D}=\bigcup_{i=1}^{k}\mathcal{D}_{i}$ .

First let us assume that each relation $p_{i}$ is fragmented according to the values of a hash-key. That is, for each $i=1,\dots,k$ we can assume a static hash-function $h_{i}:B^{a_{i}}\rightarrow[m,M]\subseteq\mathbb{Z}$ assigning a hash-key to each key value in $B^{a_{i}}$ , i.e. those keys at which in the memory management system possibly some value may be defined. We further assume a partition $[m,M]=\bigcup_{j=1}^{q_{i}}\textit{range}_{j}$ of the interval of hash-key values such that $\textit{range}_{j_{1}}<\textit{range}_{j_{2}}$ holds for all $j_{1}<j_{2}$ , so each range will again be an interval. These range intervals are used for the horizontal fragmentation into $q_{i}$ fragments of the to-be-represented function $p_{i}$ : $\textit{Frag}_{j,i}=\{\vec{k}\in B^{a_{i}}\mid h_{i}(\vec{k})\in\textit{range}_{j}\}$ .

All these fragments will be replicated and their elements associated with a value (where defined by the memory management system), using a fixed replication factor $r_{i}$ for each cluster. That is, each fragment $\textit{Frag}_{j,i}$ will be replicated $r_{i}$ -times for each data centre $d\in\mathcal{D}_{i}$ . A set of all pairs $(\vec{k},\vec{v})$ with key $\vec{k}\in\textit{Frag}_{j,i}$ and an associated value $\vec{v}$ in the memory management system is called a replica of $\textit{Frag}_{j,i}$ .

More precisely assume that each data centre $d\in\mathcal{D}_{i}$ consists of $n_{i}$ nodes, identified by $d$ and a number $j^{\prime}\in\{1,\dots,n_{i}\}$ . Then we use a predicate $\textit{copy\/}(i,j,d,j^{\prime})$ to denote that the node with number $j^{\prime}$ in the data centre $d\in\mathcal{D}_{i}$ contains a replica of $\textit{Frag}_{j,i}$ . To denote the values in replicas we use dynamic functions $p_{i,j,d,j^{\prime}}$ of arity $a_{i}$ and co-arity $c_{i}+1$ (functions we call again replicas). That is, instead of the logical function symbol $p_{i}$ we use function symbols $p_{i,j,d,j^{\prime}}$ with $j\in\{1,\dots,q_{i}\}$ , $d\in\mathcal{D}_{i}$ and $j^{\prime}\in\{1,\dots,n_{i}\}$ , and we request $h_{i}(\vec{k})\in range_{j}$ for all $\vec{k}\in B^{a_{i}}$ whenever $copy\/(i,j,d,j^{\prime})$ holds and $p_{i,j,d,j^{\prime}}(\vec{k})$ is defined. For the associated values we have $p_{i,j,d,j^{\prime}}(\vec{k})=(\vec{v},t)$ , where $t$ is an added timestamp value, and values $\vec{v}$ may differ from replica to replica, i.e. there can be different values $\vec{v}$ with different timestamps in different replicas of $\textit{Frag}_{j,i}$ .

Each data centre $d$ maintains a logical clock $clock_{d}$ that is assumed to advance (without this being further specified), and $clock_{d}$ evaluates to the current time at data centre $d$ . Timestamps must satisfy the following requirements:

Timestamps are totally ordered. 2. 2.

Timestamps set by different data centres are different from each other121212This requirement can be fulfilled by assuming that a timestamp created by data centre $d$ has the form $n+o_{d}$ with a positive integer $n$ and an offset $o_{d}\in(0,1)$ , such that offsets of different data centres are different. Equivalently, one might use integer values for timestamps plus a total order on data centres that is used for comparisons in case two timestamps are otherwise equal.. 3. 3.

Timestamps respect the inherent order of message passing, i.e. when data with a timestamp $t$ is created at data centre $d$ and sent to data centre $d^{\prime}$ , then at the time the message is received the clock at $d^{\prime}$ must show a time larger than $t$ .

When the condition 3 is not met, a data centre may also adjust its clock for logical time synchronisation according to Lamport’s algorithm in [15]. For clock adjustment let us define $\text{adjust\_clock}(d,t)\equiv clock_{d}:=t^{\prime}$ , where $t^{\prime}$ is the smallest possible timestamp at data centre $d$ with $t\leq t^{\prime}$ .

2.3 Internal Request Handling for Replicated Memory

When dealing with a memory management subsystem with replication the request messages sent by agents $a$ remain the same, but the internal request handling by the memory management subsystem changes. This will define a refined concurrent communicating ASM $\mathcal{CM}_{1}=\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(d,\textit{asm\/}_{d})\}_{d\in\mathcal{D}}$ .

Each agent $a$ possesses a private version of the shared location $(p_{i},\vec{k})$ parameterised by $a$ itself131313An exact definition for this is given by the notion of ambient dependent function in [6].. Consider a read request $\text{read}(p_{i},\varphi)$ received from agent $a$ by data centre $d$ —let $i$ be fixed for the rest of this section. Due to the fact that data is horizontally fragmented, we need to evaluate several requests $\text{read}(p_{i,j},\varphi)$ concerning keys $\vec{k}$ with $h_{i}(\vec{k})\in\textit{range}_{j}$ , one request for each fragment index $j$ , and then build the union so that $p_{i}[\varphi]=\bigcup_{j=1}^{q_{i}}p_{i,j}[\varphi]$ .

In order to evaluate $p_{i,j}[\varphi]$ several replicas of $\textit{Frag}_{j,i}$ will have to be accessed. Here we will leave out any details on how these replicas will be selected and accessed—this will be handled later by means of refinement. We only request that the selection of replicas complies with a read-policy. Such a policy will also be left abstract for the moment and defined later141414We also leave out the treatment of nodes that are not reachable. It can be tacitly assumed that if node $j$ at data centre $d$ cannot be reached, then all operations affecting data at this node will be logged and executed once the node becomes available again..

When reading actual data, i.e. evaluating $p_{i,j,d,j^{\prime}}(\vec{k})$ for selected key values $\vec{k}$ , we obtain different time-stamped values $(\vec{v},t)$ , out of which a value $\vec{v}$ with the latest timestamp is selected and sent to $a$ as the up-to-date value of $p_{i}(\vec{k})$ 151515When there is no record in relation $p_{i,j,d,j^{\prime}}$ with key $\vec{k}$ , we would normally write $p_{i,j,d,j^{\prime}}(\vec{k})=\textit{undef\/}$ in an ASM, but in the replication context it will be simpler to write $p_{i,j,d,j^{\prime}}(\vec{k})=(\textit{undef\/},-\infty)$ instead, i.e. all non-existing data are considered to carry the smallest possible timestamp denoted by $-\infty$ . Furthermore, if a record is deleted, we keep a deletion timestamp, so we may also find $p_{i,j,d,j^{\prime}}(\vec{k})=(\textit{undef\/},t)$ with $t>-\infty$ . As we will see, such a deletion timestamp $t$ becomes obsolete, once the value $(\textit{undef\/},t)$ has been assigned to all replicas, i.e. the assumption of deletion timestamps does not disable physical removal of data from the database.. The requirement that timestamps set by different data centres differ implies that for given $\vec{k}$ the value $\vec{v}$ with the latest timestamp is unique. All records obtained this way will be returned as the result of the read request to the issuing agent $a$ . Thus, we obtain the following ASM rule AnswerReadReq to-be-executed by any data centre $d$ upon receipt of a read request from an agent $a$ :

[TABLE]

Note that the unique value $\vec{v}$ with $p_{i,j,d^{\prime},j^{\prime}}(\vec{k})=(\vec{v},t_{\max}(\vec{k}))$ may be undef and that the returned $\text{answer}(p_{i},\varphi)$ may be the empty set.

For a write request write( $p_{i},p$ ) sent by agent $a$ to data centre $d$ we proceed analogously. In all replicas of $\textit{Frag}_{j,i}$ selected by a write-policy the records with a key value in $p$ will be updated to the new value—this may be undef to capture deletion—provided by $p$ , and a timestamp given by the current time $\textit{clock}_{d}$ . However, the update will not be executed, if the timestamp of the existing record is already newer. In addition, clocks that “are too late” will be adjusted, i.e. if the new timestamp received from the managing data centre $d$ is larger than the timestamp at data centre $d^{\prime}$ , the clock at $d^{\prime}$ is set to the received timestamp. Thus, we obtain the following ASM rule PerformWriteReq161616Again, we dispense with the handling of write-requests at nodes that are not available. For this we can assume an exception handling procedure that logs requests and executes them in the order of logging, once a node has become alive again. This could give rise to a refinement, which we omit here. to-be-executed by any data centre $d$ upon receipt of an update request from an agent $a$ :

[TABLE]

The clock adjustment is necessary to ensure that timestamps respect the inherent order of message passing as requested above. Write requests with old timestamps may be lost in case a value with a newer timestamp already exists. Then depending on the read-policy a lost update on a single replica may be enough for the value never to appear in an answer to a read-request. In the following we use the notions of complete and correct refinement171717Note that the notion of refinement for ASMs is more general than data refinement as discussed in [10, p.113]. In particular, correct refinement does not imply the preservation of invariants. as defined in [10, pp.111ff.].

Proposition 1

The concurrent communicating ASM $\mathcal{CM}_{1}=\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(d,\textit{asm\/}_{d})\}_{d\in\mathcal{D}}$ is a complete refinement of the concurrent communicating ASM $\mathcal{CM}_{0}=\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(db,\textit{asm\/}_{db})\}$ .

Proof

The only differences between the two communicating concurrent ASMs are the following:

•

Instead of having $\textit{home\/}(a)=db$ for all $a\in\mathcal{A}$ in the abstract specification, $\textit{home\/}(a)\in\mathcal{D}$ differs in the refinement. Nonetheless, in both cases the handling of a read or write request is done in a single step.

•

The rule fragment in the abstract specification

[TABLE]

dealing with a read request corresponds to a rule fragment

[TABLE]

in the rule AnswerReadReq in the refinement.

•

The rule fragment in the abstract specification

[TABLE]

dealing with a write request corresponds to a rule fragment

[TABLE]

in the rule PerformWriteReq in the refinement.

Thus, each run of the abstract communicating concurrent ASM defines in a natural way a run of the refined communicating concurrent ASM181818Actually, in this case the refinement is a (1,1)-refinement.. ∎

Note that without further knowledge about the read- and write-policies it is not possible to prove that the refinement is also correct.

3 Refinement Using Replication Policies

We define view compatibility, a notion for consistency that formalises the intuitive expectation of the agents sending requests that the answers in case of replication remain the same as without, because replication is merely an internal mechanism of the memory management subsystem to increase availability, which is completely hidden from the requesting agents. We then refine the ASMs for internal request handling by concrete read- and write-policies, and show that for particular combinations of read- and write-policies our abstract specification $\mathcal{CM}_{1}$ guarantees view compatibility, which further implies that the refinement of $\mathcal{CM}_{0}$ by $\mathcal{CM}_{1}$ is correct.

3.1 View Compatibility

Informally, view compatibility is to ensure that the system behaves in a way that whenever an agent sends a read- or write-request the result is the same as if the read or write had been executed in a state without replication or timestamps and without any internal processing of the request191919Note that view compatibility is a rather weak consistency requirement, as it only ensures that despite replication up-to-date values are read. However, as we will show later in Section 4, even this weak concistency requirement requires some form of transaction management.. However, it may be possible that parallel requests are evaluated in different states.

For a formal definition of this notion we have to relate runs of the concurrent communicating ASM with the memory management subsystem with runs, where in each state a virtual location $(p_{i},\vec{k})$ has only one well-defined value $\vec{v}$ instead of computing such a value from the different replicas that may even originate from different states. For this we first introduce the technical notion of a flattening: we simply reduce the multiple values associated with replicas of a location $\ell$ to a single value.

Formally, consider runs of the concurrent ASM $\mathcal{CM}_{1}=\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(d,\textit{asm\/}_{d})\}_{d\in\mathcal{D}}$ , where the ASMs $\textit{asm\/}^{c}_{a}$ sends read and write requests that are processed by the data centre agents $d\in\mathcal{D}$ , which return responses.

Definition 2

If $S_{0},S_{1},S_{2},\dots$ is a run of $\mathcal{CM}_{1}=\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(d,\textit{asm\/}_{d})\}_{d\in\mathcal{D}}$ , then we obtain a flattening $S_{0}^{\prime},S_{1}^{\prime},S_{2}^{\prime}$ , … by replacing in each state all locations $(p_{i,j,d^{\prime},j^{\prime}},\vec{k})$ by a single location $(p_{i},\vec{k})$ and letting the value associated with $(p_{i},\vec{k})$ in the considered state be one of the values in the set

[TABLE]

in the considered state.

Obviously, a flattening is a sequence of states of the concurrent ASM $\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}$ , but in most cases it will not be a run. So the question is, under which conditions we can obtain a run. As we stated above that the system shall behave as if there is no replication, we have to ensure that for each agent the following property holds: If the agent sends a write-request that would update the value of a location $\ell$ to $\vec{v}$ , the effect would be the same in the flattening. Analogously, if the agent sends a read-request for location $\ell$ , the answer it receives would be also the same as if a read is executed in the flattening.

In order to formalise this property we define agent views, which emphasise only the moves of a single agent $a\in\mathcal{A}$ , while merging all other moves into activities of the environment as defined in [10].

Let us first consider the agent view of a concurrent run $S_{0},S_{1},\dots$ of our communicating ASMs $\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}$ . Let $a\in\mathcal{A}$ be an arbitrary agent. Its view of the run is the subsequence of states $S_{a,0},S_{a,1},\dots$ in which $a$ makes a move (restricted to the signature of $a$ ). Thus $S_{a,0}$ is $a$ ’s initial state, the state $S_{j}$ (where $j$ depends on $a$ ) in which $a$ performs its first step in the given run. Given any state $S_{k}=S_{a,n}$ , its successor state in the $a$ -view sequence depends on the move $a$ performs in $S_{a,n}$ .

If $a$ in $S_{a,n}$ performs a Send step—a write- or read-request to $\textit{{home\/}\/}(a)$ —it contributes to the next state $S_{k+1}$ by an update set which includes an update of its out-mailbox, which in turn is assumed to eventually yield an update of the mailbox of $\textit{{home\/}\/}(a)$ . But $S_{k+1}$ is not yet the next $a$ -view state, in which $a$ will perform its next move. This move is determined by the following assumption:

Atomic Request/Reply Assumption for agent/db runs: If in a run an agent performs a Send step to the memory management system, then its next step in the run is the corresponding Receive step, which can be performed once the answer to the query sent by the memory management system has been Received.

By this assumption the next $a$ -view state $S_{a,n+1}=S_{l}$ is determined by (what appears to $a$ as) an environment action which enables the Receive step by inserting the reply message into $a$ ’s mailbox and thereby making the $Received$ predicate true in $S_{l}$ for some $l>k$ . 2. 2.

If $a$ in $S_{a,n}=S_{k}$ performs a Receive or an internal step, then besides the mailbox update to Consume the received message it yields only updates to non-shared locations so that its next $a$ -view state is the result of applying these updates together with updates other agents bring in to form $S_{k+1}=S_{a,n+1}$ .

Note that by the Atomic Request/Reply Assumption any agent can make finitely many internal steps after and only after each Receive step. For the ease of exposition but without loss of generality we synchronize internal steps of an agent with the global steps other agents perform during such an internal computation segment so that the result of internal moves becomes visible in the global run view.

We now define a notion of flat agent view of a run $S_{0},S_{1},\dots$ of the concurrent agent ASM $\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(d,\textit{asm\/}_{d})\}_{d\in\mathcal{D}}$ , including the memory management subsystem.

Take an arbitrary subsequence $S_{j_{0}}^{\prime},S_{j_{1}}^{\prime},\dots$ of an arbitrary flattening $S_{0}^{\prime},S_{1}^{\prime},\dots$ (restricted to the signature of the agent $a$ ) of $S_{0},S_{1},\dots$ . Then $S_{j_{0}}^{\prime},S_{j_{1}}^{\prime},\dots$ is called a flat view of agent $a$ of the run $S_{0},S_{1},\dots$ if the following conditions hold:

•

Whenever $a$ performs a request in state $S_{k}$ there is some $S_{j_{i}}^{\prime}$ such that $k=j_{i}$ . If the corresponding reply is received in state $S_{m}$ for some $m>k$ (so that $a$ makes a Receive move in state $S_{m}$ ), then $S_{j_{i+1}}^{\prime}=S_{m}$ . Furthermore there exists some $n$ with $k<n\leq m$ such that the following holds:

–

If the request is a write-request, then for each location $\ell$ with value $v$ in this request $val_{S_{n}^{\prime}}(\ell)=v$ holds, provided there exists an agent reading202020Otherwise the update at location $\ell$ will be lost. For instance, this may happen in case of outdated timestamps. the value $v$ .

–

If the request is a read-request, then for each location $\ell$ with value $v$ in the answer $val_{S_{n}^{\prime}}(\ell)=v$ holds.

•

Whenever $a$ performs a Receive or an internal move in state $S_{k}$ there is some $j_{i}$ such that $S_{k}=S_{j_{i}}^{\prime}$ and $S_{k+1}=S_{j_{i+1}}^{\prime}$

Definition 3

We say that $\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(d,\textit{asm\/}_{d})\}_{d\in\mathcal{D}}$ is view compatible with the concurrent ASM $\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(db,\textit{asm\/}_{db})\}$ iff for each run $\mathcal{R}=S_{0},S_{1},S_{2},\dots$ of $\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(d,\textit{asm\/}_{d})\}_{d\in\mathcal{D}}$ there exists a subsequence of a flattening $\mathcal{R}^{\prime}=S_{0}^{\prime},S_{1}^{\prime},S_{2}^{\prime},\dots$ that is a run of $\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(db,\textit{asm\/}_{db})\}$ such that for each agent $a\in\mathcal{A}$ the agent $a$ -view of $\mathcal{R}^{\prime}$ coincides with a flat view of $\mathcal{R}$ by $a$ .

Note that in this definition of view compatibility we relate agent views with flat agent views, and in both these views we consider the restriction to the signature of the agent $a$ . Technically, the mailboxes associated with $a$ belong to this signature, so every SEND and RECEIVE state $S_{i}$ (for $a$ ) and its flattening $S_{i}^{\prime}$ appears in the views. We can understand the agent view and the flat agent view in such a way, that the environment reads the request (i.e. the message in the out-mailbox) in state $S_{j_{i}}^{\prime}$ , evaluates it in state $S_{n}^{\prime}$ , and places the answer into the in-mailbox of $a$ in state $S_{j_{i+1}}^{\prime}$ (for $j_{i}<n<j_{i+1}$ as requested in the definition). This can also formally be considered as a move by a single agent $m$ representing the memory management subsystem, which executes the ASM rules that are associated with each request.

The notion of view compatibility is closely related to sequential consistency as defined by Lamport. In Lamport’s work an execution is a set of sequences of operations, one sequence per process. An operation is a pair (operation, value), where the operation is either read or write and value is the value read or written. The execution is sequentially consistent iff there exists an interleaving of the set of sequences into a single sequence of operations that is a legal execution of a single-processor system, meaning that each value read is the most recently written value. (One also needs to specify what value a read that precedes any write can obtain.) Our definition of view compatibility generalises this notion, as we always consider bulk requests and also permit parallel operations.

3.2 Specification of Replication Policies

In the specification of ASM rules handling read and write requests by a fixed data centre $d$ we used sets $G_{i,j}\subseteq C_{i,j}$ with $C_{i,j}=\{(d^{\prime},j^{\prime})\mid\textit{copy\/}(i,j,d^{\prime},j^{\prime})\}$ ( $i,j$ are given by the request and the fragmentation) as well as an abstract predicate $\textit{complies\/}(G_{i,j},\text{policy})$ . It is allowed to specify different policies for read and write, but only one policy is used for all read requests, and only one for all write requests.

Let us now define different such policies and refine the ASMs for the data centre agents $d$ by means of these definitions. Basically, we distinguish policies that are global and those that are local, the latter ones requesting that only replicas in the handling data centre $d$ are considered for the set $G_{i,j}$ . Thus we can use the following definition:

[TABLE]

In addition the different policies differ only in the number of replicas that are to be accessed. Major global policies are All, One, Two, Three, Quorum, and Each_Quorum, while the corresponding local policies are Local_One and Local_Quorum.

All.

As the name indicates, the predicate $\textit{complies\/}(G_{i,j},\textsc{All})$ can be defined by $G_{i,j}=C_{i,j}$ , i.e. all replicas are to be accessed.

One, Two, Three.

Here, at least one, two or three replicas are to be accessed, which defines $\textit{complies\/}(G_{i,j},\textsc{One})$ , $\textit{complies\/}(G_{i,j},\textsc{Two})$ and $\textit{complies\/}(G_{i,j},\textsc{Three})$ by $|G_{i,j}|\geq 1$ , $|G_{i,j}|\geq 2$ , and $|G_{i,j}|\geq 3$ , respectively.

Local_One.

Analogously, $\textit{complies\/}(G_{j},\textsc{Local\_One})$ is defined by the conjunction $\textit{local\/}\;\wedge\;\textsc{ONE}$ .

Quorum( $q$ ).

For a value $q$ with $0<q<1$ $\textit{complies\/}(G_{i,j},\textsc{Quorum}(q))$ is defined by $q\cdot|C_{i,j}|<|G_{i,j}|$ . By default, the value $q=\frac{1}{2}$ is used, i.e. a majority of replicas has to be accessed. Note that All could be replaced by Quorum( $1$ ).

Local_Quorum( $q$ ).

Analogously, $\textit{complies\/}(G_{i,j},\textsc{Local\_Quorum}(q))$ can be defined by $\textit{local\/}\;\wedge\;\textsc{Quorum}(q)$ .

Each_Quorum( $q$ ).

For this we have to consider each data centre separately, for which we need $C_{i,j,\bar{d}}=\{(d^{\prime},j^{\prime})\in C_{i,j}\mid d^{\prime}=\bar{d}\}$ and $G_{i,j,\bar{d}}=G_{i,j}\cap C_{i,j,\bar{d}}$ . Then the definition of $\textit{complies\/}(G_{i,j},\textsc{Each\_Quorum}(q))$ becomes $\forall\bar{d}\in\mathcal{D}_{i}.q\cdot|C_{i,j,\bar{d}}|<|G_{i,j,\bar{d}}|$ .

3.3 Consistency Analysis

In the following we will analyse in more detail the effects of the replication policies. For this we need appropriate combinations of read- and write-policies:

•

If the write policy is ALL, then the combination with any read policy is appropriate.

•

If the read policy is ALL, then the combination with any write policy is appropriate.

•

If the write-policy is Quorum $(q)$ or Each_Quorum $(q)$ and the read-policy is Quorum $(q^{\prime})$ or Each_Quorum $(q^{\prime})$ with $q+q^{\prime}\geq 1$ , then the combination is appropriate.

Proposition 2

Let $\mathcal{CM}_{1}=\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(d,\textit{asm\/}_{d})\}_{d\in\mathcal{D}}$ be the concurrent communicating ASM with a memory management subsystem using data centres $d\in\mathcal{D}$ as specified in the previous section. If the combination of the read and write policies is appropriate, then the system is view compatible with the concurrent ASM $\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}$ .

Proof

We exploit that in our abstract specification of the handling of write- and read-requests all selected replicas are written and read in parallel. Thus, if the write-policy is All, then in each state always the same value is stored in all replicas, so the proposition is obvious for the policy All.

If the write-policy is Quorum $(q)$ or Each_Quorum $(q)$ , then for each location $\ell$ the multiplicity of replica considered to determine the value with the largest timestamp is at least $\lceil\frac{m+1}{2}\rceil$ with $m$ being the total number of copies. For this it is essential that updates with a smaller timestamp are rejected, if a value with a larger timestamp already exists.

Consequently, each read access with one of the policies Quorum $(q^{\prime})$ , Each_Quorum $(q^{\prime})$ (with $q+q^{\prime}\geq 1$ ) or All will read at least once this value and return it, as the value with the largest timestamp will be used for the result. That is, in every state only the value with the largest timestamp for each location uniquely determines the run, which defines the equivalent concurrent run.∎

Corollary 1

If $\mathcal{CM}_{1}=\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(d,\textit{asm\/}_{d})\}_{d\in\mathcal{D}}$ is view compatible, then it is also a correct refinement of $\mathcal{CM}_{0}=\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(db,\textit{asm\/}_{db})\}$ .

Proof

Let $\mathcal{R}=S_{0},S_{1},\dots$ be a run of $\mathcal{CM}_{1}$ . According to the definition of view compatibility there exists a flattened subrun $\mathcal{R}^{\prime}=S_{i_{0}}^{\prime},S_{i_{1}}^{\prime},\dots$ that is also a concurrent run of $\mathcal{CM}_{0}=\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(db,\textit{asm\/}_{db})\}$ such that for each agent $a\in\mathcal{A}$ the projections of $\mathcal{R}$ and $\mathcal{R}^{\prime}$ coincide.∎

A stronger notion of consistency would be global consistency, for which we even require that any run of the ASM with the memory management behaves in the same way as a concurrent ASM, i.e. different from the view compatibility we even require that parallel read- and write requests can be seen as referring to the same state. Formally, this requires to strengthen the requirements for a flat view such that for parallel request the index $n$ (with $k<n\leq m$ ) is always the same. Then the specification must be refined to handle also all parallel requests in parallel by the data cente agent.

4 Refinement with Internal Communication

We will now address a refinement of our specification of the memory management subsystem. So far in $\mathcal{CM}_{1}$ the home data centre agent $d$ associated with an agent $a$ manages in one step the retrieval of data from or update of data of sufficiently many replicas as specified by the read- and write-policies, respectively. In doing so we abstracted from any internal communication between data centres.

However, in reality data centres refer to different physical machines, so the gist of the refinement is to treat the handling of a request as a combination of direct access to local nodes, remote access via messages to the other relevant data centres, and collection and processing of return messages until the requirements for the read- or write-policy are fulfilled. That is, the validation of the policy accompanies the preparation of a response message and is no longer under control of the home agent.

We first show again that the refinement is a complete ASM refinement. Then we investigate again view compatibility and show that it is not preserved by the refinement. In fact, view compatibility is linked to the refinement being also correct, i.e. for each run of the concrete concurrent system we find a corresponding run of the abstract concurrent system. In general, however, this is not the case. We can even show that view compatibility implies view serialisability, which means that consistency (if desired) can only be guaranteed, if transactions (at least for single requests) are used. On the other hand, transactions together with appropriate read- and write-policies trivially imply view compatibility.

4.1 Request Handling with Communicating Data Centres

In Section 2 we specified how a data centre $d$ (treated as an agent) handles a request received from an agent $a$ with $\textit{home\/}(a)=d$ . We distinguished read requests read( $p_{i},\varphi$ ) subject to a read-policy and write requests subject to a write-policy. Now, we first specify an abstract rule which manages external requests, i.e. coming from an agent $a$ and received by any data centre $d$ , where request is one of these read or write requests. Essentially an external request is forwarded as internal request to all other data centres $d^{\prime}\in\mathcal{D}_{i}$ , where it is handled and answered locally (see the definition of HandleLocally below), whereas collecting (in $\text{answer}(p_{i},\varphi)$ ) and sending the overall answer to the external agent $a$ is delegated to a new agent $a^{\prime}$ . SELF denotes the data centre agent $d$ which executes the rule.

[TABLE]

To Initialize a delegate $a$ it is equipped with a set $\text{answer}_{a}$ , where to collect the values arriving from the asked data centres and with counters $\textit{count\/}_{a}(j,d)$ (for the number of inspected replicas of the $j$ -th fragment at data centre $d$ ) and $\textit{count\/}_{a}(j)$ (for the number of inspected replicas of the $j$ -th fragment). The counters serve to check compliance with policies. The $mediator$ and $requestor$ information serves to retrieve sender and receiver once the delegate could complete the answer to the request it has been created for.

In this way the request handling agent $d$ simply forwards the request to all other data centre agents and in parallel handles the request locally for all nodes associated with $d$ . The newly created agent (a ‘delegate’) will take care of collecting all response messages and preparing the response to the issuing agent $a$ . Request handling by any other data centre $d^{\prime}$ is simply done locally using the following rule.

[TABLE]

That is we equip each data centre (agent) $d$ (of a cluster $\mathcal{D}_{i}$ related to $p_{i}$ ) with the following ASM program, where the components HandleLocally and the two versions of CollectRespond are defined below.

[TABLE]

For local request handling we preserve most of what was specified in Section 2 with the difference that checking the policy is not performed by the data centre agent but by the delegate of the request; check the predicates all_messages_received and sufficient(policy) below. We use a predicate alive to check, whether a node is accessible or not. A possible interpretation of this liveness concept is that a node is inaccessible if it does not reply fast enough. For a read request we specify HandleLocally $(\text{read}(p_{i},\varphi),a^{\prime},t_{\text{current}})$ as follows.

[TABLE]

Here we evaluate the request locally, but as the determined maximal timestamp may not be globally maximal, it is part of the returned relation. Also the number of replicas that are alive and thus contributed to the local result is returned, such that the delegate $a^{\prime}$ responsible for collection and final evaluation of the request can check the satisfaction of the read-policy. For this the created partial result is not returned to the agent $d$ that issued this local request, but instead to the delegate (the collection agent).

For the write requests we proceed analogously. For the handling of an update request the rule HandleLocally $(\text{write}(p_{i},p),a^{\prime},t_{\text{current}})$ is specified as follows.

[TABLE]

Again, the partial results acknowledging the updates at the nodes associated with data centre $d^{\prime}$ are sent to the collecting agent $a^{\prime}$ , which will verify the compliance with the write-policy. Note that even a non-successful update at a location $(p_{i,j,d^{\prime},j^{\prime}},\vec{k})$ —this may result, if already a value with a larger timestamp exists—will be counted for $|G_{i,j,d^{\prime}}|$ .

For the delegate $a^{\prime}$ that has been created by $d$ for a request with the task to collect partial responses and to create the final response to the agent $a$ issuing the request we need predicates sufficient(policy) to check, whether the read and write policies are fulfilled, in which case the response to $a$ is prepared and sent. These predicates are defined for each $i$ as follows.

[TABLE]

It remains to specify the delegate rules CollectRespondToRead and CollectRespondToWrite, the programs associated with the agent $a^{\prime}$ , which was created upon receiving a $request$ from an agent $a$ . The CollectRespond action is performed until all required messages have been received and splits into two rules for read and write requests, respectively.

While being alive, the delegate $a^{\prime}$ collects the messages it receives from the data centres $d^{\prime}$ to which the original $request$ had been forwarded to let them HandleLocally $(request,a^{\prime})$ . If the set of collected answers suffices to respond, the delegate sends an answer to the original requester and kills itself.212121We leave it to the garbage collector to deal with later arriving messages, that is messages addressed to a delegate which has been deleted already. Thus each of the rules of CollectRespond has a Collect and a Respond subrule with corresponding parameter for the type of expected messages.

[TABLE]

The analogous collection of messages for write requests is simpler, as the final response is only an acknowledgement.

[TABLE]

Note that in our specification we do not yet deal with exception handling. We may tacitly assume that eventually all requested answers will be received by the collecting agent.

4.2 Analysis of the Refinement

First we investigate the interaction of one agent $a$ with the database. By $\mathcal{CM}_{1}$ we already denoted the abstract concurrent ASM $\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(d,\textit{asm\/}_{d})\}_{d\in\mathcal{D}}$ from Section 2. Now let $\mathcal{CM}_{2}$ denote the refined concurrent ASM $\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(d,\textit{asm\/}_{d}^{\prime})\}_{d\in\mathcal{D}\cup\mathcal{E}xt}$ together with the dynamic set $\mathcal{E}xt$ of delegates from Subsection 4.1. Note that the delegates are created on-the-fly by the agents $d\in\mathcal{D}$ , needed for collecting partial responses for each request and preparing the final responses.

Let $S_{0},S_{1},\dots$ be a concurrent run of $\mathcal{CM}_{1}$ , which we first look at from the perspective of a single agent $a\in\mathcal{A}$ . In each of its non purely local steps $\textit{asm\/}^{c}_{a}$ produces an update set, which together with update sets produced by other agents is applied to some state (one of the $S_{j}$ ). Let $S$ (another $S_{i}$ with $j<i$ ) denote the resulting state. Apart from updates to non-shared locations with function symbols in $\Sigma_{a}$ the updates brought into the state $S$ are read and write requests sent to the memory management subsystem. Then agent $a$ continues in some state $S^{\prime}=S_{i+x}$ evaluating the responses to the read and write requests it had sent. Furthermore, there is another state $S^{\prime\prime}=S_{i+y}$ ( $y\geq x$ ), where all updates issued by write requests of $a$ have been propagated to all replicas.

The transition from $S$ to $S^{\prime}$ is achieved by means of a step of $\textit{asm\/}_{d}$ for $d=\text{{home\/}}(a)$ , and the same holds for the transition from $S^{\prime}$ to $S^{\prime\prime}$ . Therefore, there exist corresponding states $\bar{S}$ , $\bar{S}^{\prime}$ for $\mathcal{CM}_{2}$ , in which the unchanged $\textit{asm\/}^{c}_{a}$ brings in the same read and write requests and receives the last response, respectively. In a concurrent run for $\mathcal{CM}_{2}$ the transition from $\bar{S}$ to $\bar{S}^{\prime}$ results from several steps by the subsystem $\{(d,\textit{asm\/}_{d}^{\prime})\}_{d\in\mathcal{D}\cup\mathcal{E}xt}$ .

With respect to each of the requests received from $a$ the agent $d=\text{{home\/}}(a)$ contributes to a state $\bar{S}_{1}$ with requests for each agent $d^{\prime}\in\mathcal{D}$ , $d^{\prime}\neq d$ , the creation and initialisation of a response collecting agent $a^{\prime}$ , and the local handling of the request at nodes associated with data centre $d$ . Then each agent $d^{\prime}\in D$ contributes to some state $\bar{S}_{k}$ ( $k>1$ ), in which the partial response to the request sent to $d^{\prime}$ is produced. Concurrently the collection agent $a^{\prime}$ on receipt of a partial response updates its own locations—these comprise the read response $\varrho$ , the acknowledgement responses and several counters—and thus contributes to some state $\bar{S}_{j}$ ( $j>1$ ). Finally, $a^{\prime}$ will also produce and send the response to $a$ . This response will be the same as the one in state $S^{\prime}$ , if the refined run from $\bar{S}$ to $\bar{S}^{\prime}$ uses the same selection of copies for each request referring to $p_{i}$ and each fragment $\textit{Frag}_{j,i}$ .

Furthermore, some partial responses may arrive in states $\bar{S}_{j}$ with $j>k$ for $\bar{S}^{\prime}=\bar{S}_{k}$ . These are not processed by $a^{\prime}$ any more, but by the garbage collector. Thus the state $\bar{S}^{\prime\prime}$ , when all updates have been propagated, corresponds to state $S^{\prime\prime}$ . Taking these considerations together we have shown the following lemma.

Lemma 1

The concurrent ASM $\mathcal{CM}_{2,a}=\{(a,\textit{asm\/}^{c}_{a})\}\cup\{(d,\textit{asm\/}_{d}^{\prime})\}_{d\in\mathcal{D}\cup\mathcal{E}xt}$ is a complete refinement of $\mathcal{CM}_{1,a}=\{(a,\textit{asm\/}^{c}_{a})\}\cup\{(d,\textit{asm\/}_{d})\}_{d\in\mathcal{D}}$ .

Now consider the actions of all agents $a\in\mathcal{A}$ together. The arguments above remain valid, except that for the sequence $\bar{S},\bar{S}_{1},\bar{S}_{2},\dots,\bar{S}^{\prime},\dots,\bar{S}^{\prime\prime}$ there are many intermediate states, in which updates of other agents and other data centres for other requests are brought in. A partial response for a request by $a$ and the effects on values in the replicas may in general differ, as different partial responses may result depending on the order of request handling. However, there exists a run of $\mathcal{CM}_{2}$ such that the projection to $\mathcal{CM}_{2,a}$ is exactly the sequence $\bar{S},\bar{S}_{1},\bar{S}_{2},\dots,\bar{S}^{\prime},\dots,\bar{S}^{\prime\prime}$ investigated before. This gives us the following result.

Proposition 3

$\mathcal{CM}_{2}=\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(d,\textit{asm\/}_{d}^{\prime})\}_{d\in\mathcal{D}\cup\mathcal{E}xt}$ * is a complete refinement of $\mathcal{CM}_{1}=\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(d,\textit{asm\/}_{d})\}_{d\in\mathcal{D}}$ .*

4.3 Consistency Analysis

While we have obtained a complete refinement, $\mathcal{CM}_{2}$ permits more possibilities to perform updates to replicas and reading values from replicas, respectively, in different orders. Unfortunately, view compatibility (as in Proposition 2) cannot be preserved, as the following simple counterexample shows:

Consider a single location $(p_{i},\vec{k}))$ for a fixed $\vec{k}$ . For simplicity forget about the relation symbols $p_{i}$ and let this location simply be $x$ . Assume that there are at least two replicas (say $x_{1}$ and $x_{2}$ ), which both are set to the initial value [math] (with some timestamp $t_{0}$ ). Let the write policy be All and the read policy be One (which according to Proposition 2 is an appropriate combination of policies that leads to view compatibility for the abstract specification $\mathcal{CM}_{1}$ ).

Let $a_{1}$ be an agent issuing a write request SEND(write( $x$ ,1)), and let $a_{2}$ be another agent issuing a sequence of two read requests SEND(read( $x$ )). Then in $\mathcal{CM}_{2}$ the following sequence is possible:

The write request by $a_{1}$ gives rise to the update of replica $x_{1}$ , so the value is set to $(1,t)$ with some timestamp $t>t_{0}$ . 2. 2.

For the first read request by $a_{2}$ only the replica $x_{1}$ is evaluated, and thus the returned answer is $1$ . 3. 3.

For the following second read request by $a_{2}$ only the replica $x_{2}$ is evaluated, and thus the returned answer is [math]. 4. 4.

All other updates required for the completion of the write request by $a_{1}$ happen after the completion of the two read requests.

Obviously, there cannot be a run of the communicating concurrent ASM without replication that produces the same answers to the read and write requests.

From the counterexample we can even conclude that there is no reasonable weaker definition of consistency such that an appropriate combination of read- and write-policies alone, even under strong assumptions that no bulk requests are considered, suffices to guarantee consistency. For such a definition the run in the counterexample would have to be called “consistent”, which does not make much sense.

The question is whether additional conditions could be enforced in the specification that would lead again to view compatibility. As we will see, any such condition already implies serialisability. Therefore, we conclude our analysis by showing this relationship between view compatibility and (view) serialisability, which implies that the best and well explored way to ensure consistency is to exploit transactions (at least for single read and write requests).

Let us first define the notion of serialisability. For this consider an arbitrary run $\mathcal{R}=S_{0},S_{1},\dots$ of $\mathcal{CM}_{2}$ . A request is either a read request of the form $(\text{read}(p_{i},\varphi),\text{from:}a)$ or a write request of the form $(\text{write}(p_{i},p),\text{from:}a)$ . If $S_{j}$ is the first state, in which RECEIVED( $r$ ) holds for such a request $r$ , we call $S_{j}$ the state, in which the request $r$ is issued, and write $\sigma(r)=j$ as well as $ag(r)=a$ . Analogously, a response $r$ takes either the form $(\text{answer}(p_{i},\varphi),\text{to:}a)$ or $(\text{acknowledge}(\text{write},p_{i}),\text{to:}a)$ . We call the state $S_{k}$ , in which SEND( $r$ ) becomes effective, the state, in which the response $r$ is issued, and write also $\sigma(r)=j$ as well as $ag(r)=a$ . Then $\sigma$ defines a partial order $\preceq$ on the set of requests and responses in the run as well as an equivalence relation $\sim$ for simultaneous requests and responses. Furthermore, there is a bijection ans that maps each request to its corresponding response.

In a serial run a request is immediately followed by its corresponding response without any other request or response in between, but requests may be handled in parallel. Thus, formally we call the run $\mathcal{R}$ serial iff the following two conditions hold:

If $r$ is a request and $r^{\prime}$ is another request or response with $\sigma(r)\leq\sigma(r^{\prime})\leq\sigma(\textit{ans\/}(r))$ , then either $r\sim r^{\prime}$ or $r^{\prime}\sim\textit{ans\/}(r)$ hold. 2. 2.

For two simultaneous requests $r\sim r^{\prime}$ we also have $\textit{ans\/}(r)\sim\textit{ans\/}(r^{\prime})$ .

Definition 4

Two runs $\mathcal{R}$ and $\mathcal{R}^{\prime}$ are called view equivalent iff the following two conditions hold:

$\mathcal{R}$ and $\mathcal{R}^{\prime}$ contain exactly the same requests and responses. In particular, in a response $(\text{answer}(p_{i},\varphi),\text{to:}a)$ to a read request $(\text{read}(p_{i},\varphi),\text{from:}a)$ the sets $\text{answer}(p_{i},\varphi)$ of tuples are identical in both runs. 2. 2.

For each agent $a$ the sequence of its requests and responses is identical in both runs, i.e. if $\sigma(r)<\sigma(r^{\prime})\wedge ag(r)=ag(r^{\prime})=a$ holds in $\mathcal{R}$ , it also holds in $\mathcal{R}^{\prime}$ .

Then a run $\mathcal{R}$ is called view serialisable iff there exists a serial run $\mathcal{R}^{\prime}$ that is view equivalent to $\mathcal{R}$ .

Informally phrased, a run is view serialisable if there exists a serial run, in which for each agent the requests and corresponding responses appear in the same order and the same results are produced.

Proposition 4

If $\mathcal{CM}_{2}$ is view compatible with the concurrent ASM $\mathcal{CM}_{0}=\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(db,\textit{asm\/}_{db})\}$ , then every run $\mathcal{R}$ of $\mathcal{CM}_{2}$ is view serialisable.

Proof

Let $\mathcal{R}=S_{0},S_{1},\dots$ be a run of $\mathcal{CM}_{2}$ . The definition of view compatibility implies that there exists a subsequence of a flattening $\mathcal{R}^{\prime}=S_{0}^{\prime},S_{1}^{\prime},S_{2}^{\prime},\dots$ that is a run of $\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(db,\textit{asm\/}_{db})\}$ such that for each agent $a\in\mathcal{A}$ the agent $a$ -view of $\mathcal{R}^{\prime}$ coincides with a flat view of $\mathcal{R}$ by $a$ . Let $\Delta_{\ell}^{\prime}$ be the update set defined by $S_{\ell}^{\prime}+\Delta_{\ell}^{\prime}=S_{\ell+1}^{\prime}$ . Define

[TABLE]

using timestamps $t_{0}<t_{1}<t_{2}<\dots$ . This defines a run $\bar{\mathcal{R}}=\bar{S}_{0},\bar{S}_{1},\dots$ of $\mathcal{CM}_{2}$ with $\bar{S}_{0}=S_{0}$ and $\bar{S}_{\ell+1}=\bar{S}_{\ell}+\Delta_{\ell}$ . In this run the answer to a read request is executed in a single step and all updates requested by a write request are executed in a single step. Thus $\mathcal{R}^{\prime}$ is serial.

Furthermore, as for each agent $a\in\mathcal{A}$ the agent $a$ -view of $\mathcal{R}^{\prime}$ coincides with a flat view of $\mathcal{R}$ by $a$ , the runs runs $\mathcal{R}$ and $\bar{\mathcal{R}}$ contain exactly the same requests and responses, and for each agent $a$ the sequence of its requests and responses is identical in both runs. That is, $\mathcal{R}$ and $\bar{\mathcal{R}}$ are view equivalent.∎

We can finally also show the inverse of the implication in Proposition 4, provided that an appropriate combination of read- and write-policies is used.

Proposition 5

If all runs of $\mathcal{CM}_{2}$ are view serialisable and an appropriate combination of a read and a write policy is used, then $\mathcal{CM}_{2}$ is also view compatible with the concurrent ASM $\mathcal{CM}_{0}=\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(db,\textit{asm\/}_{db})\}$ .

Proof

Let $\mathcal{R}^{\prime}$ be a run of $\mathcal{CM}_{2}$ and let $\mathcal{R}^{\prime}=S_{0},S_{1},\dots$ be a view equivalent serial run. It suffices to show that that there exists a subsequence of a flattening $\mathcal{R}^{\prime}=S_{0}^{\prime},S_{1}^{\prime},S_{2}^{\prime},\dots$ that is a run of $\{(a,\textit{asm\/}^{c}_{a})\}_{a\in\mathcal{A}}\cup\{(db,\textit{asm\/}_{db})\}$ such that for each agent $a\in\mathcal{A}$ the agent $a$ -view of $\mathcal{R}^{\prime}$ coincides with a flat view of $\mathcal{R}$ by $a$ .

For this we only have to consider states, in which a request or a response is issued to obtain the desired subsequence, and the flattening is defined by the answers to the write requests. As $\mathcal{R}$ is serial, the condition that for each agent $a$ the $a$ -view concides with a flat $a$ -view follows immediately.∎

5 Conclusions

In this paper we demonstrated the maturity of concurrent communicating ASMs (ccASMs) [7, 8] showing how they can be used to specify and analyse concurrent computing systems in connection with shared replicated memory. Using ccASMs we first specified a ground model, in which all access to replicas is handled synchronously in parallel by a single agent.

We then refined our ground model addressing the internal communication in the memory management subsystem. This refinement significantly changes the way requests are handled, as the replicas are not selected a priori in a way that complies with the read- or write-policy, but instead the acknowledgement and return of a response depends on these policies. This adds an additional level of flexibility to the internal request handling.

We used the specification to analyse consistency and showed that consistency, formalised by the notion of view compatibility, cannot be preserved by the refinement. To the contrary, we could show that even such a rather weak notion of consistency can only be obtained by adopting transactions for at least single requests.

The refinements could be taken further to capture more and more details of the physical data organisation. For instance, we did not yet tackle the means for handling inactive nodes and for recovery. Nonetheless, while our refined specification is still rather abstract, it shows the way how concurrent systems interact with replicative memory management subsystems, and permits analysis of which consistency level can be obtained.

It seems straightforward to combine a transactional concurrent system as for example in [9] with the specification of a replicative storage system as done in this paper. In particular, it is well known that replication strategies can be easily combined with transaction management. This would then enable stronger consistency results.

Bibliography26

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] G. Agha. A Model of Concurrent Computation in Distributed Systems . MIT Press, Cambridge, Mass., 1986.
2[2] W. An. Formal specification and analysis of asynchronous mutual exclusion algorithms. Master’s thesis, JKU Linz, Austria, 2016.
3[3] E. Best. Semantics of sequential and parallel programs . Prentice Hall, 1996.
4[4] A. Blass and Y. Gurevich. Abstract State Machines capture parallel algorithms. ACM Trans. Computational Logic , 4(4):578–651, 2003.
5[5] A. Blass and Y. Gurevich. Abstract State Machines capture parallel algorithms: Correction and extension. ACM Trans. Comp. Logic , 9(3), 2008.
6[6] E. Börger, A. Cisternino, and V. Gervasi. Ambient abstract state machines with applications. J. Comput. Syst. Sci. , 78(3):939–959, 2012.
7[7] E. Börger and K.-D. Schewe. Concurrent abstract state machines. Acta Inf. , 53(5):469–492, 2016.
8[8] E. Börger and K.-D. Schewe. Communication in Abstract State Machines. J. Univ. Comp. Sci. , 23(2):129–145, 2017.