This paper introduces a load-buffer semantics for TSO that simplifies safety verification, significantly improves efficiency and scalability, and enables parametric verification for systems with an arbitrary number of processes.
Contribution
It proposes an alternative semantics using load buffers, making TSO safety analysis more efficient and extendable to parametric verification, which was difficult with previous store-buffer based methods.
Findings
01
Simplifies TSO safety analysis
02
Achieves significant efficiency and scalability improvements
03
Enables parametric verification for arbitrary process counts
Abstract
We address the problem of verifying safety properties of concurrent programs running over the Total Store Order (TSO) memory model. Known decision procedures for this model are based on complex encodings of store buffers as lossy channels. These procedures assume that the number of processes is fixed. However, it is important in general to prove the correctness of a system/algorithm in a parametric way with an arbitrarily large number of processes. In this paper, we introduce an alternative (yet equivalent) semantics to the classical one for the TSO semantics that is more amenable to efficient algorithmic verification and for the extension to parametric verification. For that, we adopt a dual view where load buffers are used instead of store buffers. The flow of information is now from the memory to load buffers. We show that this new semantics allows (1) to simplify drastically the…
Tables2
Table 1. Table 1. Comparison between Dual-TSO ( ⊑ ) square-image-of-or-equals (\sqsubseteq) and Memorax :
The columns Safe under SC and Safe under TSO indicate that whether the benchmark is safe under SC and TSO wrt. its reachability problem respectively.
The columns # P # 𝑃 \#P , # T # 𝑇 \#T and # C # 𝐶 \#C give the number of processes, the running time in seconds and the number of generated configurations, respectively. If a tool runs out of time, we put t / o 𝑡 𝑜 t/o in the # T # 𝑇 \#T column and ∙ ∙ \bullet in the # C # 𝐶 \#C column.
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Full text
\lmcsheading
1–LABEL:LastPageFeb. 01, 2017Jan. 23, 2018
\ACMCCS[Software and its engineering]: Software organization and properties—Software functional properties—Formal methods—Software verification
\titlecomment\lsuper
A preliminary version of this paper appeared as
at CONCUR’16 [AABN16].
A Load-Buffer Semantics for Total Store Ordering\rsuper*
We address the problem of verifying safety properties of concurrent programs running over the Total Store Order (TSO) memory model. Known decision procedures for this model are based on complex encodings of store buffers as lossy channels. These procedures assume that the number of processes is fixed. However, it is important in general to prove the correctness of a system/algorithm in a parametric way with an arbitrarily large number of processes.
In this paper, we introduce an alternative (yet equivalent) semantics to the classical one for the TSO semantics that is more amenable to efficient algorithmic verification and for the extension to parametric verification. For that, we adopt a dual view where load buffers are used instead of store buffers. The flow of information is now from the memory to load buffers. We show that this new semantics allows (1) to simplify drastically the safety analysis under TSO, (2) to obtain a spectacular gain in efficiency and scalability compared to existing procedures, and (3) to extend easily the decision procedure to the parametric case, which allows obtaining a new decidability result, and more importantly, a verification algorithm that is more general and more efficient in practice than the one for bounded instances.
Key words and phrases:
Total Store Order, Weak Memory Models, Reachability Problem, Parameterized Systems, Well-quasi-ordering
This work was supported in part by the Swedish Research Council and carried out within the Linnaeus centre of excellence UPMARC, Uppsala Programming for Multicore Architectures Research Center.
1. Introduction
Most modern processor architectures execute instructions in an
out-of-order manner to gain efficiency.
In the context of sequential programming, this
out-of-order execution is transparent to the programmer
since one can still work
under the Sequential Consistency (SC) model [Lam79].
However, this is not true when
we consider concurrent processes that share the memory.
In fact, it turns out that concurrent algorithms
such as mutual exclusion and producer-consumer
protocols may not behave correctly any more.
Therefore, program verification is a relevant
(and difficult) task in order to prove correctness under the new
semantics.
The out-of-order execution of instructions
has led to the invention of new program semantics, so called
Weak (or relaxed) Memory Models (WMMs),
by allowing permutations between certain types of memory operations
[AG96, DSB86, AH90].
Total Store Ordering (TSO) is one of the the most common models, and
it corresponds to the relaxation
adopted by Sun’s SPARC multiprocessors
[WG94] and formalizations of the
x86-TSO memory model [OSS09, SSO*+*10].
These models put an unbounded perfect (non-lossy) store buffer
between each process and the main memory where
a store buffer carries the pending store operations of the process.
When a process performs a store operation, it appends it to the end of
its buffer.
These operations are propagated to the shared memory non-deterministically
in a FIFO manner.
When a process reads a variable, it searches its buffer
for a pending store operation on that variable.
If no such a store operation exists, it fetches the value
of the variable from the main memory.
Verifying programs running on the TSO memory model poses
a difficult challenge since the unboundedness
of the buffers implies that
the state space of the system is infinite even in the case
where the input program is finite-state.
Decidability of safety properties
has been obtained by constructing equivalent models
that replace the perfect store buffer by lossy channels
[ABBM10, ABBM12, AAC*+*12a].
However, these constructions are complicated and
involve several ingredients that lead to
inefficient verification procedures.
For instance, they require each message inside a lossy channel to carry
(instead of a single store operation)
a full snapshot of the memory representing a local view of the memory
contents by the process.
Furthermore, the reductions involve non-deterministic guessing the lossy channel contents.
The guessing is then resolved either by consistency checking
[ABBM10] or by using explicit pointer variables
(each corresponding to one process) inside the buffers
[AAC*+*12a], causing
a serious state space explosion problem.
In this paper, we introduce a novel semantics which we call
the Dual TSO semantics. Our aim is to provide an alternative (and equivalent) semantics that is more amenable for efficient algorithmic verification.
The main idea is to have load buffers that contain
pending load operations (more precisely, values that will potentially be taken by forthcoming load operations) rather than store buffers (that contain store operations).
The flow of information will now be in the reverse direction, i.e.,
store operations are performed by the processes atomically on the main memory,
while values of variables are propagated non-deterministically from the memory to the load
buffers of the processes.
When a process performs a load operation, it can fetch the value of the variable
from the head of its load buffer.
We show that the Dual TSO semantics is equivalent to the original
one in the sense that any given set of processes
will reach the same set of local
states under both semantics.
The Dual TSO semantics allows us to understand
the TSO model
in a totally different way compared to the classical semantics.
Furthermore, the Dual TSO semantics offers several important advantages
from the point of view of formal reasoning and program verification.
First, the Dual TSO semantics allows transforming the load buffers
to lossy channels without adding the
costly overhead that was necessary in the case of store buffers. This means that we can assume w.l.o.g. that any message in the load buffers (except a finite number of messages) can be lost in non-deterministic manner. Hence, we can apply the theory of
well-structured systems [Abd10, ACJT96, FS01] in a straightforward manner leading
to a much simpler proof of decidability of safety properties.
Second, the absence of extra overhead means that we obtain
more efficient algorithms and better scalability
(as shown by our experimental results).
Finally, the Dual TSO semantics allows extending the framework to
perform parameterized verification which is an
important paradigm in concurrent program verification.
Here, we consider systems, e.g., mutual exclusion protocols,
that consist of an arbitrary number
of processes.
The aim of parameterized verification is
to prove correctness of the system regardless
of the number of processes.
It is not obvious how
to perform parameterized verification under the classical semantics.
For instance, extending the framework of
[AAC*+*12a],
would involve an unbounded number of pointer variables, thus
leading to channel systems with unbounded message
alphabets.
In contrast, as we show in this paper, the simple nature
of the Dual TSO semantics allows a straightforward extension
of our verification algorithm to the case of parameterized verification. This is the first time a decidability result is established for the parametrized verification of programs running over WMMs. Notice that this result is taking into account two sources of infinity: the number of processes and the size of the buffers.
Based on our framework, we have implemented a tool and applied
it to a large set of benchmarks.
The experiments demonstrate the efficiency of the Dual TSO semantics compared to the classical one (by two order of magnitude in average), and the feasibility of parametrized verification in the former case. In fact, besides its theoretical generality, parametrized verification is practically crucial in this setting: as our experiments show, it is much more efficient than verification of bounded-size instances (starting from a number of components of 3 or 4), especially concerning memory consumption (which also is a critical resource).
Related Work.
There have been a lot of works related to the analysis of programs running under WMMs (e.g., [LNP*+*12, KVY10, KVY11, DMVY13, AAC*+*12a, BM08, BSS11, BDM13, BAM07, YGLS04, AALN15, AAC*+*12b, AAJL16, DMVY17, TW16, LV16, LV15, Vaf15, HVQF16]).
Some of these works propose precise analysis techniques for checking safety properties or stability of finite-state programs under WMMs (e.g., [AAC*+*12a, BDM13, DM14, AAP15, AALN15]).
Others propose context-bounded analyzing techniques (e.g., [ABP11, TLI*+*16, TLF*+*16, AABN17]) or stateless model-checking techniques (e.g., [AAA*+*15, ZKW15, DL15, HH16]) for programs under TSO and PSO. Different other techniques based on monitoring and testing have also been developed during these last years (e.g., [BM08, BSS11, LNP*+*12]). There are also a number of efforts to design bounded model checking techniques for programs under WMMs (e.g., [AKNT13, AKT13, YGLS04, BAM07]) which encode the verification problem in SAT/SMT.
The closest works to ours are those presented in [AAC*+*12a, ABBM10, AAC*+*13, ABBM12] which provide precise and sound techniques for checking safety properties for finite-state programs running under TSO. However, as stated in the introduction, these techniques are complicated and can not be extended, in a straightforward manner, to the verification of parameterized systems (as it is the case of the developed techniques for the Dual TSO semantics).
In Section 6, we experimentally compare our techniques with Memorax [AAC*+*12a, AAC*+*13] which is the only precise and sound tool for checking safety properties for concurrent programs under TSO.
2. Preliminaries
Let Σ be a finite alphabet. We use Σ∗ (resp. Σ+) to denote the set
of all words (resp. non-empty words) over Σ. Let ϵ be the empty word.
The length of a word w∈Σ∗ is denoted by ∣w∣ (and in particular ∣ϵ∣=0).
For every i:1≤i≤∣w∣,
let w(i) be the symbol at position i in w.
For a∈Σ, we write a∈w if a appears in w, i.e.,
a=w(i) for some i:1≤i≤∣w∣.
Given two words u and v over Σ, we use u⪯v to denote that u is a (not necessarily contiguous) subword of v, i.e., if there is an injection h:{1,…,∣u∣}↦{1,…,∣v∣} such that: (1)h(i)<h(j) for all i,j:1≤i<j≤∣u∣ and (2) for every i:1≤i≤∣u∣, we have u(i)=v(h(i)).
Given a subset Σ′⊆Σ and a word w∈Σ∗, we use w∣Σ′ to denote the projection of w over Σ′, i.e., the word obtained from w by erasing all the symbols that are not in Σ′.
Let A and B be two sets and let f:A↦B be a total function from A to B. We use f[a↩b]
to denote the function g such that g(a)=b and g(x)=f(x) for all x=a.
A transition system T is a tuple (C,Init,Act,∪a∈Acta) where
C is a (potentially infinite) set of configurations;
Init⊆C is a set of initial configurations; Act is a set of actions;
and for every a∈Act, a⊆C×C
is a transition relation.
We use cac′ to denote that
(c,c′)∈a. Let :=∪a∈Acta.
A runπ of T is of the form
c0a1c1a2⋯ancn
where
ciai+1ci+1 for all i:0≤i<n.
Then, we write
c0πcn.
We use target(π) to denote the configuration cn.
The run π is said to be a computation
if c0∈Init.
Two runs
π1=c0a1c1a2⋯amcm and
π2=cm+1am+2cm+2am+3⋯ancn
are compatible if
cm=cm+1.
Then, we write
π1∙π2 to denote the run
[TABLE]
For two configurations c and c′, we use
c∗c′ to denote that
cπc′ for some run π.
A configuration c is said to be
reachable in T if c0∗c for some
c0∈Init, and a set C of configurations
is said to be
reachable in T if some c∈C is reachable in T.
3. Concurrent Systems
In this section, we define the syntax we use for concurrent programs, a model for representing communication of concurrent processes. Communication between processes is performed through a shared memory that consists of a finite number of shared variables
(over finite domains) to which all processes can read and write. Then we recall the classical TSO semantics including the transition system it induces and its reachability problem. Next, we introduce the Dual TSO semantics and its induced transition system. Finally, we state the equivalence between the two semantics; i.e., for a given concurrent program, we can reduce its reachability problem under the classical TSO semantics to its reachability problem under Dual TSO semantics and vice-versa.
3.1. Syntax
Let V be a finite data domain and X be a finite set of variables. We assume w.l.o.g. that V contains the value [math]. Let Ω(X,V) be the smallest set of memory operations that contains with x∈X
and v,v′∈V:
(1)
“no” operation nop,
2. (2)
read operation r(x,v),
3. (3)
write operation w(x,v),
4. (4)
fence operation fence, and
5. (5)
atomic read-write operation
arw(x,v,v′).
A concurrent system (or a concurrent program) is a tuple
P=(A1,A2,…,An) where for every p:1≤p≤n, Ap is a finite-state automaton describing the behavior of the process p.
The automaton Ap is defined as a triple
(Qp,qpinit,Δp) where
Qp is a finite set of local states,
qpinit∈Qp is the initial local state,
and Δp⊆Qp×Ω(X,V)×Qp is a finite set of transitions.
We define P:={1,…,n} to be the set of process IDs, Q:=∪p∈PQp to be the set of all local states and Δ:=∪p∈PΔp to be the set of all transitions.
{exa}
Figure 1 shows an example of a concurrent system P={A1,A2} consisting of two concurrent processes,
called p1 and p2. Communication between processes is performed through two shared variables x and y
to which the processes can read and write.
The automaton
A1
is defined as a triple
({q0,q1,q2},{q0},{(q0,w(x,2),q1),(q1,r(y,0),q2)}).
Similarly,
A2=({q0′,q1′,q2′,q3′},{q0′},{(q0′,w(y,1),q1′),(q1′,w(x,1),q2′),(q2′,r(x,2),q3′)}).
△
3.2. Classical TSO Semantics
In the following, we recall the semantics of concurrent systems under the classical TSO model as formalized in [OSS09, SSO*+*10]. To do that, we define the set of configurations and the induced transition relation. Let P=(A1,A2,…,An) be a concurrent system.
TSO-configurations.
A TSO-configurationc is a triple (q,b,mem)
where:
(1)
q:P↦Q is the global state of P, mapping each process p∈P to a local state in Qp (i.e.,
q(p)∈Qp).
2. (2)
b:P↦(X×V)∗ gives the content of the store buffer of each process.
3. (3)
mem:X↦V defines the value of each shared variable.
Observe that the store buffer of each process contains a sequence of write operations, where
each write operation is defined by a pair, namely a variable
x and a value v that is assigned to x.
The initial TSO-configuration cinit is defined by the tuple (qinit,binit,meminit)
where, for all p∈P and x∈X, we have that
qinit(p)=qpinit,
binit(p)=ϵ and meminit(x)=0.
In other words, each process is in its initial local state, all
the buffers are empty, and all the variables in the shared memory are initialized to [math].
We use
CTSO to denote the set of all TSO-configurations.
TSO-transition Relation.
The transition relationTSO between TSO-configurations is given by a set of rules, described in Figure 2. Here, we informally explain these rules.
A nop transition (q,nop,q′)∈Δp changes only the local state of the process p from q to q′.
A write transition (q,w(x,v),q′)∈Δp adds a new message (x,v) to the tail
of the store buffer of the process p.
A memory update transition updatep can be performed at any time by removing the (oldest) message
at the head of the store buffer of the process p and updating the memory accordingly.
For a read transition (q,r(x,v),q′)∈Δp, if the store buffer of the process p contains some write operations to x, then the read value v must correspond to the value of the most recent such a write operation. Otherwise, the value v of x is fetched from the memory.
A fence transition (q,fence,q′)∈Δp can be performed by the process p only if its store buffer is empty.
Finally, an atomic read-write transition (q,arw(x,v,v′),q′)∈Δp can be performed by the process p only if its store buffer is empty. This transition checks whether the value of x in the memory is v and then changes it to v′.
Let Δ′:={updatep∣p∈P}, i.e., Δ′ contains all memory update transitions.
We use cTSOc′ to denote that
ctTSOc′ for some
t∈Δ∪Δ′.
The transition system induced by P under the classical TSO semantics is then given by
TTSO:=(CTSO,{cinit},Δ∪Δ′,TSO).
The TSO Reachability Problem.
A global state qtarget is said to be reachable in TTSO if and only if
there is a TSO-configuration c of the form (qtarget,b,mem), with b(p)=ϵfor allp∈P, such that c is reachable in TTSO.
The TSO reachability problem for the concurrent system P under the TSO semantics asks,
for a given global state qtarget, whether qtarget is reachable in TTSO.
Observe that, in the definition of the reachability problem, we require that the buffers of the configuration c must be empty instead of being arbitrary. This is only for the sake of simplicity and does not constitute a restriction. Indeed, we can easily show that the “arbitrary buffer” reachability problem is reducible to the “empty buffer” reachability problem.
{exa}
Figure 3 illustrates a TSO-configuration c
that can be reached from the initial configuration cinit
of the concurrent system in Figure 1.
To reach this configuration,
the process p1 has executed the write
transition
(q0,w(x,2),q1)
and
appended
the message
(x,2)
to its store buffer.
Meanwhile,
the process p2 has executed two write
transitions
(q0′,w(y,1),q1′)
and
(q1′,w(x,1),q2′).
Hence, the store buffer of p2 contains the sequence
(x,1)⋅(y,1).
Now, the process p1
can perform
the read transition
(q1,r(y,0),q2).
Since the buffer of p1 does not contain
any pending write message on y,
the read value
is fetched from the memory (represented by the dash arrow in Figure 3).
Then,
p1
and p2
perform
the following sequence
of update transitions
updatep2⋅updatep2⋅updatep1
to empty their buffers
and update the memory to x=2 and y=1.
Finally,
p2 performs the read transition
(q2′,r(x,2),q3′) (by
reading from the memory) to reach to the configuration ctarget given in Figure 4.
Observe that the buffers of both processes are empty in ctarget. Let qtarget be the global state
in ctarget
defined as follows:
qtarget(p1)=q2
and
qtarget(p2)=q3′. Therefore,
we can say that
the global state
qtarget
is reachable
in
TTSO.
△
3.3. Dual TSO Semantics
In this section, we define the Dual TSO semantics. The model has a FIFO load buffer between the main memory and each process.
This load buffer is used to store potential read operations
that will be performed by the process. We allow this buffer to lose messages at any time by deleting the messages at its head in non-deterministic manner. Each message in the load buffer of a process p is either a pair of the form (x,v) or a triple of the form (x,v,own) where x∈X and v∈V. A message of the form (x,v) corresponds to the fact that x has had the value v in the shared memory. Meanwhile, a message of the form (x,v,own) corresponds to the fact that the process p has written the value v to x.
We say that a message (x,v,own) is
an own-message.
A write operation w(x,v) of the process p immediately updates the shared memory and then appends a new own-message (x,v,own) to the tail of the load buffer of p.
Read propagation is then performed by non-deterministically choosing a variable (let’s say x and its value is v in the shared memory) and appending the new message (x,v) to the tail of the load buffer of p. This propagation operation speculates on a read operation of p on x that will be performed later on.
Moreover,
delete operation of the process p can be performed at any time
by removing the
(oldest) message at the head of the load buffer of p.
A read operation r(x,v) of the process p can be executed if the message at the head of the load buffer of p is of the form (x,v) and there is no pending own-message of the form (x,v′,own).
In the case that the load buffer of p contains some own-messages
(i.e., of the form (x,v′,own)), the read value must correspond to the value of the most recent such an own-message.
Implicitly, this allows to simulate the Read-Own-Write transitions in the TSO semantics.
A fence operation means that the load buffer of p
must be empty before p can continue.
Finally, an atomic read-write operation arw(x,v,v′) means that the load buffer of p
must be empty
and the value of the variable x in the memory is v before p can continue.
DTSO-configurations.
A DTSO-configurationc is a triple (q,b,mem)
where:
(1)
q:P↦Q is the global state of P.
2. (2)
b:P↦((X×V)∪(X×V×{own}))∗ is the content of the load buffer of each process.
3. (3)
mem:X↦V gives the value of each shared variable.
The initial DTSO-configuration cinitD is defined by (qinit,binit,meminit)
where, for all p∈P and x∈X, we have that
qinit(p)=qpinit,
binit(p)=ϵ and meminit(x)=0.
We use
CDTSO to denote the set of all DTSO-configurations.
DTSO-transition Relation.
The transition relationDTSO
between DTSO-configurations is given by a set of rules, described in Figure 5.
This relation is induced by members of
Δ∪Δ′′ where
Δ′′:={propagatepx,deletep∣p∈P,x∈X}.
We informally explain the transition relation rules.
The propagate transition propagatepx speculates on a read operation of p over x that will be executed later. This is done by appending a new message (x,v) to the tail of the load buffer of p where v is the current value of x in the shared memory.
The delete transition deletep removes the (oldest) message at the head of the load buffer of the process p.
A write transition (q,w(x,v),q′)∈Δp updates the memory and appends a new own-message (x,v,own) to the tail of the load buffer.
A read transition (q,r(x,v),q′)∈Δp checks first if the load buffer of p contains an own-message of the form (x,v′,own). In that case, the read value v should correspond to the value of the most recent
such an own-message. If there is no such message on the variable x in the load buffer of p, then the value v of x
is fetched from the (oldest) message at the head of the load buffer of p.
We use cDTSOc′ to denote that
ctDTSOc′ for some
t∈Δ∪Δ′′.
The transition system induced by P under the Dual TSO semantics is then given by
TDTSO=(CDTSO,{cinitD},Δ∪Δ′′,DTSO).
The DTSO Reachability Problem.
The DTSO reachability problem for P under the Dual TSO semantics is defined in a similar manner to the case of the TSO semantics. A global state qtarget is said to be reachable in TDTSO if and only if
there is a DTSO-configuration c of the form (qtarget,b,mem), with b(p)=ϵfor allp∈P, such that c is reachable in TDTSO. Then, the DTSO reachability problem consists in checking whether qtarget is reachable in TDTSO.
{exa}
Figure 6
illustrates a DTSO-configuration c′ that can be reached from the initial configuration cinitD of the concurrent system in Figure 1.
To reach this configuration, a propagation operation
is performed by appending the message (y,0)
into the load buffer of p1.
Then, the
process p2
executes two write transitions
(q0′,w(y,1),q1′)
and (q1′,w(x,1),q2′) that
update the shared memory to x=1 and y=1
and add two own-messages
to the tail of the load buffer
of p2.
Hence, the load buffer of p2
contains the sequence (x,1,own)⋅(y,1,own).
Then,
the process p1 executes
the write transition
(q0,w(x,2),q1) which updates
the shared memory and appendes
the own-message (x,2,own)
to the tail of the load buffer of p1.
After that,
a propagation operation appending
the message (x,2)
into the load buffer of
p2 is performed. Hence, the value of x (resp. y ) is 2 (resp. 1) in the shared memory. Furthermore, the load buffer of p1 (resp. p2)
contains the following sequence
(x,2,own)⋅(y,0) (resp,
(x,2)⋅(x,1,own)⋅(y,1,own)).
Now from the configuration c′ (given in Figure 6),
the process
p1 can perform
a read transition (q1,r(y,0),q2).
Since
there is no pending
own-message
of the form (y,v,own) for some v∈V
in the load buffer of p1,
p1 reads from the message at the head of its load buffer, i.e. the message (y,0) (represented by the dash arrow for p1).
Then,
p2 performs
two delete transitions
deletep2
to remove
two own-messages
at the head of its load buffer. Now, the process p2 can perform the read transition
(q2′,r(x,2),q3′) to read from its load buffer.
Finally,
p1 and p2 performs a sequence of delete transitions
deletep1⋅deletep1⋅deletep2 to empty their load buffers,
reaching to the configuration ctarget′ given in Figure 7. Let qtarget be the global state
in ctarget′
defined as follows:
qtarget(p1)=q2
and
qtarget(p2)=q3′. Therefore,
we can say that
the global state qtarget
in ctarget′
is reachable in
TDTSO.
△
3.4. Relation between TSO and DTSO Reachability Problems
The following theorem states the equivalence of the reachability problems under the TSO and Dual TSO semantics.
A global state qtarget is reachable in TTSO iff qtarget is reachable in TDTSO.
Proof 3.1**.**
The proof of this theorem can be found in Appendix A.
{exa}
In the Example 3.2 and Example 3.3,
we have shown that
the global state
qtarget (defined by
qtarget(p1)=q2
and qtarget(p2)=q3′)
is both reachable in
TTSO
and
TDTSO
for the concurrent system given in Figure 1.
△
4. The DTSO Reachability Problem
In this section, we show the decidability of the DTSO reachability problem by making use of the framework of Well-Structured Transition Systems (Wsts) [ACJT96, FS01]. First, we briefly recall the framework of Wsts. Then, we instantiate it to show the decidability of the DTSO reachability problem.
Following Theorem 1, we also obtain the decidability of the TSO reachability problem.
4.1. Well-structured Transition Systems
Let T=(C,Init,Act,∪a∈Acta) be a transition system. Let ⊑ be a well-quasi-ordering on C.
Recall that a well-quasi-ordering on C is a binary relation over C that is reflexive and transitive; and for every
infinite sequence (ci)i≥0 of elements in C,
there exist i,j∈Nature such that i<j and ci⊑cj.
A set U⊆C is called
upward closed if for every c∈U and c′∈C with c⊑c′,
we have c′∈U. It is known that every upward closed set U
can be characterised by a finite minor setM⊆U such that: (i) for every c∈U, there is c′∈M such that c′⊑c; and (ii) if c,c′∈M and c⊑c′, then c=c′. We use min(U) to denote for a given upward closed set U
its
minor set.
Let D⊆C. The upward closure of D is defined
as D↑:={c′∈C∣∃c∈D with c⊑c′}. We also define the set of predecessors of D as PreT(D):={c∣∃c1∈D,a∈Act,cac1}. For a finite set of configurations M⊆C, we use minpre(M) to denote
min(PreT(M↑)∪M↑).
The transition relation is said to be * monotonic* wrt. the ordering ⊑ if, given c1,c2,c3∈C where c1c2 and c1⊑c3, we can compute a configuration c4∈C and a run π such that c3πc4 and c2⊑c4.
The pair (T,⊑) is called a monotonic transition system if is * monotonic* wrt. ⊑.
Given a finite set of configurations M⊆C, the coverability problem of M in
the monotonic transition system (T,⊑)
asks whether the set M↑ is reachable in T; i.e. there exist two configurations c1 and c2 such that c1∈M, c1⊑c2, and c2 is reachable in T.
For the decidability of this problem, the following three conditions are sufficient:
(1)
For every two configurations c1 and c2, it is decidable
whether c1⊑c2.
2. (2)
For every c∈C, we can check whether
{c}↑∩Init=∅.
3. (3)
For every c∈C, the set minpre({c}) is finite and computable.
The solution for the coverability problem as suggested in
[ACJT96, FS01]
is based on a backward analysis approach. It is shown that starting
from a finite set M0⊆C, the sequence (Mi)i≥0
with Mi+1:=minpre(Mi), for i≥0, reaches a
fixpoint and it is computable.
4.2. DTSO-transition System is a **Wsts **
In this section, we instantiate the framework of Wsts to show the following result:
Theorem 2** (Decidability of DTSO reachability problem).**
The DTSO reachability problem is decidable.
Proof 4.1**.**
The rest of this section is devoted to the proof of the above theorem.
Let P=(A1,A2,…,An) be a concurrent system (as defined in Section 3). Moreover, let TDTSO=(CDTSO,{cinitD},Δ∪Δ′′,DTSO) be the transition system induced by P under the Dual TSO semantics (as defined in Section 3.3).
In the following, we will show that the DTSO-transition system TDTSO
is monotonic wrt. an ordering ⊑.
Then, we will show the three sufficient conditions for the decidability of the coverability problem for (TDTSO,⊑)
(as stated in Section 4.1).
(1)
We first define the ordering ⊑ on the set of DTSO-configurations CDTSO (see Section 4.2.1).
2. (2)
Then, we show that the transition system TDTSO induced under the Dual TSO semantics is monotonic wrt. the ordering ⊑ (see Lemma 3).
3. (3)
For the first sufficient condition, we show that ⊑
is a well-quasi-ordering; and that
for every two configurations c1 and c2, it is decidable whether c1⊑c2 (see Lemma 4).
4. (4)
The second sufficient condition (i.e., checking whether the upward closed set {c}↑, with c is a DTSO-configuration, contains the initial configuration cinitD) is trivial. This check boils down to verifying whether c is the initial configuration cinitD.
5. (5)
For the third sufficient condition, we show that we can calculate the set of minimal DTSO-configurations for the set of predecessors of any upward closed set (see Lemma 5).
6. (6)
Finally, we will also show that the DTSO reachability problem for P can be reduced to the coverability problem in the monotonic transition system (TDTSO,⊑) (see Lemma 6). Observe that this reduction is needed since we are requiring that the load buffers are empty when defining the DTSO reachability
problem.
In the following, we define an ordering ⊑ on the set of DTSO-configurations CDTSO. Let us first introduce some notations and definitions.
Consider a word
w∈((X×V)∪(X×V×{own}))∗
representing the content of a load buffer.
We define an operation that divides
w into a number of fragments
according to the most-recent own-message concerning each variable.
We define
[TABLE]
where the following conditions are satisfied:
(1)
xi=xj for all i,j:i=j and 1≤i,j≤m.
2. (2)
If
(x,v,own)∈wi for some i:1<i≤m+1, then
x=xj for some j:1≤j<i, i.e., the most recent own-message on variable xj occurs
at the (2j)th fragment of [w]own.
3. (3)
w=w1⋅(x1,v1,own)⋅w2⋯wm⋅(xm,vm,own)⋅wm+1, i.e.,
the divided fragments correspond to the given word w.
Let w,w′∈((X×V)∪(X×V×{own}))∗ be two words. Let us assume that:
[TABLE]
[TABLE]
We write w⊑w′ to denote that
the following conditions are satisfied:
(1)
r=m,
2. (2)
xi′=xi and vi′=vi for all i:1≤i≤m, and
3. (3)
wi⪯wi′ for
all i:1≤i≤m+1.
Consider two DTSO-configurations c=(q,b,mem) and c′=(q′,b′,mem′), we extend the ordering ⊑ to configurations as follows:
We write
c⊑c′ if and only if the following conditions are satisfied:
(1)
q=q′,
2. (2)
b(p)⊑b′(p) for all process p∈P, and
3. (3)
mem′=mem.
4.2.2. Monotonicity.
Let c1=(q1,b1,mem1),c2=(q2,b2,mem2),c3=(q3,b3,mem3)∈CDTSO such that c1tDTSOc2 for some t∈Δp∪{propagatepx,deletep∣x∈X} with p∈P, and c1⊑c3. We will show that it is possible to compute a configuration c4∈CDTSO and a run π such that c3πDTSOc4 and c2⊑c4.
To that aim, we first show that it is possible from c3 to reach a configuration c3′, by performing a certain number of deletep transitions, such that the process p will have the same last message in its load buffer in the configurations c1 and c3′ while c1⊑c3′. Then, from the configuration c3′, the process p can perform the same transition t as c1 did (to reach c2) in order to reach the configuration c4 such that c2⊑c4. Let us assume that [b1(p)]own is of the form
[TABLE]
and
[b3(p)]own is of the form
[TABLE]
We define the word w∈((X×V)∪(X×V×{own}))∗ to be the longest word such that wm+1′=w′⋅w with wm+1⪯w′. Observe that in this case we have either wm+1=w′=ϵ or w′(∣w′∣)=wm+1(∣wm+1∣). Then, after executing a certain number ∣w∣ of deletep transitions from the configuration c3, one can obtain a configuration c3′=(q3,b3′,mem3) such that b3=b3′[p↩b3′(p)⋅w]. As a consequence, we have c1⊑c3′. Furthermore, since c1 and c3′ have the same global state, the same memory valuation, the same sequence of most-recent own-messages concerning each variable, and the same last message in the load buffers of p, c3′ can perform the transition t and reaches a configuration c4 such that c2⊑c4.
The following lemma shows that (TDTSO,⊑) is a monotonic transition system.
Lemma 3** (DTSO monotonic transition system).**
The transition relation DTSO is monotonic wrt. the ordering ⊑.
We show the first and the third conditions of the three conditions for the decidability of the coverability problem for (TDTSO,⊑) (as stated in Section 4.1). The second condition has been shown to be trivial in the main proof of Theorem 2.
The following lemma shows that the ordering ⊑ is indeed a well-quasi-ordering.
Lemma 4** (Well-quasi-ordering ⊑).**
The ordering ⊑ is a well-quasi-ordering over CDTSO. Furthermore, for every two DTSO-configurations c1 and c2, it is decidable
whether c1⊑c2.
Let qtarget be a global state of a concurrent program P and let Mtarget be the set of DTSO-configurations of the form (qtarget,b,mem) with b(p)=ϵfor allp∈P where
P be the set of process IDs in P.
We recall that qtarget in TDTSO if and only if Mtarget is reachable in TDTSO (see Section 2 for the definition of a reachable set of configurations).
Then by Lemma 6,
we have that the reachability problem of qtarget in TDTSO
can be reduced to the coverability problem of Mtarget in
(TDTSO,⊑).
Lemma 6** (DTSO reachability to coverability).**
Mtarget↑* is reachable in TDTSO iff Mtarget is reachable in TDTSO.*
Proof 4.5**.**
Let us assume that Mtarget↑ is reachable in TDTSO. This means that there is a configuration c∈Mtarget↑ which is reachable in TDTSO. Let us assume that c is of the form (qtarget,b,mem). Then, from the configuration c, it is possible to reach the configuration c′=(qtarget,b′,mem), with b′(p)=ϵfor allp∈P, by performing a sequence of delete transitions to empty the load buffer of each process. It is then easy to see that c′∈Mtarget and so Mtarget is reachable in TDTSO. The other direction of the lemma is trivial since Mtarget⊆Mtarget↑.
5. Parameterized Concurrent Systems
In this section, we give the definitions for
parameterized concurrent systems, a model for representing unbounded number of communicating concurrent processes under the Dual TSO semantics, and its induced transition system.
Then, we define the DTSO reachability problem for the case of parameterized concurrent systems.
5.1. Definitions for Parameterized Concurrent Systems
Let V be a finite data domain and X be a finite set of variables ranging over V. A parameterized concurrent system (or simply
a parameterized system) consists of an unbounded number of identical processes running under the Dual TSO semantics.
Communication between processes is performed
through a shared memory that consists of a finite number of the shared variables X over the finite domain V.
Formally, a parameterized system S is defined by an extended finite-state automaton A=(Q,qinit,Δ) uniformly describing the behavior of each process.
An instance of S is a concurrent system
P=(A1,A2,…,An), for some n∈N, where for each p:1≤p≤n, we have Ap=A. In other words, it consists of a finite set
of processes each running the same code defined
by A. We use Inst(S) to denote all possible instances of S. We use
TP:=(CP,InitP,ActP,P) to denote the transition system induced by an instance P of S under the Dual TSO semantics.
A parameterized configurationα is a pair (P,c) where P={1,…,n}, with n∈N, is the set of process IDs and c is a DTSO-configuration of an instance P=(A1,A2,…,An) of S. The parameterized configuration α=(P,c) is said to be initial if c is an initial configuration of P (i.e., c∈InitP). We use C (resp. Init) to denote the set of all the parameterized configurations (resp. all the initial configurations) of S.
Let Act denote the set of actions of all possible instances of S (i.e., Act=∪P∈Inst(S)ActP).
We define a transition relation on the set C of all parameterized configurations such that
given two configurations
(P,c) and (P′,c′),
we have
(P,c)t(P′,c′) for some action t∈Act iff
P′=P and there is an instance P of S such that t∈ActP and
ctPc′.
The transition system induced by S is given by T:=(C,Init,Act,).
The Parameterized DTSO Reachability Problem
A global state qtarget:P′↦Q is said to be reachable in T if and only if there exists a parameterized configuration α=(P,(q,b,mem)), with b(p)=ϵfor allp∈P, such that α is reachable in T and qtarget(1)⋯qtarget(∣P′∣)⪯q(1)⋯q(∣P∣).
The parameterized DTSO reachability problem consists in checking whether qtarget is reachable in T. In other words, the DTSO reachability problem for parameterized systems asks whether there is an instance of the parameterized system that reaches a configuration with a number of processes in certain given local states.
5.2. Decidability of the Parameterized Reachability Problem
We prove hereafter the following theorem:
Theorem 7** (Decidability of parameterized DTSO reachability problem).**
The parameterized DTSO reachability problem is decidable.
Proof 5.1**.**
Let S=(Q,qinit,Δ) be a parameterized system and T=(C,Init,Act,) be its induced transition system.
The proof of the theorem
is done by
instantiating the framework of Wsts.
In more detail,
we will show that
the parameterized transition system T
is monotonic wrt. an ordering ⊴.
Then, we will show the three sufficient conditions for the decidability of the coverability problem for (T,⊴)
(as stated in Section 4.1).
(1)
We first define the ordering ⊴ on the set C of all parameterized configurations (see Section 5.2.1).
2. (2)
Then, we show that the transition system T is monotonic wrt. the ordering ⊴ (see Lemma 8).
3. (3)
For the first sufficient condition, we show that the ordering ⊴
is a well-quasi-ordering; and that
for every two parameterized configurations α and α′, it is decidable whether α⊑α′ (see Lemma 9).
4. (4)
The second sufficient condition (i.e., checking whether the upward closed set {α}↑, with α is a parameterized configuration, contains an initial configuration) for the decidability of the coverability problem is trivial. This check boils down to verifying whether the configuration α is initial.
5. (5)
For the third sufficient condition,
we show that we can calculate the set of minimal parameterized configurations for the set of predecessors of any upward closed set (see Lemma 10).
6. (6)
Finally, we will show that the parameterized DTSO reachability problem for the parameterized system S can be reduced to the coverability problem in
the monotonic transition system (T,⊴) (see Lemma 11).
Let α=(P,(q,b,mem)) and α′=(P′,(q′,b′,mem′)) be two parameterized configurations. We define the ordering ⊴
on the set C of parameterized configurations as follows: We write α⊴α′ if and only if the following conditions are satisfied:
(1)
mem=mem′.
2. (2)
There is an injection h:{1,…,∣P∣}↦{1,…,∣P′∣} such that
(i)
for all p,p′∈P,
p<p′ implies h(p)<h(p′); and
2. (ii)
for every p∈{1,…,∣P∣}, q(p)=q′(h(p)) and b(p)⊑b′(h(p)).
5.2.2. Monotonicity.
We assume that three configurations
α1=(P,(q1,b1,mem1)), α2=(P,(q2,b2,mem2)) and α3=(P′,(q3,b3,mem3)) are given. Furthermore, we assume that α1⊴α3 and α1tα2 for some transition t.
We will show that it is possible to compute a parameterized configuration α4 and a run π such that α3πα4 and α2⊴α4.
Since α1⊴α3, there is an injection function h:{1,…,∣P∣}↦{1,…,∣P′∣} such that:
(1)
For all p,p′∈P,
p<p′ implies h(p)<h(p′).
2. (2)
For every p∈{1,…,∣P∣}, q1(p)=q3(h(p)) and b1(p)⊑b3(h(p)).
We define the parameterized configuration α′ from α3 by only keeping the local states and load buffers of processes in h(P).
Formally, α′=(P,(q′,b′,mem′)) is defined as follows:
(1)
mem′=mem3.
2. (2)
For every p∈{1,…,∣P∣}, q′(p)=q3(h(p)) and b′(p)=b3(h(p)).
We observe that (q1,b1,mem1)⊑(q′,b′,mem′).
Since the transition relation DTSO is
monotonic wrt. the ordering ⊑ (see Lemma 3), there is a DTSO-configuration (q′′,b′′,mem′′) such that (q′,b′,mem′)DTSO∗(q′′,b′′,mem′′) and (q2,b2,mem2)⊑(q′′,b′′,mem′′).
Consider now the parameterized configuration α4=(P′,(q4,b4,mem4)) such that:
(1)
mem′′=mem4.
2. (2)
For every p∈{1,…,∣P∣}, q′′(p)=q4(h(p)) and b′′(p)=b4(h(p)).
3. (3)
For every p∈({1,…,∣P′∣}∖{h(1),…,h(∣P∣)}), we have q4(p)=q3(p) and b4(p)=b3(p).
It is easy then to see that α2⊴α4 and α3∗α4.
The following lemma shows that (T,⊴) is a monotonic transition system.
We show the first and the third conditions of the three conditions for the decidability of the coverability problem for (T,⊴)
(as stated in Section 4.1). The second condition has been shown to be trivial in the main proof of Theorem 7.
The following lemma states that the ordering ⊴ is indeed a well-quasi-ordering:
5.2.4. From Parameterized DTSO Reachability to Coverability.
Let qtarget:P′↦Q be a global state. Let Mtarget be the set of parameterized configurations of the form α=(P′,(qtarget,b,mem)) with b(p)=ϵfor allp∈P′.
Lemma 11 shows that
the parameterized reachability problem of qtarget in the transition system T
can be reduced to the coverability problem of Mtarget in
(T,⊴).
Lemma 11** (Parameterized DTSO reachability to coverability).**
qtarget* is reachable in T iff
Mtarget↑ is reachable in T.*
Proof 5.5**.**
To prove the lemma, we first show that
Mtarget↑ is reachable in T if and only if there is a parameterized configuration α=(P,(q,b,mem)), with b(p)=ϵfor allp∈P, such that α is reachable in T and qtarget(1)⋯qtarget(∣P′∣)⪯q(1)⋯q(∣P∣).
Then as a consequence,
the lemma holds.
Let us assume that there is a parameterized configuration α=(P,(q,b,mem)), with b(p)=ϵfor allp∈P, such that α is reachable in T and qtarget(1)⋯qtarget(∣P′∣)⪯q(1)⋯q(∣P∣). It is then easy to show that α∈Mtarget↑.
Now let us assume that there is a parameterized configuration α′=(P′′,(q′,b′,mem′))∈Mtarget↑ which is reachable in T. From the configuration α′, it is possible to reach the configuration α′′=(P′′,(q′,b′′,mem′)), with b′′(p)=ϵfor allp∈P′′, by performing a sequence of deletep transitions to empty the load buffer of each process. Since α′∈Mtarget↑, we have qtarget(1)⋯qtarget(∣P′∣)⪯q′(1)⋯q′(∣P′′∣). Hence, α′′ is a witness of the parameterized reachability problem of qtarget in the transition system T.
6. Experimental Results
We have implemented our techniques described in Section 4 and Section 5
in an open-source tool called Dual-TSO111Tool webpage: https://www.it.uu.se/katalog/tuang296/dual-tso.
The tool checks the state reachability problems
(c.f. Section 3.3 and Section 5.1)
for (parameterized) concurrent systems under the Dual TSO semantics.
We emphasize
that
besides checking the reachability for
a global state,
Dual-TSO
can check the reachability
for a set of global states.
Moreover,
Dual-TSO accepts a more general
input class of parameterized concurrent systems.
Instead of requiring that
the behavior of each process is described by a unique extended finite-state automaton as defined in Section 5,
Dual-TSO allows
that the behavior of a process
can be presented by
an extended finite-state automaton from a fixed set of predefined automata.
If the tool finds a witness for a given reachability problem,
we say that
the concurrent system is unsafe (wrt. the reachability problem). After finding the first witness for a given reachability problem, the tool terminates its execution. In the case that no witness is encountered, Dual-TSO declares that the given concurrent program is safe (wrt. the reachability problem) after it reaches a fixpoint in calculation.
Dual-TSO always ends its execution by reporting the running time (in seconds) and the total number of generated configurations. Observe that the number of generated configurations gives a rough estimation of the memory consumption of our tool.
We compare our tool with Memorax [AAC*+*12a, AAC*+*13] which is the only precise and sound tool for deciding the state reachability problem of concurrent systems running under TSO. Observe that Memorax cannot handle the class of parameterized concurrent systems.
We use
Dual-TSO(⊑)
and Dual-TSO(⊴)
to denote Dual-TSO when applied to concurrent systems and parameterized concurrent systems, respectively.
In the following, we present two sets of results. The first set concerns the comparison of Dual-TSO(⊑) with Memorax (see Table 1). The second set shows the benefit of the parameterized verification compared to the use of the state reachability when increasing the number of processes (see Table 2 and Figure 8). Our example programs are from [AAC*+*12a, AMT14, BDM13, AAP15, LNP*+*12].
In all experiments, we set up the time out to 600 seconds (10 minutes).
We perform all experiments on an Intel x86-32 Core2 2.4 Ghz machine and 4GB of RAM.
Verification of Concurrent Systems.
Table 1 presents a comparison between
Dual-TSO(⊑) and Memorax on 20 benchmarks. In all these benchmarks, Dual-TSO(⊑) and Memorax return the same results for the state reachability problems (except 6 examples where Memorax runs out of time). In the benchmarks where the two tools return, Dual-TSO(⊑) out-performs Memorax and generates fewer configurations (and so uses less memory). Indeed, Dual-TSO(⊑) is 600 times faster than Memorax and generates 277 times fewer minimal configurations on average.
The experimental results confirm the correlation between the running time and the memory consumption (i.e., the tool who generates less configurations is often the fastest).
Verification of Parameterized Concurrent Systems.
The second set compares the scalability of Memorax and Dual-TSO while increasing the number of processes. The results are given in
Figure 8.
We observe that although the algorithms implemented by
Dual-TSO(⊑) and Memorax
have the same (non-primitive recursive) lower bound (in theory),
Dual-TSO(⊑)
scales better than Memorax in all these benchmarks.
In fact, Memorax can only handle benchmarks with at most 5 processes while
Dual-TSO can handle benchmarks with more processes.
We conjecture
that this is due to
the important advantages of the
Dual TSO semantics. In fact, the Dual TSO semantics
transforms the load buffers into lossy channels without adding the costly overhead
of memory snapshots
that was necessary in the case of Memorax.
The absence of this extra overhead means that our tool generates less configurations (due to the ordering) and this results in a better performance and scalability.
Table 2 presents the running time and the number of generated configurations when checking the state reachability problem for the parameterized versions of the benchmarks in Figure 8
with Dual-TSO(⊴).
It should be emphasized that
Dual-TSO(⊴)
and
Dual-TSO(⊑)
have the same results
for the reachability problems in these benchmarks.
We observe that the verification of these parameterized systems is much more efficient than verification of bounded-size instances (starting from a number of processes of 3 or 4), especially concerning memory consumption (which is given in terms of number of generated configurations).
The reason behind is that the size of the generated minor sets in the analysis of a parameterized system are usually smaller than the size of the generated minor sets during the analysis of an instance of the system with a large number of processes. In fact, during the analysis of a parameterised concurrent system, the number of considered processes in the generated minimal configurations is usually very small. Observe that, in the case of concurrent systems, the number of considered processes in the generated minimal configurations is equal to the number of processes in the given system.
7. Conclusion
In this paper, we have presented an alternative (yet equivalent) semantics to the classical one for the TSO memory model that is more amenable for efficient algorithmic verification and for extension to parametric verification. This new semantics allows us to understand the TSO memory model in a totally different way compared to the classical semantics. Furthermore, the proposed semantics offers several important advantages
from the point of view of formal reasoning and program verification.
First, the Dual TSO semantics allows transforming the load buffers
to lossy channels (in the sense that
the processes can lose any message situated at the head of any load buffer in non-deterministic manner)
without adding the
costly overhead that was necessary in the case of store buffers.
This means that we can apply the theory of
well-structured systems [Abd10, ACJT96, FS01] in a straightforward manner leading
to a much simpler proof of decidability of safety properties.
Second, the absence of extra overhead means that we obtain
more efficient algorithms and better scalability
(as shown by our experimental results).
Finally, the Dual TSO semantics allows extending the framework to
perform parameterized verification which is an
important paradigm in concurrent program verification.
In the future, we plan to apply our techniques to other memory models
and to combine with predicate abstraction for handling programs with unbounded data domain.
We prove the theorem by showing its if direction and then only if direction.
In the following, for a TSO (DTSO)-configuration
c=(q,b,mem), we
use states(c), buffers(c), and
mem(c) to denote
q, b, and mem respectively.
A.1. From Dual TSO to TSO
We show the if direction of Theorem 1. Consider a DTSO-computation
[TABLE]
where c0=cinitD and ci is of the form (qi,bi,memi) for all i:1≤i≤n with qn=qtarget and bn(p)=ϵ for all p∈P.
We will derive a TSO-computation πTSO such that target(πTSO) is a configuration
of the form (states(cn),b,mem(cn)) where b(p)=ϵ for all p∈P.
First, we define some functions that we will use in the construction
of the computation πTSO.
Then, we define a sequence of TSO-configurations that appear in πTSO.
Finally, we show that the TSO-computation πTSO exists. In particular,
the target configuration target(πTSO) has the same local states as the target cn of the DTSO-computation πDTSO.
Let 1≤i1<i2<⋯<ik≤n be the sequence of indices such that ti1ti2…tik is the sequence of write or atomic read-write operations occurring in the computation πDTSO. In the following, we assume that i0=0.
For each j:0≤j≤n, we associate a mapping function indexj:P→{0,…,k}∗ that associates for each process p∈P and each message
at the position ℓ:1≤ℓ≤∣buffers(cj)(p)∣
in the load buffer buffers(cj)(p)
the index indexj(p)(ℓ), i.e., the index of the last write or atomic write-read operations at the moment this message has been added to the load buffer.
Formally, we define indexj as follows:
(1)
index0(p):=ϵ for all p∈P.
2. (2)
Consider j such that 0≤j<n.
Recall that
cjtj+1DTSOcj+1 with tj+1∈Δp∪Δp′. We define indexj+1 based on indexj:
•
Nop, read, fence, arw: If
tj+1 is of the following forms (q,nop,q′), (q,r(x,v),q′), (q,fence,q′), or (q,arw(x,v,v′),q′), then indexj+1:=indexj.
•
Write: If
tj+1 is of the form (q,w(x,v),q′), then indexj+1:=indexj[p↩r⋅indexj(p)] with ir=j+1.
•
Propagate:
If tj+1 is of the form propagatepx, then indexj+1:=indexj[p↩r⋅indexj(p)] where r:0≤r≤k is the maximal index such that ir≤j+1.
•
Delete:
If tj+1 is of the form deletep, then indexj:=indexj+1[p↩indexj+1(p)⋅r] with r=indexj(p)(∣indexj(p)∣).
Next, we associate for each process p∈P and j:0≤j≤n, the memory view viewp(cj) of the process p in the configuration cj as follows:
(1)
If buffers(cj)(p)=ϵ, then viewp(cj):=r where r:0≤r≤k is the maximal index such that ir≤j.
2. (2)
If buffers(cj)(p)=ϵ, then viewp(cj):=indexj(p)(∣indexj(p)∣).
{exa}
We give an example of how to calculate the functions index and view for a DTSO-computation.
Let consider the following DTSO-computation
[TABLE]
containing only transitions of a process p
with two variables x and y
where
ci=(qi,bi,memi) for all i:0≤i≤n=5
such that:
[TABLE]
We note that
n=5 and πDTSO contains only transitions of the process p.
We also note that
k=1 and
i1=1 is the index of the only write transition
t1 occurring in the computation
πDTSO.
Following the above definitions of index and view, we define the functions index and view as follows:
(1)
For each j:0≤j≤n=5, we define the mapping function indexj(p):
[TABLE]
2. (2)
For each j:0≤j≤n=5, we define the memory view viewp(cj):
[TABLE]
Now, let ≺ be an arbitrary total order on the set of processes and let pmin and pmax be the smallest and largest elements of ≺ respectively. For p=pmax, we define succ(p) to be the successor of p wrt. ≺, i.e., p≺succ(p) and there is no p′ with p≺p′≺succ(p). We define prev(p) for p=pmin analogously.
The computation πTSO will consist of k+1 phases (henceforth referred to as the phases 0,1,2,…,k). In fact, πTSO will have the same sequence of memory updates as πDTSO. At the phase r, the computation πTSOsimulates the movements of the processes where their memory view index is r. The order in which the processes are simulated during phase r is defined by the ordering ≺. First, process pmin will perform a sequence of transitions. This sequence is derived from the sequence of transitions it performs in πDTSO where its memory view index is r,
including
“no”,
write,
read,
fence
transitions.
Then, the next process performs its transitions. This continues until pmax has made all its transitions. When all processes have performed their transitions in phase r,
we execute exactly
one update transition (possibly with a write transition)
or one atomic read-write transition
in order to move to phase
r+1.
We start the
phase r+1 by letting pmin execute its transitions, and so on.
Formally, we define a scheduling functionα(r,p,ℓ)
that gives
for each
r:0≤r≤k, p∈P, and ℓ≥1
a natural number j:0≤j≤n
such that
process p executes the transition
tj as its ℓth transition during phase r.
The scheduling function α is defined as follows where r:0≤r≤k, p∈P, and ℓ≥0:
(1)
α(r,p,ℓ+1) is defined to be the smallest j such that α(r,p,ℓ)<j, tj∈Δp and viewp(cj)=r. Intuitively, the (ℓ+1)th transition of process p during phase r is defined by the next transition from tα(r,p,ℓ) that belongs to Δp. Notice that α(r,p,ℓ+1) is defined only for finitely many ℓ.
2. (2)
If {j∣viewp(cj)=r}=∅, we define
α(r,p,0):=min{j∣viewp(cj)=r}.
Otherwise, we define α(r,p,0):=α(r−1,p,♯(r−1,p)) where
[TABLE]
Intuitively,
phase r starts for process p at the point where its memory view index becomes equal to r. Notice that α(0,p,0)=0 for all p∈P since all processes are initially in phase [math].
{exa}
In the following, we show how to calculate the
scheduling function α(r,p,ℓ) and ♯(r,p)
where r:0≤r≤k, p∈P, and ℓ≥0
for the DTSO-computation πDTSO
given in Example A.1.
We recall that k=1, n=5 and the definitions of the two functions index and view are given in Example A.1.
We also recall that πDTSO contains only transitions of the process p.
The constructed TSO-computation πTSO from πDTSO will consist of k+1=2 phases, referred as the phase [math] and the phase 1.
In order to define the transitions of the process p in different phases,
for each r:0≤r≤k=1 and ℓ≥0, the
scheduling function α(r,p,ℓ) and ♯(r,p) is defined as follows:
[TABLE]
In order to define πTSO, we first define the set of configurations that will appear in πTSO. In more detail, for each r:0≤r≤k, p∈P, and ℓ:0≤ℓ≤♯(r,p), we define a TSO-configuration dr,p,ℓ based on the DTSO-configurations that are appearing in πDTSO. We will define dr,p,ℓ by defining its local states, buffer contents, and memory state.
(1)
We define the local states of the processes as follows:
•
states(dr,p,ℓ)(p):=states(cα(r,p,ℓ))(p).
After process p has performed its ℓth transition during phase r,
its local state is identical to its local state in the corresponding
DTSO-configuration cα(r,p,ℓ).
•
If p′≺p then
states(dr,p,ℓ)(p′):=states(cα(r,p′,♯(r,p′)))(p′),
i.e. the state of p′ will not change while
p is making its moves.
This state is given by the local state of p′ after it made its last move
during phase r.
•
If p≺p′ then
states(dr,p,ℓ)(p′):=states(cα(r,p′,0))(p′),
i.e. the local state of p′ will not change while
p is making its moves.
This state is given by the local state of p′ when it entered phase r
(before it has made any moves during phase r).
2. (2)
To define the buffer contents, we give more definitions.
For a DTSO-message a of the form (x,v), we define
DTSO2TSO(a) to be
ϵ. For a DTSO-message a of the form (x,v,\textscown), we define
DTSO2TSO(a) to be
(x,v).
From that, we define DTSO2TSO(ϵ)=ϵ and
DTSO2TSO(a1a2⋯an):=DTSO2TSO(a1)⋅DTSO2TSO(a2)⋯DTSO2TSO(an), i.e.,
we concatenate the results of applying the operation
individually on each ai with 1≤i≤n.
We define DTSO2TSO+(w) for a word w∈((X×V)∪(X×V×{\textscown}))∗ as follows: If ∣w∣=0 then DTSO2TSO+(w):=ϵ, else DTSO2TSO+(w):=DTSO2TSO(w(1)w(2)⋯w(∣w∣−1)).
In the following, we give the definition of the buffer contents of dr,p,ℓ:
•
buffers(dr,p,ℓ)(p):=DTSO2TSO+(buffers(cα(r,p,ℓ))(p)). After process p has performed its ℓth transition during phase r, the content of its buffer is defined by considering
the buffer of the corresponding DTSO-configuration cα(r,p,ℓ) and
only messages belong to p (i.e., of the form (x,v,\textscown)).
•
If p′≺p then
buffers(dr,p,ℓ)(p′):=buffers(cα(r,p′,♯(r,p′)))(p′).
In a similar manner to the case of states,
if p′≺p then the buffer of p′ will not change while
p is making its moves.
•
If p≺p′ then
buffers(dr,p,ℓ)(p′):=buffers(cα(r,p′,0))(p′).
In a similar manner to the case of states,
if p≺p′ then the buffer of p′ will not be changed while
p is making its moves.
3. (3)
We define the memory state as follows:
•
mem(dr,p,ℓ):=mem(cir).
This definition is consistent with the fact that
all processes have identical views of the memory when they are in the same phase r.
This view is defined by the memory component of cir.
{exa}
In the following, we give the configurations dr,p,ℓ
for all r:0≤r≤k, p∈P, and ℓ:0≤ℓ≤♯(r,p)
that will appear in the constructed TSO-computation
πTSO from πDTSO
given in Example A.1.
We call that k=1, n=5,
and πDTSO contains only transitions of the process p.
We also recall that the
scheduling function α(r,p,ℓ) and ♯(r,p)
are given in Example A.1.
For each r:0≤r≤k=1 and ℓ:0≤ℓ≤♯(r,p), we define the TSO-configurations dr,p,ℓ=(qr,p,ℓ,br,p,ℓ,memr,p,ℓ) based on the DTSO-configurations that are appearing in πDTSO as follows:
[TABLE]
Finally, we construct the TSO-computation
[TABLE]
where
[TABLE]
and:
[TABLE]
Since there is only one update transition in both two computations πDTSO and πTSO,
it is easy to see that
πTSO has the same sequence of memory updates as πDTSO.
It is also easy to see that
d0,p,0=cinit
and
d1,p,♯(1,p)=d1,p,1=(states(c5),b,mem(c5))
where
b(p)=buffers(c5)(p)=ϵ.
Therefore, πTSO is a witness of the construction.
△
The following lemma
shows the existence of a TSO-computation πTSO that starts from the initial
TSO-configuration and whose target has the same local state definitions
as the target cn of the DTSO-computation πDTSO.
This concludes the proof of the if direction of Theorem 1.
Lemma 12**.**
d0,pmin,0πTSOTSOdk,pmax,♯(k,pmax)* for
some TSO-computation πTSO.
Furthermore,
d0,pmin,0 is the initial TSO-configuration
and*
[TABLE]
where
buffers(p)=buffers(cn)(p)=ϵ for all p∈P.
Proof A.1**.**
Lemmas 16–19
show that the existence of the computation πTSO.
Lemma 21 and Lemma 20
show the conditions on the initial and target configurations.
First, we start by establishing Lemma 13, Lemma 14, and Lemma 15 that we will use later.
Lemma 13**.**
For every j:0≤j≤n and process p∈P, the following properties hold:
(1)
∣indexj(p)∣=∣buffers(cj)(p)∣.
2. (2)
For every ℓ1,ℓ2:1≤ℓ1≤ℓ2≤∣indexj(p)∣, indexj(p)(ℓ2)≤indexj(p)(ℓ1)≤j.
3. (3)
For every ℓ1,ℓ2:1≤ℓ1<ℓ2≤∣indexj(p)∣ such that buffers(cj)(p)(ℓ1) is of the form (x,v,\textscown),
indexj(p)(ℓ2)<indexj(p)(ℓ1).
4. (4)
For every ℓ:1≤ℓ≤∣buffers(cj)(p)∣, if r=indexj(p)(ℓ) and buffers(cj)(p)(ℓ) is of the form (x,v,\textscown), then tir∈Δp and it is of the form (q,w(x,v),q′).
5. (5)
For every ℓ:1≤ℓ≤∣buffers(cj)(p)∣, if r=indexj(p)(ℓ) and buffers(cj)(p)(ℓ) is of the form (x,v), then mem(cir)(x)=v.
6. (6)
For every r1,r2:0≤r1≤r2≤k such that r1=\textscmin{indexj(p)(ℓ)∣ℓ:1≤ℓ≤∣indexj(p)∣}, tr2∈Δp, and tr2 is of the form (q,w(x,v),q′), then there is an index ℓ:1≤ℓ≤∣indexj(cj)(p)∣ such that indexj(p)(ℓ)=r2 and buffers(cj)(p)(ℓ)=(x,v,\textscown).
Proof A.2**.**
The lemma holds following an immediate consequence of the definition of indexj.
Lemma 14**.**
For every process p∈P and index j:0≤j<n, viewp(cj)≤viewp(cj+1). Furthermore, iviewp(cj)≤j and iviewp(cj+1)≤j+1.
Proof A.3**.**
The lemma holds following an immediate consequence of the definitions of viewp and indexj.
Lemma 15**.**
For every natural number j such that α(r,p,ℓ)≤j<α(r,p,ℓ+1)−1,
DTSO2TSO+(buffers(cj)(p))=DTSO2TSO+(buffers(cj+1)(p)).
Proof A.4**.**
The proof is done by contradiction. Let us assume that there is some j:α(r,p,ℓ)≤j<α(r,p,ℓ+1)−1 such that
[TABLE]
Observe that the only three operations that can change the content of the load buffer of the process p are write, delete and propagation operations. Since tj∈/Δp (and so no write operation has been performed) and propagation will append messages of the form (x,v), this implies that tj is a delete transition of the process p (i.e., tj=deletep). Now, the only case when DTSO2TSO+(buffers(cj)(p))=DTSO2TSO+(buffers(cj+1)(p)) is where buffers(cj)(p) is of the form w⋅(y,v′,\textscown)⋅m with m∈{(x,v),(x,v,\textscown)∣x∈X,v∈V}. This implies that buffers(cj+1)(p)=w⋅(y,v′,\textscown). Now we can use the third case of Lemma 13 to prove that viewp(cj+1)>viewp(cj). This contradicts the fact that viewp(cj+1)≤viewp(cα(r,p,ℓ+1)) (see Lemma 14) since we have viewp(cα(r,p,ℓ))=viewp(cα(r,p,ℓ+1))=r (by definition), viewp(cj)≥viewp(cα(r,p,ℓ)) (see Lemma 14) and viewp(cj+1)>viewp(cj).
Now we can start proving the existence of the computation πTSO by showing that we can move from the configuration dr,p,ℓ to dr,p,ℓ+1 using the transition tα(r,p,ℓ+1).
Lemma 16**.**
If ℓ<♯(r,p) then
dr,p,ℓtα(r,p,ℓ+1)TSOdr,p,ℓ+1.
Proof A.5**.**
We recall that tα(r,p,ℓ+1)∈Δp by definition.
Therefore, tα(r,p,ℓ+1) is not
a propagation transition nor a delete transition. Furthermore, suppose that tα(r,p,ℓ+1) is an atomic read-write transition.
It leads to the fact that viewp(cα(r,p,ℓ+1))>viewp(cα(r,p,ℓ)), contradicting to the assumption that we are in phase r.
Hence, tα(r,p,ℓ+1) is not an atomic read-write transition.
Let tα(r,p,ℓ+1)∈Δp be of the form
(q,op,q′).
To prove the lemma, we will prove the following properties:
(1)
states(dα(r,p,ℓ))(p)=q* and
states(dr,p,ℓ+1)=states(dα(r,p,ℓ))[p↩q′],*
2. (2)
states(dr,p,ℓ+1)(p′)=states(dr,p,ℓ)(p′)*
for p′=p,*
3. (3)
buffers(dr,p,ℓ+1)(p′)=buffers(dr,p,ℓ)(p′)*
for p′=p,*
4. (4)
mem(dr,p,ℓ)=mem(dr,p,ℓ+1)=mem(cir),
5. (5)
The contents of buffers(dr,p,ℓ)(p)
and buffers(dr,p,ℓ+1)(p) are compatible with the transition tα(r,p,ℓ+1).
In means that with the properties (1)–(4), the property (5) allows that dr,p,ℓtα(r,p,ℓ+1)TSOdr,p,ℓ+1.
We prove the property (1).
We see from definition of α
that tj∈Δp for all
j:α(r,p,ℓ)<j<α(r,p,ℓ+1).
It follows that
states(cj)(p)=states(cα(r,p,ℓ))(p) for all
j:α(r,p,ℓ)<j<α(r,p,ℓ+1).
In particular, we have
states(cα(r,p,ℓ+1)−1)(p)=states(cα(r,p,ℓ))(p).
Then, from the fact that
cα(r,p,ℓ+1)−1tα(r,p,ℓ+1)DTSOcα(r,p,ℓ+1)
and the definitions of dr,p,ℓ and dr,p,ℓ+1,
we know that states(dr,p,ℓ)(p)=states(cα(r,p,ℓ))(p)=states(cα(r,p,ℓ+1)−1)(p)=q.
It follows that
[TABLE]
This concludes the property (1).
We prove the property (2).
We see from the definitions of dα(r,p,ℓ) and dα(r,p,ℓ+1)
that
if p′≺p
then
states(dr,p,ℓ+1)(p′)=states(cα(r,p′,♯(k,p′)))(p′)=states(dr,p,ℓ)(p′).
Moreover, we have
if p≺p′
then
[TABLE]
This concludes the property (2).
We prove the properties (3) and (4).
In a similar manner to the case of states, we can show
the property (3).
By the definitions of dα(r,p,ℓ) and dα(r,p,ℓ+1) and the fact that
ℓ<ℓ+1≤♯(r,p), we have
mem(dr,p,ℓ)=mem(cir)=mem(dr,p,ℓ+1).
This concludes the property (4).
Now, it remains to prove the property (5).
We consider the cases where op is a write or a read operation.
The other cases can be treated in a similar way.
•
op=w(x,v):
We see from Lemma 15 that for all:
j:α(r,p,ℓ)<j<α(r,p,ℓ+1)
[TABLE]
In particular, we have
[TABLE]
Then,
since
cα(r,p,ℓ+1)−1tα(r,p,ℓ+1)DTSOcα(r,p,ℓ+1),
we have
buffers(cα(r,p,ℓ+1))(p)=(x,v,\textscown)⋅buffers(cα(r,p,ℓ+1)−1)(p).
We will show that buffers(cα(r,p,ℓ+1)−1)(p)=ϵ by contradiction. Let us suppose that buffers(cα(r,p,ℓ+1)−1)(p)=ϵ. By definition, we have viewp(cα(r,p,ℓ+1))=r′ such that ir′=α(r,p,ℓ+1). Furthermore, by applying Lemma 14 to cα(r,p,ℓ), we know that ir≤α(r,p,ℓ).
Then, since α(r,p,ℓ)<α(r,p,ℓ+1) by definition, we have ir<ir′. This contradicts to the fact that viewp(cα(r,p,ℓ+1))=r by definition.
Therefore, we have buffers(cα(r,p,ℓ+1)−1)(p)=ϵ.
As a consequence of the fact that buffers(cα(r,p,ℓ+1)−1)(p)=ϵ, we know that
[TABLE]
Then, since
[TABLE]
it follows that
buffers(dr,p,ℓ+1)=(x,v)⋅buffers(dr,p,ℓ). Hence this implies that dr,p,ℓtα(r,p,ℓ+1)TSOdr,p,ℓ+1.
•
op=r(x,v):
We see from
Lemma 15 that for all
j:α(r,p,ℓ)<j<α(r,p,ℓ+1)
[TABLE]
In particular, we have
[TABLE]
Then,
since
cα(r,p,ℓ+1)−1tα(r,p,ℓ+1)DTSOcα(r,p,ℓ+1),
we have
buffers(cα(r,p,ℓ+1))(p)=buffers(cα(r,p,ℓ+1)−1)(p). Therefore, buffers(dr,p,ℓ+1)(p)=buffers(dr,p,ℓ)(p).
We consider two cases about the type of the operation op:
–
Read own write*:
We see that there is an i:1≤i<∣buffers(cα(r,p,ℓ+1)−1)(p)∣ such that
buffers(cα(r,p,ℓ+1)−1)(p)(i)=(x,v,\textscown), and that
there are no j:1≤j<i and v′∈V such that buffers(cα(r,p,ℓ+1)−1)(p)(j)=(x,v′,\textscown). As a consequence, this implies that there is an i′:1≤i′≤∣DTSO2TSO+(buffers(cα(r,p,ℓ+1)−1)(p))∣ such that DTSO2TSO+(buffers(cα(r,p,ℓ+1)−1)(p)(i′))=(x,v) and there are no j′:1≤j′<i′ and v′∈V such that DTSO2TSO+(buffers(cα(r,p,ℓ+1)−1)(p)(j))=(x,v′). From the fact that*
[TABLE]
we have dr,p,ℓtα(r,p,ℓ+1)TSOdr,p,ℓ+1.
–
Read memory*:
We consider two cases:*
▹
buffers(cα(r,p,ℓ+1)−1)(p)(i)=(x,v,\textscown)* where i=∣buffers(cα(r,p,ℓ+1)−1)(p)∣ and
there are no j:1≤j<i and v′∈V such that buffers(cα(r,p,ℓ+1)−1)(p)(j)=(x,v′,\textscown): Since viewp(cα(r,p,ℓ+1)−1)=r, this implies from Lemma 13 that tir∈Δp and it is of the form (q,w(x,v),q′). Hence, we see that mem(cir)(x)=v and thus mem(dr,p,ℓ)(x)=mem(dr,p,ℓ+1)(x)=v.
Therefore, we have dr,p,ℓtα(r,p,ℓ+1)TSOdr,p,ℓ+1.*
2. ▹
(x,v′,\textscown)∈buffers(cα(r,p,ℓ+1)−1)(p)*
for all
v′∈V: Thus buffers(cα(r,p,ℓ+1)−1) is of the form w⋅(x,v).
Since viewp(cα(r,p,ℓ+1)−1)=r, this implies from Lemma 13 that mem(cir)(x)=v and thus mem(dr,p,ℓ)(x)=mem(dr,p,ℓ+1)(x)=v.
Therefore, we have
dr,p,ℓtα(r,p,ℓ+1)TSOdr,p,ℓ+1.*
To prove the lemma, we will prove the following properties:
(1)
states(dr,p,♯(r,p))(p′)=states(dr,succ(p),0)(p′)*
for all p′∈P,*
2. (2)
buffers(dr,p,♯(r,p))(p′)=buffers(dr,succ(p),0)(p′)*
for all p′∈P,*
3. (3)
mem(dr,p,♯(r,p))=mem(dr,succ(p),0).
We prove the property (1) by considering four cases:
•
p′=p:
From the definitions of dr,succ(p),ℓ and dr,p,♯(r,p), we have
states(dr,succ(p),ℓ)(p)=states(dr,p,♯(r,p))(p)
for all ℓ:0≤ℓ≤♯(r,succ(p)).
In particular, we see that
states(dr,succ(p),0)(p)=states(dr,p,♯(r,p))(p).
•
p′=succ(p):
From the definitions of dr,p,ℓ and dr,succ(p),0,
states(dr,p,ℓ)(succ(p))=states(dr,succ(p),0)(succ(p))
for all ℓ:0≤ℓ≤♯(r,p).
In particular,
we see that
states(dr,p,♯(r,p))(succ(p))=states(dr,succ(p),0)(succ(p)).
It follows from p′=succ(p)
that
states(dr,p,♯(r,p))(p′)=states(dr,succ(p),0)(p′).
•
p′≺p≺succ(p):
From the definitions of dr,succ(p),ℓ and dr,p′,♯(r,p′), we know that
states(dr,succ(p),ℓ)(p′)=states(dr,p′,♯(r,p′))(p′)
for all
ℓ:0≤ℓ≤♯(r,succ(p)).
In particular, we see that
states(dr,succ(p),0)(p′)=states(dr,p′,♯(r,p′))(p′).
Also, by a similar argument, we have
states(dr,p,ℓ)(p′)=states(dr,p′,♯(r,p′))(p′)
for all
ℓ:0≤ℓ≤♯(r,p).
In particular, we see that
states(dr,p,♯(r,p))(p′)=states(dr,p′,♯(r,p′))(p′).
Hence, we have
states(dr,succ(p),0)(p′)=states(dr,p,♯(r,p))(p′).
•
p≺succ(p)≺p′:
From the definitions of dr,succ(p),ℓ and
dr,p′,0,
we can see that
states(dr,succ(p),ℓ)(p′)=states(dr,p′,0)(p′)
for all
ℓ:0≤ℓ≤♯(r,p).
In particular, we see that
states(dr,succ(p),0)(p′)=states(dr,p′,0)(p′).
Also, by a similar argument, we have
states(dr,p,ℓ)(p′)=states(dr,p′,0)(p′)
for all
ℓ:0≤ℓ≤♯(r,p).
In particular, we see that
states(dr,p,♯(r,p))(p′)=states(dr,p′,0)(p′).
Hence, we have
states(dr,succ(p),0)(p′)=states(dr,p,♯(r,p))(p′).
We prove the properties (2) and (3).
By a similar manner to the case of states, we can show the property (2).
Finally, to show the property (3), by the definition of dr,p,♯(r,p), it follows that
mem(dr,p,♯(r,p))=mem(cir).
Also, by a similar argument, we have
mem(dr,succ(p),0)=mem(cir).
Hence, we have
mem(dr,p,♯(r,p))=mem(dr,succ(p),0).
mem(dr,pmax,♯(r,pmax))(x)=v* and
mem(dr+1,pmin,0)(x)=v′.*
We show the property (1).
Let p∈P∖{pu}.
From the definition of dr,pmax,♯(r,pmax) and dr,p,♯(r,p),
states(dr,pmax,♯(r,pmax))(p)=states(dr,p,♯(r,p))(p)=states(cα(r,p,♯(r,p)))(p)
and states(dr+1,pmin,0)(p)=states(dr+1,p,0)(p)=states(cα(r+1,p,0))(p).
From the definition of α, it follows that
tj∈Δp for all
j:α(r,p,♯(r,p))≤j<α(r+1,p,0). This implies that states(cα(r+1,p,0)−1)(p)=states(cα(r,p,♯(r,p)))(p). Now we have two cases:
•
{j∣viewp(cj)=r+1}=∅: We see that α(r+1,p,0)=α(r,p,♯(r,p)), and hence that states(dr,pmax,♯(r,pmax))(p)=states(dr+1,pmin,0)(p).
•
{j∣viewp(cj)=r+1}=∅: Since viewp(cα(r+1,p,0)−1)=r, we can show that tα(r+1,p,0)∈/Δp. This is done by contradiction as follows. In fact if tα(r+1,p,0)∈Δp, then it is either a write transition or an atomic read-write transition. This implies that in both cases that buffers(cα(r+1,p,0)−1)(p)=ϵ and that viewp(cα(r+1,p,0))=α(r+1,p,0). Hence, we have α(r+1,p,0)=r+1, and this leads to a contradiction since tir+1∈Δpu. Thus, we have states(dr,pmax,♯(r,pmax))(p)=states(dr+1,pmin,0)(p).
In a similar manner to the case of states, we can show the property (2).
Now we show the properties (3) and (4).
Using a similar reasoning as for the process p,
we know that states(cα(r+1,pu,0)−1)(pu)=states(cα(r,pu,♯(r,pu)))(pu). From the definition of πDTSO, it follows that
cα(r+1,pu,0)−1tα(r+1,pu,0)DTSOcα(r+1,pu,0).
Furthermore, since viewp(cα(r+1,pu,0))=r+1 and viewp(cα(r+1,pu,0)−1)<r+1, we know that tα(r+1,pu,0)=tir+1. This implies that buffers(cα(r+1,pu,0)−1)(pu)=buffers(cα(r+1,pu,0))(pu)=ϵ and states(cα(r+1,pu,0)−1)(pu)=q and states(cα(r+1,pu,0))(pu)=q′.
Now since
[TABLE]
we see that
[TABLE]
and that states(dr,pmax,♯(r,pmax))(pu)=q and states(dr+1,pmin,0)(pu)=q′.
This concludes the properties (3) and (4).
We show the property (5). From the definition of πDTSO, it follows that
mem(cir+1)=mem(cir)[x↩v′] with mem(cir)(x)=v.
Then from the definitions of dr,pmax,♯(r,pmax) and dr+1,pmin,0,
we have the property (5).
If r<k and tir+1∈Δpu such that tir+1 is of the form (q,w(x,v),q′),
then
dr,pmax,♯(r,pmax)∗TSOdr+1,pmin,0.
Proof A.8**.**
To prove the lemma, we will prove the following properties:
(1)
states(dr,pmax,♯(r,pmax))(p)=states(dr+1,pmin,0)(p)* for all p=pu,*
2. (2)
buffers(dr,pmax,♯(r,pmax))(p)=buffers(dr+1,pmin,0)(p)*
for all p=pu,*
3. (3)
The contents of buffers
states(dr,pmax,♯(r,pmax))(pu) and states(dr+1,pmin,0)(pu)
are compatible, i.e. with the properties (1)–(2), the property (3) allows
dr,pmax,♯(r,pmax)∗TSOdr+1,pmin,0.
We show the property (1).
Let p∈P∖{pu}.
From the definitions of dr,pmax,♯(r,pmax) and dr,p,♯(r,p),
states(dr,pmax,♯(r,pmax))(p)=states(dr,p,♯(r,p))(p)=states(cα(r,p,♯(r,p)))(p)
and that
states(dr+1,pmin,0)(p)=states(dr+1,p,0)(p)=states(cα(r+1,p,0))(p).
From the definition of α, it follows that
tj∈Δp for all
j:α(r,p,♯(r,p))≤j<α(r+1,p,0). This implies that states(cα(r+1,p,0)−1)(p)=states(cα(r,p,♯(r,p)))(p). Now we have two cases:
•
{j∣viewp(cj)=r+1}=∅: We see that α(r+1,p,0)=α(r,p,♯(r,p)), and hence that states(dr,pmax,♯(r,pmax))(p)=states(dr+1,pmin,0)(p).
•
{j∣viewp(cj)=r+1}=∅: Since viewp(cα(r+1,p,0)−1)=r, we can show that tα(r+1,p,0)∈/Δp . This is done by contradiction as follows. In fact if tα(r+1,p,0)∈Δp, then it is either a write transition or an atomic read-write transition. This implies that in both cases that buffers(cα(r+1,p,0)−1)(p)=ϵ and that viewp(cα(r+1,p,0))=α(r+1,p,0). Hence, we have α(r+1,p,0)=r+1, and this leads to a contradiction since tir+1∈Δpu. Thus, we have states(dr,pmax,♯(r,pmax))(p)=states(dr+1,pmin,0)(p).
In a similar manner to the case of states, we can show the property (2).
Now we show the property (3).
Using a similar reasoning as for the process p, we know that states(cα(r+1,pu,0)−1)(pu)=states(cα(r,pu,♯(r,pu)))(pu). From the definition of πDTSO, it follows that
cα(r+1,pu,0)−1tα(r+1,pu,0)DTSOcα(r+1,pu,0).
Furthermore, from the fact that viewp(cα(r+1,pu,0))=r+1 and viewp(cα(r+1,pu,0)−1)<r+1, we have two cases to consider:
•
buffers(cα(r+1,pu,0)−1)(pu)=ϵ: It follows from the conditions
for viewp(cα(r+1,pu,0)) and viewp(cα(r+1,pu,0)−1)
that tα(r+1,pu,0)=tir+1, buffers(cα(r+1,pu,0))(pu)=(x,v,\textscown), and that states(cα(r+1,pu,0)−1)(pu)=q and states(cα(r+1,pu,0))(pu)=q′. From
[TABLE]
we have buffers(dr,pmax,♯(r,pmax))(pu)=buffers(dr+1,pmin,0)(pu)=ϵ.
Moreover, we have states(dr,pmax,♯(r,pmax))(pu)=q, and states(dr+1,pmin,0)(pu)=q′.
Then, it is easy to see that mem(cir+1)=mem(cir)[x↩v]. Hence, we have dr,pmax,♯(r,pmax)tir+1TSOd′updatepuTSOdr+1,pmin,0 for some configuration d′.
•
buffers(cα(r+1,pu,0)−1)(pu)=ϵ: It follows from the conditions
for viewp(cα(r+1,pu,0)) and viewp(cα(r+1,pu,0)−1)
that tα(r+1,pu,0) is a delete transition of the process pu. As a consequence, buffers(cα(r+1,pu,0)−1)(pu)=w⋅(x,v,\textscown)⋅m and buffers(cα(r+1,pu,0))(pu)=w⋅(x,v,\textscown). Hence, we see that
[TABLE]
and therefore buffers(dr+1,pmin,0)(pu)=buffers(dr,pmax,♯(r,pmax))(pu)(x,v). Furthermore, we have states(cα(r+1,pu,0))(pu)=states(cα(r+1,pu,0)−1)(pu) and this implies that
states(dr,pmax,♯(r,pmax))(pu)=states(dr+1,pmin,0)(pu).
Then, it is easy to see that mem(cir+1)=mem(cir)[x↩v]. Hence, we have dr,pmax,♯(r,pmax)updatepuTSOdr+1,pmin,0.
The following lemma shows that the TSO-computation πTSO starts from the initial TSO-configuration.
Lemma 20**.**
d0,pmin,0* is the initial TSO-configuration.*
Proof A.9**.**
Let us take any p∈P.
By the definitions of d0,pmin,0, d0,p,0, and α(0,p,0), it follows that states(d0,pmin,0)(p)=states(d0,p,0)(p)=states(cα(0,p,0))(p)=states(c0)(p)=qinit.
Also,
buffers(d0,pmin,0)(p)=buffers(d0,p,0)(p)=DTSO2TSO+(buffers(c0)(p))=ϵ.
Finally, we have mem(d0,pmin,0)=mem(ci0)=mem(c0).
The result follows immediately for the
definition of the initial TSO-configuration.
This concludes the proof of Lemma 20.
The following lemma shows that the target of the TSO-computation πTSO has the same local process states as
the target cn of the DTSO-computation πDTSO.
Lemma 21**.**
states(dk,pmax,♯(k,pmax))=states(cn).
Proof A.10**.**
Let us take any p∈P.
By the definitions of dk,pmax,♯(k,pmax) and dk,p,♯(k,p), it follows that
states(dk,pmax,♯(k,pmax))(p)=states(dk,p,♯(k,p))(p)=states(cα(k,p,♯(k,p)))(p).
By definition of α(k,p,♯(k,p)), we know that
tj∈Δp for all
j:α(k,p,♯(k,p))<j≤n.
Therefore, we have
states(cj)(p)=states(cn)(p) for all
j:α(k,p,♯(k,p))≤j<n.
In particular, we have
states(cα(k,p,♯(k,p)))(p)=states(cn)(p).
Hence, we have
states(dk,pmax,♯(r,pmax))(p)=states(cn)(p).
This concludes the proof of Lemma 21.
A.2. From TSO to Dual TSO
We show the only if direction of Theorem 1. Consider a TSO-computation
[TABLE]
where c0=cinit and ci is of the form (qi,bi,memi) for all i:1≤i≤n with qn=qtarget.
In the following, we will derive a DTSO-computation πDTSO
such that states(target(πDTSO))=states(cn), i.e. the runs πTSO and πDTSO reach the same set of local states at the end of the runs.
Similar to the previous case, we will first define some functions that we will use in the construction
of the computation πDTSO.
Then, we define a sequence of DTSO-configurations that appear in πDTSO.
Finally, we show that the DTSO-computation πDTSO exists. In particular,
the target configuration target(πDTSO) has the same local states as the target cn of the TSO-computation πTSO.
For every p∈P, let Δpw,arw⊆Δp (resp. Δpu,arw⊆Δp∪{updatep}) be the set of write (resp. update) and atomic read-write transitions that can be performed by process p.
Let Δpr be the set of read transitions that can be performed by p.
Let I=i1…im be the maximal sequence of indices such that 1≤i1<i2<⋯<im≤n and for every j:1≤j≤m, we have tij is an update transition or an atomic read-write transition (i.e., tij∈⋃p∈PΔpu,arw).
In the following, we assume that i0=0.
Let Ip be the maximal subsequence of I such that
all transitions with indices in Ip belong to process p.
Let I′=i1′…im′ be the maximal sequence of indices such that 1≤i1′<i2′<⋯<im′≤n and for every j:1≤j≤m, we have tij′ is a write transition or an atomic read-write transition (i.e., tij′∈⋃p∈PΔpw,arw).
Let Ip′ be the maximal subsequence of I′ such that
all transitions with indices in Ip′ belong to process p.
Observe that ∣Ip∣=∣Ip′∣.
For every j:1≤j≤m, let proc(j) be the process that has the
update or atomic read-write transition tij where ij∈I.
We define match(ij) to be the index of the write (resp. atomic read-write) transition tmatch(ij)
that corresponds to the update (resp. atomic read-write) transition tij.
Formally,
match(ij):=l where ∃k:1≤k≤∣Ip∣,Ip(k)=ij, Ip′(k)=l and 1≤l≤n.
Observe that if tij is an atomic read-write operation, then match(ij)=ij.
{exa}
We give an example of how to calculate
the function match
for a TSO-computation.
Let us consider the following TSO-computation
[TABLE]
containing only transitions of a process p
with two variables x and y
where
ci=(qi,bi,memi) for all i:0≤i≤n=3
such that:
[TABLE]
Following the above definitions of I and I′, I=i1=2 (hence, m=1) is the maximal sequence of indices of all update or atomic read-write transitions in πTSO.
In a similar way, I′=i1′=1 is the maximal sequence of indices of all write or atomic read-write transitions in πTSO.
We note that
ti1=t2 is an update transition,
and
ti1′′=t1 is a write transition.
Since the TSO-computation contains only transition of the process p,
it follows that
I=Ip and I′=Ip′.
Following the above definition of match, with m=1 and n=3, we have
match(i1)=match(2)=1.
△
For every j:1≤j≤n such that tj∈Δpr is a read transition
of process p, we define fromMem(tj) as a predicate such that fromMem(tj) holds if and only if (x,v′)∈/buffers(cj−1) for all v′∈V.
For every j:1≤j≤n and p∈P, we define the function labelp(j) as follows:
(1)
labelp(j):=(x,v) if tj∈Δpr is of the form (q,r(x,v),q′) and fromMem(tj) holds.
2. (2)
labelp(j):=(x,v,\textscown) if tj=updatep and match(j)=l with tl of the form (q,w(x,v),q′).
3. (3)
labelp(j):=ϵ otherwise.
Given a sequence ℓ1⋯ℓk with k≥1 and 1≤ℓi≤n for all i:1≤i≤k, we define labelp(ℓ1⋯ℓk):=labelp(ℓ1)⋯labelp(ℓk−1)⋅labelp(ℓk).
Let labelprev(ℓ1⋯ℓk) with k≥1
and 1≤ℓi≤n for all i:1≤i≤k
be the reversed string of
labelp(ℓ1⋯ℓk), i.e. labelprev(ℓ1⋯ℓk):=labelp(ℓk)⋅labelp(ℓk−1)⋯labelp(ℓ1).
{exa}
In the following, we give an example of how to calculate the functions fromMem and label for the TSO-computation πTSO
given in Example A.2.
We recall that n=3
and the function match is given in Example A.2.
We also note that t3 is the only read transition in πTSO.
Following the above definition of fromMem, we have that fromMem(t3) holds.
Then following the definition of match, for every j:1≤j≤n=3, we define the function labelp(j) as follows:
[TABLE]
Below we show how to simulate all transitions of the TSO-computation πTSO by a set of corresponding transitions in the DTSO-computation πDTSO.
The idea is to divide the DTSO-computation to m+1 phases.
For
0≤r<m, each phase r will end at the configuration dr+1 by the simulation of the transition tmatch(ir+1) in πTSO.
Moreover, in phase r:0≤r<m,
we call the process proc(r+1) as the *active * process, and other processes as the *inactive * ones. We execute only the DTSO-transitions of the active process p=proc(r+1) in its active phases. For other processes p′=p, we only change the content of their buffers in the active phases of p.
In the final phase r=m,
all processes will be considered to be active because the index im+1 is not defined in the definition of the sequence I.
The DTSO-computation πDTSO will end at the configuration dm+1.
For every r:−1≤r<m and p∈P,
we define the function pos(r,p)
in an inductive way on r:
(1)
pos(−1,p):=0 for all p∈P.
2. (2)
pos(r,p):=pos(r−1,p) for all p=proc(r+1) and 0≤r<m.
3. (3)
pos(r,p):=match(ir+1) for p=proc(r+1) and 0≤r<m.
In other words,
the function pos(r,p) is the index of the last simulated transition by process p at the end of phase r in the computation πTSO.
Moreover, we use pos(−1,p) to be the index of the starting transition of process p before phase [math].
{exa}
In the following, we give an example of how to calculate the function pos for the TSO-computation πTSO
given in Example A.2.
We recall that
m=1 and
πTSO contains only transitions of the process p.
We also recall that the function match is given in Example A.2.
Following the above definition of pos, for every r:−1≤r<m=1, we define the function
pos(r,p) as follows:
[TABLE]
Let d0=cinitD=(qinit,binit,meminit). We define
the sequence of DTSO-configurations d1,…,dm,dm+1 by defining their local states, buffer contents, and memory states as follows:
In the following, we give an example of how to calculate the sequence of configurations d1,…,dm,dm+1
that will appear in the constructed DTSO-computation πDTSO
from the TSO-computation πTSO given in Figure A.2.
We recall that m=1, n=3, and the TSO-computation πTSO contains only transitions of the process p.
We also recall that
the functions label and pos are given in Example A.2
and Example A.2, respectively.
The DTSO-computation πDTSO will consist of m+1=2 phases, referred as the phase [math] and the phase 1.
For each r:0≤r≤m+1=2, we define the DTSO-configuration dr=(qr′,br′,memr′) based on the TSO-configurations that are appearing in πTSO as follows:
[TABLE]
Finally, we construct the DTSO-computation as follows:
[TABLE]
where
d12=(q12′,b12′,mem12),
d13=(q13,b13′,mem13),
d14=(q14′,b14′,mem14),
t1′=(q0,w(x,1),q1),
t2′=propagatepy,
t3′=deletept4′=(q1,r(y,0),q2),
t5′=deletep,
and:
[TABLE]
Since there is only one update transition in πDTSO and πTSO, it is easy to see that πTSO has the same sequence of memory updates as πDTSO.
It is also easy to see that
d0=cinitD
and
d3=(states(c3),b,mem(c3))
where b(p):=ϵ.
Therefore πDTSO is a witness of the construction.
△
Lemma 22
shows the existence of a DTSO-computation πDTSO that starts from the initial
TSO-configuration and whose target has the same local state definitions
as the target cn of the TSO-computation πTSO.
The only if direction of Theorem 1 will follow
directly from Lemma 22.
This concludes the proof of the only if direction of Theorem 1.
Lemma 22**.**
The following properties hold for the constructed sequence d1,…,dm,dm+1:
•
For every r:0≤r<m,
dr∗DTSOdr+1,
•
dm∗DTSOdm+1.
Proof A.11**.**
We show the proof of the lemma
follows directly Lemma 23 and Lemma 27.
To make the proof understandable,
below we consider a fence transition t=(q,fence,q′)
such that
ctTSOc′ for some c,c′
as an atomic read-write transition of the form
(q,arw(x,v,v),q′) where v∈V is the memory value of variable x∈X in c. For a given
TSO-computation πTSO, we can calculate such value v for each fence transition πTSO.
Lemma 23**.**
If 0≤r<m, then
dr∗DTSOdr+1.
Proof A.12**.**
We are in phase r.
Because from the configuration dr, the memory has not been changed until the transition tir+1, we observe that all memory-read transitions of the process p between transitions tir and tir+1 will get values from mem(dr) where p∈P.
Therefore, we can execute a sequence of propagation transitions to propagate from the memory to the buffer of process p to
full fill it by all messages that will satisfy all memory-read transitions of p between tir and tir+1.
We propagate to processes according to the order ≺: first to the process pmin and last to the process pmax. We have the following sequence:
dr(Δpropagate)∗DTSOdrpmin⋯(Δpropagate)∗DTSOdrpmax.
The shape of the configuration drpmax is:
Below let p=proc(r+1) be the active process in phase r of the DTSO-computation.
For each transition t in the sequence of transitions (including updates) of the active process, seq=(tpos(r−1,p)+1⋯tmatch(ir+1))∣Δproc(r+1)∪{updateproc(r+1)},
we execute a set of transitions in the DTSO-computation as follows:
•
To simulate a memory-read transition, we execute the same read transition.
And then we execute a delete transition to delete the oldest message
in the buffer of proc(r+1).
•
To simulate a read-own-write transition, we execute the same read transition.
•
To simulate a write transition, we execute the same write transition.
This transition must be the transition tmatch(ir+1).
According to Dual TSO semantics, we add an own-message to the buffer of proc(r+1).
•
To simulate an arw transition, we execute the same atomic read-write transition.
This transition must be the transition tmatch(ir+1) and match(ir+1)=ir+1.
•
To simulate an update transition, we execute a delete transition to delete the oldest message in the buffer of proc(r+1).
•
*To simulate a nop transition, we execute the same transitions in the DTSO-computation.
*
Let β(r,l) indicate the index in the TSO-computation of the lth transition in the sequence
seq where 1≤l≤∣seq∣.
Formally, we define β(r,l):=j where 1≤j≤n,
tj∈(Δp∪{updatep})
and
seq(l)=tj.
Let configuration dr,l where 0≤r<m be the DTSO-configuration
before simulating the transition with the index β(r,l).
We define dr,l by defining its local states,
buffer contents,
and memory state:
•
states(dr,l)(p)=states(cpos(r−1,p))(p)* for all inactive process p and all l:1≤l≤∣seq∣,*
•
states(dr,l)(p)=states(cβ(r,l)−1)(p)* for the active process p and all l:1≤l≤∣seq∣,*
•
mem(dr,l)=mem(cir)* for the active process p and all l:1≤l≤∣seq∣,*
•
buffers(dr,l)(p)=labelprev(pos(r−1,p)+1⋯ir+1−1)* for all inactive process p and all l:1≤l≤∣seq∣,*
•
buffers(dr,l)(p)=labelprev(β(r,l)⋯ir+1−1)* for the active process p and all l:1≤l≤∣seq∣.*
The Lemma 24, Lemma 26,
and Lemma 25 imply the result.
More precisely, it shows
the existence of a DTSO-computation
that starts from the DTSO-configuration drpmax and
whose target is the configuration dr+1.
This concludes the proof of Lemma 23.
Lemma 24**.**
dr,1= drpmax for 0≤r<m.
Proof A.13**.**
We show that dr,1 and drpmax have the same local states,
memory, and buffer contents. We consider two cases for the active and inactive processes.
•
For inactive process p=proc(r+1),
it is easy to see that:
–
states(dr,1)(p)=states(cpos(r−1,p))(p)=states(drpmax)(p)* by the definitions of configurations dr,1 and drpmax.*
–
buffers(dr,1)(p)=labelprev(pos(r−1,p)+1⋯ir+1−1)=buffers(drpmax)(p)* by the definitions of configurations dr,1 and drpmax.*
•
For the active process p=proc(r+1):
–
states(dr,1)(p)=states(cβ(r,1)−1)(p)=states(cpos(r−1,p))(p)* by the definition of β(r,1).
Therefore states(dr,1)(p)=states(drpmax)(p).*
–
buffers(dr,1)(p)=labelprev(β(r,1)⋯ir+1−1)*
= labelprev(pos(r−1,p)+1⋯ir+1−1) by the definition of β(r,1).
Therefore buffers(dr,1)(p) = buffers(drpmax)(p).*
In both cases,
for the memory,
mem(dr,1)=mem(cir)=mem(drpmax) by the definitions of configurations dr,1 and drpmax.
To prove the lemma, we will show the following properties:
(1)
∃dr+1′:dr,∣seq∣tmatch(ir+1)DTSOdr+1′, i.e. the transition tmatch(ir+1)
is feasible from
the configuration dr,∣seq∣.
2. (2)
Moreover, dr+1′=dr+1.
Let p=proc(r+1) be the active process.
We show the property (1) by considering two cases:
•
match(ir+1)* is a write transition:
By simulation, we execute the same transition in the DTSO-computation.
It is feasible since
states(dr,∣seq∣)(p)=states(cβ(r,∣seq∣)−1)(p)=states(cmatch(ir+1)−1)(p)
by the definitions of β(r,∣seq∣) and dr,∣seq∣.
This concludes the property (1).*
•
match(ir+1)* is an atomic read-write transition:
We notice that match(ir+1)=ir+1.
It is feasible since states(dr,∣seq∣)(p)=states(cβ(r,l)−1)(p)=states(cmatch(ir+1)−1)(p),
mem(dr,∣seq∣)=mem(cir),
and*
[TABLE]
by the definitions of β(r,∣seq∣) and dr,∣seq∣.
This concludes the property (1).
We show the property (2)
by showing that dr+1′ and dr+1 have the same local states,
memory,
and buffer contents.
Recall that
the tmatch(ir+1) can be a write transition or an atomic read-write transition.
We consider inactive processes.
For an inactive process p=proc(r+1),
we have:
•
Since the transition tmatch(ir+1) is of the active process,
we have states(dr+1′)(p)=states(drpmax)(p).
Moreover,
by
the definition
of drpmax, we see that states(drpmax)(p)=states(cpos(r−1,p))(p) .
Hence, by the definition of dr+1,
[TABLE]
•
Since the transition tmatch(ir+1) is of the active process,
we have
buffers(dr+1′)(p)=buffers(drpmax)(p).
Moreover, by
the definition
of drpmax, we have buffers(drpmax)(p)=labelprev(pos(r−1,p)+1⋯ir+1−1).
Hence, by the definition of
dr+1,
[TABLE]
We consider the active process p=proc(r+1) for the case that the transition tmatch(ir+1) is a write one.
By executing the same transition,
we add an owing message to the buffer of process p and change the memory.
•
Since the transition tmatch(ir+1) is of the active process,
we have
[TABLE]
Moreover,
it follows from the fact
β(r,l)=match(ir+1) and the definition of pos(r,p)
that
states(cβ(r,l))(p)=states(cmatch(ir+1))(p)=states(cpos(r,p))(p).
Hence, it follows by the definition of
dr+1
that states(dr+1′)(p)=states(dr+1)(p).
•
Since the transition tmatch(ir+1) is of the active process,
we have
buffers(dr+1′)(p)=labelprev(ir+1)⋅buffers(dr,∣seq∣)(p).
Then, it follows from the definition of dr,∣seq∣
that
buffers(dr+1′)(p)=labelprev(ir+1)⋅labelprev(β(r,∣seq∣)⋯ir+1−1)=labelprev(ir+1)⋅labelprev(match(ir+1)⋯ir+1−1)=labelprev(match(ir+1)⋯ir+1)=labelprev(pos(r,p)⋯ir+1)=labelprev(pos(r,p)+1⋯ir+1).
Hence, it follows by the definition of
dr+1
that buffers(dr+1′)(p)=buffers(dr+1)(p).
We consider the active process p=proc(r+1) for the case that the transition tmatch(ir+1) is an atomic read-write one.
By simulation, we execute the same transition and change the memory.
•
Since the transition tmatch(ir+1) is of the active process,
we have
states(dr+1′)(p)=states(cβ(r,l))(p).
Moreover,
it follows from the fact
β(r,l)=match(ir+1) and the definition of pos(r,p)
that
states(cβ(r,l))(p)=states(cmatch(ir+1))(p)=states(cpos(r,p))(p).
Hence, it follows by
the definition of
dr+1
that states(dr+1′)(p)=states(dr+1)(p).
•
Since the transition tmatch(ir+1) is of the active process,
we have
buffers(dr+1′)(p)=ϵ.
From the definitions of
dr+1 and pos(r,p) and the fact match(ir+1)=ir+1, we have
buffers(dr+1)(p)=labelprev(pos(r,p)+1⋯ir+1)=labelprev(match(ir+1)+1⋯ir+1)=labelprev(ir+1+1⋯ir+1)=ϵ.
Hene, it follows that buffers(dr+1′)(p)=buffers(dr+1)(p).
For both cases, for the memory, we have
mem(dr+1′)(p)=mem(cir+1)=mem(dr+1) from the fact that we change the memory by transition tmatch(ir+1)
and by the definition of dr+1.
Finally, we have dr+1′=dr+1.
The transition tβ(r,l) can be a read-from-memory, read-own-write, nop, update one.
First, we give our simulation
of the transition tβ(r,l)
from the configuration dr,l and show that this simulation is feasible.
We consider different types of the transition tβ(r,l).
Let process p=proc(r+1) is the active process.
•
tβ(r,l)* is a read-from-memory transition:
By simulation, we execute the same transition in the DTSO-computation.
Note that under the DTSO semantics, this transition will read an element in the buffers.
Then we delete the oldest
element in the buffer of the active process.
The transition tβ(r,l) is feasible because
by the definition of dr,l, we have
states(dr,l)(p)=states(cβ(r,l)−1)(p)
and
buffers(dr,l)(p)=labelprev(β(r,l)⋯ir+1−1).*
•
tβ(r,l)* is a nop transition:
By simulation, we execute the same transition in the DTSO-computation.
The nop transition is feasible because by the definition of dr,l,
we have
states(dr,l)(p)=states(cβ(r,l)−1)(p).*
•
tβ(r,l)* is a read-own-write read transition:
By simulation, we execute the same transition in the DTSO-computation.
Observe that
states(dr,l)(p)=states(cβ(r,l)−1)(p).
We show the read-own-write transition is feasible in the DTSO-computation.
In the TSO-computation, this read must get its value from
a write transition t1′∈Δpw that
has the corresponding update transition t2′∈Δpupdate.
According to the TSO semantics,
the write comes and goes out the buffer in FIFO order.
We have the order of these transitions in the TSO-computation:
(i) transition tβ(r,l) is between transitions t1′ and tmatch(ir+1), and
(ii) transition t2′ is between transitions tβ(r,l) and tmatch(ir+1).
Moreover, (iii) there is no other write transition of the same process and the same variable between transitions
t1′ and tβ(r,l).
In the simulation of the DTSO-computation,
when we meet the transition t1′ we put an own-message m to the buffer of the active process.
From that we do not meet any write transition to the same variable of the active process until the simulation of transition
tβ(r,l).
Moreover, the message m exists in the buffer until the simulation of transition
tβ(r,l) because the update transition t2′ is after the transition
tβ(r,l).
Therefore the message m is the newest
own-message in the buffer that can match to the read tβ(r,l).
In other words, the read transition tβ(r,l) is feasible.*
•
tβ(r,l)* is an update transition:
By simulation, we delete the oldest
own-message in the buffer of the active process in the DTSO-computation.
This transition is feasible because
by the definition of dr,l, we have
states(dr,l)(p)=states(cβ(r,l)−1)(p)
and
buffers(dr,l)(p)=labelprev(β(r,l)⋯ir+1−1).*
We have show our simulation of the transition
tβ(r,l) in the DTSO-computation is feasible.
Let dr,l+1′ be the configuration in the DTSO-computation after the simulation.
We proceed the proof of the lemma by
proving that dr,l+1′=dr,l+1.
To do this, we will show that
dr,l+1′ and dr,l+1 have the same local states, memory,
and buffer contents.
We consider inactive processes.
For an inactive process p=proc(r+1),
we have:
•
Since in the simulation, we only execute the transition of the active process,
we have
states(dr,l+1′)(p)=states(drpmax)(p). Moreover, by the definition of
drpmax, we see that states(drpmax)(p)=states(cpos(r−1,p))(p) .
Hence, it follows by the definition of dr,l+1
that states(dr,l+1′)(p)=states(dr,l+1)(p).
•
Since in the simulation, we only execute the transition of the active process, we have
buffers(dr,l+1′)(p)=buffers(drpmax)(p).
Moreover, by the definition of
drpmax,
we see that buffers(drpmax)(p)=labelprev(pos(r−1,p)+1⋯ir+1−1).
Hence, it follows by the definition of dr,l+1 that buffers(dr+1′)(p)=buffers(dr,l+1)(p).
We consider the active process p=proc(r+1) for the case that
the transition tβ(r,l) is a read-from-memory one.
From the simulation of tβ(r,l),
states(dr,l+1′)(p)=states(cβ(r,l+1)−1)(p).
We have states(dr,l+1)(p)=states(cβ(r,l+1)−1)(p)
from the definition of dr,l+1.
Furthermore, because we delete the oldest
message in the buffer of the process p after we execute the read transition, it follows by the definition of dr,l+1 that
buffers(dr,l+1′)(p)=labelprev(β(r,l+1)⋯ir+1−1)=buffers(dr,l+1)(p).
Finally, by the definitions of dr,l and dr,l+1,
we have
mem(dr,l+1′)=mem(dr,l)=mem(cir)=mem(dr,l+1).
Hence, it follows that dr,l+1′=dr,l+1.
We consider the active process p=proc(r+1) for the case that the transition tβ(r,l) is a nop one.
From the simulation of tβ(r,l),
states(dr,l+1′)(p)=states(cβ(r,l+1)−1)(p).
From the definition of dr,l+1,
states(dr,l+1)(p)=states(cβ(r,l+1)−1)(p).
From the definitions of dr,l and dr,l+1,
buffers(dr,l+1′)(p)=buffers(dr,l)(p)=labelprev(β(r,l)⋯ir+1−1)=labelprev(β(r,l+1)⋯ir+1−1)=buffers(dr,l+1)(p).
Finally, by the definitions of dr,l and dr,l+1, we have
mem(dr,l+1′)=mem(dr,l)=mem(cir)=mem(dr,l+1).
Hence, it follows that dr,l+1′=dr,l+1.
We consider the active process p=proc(r+1) for the case that the transition tβ(r,l) is a read-own-write one.
From the simulation of the transition, we have
states(dr,l+1′)(p)=states(cβ(r,l+1)−1)(p).
Then from the definition of dr,l+1,
we have
states(dr,l+1)(p)=states(cβ(r,l+1)−1)(p).
It follows from the definitions of dr,l and dr,l+1
that
buffers(dr,l+1′)(p)=buffers(dr,l)(p)=labelprev(β(r,l)⋯ir+1−1)=labelprev(β(r,l+1)⋯ir+1−1)=buffers(dr,l+1)(p).
Finally,
by the definitions of dr,l and dr,l+1, we have
mem(dr,l+1′)=mem(dr,l)=mem(cir)=mem(dr,l+1) .
Hence, it follows that dr,l+1′=dr,l+1.
We consider the active process p=proc(r+1) for the case that the transition tβ(r,l) is an update one.
From the simulation of the transition,
states(dr,l+1′)(p)=states(cβ(r,l+1)−1)(p).
We have
states(cβ(r,l+1)−1)(p)=states(dr,l+1)(p)
from the definition of dr,l+1.
Moreover, we have
buffers(dr,l+1′)(p)=labelprev(β(r,l+1)⋯ir+1−1)=buffers(dr,l+1)(p).
Futhermore, by the definitions of dr,l and dr,l+1,
we have
mem(dr,l+1′)=mem(dr,l)=mem(cir)=mem(dr,l+1) .
Hence, it follows that dr,l+1′=dr,l+1.
We are in the final phase r=m.
Observe that in this phase we do not have any write and atomic read-write transitions.
Because from the configuration dm until the end of the TSO-computation the memory has not been changed, we observe that all memory-read transitions of a process p∈P after transitions tim get their values from mem(dm).
Therefore, we can execute a sequence of propagation transitions to propagate from the memory to buffer of the process p to
full fill it by all messages that will satisfy all memory-read transitions of p after tim.
We propagate to processes according to the order ≺: first to process pmin and last to process pmax. We have the following sequence:
dm(Δpropagate)∗DTSOdmpmin⋯(Δpropagate)∗DTSOdmpmax.
Next we simulate the remaining transitions
(tpos(m−1,p)+1⋯tn)∣Δp∪{updatep}
for each process p of the TSO-computation πTSO according to the order ≺: first process
pmin and last process pmax.
•
To simulate a memory-read transition, we execute the same read transition.
And then we execute a delete transition to delete the oldest message
in the buffer of the process p.
•
To simulate a read-own-write transition, we execute the same read transition.
•
To simulate an update transition, we execute a delete transition to delete the oldest message in the buffer of the process p.
•
To simulate a nop transition, we execute the same transitions in the DTSO-computation.
Following the same argument as in Lemma 24, Lemma 25,
and Lemma 26
we show that all simulations of transitions are feasible.
As a consequence,
from the configuration dm
we reach the configuration dm+1 where for all p∈P: states(dm+1)(p)=states(cn)(p),
buffers(dm+1)(p)=ϵ,
and
mem(dm+1)=mem(cn).
Let ci=(qi,bi,memi) be DTSO-configurations
for i:1≤i≤3. Let us assume that
c1tDTSOc2
for some
t∈Δp∪{propagatepx,deletep}
and p∈P.
We will define
c4=(q4,b4,mem4)
such that
c3∗DTSOc4
and c2⊑c4.
We consider the following cases depending on t:
(1)
Nop:
t=(q1,nop,q2).
Define
q4:=q2,
b4:=b3, and
mem4:=mem2=mem3=mem1.
We have
c3tDTSOc4.
2. (2)
Write to memory:
t=(q,w(x,v),q′).
Define
q4:=q2,
b4:=b3[p↩(x,v,\textscown)⋅b3(p)],
and
mem4:=mem2.
We have
c3tDTSOc4.
3. (3)
Propagate:
t=propagatepx.
Define
q4:=q2,
mem4:=mem2=mem3=mem1, and
b4:=b3[p↩(x,v)⋅b3(p)]
where v=mem4(x).
We have
c3tDTSOc4.
4. (4)
Delete:
t=deletep.
Define
q4:=q2
and
mem4:=mem2=mem3=mem1.
Define b4
according to one of the following cases:
•
If
b1=b2[p↩b2(p)⋅(x,v)],
then define
b4:=b3.
In other words, we define c4:=c3.
•
If
b1=b2[p↩b2(p)⋅(x,v,\textscown)]
and (x,v′,\textscown)∈b2(p) for some
v′∈V,
then define
b4:=b3.
In other words, we define c4:=c3.
•
If
b1=b2[p↩b2(p)⋅(x,v,\textscown)]
and there is no v′∈V
such that (x,v′,\textscown)∈b2(p).
Since
b1(p)⊑b3(p),
we know that there is an i
and therefore a smallest i such that
b3(p)(i)=(x,v,\textscown).
Define
b4:=b3[p↩b3(p)(1)⋅b3(p)(2)⋯b3(p)(i−1)].
We can perform the following sequence of transitions
c3deletepDTSOc1′deletepDTSOc2′⋯deletepDTSOc∣b3(p)∣−i′deletepDTSOc4.
In other words, we reach the configuration c4
from c3 by first deleting
∣b3(p)∣−i messages from the head of
b3(p).
5. (5)
Read:
t=(q,r(x,v),q′).
Define
q4:=q2,
and
mem4:=mem2=mem3=mem1.
We define b4
according to one of the following cases:
•
Read-own-write:
If there is an i:1≤i≤∣b1(p)∣ such that
b1(p)(i)=(x,v,\textscown), and
there are no j:1≤j<i and v′∈V
such that b1(p)(j)=(x,v′,\textscown).
Since b1(p)⊑b3(h(p)),
there is an i′:1≤i′≤∣b3(p)∣ such that
b3(p)(i′)=(x,v,\textscown), and
there are no j:1≤j<i′ and v′∈V
such that b3(p)(j)=(x,v′,\textscown).
Define b4:=b3.
In other words,
we define c4:=c3.
•
Read from buffer:
If
(x,v′,\textscown)∈b1(p)
for all v′∈V and b1(p)=w⋅(x,v).
Let i be the largest i:1≤i≤∣b3(p)∣
such that b3(p)(i)=(x,v).
Since b1(p)⊑b3(p),
we know that such index i exists.
Define
b4:=b3[p↩b3(p)(1)⋅b3(p)(2)⋯b3(p)(i−1)].
We can perform the following sequence of transitions
c3deletepDTSOc1′deletepDTSOc2′⋯deletepDTSOc∣b3(p)∣−i′deletepDTSOc4.
In other words, we reach the configuration c4
from c3 by first deleting
∣b3(p)∣−i messages from the head of
b3(p).
6. (6)
Fence:
t=(q,fence,q′).
Define
q4:=q2,
b4:=ϵ, and
mem4:=mem2.
We can perform the following sequence of transitions
[TABLE]
In other words, we reach the configuration c4
from c3 by first emptying
the content of
b3(p)
and then performing t.
7. (7)
ARW:
t=(q,arw(x,v,v′),q′).
Define
q4:=q2,
b4:=ϵ, and
mem4:=mem2.
We can reach the configuration c4
from c3 in a similar manner to the case of the fence transition.
First we show that the ordering
w⊑w′ is a well-quasi-ordering.
It is an immediate consequence of the fact that
(i) the sub-word relation is a well-quasi-ordering on finite words
[Hig52],
and that (ii)
the number of
own-messages in the form (x,v,own)
that should be equal, is finite.
Given
two DTSO-configurations c=(q,b,mem) and c′=(q′,b′,mem′).
We define three orders ⊑state, ⊑mem,
and ⊑buffer over configurations of CDTSO:
c⊑statec′ iff q=q′,
c⊑memc′ iff mem′=mem,
and c⊑buferc′ iff b(p)⊑b′(p) for all process p∈P.
It is easy to see that each one of three orderings is a well-quasi-ordering.
It follows that
the ordering ⊑ on DTSO-configurations
based on ⊑state, ⊑mem,
and ⊑buffer is a well-quasi-ordering.
Since the number of processes, the number of local states, memory content, and the number of own-messages that should be equal are finite,
it is decidable
whether c1⊑c2.
Consider a DTSO-configuration c=(q,b,mem).
Let we recall the definition of minpre({c}):
minpre({c}):=min(PreT({c}↑)∪{c}↑).
We observe that
[TABLE]
For t∈Δ∪Δ′′,
we select min{c′∣c′tc} to be the minimal
set of all finite DTSO-configurations of the form
c′=(q′,b′,mem′)
such that one of the following properties is satisfied:
(1)
Nop:
t=(q1,nop,q2),
q(p)=q2 for some p∈P,
q′=q[p↩q1],
b′=b, and
mem′=mem.
2. (2)
Write:
t=(q1,w(x,v),q2),
q(p)=q2 for some p∈P,
b(p)=(x,v,\textscown)⋅w for some w,
mem(x)=v,
mem′(y)=mem(y) if y=x,
q′=q[p↩q1],
and
one of the following properties is satisfied:
•
b′=b[p↩w].
•
b′=b[p↩w1⋅(x,v′,own)⋅w2] for some v′∈V where w1⋅w2=w and (x,v′′,own)∈/w1 for all v′′∈V.
3. (3)
Propagate:
t=propagatepx for some p∈P,
mem(x)=v,
q′=q,
mem′=mem,
b(p)=(x,v)⋅w
for some w,
and
b′=b[p↩w].
4. (4)
Read:
t=(q1,r(x,v),q2),
q(p)=q2 for some p∈P,
q′=q[p↩q1],
and
mem′=mem, and one
of the following conditions is
satisfied:
•
Read-own-write:
there is an i:1≤i≤∣b(p)∣ such that
b(p)(i)=(x,v,own), and
there are no j:1≤j<i and v′∈V
such that b(p)(j)=(x,v′,own), and
b′=b.
•
Read from buffer:
(x,v′,\textscown)∈b(p)
for all v′∈V,
b(p)=w⋅(x,v) for some w,
and
b′=b.
•
Read from buffer:
(x,v′,own)∈b(p)
for all v′∈V,
b(p)=w⋅(x,v) for all w,
and
b′=b[p↩b(p)⋅(x,v)].
5. (5)
Fence:
t=(q1,fence,q2),
q(p)=q2 for some p∈P,
b(p)=ϵ,
q′=q[p↩q1],
b′=b, and
mem′=mem.
6. (6)
ARW:
t=(q1,arw(x,v,v′),q2),
mem(x)=v′,
mem′=mem[x↩v],
q(p)=q2 for some p∈P,
b(p)=ϵ,
q′=q[p↩q1],
b′=b.
7. (7)
Delete:
t=deletep for some p∈P,
q′=q,
mem′=mem.
Moreover,
(x,v,own)∈/b(p) for some x∈X and all v∈V,
b′=b[p↩b(p)⋅(x,v′,own)] for some v′∈V.
Let
αi=(Pi,ci)
and
ci=(qi,bi,memi)
for i:1≤i≤4.
We show that
if α1tα2 and α1⊴α3
for some
t∈Δp∪{propagatepx,deletep}
and
p∈P1 (note that P1=P2)
then the configuration α4 exists
such that α3∗α4
and α2⊴α4.
First we define P4:=P3.
Because of α1⊴α3,
there exists an
injection h:P1↦P3
in the ordering
α1⊴α3.
We define an injection
h′:P2↦P4
in
the ordering
α2⊴α4
such that
h=h′. Moreover,
for p∈P4,
let
q4(p):=q2(h′(p)) if the process p∈P2,
otherwise
q4(p):=q3(p).
We define c4 depending on different cases of t:
(1)
Nop:
t=(q1,nop,q2).
Define
b4:=b3 and
mem4:=mem2=mem3=mem1.
We have
α3tDTSOα4.
2. (2)
Write:
t=(q,w(x,v),q′).
Define
b4:=b3[h(p)↩(x,v,\textscown)⋅b3(h(p))]
and
mem4:=mem2.
We have
α3tDTSOα4.
3. (3)
Propagate:
t=propagatepx.
Define
mem4:=mem2=mem3=mem1
and
b4:=b3[h(p)↩(x,v)⋅b3(h(p))] where v=mem4(x).
We have
α3tDTSOα4.
4. (4)
Delete:
t=deletep.
Define
mem4:=mem2=mem3=mem1.
Define b4
according to one of the following cases:
•
If
b1=b2[p↩b2(p)⋅(x,v)],
then define
b4:=b3.
In other words, we have α4=α3.
•
If
b1=b2[p↩b2(p)⋅(x,v,\textscown)]
and (x,v′,\textscown)∈b2(p) for some
v′∈V,
then define
b4:=b3.
In other words, we have α4=α3.
•
If
b1=b2[p↩b2(p)⋅(x,v,\textscown)]
and there is no v′∈V
with (x,v′,\textscown)∈b2(p),
then since
b1(p)⊑b3(h(p))
we know that there is an i
and therefore a smallest i such that
b3(h(p))(i)=(x,v,\textscown).
Define
[TABLE]
We can perform the following sequence of transitions
α3deletepDTSOα1′deletepDTSOα2′⋯deletepDTSOα∣b3(h(p))∣−i′deletepDTSOα4.
In other words, we reach the configuration α4
from α3 by first deleting
∣b3(h(p))∣−i messages from the head of
b3(h(p)).
5. (5)
Read:
t=(q,r(x,v),q′).
Define
mem4:=mem2.
We define b4
according to one of the following cases:
•
Read-own-write:
If there is an i:1≤i≤∣b1(p)∣ such that
b1(p)(i)=(x,v,\textscown), and
there are no 1≤j<i and v′∈V
such that b1(p)(j)=(x,v′,\textscown).
Since b1(p)⊑b3(h(p)),
there is an i′:1≤i′≤∣b1(p)∣ such that
b1(p)(i′)=(x,v,\textscown), and
there are no 1≤j<i′ and v′∈V
such that b1(p)(j)=(x,v′,\textscown).
Define b4:=b3.
In other words, we have that α4=α3.
•
Read from buffer:
If
(x,v′,\textscown)∈b1(p)
for all v′∈V and b1=b2[p↩b2(p)⋅(x,v)], then
let i be the largest i:1≤i≤∣b3(h(p))∣
such that b3(h(p))(i)=(x,v).
Since b1(p)⊑b3(h(p)),
we know that such an i exists.
Define
[TABLE]
We can reach the configuration α4 from α3 in a similar manner to the last case
of the delete transition.
6. (6)
Fence:
t=(q,fence,q′).
Define
b4:=ϵ and
mem4:=mem2.
We can perform the following sequence of transitions
α3deletepDTSOα1′deletepDTSOα2′⋯deletepDTSOα∣b3(h(p))∣′tDTSOα4.
In other words, we can reach the configuration α4
from α3 by first emptying
the contents of
b3(h(p))
and then performing t.
7. (7)
ARW:
t=(q,arw(x,v,v′),q′).
Define
b4:=ϵ and
mem4:=mem2.
We can reach the configuration α4
from α3 in a similar manner to the case of the fence transition.
Consider a parameterized configuration α=(P,c)
with c=(q,b,mem).
We recall the definition of minpre({α}):
minpre({α}):=min(PreT({α}↑)∪{α}↑).
We observe that
[TABLE]
For t∈Δ∪Δ′′,
we select min{α′∣α′tα}
to be the minimal
set of all finite parameterized configurations of the form
α′=(P′,c′) with
c′=(q′,b′,mem′)
such that one of the following properties is satisfied:
(1)
Nop:
t=(q1,nop,q2),
q(p)=q2 for some p∈P,
P′=P,
q′=q[p↩q1],
b′=b, and
mem′=mem.
2. (2)
Write:
t=(q1,w(x,v),q2),
mem(x)=v for some v∈V,
mem′(y)=mem(y) if y=x,
and
one of the following conditions is satisfied:
•
q(p)=q2 for some p∈P,
P′=P,
q′=q[p↩q1],
b′=b[p↩w],
b(p)=(x,v,\textscown)⋅w for some w∈((X×V)∪(X×V×{\textscown}))∗.
•
q(p)=q2 for some p∈P,
P′=P,
q′=q[p↩q1],
b(p)=(x,v,\textscown)⋅w for some w∈((X×V)∪(X×V×{\textscown}))∗,
b′=b[p↩w1⋅(x,v′,own)⋅w2] for some v′∈V where w1,w2∈((X×V)∪(X×V×{\textscown}))∗, w1⋅w2=w and (x,v′′,own)∈/w1 for all v′′∈V.
In other words, (x,v′,own) is the most recent message to variable x belonging to p in the buffer b′(p). This condition corresponds to the case when we have some messages (x,v′,own) that are hidden by
the message (x,v,own) in the buffer b(p).
•
q(p)=q2 or b(p)=(x,v,\textscown)⋅w for any p∈P, w∈((X×V)∪(X×V×{\textscown}))∗,
P′=P∪{p′} for some p′∈P,
q′(p′)=q1,
q′(p′′)=q(p′′)
if p′′=p′,
b′(p′)=⟨(x1,v1,\textscown)∣ϵ⟩⟨(x2,v2,\textscown)∣ϵ⟩⋯⟨(xm,vm,\textscown)∣ϵ⟩
where xi=xj,
vi∈V, 1≤i,j≤∣X∣
and
b′(p′′)=b(p′′)
if p′′=p′.
In other words, we add one more process p′ to the configuration α′.
3. (3)
Propagate:
t=propagatepx for some p∈P,
mem(x)=v,
P′=P,
q′=q,
mem′=mem,
b(p)=(x,v)⋅w
for some w∈((X×V)∪(X×V×{\textscown}))∗,
and
b′=b[p↩w].
4. (4)
Read:
t=(q1,r(x,v),q2),
q(p)=q2 for some p∈P,
P′=P,
q′=q[p↩q1],
and
mem′=mem, and one
of the following conditions is
satisfied:
•
Read-own-write:
there is an i:1≤i≤∣b(p)∣ such that
b(p)(i)=(x,v,own), and
there are no j:1≤j<i and v′∈V
such that b(p)(j)=(x,v′,own), and
b′=b.
•
Read from buffer:
(x,v′,\textscown)∈b(p)
for all v′∈V,
b(p)=w⋅(x,v) for some w∈((X×V)∪(X×V×{\textscown}))∗,
and
b′=b.
•
Read from buffer:
(x,v′,own)∈b(p)
for all v′∈V,
b(p)=w⋅(x,v) for any w∈((X×V)∪(X×V×{\textscown}))∗,
and
b′=b[p↩b(p)⋅(x,v)].
This condition corresponds to the case when we have some messages (x,v) that are not explicitly presented at the head of the buffer b(p).
5. (5)
Fence:
t=(q1,fence,q2),
q(p)=q2 for some p∈P,
b(p)=ϵ,
P′=P,
q′=q[p↩q1],
b′=b, and
mem′=mem.
6. (6)
ARW:
t=(q1,arw(x,v,v′),q2),
mem(x)=v′,
mem′=mem[x↩v],
and one of the following conditions is satisfied:
•
q(p)=q2 for some p∈P,
b(p)=ϵ,
P′=P,
q′=q[p↩q1],
b′=b.
•
q(p)=q2 or
b(p)=ϵ for any p∈P,
P′=P∪{p′} for some p′∈P,
q′(p′)=q1,
q′(p′′)=q(p′′)
if p′′=p′,
b′(p′)=ϵ, and
b′(p′′)=b(p′′) if p′′=p′.
In other words, we add one more process p′ to the configuration α′.
7. (7)
Delete:
t=deletep for some p∈P,
P′=P,
q′=q,
mem′=mem,
(x,v,own)∈/b(p) for some x∈X and all v∈V,
b′=b[p↩b(p)⋅(x,v′,own)] for some v′∈V.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
1[AAA + 15] P. Abdulla, S. Aronis, M.F. Atig, B. Jonsson, C. Leonardsson, and K. Sagonas. Stateless model checking for TSO and PSO. In TACAS , volume 9035 of LNCS , pages 353–367. Springer, 2015.
2[AABN 16] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Ahmed Bouajjani, and Tuan Phong Ngo. The benefits of duality in verifying concurrent programs under TSO. In CONCUR , volume 59 of LIP Ics , pages 5:1–5:15. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016.
3[AABN 17] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Ahmed Bouajjani, and Tuan Phong Ngo. Context-bounded analysis for POWER. In TACAS 2017 , pages 56–74, 2017.
4[AAC + 12a] P.A. Abdulla, M.F. Atig, Y.F. Chen, C. Leonardsson, and A. Rezine. Counter-example guided fence insertion under TSO. In TACAS 2012 , pages 204–219, 2012.
5[AAC + 12b] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Carl Leonardsson, and Ahmed Rezine. Automatic fence insertion in integer programs via predicate abstraction. In SAS 2012 , pages 164–180, 2012.
6[AAC + 13] P.A. Abdulla, M.F. Atig, Y.F. Chen, C. Leonardsson, and A. Rezine. Memorax, a precise and sound tool for automatic fence insertion under TSO. In TACAS , pages 530–536, 2013.
7[AAJL 16] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Bengt Jonsson, and Carl Leonardsson. Stateless model checking for POWER. In CAV , volume 9780 of LNCS , pages 134–156. Springer, 2016.
8[AALN 15] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Magnus Lång, and Tuan Phong Ngo. Precise and sound automatic fence insertion procedure under PSO. In NETYS 2015 , pages 32–47, 2015.