Dependence-Aware, Unbounded Sound Predictive Race Detection

Kaan Gen\c{c}; Jake Roemer; Yufan Xu; Michael D. Bond (Ohio State; University)

arXiv:1904.13088·cs.PL·June 29, 2021

Dependence-Aware, Unbounded Sound Predictive Race Detection

Kaan Gen\c{c}, Jake Roemer, Yufan Xu, Michael D. Bond (Ohio State, University)

PDF

1 Repo

TL;DR

This paper presents two novel, sound predictive race detection methods that incorporate data and control dependence, enabling analysis of full program executions and detecting more data races in large Java programs.

Contribution

It introduces two new approaches that improve sound predictive race detection by handling full executions and precisely modeling dependencies, surpassing prior methods.

Findings

01

Detect more data races in large Java programs.

02

Handle full program executions effectively.

03

Incorporate data and control dependence precisely.

Abstract

Data races are a real problem for parallel software, yet hard to detect. Sound predictive analysis observes a program execution and detects data races that exist in some other, unobserved execution. However, existing predictive analyses miss races because they do not scale to full program executions or do not precisely incorporate data and control dependence. This paper introduces two novel, sound predictive approaches that incorporate data and control dependence and handle full program executions. An evaluation using real, large Java programs shows that these approaches detect more data races than the closest related approaches, thus advancing the state of the art in sound predictive race detection.

Tables15

Table 1. Table 1. Definitions of three strict partial orders over events in an execution trace. Each order is the minimum relation satisfying the listed properties.

Property	$≺_{HB}$	$≺_{WCP}$	$≺_{DC}$
Same-lock critical section ordering	All	Confl.	Confl.
Orders rel to…	acq	wr/rd	wr/rd
Includes $≺_{PO}$ ?	Yes	No	Yes
Left-and-right composes with $≺_{HB}$ ?	Yes	Yes	No
$acq(m) ≺ rel(m)$ implies $rel(m) ≺ rel(m)$ ?	Yes	Yes	Yes
Transitive?	Yes	Yes	Yes

Table 2. Table 2. Definitions of five strict partial orders over events in an execution trace. Each order is the minimum relation satisfying the listed properties. This table adds columns ≺ SDP subscript precedes SDP \prec_{\textsc{\tiny{SDP}}} and ≺ WDP subscript precedes WDP \prec_{\textsc{\tiny{WDP}}} to Table 1 (page 1 ). * As the text explains, SDP adds release–access ordering for write–read and read–write conflicts, and adds ordering from the release to the next read for write–write conflicts. ** As the text explains, SDP adds release–release ordering for two critical sections on the same lock if the critical sections are HB ordered by a write–write conflict but do not both contain the conflicting writes.

	Prior work			This paper
Property	$≺_{HB}$	$≺_{WCP}$	$≺_{DC}$	$≺_{SDP}$	$≺_{WDP}$
Same-lock critical section ordering	All	Confl.	Confl.	Confl.	Last wr–rd only
Orders rel to…	acq	wr/rd	wr/rd	wr/rd or next rd*	Next br
Includes $≺_{PO}$ ?	Yes	No	Yes	No	Yes
Left-and-right composes with $≺_{HB}$ ?	Yes	Yes	No	Yes	No
$acq(m) ≺_{} rel(m)$ implies $rel(m) ≺_{} rel(m)$ ?	Yes	Yes	Yes	Yes**	Yes
Transitive?	Yes	Yes	Yes	Yes	Yes

Table 3. ((a)) Weaker above stronger relations

	Sound?	Complete?
HB	Yes (Lamport, 1978)	No (Fig. 2(b), 3(a), 4(a), 4(c))
WCP	Yes (Kini et al., 2017)	No (Fig. 2(b), 3(a), 4(a), 4(c))
DC	No (Roemer et al., 2018)	No (Fig. 2(b), 3(a), 4(a), 4(c))
SDP	Yes (Section 4.2)	No (Fig. 2(b), 4(a), 4(c))
WDP	No (Fig. 2(a) and 3(c))	Yes (Section 4.2)

Table 4. Table 5. Dynamic characteristics of the analyzed programs. Event counts (shown in millions) and percentages are collected from WDP analysis; other analyses do not analyze branch events.

			Total	Analyzed events
	#Thr		events	All (	acq/rel	wr	rd	br )
avrora	7	(7)	2,400 M	260 M (	1.2%	17.7%	42.2%	38.4% )
batik	7	(7)	490 M	17 M (	0.6%	26.3%	38.4%	34.0% )
h2	34	(33)	9,368 M	768 M (	0.5%	17.1%	43.1%	39.1% )
luindex	3	(3)	910 M	72 M (	0.6%	20.1%	42.4%	36.9% )
lusearch	34	(34)	2,746 M	301 M (	0.9%	19.5%	43.6%	35.6% )
pmd	33	(33)	403 M	41 M (	$<$ 0.1%	28.5%	37.3%	34.2% )
sunflow	65	(33)	14,452 M	887 M (	$<$ 0.1%	44.7%	41.4%	13.8% )
tomcat	106	(67)	113 M	29 M (	2.8%	18.7%	42.1%	36.1% )
xalan	33	(33)	1,306 M	436 M (	2.1%	12.0%	48.8%	37.1% )

Table 5. ((a)) HB, WCP, and SDP analyses on the same executions.

Program	HB-races		WCP-races		SDP-races
avrora	5	(202 K)	5	(203 K)	5	(203 K)
batik	0	(0)	0	(0)	0	(0)
h2	12	(53 K)	12	(53 K)	12	(54 K)
luindex	1	(1)	1	(1)	1	(1)
lusearch	0	(0)	0	(0)	0	(0)
pmd	8	(436)	8	(443)	10	(651)
sunflow	2	(20)	2	(26)	2	(26)
tomcat	98	(36 K)	99	(36 K)	103	(39 K)
xalan	7	(208)	15	(531 K)	37	(2072 K)

Table 6. ((a)) HB, WCP, and SDP analyses on the same executions.

Program	HB-races		WCP-races		SDP-races
avrora	5	(202 K)	5	(203 K)	5	(203 K)
batik	0	(0)	0	(0)	0	(0)
h2	12	(53 K)	12	(53 K)	12	(54 K)
luindex	1	(1)	1	(1)	1	(1)
lusearch	0	(0)	0	(0)	0	(0)
pmd	8	(436)	8	(443)	10	(651)
sunflow	2	(20)	2	(26)	2	(26)
tomcat	98	(36 K)	99	(36 K)	103	(39 K)
xalan	7	(208)	15	(531 K)	37	(2072 K)

Table 7. ((b)) DC and WDP analyses on the same executions.

Program	DC-races		WDP-races
avrora	5	(203 K)	5	(406 K)
batik	0	(0)	0	(0)
h2	11	(54 K)	12	(63 K)
luindex	1	(1)	1	(1)
lusearch	0	(0)	1	(30)
pmd	9	(2 K)	10	(3 K)
sunflow	2	(49)	2	(100)
tomcat	94	(36 K)	284	(125 K)
xalan	17	(649 K)	170	(15 M)

Table 8. Table 7. Static and dynamic (in parentheses) race counts from an experiment running SDP and WDP analyses together and vindicating dynamic instances of static WDP-only races. The WDP-only → → \rightarrow Verified column reports static WDP-only races, followed by how many static WDP-only races were verified as predictable races by VindicateWDPRace .

Program	SDP-races		WDP-races		WDP-only $\to$	Verified
avrora	5	(202 K)	5	(407 K)	0 $\to$
batik	0	(0)	0	(0)	0 $\to$
h2	12	(53 K)	12	(63 K)	1 $\to$	0
luindex	1	(1)	1	(1)	0 $\to$
lusearch	0	(0)	1	(30)	1 $\to$	0
pmd	9	(470)	11	(3 K)	1 $\to$	1
sunflow	2	(38)	2	(100)	0 $\to$
tomcat	99	(38 K)	321	(126 K)	222 $\to$	53
xalan	33	(1.8 M)	170	(15.2 M)	137 $\to$	135

Table 9. Table 9. Slowdowns of program instrumentation and various analyses over uninstrumented execution, and the average time taken to vindicate WDP-only races.

		Instr. only		Analyses w/o constraint graph				SDP+WDP+graph
	Base	w/o br	w/ br	WCP	SDP	SDP+DC	SDP+WDP	Slowdown	Failed	Verified
avrora	7.7 s	2.1 $\times$	2.6 $\times$	17 $\times$	20 $\times$	28 $\times$	34 $\times$	42 $\times$	-	-
batik	3.3 s	3.9 $\times$	5.5 $\times$	19 $\times$	20 $\times$	24 $\times$	33 $\times$	32 $\times$	-	-
h2	9.4 s	6.6 $\times$	9.3 $\times$	129 $\times$	139 $\times$	202 $\times$	212 $\times$	280 $\times$	233 s	-
luindex	1.4 s	5.9 $\times$	11 $\times$	91 $\times$	93 $\times$	118 $\times$	139 $\times$	154 $\times$	-	-
lusearch	3.8 s	4.4 $\times$	5.2 $\times$	20 $\times$	23 $\times$	29 $\times$	32 $\times$	42 $\times$	¡ 0.1 s	-
pmd	2.6 s	7.6 $\times$	10.0 $\times$	27 $\times$	29 $\times$	32 $\times$	36 $\times$	36 $\times$	-	8.6 s
sunflow	2.3 s	11 $\times$	13 $\times$	142 $\times$	196 $\times$	210 $\times$	221 $\times$	264 $\times$	-	-
tomcat	1.6 s	5.9 $\times$	6.2 $\times$	30 $\times$	37 $\times$	55 $\times$	59 $\times$	56 $\times$	2.0 s	55 s
xalan	5.3 s	2.6 $\times$	3.3 $\times$	40 $\times$	46 $\times$	63 $\times$	86 $\times$	124 $\times$	30 s	0.1 s

Table 10. Table 10. Static and dynamic (in parentheses) race counts from HB, WCP, and SDP analyses running together. A “ ± plus-or-minus \pm\, 0” confidence interval is the result of rounding to the nearest integer and implies that the interval is smaller than ± plus-or-minus \pm\, 0.5.

Program	HB-races				WCP-races				SDP-races
avrora	5 $\pm$	0	(202 K $\pm$	1 K)	5 $\pm$	0	(203 K $\pm$	1 K)	5 $\pm$	0	(203 K $\pm$	1 K)
batik	0 $\pm$	0	(0 $\pm$	0)	0 $\pm$	0	(0 $\pm$	0)	0 $\pm$	0	(0 $\pm$	0)
h2	12 $\pm$	1	(53 K $\pm$	344)	12 $\pm$	1	(53 K $\pm$	340)	12 $\pm$	1	(54 K $\pm$	367)
luindex	1 $\pm$	0	(1 $\pm$	0)	1 $\pm$	0	(1 $\pm$	0)	1 $\pm$	0	(1 $\pm$	0)
lusearch	0 $\pm$	0	(0 $\pm$	0)	0 $\pm$	0	(0 $\pm$	0)	0 $\pm$	0	(0 $\pm$	0)
pmd	8 $\pm$	1	(436 $\pm$	286)	8 $\pm$	1	(443 $\pm$	283)	10 $\pm$	1	(651 $\pm$	283)
sunflow	2 $\pm$	0	(20 $\pm$	1)	2 $\pm$	0	(26 $\pm$	17)	2 $\pm$	0	(26 $\pm$	17)
tomcat	98 $\pm$	2	(36 K $\pm$	159)	99 $\pm$	2	(36 K $\pm$	164)	103 $\pm$	2	(39 K $\pm$	163)
xalan	7 $\pm$	1	(208 $\pm$	15)	15 $\pm$	1	(531 K $\pm$	920)	37 $\pm$	1	(2 M $\pm$	4 K)

Table 11. Table 11. Static and dynamic (in parentheses) race counts from DC and WDP analyses running together. A “ ± plus-or-minus \pm\, 0” confidence interval is the result of rounding to the nearest integer and implies that the interval is smaller than ± plus-or-minus \pm\, 0.5.

Program	DC-races				WDP-races
avrora	5 $\pm$	0	(203 K $\pm$	673)	5 $\pm$	0	(406 K $\pm$	806)
batik	0 $\pm$	0	(0 $\pm$	0)	0 $\pm$	0	(0 $\pm$	0)
h2	11 $\pm$	13	(54 K $\pm$	1 K)	12 $\pm$	13	(63 K $\pm$	781)
luindex	1 $\pm$	0	(1 $\pm$	0)	1 $\pm$	0	(1 $\pm$	0)
lusearch	0 $\pm$	0	(0 $\pm$	0)	1 $\pm$	0	(30 $\pm$	0)
pmd	9 $\pm$	1	(2 K $\pm$	191)	10 $\pm$	1	(3 K $\pm$	332)
sunflow	2 $\pm$	0	(49 $\pm$	3)	2 $\pm$	0	(100 $\pm$	3)
tomcat	94 $\pm$	4	(36 K $\pm$	593)	284 $\pm$	72	(125 K $\pm$	5 K)
xalan	17 $\pm$	1	(649 K $\pm$	940)	170 $\pm$	1	(15 M $\pm$	37 K)

Table 12. Table 12. Static and dynamic (in parentheses) race counts. The WDP-only → → \rightarrow Verified column reports static WDP-only races, followed by how many static WDP-only races were verified as predictable races by VindicateWDPRace . A “ ± plus-or-minus \pm\, 0” confidence interval is the result of rounding to the nearest integer and implies that the interval is smaller than ± plus-or-minus \pm\, 0.5.

Program	SDP-races				WDP-races				WDP-only		Verified
avrora	5 $\pm$	0	(202 K $\pm$	431)	5 $\pm$	0	(407 K $\pm$	354)	0 $\pm$
batik	0 $\pm$	0	(0 $\pm$	0)	0 $\pm$	0	(0 $\pm$	0)	0 $\pm$
h2	12 $\pm$	2	(53 K $\pm$	814)	12 $\pm$	1	(63 K $\pm$	1 K)	1 $\pm$	0 $\to$	0 $\pm$	0
luindex	1 $\pm$	0	(1 $\pm$	0)	1 $\pm$	0	(1 $\pm$	0)	0 $\pm$
lusearch	0 $\pm$	0	(0 $\pm$	0)	1 $\pm$	0	(30 $\pm$	0)	1 $\pm$	0 $\to$	0 $\pm$	0
pmd	9 $\pm$	1	(470 $\pm$	152)	11 $\pm$	1	(3 K $\pm$	806)	1 $\pm$	1 $\to$	1 $\pm$	1
sunflow	2 $\pm$	0	(38 $\pm$	21)	2 $\pm$	0	(100 $\pm$	3)	0 $\pm$
tomcat	99 $\pm$	3	(38 K $\pm$	563)	321 $\pm$	74	(126 K $\pm$	10 K)	222 $\pm$	72 $\to$	53 $\pm$	10
xalan	33 $\pm$	2	(2 M $\pm$	4 K)	170 $\pm$	1	(15 M $\pm$	80 K)	137 $\pm$	1 $\to$	135 $\pm$	1

Table 13. Table 13. Slowdowns of program instrumentation over uninstrumented execution.

Program	Base		w/o br		w/ br
avrora	7.7 s $\pm$	1.0 s	2.1 $\times$ $\pm$	0.1 $\times$	2.6 $\times$ $\pm$	0.1 $\times$
batik	3.3 s $\pm$	0.0 s	3.9 $\times$ $\pm$	0.0 $\times$	5.5 $\times$ $\pm$	0.0 $\times$
h2	9.4 s $\pm$	0.3 s	6.6 $\times$ $\pm$	0.4 $\times$	9.3 $\times$ $\pm$	0.3 $\times$
luindex	1.4 s $\pm$	0.1 s	5.9 $\times$ $\pm$	0.0 $\times$	11 $\times$ $\pm$	0.2 $\times$
lusearch	3.8 s $\pm$	0.8 s	4.4 $\times$ $\pm$	0.8 $\times$	5.2 $\times$ $\pm$	0.6 $\times$
pmd	2.6 s $\pm$	0.2 s	7.6 $\times$ $\pm$	0.4 $\times$	10.0 $\times$ $\pm$	0.7 $\times$
sunflow	2.3 s $\pm$	0.3 s	11 $\times$ $\pm$	0.9 $\times$	13 $\times$ $\pm$	0.5 $\times$
tomcat	1.6 s $\pm$	0.1 s	5.9 $\times$ $\pm$	0.4 $\times$	6.2 $\times$ $\pm$	0.2 $\times$
xalan	5.3 s $\pm$	1.3 s	2.6 $\times$ $\pm$	0.3 $\times$	3.3 $\times$ $\pm$	0.2 $\times$

Table 14. Table 14. Slowdowns of various analyses over uninstrumented execution.

Program	WCP		SDP		SDP+DC		SDP+WDP
avrora	17 $\times$ $\pm$	0.6 $\times$	20 $\times$ $\pm$	1.3 $\times$	28 $\times$ $\pm$	1.3 $\times$	34 $\times$ $\pm$	2.1 $\times$
batik	19 $\times$ $\pm$	1.2 $\times$	20 $\times$ $\pm$	2.0 $\times$	24 $\times$ $\pm$	2.4 $\times$	33 $\times$ $\pm$	5.4 $\times$
h2	129 $\times$ $\pm$	13 $\times$	139 $\times$ $\pm$	21 $\times$	202 $\times$ $\pm$	20 $\times$	212 $\times$ $\pm$	22 $\times$
luindex	91 $\times$ $\pm$	11 $\times$	93 $\times$ $\pm$	12 $\times$	118 $\times$ $\pm$	20 $\times$	139 $\times$ $\pm$	32 $\times$
lusearch	20 $\times$ $\pm$	2.9 $\times$	23 $\times$ $\pm$	3.6 $\times$	29 $\times$ $\pm$	3.7 $\times$	32 $\times$ $\pm$	4.1 $\times$
pmd	27 $\times$ $\pm$	3.0 $\times$	29 $\times$ $\pm$	4.7 $\times$	32 $\times$ $\pm$	3.2 $\times$	36 $\times$ $\pm$	5.8 $\times$
sunflow	142 $\times$ $\pm$	9.8 $\times$	196 $\times$ $\pm$	31 $\times$	210 $\times$ $\pm$	34 $\times$	221 $\times$ $\pm$	29 $\times$
tomcat	30 $\times$ $\pm$	4.3 $\times$	37 $\times$ $\pm$	4.4 $\times$	55 $\times$ $\pm$	11 $\times$	59 $\times$ $\pm$	7.5 $\times$
xalan	40 $\times$ $\pm$	7.1 $\times$	46 $\times$ $\pm$	6.9 $\times$	63 $\times$ $\pm$	7.8 $\times$	86 $\times$ $\pm$	9.0 $\times$

Table 15. Table 15. Slowdowns of WDP analysis without and with graph generation, and the average time taken to vindicate WDP-only races.

			SDP+WDP+graph
Program	SDP+WDP		Slowdown		Failed		Verified
avrora	34 $\times$ $\pm$	2.1 $\times$	42 $\times$ $\pm$	7.0 $\times$	$\pm$		$\pm$
batik	33 $\times$ $\pm$	5.4 $\times$	32 $\times$ $\pm$	5.5 $\times$	$\pm$		$\pm$
h2	212 $\times$ $\pm$	22 $\times$	280 $\times$ $\pm$	51 $\times$	233 s $\pm$	185 s	$\pm$
luindex	139 $\times$ $\pm$	32 $\times$	154 $\times$ $\pm$	12 $\times$	$\pm$		$\pm$
lusearch	32 $\times$ $\pm$	4.1 $\times$	42 $\times$ $\pm$	3.8 $\times$	¡ 0.1 s $\pm$	¡ 0.1 s	$\pm$
pmd	36 $\times$ $\pm$	5.8 $\times$	36 $\times$ $\pm$	1.3 $\times$	$\pm$		8.6 s $\pm$	20 s
sunflow	221 $\times$ $\pm$	29 $\times$	264 $\times$ $\pm$	10 $\times$	$\pm$		$\pm$
tomcat	59 $\times$ $\pm$	7.5 $\times$	56 $\times$ $\pm$	3.1 $\times$	2.0 s $\pm$	¡ 0.1 s	55 s $\pm$	26 s
xalan	86 $\times$ $\pm$	9.0 $\times$	124 $\times$ $\pm$	6.1 $\times$	30 s $\pm$	9.4 s	0.1 s $\pm$	¡ 0.1 s

Equations6

d (e, e^{'}) = min ⎩ ⎨ ⎧ 0 1 + d (A (e), e^{'}) 0 \lx@intercol 1 + min_{e^{''}} (d (e, e^{''}) + d (e^{''}, e^{'})) \hfil \lx@intercol \infty if e ≺_{\textsc W D P} e^{'} by WDP conflicting critical section ordering if e ≺_{\textsc W D P} e^{'} by “ acq(m) ≺_{\textsc W D P} rel(m) ⟹ rel(m) ≺_{\textsc W D P} rel(m) ” if e ≺_{\textsc W D P} e^{'} by PO if \exists e^{''} : e ≺_{\textsc W D P} e^{''} \land e^{''} ≺_{\textsc W D P} e^{'} by WDP transitivity otherwise

d (e, e^{'}) = min ⎩ ⎨ ⎧ 0 1 + d (A (e), e^{'}) 0 \lx@intercol 1 + min_{e^{''}} (d (e, e^{''}) + d (e^{''}, e^{'})) \hfil \lx@intercol \infty if e ≺_{\textsc W D P} e^{'} by WDP conflicting critical section ordering if e ≺_{\textsc W D P} e^{'} by “ acq(m) ≺_{\textsc W D P} rel(m) ⟹ rel(m) ≺_{\textsc W D P} rel(m) ” if e ≺_{\textsc W D P} e^{'} by PO if \exists e^{''} : e ≺_{\textsc W D P} e^{''} \land e^{''} ≺_{\textsc W D P} e^{'} by WDP transitivity otherwise

e ⇝_{G} e^{'} \equiv (e, e^{'}) \in G \lor \exists e^{''} : e ⇝_{G} e^{''} \land e^{''} ⇝_{G} e^{'}

e ⇝_{G} e^{'} \equiv (e, e^{'}) \in G \lor \exists e^{''} : e ⇝_{G} e^{''} \land e^{''} ⇝_{G} e^{'}

\forall e,e^{\prime}\in\mathit{tr}\;\big{(}e\prec_{\textsc{\tiny{WDP}}}e^{\prime}\iff e\leadsto_{G}e^{\prime}\big{)}

\forall e,e^{\prime}\in\mathit{tr}\;\big{(}e\prec_{\textsc{\tiny{WDP}}}e^{\prime}\iff e\leadsto_{G}e^{\prime}\big{)}

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

PLaSSticity/SDP-WDP-implementation
noneOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

1

Dependence-Aware, Unbounded Sound Predictive Race Detection

\minibox[frame]This extended arXiv version of an OOPSLA 2019 paper adds Appendices A–C and

corrects a technical issue (see corrigendum on the ACM DL)

Kaan Genç

Ohio State UniversityUSA

[email protected]

,

Jake Roemer

Ohio State UniversityUSA

[email protected]

,

Yufan Xu

Ohio State UniversityUSA

[email protected]

and

Michael D. Bond

Ohio State UniversityUSA

[email protected]

(2019)

Dependence-Aware, Unbounded Sound Predictive Race Detection

\minibox[frame]This extended arXiv version of an OOPSLA 2019 paper adds Appendices A–C and

corrects a technical issue (see corrigendum on the ACM DL)

Kaan Genç

Ohio State UniversityUSA

[email protected]

,

Jake Roemer

Ohio State UniversityUSA

[email protected]

,

Yufan Xu

Ohio State UniversityUSA

[email protected]

and

Michael D. Bond

Ohio State UniversityUSA

[email protected]

(2019)

Abstract.

Data races are a real problem for parallel software, yet hard to detect. Sound predictive analysis observes a program execution and detects data races that exist in some other, unobserved execution. However, existing predictive analyses miss races because they do not scale to full program executions or do not precisely incorporate data and control dependence.

This paper introduces two novel, sound predictive approaches that incorporate data and control dependence and handle full program executions. An evaluation using real, large Java programs shows that these approaches detect more data races than the closest related approaches, thus advancing the state of the art in sound predictive race detection.

data race detection, dynamic predictive analysis

††copyright: rightsretained††doi: 10.1145/3360605††journalyear: 2019††journal: PACMPL††journalvolume: 3††journalnumber: OOPSLA††article: 179††publicationmonth: 10††ccs: Software and its engineering Dynamic analysis††ccs: Software and its engineering Software testing and debugging

1. Introduction

With the rise in parallel software, data races represent a growing hazard. Programs with data races written in shared-memory languages including Java and C++ have weak or undefined semantics, as a result of assuming data race freedom for performance reasons (Manson et al., 2005; Boehm and Adve, 2008; Adve and Boehm, 2010). Data races are culprits in real software failures, resulting in substantial financial losses and even harm to humans (Boehm, 2011; Kasikci et al., 2015; Lu et al., 2008; Kasikci et al., 2012; Narayanasamy et al., 2007; Cao et al., 2016; Flanagan and Freund, 2010a; Sen, 2008; Burnim et al., 2011; Zhivich and Cunningham, 2009; U.S.–Canada Power System Outage Task Force, 2004; Leveson and Turner, 1993; PCWorld, 2012).

Writing scalable, data-race-free code is challenging, as is detecting data races, which occur nondeterministically depending on shared-memory interleavings and program inputs and environments. The most common approach for dealing with data races is to detect them during in-house testing using dynamic happens-before (HB) analysis (Flanagan and Freund, 2009; Pozniansky and Schuster, 2007; Elmas et al., 2007; Serebryany and Iskhodzhanov, 2009; Serebryany et al., 2012; Intel Corporation, 2016), which detects conflicting accesses (two memory accesses, at least one of which is a write, to the same variable by different threads) unordered by the HB partial order (Lamport, 1978). However, HB analysis misses data races when accesses could race in some other execution but are ordered by critical sections on the same lock in the observed execution.

A promising alternative to HB analysis is sound predictive analysis, which detects additional predictable data races from an observed execution (Huang et al., 2014; Said et al., 2011; Huang and Rajagopalan, 2016; Chen et al., 2008; Şerbănuţă et al., 2013; Liu et al., 2016; Smaragdakis et al., 2012; Kini et al., 2017; Roemer et al., 2018; Pavlogiannis, 2019); an analysis is sound if it detects no false races (Section 2). Some predictive analyses rely on generating and solving SMT constraints, so in practice they cannot scale to full program executions and instead analyze bounded windows of execution, missing races between accesses that do not execute close together (Huang et al., 2014; Said et al., 2011; Huang and Rajagopalan, 2016; Chen et al., 2008; Şerbănuţă et al., 2013; Liu et al., 2016) (Section 8). In contrast, unbounded predictive analyses avoid this limitation by detecting races based on computing a partial order weaker than HB, using analyses with linear running time in the length of the trace (Kini et al., 2017; Roemer et al., 2018). However, these partial-order-based analyses miss predictable races because they do not incorporate precise notions of data and control dependence. More precisely, existing predictive partial orders do not encode the precise conditions for reordering memory accesses to expose a race: The reordering can change the last writer of a memory read (data dependence) if the read in turn cannot affect whether the racing accesses execute (control dependence). Encoding data and control dependence precisely in a partial order is fundamentally challenging (Section 2).

Contributions.

This paper designs and evaluates new predictive analyses, making the following contributions:

•

A partial order called strong-dependently-precedes (SDP) that improves over the highest-coverage sound partial order from prior work (Kini et al., 2017) by incorporating data dependence more precisely (Section 4).

•

A proof that SDP is sound, i.e., detects no false races (Section 4.2).

•

A partial order called weak-dependently-precedes (WDP) that improves over the previous highest-coverage partial order from prior work (Roemer et al., 2018) by incorporating data and control dependence precisely (Section 4).

•

A proof that WDP is complete (sometimes called maximal (Huang et al., 2014; Şerbănuţă et al., 2013)), detecting all races knowable from an observed execution (Section 4.2).

•

Dynamic analyses that compute SDP and WDP and detect SDP- and WDP-races (Section 5).

•

An algorithm for filtering out WDP-races that are false races, by extending prior work’s vindication algorithm (Roemer et al., 2018), yielding an overall sound approach (Section 6).

•

An implementation and evaluation of SDP and WDP analyses and WDP-race vindication on benchmarked versions of real, large Java programs (Section 7). The evaluation shows that the analyses find predictable races missed by the closest related approaches (Kini et al., 2017; Roemer et al., 2018; Huang et al., 2014).

2. Background and Motivation

Recent partial-order-based predictive analyses can scale to full program executions, enabling detection of predictable races that are millions of executed operations apart (Kini et al., 2017; Roemer et al., 2018). However, these partial orders are fundamentally limited and miss predictable races, as this section explains. First, we introduce formalisms used throughout the paper.

2.1. Execution Model

An execution trace $\mathit{tr}$ is a sequence of events, ordered by the total order $<_{\textsc{$ \mathit{tr} $}}$ , that represents a multithreaded execution without loss of generality, corresponding to a linearization of a sequentially consistent (SC) execution.111Although programs with data races may violate SC (Manson et al., 2005; Boehm and Adve, 2008; Adve and Boehm, 2010; Dolan et al., 2018), dynamic race detection analyses (including ours) add synchronization instrumentation before accesses, generally ensuring SC. We assume every event in $\mathit{tr}$ is a unique object (e.g., has a unique identifier), making it possible to identify the same event $e$ across other, predicted traces. Each event has two attributes: (1) an identifier for the thread that executed the operation; and (2) an operation, which is one of wr(x), rd(x), acq(m), rel(m), or br, where x is a program memory location and m is a program lock. (Later we consider how to extend analyses to handle lock-free accesses and Java volatile / C++ atomic accesses.) An execution trace must be well formed: a thread may only acquire an unheld lock and may only release a lock it has acquired.

Each br (branch) event $b$ represents an executed conditional operation—such as a conditional jump, polymorphic call, or array element access—that may be dependent on some prior read event(s) by the same thread. We assume a helper function $\mathit{brDepsOn}(b,r)$ exists that returns true if the value read by read event $r$ may affect $b$ ’s outcome. An implementation could use static dependence analysis to identify reads on which a branch is data dependent. For simplicity, the paper’s examples assume $\mathit{brDepsOn}(b,r)$ always returns true, i.e., every branch is assumed dependent on preceding reads by the same thread. Our implementation and evaluation make the same assumption, as explained later. This assumption limits the capability of predictive analysis to predict different executions; in other words, it limits the number of knowable data races from a single execution.

Three example traces are shown in Figures 1(b), 1(c), and 1(e), in which top-to-bottom order represents trace order, and column placement denotes an event’s executing thread. We discuss these examples in detail later.

Two read or write events to the same variable are conflicting, notated $e\asymp e^{\prime}$ , if the events are executed by different threads and at least one is a write.

Program-order (PO) is a partial order that orders events in the same thread: $e\prec_{\textsc{\tiny{PO}}}e^{\prime}$ if $e<_{\textsc{$ \mathit{tr} $}}e^{\prime}$ and the events are executed by the same thread.

The function $\mathit{CS(}e)$ returns the set of events in the critical section started or ended by acquire or release event $e$ , including the bounding acquire and release events. $\mathit{R}(a)$ returns the release event ending the critical section started by acquire event $a$ , and $\mathit{A}(r)$ returns the acquire event starting the critical section ended by release event $r$ . The function $\mathit{lockset}(e)$ returns the set of locks held at a read or write event $e$ by its executing thread.

2.2. Predictable Traces and Predictable Races

By observing one execution of a program, it is possible to predict data races in both the observed execution and some other executions of the program. The information present in the observed execution implies the existence of other, different executions, called predictable traces. To define what traces can be predicted from an observed trace, we first define several relevant concepts.

Definition 2.1 (Last writer).

Given a trace $\mathit{tr}$ , let $\mathit{lastwr}_{\mathit{tr}}(r)$ for a read event $r$ be the last write event before $r$ in $\mathit{tr}$ that accesses the same variable as $r$ , or $\varnothing$ if no such event exists.

To ensure that a predictable trace is feasible, each read in a predictable trace must have the same last writer as in the observed trace—with one exception: a read can have a different last writer if the read cannot take the execution down a different control-flow path than the observed execution. An example of such a read is Thread 1’s rd(z) event in Figure 1(c). Next, we introduce a concept that helps in identifying reads whose last writer must be preserved in a predictable trace.

Definition 2.2 (Causal events).

Given a trace $\mathit{tr}$ , set of events $\mathit{S}$ , and event $e$ , let $\mathit{causal(\mathit{tr},\mathit{S},e)}$ be a function that returns true if at least one of the following properties holds, and false otherwise.

•

$e$ is a read, and there exists a branch event $b$ such that $b\in\mathit{S}\land e\prec_{\textsc{\tiny{PO}}}b\land\mathit{brDepsOn}(b,e)$ .

•

$e$ is a write, and there exists a read event $e^{\prime}$ such that $e^{\prime}\in\mathit{S}\land e=\mathit{lastwr}_{\mathit{tr}}(e^{\prime})$ .

•

$e$ is a read, and there exists a write event $e^{\prime}$ such that $e^{\prime}\in\mathit{S}\land e\prec_{\textsc{\tiny{PO}}}e^{\prime}$ ( $e$ and $e^{\prime}$ may access different variables).

Intuitively, $\mathit{causal(\mathit{tr},\mathit{S},e)}$ tells us whether an event $e$ could have affected some event $e^{\prime}$ in $\mathit{S}$ directly. For example, $\mathit{causal(\mathit{tr},\mathit{S},e)}$ if read $e$ may affect a branch event in $\mathit{S}$ ; if $e$ writes a variable later read by an event in $\mathit{S}$ ; or if $e$ reads a value that may affect a later write by the same thread in $\mathit{S}$ (even if the read and write are to different variables, to account for intra-thread data flow).

We can now define a predictable trace of an observed trace, which is a trace that is definitely a feasible execution of the program, given the existence of the observed execution.

Definition 2.3 (Predictable trace).

An execution trace $\mathit{tr^{\prime}}$ is a predictable trace of trace $\mathit{tr}$ if $\mathit{tr^{\prime}}$ contains only events in $\mathit{tr}$ (i.e., $\forall e:e\in\mathit{tr^{\prime}}\implies e\in\mathit{tr}$ ) and all of the following rules hold:

*Program order (PO) rule: * For any events $e_{1}$ and $e_{2}$ , if $e_{1}\prec_{\textsc{\tiny{PO}}}e_{2}$ , then $e_{1}<_{\textsc{$ \mathit{tr^{\prime}} $}}e_{2}\lor e_{2}\notin\mathit{tr^{\prime}}$ .

*Last writer (LW) rule: * For every read event $e$ such that $\mathit{causal(\mathit{tr},\mathit{tr^{\prime}},e)}$ , $\mathit{lastwr}_{\mathit{tr^{\prime}}}(e)=\mathit{lastwr}_{\mathit{tr}}(e)$ . (In this context, $\mathit{tr^{\prime}}$ means the set of events in the trace $\mathit{tr^{\prime}}$ .)

*Lock semantics (LS) rule: * For acquire events $e_{1}$ and $e_{2}$ on the same lock, if $e_{1}<_{\textsc{$ \mathit{tr^{\prime}} $}}e_{2}$ then $e_{1}<_{\textsc{$ \mathit{tr^{\prime}} $}}\mathit{R}(e_{1})<_{\textsc{$ \mathit{tr^{\prime}} $}}e_{2}$ .

The PO and LW rules ensure key properties from $\mathit{tr}$ also hold in $\mathit{tr^{\prime}}$ , while the LS rule ensures that $\mathit{tr^{\prime}}$ is well formed. The intuition behind the LW rule is that any read that may (directly or indirectly) affect the control flow of the program must have the same last writer in predictable trace $\mathit{tr^{\prime}}$ as in observed trace $\mathit{tr}$ .

Note that throughout the paper, partial ordering notation such as $e\prec e^{\prime}$ refers to the order of $e$ and $e^{\prime}$ in the observed trace $\mathit{tr}$ (not a predictable trace $\mathit{tr^{\prime}}$ ).

Predictable traces do not in general contain every event in the observed trace they are based on. For the purposes of race detection, a predictable trace will conclude with a pair of conflicting events, which are preceded by events necessary according to the definition of predictable trace. For example, consider Figure 1(c), which is a predictable trace of Figure 1(b) that excludes Thread 1’s event after wr(y). The PO rule is satisfied, and the LS rule is satisfied after reordering Thread 2 and 3’s critical sections before Thread 1’s. The LW rule is satisfied because rd(z) is not a causal event in $\mathit{tr^{\prime}}$ .

Definition 2.4 (Predictable race).

An execution $\mathit{tr}$ has a predictable race if a predictable trace $\mathit{tr^{\prime}}$ of $\mathit{tr}$ has two conflicting, consecutive events: $e_{1}\asymp e_{2}\land e_{1}<_{\textsc{$ \mathit{tr^{\prime}} $}}e_{2}\land(\nexists e:e_{1}<_{\textsc{$ \mathit{tr^{\prime}} $}}e<_{\textsc{$ \mathit{tr^{\prime}} $}}e_{2})$ .

Figure 1(b) has a predictable race, as demonstrated by the predictable trace in Figure 1(c). In contrast, Figure 1(e) has no predictable race. The difference between Figures 1(b) and 1(e) is the br event in Thread 1. Since no br exists in Thread 1 in Figure 1(b), rd(z) is not a causal event, which in turn allows the critical sections in Threads 2 and 3 to be reordered above the critical section in Thread 1, allowing wr(y) and rd(y) to be consecutive in the predictable trace. In contrast, a br event executes before wr(y) in Figure 1(e). No predictable trace of this example can exclude br without excluding wr(y); otherwise the PO rule would be violated. rd(z) is a causal event in any predictable trace where wr(y) is included, which makes it impossible to reorder the critical sections. As a result, no predictable trace exists in which wr(y) and rd(y) are consecutive. Figures 1(a) and 1(d) show source code that could lead to the executions in Figures 1(b) and 1(e), respectively. The code in Figure 1(d) has no race; in fact, any deviation of critical section ordering from Figure 1(e)’s causes wr(y) or rd(y) not to execute.

2.3. Existing Predictive Partial Orders

Here we overview three relations introduced in prior work, called happens-before (HB), weak-causally-precedes (WCP), and doesn’t-commute (DC), that can be computed in time linearly proportional to the length of the execution trace (Kini et al., 2017; Roemer et al., 2018). Intuitively, each relation orders events that may not be legal to reorder in a predictable trace, so that two unordered conflicting events represent a true or potential data race (depending on whether the relation is sound). An execution trace has an HB-race, WCP-race, or DC-race if it contains two conflicting events that are unordered by HB, WCP, or DC, respectively.

Definitions of relations.

Table 1 gives definitions of HB, WCP, and DC by presenting their properties comparatively. The first two rows of the table say how the relations order critical sections on the same lock. HB orders all critical sections on the same lock, and it orders the first critical section’s rel(m) to the second critical section’s acq(m). WCP and DC order only conflicting critical sections (critical sections on the same lock containing conflicting events), and they order from the first critical section’s rel(m) to the second critical section’s conflicting access event. That is, if $r_{1}$ and $r_{2}$ are release events on the same lock such that $r_{1}<_{\textsc{$ \mathit{tr} $}}r_{2}$ , and $e_{1}$ and $e_{2}$ are conflicting events ( $e_{1}\asymp e_{2}$ ) such that $e_{1}\in\mathit{CS(}r_{1})\land e_{2}\in\mathit{CS(}r_{2})$ , then $r_{1}\prec_{\textsc{\tiny{WCP}}}e_{2}$ and $r_{1}\prec_{\textsc{\tiny{DC}}}e_{2}$ . The intuition behind these properties of WCP and DC is that non-conflicting critical sections can generally be reordered in a predictable trace; and even in the case of conflicting critical sections, the second critical section can be “reordered” so that it executes only up to its conflicting access and the first critical section does not execute at all in the predictable trace.

The next two table rows show whether the relations include PO or compose with HB. HB and DC include (i.e., are supersets of) PO: if $e_{1}\prec_{\textsc{\tiny{PO}}}e_{2}$ , then $e_{1}\prec_{\textsc{\tiny{HB}}}e_{2}$ and $e_{1}\prec_{\textsc{\tiny{DC}}}e_{2}$ . In contrast, WCP does not include PO but instead composes with the stronger HB: if $e_{1}\prec_{\textsc{\tiny{WCP}}}e_{2}\prec_{\textsc{\tiny{HB}}}e_{3}$ or $e_{1}\prec_{\textsc{\tiny{HB}}}e_{2}\prec_{\textsc{\tiny{WCP}}}e_{3}$ , then $e_{1}\prec_{\textsc{\tiny{WCP}}}e_{2}$ . (By virtue of being transitive, HB composes with itself.) The intuition behind including or composing with PO (a subset of HB) is that PO-ordered events cannot be reordered in a predictable trace. The intuition behind WCP composing with HB, in essence, is to avoid predicting traces that violate the LS rule of predictable traces. As a result, WCP is sound while DC is unsound, as we will see.

The last two rows show properties shared by all relations. First, if two critical sections on the same lock $a_{1}\prec_{\textsc{\tiny{PO}}}r_{1}<_{\textsc{$ \mathit{tr} $}}a_{2}\prec_{\textsc{\tiny{PO}}}r_{2}$ are ordered at all (meaning simply $a_{1}\prec_{\ast}r_{2}$ because all relations minimally compose with PO), then their release events are ordered ( $r_{1}\prec_{\ast}r_{2}$ ). Second, all of the relations are transitive. As a result of being transitive, antisymmetric, and irreflexive, all of the relations are strict partial orders.

Example

As an example of WCP and DC ordering, consider the execution in Figure 2(b). Both relations order Thread 1’s rel(m) to Thread 2’s wr(x) because the critical sections on m contain conflicting accesses to x. By WCP’s composition with HB (and thus PO) and DC’s inclusion of PO, both WCP and DC transitively order rd(x) to wr(x) and wr(y) to rd(y) ( $\textsf{wr(y)}\prec_{\textsc{\tiny{WCP}}}\textsf{rd(y)}$ and $\textsf{wr(y)}\prec_{\textsc{\tiny{DC}}}\textsf{rd(y)}$ ), so the execution has no WCP- or DC-races.

Soundness and completeness.

A relation or analysis is sound if it detects a race only for an execution trace with a predictable race or deadlock.222A trace has a predictable deadlock if there exists a valid reordering with a deadlock. We define soundness to include predictable deadlocks because prior work’s WCP relation (Kini et al., 2017) and our SDP relation are sound in this way. A relation or analysis is complete if it detects a race for every execution trace with a predictable race.

WCP (and HB) are sound: a WCP-race (HB-race) indicates a predictable race or deadlock. DC is unsound: an execution with a DC-race may have no predictable race or deadlock. However, prior work shows that DC-races are generally true predictable races in practice, and an efficient vindication algorithm can verify DC-races as predictable races by computing additional constraints and building a predictable trace exposing the race (Roemer et al., 2018). Later in the paper, we provide more details about vindication, when introducing a new relation that (like DC) is unsound and makes use of a vindication algorithm.

2.4. Limitations of Existing Predictive Partial Orders

WCP and DC analyses are the state of the art in detecting as many predictable races as possible using online, unbounded analysis (Kini et al., 2017; Roemer et al., 2018). However, WCP and DC are incomplete, failing to detect some predictable races. WCP and DC are overly strict because they order all conflicting accesses, conservatively ruling out some predictable traces that still preserve the last writer of each causal read. This strictness arises from imprecise handling of data and control dependence:

Data dependence:

WCP and DC order all conflicting accesses, which is imprecise because the order of a write–write or read–write conflict does not necessarily need to be preserved to satisfy the last-writer (LW) rule of predictable traces.

Consider the executions in Figures 2(a) and 2(b), in which WCP and DC order rd(x) to wr(x). WCP and DC’s read–write ordering assumes that no predictable trace exists where a pair of conflicting accesses are reordered. This rationale works for Figure 2(a), in which no predictable race exists. However, conflicting accesses may be reordered as long as the LW rule of predictable traces is satisfied. Figure 2(c) is a predictable trace of Figure 2(b) that reorders the critical sections and exposes a race on y.

Similarly for write–write conflicts, consider Figure 3(a), in which WCP and DC order the two wr(x) events, leading to no WCP- or DC-race on accesses to y. However, the wr(x) events can be reordered, as the predictable trace in Figure 3(b) shows, exposing a race. (The reader can ignore Figure 3(c) and 3(d) until Section 4.)

It is difficult to model read–write and write–write dependencies more precisely using a partial order. In the case of a read–write dependency, the accesses can be reordered as long as the read cannot impact a branch’s outcome in the predictable trace (i.e., the read is not a causal event in the predictable trace, or is not part of the predictable trace). For a write–write dependency, the accesses can be reordered as long as they do not change a causal read’s last writer in the predictable trace. Incorporating either kind of constraint into a partial order is challenging but also desirable because partial orders can be computed efficiently.

Control dependence:

WCP and DC order true (write–read) dependencies even when the read may not affect a branch outcome that affects whether a race happens.

Figure 4(a) shows an execution with a predictable race, as the predictable trace in Figure 4(b) demonstrates. Note that in Figure 4(b), rd(x) has a different last writer than in Figure 4(a), but the lack of a following br event means that rd(y) is still guaranteed to happen (i.e., rd(x) is not a causal event). A variant of this example is in Figure 4(c), which has a branch event dependent on a read outcome, but the branch can be absent from a predictable trace demonstrating a predictable race (Figure 4(d)).

WCP and DC miss the predictable races in Figures 4(a) and 4(c) by conservatively assuming that any event after a rd(x) may be control dependent on the read value. Similarly, WCP and DC miss the predictable race in Figure 5, which our implementation found in the Java program pmd (Section 7). Essentially, WCP and DC conservatively assume that a dependent branch immediately follows each read. This limitation is unsurprising considering the challenge of modeling control dependencies using a partial order. In particular, it is difficult for a partial order to model the fact that a read must have the same last writer only if the read may affect a branch in the predictable trace.

This work develops partial orders that are weaker than WCP and DC and thus predict more races. At the same time, these new partial orders retain key properties of the existing relations: WCP’s soundness and DC’s amenability to a vindication algorithm that ensures soundness, respectively.

3. Overview

The previous section introduced prior work’s weak-causally-precedes (WCP) (Kini et al., 2017) and doesn’t-commute (DC) (Roemer et al., 2018), and explained their limitations that lead to missing predictable races. The next two sections introduce new relations and analyses that overcome these limitations. Section 4 introduces the strong-dependently-precedes (SDP) and weak-dependently-precedes (WDP) relations, which are weaker than WCP and DC, respectively. Section 5 presents online dynamic analyses for computing SDP and WDP and detecting SDP- and WDP-races.

4. New Dependence-Aware Predictive Relations

This section introduces new partial orders called strong-dependently-precedes (SDP) and weak-dependently-precedes (WDP) that overcome the limitations of prior work’s predictive relations (Roemer et al., 2018; Kini et al., 2017) (Section 2.4) by incorporating more precise notions of data and control dependence.

4.1. The SDP and WDP Partial Orders

SDP is weaker than WCP 333It may seem confusing that SDP is weaker than WCP. SDP is so named because it is stronger than WDP, while WCP is so named because it is weaker than prior work’s causally-precedes (CP) (Smaragdakis et al., 2012). by not ordering write–write conflicts, based on the insight that writes can be unordered unless they can affect the outcome of a read, but write–read and read–write ordering already handles that ordering soundly. WDP only orders the last writer of a read to a branch that depends on that read, which is the only reordering constraint that does not lead to missing predictable races.

Table 2 defines SDP and WDP. The table shows that SDP is like WCP and WDP is like DC except in how they order conflicting critical sections (first two table rows). Furthermore, SDP adds release–release ordering under certain conditions because of write–write conflicts (next-to-last row).

The SDP relation.

SDP only orders conflicting critical sections when one critical section contains a read. Like WCP, SDP orders the first critical section’s rel(m) to the second critical section’s access. The intuition behind this property is that write–write conflicts generally do not impose any limitations on what traces can be predicted. Figure 3(a) shows an example in which two conflicting writes can be safely reordered in a predictable trace.

However, ignoring write–write conflicts altogether would be unsound, as Figure 3(c) shows: the execution has no predictable race. To ensure soundness, SDP handles write–write conflicts by ordering the first critical section to the second thread’s next read to the same variable. Figure 3(d) shows a similar case where the execution has no predictable race. To handle this case as well, SDP also orders critical sections that are ordered by, but do not contain, write–write conflicts.

More formally, SDP is the minimal relation that satisfies the following rules:

•

[Unique to SDP] $r_{1}\prec_{\textsc{\tiny{SDP}}}e_{2}$ if $r_{1}$ and $r_{2}$ are release events on the same lock, $r_{1}<_{\textsc{$ \mathit{tr} $}}r_{2}$ , $e_{1}\in\mathit{CS(}r_{1})$ , $e_{2}\in\mathit{CS(}r_{2})$ , $e_{1}\asymp e_{2}$ , and $e_{1}$ or $e_{2}$ is a read.

•

[Unique to SDP] $r_{1}\prec_{\textsc{\tiny{SDP}}}e_{3}$ if $r_{1}$ and $r_{2}$ are release events on the same lock, $r_{1}<_{\textsc{$ \mathit{tr} $}}r_{2}$ , $e_{1}$ and $e_{2}$ are write events and $e_{3}$ is a read event, $e_{1}\asymp e_{2}$ , $e_{1}\asymp e_{3}$ , $e_{2}\prec_{\textsc{\tiny{PO}}}e_{3}$ , $e_{1}\in\mathit{CS(}r_{1})$ , and $e_{2}\in\mathit{CS(}r_{2})$ .

•

[Borrowed from WCP] $r_{1}\prec_{\textsc{\tiny{SDP}}}r_{2}$ if $r_{1}$ and $r_{2}$ are release events on the same lock and $\mathit{A}(r_{1})\prec_{\textsc{\tiny{SDP}}}r_{2}$ .

•

[Unique to SDP] $r_{1}\prec_{\textsc{\tiny{SDP}}}r_{2}$ if $e_{1}\asymp e_{2}$ are write events, $r_{1}$ and $r_{2}$ are release events on the same lock, $s_{1}$ and $s_{2}$ are release events on the same lock, $\mathit{A}(r_{1})\prec_{\textsc{\tiny{HB}}}s_{1}<_{\textsc{$ \mathit{tr} $}}e_{2}\prec_{\textsc{\tiny{HB}}}r_{2}$ , $e_{1}\in\mathit{CS(}s_{1})$ , $e_{2}\in\mathit{CS(}s_{2})$ , and $e_{1}\not\in\mathit{CS(}r_{1})\lor e_{2}\not\in\mathit{CS(}r_{2})$ .

•

[Borrowed from WCP] $e_{1}\prec_{\textsc{\tiny{SDP}}}e_{2}$ if $e_{3}$ is an event and $e_{1}\prec_{\textsc{\tiny{SDP}}}e_{3}\prec_{\textsc{\tiny{HB}}}e_{2}\lor e_{1}\prec_{\textsc{\tiny{HB}}}e_{3}\prec_{\textsc{\tiny{SDP}}}e_{2}$ .

In essence, SDP addresses a limitation of WCP via more precise handling of data dependencies. SDP certainly does not address all imprecise data dependencies (e.g., read–write dependencies), and it does not address control dependence. SDP is the weakest known sound partial order.

The WDP relation

A separate but worthwhile goal is to develop a partial order that is weaker than DC but produces few false positives so that it is practical to vindicate potential races. WDP achieves this goal and is in fact complete, detecting all predictable races. WDP orders the last writer of each read to the earliest branch that depends on that read (and orders no other conflicting critical sections). The intuition behind this behavior is that the only constraint that is universally true for all predictable traces is that the last writer of a read must not occur after the read if there is a branch that depends on the read.

More formally, if $r_{1}$ and $r_{2}$ are releases on the same lock, $e_{1}\in\mathit{CS(}r_{1})$ , $e_{2}\in\mathit{CS(}r_{2})$ , $e_{1}=\mathit{lastwr}_{tr}(e_{2})$ , $e_{2}\prec_{\textsc{\tiny{PO}}}b$ , and $\mathit{brDepsOn}(b,e_{2})$ , then $r_{1}\prec_{\textsc{\tiny{WDP}}}b$ .

Unlike DC, WDP integrates control dependence by ordering the write’s critical section to the first branch dependent on the read. WDP does not model local data dependencies, where a read affects the value written by a write in the same thread. As a result, WDP may find false races, but Section 6 describes a method for ruling out such false races. These properties make WDP complete (as we show). WDP is the strongest known complete partial order.

SDP- and WDP-races.

Unlike WCP and DC, SDP and WDP do not inherently order all conflicting accesses that hold a common lock. Thus the following definition of SDP- and WDP-races explicitly excludes conflicting accesses holding a common lock.

A trace has a SDP-race (or WDP-race) if it has two conflicting events unordered by SDP (WDP) that hold no lock in common. That is, $\mathit{tr}$ has an SDP-race (WDP-race) on events $e$ and $e^{\prime}$ if $e<_{\textsc{$ \mathit{tr} $}}e^{\prime}$ , $e\asymp e^{\prime}$ , $e\not\prec_{\textsc{\tiny{SDP}}}e^{\prime}$ ( $e\not\prec_{\textsc{\tiny{WDP}}}e^{\prime}$ ), and $\mathit{lockset}(e)\cap\mathit{lockset}(e^{\prime})=\emptyset$ .

Examples.

To illustrate SDP and WDP, we refer back to examples from pages 2–5.

The executions in Figures 2(a) and 2(b) have no SDP-races: SDP orders the read–write conflicts. In contrast, these executions have WDP-races: there is no cross-thread WDP ordering because the executions have no lock-protected write–read conflicts.

Figure 3(a) has SDP- and WDP-races. SDP and WDP do not order the write–write conflict on x. Nor does WDP order events on the write–read conflict on x, since Thread 1’s wr(x) is not the last writer of rd(x).

Figure 3(c), which has a WDP-race but no SDP-race or predictable race, shows the need for SDP’s release–read ordering for write–write conflicts. Figure 3(d) also has a WDP-race but no SDP-race or predictable race, showing the need for SDP’s release–release ordering for write–write conflicts.

The executions in Figures 4(a) and 4(c) have no SDP-race since SDP does not take branches into account. On the other hand, both executions have WDP-races: Figure 4(a) has no branch dependent on the read, and Figure 4(c) has a branch, but it occurs after rd(y).

WDP analysis discovers the predictable race in Figure 5. In this case, the fact that there is no branch within the critical section allows WDP to avoid creating an unnecessary transitive edge that otherwise would hide the race.

4.2. Soundness and Completeness

Figure 4 and Table 4 illustrate the relationships among the different relations and corresponding race types. SDP never misses a race that WCP finds, and WDP never misses a race that DC or SDP finds. SDP is sound but incomplete (never reports a false race but may miss predictable races), while WDP is unsound but complete (may report false races but never misses a predictable race).

Here we prove that SDP is sound and WDP is complete. The proofs are manual and have not been verified by a theorem prover.

Theorem 0 (SDP soundness).

If an execution trace has a SDP-race, then it has a predictable race or a predictable deadlock.

Proof.

We define $\prec_{\textsc{\tiny{SDP}}(i)}$ to be a variant of $\prec_{\textsc{\tiny{WCP}}}$ and $\prec_{\textsc{\tiny{SDP}}}$ that handles conflicting writes like SDP for the first $i$ conflicting writes in $\mathit{tr}$ , and orders conflicting writes like WCP otherwise (Table 2). Formally, $\prec_{\textsc{\tiny{SDP}}(i)}$ is the minimal relation that has all of the properties that $\prec_{\textsc{\tiny{SDP}}}$ has (Table 2; e.g., $r_{1}\prec_{\textsc{\tiny{SDP}}(i)}r_{2}$ if $r_{1}$ and $r_{2}$ are release events on the same lock such that $\mathit{A}(r_{1})\prec_{\textsc{\tiny{SDP}}(i)}r_{2}$ ) and also the following property:

$l\prec_{\textsc{\tiny{SDP}}(i)}w^{\prime}$ if $w$ and $w^{\prime}$ are conflicting write events, $l$ and $l^{\prime}$ are release events of critical sections on the same lock such that $w\in\mathit{CS(}l)$ and $w^{\prime}\in\mathit{CS(}l^{\prime})$ , and there are at least $i$ conflicting pairs of write events (i.e., two conflicting write events without an intervening conflicting write event) before444An event pair $(e,e^{\prime})$ is before an event pair $(w,w^{\prime})$ if $e^{\prime}<_{\mathit{tr}}w^{\prime}\lor(e^{\prime}=w\land e<_{\mathit{tr}}w)$ . the write pair $(w,w^{\prime})$ .

Note that $\prec_{\textsc{\tiny{SDP}}(0)}\equiv\prec_{\textsc{\tiny{WCP}}}$ (because SDP([math]) adds back the write–write conflict rule that SDP omits from WCP) and $\prec_{\textsc{\tiny{SDP}}(\infty)}\equiv\prec_{\textsc{\tiny{SDP}}}$ .

The rest of the proof proceeds by induction to show that SDP( $i$ ) is sound for all $i$ , i.e., if an execution trace has an SDP( $i$ )-race, then it has a predictable race or deadlock.

Base case:

Since $\prec_{\textsc{\tiny{SDP}}(0)}\equiv\prec_{\textsc{\tiny{WCP}}}$ and WCP is sound (Kini et al., 2017), SDP([math]) is sound.

Inductive step:

Let $\sigma$ be an execution trace whose first SDP( $i$ )-race is between events $e_{1}$ and $e_{2}$ , where first means that $e_{2}$ is as early as possible in $\sigma$ , and among SDP( $i$ )-races whose second event is $e_{2}$ , $e_{1}$ is as late as possible in $\sigma$ . Proceeding with proof by contradiction, suppose $\sigma$ has no predictable race or deadlock.

Now let $\mathit{tr}$ be a trace equivalent to $\sigma$ that moves all events between $e_{1}$ and $e_{2}$ that are not HB ordered with both events, to outside of $e_{1}$ and $e_{2}$ , and additionally removes all events after $e_{2}$ . Specifically:

•

if $e_{1}<_{\sigma}e\land e_{1}\not\prec_{\textsc{\tiny{HB}}}e\prec_{\textsc{\tiny{HB}}}e_{2}$ , move $e$ before $e_{1}$ in $\mathit{tr}$ ;

•

if $e_{1}<_{\sigma}e<_{\sigma}e_{2}\land{e\not\prec_{\textsc{\tiny{HB}}}e_{2}}$ , omit $e$ from $\mathit{tr}$ ;

•

if $e_{2}<_{\sigma}e$ , omit $e$ from $\mathit{tr}$ .

Thus the last event in $\mathit{tr}$ is $e_{2}$ . Like $\sigma$ , $\mathit{tr}$ ’s first SDP( $i$ )-race is between events $e_{1}$ and $e_{2}$ , and $\mathit{tr}$ has no predictable race or deadlock. That is, $e_{1}<_{\textsc{$ \mathit{tr} $}}e_{2}$ , $e_{1}\asymp e_{2}$ , $e_{1}\not\prec_{\textsc{\tiny{SDP}}}e_{2}$ in $\mathit{tr}$ , and $e_{1}$ and $e_{2}$ are not in critical sections on the same lock.

Because $\mathit{tr}$ has no predictable race or deadlock, $e_{1}\prec_{\textsc{\tiny{SDP}}(i-1)}e_{2}$ by the induction hypothesis. Because of the disparity between $e_{1}\prec_{\textsc{\tiny{SDP}}(i-1)}e_{2}$ and $e_{1}\not\prec_{\textsc{\tiny{SDP}}(i)}e_{2}$ , it must be that $l\prec_{\textsc{\tiny{SDP}}(i-1)}w^{\prime}$ and $l\not\prec_{\textsc{\tiny{SDP}}(i)}w^{\prime}$ , where $w^{\prime}$ is the $i$ th conflicting write in $\mathit{tr}$ , $w$ is the latest write before $w^{\prime}$ such that $w\asymp w^{\prime}$ , and $l$ is the outermost release event of a critical section containing $w$ such that a release event $l^{\prime}$ on the same lock that ends a critical section containing $w^{\prime}$ also exists.

Note that $l\prec_{\textsc{\tiny{SDP}}(i-1)}w^{\prime}$ “causes” $e_{1}\prec_{\textsc{\tiny{SDP}}(i-1)}e_{2}$ : the only difference between $\prec_{\textsc{\tiny{SDP}}(i-1)}$ and $\prec_{\textsc{\tiny{SDP}}(i)}$ is the order between $l$ and $w^{\prime}$ , along with transitive and release–release orders implied by it. Thus, by the definition of $\prec_{\textsc{\tiny{SDP}}(i-i)}$ , either

(1)

$e_{1}\prec_{\textsc{\tiny{HB}}}l\prec_{\textsc{\tiny{SDP}}(i-1)}w^{\prime}\prec_{\textsc{\tiny{HB}}}e_{2}$ ; or 2. (2)

$e_{\mathit{acq}}\prec_{\textsc{\tiny{HB}}}l\prec_{\textsc{\tiny{SDP}}(i-1)}w^{\prime}\prec_{\textsc{\tiny{HB}}}e_{\mathit{rel}}$ , where $e_{\mathit{acq}}$ and $e_{\mathit{rel}}$ are the start and end events, respectively, of two critical sections on the same lock. One or more applications of the acq–rel $\Rightarrow$ rel–rel property (Table 2) on $e_{\mathit{acq}}\prec_{\textsc{\tiny{SDP}}(i-1)}e_{\mathit{rel}}$ results in $e_{1}\prec_{\textsc{\tiny{SDP}}(i-1)}e_{2}$ .

In the second case, $e_{\mathit{acq}}\prec_{\textsc{\tiny{SDP}}(i)}e_{\mathit{rel}}$ by the “critical sections ordered by a write–write conflict” property of SDP (Table 2), and thus $e_{1}\prec_{\textsc{\tiny{SDP}}(i)}e_{2}$ , a contradiction. Thus the first case must hold.

Suppose there is a read event $r$ that reads the same variable as $w$ and $w^{\prime}$ such that $w^{\prime}<_{\textsc{$ \mathit{tr} $}}r<_{\textsc{$ \mathit{tr} $}}e_{2}$ . It must be that $w^{\prime}\prec_{\textsc{\tiny{HB}}}r\prec_{\textsc{\tiny{HB}}}e_{2}$ : if $r\not\prec_{\textsc{\tiny{HB}}}e_{2}$ then $r\not\in\mathit{tr}$ , and if $w^{\prime}\not\prec_{\textsc{\tiny{HB}}}r$ then ( $w^{\prime}$ , $r$ ) are a race earlier than ( $e_{1}$ , $e_{2}$ ), both contradictions.

Events $w^{\prime}$ and $r$ are executed either by the same or different threads:

(a)

If $w^{\prime}\asymp r$ , then $e_{1}\prec_{\textsc{\tiny{HB}}}w^{\prime}\prec_{\textsc{\tiny{SDP}}(i)}r\prec_{\textsc{\tiny{HB}}}e_{2}$ . 2. (b)

If $w^{\prime}\prec_{\textsc{\tiny{PO}}}r$ , then $w\asymp r$ and thus $e_{1}\prec_{\textsc{\tiny{HB}}}l\prec_{\textsc{\tiny{SDP}}(i)}r\prec_{\textsc{\tiny{HB}}}e_{2}$ by the “conflicting writes imply release–read ordering” property of SDP.

In all cases, $e_{1}\prec_{\textsc{\tiny{SDP}}(i)}e_{2}$ which is a contradiction. As a result, no such $r$ exists.

Now consider the trace $\mathit{tr^{\prime}}$ that is equivalent to $\mathit{tr}$ except that

•

$w^{\prime}$ is replaced by a wr(x) event, where x is a brand-new variable not used in $\mathit{tr}$ .

•

For every read $r$ in $\mathit{tr}$ that reads the same variable as $w^{\prime}$ , an event $r^{\prime}$ is appended immediately after $r$ such that $r^{\prime}$ is a rd(x) event and $r\prec_{\textsc{\tiny{PO}}}r^{\prime}$ in $\mathit{tr^{\prime}}$ .

Note that the SDP( $i-1$ ) ordering for $\mathit{tr^{\prime}}$ is the same as the SDP( $i$ ) ordering for $\mathit{tr}$ , because there exists no read for which $w^{\prime}$ was the last writer, and the added reads $r^{\prime}$ ensure that for any read $r$ for which $r<_{\textsc{$ \mathit{tr} $}}w^{\prime}$ we have $r^{\prime}<_{\textsc{$ \mathit{tr^{\prime}} $}}\textsf{wr(x)}$ . In other words, the rd(x)–wr(x) conflicts introduce the same ordering in $\mathit{tr^{\prime}}$ as the original read–write conflicts between $w^{\prime}$ and its prior reads in $\mathit{tr}$ , and $\mathit{tr^{\prime}}$ does not contain the write–write conflict on $w$ and $w^{\prime}$ found in $\mathit{tr}$ . Thus in $\mathit{tr^{\prime}}$ , $e_{1}\not\prec_{\textsc{\tiny{SDP}}(i-1)}e_{2}$ . By the induction hypothesis, $\mathit{tr^{\prime}}$ has a predictable race or deadlock. Let $\mathit{tr^{\prime\prime}}$ be a predictable trace of $\mathit{tr^{\prime}}$ that exposes a race or deadlock. However, if we modify $\mathit{tr^{\prime\prime}}$ by removing the rd(x) events and replacing the wr(x) event with $w^{\prime}$ , the resulting trace is a predictable trace of $\mathit{tr}$ that exposes a race or deadlock. Thus $\mathit{tr}$ has a predictable race or deadlock, which is a contradiction.

Thus for all $i$ , SDP( $i$ ) is sound. Since $\prec_{\textsc{\tiny{SDP}}(\infty)}\equiv\prec_{\textsc{\tiny{SDP}}}$ , therefore SDP is sound. ∎

Theorem 4.1 (WDP completeness).

If an execution trace has a predictable race, then it has a WDP-race.

To prove the theorem, we use the following helper lemma:

Lemma 4.2 (WDP-ordered events cannot be reordered).

Given an execution trace $\mathit{tr}$ , for any events $e_{1}$ and $e_{2}$ in $\mathit{tr}$ such that $e_{1}\prec_{\textsc{\tiny{WDP}}}e_{2}$ , let $\mathit{tr^{\prime}}$ be a reordering of $\mathit{tr}$ where $e_{1}$ and $e_{2}$ have been reordered: either $e_{2}<_{\textsc{$ \mathit{tr^{\prime}} $}}e_{1}$ or $e_{2}\in\mathit{tr^{\prime}}\land e_{1}\notin\mathit{tr^{\prime}}$ . Then, $\mathit{tr^{\prime}}$ must not be a valid predictable trace of $\mathit{tr}$ .

The overall proof strategy is analogous to a corresponding proof for DC (Roemer et al., 2018), so we have relegated the proof of Lemma 4.2 to Appendix A.

Proof of Theorem 4.1.

Let us prove this theorem by contradiction. Let $\mathit{tr}$ be a trace with a predictable race on conflicting events $e_{1}$ and $e_{2}$ such that $e_{1}<_{\textsc{$ \mathit{tr} $}}e_{2}$ , but no WDP-race. Let $\mathit{tr^{\prime}}$ be a predictable trace of $\mathit{tr}$ in which $e_{1}$ and $e_{2}$ are consecutive: $e_{1}<_{\textsc{$ \mathit{tr^{\prime}} $}}e_{2}$ and $\nexists e:e_{1}<_{\textsc{$ \mathit{tr^{\prime}} $}}e<_{\textsc{$ \mathit{tr^{\prime}} $}}e_{2}$ .

Applying the definition of a WDP-race (Section 4), either $e_{1}\prec_{\textsc{\tiny{WDP}}}e_{2}$ or $\mathit{lockset}(e_{1})\cap\mathit{lockset}(e_{2})\neq\emptyset$ . If $\mathit{lockset}(e_{1})\cap\mathit{lockset}(e_{2})\neq\emptyset$ , then $\mathit{tr^{\prime}}$ violates the LS rule of predictable traces.

Thus $e_{1}\prec_{\textsc{\tiny{WDP}}}e_{2}$ . By the definition of a predictable race, $e_{1}$ and $e_{2}$ must be read or write events, and must be on different threads. As a result, the WDP ordering between $e_{1}$ and $e_{2}$ cannot be established by WDP conflicting critical section ordering or “ $\textsf{acq(m)}\prec_{\textsc{\tiny{WDP}}}\textsf{rel(m)}\implies\textsf{rel(m)}\prec_{\textsc{\tiny{WDP}}}\textsf{rel(m)}$ ,” which require $e_{1}$ to be a release, and not by PO since the events are on different threads. Therefore, $e_{1}\prec_{\textsc{\tiny{WDP}}}e_{2}$ by WDP transitivity, so there must exist an event $e$ such that $e_{1}\prec_{\textsc{\tiny{WDP}}}e\prec_{\textsc{\tiny{WDP}}}e_{2}$ .

Since $e_{1}$ and $e_{2}$ are consecutive in $\mathit{tr^{\prime}}$ , either $e<_{\textsc{$ \mathit{tr^{\prime}} $}}e_{1}$ , $e_{2}<_{\textsc{$ \mathit{tr^{\prime}} $}}e$ , or $e_{2}\in\mathit{tr^{\prime}}\land e\notin\mathit{tr^{\prime}}$ . By Lemma 4.2, any of these possibilities implies $\mathit{tr^{\prime}}$ is an invalid predictable trace of $\mathit{tr}$ , a contradiction. ∎

4.3. Using Precise Dependence Information

Up to this point, we have assumed that a branch event depends on every preceding read event in the same thread, meaning that the condition $\mathit{brDepsOn}(b,e_{2})$ in WDP’s handling of write–read critical sections holds for every read $e_{2}$ and branch $b$ . This assumption is needed unless static control dependence information is available from conservative static program analysis (e.g., (Huang and Huang, 2017; Ferrante et al., 1987)). We tried out one kind of static analysis to compute static control dependencies but found it provided no benefit, so our experiments do not use it (Section 7). Here we show some examples of how WDP uses static control dependence information if it is available.

Figure 6 shows two executions that differ only in whether precise static control dependency information is available. Figure 6(a) has no control dependency information available, so each branch is conservatively dependent on all prior reads, or the information is available but the branch outcome may depend on the prior read. Figure 6(b) has control dependency information that says that the branch outcome does not depend on the prior read. As a result, Figure 6(b) has weaker WDP ordering than Figure 6(a), leading to a detected WDP-race in Figure 6(b) only.

5. SDP and WDP Analyses

SDP analysis and WDP analysis are new online dynamic program analyses that compute SDP and WDP and detect SDP- and WDP-races, respectively. Algorithms 1 and 2 show SDP and WDP analyses, respectively, for each kind of event. This section’s notation and terminology follow the WCP and DC papers’ to some extent (Roemer et al., 2018; Kini et al., 2017).

Both algorithms show the differences relative to prior analyses (SDP versus WCP and WDP versus DC) by labeling lines with “ $+$ ” to show logic added by our analyses and “ $-$ ” with grayed-out text to show lines removed by our analyses. Algorithm 1 shows that SDP analysis requires several changes to WCP analysis. These changes are for tracking write–write conflicts to add ordering (1) when a future read is detected on the second write’s thread and (2) when two critical sections are HB ordered by the conflicting writes. In addition, SDP analysis avoids reporting write–write races for writes in critical sections on the same lock.

Algorithm 2 shows that WDP analysis makes several significant changes to DC analysis. These changes are primarily to deal with branches, by recording information about write–read dependencies at read events (lines 16–19) and using the recorded information at branch events (lines 33–34). Unlike DC analysis, WDP analysis does not establish ordering at any conflicting accesses, and it never needs to track ordering from a read to another access (since it detects only write–read conflicts). In addition, WDP analysis ensures it does not report races on accesses in critical sections on the same lock.

Analysis details

In both SDP and WDP analyses, the procedural parameters $t$ and $l$ are the current thread and lock; $L$ is the set of locks held by the thread performing the current event; $R$ and $W$ are the sets of variables that were read and written in the ending critical section on $l$ ; and $e$ represents the current read or branch event (for detecting branch dependencies).

The analysis uses vector clocks (Mattern, 1988) to represent logical SDP or WDP time. A vector clock $C:\mathit{Tid}\mapsto\mathcal{N}$ maps each thread to a nonnegative integer. Operations on vector clocks are pointwise comparison ( $C_{1}\sqsubseteq C_{2}\iff\forall t.C_{1}(t)\leq C_{2}(t)$ ) and pointwise join ( $C_{1}\sqcup C_{2}\equiv\lambda t.\mathit{\max(C_{1}(t),C_{2}(t))}$ ):

Both analyses maintain the following state:

•

$\use@mathgroup\M@U\symAMSb{C}_{t}$ is a vector clock that represents the current SDP or WDP time for thread $t$ .

•

$\use@mathgroup\M@U\symAMSb{R}_{x}$ and $\use@mathgroup\M@U\symAMSb{W}_{x}$ are vector clocks that represent the SDP or WDP time of the last reads and writes to $x$ .

•

$\mathit{Acq}_{l}(t)$ and $\mathit{Rel}_{l}(t)$ (SDP) and $\mathit{Acq}_{l,t^{\prime}}(t)$ and $\mathit{Rel}_{l,t^{\prime}}(t)$ (WDP) are queues of vector clocks that help compute the “ $\textsf{acq(m)}\prec\textsf{rel(m)}$ implies $\textsf{rel(m)}\prec\textsf{rel(m)}$ ” property (Table 2).

In addition, SDP analysis maintains the following state:

•

$\use@mathgroup\M@U\symAMSb{H}_{t}$ is a vector clock that represents the current HB time for thread $t$ . $\use@mathgroup\M@U\symAMSb{C}_{t}[t:=\use@mathgroup\M@U\symAMSb{H}_{t}(t)]$ , which evaluates to a vector clock with every element equal to $\use@mathgroup\M@U\symAMSb{C}_{t}$ except that element $t$ is equal to $\use@mathgroup\M@U\symAMSb{H}_{t}(t)$ , represents $\prec_{\textsc{\tiny{SDP}}}\cup\prec_{\textsc{\tiny{PO}}}$ .

•

$\use@mathgroup\M@U\symAMSb{L}^{r}_{l,x}$ and $\use@mathgroup\M@U\symAMSb{L}^{w}_{l,x}$ are vector clocks that represent the HB times of critical sections on $l$ that read and wrote $x$ , respectively.

•

$\use@mathgroup\M@U\symAMSb{B}_{t,x}$ is a vector clock that represents the SDP time of a release event of a critical section on lock $m$ containing a write event $w$ to $x$ such that a later write event $w^{\prime}$ to $x$ by $t$ conflicts with the write and $m\in\mathit{lockset}(w)\cap\mathit{lockset}(w^{\prime})$ .

•

$\use@mathgroup\M@U\symAMSb{K}_{t}$ is a vector clock that represents the time of a partial order that orders conflicting writes (i.e., $r_{1}\prec w_{2}$ if $w_{1}<_{\textsc{$ \mathit{tr} $}}w_{2}$ are conflicting write events, $r_{1}$ is a release and $a_{2}$ is an acquire on the same lock, $w_{1}\in\mathit{CS(}r_{1})$ , and $w_{2}\in\mathit{CS(}a_{2})$ ) and composes with HB, for thread $t$ .

•

$\use@mathgroup\M@U\symAMSb{Y}_{l}$ is a vector clock that represents the time of a partial order that orders conflicting writes except those involving the current critical section on lock $l$ (i.e., $r_{1}\prec w_{2}$ if $w_{1}<_{\textsc{$ \mathit{tr} $}}w_{2}$ are conflicting write events, $r_{1}$ is a release and $a_{2}$ is an acquire on the same lock, $a_{2}$ is not the most recent acquire of lock $l$ , $w_{1}\in\mathit{CS(}r_{1})$ , and $w_{2}\in\mathit{CS(}a_{2})$ ) and composes with HB, for the thread currently holding lock $l$ . (While $l$ is unheld, $\use@mathgroup\M@U\symAMSb{Y}_{l}$ is undefined.)

•

$\mathit{Acq}^{K}_{l}(t)$ is a queue of vector clocks akin to $\mathit{Acq}_{l}(t)$ , but instead of representing SDP times, each vector clock represents the time of a partial order that orders conflicting writes and composes with HB.

WDP analysis maintains the following additional state:

•

$\use@mathgroup\M@U\symAMSb{L}_{l,x}$ is a vector clock that represents the WDP time of critical sections on $l$ that wrote $x$ .

•

$\use@mathgroup\M@U\symAMSb{B}_{t,x}$ is a vector clock that represents the WDP time of the last write event $e$ to $x$ such that a later read event $e^{\prime}$ to $x$ by $t$ conflicts with $e$ and $\mathit{lockset}(e)\cap\mathit{lockset}(e^{\prime})\neq\emptyset$ .

•

$T_{x}$ is the last thread to write to $x$ , or $\varnothing$ if no thread has yet written $x$ .

•

$D_{t}$ is a set of pairs $\langle x,e\rangle$ such that event $e$ is a read to $x$ by a thread that has not (yet) executed a branch $b$ such that $\mathit{brDepsOn}(b,e)$ .

•

$L^{r}_{x,t}$ and $L^{w}_{x,t}$ are sets of locks that were held by thread $t$ when it last read and wrote variable $x$ , respectively.

Initially, every vector clock maps every thread to 0, except $\forall t.\use@mathgroup\M@U\symAMSb{H}_{t}(t)=1$ for SDP analysis, and $\forall t.\use@mathgroup\M@U\symAMSb{C}_{t}(t)=1$ for WDP analysis. Every queue and set is initially empty.

The analyses compute the “ $\textsf{acq(m)}\prec\textsf{rel(m)}$ implies $\textsf{rel(m)}\prec\textsf{rel(m)}$ ” property (Table 2) similarly to WCP and DC analyses, respectively (Kini et al., 2017; Roemer et al., 2018). Briefly, $\mathit{Acq}_{l}(t)$ and $\mathit{Rel}_{l}(t)$ contain times of acq(l) and rel(l) operations (by any thread other than $t$ ) such that the acq(l) operation is not yet SDP ordered to a following rel(l) by thread $t$ . $\mathit{Acq}_{l,t^{\prime}}(t)$ and $\mathit{Rel}_{l,t^{\prime}}(t)$ contain times of acq(l) and rel(l) operations by thread $t$ such that the acq(l) operation is not yet WDP ordered to a following rel(l) by thread $t^{\prime}$ . SDP additionally uses $\mathit{Acq}^{K}_{l}(t)$ for acq(l) operations for write–write conflicts that are HB ordered before the acquire, which is required for soundness since SDP does not directly order write–write conflicts like WCP.

The analyses provide SDP and WDP’s handling of conflicting critical sections by detecting some kinds of conflicts on accesses holding a common lock. SDP analysis orders the earlier release of a common lock to the current event for write–read and read–write conflicts using $\use@mathgroup\M@U\symAMSb{L}^{r}_{l,x}$ and $\use@mathgroup\M@U\symAMSb{L}^{w}_{l,x}$ (lines 24 and 28 in Algorithm 1). For write–write conflicts, SDP analysis stores the time of the earlier release of a common lock in $\use@mathgroup\M@U\symAMSb{B}_{t,x}$ (line 31) to order the earlier write to a later read of $x$ by the current thread (line 23).

SDP analysis detects release–release ordering by write–write conflicts using $\use@mathgroup\M@U\symAMSb{K}_{t}$ , $\use@mathgroup\M@U\symAMSb{Y}_{l}$ , and $Acq^{K}_{l}(t)$ . In particular, the analysis tracks write–write ordering using $\use@mathgroup\M@U\symAMSb{K}_{t}$ , and it tracks write–write ordering excluding the current critical section on $l$ using $\use@mathgroup\M@U\symAMSb{Y}_{l}$ , in accordance with the SDP definition. Line 11 compares $\use@mathgroup\M@U\symAMSb{Y}_{l}$ and $Acq^{K}_{l}(t)$ to check whether two critical sections are ordered by write–write ordering excluding the current critical section on $l$ . Effectively, $\use@mathgroup\M@U\symAMSb{Y}_{l}$ , $\use@mathgroup\M@U\symAMSb{K}_{t}$ , and $\mathit{Acq}^{K}_{l}(t)$ represent relations that SDP can use to check its release–release rules (page • ‣ 4.1).

WDP analysis orders the release on the writer’s executing thread to a later branch dependent on the read. The analysis does so by recording the time of the last writer’s release in $\use@mathgroup\M@U\symAMSb{L}_{l,x}$ (line 8 in Algorithm 2). Later, when a conflicting read occurs on thread $t$ holding $l$ , the analysis uses $\use@mathgroup\M@U\symAMSb{L}_{l,x}$ to get the time for the last conflicting writer $T_{x}$ ’s release, and stores this time in $\use@mathgroup\M@U\symAMSb{B}_{t,x}$ (line 18). When $t$ executes a branch dependent on the prior conflicting read, WDP adds ordering from the release to the current branch (line 33). The analysis detects the dependent branch using $D_{t}$ , which contains a set of $\langle x,e\rangle$ pairs for which a branch dependent on read event $e$ has not yet executed (line 30). The exact representation of $e$ and behavior of $\mathit{brDepsOn}(b,e)$ are implementation dependent.

SDP analysis provides composition with HB using $\use@mathgroup\M@U\symAMSb{C}_{t}$ , $\use@mathgroup\M@U\symAMSb{H}_{t}$ , and $\use@mathgroup\M@U\symAMSb{C}_{t}[t:=\use@mathgroup\M@U\symAMSb{H}_{t}(t)]$ .555In Algorithm 1, $\use@mathgroup\M@U\symAMSb{C}_{t}$ and $\use@mathgroup\M@U\symAMSb{C}_{t}[t:=\use@mathgroup\M@U\symAMSb{H}_{t}(t)]$ are analogous to the WCP paper’s $\use@mathgroup\M@U\symAMSb{P}_{t}$ and $\use@mathgroup\M@U\symAMSb{C}_{t}$ , respectively (Kini et al., 2017). WDP analysis includes PO with the increment of $\use@mathgroup\M@U\symAMSb{C}_{t}(t)$ at line 11.

The analyses check the conditions for a SDP- or WDP-race by using $\use@mathgroup\M@U\symAMSb{R}_{x}$ and $\use@mathgroup\M@U\symAMSb{W}_{x}$ . Since the analyses do not order all pairs of conflicting accesses, unordered conflicting accesses are not sufficient to report a race. SDP analysis uses the vector clock $\use@mathgroup\M@U\symAMSb{L}^{w}_{l,x}$ and WDP analysis uses the locksets $L^{r}_{x,t}$ and $L^{w}_{x,t}$ to check if the current and prior conflicting accesses’ held locks overlap (lines 25, 30, and 34 in Algorithm 1; lines 15, 25, and 27 in Algorithm 2).

Atomic accesses and operations.

We extend SDP and WDP analyses to handle accesses that have ordering or atomicity semantics: atomic accesses that introduce ordering such as Java volatile and C++ atomic accesses, and atomic read-modify-write operations such as atomic test-and-set.

The following pseudocode shows how we extend WDP analysis (Algorithm 2) to handle atomic reads and writes and atomic operations. (The extensions to SDP analysis are similar but also add conflicting read–write and write–write–read ordering.)

1:procedure atomicRead( $t,x,e$ )

2: let $t^{\prime}\leftarrow T_{x}$ $\triangleright$ Get last writer thread of $x$

3: if $t^{\prime}\notin\{\varnothing,t\}\land\use@mathgroup\M@U\symAMSb{W}_{x}\nsqsubseteq\use@mathgroup\M@U\symAMSb{C}_{t}$ then $\triangleright$ Write–read conflict

4: $\use@mathgroup\M@U\symAMSb{B}_{t,x}\leftarrow\use@mathgroup\M@U\symAMSb{W}_{x}$ $\triangleright$ Record the write

5: if $\use@mathgroup\M@U\symAMSb{B}_{t,x}\nsqsubseteq\use@mathgroup\M@U\symAMSb{C}_{t}$ then $D_{t}\leftarrow D_{t}\cup\{\langle x,e\rangle\}$

6:procedure atomicWrite( $t,x$ )

7: $\use@mathgroup\M@U\symAMSb{W}_{x}\leftarrow\use@mathgroup\M@U\symAMSb{C}_{t}$

8: $T_{x}\leftarrow t$ $\triangleright$ Set last writer thread of $x$

9:procedure atomicReadModifyWrite( $t,x,e$ )

10: atomicRead( $t$ , $x$ , $e$ )

11: atomicWrite( $t$ , $x$ )

In essence, the analysis handles atomic accesses like regular accesses contained in single-access critical sections on a unique lock to the accessed variable. The analysis treats an atomic operation as an atomic read followed by an atomic write.

Handling races.

The behavior of programs with data races is unreliable (Adve and Boehm, 2010; Dolan et al., 2018), but our analyses’ instrumentation performs synchronization operations before accesses, which generally ensures sequential consistency (SC) for all program executions. A different problem is that if an analysis continues detecting races after the first race, then additional detected races are not necessarily real races because they may depend on an earlier race (i.e., if the earlier race were ordered, the later race would not exist). Our implementation (Section 7.1) addresses this issue by treating racing accesses as if they were contained in single-access critical sections on the same lock. Specifically, SDP analysis orders one racing event to the other for write–read and read–write races, and WDP analysis orders write–read races to a branch that depends on the read if the write is the last writer of the read. For example, after detecting a race in line 15, WDP analysis performs the following: $\textbf{if}\>t^{\prime}=T_{x}\>\textbf{then}\;\use@mathgroup\M@U\symAMSb{B}_{t,x}\leftarrow\use@mathgroup\M@U\symAMSb{W}_{x}(t^{\prime});\>D_{t}\leftarrow D_{t}\cup\{\langle x,e\rangle\}$ .

Time and space complexity.

The run times of SDP and WDP analyses are each linear in the number of events. For WCP and DC analyses (Kini et al., 2017; Roemer et al., 2018), for $N$ events, $L$ locks held at once by a thread, and $T$ threads, time complexity for an entire execution trace is $O(N\times(L\times T+T^{2}))$ . For SDP analysis, time complexity is generally the same as for WCP analysis, except that the run time of line 33 in Algorithm 1 is $O(L^{2}\times T)$ . As a result, SDP’s run time is $O(N\times(L^{2}\times T+T^{2}))$ . Since $L$ is the number of locks simultaneously held by a single thread, we expect it to be small in practice. For WDP analysis, time complexity for each event type is the same as for DC analysis, and the run time for branch events can be amortized over read events.

6. Verifying WDP-Races

WDP analysis is unsound, so a WDP-race may not indicate a predictable race (i.e., there may be no race exposed in any predictable trace). To avoid reporting false races, our approach post-processes each detected WDP-race with an algorithm called VindicateWDPRace. Here we overview VindicateWDPRace; Appendix B presents VindicateWDPRace in detail with an algorithm and examples.

To support performing VindicateWDPRace on WDP-races, WDP analysis builds a constraint graph in which execution events are nodes, and initially edges represent WDP ordering. VindicateWDPRace discovers and adds additional constraints to the graph that enforce lock semantics (LS) and last-writer (LW) rules. VindicateWDPRace uses the constraint graph to attempt to construct a predictable trace that exposes the WDP-race as a predictable race.

VindicateWDPRace extends prior work’s VindicateDCRace algorithm for checking DC-races (Roemer et al., 2018). VindicateWDPRace differs from VindicateDCRace primarily in the following way. VindicateWDPRace computes and adds LW constraints to the constraint graph for all reads that must be causal events in the predictable trace. Importantly, VindicateWDPRace computes causal reads and adds LW constraints at each iteration of adding constraints and at each attempt at building a predictable trace.

Algorithm 3 shows VindicateWDPRace at a high level; Appendix B presents VindicateWDPRace in detail. VindicateWDPRace takes the initial constraint graph ( $G$ ) and a WDP-race ( $e_{1},e_{2}$ ) as input (line 1). It first calls AddConstraints (line 2), which adds necessary constraints to $G$ and returns an updated constraint graph. AddConstraints first adds consecutive-event constraints (i.e., edges) to $G$ to enforce that any predictable trace must execute $e_{1}$ and $e_{2}$ consecutively (line 9). AddConstraints then computes the set of causal events for any predictable trace constrained by $G$ , which it uses to add LW constraints to $G$ , ensuring that every causal read in a predictable trace can have the same last writer as in the original trace (line 11). Next, AddConstraints adds LS constraints to $G$ , by identifying critical sections on the same lock that are partially ordered and thus must be fully ordered to obey LS rules (line 12). Since added LW and LS constraints may lead to new LS and LW constraints being detected, respectively, AddConstraints iterates until it finds no new constraints to add (lines 10–13).

The constraints added by AddConstraints are necessary but insufficient constraints on any trace exposing a predictable race on $e_{1}$ and $e_{2}$ . Thus if $G$ has a cycle that must be part of any predictable trace, then the original trace has no predictable race on $e_{1}$ and $e_{2}$ (line 3). Otherwise, AddConstraints calls ConstructReorderedTrace (line 5), which attempts to construct a legal predictable trace $\mathit{tr^{\prime}}$ . ConstructReorderedTrace is a greedy algorithm that starts from $e_{1}$ and $e_{2}$ and works backward, adding events in reverse order that satisfy $G$ ’s constraints and also conform to LS and LW rules ( $G$ ’s constraints are necessary but insufficient). If ConstructReorderedTrace returns a (non-empty) trace $\mathit{tr^{\prime}}$ , it is a legal predictable trace exposing a race on $e_{1}$ and $e_{2}$ (line 6). Otherwise, ConstructReorderedTrace returns an empty trace, which means that it could not find a predictable race, although one may exist (line 7).

7. Evaluation

This section evaluates the predictive race detection effectiveness and run-time performance of this paper’s approaches.

7.1. Implementation

We implemented SDP and WDP analyses and VindicateWDPRace by extending the publicly available Vindicator implementation, which includes HB, WCP, and DC analyses and VindicateDCRace (Roemer et al., 2018).666https://github.com/PLaSSticity/Vindicator Vindicator is built on top of RoadRunner, a dynamic analysis framework for concurrent Java programs (Flanagan and Freund, 2010b).777https://github.com/stephenfreund/RoadRunner/releases/tag/v0.5 We extended RoadRunner to instrument branches to enable WDP analysis at program branches. RoadRunner operates on the Java bytecode of analyzed programs, so analysis properties such as SDP soundness and WDP completeness hold with respect to the execution of the bytecode, even if the JVM compiler optimizes away control or data dependencies.

Our implementation of SDP and WDP analyses and VindicateWDPRace is publicly available.888https://github.com/PLaSSticity/SDP-WDP-implementation

We evaluated Joana to perform static analysis for detecting whether a branch depends on prior reads or not (Giffhorn and Hammer, 2008), following the system dependency graphs used in MCR-S (Huang and Huang, 2017). We found no practical advantages to using Joana. In most programs, for the vast majority of the write–read branch dependencies executed, the next branch after the read is dependent on the read according to Joana. In pmd and sunflow, static analysis reported many write–read dependencies where the following branch did not depend on the read, but this did not lead to any additional WDP-races being detected. It is unclear whether these results are mainly due to properties of the evaluated programs (i.e., if almost all branches do depend on prior reads) or imprecision of Joana’s static analysis. Our implementation and evaluation do not use static analysis, and instead assume that branches always depend on prior reads.

SDP and WDP analyses.

We implemented a single analysis tool within RoadRunner that can perform HB, WCP, DC, SDP, and WDP analyses on a single observed execution. The implementation of HB, WCP, and DC analyses are taken from the Vindicator implementation, and implementation of SDP and WDP analyses follows Algorithms 1 and 2. For thread fork and join (including implicitly forked threads (Roemer et al., 2018)) and static class initializer edges (Lindholm and Yellin, 1999), each analysis adds appropriate ordering between the two synchronizing events. The analyses treat calls to m.wait() as a release of m followed by an acquire of m. The analyses instrument volatile variable accesses as atomic accesses as described in Section 5. The analyses can in theory handle lock-free data structures, such as data structures in java.util.concurrent, by handling atomic operations as in Section 5. However, RoadRunner instruments only application code, not Java library code, and it does not intercept underlying atomic operations (e.g., by instrumenting calls to atomic sun.misc.Unsafe methods). The analyses may thus miss some synchronization in the evaluated programs.

The analyses can determine that some observed events are “redundant” and cannot affect the analysis results. For a read or write event, if the same thread has performed a read or write, respectively, to the same variable without an intervening synchronization operation, then the access is redundant. For a branch event, if the same thread has not performed a read event since the last branch event, then the branch is redundant (since our implementation assumes that a branch is dependent on all prior reads). The implementation “fast path” detects and filters redundant events, and does not perform analysis for them.

The implementation is naturally parallel because application threads running in parallel perform analysis. The implementation uses fine-grained synchronization on metadata to ensure atomicity of the analysis for an event. For WDP analysis, to obtain an approximation of $<_{\textsc{$ \mathit{tr} $}}$ (needed by vindication; see line 44 of Algorithm 4 in Appendix B), the implementation assigns each event node in the constraint graph a Lamport timestamp (Lamport, 1978) that respects HB order: $e\prec_{\textsc{\tiny{HB}}}e^{\prime}\implies\mathit{ts}(e)<\mathit{ts}(e^{\prime})$ .

Handling races.

To keep finding real races after the first detected race, whenever an analysis detects a race, it updates vector clocks (and WDP’s constraint graph) so that the execution so far is race free. SDP and WDP analyses treat racing accesses as though minimal critical sections on the same lock protected them, as described in Section 5. HB, WCP, and DC analyses handle detected races by adding ordering between all accesses.

If an analysis detects multiple races involving the current access, it reports only one of the races—the best-estimate “shortest” race according to the Lamport timestamps—but adds ordering to eliminate all of the races.

Vindication.

WDP analysis constructs a constraint graph representing the observed execution’s WDP ordering. When the execution completes, the implementation calls VindicateWDPRace on a configurable subset of the WDP-races, e.g., each WDP-race that is not also a SDP-race.

7.2. Methodology

The experiments execute large, real Java programs harnessed as the DaCapo benchmarks (Blackburn et al., 2006), version 9.12-bach. We use a version of the DaCapo programs that the RoadRunner authors have modified to work with RoadRunner;999https://github.com/stephenfreund/RoadRunner/releases/tag/v0.5 the resulting workloads are approximately equal to DaCapo’s default workload. The experiments exclude DaCapo programs eclipse, tradebeans, and tradesoap, which the RoadRunner authors have not modified to run with RoadRunner; jython, which failed to run with RoadRunner in our environment; and the single-threaded program fop.

The experiments execute on a quiet Intel Xeon E5-4620 with four 8-core processors with hyperthreading disabled and 256 GB of main memory, running Linux 3.10.0. We execute RoadRunner with the HotSpot 1.8.0 JVM and set the maximum heap size to 245 GB.

We run various combinations of the analyses to collect race results and statistics and measure performance. To account for run-to-run variation, each reported result is the mean of five trials.

Each WDP-race in an execution is a dynamic WDP-race (similarly for SDP-, DC-, WCP-, and HB-races). Among dynamic WDP-races, some may be detected at the same static accesses. If two dynamic WDP-races have the same two static source location regardless of order, then they are the same static WDP-race (similarly for SDP-, DC-, WCP-, and HB-races).

7.3. Dynamic Characteristics

Table 5 shows properties of the analyzed programs. The #Thr column reports total threads created by an execution and, in parentheses, threads active at termination. The rest of the columns count events from WDP analysis; other analyses are similar but exclude branch events. Total events are all executed events instrumented by the analysis.

Analyzed events are the events not filtered by the fast path that detects redundant events. The rest of the columns show the breakdown of analyzed events by event type. The percentages do not add up to 100% because they do not include other events (e.g., fork, join, wait, volatile access, and static class initializer events), which are always less than 1% of analyzed events. Unsurprisingly, most analyzed events are memory accesses or branches.

7.4. Race Detection Effectiveness

Table 6 reports detected races for two different experiments that each run a combination of analyses on the same executions. Table 6(a)’s results are from an experiment that runs HB, WCP, and SDP analyses together on the same executions, to compare these analyses’ race detection capabilities directly. Likewise, a separate experiment runs DC and WDP analyses together on the same executions to make them directly comparable, resulting in Table 6(b)’s results.

For each race count, the first value is static races, followed by dynamic races in parentheses. For example, on average over the five trials, the analysis detects about 406,000 WDP-races for avrora, which each correspond to one of 5 different unordered pairs of static program locations.

Table 6(a) shows that SDP analysis finds significantly more races than not only HB analysis but also WCP analysis—the state of the art in unbounded sound predictive race detection (Flanagan and Freund, 2017; Kini et al., 2017). These additional races are due to SDP incorporating data dependence more precisely than WCP by not ordering write–write conflicting critical sections, essentially permitting predictable traces that swap writes without changing a causal read’s last writer.

Likewise, Table 6(b) shows that WDP analysis finds more races than DC analysis, the state of the art in high-coverage unbounded predictive race detection (Roemer et al., 2018). These additional races result from WDP being more precise with respect to both data and control dependence than DC, and in fact being complete.

The counts of HB-, WCP-, and DC-races we report here are significantly different from those reported by the Vindicator paper (Roemer et al., 2018). (While the counts are not directly comparable, both papers show similar trends between relations.) The most significant cause of this effect is that RoadRunner stops tracking a field after the field has 100 races, a behavior that Vindicator used but that we disabled for these results to avoid artificially underreporting race counts. Furthermore, our analyses do not use a Vindicator optimization that merges events, reducing the number of races reported when there are multiple races between synchronization-free regions. We disabled this optimization because WDP analysis must track variable access information for each event, negating the advantages of this optimization. Another difference is that the Vindicator experiments spawned fewer threads for some benchmarks by setting the number of available cores to 8.

Table 8 reports results from an experiment that runs SDP and WDP analyses together and then performs VindicateWDPRace on WDP-only races, which are WDP-races that are not also SDP-races. The SDP-races and WDP-races columns report static and dynamic races, as in Table 6. The WDP-only column is static WDP-only races, which are static WDP-races that have no dynamic instances that are SDP-races. The last column, WDP-only $\rightarrow$ Verified, reports how many static WDP-only races are detected and how many are successfully vindicated as true races by VindicateWDPRace. In this experiment, the implementation tries to vindicate up to 10 dynamic instances of each static WDP-only race. The implementation first attempts to vindicate the five earliest dynamic instances of a static WDP-only race, then five random dynamic instances, stopping as soon as it verifies any dynamic instance of the static race.

Over half of the static WDP-only races are verified predictable races: out of 359 static WDP-only races on average, 189 are verified predictable races. As Section 7.5 shows, it can take a few minutes for VindicateWDPRace to check a WDP-race. Given the difficulty and importance of detecting unknown, hard-to-expose data races in mature software—and the amount of time developers currently spend on testing and debugging—the time for VindicateWDPRace is reasonable.

We confirmed that in all of the experiments, SDP analysis detected every race detected by WCP analysis, and WDP analysis detected every race detected by DC or SDP analysis. For sanity, we also successfully vindicated every reported SDP-only race.

Race characteristics.

SMT-solver-based predictive race detectors can be as precise as SDP and WDP analyses, but cannot scale to unbounded program executions (Huang et al., 2014; Said et al., 2011; Huang and Rajagopalan, 2016; Chen et al., 2008; Şerbănuţă et al., 2013; Liu et al., 2016) (Section 8). These approaches typically analyze bounded windows of an execution trace, missing races involving “far apart” events. We can estimate whether SMT-based approaches would miss a predictable race by computing the race’s event distance, which is the number of events in the execution trace between the race’s two events. Since our implementation does not compute a total order of events, it approximates event distance using Lamport timestamps: event distance is the number of events $e$ such that $\mathit{ts}(e_{1})<\mathit{ts}(e)<\mathit{ts}(e_{2})$ .

Table 8 reports the distribution of event distances between accesses in each successfully vindicated WDP-only race (i.e., the last column of Table 8). The average distance and standard deviation are across all trials. The last column reports the greatest distance found among all trials.

7.5. Performance

Table 9 reports the run-time performance of various combinations of analyses. Base is native execution time without any instrumentation. Other columns (excluding Failed and Verified) are slowdowns relative to Base.

The Instr. only columns are RoadRunner configurations that instrument events (excluding or including branches) but perform no analysis in the instrumentation.

The Analyses w/o constraint graph show configurations that do not construct a constraint graph. Only configurations including WDP analysis instrument branch events. The WCP and SDP columns show the slowdowns from running the WCP and SDP analyses independently; the performance difference between them is usually modest, suggesting that there is minimal performance penalty from using SDP analysis over WCP analysis.

SDP+DC represents performing SDP and DC analyses together. We run DC analysis with SDP analysis to minimize DC-races that need vindication. (Vindicator combined DC analysis with WCP analysis for this purpose (Roemer et al., 2018), but SDP analysis is more powerful.)

SDP+WDP+graph represents the canonical use case for WDP analysis. This configuration performs SDP and WDP analyses and constructs the constraint graph to enable vindication. It uses SDP to reduce how many WDP-races need vindication. For comparison purposes, SDP+WDP forgoes constructing the constraint graph, showing the cost of constructing the graph, which we have not optimized. SDP+WDP is slower than SDP+DC because WDP analysis is generally more complex than DC analysis.

Finally, Failed and Verified are the average times taken for each dynamic race that VindicateWDPRace fails to verify or successfully verifies, respectively. Vindication times vary significantly across programs; vindication is particularly slow for tomcat because most of its racing accesses are separated by millions of events (Table 8). Vindication is slow for h2, even though its races are not far apart, because VindicateWDPRace discovers new critical section constraints that require analyzing over 500 million events.

7.6. Summary and Discussion

Our SDP- and WDP-based approaches are slower than other predictive approaches, but they find more races, some of which are millions of events apart. SMT-based approaches would not be able to find these far-apart races because they cannot scale past analyzing bounded windows of executions (Huang et al., 2014; Said et al., 2011; Huang and Rajagopalan, 2016; Chen et al., 2008; Şerbănuţă et al., 2013; Liu et al., 2016) (Section 8). Notably, RVPredict, which (like WDP) incorporates precise control and data dependence, uses an analysis window of 10,000 events (Huang et al., 2014), meaning it would miss many of the predictable races detected and verified by our approach.

Our evaluation demonstrates the power of SDP and WDP analyses to find more races than prior approaches in a single execution. A potential limitation of the evaluation is that it does not compare our analyses with approaches that perform HB analysis on multiple executions (e.g., using one of the many schedule exploration approaches; Section 8). Our work’s aim is to push the limit on what can be found in a single execution, which is essentially complementary to approaches that explore multiple schedules. No other known sound technique could have predicted all of these races from the observed executions.

8. Related Work

This section describes and compares with prior work, starting with the most closely related work.

Unbounded predictive analysis.

Prior work introduces unbounded predictive analyses, weak-causally-precedes (WCP) and doesn’t-commute (DC) analyses (Kini et al., 2017; Roemer et al., 2018), which Sections 2 and 7 covered and evaluated in detail. SDP and WDP analyses predict more races in real programs than WCP and DC analyses, respectively (Section 7).

The WCP relation is weaker (i.e., detects more races) than Smaragdakis et al.’s earlier causally-precedes (CP) relation (Smaragdakis et al., 2012). Smaragdakis et al.’s implementation detects races within bounded windows of 500 events because of the difficulty of developing an efficient unbounded analysis for CP (Smaragdakis et al., 2012; Roemer and Bond, 2019).

Recent work introduces the afterward-confirm (AC) relation and an approach called DigHR for computing AC (Luo et al., 2018). AC is the same as CP except that it removes write–write conflicting critical section ordering. Despite this similarity with our work, the contributions differ significantly, and the DigHR work has major correctness issues. Foremost, the DigHR paper claims incorrectly that AC is sound. AC is, to the best of our understanding, unsound: its removal of write–write ordering leads to detecting false races, including for the execution in Figure 3(c) (with the br event omitted; DigHR’s event model does not include branches). The DigHR paper provides a soundness proof, which we believe is incorrect as a result of informality leading to not covering cases such as Figure 3(c). In contrast with DigHR, our work introduces a sound relaxation of WCP (SDP). Additionally, our work introduces a complete relation (WDP), handles control dependencies (br events), and presents linear-time analyses for SDP and WDP (DigHR is superlinear, like existing CP analyses (Smaragdakis et al., 2012; Roemer and Bond, 2019)).

Concurrently with our work, Pavlogiannis introduces a predictive race detection approach called M2 that is related to vindication (Pavlogiannis, 2019). Pavlogiannis uses lockset analysis as an imprecise filter for potential races checked by M2, while our work introduces linear-time WDP analysis as a less-imprecise filter for potential races checked by VindicateWDPRace. Although Pavlogiannis reports performance that is sometimes competitive with the performance of HB, WCP, and DC analyses, Pavlogiannis’s implementations of these analyses perform extra passes over execution traces in addition to the efficient single-pass vector-clock-based analyses from prior work (Kini et al., 2017; Roemer et al., 2018). It is unclear to us how M2 and VindicateWDPRace would compare in terms of detection capability (aside from the fact that only VindicateWDPRace takes branches and control dependence into account). In addition to these differences, our work incorporates branches and control dependence sensitivity, while Pavlogiannis’s work does not and thus would miss races such as Figures 1(b) and 5; and our work introduces a sound partial order and linear-time analysis (SDP and SDP analysis).

Our concurrent work introduces the SmartTrack algorithm, which optimizes the performance of WCP and DC analyses (Roemer et al., 2020). SmartTrack’s optimizations apply to analyses that compute predictive relations that order all pairs of conflicting accesses—a property that SDP and WDP do not conform to. In any case, optimizing WDP analysis would have limited benefit because the analysis still must construct a constraint graph in order to perform vindication (a necessity considering that so many WDP-races fail vindication in practice). The SmartTrack paper also introduces a new relation weak-doesn’t-commute (WDC) that is a weak variant of DC (Roemer et al., 2020). Unlike SDP and WDP, WDC does not help find more races than DC, but rather serves to improve analysis performance.

Bounded predictive approaches.

Other approaches predict data races by generating and solving satisfiability modulo theories (SMT) constraints (Huang et al., 2014; Said et al., 2011; Huang and Rajagopalan, 2016; Chen et al., 2008; Şerbănuţă et al., 2013; Liu et al., 2016). These SMT-based approaches cannot analyze long execution traces in reasonable time, because constraints are quadratic or cubic in trace length, and constraint-solving time often grows superlinearly with constraints. These approaches thus break traces into bounded “windows” of traces (e.g., 500–10,000 events each), missing predictable races involving accesses that occur further apart in the trace.

One advantage of SMT-based approaches is that they can be both sound and complete (within a bounded window) by encoding precise constraints. Notably, RVPredict includes branches in its execution model (Huang et al., 2014). (RVPredict also incorporates values into its event model, so a read in a predictable trace can have a different last writer as long as it writes the same value (Huang et al., 2014).) RVPredict is thus complete by Section 2’s definition, except that it is practically limited to bounded windows of execution. WDP analysis, on the other hand, is complete without the windowing limitation, but VindicateWDPRace is not guaranteed to vindicate a WDP-race even when a predictable race exists.

Schedule exploration.

In contrast to predictive analysis, schedule exploration approaches execute the program multiple times to explore more program behaviors (Huang, 2015; Huang and Huang, 2017; Musuvathi and Qadeer, 2007; Burckhardt et al., 2010; Eslamimehr and Palsberg, 2014; Sen, 2008; Cai and Cao, 2015; Henzinger et al., 2004). These approaches may be systematic (often called model checking) or be based on randomness or heuristics. Schedule exploration is complementary to predictive analysis, which aims to glean as much as possible from a given execution. Maximal causality reduction (MCR) combines schedule exploration with predictive analysis (Huang, 2015; Huang and Huang, 2017). MCR-S incorporates static control flow information to reduce the complexity of MCR’s generated SMT constraints (Huang and Huang, 2017).

Other analyses.

Widely used data race detectors typically use dynamic happens-before (HB) analysis (Lamport, 1978; Flanagan and Freund, 2009; Pozniansky and Schuster, 2007; Elmas et al., 2007; Serebryany and Iskhodzhanov, 2009; Serebryany et al., 2012; Intel Corporation, 2016). HB analysis cannot predict races involving reordered critical sections on the same lock. The detected races thus depend heavily on the scheduling of the analyzed program. Other analyses find a subset of HB-races by detecting simultaneously executing conflicting accesses or regions (Veeraraghavan et al., 2011; Biswas et al., 2015; Effinger-Dean et al., 2012; Erickson et al., 2010; Sen, 2008; Biswas et al., 2017).

Lockset analysis checks a locking discipline, ensuring that all pairs of conflicting accesses hold some common lock (Savage et al., 1997; Nishiyama, 2004; Choi et al., 2002; von Praun and Gross, 2001). Lockset analysis is predictive but unsound, reporting false races for synchronization patterns other than the locking discipline. Hybrid lockset–HB analyses generally incur disadvantages of one or both kinds of analysis (Dinning and Schonberg, 1991; O’Callahan and Choi, 2003; Yu et al., 2005; Pozniansky and Schuster, 2007).

Sampling-based analysis trades coverage for performance (opposite of predictive analysis) in order to detect data races in production (Bond et al., 2010; Erickson et al., 2010; Marino et al., 2009; Kasikci et al., 2013; Biswas et al., 2017; Sheng et al., 2011; Zhang et al., 2017). Custom hardware support can detect data races with low performance cost but has not been implemented (Devietti et al., 2012; Zhou et al., 2007; Wood et al., 2014; Segulja and Abdelrahman, 2015; Peng et al., 2017; Lucia et al., 2010; Marino et al., 2010; Singh et al., 2011).

Dynamic analysis can estimate the likely harm of a data race (Boehm, 2011; Kasikci et al., 2015; Narayanasamy et al., 2007; Cao et al., 2016; Flanagan and Freund, 2010a; Burnim et al., 2011), which is orthogonal to detection. All data races are erroneous under language memory models that ascribe them undefined semantics (Adve and Boehm, 2010; Boehm and Adve, 2008; Manson et al., 2005; Boehm and Demsky, 2014; Boehm, 2012; Boehm and Adve, 2012; Ševčík and Aspinall, 2008). Java’s memory model defines weak semantics for data races (Manson et al., 2005), but inadvertently prohibits common JVM optimizations (Ševčík and Aspinall, 2008; Boehm and Demsky, 2014).

Static program analysis can detect all races across all feasible executions of a program (von Praun and Gross, 2003; Naik et al., 2006; Naik and Aiken, 2007; Pratikakis et al., 2006; Engler and Ashcraft, 2003; Voung et al., 2007), but it reports thousands of false races for real programs (Biswas et al., 2017; Lee et al., 2012).

Avoiding or tolerating data races.

New languages and type systems can ensure data race freedom, but require significant programmer effort (Bocchino et al., 2009; Boyapati et al., 2002; Rinard and Lam, 1998; Matsakis and Klock, 2014; Abadi et al., 2006; Flanagan and Freund, 2007). Compilers and hardware can provide well-defined semantics for data races, but incur high run-time costs or hardware complexity (Segulja and Abdelrahman, 2015; Lucia et al., 2010; Marino et al., 2010; Singh et al., 2011; Marino et al., 2011; Singh et al., 2012; Sura et al., 2005; Sengupta et al., 2015; Ahn et al., 2009; Ouyang et al., 2013).

9. Conclusion

SDP and WDP analyses improve over existing predictive analyses by incorporating precise notions of data and control dependence, finding more races both in theory and in practice while retaining linear (in trace length) run time and thus unbounded analysis. SDP analysis maintains WCP analysis’s soundness while increasing race coverage. WDP analysis finds all data races that can be predicted from an observed execution; not all WDP-races are predictable races, but VindicateWDPRace can efficiently filter false races. Experiments show that our new approaches find many predictable races in real programs that prior approaches are unable to find. These properties and results suggest that our contributions advance the state of the art in predictive race detection analysis.

Acknowledgments

We thank Rob LaTour for early help with this project. Thanks to Steve Freund for making RoadRunner publicly available and providing help with using and modifying it. Thanks to Andreas Pavlogiannis, Umang Mathur, and Mahesh Viswanathan for providing the execution in Figure 3(d), which is a soundness counterexample for a previous version of SDP. Thanks to the anonymous reviewers for their thorough and insightful feedback.

Appendix A Proof of WDP Completeness Helper Lemma

Proof of Lemma 4.2.

We proceed by induction on the WDP-distance of two events, $\mathit{d(e,e^{\prime})}$ , defined as follows (the WDP properties refer to Table 2):

[TABLE]

Base case

Let $e_{1}$ and $e_{2}$ be events in a trace $\mathit{tr}$ such that $\mathit{d(e_{1},e_{2})}=0$ and $e_{1}\prec_{\textsc{\tiny{WDP}}}e_{2}$ . Since $\mathit{d(e_{1},e_{2})}=0$ , $e_{1}$ and $e_{2}$ are ordered directly, using PO or WDP’s conflicting critical section ordering:

If $e_{1}\prec_{\textsc{\tiny{WDP}}}e_{2}$ by PO, then $e_{1}\prec_{\textsc{\tiny{PO}}}e_{2}$ . Let $\mathit{tr^{\prime}}$ be a predictable trace of $\mathit{tr}$ in which $e_{2}<_{\textsc{$ \mathit{tr^{\prime}} $}}e_{1}$ or $e_{2}\in\mathit{tr^{\prime}}\land e_{1}\notin\mathit{tr^{\prime}}$ , either of which violates the PO rule of a predictable traces.

If $e_{1}\prec_{\textsc{\tiny{WDP}}}e_{2}$ by WDP’s conflicting critical section ordering, then $e_{1}$ is a release event, $e_{2}$ is a branch event, there exists a release event $r_{2}$ over the same lock as $e_{1}$ , and there exists write event $e$ and read event $e^{\prime}$ such that $e\in\mathit{CS(}e_{1})$ , $e^{\prime}\in\mathit{CS(}r_{2})$ , $e\asymp e^{\prime}$ , $\mathit{lastwr}_{\mathit{tr}}(e^{\prime})=e$ , and $\mathit{brDepsOn}(e_{2},e^{\prime})$ . Let $\mathit{tr^{\prime}}$ be a predictable trace of $\mathit{tr}$ where $e_{2}<_{\textsc{$ \mathit{tr^{\prime}} $}}e_{1}$ or $e_{2}\in\mathit{tr^{\prime}}\land e_{1}\notin\mathit{tr^{\prime}}$ . Then in either case, $r_{2}<_{\textsc{$ \mathit{tr^{\prime}} $}}a$ or $r_{2}\in\mathit{tr^{\prime}}\land a\notin\mathit{tr^{\prime}}$ , where $a$ is the matching acquire of $e_{1}$ ; otherwise $\mathit{tr^{\prime}}$ would be an invalid predictable trace due to the LS rule of predictable traces. As a result, $e^{\prime}<_{\textsc{$ \mathit{tr^{\prime}} $}}e$ or $e^{\prime}\in\mathit{tr^{\prime}}\land e\notin\mathit{tr^{\prime}}$ , which means $\mathit{lastwr}_{\mathit{tr^{\prime}}}(e^{\prime})\neq\mathit{lastwr}_{\mathit{tr}}(e^{\prime})$ . Furthermore, $e^{\prime}$ is a causal read in $\mathit{tr^{\prime}}$ as $\mathit{brDepsOn}(e_{2},e^{\prime})$ and $e_{2}\in\mathit{tr^{\prime}}$ . As a result, $\mathit{tr^{\prime}}$ violates the LW rule of predictable traces.

Inductive step

Given some $k>0$ , suppose that the lemma statement holds for $\mathit{d(e_{1},e_{2})}<k$ (induction hypothesis).

Let $e_{1}$ and $e_{2}$ be events in a trace $\mathit{tr}$ such that $\mathit{d(e_{1},e_{2})}=k$ and $e_{1}\prec_{\textsc{\tiny{WDP}}}e_{2}$ . Since $\mathit{d(e_{1},e_{2})}>0$ , $e_{1}$ and $e_{2}$ are ordered indirectly, using WDP transitivity or “ $\textsf{acq(m)}\prec_{\textsc{\tiny{WDP}}}\textsf{rel(m)}\implies\textsf{rel(m)}\prec_{\textsc{\tiny{WDP}}}\textsf{rel(m)}$ ”:

If $e_{1}\prec_{\textsc{\tiny{WDP}}}e_{2}$ by WDP transitivity, then there must be an event $e$ such that $e_{1}\prec_{\textsc{\tiny{WDP}}}e\land e\prec_{\textsc{\tiny{WDP}}}e_{2}$ and $\mathit{d(e_{1},e_{2})}=1+\mathit{d(e_{1},e)}+\mathit{d(e,e_{2})}$ . Let $\mathit{tr^{\prime}}$ be a predictable trace of $\mathit{tr}$ in which $e_{2}<_{\textsc{$ \mathit{tr^{\prime}} $}}e_{1}$ or $e_{2}\in\mathit{tr^{\prime}}\land e_{1}\notin\mathit{tr^{\prime}}$ . Then either $e<_{\textsc{$ \mathit{tr^{\prime}} $}}e_{1}$ , $e_{2}<_{\textsc{$ \mathit{tr^{\prime}} $}}e$ , $e_{2}\in\mathit{tr^{\prime}}\land e\notin\mathit{tr^{\prime}}$ , or $e\in\mathit{tr^{\prime}}\land e_{1}\notin\mathit{tr^{\prime}}$ . Any of these possibilities makes $\mathit{tr^{\prime}}$ an invalid predictable trace of $\mathit{tr}$ according to the induction hypothesis on $e_{1}\prec_{\textsc{\tiny{WDP}}}e$ and $e\prec_{\textsc{\tiny{WDP}}}e_{2}$ .

If $e_{1}\prec_{\textsc{\tiny{WDP}}}e_{2}$ by “ $\textsf{acq(m)}\prec_{\textsc{\tiny{WDP}}}\textsf{rel(m)}\implies\textsf{rel(m)}\prec_{\textsc{\tiny{WDP}}}\textsf{rel(m)}$ ”, then there exist acquire events $a_{1}$ and $a_{2}$ on the same lock such that $e_{1}$ and $e_{2}$ are their corresponding releases, $a_{1}\prec_{\textsc{\tiny{WDP}}}e_{2}$ , and $\mathit{d(e_{1},e_{2})}=1+\mathit{d(\mathit{A}(e),e^{\prime})}$ . Let $\mathit{tr^{\prime}}$ be a predictable trace of $\mathit{tr}$ where $e_{2}<_{\textsc{$ \mathit{tr^{\prime}} $}}e_{1}$ or $e_{2}\in\mathit{tr^{\prime}}\land e_{1}\notin\mathit{tr^{\prime}}$ . Then because of the LS rule of predictable traces, either $e_{2}<_{\textsc{$ \mathit{tr^{\prime}} $}}a_{1}$ or $e_{2}\in\mathit{tr^{\prime}}\land a_{1}\notin\mathit{tr^{\prime}}$ . Either of these possibilities makes $\mathit{tr^{\prime}}$ invalid predictable trace of $\mathit{tr}$ according to the induction hypothesis on $a_{1}\prec_{\textsc{\tiny{WDP}}}e_{2}$ . ∎

Appendix B Vindicating WDP-Races

WDP analysis is unsound: not every WDP-race indicates a true predictable race (Section 4). While WDP-races often indicate true predictable races in practice, true and false races alike take hours or days for programmers to understand and fix (Marino et al., 2009; Narayanasamy et al., 2007; Flanagan and Freund, 2009; Godefroid and Nagappan, 2008; Bessey et al., 2010). We thus introduce an algorithm called VindicateWDPRace that determines whether a WDP-race is a true predictable race. As shown in Algorithm 4, VindicateWDPRace takes as input a single WDP-race and a constraint graph corresponding to the execution trace with WDP ordering, and either returns a valid reordering exposing a predictable race, or reports failure.

VindicateWDPRace extends prior work’s algorithm for vindicating DC-races (Roemer et al., 2018), referred to here as VindicateDCRace. The main challenge in developing VindicateWDPRace is that unlike DC, WDP does not in general order a read after its last writer, since only causal reads in a predictable trace $\mathit{tr^{\prime}}$ need to be ordered after their last writers from $\mathit{tr}$ . VindicateWDPRace thus computes reads that must be causal reads in $\mathit{tr^{\prime}}$ , both when computing constraints and when constructing the reordered trace $\mathit{tr^{\prime}}$ .

The rest of this section first defines the constraint graph and then explains VindicateWDPRace and its component procedures.

B.1. Constraint Graph

The constraint graph, $G$ , is a directed graph in which the nodes are the events in $\mathit{tr}$ and the edges represent reordering constraints on events in any reordered trace $\mathit{tr^{\prime}}$ . Notationally, we represent $G$ as a set of edges, e.g., $(e,e^{\prime})\in G$ .

We use the notation $e\leadsto_{G}e^{\prime}$ to indicate that $e^{\prime}$ is reachable from $e$ in $G$ :

[TABLE]

The initial constraint graph is constructed by WDP analysis, which creates a node for each event and adds edges that correspond to WDP ordering between events. That is, initially the following property holds:

[TABLE]

Figures 2(a), 2(b), 3(a), 3(c), 4(a), and 4(c) (pages 2(a)–4(a)) essentially show initial constraint graphs for traces that have a WDP-race but no DC-race. The reader should consider only the edges labeled “WDP” to be part of the constraint graph, ignoring edges labeled “DC” and “SDP.” The arrows in the figures represent edges corresponding to WDP conflicting critical section ordering.101010These figures do not explicitly depict any “ $\textsf{acq(m)}\prec_{\textsc{\tiny{WDP}}}\textsf{rel(m)}\implies\textsf{rel(m)}\prec_{\textsc{\tiny{WDP}}}\textsf{rel(m)}$ ” edges because all such edges are already implied by other edges in the examples, e.g., Figure 4(c)’s rel(m) events are already ordered by a conflicting critical section edge composed with PO. The figures omit showing PO edges that exist between events by the same thread.

B.2. AddConstraints

The initial constraint graph $G$ is insufficiently constrained—a reordered trace that obeys $G$ ’s constraints will not in general be a valid reordering that exposes a predictable race—both because WDP is unsound and because $G$ ’s constraints do not ensure that a WDP-race’s accesses occur consecutively.

VindicateWDPRace first calls AddConstraints (at line 2 in Algorithm 4), which adds necessary constraints for a valid reordering exposing a predictable race: constraints that (1) force the input WDP-race’s accesses to be consecutive (to satisfy the definition of predictable race), (2) order some causal events (to satisfy the LW rule of valid reordering), and (3) order some critical sections (to satisfy the LS rule of valid reordering).

B.2.1. Consecutive-event constraints

For an input WDP-race pair $e_{1}$ and $e_{2}$ to be a predictable race, $e_{1}$ and $e_{2}$ must be consecutive in a valid reordering. To force this ordering, every event ordered before $e_{1}$ or $e_{2}$ must be ordered before both $e_{1}$ and $e_{2}$ . Lines 9 and 10 of Algorithm 4 add consecutive-event constraints to $G$ . For every immediate predecessor event $\mathit{src}$ of $e_{1}$ or $e_{2}$ in $G$ , AddConstraints adds a constraint from $\mathit{src}$ to $e_{2}$ or $e_{1}$ , respectively.

Figures 7(a) and 7(a) show constraint graphs after adding consecutive-event constraints (dashed lines). Note that edges from $e_{2}$ to $e_{1}$ ’s predecessor typically point backward relative to $<_{\textsc{$ \mathit{tr} $}}$ order.

AddConstraints adds the new constraint edges not only to $G$ but also to a new set $C$ whose edges are the starting point for discovering ordering constraints on critical sections (described later).

B.2.2. Last-writer constraints

After adding consecutive-event constraints, AddConstraints identifies and adds ordering constraints for causal reads, called last-writer (LW) constraints (lines 13–16 in Algorithm 4). These constraints have the following form: if a read event $e$ may affect the outcome of some branch that must be in any valid reordering (i.e., a branch ordered in $G$ before $e_{1}$ or $e_{2}$ ), then the last writer of $e$ , if any, must be ordered to the read in any valid reordering.

These read events are exactly the set of causal reads (Definition 2.2). AddConstraints identifies the causal reads by calling a procedure GetCausalReads at line 13. GetCausalReads (definition not shown) performs a backward traversal of $G$ starting from $e_{1}$ and $e_{2}$ to identify reads that are causal events according to the recursive definition of causal events. This traversal includes not only constraint edges in $G$ but also includes traversing backward from any causal read to its last writer (if any), as computed by WDP analysis. Note that edges between a causal read and its last writer do not in general exist in $G$ (intuitively, WDP analysis cannot know which reads must be causal reads in valid reorderings for a particular WDP-race).

After identifying causal reads, AddConstraints adds edges between each causal read and its last writer, if any, at lines 14–16. These LW constraints are necessary constraints on a valid reordering; we find that these constraints are useful for identifying false races during AddConstraints (instead of during ConstructReorderedTrace; Section B.3) and for constraining reordering enough during ConstructReorderedTrace so that true races can be successfully vindicated. On the other hand, LW constraints are insufficient to ensure the LW rule of valid reordering: Enforcing merely that each read’s last writer precede it does not ensure preservation of each read’s last writer. AddConstraints could potentially add more constraints, but we have found the added LW constraints to be sufficient in practice for identifying false races during AddConstraints and for enabling ConstructReorderedTrace to vindicate true races.

Figure 7(b) shows the constraint graph from Figure 7(a) after adding the sole LW edge (solid arrow labeled LW). The rd(x) is a causal read (since it can affect the br event, which must be in a valid reordering since $\textsf{br}\leadsto_{G}\textsf{rd(y)}$ ), so AddConstraints adds an edge to the read from its last writer, $\textsf{wr(x)}^{\textsf{T2}}$ (i.e., the wr(x) executed by Thread 2).

Likewise, Figure 7(b) shows the constraint graph from Figure 7(a) after adding LW edges (solid arrows labeled LW). However, AddConstraints does not actually add these edges until after it has added LS edges (dotted arrows): Thread 3’s br event does not reach wr(x) or rd(x) without the LS edges. AddConstraints adds LW and LS constraints iteratively until convergence (lines 11–24). After adding the shown LS constraints (explained below), the next loop iteration of AddConstraints detects that $\textsf{br}\leadsto_{G}\textsf{wr(x)}$ and thus that rd(y) and rd(z) are causal reads, and it adds the shown LW constraints.

B.2.3. Lock-semantics constraints

AddConstraints also identifies and adds necessary ordering constraints on critical sections, called lock-semantics (LS) constraints (lines 18–22 in Algorithm 4). These constraints have the following form: if two critical sections on the same lock are ordered in $G$ , at least in part, and both critical sections are ordered in $G$ , at least in part, before $e_{1}$ or $e_{2}$ , then the critical sections must be fully ordered in any valid reordering. (The helper function $\mathit{L}(e)$ returns m if $e$ is an acq(m) or rel(m).)

Figure 7(c) shows the constraint graph from Figure 7(b) after adding LS constraints (dotted arrows). The algorithm detects that $\textsf{acq(m)}^{\textsf{T3}}\leadsto_{G}\textsf{rel(m)}^{\textsf{T1}}$ and that both critical sections reach at least one access to y, so it adds an edge from $\textsf{rel(m)}^{\textsf{T3}}$ to $\textsf{acq(m)}^{\textsf{T1}}$ . After the first iteration of its do–while loop, AddConstraints finds no new LW or LS edges to add (convergence).

Figure 7(b) shows the constraint graph from Figure 7(a) after adding all LS constraints (dotted arrows) as well as LW constraints. During the first iteration of the loop (lines 11–24), AddConstraints detects that $\textsf{acq(p)}^{\textsf{T5}}\leadsto_{G}\textsf{rel(p)}^{\textsf{T1}}$ and $\textsf{acq(n)}^{\textsf{T4}}\leadsto_{G}\textsf{rel(n)}^{\textsf{T2}}$ and that both pairs of critical sections reach at least one access to x, so it adds edges from $\textsf{rel(p)}^{\textsf{T5}}$ to $\textsf{acq(p)}^{\textsf{T1}}$ and from $\textsf{rel(n)}^{\textsf{T4}}$ to $\textsf{acq(n)}^{\textsf{T2}}$ . During the second loop iteration, because of the added edge from $\textsf{rel(n)}^{\textsf{T4}}$ to $\textsf{acq(n)}^{\textsf{T2}}$ , AddConstraints detects the path $\textsf{acq(m)}^{\textsf{T3}}\leadsto_{G}\textsf{rel(m)}^{\textsf{T2}}$ , and thus adds the edge $\textsf{rel(m)}^{\textsf{T3}}$ to $\textsf{acq(m)}^{\textsf{T2}}$ . Finally, in the third loop iteration, as explained above, GetCausalReads identifies rd(y) and rd(z) as causal events and adds the corresponding LW edges shown in the figure.

B.2.4. Detecting cycles

VindicateWDPRace will definitely not be able to construct a valid reordering that satisfies $G$ ’s constraints if $G$ contains a cycle that reaches $e_{1}$ or $e_{2}$ . At the end of each loop iteration, AddConstraints checks for such a cycle at line 23, and returns an empty constraint graph if it detects a cycle.

Figure 7(c)’s constraint graph is acyclic, and AddConstraints returns the shown constraint graph. In contrast, Figure 7(b)’s constraint graph contains a cycle that reaches $e_{1}$ (and $e_{2}$ ): $\textsf{wr(x)}^{\textsf{T1}}\leadsto_{G}\textsf{rel(p)}^{\textsf{T5}}\leadsto_{G}\textsf{acq(p)}^{\textsf{T1}}\leadsto_{G}\textsf{wr(x)}^{\textsf{T1}}$ . AddConstraints detects the cycle and returns $\emptyset$ , and VindicateWDPRace reports that no predictable race exists.

B.3. ConstructReorderedTrace

If AddConstraints returns an acyclic constraint graph $G$ , a predictable race does not necessarily exist; and if a predictable race does exist, what is a valid reordering that exposes the race? VindicateWDPRace addresses both issues by calling ConstructReorderedTrace (at line 5 in Algorithm 4), which attempts to construct a valid reordering that satisfies $G$ ’s constraints.

Construction algorithm

ConstructReorderedTrace calls AttemptToConstructTrace, which does the work of attempting to construct a valid reordering $\mathit{tr^{\prime}}$ satisfying $G$ ’s constraints.

AttemptToConstructTrace first computes the set of (backward) reachable events $R$ that reach $e_{1}$ or $e_{2}$ (line 33). These events, along with $e_{1}$ and $e_{2}$ , must be in a valid reordering. The algorithm then computes the set of causal reads (line 34), using the GetCausalReads procedure described in Section B.2.2. The causal reads are events that must be in $\mathit{tr^{\prime}}$ and that must each have the same last writer as in $\mathit{tr}$ .

AttemptToConstructTrace constructs $\mathit{tr^{\prime}}$ in reverse order, starting with $e_{2}$ and $e_{1}$ , prepending events to $\mathit{tr^{\prime}}$ until all events in $R$ have been added (lines 35–45). A prepended event must satisfy the constraints in $G$ (line 37), and must not violate the LW or LS rules of valid reordering (line 38).

The algorithm omits the detailed logic for checking LW and LS rules; we describe the logic here briefly. An event in a critical section on m cannot be prepended if m is currently held by a different thread, and a critical section on m must be prepended in its entirety if $\mathit{tr^{\prime}}$ already contains events from another critical section on m. For each causal read, the last write must appear in the reordered trace before the read, and no write to the same variable can interleave the latest write and read. If a causal read has no last writer in $\mathit{tr}$ , then the read must have no last writer $\mathit{tr^{\prime}}$ .

AttemptToConstructTrace is a greedy algorithm: it chooses the latest event in $<_{\textsc{$ \mathit{tr} $}}$ order among the legal events (line 44). The intuition for choosing the latest event (shared by VindicateDCRace (Roemer et al., 2018)) is that the original order of critical sections and memory accesses is most likely to avoid failure to produce a valid reordering, especially for events executed before $e_{1}$ in the original execution.

Retrying construction

As mentioned above in the context of enforcing the LS rule, if $\mathit{tr^{\prime}}$ already contains an acq(m) event, then the algorithm cannot prepend an event $e\in\mathit{CS(}r)$ , where $r$ is a rel(m) event, before first prepending $r$ to $\mathit{tr^{\prime}}$ . However, $r$ may not be in $R$ . If AttemptToConstructTrace encounters this case (line 40), it returns the missing event $r$ (line 41). ConstructReorderedTrace then ensures that $r$ will be part of $R$ (line 29) and again calls AttemptToConstructTrace, which recomputes $R$ and the set of causal reads. In the worst case, $R$ might be missing release events for each critical section that contains a thread’s last event in $R$ , bounding the number of times that ConstructReorderedTrace can retry AttemptToConstructTrace.

AttemptToConstructTrace eventually returns either a valid reordered trace $\mathit{tr^{\prime}}$ that demonstrates a predictable race (line 46), or it fails if no release events are missing and no legal events can be added to $\mathit{tr^{\prime}}$ , in which case it returns an empty trace (line 42).

Discussion

ConstructReorderedTrace is sound: if it returns a non-empty trace $\mathit{tr^{\prime}}$ , then $\mathit{tr^{\prime}}$ is a valid reordering in which $e_{1}$ and $e_{2}$ are consecutive.

ConstructReorderedTrace is however incomplete. Because of its greedy algorithm for choosing among legal events to prepend, AttemptToConstructTrace may fail to construct a valid reordering even when one exists. (If AttemptToConstructTrace backtracked, it would be complete but incur exponential worst-case complexity.) Since ConstructReorderedTrace is incomplete, if ConstructReorderedTrace fails to construct a valid reordering, VindicateWDPRace reports “Don’t know” (line 7). However, ConstructReorderedTrace never reported “Don’t know” in our experiments (Section 7).

If $\mathit{tr}$ has a predictable deadlock (a deadlock in some valid reordering) but no predictable race, then no valid reordering exposing a predictable race exists, so VindicateWDPRace will report either “No race predictable” or “Don’t know.” It might be possible to extend VindicateWDPRace to check for predictable deadlocks.

B.4. Asymptotic Complexity

As for VindicateDCRace (Roemer et al., 2018), VindicateWDPRace’s worst-case time complexity is O( $N^{k}$ ), i.e., polynomial in $N$ , the number of events in $G$ . Briefly, every loop iteration count is bounded by the number of $G$ ’s nodes and edges. The exact degree $k$ of the polynomial depends on various implementation details. VindicateWDPRace uses $\Omega(N)$ space for $G$ and $\mathit{tr^{\prime}}$ .

Appendix C Results with Confidence Intervals

This section presents Tables 10—15, which show the same results as Section 7, but include 95% confidence intervals for every result, delimited with a $\pm$ sign.

Bibliography102

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1(1)
2Abadi et al . (2006) Martín Abadi, Cormac Flanagan, and Stephen N. Freund. 2006. Types for Safe Locking: Static Race Detection for Java. TOPLAS 28, 2 (2006), 207–255.
3Adve and Boehm (2010) Sarita V. Adve and Hans-J. Boehm. 2010. Memory Models: A Case for Rethinking Parallel Languages and Hardware. CACM 53 (2010), 90–101. Issue 8.
4Ahn et al . (2009) Wonsun Ahn, Shanxiang Qi, Marios Nicolaides, Josep Torrellas, Jae-Woo Lee, Xing Fang, and David Wong. 2009. Bulk Compiler: High-Performance Sequential Consistency through Cooperative Compiler and Hardware Support. In MICRO (New York, New York). 133–144.
5Bessey et al . (2010) Al Bessey, Ken Block, Ben Chelf, Andy Chou, Bryan Fulton, Seth Hallem, Charles Henri-Gros, Asya Kamsky, Scott Mc Peak, and Dawson Engler. 2010. A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World. CACM 53, 2 (2010), 66–75.
6Biswas et al . (2017) Swarnendu Biswas, Man Cao, Minjia Zhang, Michael D. Bond, and Benjamin P. Wood. 2017. Lightweight Data Race Detection for Production Runs. In CC (Austin, TX, USA). 11–21.
7Biswas et al . (2015) Swarnendu Biswas, Minjia Zhang, Michael D. Bond, and Brandon Lucia. 2015. Valor: Efficient, Software-Only Region Conflict Exceptions. In OOPSLA (Pittsburgh, PA, USA). 241–259.
8Blackburn et al . (2006) S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. Mc Kinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. Van Drunen, D. von Dincklage, and B. Wiedermann. 2006. The Da Capo Benchmarks: Java Benchmarking Development and Analysis. In OOPSLA . 169–190.