Automated Recurrence Analysis for Almost-Linear Expected-Runtime Bounds

Krishnendu Chatterjee; Hongfei Fu; Aniket Murhekar

arXiv:1705.00314·cs.PL·May 2, 2017


TL;DR

This paper introduces a fast, sound algorithm for automatically deriving almost-linear expected-runtime bounds for recurrence relations in randomized algorithms, improving analysis efficiency over traditional methods.

Contribution

The paper presents a simple linear-time algorithm that automatically infers optimal expected-runtime bounds for classical randomized algorithms, enhancing analysis efficiency and accuracy.

Findings

01

Efficiently derives expected-runtime bounds for classical randomized algorithms.

02

Successfully infers asymptotically optimal bounds for algorithms like QUICK-SORT and COUPON-COLLECTOR.

03

Implemented approach demonstrates practical efficiency in experimental evaluations.

Abstract

We consider the problem of developing automated techniques for solving recurrence relations to aid the expected-runtime analysis of programs. Several classical textbook algorithms have quite efficient expected-runtime complexity, whereas the corresponding worst-case bounds are either inefficient (e.g., QUICK-SORT), or completely ineffective (e.g., COUPON-COLLECTOR). Since the main focus of expected-runtime analysis is to obtain efficient bounds, we consider bounds that are either logarithmic, linear, or almost-linear ( $O(\log n)$ , $O(n)$ , $O(n \cdot \log n)$ , respectively, where $n$ represents the input size). Our main contribution is an efficient (simple linear-time algorithm) sound approach for deriving such expected-runtime bounds for the analysis of recurrence relations induced by randomized algorithms. Our approach can infer the asymptotically optimal…

Tables

Table 1: Illustration for Definition 2 where the notations are given in the top-left corner.

Notation	Expression	$𝔣$ , $T$ -term	Over-approximation
$𝔢_{1}$	$T (𝔫 - 1)$	$\ln 𝔫$ , $𝔢_{1}$	$\ln 𝔫 - \frac{1}{𝔫}$
$𝔢_{2}$	$T (⌊ \frac{𝔫}{2} ⌋)$	$\ln 𝔫$ , $𝔢_{2}$	$\ln 𝔫 - \ln 2$
$𝔢_{3}$	$T (⌈ \frac{𝔫}{2} ⌉)$	$\ln 𝔫$ , $𝔢_{3}$	$\ln 𝔫 - \ln 2 + \frac{1}{𝔫}$
$𝔢_{4}$	$\frac{1}{𝔫} \cdot \sum_{𝔧 = 1}^{𝔫 - 1} T (𝔧)$	$\ln 𝔫$ , $𝔢_{4}$	$\ln 𝔫 - 1 - \frac{\ln 𝔫}{2 \cdot 𝔫} + \frac{13}{12} \cdot \frac{1}{𝔫}$
$𝔢_{5}$	$\frac{1}{𝔫} \cdot (\sum_{𝔧 = ⌈ \frac{𝔫}{2} ⌉}^{𝔫 - 1} T (𝔧) + \sum_{𝔧 = ⌊ \frac{𝔫}{2} ⌋}^{𝔫 - 1} T (𝔧))$	$\ln 𝔫$ , $𝔢_{5}$	$\ln 𝔫 - (1 - \ln 2) + \frac{\ln 𝔫}{2 \cdot 𝔫} + \frac{0.6672}{𝔫} + \frac{1}{2 \cdot 𝔫^{2}}$
$𝔣$ , $T$ -term	Over-approximation	$𝔣$ , $T$ -term	Over-approximation
$𝔫$ , $𝔢_{1}$	$𝔫 - 1$	$𝔫 \cdot \ln 𝔫$ , $𝔢_{1}$	$𝔫 \cdot \ln 𝔫 - \ln 𝔫 - 1 + \frac{1}{𝔫}$
$𝔫$ , $𝔢_{2}$	$\frac{𝔫}{2}$	$𝔫 \cdot \ln 𝔫$ , $𝔢_{2}$	$\frac{1}{2} \cdot 𝔫 \cdot \ln 𝔫 - \frac{\ln 2}{2} \cdot 𝔫$
$𝔫$ , $𝔢_{3}$	$\frac{𝔫 + 1}{2}$	$𝔫 \cdot \ln 𝔫$ , $𝔢_{3}$	$\frac{𝔫 \cdot \ln 𝔫}{2} - \frac{\ln 2}{2} \cdot 𝔫 + \frac{1 - \ln 2}{2} + \frac{\ln 𝔫}{2} + \frac{1}{2 \cdot 𝔫}$
$𝔫$ , $𝔢_{4}$	$\frac{𝔫 - 1}{2}$	$𝔫 \cdot \ln 𝔫$ , $𝔢_{4}$	$\frac{𝔫 \cdot \ln 𝔫}{2} - \frac{𝔫}{4} - \frac{\ln 𝔫}{2} + \frac{\ln 𝔫}{12 \cdot 𝔫} + \frac{0.5139}{𝔫}$
$𝔫$ , $𝔢_{5}$	$\frac{3}{4} \cdot 𝔫 - \frac{1}{4 \cdot 𝔫}$	$𝔫 \cdot \ln 𝔫$ , $𝔢_{5}$	$\frac{3}{4} \cdot 𝔫 \cdot \ln 𝔫 - 0.2017 \cdot 𝔫 - \frac{1}{2} \cdot \ln 𝔫 - 0.2698 + \frac{\ln 𝔫}{8 \cdot 𝔫} + \frac{1.6369}{𝔫} + \frac{1}{2 \cdot 𝔫 \cdot (𝔫 - 1)} + \frac{1}{4 \cdot 𝔫^{2}}$
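
A few rows of Table 1 can be sanity-checked numerically. The sketch below compares, for small input sizes, the exact value obtained by substituting $𝔣$ for $T$ in a $T$-term with the tabulated over-approximation. The helper names (`sum_T`, `rows`, `check`) and the rounding tolerance are assumptions made for illustration and are not part of the paper's implementation; a finite check is no substitute for the proofs.

```python
import math

TOL = 1e-12  # slack for floating-point rounding (an assumption of this sketch)

def sum_T(f, lo, hi):
    """Sum f(j) for lo <= j <= hi (an empty range gives 0)."""
    return sum(f(j) for j in range(lo, hi + 1))

def rows(n):
    """Yield (label, exact value, tabulated over-approximation) for size n >= 2."""
    ln = math.log
    half_dn, half_up = n // 2, (n + 1) // 2      # floor(n/2), ceil(n/2)
    ident = lambda j: j
    nlogn = lambda j: j * ln(j)
    yield "ln n, e1", ln(n - 1), ln(n) - 1 / n
    yield "ln n, e2", ln(half_dn), ln(n) - ln(2)
    yield "ln n, e3", ln(half_up), ln(n) - ln(2) + 1 / n
    yield ("ln n, e4", sum_T(ln, 1, n - 1) / n,
           ln(n) - 1 - ln(n) / (2 * n) + 13 / (12 * n))
    yield "n, e4", sum_T(ident, 1, n - 1) / n, (n - 1) / 2
    yield ("n, e5",
           (sum_T(ident, half_up, n - 1) + sum_T(ident, half_dn, n - 1)) / n,
           3 * n / 4 - 1 / (4 * n))
    yield "n ln n, e1", nlogn(n - 1), n * ln(n) - ln(n) - 1 + 1 / n

def check(n_max=200):
    """Return True if every listed row holds for all 2 <= n <= n_max."""
    return all(exact <= bound + TOL
               for n in range(2, n_max + 1)
               for _, exact, bound in rows(n))
```

Calling `check()` evaluates each listed row for every $n$ from 2 to 200. The tolerance matters for the $𝔢_{2}$ row, where the bound is attained with equality for even $n$.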

Table 2: Experimental results where all running times (averaged over 5 runs) are less than 0.02 seconds, between 0.01 and 0.02 in all cases.

Recur. Rel.	$𝔣$	$ϵ, Dec$	$d$	$d_{100}$	Recur. Rel.	$𝔣$	$ϵ, Dec$	$d$	$d_{100}$
R.-Sear.	$\ln 𝔫$	UniDec	✓	$15.129$	Sort-Sel.	$𝔫 \cdot \ln 𝔫$	UniDec	✓	$16.000$
		$0.5$	$40.107$				$0.5$	$50.052$
		$0.3$	$28.363$				$0.3$	$24.852$
		$0.1$	$21.838$				$0.1$	$17.313$
		$0.01$	$19.762$				$0.01$	$16.000$
Q.-Sort	$𝔫 \cdot \ln 𝔫$	UniDec	✓	$3.172$	Coupon	$𝔫 \cdot \ln 𝔪$	UniDec	✓	$0.910$
		$0.5$	$9.001$				$0.5$	$3.001$
		$0.3$	$6.143$				$0.3$	$1.858$
		$0.1$	$4.556$				$0.1$	$1.223$
		$0.01$	$4.051$				$0.01$	$1.021$
Q.-Select	$𝔫$	UniDec	✓	$7.909$	Res. A	$𝔫 \cdot \ln 𝔪$	UniDec	✓	$2.472$
		$0.5$	$17.001$				$0.5$	$6.437$
		$0.3$	$11.851$				$0.3$	$4.312$
		$0.1$	$9.001$				$0.1$	$3.132$
		$0.01$	$8.091$				$0.01$	$2.756$
Diam. A	$𝔫 \cdot \ln 𝔫$	UniDec	✓	$4.525$	Res. B	$𝔪$	UniDec	✓	$2.691$
		$0.5$	$9.001$				$0.5$	$6.437$
		$0.3$	$6.143$				$0.3$	$4.312$
		$0.1$	$4.556$				$0.1$	$3.132$
		$0.01$	$4.525$				$0.01$	$2.756$
Diam. B	$𝔫$	UniDec	✓	$5.918$	-	-	-	-	-
		$0.5$	$13.001$				-	-
		$0.3$	$9.001$				-	-
		$0.1$	$6.778$				-	-
		$0.01$	$6.071$				-	-

Table 3: Detailed experimental results where all running times (averaged over 5 runs) are less than 0.02 seconds (between 0.01 and 0.02 seconds).

Program	$𝔣$	UniDec	UniSynth(✓)
Program	$𝔣$	UniDec	$ϵ$	$N_{ϵ, p, q}$	$d$	$d_{100}$
R.-Sear.	$\ln 𝔫$	✓	$0.5$	$13$	$40.107$	$15.129$
			$0.3$	$25$	$28.363$
			$0.1$	$97$	$21.838$
			$0.01$	$1398$	$19.762$
Q.-Sort	$\ln 𝔫$	$\times$	-	-	-	-
	$𝔫$	$\times$	-	-	-	-
	$𝔫 \ln 𝔫$	✓	$0.5$	$10$	$9.001$	$3.172$
			$0.3$	$21$	$6.143$
			$0.1$	$91$	$4.556$
			$0.01$	$1458$	$4.051$
Q.-Select	$\ln 𝔫$	$\times$	-	-	-	-
	$𝔫$	✓	$0.5$	$33$	$17.001$	$7.909$
			$0.3$	$54$	$11.851$
			$0.1$	$160$	$9.001$
			$0.01$	$1600$	$8.091$
Diam. A	$\ln 𝔫$	$\times$	-	-	-	-
	$𝔫$	$\times$	-	-	-	-
	$𝔫 \ln 𝔫$	✓	$0.5$	$3$	$9.001$	$4.525$
			$0.3$	$3$	$6.143$
			$0.1$	$4$	$4.556$
			$0.01$	$4$	$4.525$
Diam. B	$\ln 𝔫$	$\times$	-	-	-	-
	$𝔫$	✓	$0.5$	$9$	$13.001$	$5.918$
			$0.3$	$14$	$9.001$
			$0.1$	$40$	$6.778$
			$0.01$	$400$	$6.071$
Sort-Sel.	$\ln 𝔫$	$\times$	-	-	-	-
	$𝔫$	$\times$	-	-	-	-
	$𝔫 \ln 𝔫$	✓	$0.5$	$18$	$50.052$	$16.000$
			$0.3$	$29$	$24.852$
			$0.1$	$87$	$17.313$
			$0.01$	$866$	$16.000$
Coupon	$𝔫 \cdot \ln 𝔪$	✓	$0.5$	$2$	$3.001$	$0.910$
			$0.3$	$2$	$1.858$
			$0.1$	$2$	$1.223$
			$0.01$	$2$	$1.021$
Res. A	$𝔫 \cdot \ln 𝔪$	✓	$0.5$	$2$	$6.437$	$2.472$
			$0.3$	$2$	$4.312$
			$0.1$	$2$	$3.132$
			$0.01$	$2$	$2.756$
Res. B	$\ln 𝔪$	$\times$	-	-	-	-
	$𝔪$	✓	$0.5$	$2$	$6.437$	$2.691$
			$0.3$	$2$	$4.312$
			$0.1$	$2$	$3.132$
			$0.01$	$2$	$2.756$

Equations

$e$

$∣ \frac{\sum_{j = 1}^{n - 1} T (j)}{n} ∣ \frac{1}{n} \cdot (\sum_{j = ⌈ n / 2 ⌉}^{n - 1} T (j) + \sum_{j = ⌊ n / 2 ⌋}^{n - 1} T (j)) ∣ c \cdot e ∣ e + e$

$eq_{1} : T (n) = e; eq_{2} : T (1) = c$

$T (n) = 6 + \frac{1}{n} \cdot (\sum_{j = ⌈ n / 2 ⌉}^{n - 1} T (j) + \sum_{j = ⌊ n / 2 ⌋}^{n - 1} T (j))$

$T (n) = 2 \cdot n + 2 \cdot (\sum_{j = 1}^{n - 1} T (j)) / n$

$T (n) = 4 + 2 \cdot n + \frac{1}{n} \cdot (\sum_{j = ⌊ n / 2 ⌋}^{n - 1} T (j) + \sum_{j = ⌈ n / 2 ⌉}^{n - 1} T (j))$

$T (n) = 2 + n + 2 \cdot n \cdot \ln n + (\sum_{j = 1}^{n - 1} T (j)) / n$

$T (n) = 2 + n + 2 \cdot n + (\sum_{j = 1}^{n - 1} T (j)) / n$

$T (n) = 4 + T^{*} (n) + T (⌊ n / 2 ⌋) + T (⌈ n / 2 ⌉)$

$e$

$∣ \frac{\sum_{j = 1}^{m - 1} T (n, j)}{m} ∣ \frac{1}{m} \cdot (\sum_{j = ⌈ m / 2 ⌉}^{m - 1} T (n, j) + \sum_{j = ⌊ m / 2 ⌋}^{m - 1} T (n, j)) ∣ c \cdot e ∣ e + e$

$h$

$eq_{1} : T (n, m) = e + h \cdot b; eq_{2} : T (n, 1) = h \cdot c$

$T (n, 1) = n \cdot 1; T (n, m) = n / m + T (n, m - 1)$

$T (n, 1) = n \cdot 1; T (n, m) = (n \cdot e) / m + T (n, m - 1)$

$T (n, 1) = 1 \cdot 1; T (n, m) = 1 \cdot e + T (n, m - 1)$

$T_{G} (n) \leq d \cdot \mathsf{Subst} (f) (n) + c$

(1) $\ln n - \ln 2 - \frac{1}{n - 1} \leq \ln ⌊ \frac{n}{2} ⌋ \leq \ln n - \ln 2$; (2) $\ln n - \ln 2 \leq \ln ⌈ \frac{n}{2} ⌉ \leq \ln n - \ln 2 + \frac{1}{n}$.

$2 \cdot (\Gamma_{\ln n} (n) + \frac{1}{12}) - (\Gamma_{\ln n} (⌈ \frac{n}{2} ⌉) + \Gamma_{\ln n} (⌊ \frac{n}{2} ⌋) - 0.5402)$

$p (n) = \sum_{i = 0}^{k} a_{i} \cdot n^{i} \cdot \ln n + \sum_{i = 0}^{ℓ} b_{i} \cdot n^{i}$.

$d \cdot \ln n + 1 \geq 7 + d \cdot [\ln n - (1 - \ln 2) + \frac{\ln n}{2 \cdot n} + \frac{0.6672}{n} + \frac{1}{2 \cdot n^{2}}]$

$T (n) \leq 6 + \frac{1}{n} \cdot \max_{1 \leq ℓ^{*} < n} (\sum_{ℓ = 1}^{ℓ^{*}} T (n - ℓ) + \sum_{ℓ = ℓ^{*} + 1}^{n} T (ℓ - 1))$

$T (n) \leq 6 + \frac{1}{n} \cdot \max \{\sum_{ℓ = 1}^{n - 1} T (n - ℓ), \sum_{ℓ = 2}^{n} T (ℓ - 1)\}$

$T (n) = 6 + \frac{1}{n} \cdot \max_{1 \leq ℓ^{*} < n} (\sum_{ℓ = 1}^{ℓ^{*}} T (n - ℓ) + \sum_{ℓ = ℓ^{*} + 1}^{n} T (ℓ - 1))$

$\max_{1 \leq ℓ^{*} < n} (\sum_{ℓ = 1}^{ℓ^{*}} T^{'} (n - ℓ) + \sum_{ℓ = ℓ^{*} + 1}^{n} T^{'} (ℓ - 1))$

$\begin{cases} T (n) = 6 + \frac{1}{n} \cdot (\sum_{j = ⌈ \frac{n}{2} ⌉}^{n - 1} T (j) + \sum_{j = ⌊ \frac{n}{2} ⌋}^{n - 1} T (j)) \\ T (1) = 1 \end{cases}$

$T (n) = 2 \cdot n + 2 \cdot (\sum_{j = 1}^{n - 1} T (j)) / n$

$T (n) = 4 + 2 \cdot n + \frac{1}{n} \cdot \max_{1 \leq ℓ^{*} \leq n} (\sum_{ℓ = 1}^{ℓ^{*} - 1} T (n - ℓ) + \sum_{ℓ = ℓ^{*} + 1}^{n} T (ℓ - 1))$

$\begin{cases} T (n) = 4 + 2 \cdot n + \frac{1}{n} \cdot (\sum_{j = ⌊ \frac{n}{2} ⌋ + 1}^{n - 1} T (j) + \sum_{j = ⌈ \frac{n}{2} ⌉}^{n - 1} T (j)) \\ T (1) = 1 \end{cases}$

$\begin{cases} T (n) = 4 + 2 \cdot n + \frac{1}{n} \cdot (\sum_{j = ⌊ \frac{n}{2} ⌋}^{n - 1} T (j) + \sum_{j = ⌈ \frac{n}{2} ⌉}^{n - 1} T (j)) \\ T (1) = 1 \end{cases}$

$T (n) = 2 + n + 2 \cdot n \cdot \ln n + (\sum_{j = 1}^{n - 1} T (j)) / n$

$T (n) = 2 + n + 2 \cdot n + (\sum_{j = 1}^{n - 1} T (j)) / n$


Full text

Krishnendu Chatterjee (IST Austria, Klosterneuburg, Austria)

Hongfei Fu (State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, P.R. China)

Aniket Murhekar (IIT Bombay, India)

Abstract

We consider the problem of developing automated techniques for solving recurrence relations to aid the expected-runtime analysis of programs. Several classical textbook algorithms have quite efficient expected-runtime complexity, whereas the corresponding worst-case bounds are either inefficient (e.g., Quick-Sort), or completely ineffective (e.g., Coupon-Collector). Since the main focus of expected-runtime analysis is to obtain efficient bounds, we consider bounds that are either logarithmic, linear, or almost-linear ( $\mathcal{O}(\log{n})$ , $\mathcal{O}(n)$ , $\mathcal{O}(n\cdot\log{n})$ , respectively, where $n$ represents the input size). Our main contribution is an efficient (simple linear-time algorithm) sound approach for deriving such expected-runtime bounds for the analysis of recurrence relations induced by randomized algorithms. Our approach can infer the asymptotically optimal expected-runtime bounds for recurrences of classical randomized algorithms, including Randomized-Search, Quick-Sort, Quick-Select, Coupon-Collector, where the worst-case bounds are either inefficient (such as linear as compared to logarithmic of expected-runtime, or quadratic as compared to linear or almost-linear of expected-runtime), or ineffective. We have implemented our approach, and the experimental results show that we obtain the bounds efficiently for the recurrences of various classical algorithms.

1 Introduction

Static analysis for quantitative bounds. Static analysis of programs aims to reason about programs without running them. The most basic properties for static analysis are qualitative properties, such as safety, termination, and liveness, which give a Yes or No answer for every trace of a program (such as assertion violation or not, termination or not). However, given the recent interest in the analysis of resource-constrained systems, such as embedded systems, as well as in performance analysis, quantitative performance characteristics are necessary. For example, the qualitative problem of termination asks whether a given program always terminates, whereas the quantitative problem asks for precise bounds on the number of steps, and is thus a more challenging problem. Hence the problem of automatically reasoning about resource bounds (such as time complexity bounds) of programs is of significant theoretical as well as practical interest.

Worst-case bounds. The worst-case analysis of programs is a fundamental problem in computer science, and is the basis of algorithms and complexity theory. However, manual proofs of worst-case analysis can be tedious and also require non-trivial mathematical ingenuity; e.g., the book The Art of Computer Programming by Knuth presents a wide range of involved techniques to derive such precise bounds [37, 38]. There has been a considerable research effort on the automated analysis of such worst-case bounds for programs; see [23, 24, 26, 27] for excellent expositions on the significance of deriving precise worst-case bounds and the automated methods to derive them. For the worst-case analysis there are several techniques, such as worst-case execution time analysis [47], resource analysis using abstract interpretation and type systems [24, 2, 34, 26, 27], ranking functions [8, 9, 15, 42, 45, 17, 48, 43], as well as recurrence relations [21, 2, 3, 4].

Expected-runtime bounds. While several works have focused on deriving worst-case bounds for programs, quite surprisingly little work has been done to derive precise bounds for expected-runtime analysis, with the exception of [20], which focuses on randomization in combinatorial structures (such as trees). This is despite the fact that expected-runtime analysis is an equally important pillar of theoretical computer science, both in terms of theoretical and practical significance. For example, while for real-time systems with hard constraints worst-case analysis is necessary, for real-time systems with soft constraints the more relevant information is the expected-runtime analysis. Below we highlight three key aspects of the significance of expected-runtime analysis.

1. Simplicity and desired properties: The first key aspect is simplicity: often much simpler algorithms (and thus simple and efficient implementations) exist for expected-runtime complexity as compared to worst-case complexity. A classic example is the Selection problem that, given a set of $n$ numbers and $0\leq k\leq n$ , asks to find the $k$ -th largest number (e.g., for the median, $k=n/2$ ). The classical linear-time algorithm for the problem (see [16, Chapter 9]) is quite involved, and its worst-case analysis to obtain the linear time bound is rather complex. In contrast, a much simpler algorithm exists (namely, Quick-Select) that has linear expected-runtime complexity. Moreover, randomized algorithms with expected-runtime complexity enjoy many desired properties, which deterministic algorithms do not have. A basic example is Channel-Conflict Resolution (see Example 7, Section 2.4) where the simple randomized algorithm can be implemented in a distributed or concurrent setting, whereas deterministic algorithms are quite cumbersome.

2. Efficiency in practice: Since worst-case analysis is concerned with corner cases that rarely arise, many algorithms and implementations have much better expected-runtime complexity, and they perform extremely well in practice. A classic example is the Quick-Sort algorithm, which has quadratic worst-case complexity, but almost-linear expected-runtime complexity, and is one of the most efficient sorting algorithms in practice.

3. Worst-case analysis ineffective: In several important cases the worst-case analysis is completely ineffective. For example, consider one of the textbook stochastic processes, namely the Coupon-Collector problem, where there are $n$ types of coupons to be collected, and in each round, a coupon type among the $n$ types is obtained uniformly at random. The process stops when all types are collected. The Coupon-Collector process is one of the basic and classical stochastic processes, with numerous applications in network routing, load balancing, etc. (see [40, Chapter 3] for applications of Coupon-Collector problems). For the worst-case analysis, the process might not terminate (worst-case bound infinite), but the expected-runtime analysis shows that the expected termination time is $\mathcal{O}(n\cdot\log n)$ .

Challenges. The expected-runtime analysis brings several new challenges as compared to the worst-case analysis. First, for the worst-case complexity bounds, the most classical characterizations for the analysis of recurrences are the Master Theorem (cf. [16, Chapter 1]) and Akra-Bazzi’s Theorem [1]. However, the expected-runtime analysis problems give rise to recurrences that are not characterized by these theorems, since our recurrences normally involve an unbounded summation resulting from a randomized selection of integers from $1$ to $n$ where $n$ is unbounded. Second, techniques like ranking functions (linear or polynomial ranking functions) cannot derive efficient bounds such as $\mathcal{O}(\log n)$ or $\mathcal{O}(n\cdot\log n)$ . While expected-runtime analysis has been considered for combinatorial structures using generating functions [20], we are not aware of any automated technique to handle recurrences arising from randomized algorithms.

Analysis problem. We consider the algorithmic analysis problem of recurrences arising naturally for randomized recursive programs. Specifically we consider the following:

• We consider two classes of recurrences: (a) the univariate class with one variable (which represents the array length, or the number of input elements, as required in problems such as Quick-Select, Quick-Sort, etc.); and (b) the separable bivariate class with two variables (where the two independent variables represent the total number of elements and total number of successful cases, respectively, as required in problems such as Coupon-Collector, Channel-Conflict Resolution). The above two classes capture a large class of expected-runtime analysis problems, including all the classical ones mentioned above. Moreover, the main purpose of expected-runtime analysis is to obtain efficient bounds. Hence we focus on the case of logarithmic, linear, and almost-linear bounds (i.e., bounds of the form $\mathcal{O}(\log n)$ , $\mathcal{O}(n)$ and $\mathcal{O}(n\cdot\log n)$ , respectively, where $n$ is the size of the input). Moreover, for randomized algorithms, quadratic bounds or higher are rare.

Thus the main problem we consider is to automatically derive such efficient bounds for randomized univariate and separable bivariate recurrence relations.

Our contributions. Our main contribution is a sound approach for analysis of recurrences for expected-runtime analysis. The input to our problem is a recurrence relation and the output is either logarithmic, linear, or almost-linear as the asymptotic bound, or fail. The details of our contributions are as follows:

1. Efficient algorithm. We first present a linear-time algorithm for the univariate case, which is based on simple comparison of leading terms of pseudo-polynomials. Second, we present a simple reduction for separable bivariate recurrence analysis to the univariate case. Our efficient (linear-time) algorithm can soundly infer logarithmic, linear, and almost-linear bounds for recurrences of one or two variables.

2. Analysis of classical algorithms. We show that for several classical algorithms, such as Randomized-Search, Quick-Select, Quick-Sort, Coupon-Collector, Channel-Conflict Resolution (see Section 2.2 and Section 2.4 for examples), our sound approach can obtain the asymptotically optimal expected-runtime bounds for the recurrences. In all the cases above, either the worst-case bounds (i) do not exist (e.g., Coupon-Collector), or (ii) are quadratic when the expected-runtime bounds are linear or almost-linear (e.g., Quick-Select, Quick-Sort); or (iii) are linear when the expected-runtime bounds are logarithmic (e.g., Randomized-Search). Thus in cases where the worst-case bounds are either not applicable, or grossly overestimate the expected-runtime bounds, our technique is both efficient (linear-time) and can infer the optimal bounds.

3. Implementation. Finally, we have implemented our approach, and we present experimental results on the classical examples to show that we can efficiently achieve the automated expected-runtime analysis of randomized recurrence relations.

Novelty and technical contribution. The key novelty of our approach is an automated method to analyze recurrences arising from randomized recursive programs, which are not covered by the Master Theorem. Our approach is based on a guess-and-check technique. We show that by over-approximating terms in a recurrence relation through integrals and Taylor’s expansion, we can soundly infer logarithmic, linear and almost-linear bounds using simple comparison between leading terms of pseudo-polynomials.

2 Recurrence Relations

We present our mini specification language for recurrence relations for expected-runtime analysis. The language is designed to capture the running time of recursive randomized algorithms which (i) involve only one function call whose expected-runtime complexity is to be determined, (ii) have at most two integer parameters, and (iii) involve randomized-selection or divide-and-conquer techniques. We present our language separately for the univariate and bivariate cases. In the sequel, we denote by $\mathbb{N}$ , $\mathbb{N}_{0}$ , $\mathbb{Z}$ , and $\mathbb{R}$ the sets of all positive integers, non-negative integers, integers, and real numbers, respectively.

2.1 Univariate Randomized Recurrences

Below we define the notion of univariate randomized recurrence relations. First, we introduce the notion of univariate recurrence expressions. Since we only consider a single recursive function call, we use ‘ $\mathrm{T}$ ’ to represent the (only) function call. We also use ‘ $\mathfrak{n}$ ’ to represent the only parameter in the function declaration.

Univariate recurrence expressions. The syntax of univariate recurrence expressions $\mathfrak{e}$ is generated by the following grammar:

[TABLE]

where $c\in[1,\infty)$ and $\ln(\centerdot)$ represents the natural logarithm function with base $e$ . Informally, $\mathrm{T}(\mathfrak{n})$ is the (expected) running time of a recursive randomized program which involves only one recursive routine indicated by $\mathrm{T}$ and only one parameter indicated by $\mathfrak{n}$ . Then each $\mathrm{T}(\centerdot)$ -term in the grammar has a direct algorithmic meaning:

• $\mathrm{T}\left(\mathfrak{n}-1\right)$ may mean a recursion to a sub-array with length decremented by one;

• $\mathrm{T}\left(\left\lfloor\frac{\mathfrak{n}}{2}\right\rfloor\right)$ and $\mathrm{T}\left(\left\lceil\frac{\mathfrak{n}}{2}\right\rceil\right)$ may mean a recursion related to a divide-and-conquer technique;

• finally, $\frac{\sum_{\mathfrak{j}=1}^{\mathfrak{n}-1}\mathrm{T}(\mathfrak{j})}{\mathfrak{n}}$ and $\frac{1}{\mathfrak{n}}\cdot\left(\sum_{\mathfrak{j}=\left\lceil\frac{\mathfrak{n}}{2}\right\rceil}^{\mathfrak{n}-1}\mathrm{T}(\mathfrak{j})+\sum_{\mathfrak{j}=\left\lfloor\frac{\mathfrak{n}}{2}\right\rfloor}^{\mathfrak{n}-1}\mathrm{T}(\mathfrak{j})\right)$ may mean a recursion related to a randomized selection of an array index.

Substitution. Consider a function $h:\mathbb{N}\rightarrow\mathbb{R}$ and univariate recurrence expression ${\mathfrak{e}}$ . The substitution function, denoted by $\mathsf{Subst}({\mathfrak{e}},h)$ , is the function from $\mathbb{N}$ into $\mathbb{R}$ such that the value for $n$ is obtained by evaluation through substituting $h$ for $\mathrm{T}$ and $n$ for $\mathfrak{n}$ in ${\mathfrak{e}}$ , respectively. Moreover, if $\mathfrak{e}$ does not involve the appearance of ‘ $\mathrm{T}$ ’, then we use the abbreviation $\mathsf{Subst}({\mathfrak{e}})$ i.e., omit $h$ . For example, (i) if ${\mathfrak{e}}=\mathfrak{n}+\mathrm{T}(\mathfrak{n}-1)$ , and $h:n\mapsto n\cdot\log n$ , then $\mathsf{Subst}({\mathfrak{e}},h)$ is the function $n\mapsto n+(n-1)\cdot\log(n-1)$ , and (ii) if ${\mathfrak{e}}=2\cdot\mathfrak{n}$ , then $\mathsf{Subst}({\mathfrak{e}})$ is $n\mapsto 2n$ .
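
The substitution function can be illustrated with a minimal Python sketch. Here a recurrence expression is modeled as a plain function of `(T, n)`; this encoding, the type alias `Expr`, and the name `subst` are assumptions made for illustration and do not reflect the paper's implementation.

```python
import math
from typing import Callable, Optional

# A recurrence expression is modeled as a function of (T, n), where T is the
# function to be substituted for the symbol 'T'. This encoding is an
# assumption of the sketch, not the paper's data structure.
Expr = Callable[[Callable[[int], float], int], float]

def subst(e: Expr, h: Optional[Callable[[int], float]] = None) -> Callable[[int], float]:
    """Return the function n -> value of e with h substituted for T."""
    def no_T(_):
        raise ValueError("expression mentions T but no h was supplied")
    return lambda n: e(h if h is not None else no_T, n)

# Example (i): e = n + T(n - 1) with h(n) = n * log n.
e1: Expr = lambda T, n: n + T(n - 1)
h = lambda n: n * math.log(n)

# Example (ii): e = 2 * n does not mention T, so h is omitted.
e2: Expr = lambda T, n: 2 * n
```

With these definitions, `subst(e1, h)(n)` evaluates $n+(n-1)\cdot\log(n-1)$ and `subst(e2)(n)` evaluates $2n$, mirroring examples (i) and (ii) above.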

Univariate recurrence relation. A univariate recurrence relation $G=(\mathsf{eq}_{1},\mathsf{eq}_{2})$ is a pair of equalities as follows:

$\mathsf{eq}_{1}:\ \mathrm{T}(\mathfrak{n})=\mathfrak{e};\qquad\mathsf{eq}_{2}:\ \mathrm{T}(1)=c$

where $c\in(0,\infty)$ and $\mathfrak{e}$ is a univariate recurrence expression. For a univariate recurrence relation $G$ the evaluation sequence $\mathsf{Eval}(G)$ is as follows: $\mathsf{Eval}(G)(1)=c$ , and for $n\geq 2$ , given $\mathsf{Eval}(G)(i)$ for $1\leq i<n$ , for the value $\mathsf{Eval}(G)(n)$ we evaluate the expression $\mathsf{Subst}(\mathfrak{e},\mathsf{Eval}(G))$ , since in $\mathfrak{e}$ the parameter $\mathfrak{n}$ always decreases and is thus well-defined.

Finite vs infinite solution. Note that the above description gives a computational procedure to compute $\mathsf{Eval}(G)$ for any finite $n$ , in linear time in $n$ through dynamic programming. The interesting question is to algorithmically analyze the infinite behavior. A function $T_{G}:\mathbb{N}\rightarrow\mathbb{R}$ is called a solution to $G$ if $T_{G}(n)=\mathsf{Eval}(G)(n)$ for all $n\geq 1$ . The function $T_{G}$ is unique and explicitly defined as follows: (1) Base Step. $T_{G}(1):=c$ ; and (2) Recursive Step. $T_{G}(n):=\mathsf{Subst}(\mathfrak{e},T_{G})(n)$ for all $n\geq 2$ . The interesting algorithmic question is to reason about the asymptotic infinite behaviour of $T_{G}$ .
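
The computational procedure for $\mathsf{Eval}(G)$ can be sketched as a bottom-up table fill. In this hypothetical sketch the right-hand side of $\mathsf{eq}_{1}$ is passed as a Python function `e(t, n)` that may read only the already computed entries `t[1..n-1]`; the function name and interface are assumptions for illustration. The overall running time is linear only if each call to `e` takes constant time, which for the summation terms would require maintaining running prefix sums.

```python
from typing import Callable, List

def eval_sequence(e: Callable[[List[float], int], float], c: float, n_max: int) -> List[float]:
    """Return a table t with t[n] = Eval(G)(n) for 1 <= n <= n_max (t[0] is unused).

    e(t, n) computes the right-hand side of eq_1 and may read only t[1..n-1].
    """
    t = [0.0] * (n_max + 1)
    t[1] = c                      # base step: T(1) = c
    for n in range(2, n_max + 1):
        t[n] = e(t, n)            # recursive step, using already computed values
    return t

# Illustration with T(n) = n + T(n - 1) and T(1) = 1, whose solution is n(n+1)/2.
table = eval_sequence(lambda t, n: n + t[n - 1], c=1.0, n_max=10)
```

The table only describes finitely many values; it says nothing by itself about the asymptotic behaviour of $T_{G}$, which is the question addressed in the rest of the paper.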

2.2 Motivating Classical Examples

In this section we present several classical examples of randomized programs whose recurrence relations belong to the class of univariate recurrence relations described in Section 2.1. The pseudocode and the derivation of the recurrence relations in this section are given in Appendix A. Moreover, in all cases the base step is $\mathrm{T}(1)=1$ , hence we discuss only the recursive case.

Example 1 (Randomized-Search)

Consider Sherwood’s Randomized-Search algorithm (cf. [39, Chapter 9]). The algorithm checks whether an integer value $d$ is present within the index range $[i,j]$ ( $0\leq i\leq j$ ) in an integer array $ar$ which is sorted in increasing order and is without duplicate entries. The algorithm outputs either the index for $d$ in $ar$ or $-1$ meaning that $d$ is not present in the index range $[i,j]$ of $ar$ . The recurrence relation for this example is as follows:

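$\mathrm{T}(\mathfrak{n})=6+\frac{1}{\mathfrak{n}}\cdot\left(\sum_{\mathfrak{j}=\left\lceil\mathfrak{n}/2\right\rceil}^{\mathfrak{n}-1}\mathrm{T}(\mathfrak{j})+\sum_{\mathfrak{j}=\left\lfloor\mathfrak{n}/2\right\rfloor}^{\mathfrak{n}-1}\mathrm{T}(\mathfrak{j})\right)$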

We note that the worst-case complexity for this algorithm is $\Theta(n)$ .∎
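
To get a feeling for the logarithmic growth, the Randomized-Search recurrence $T(n)=6+\frac{1}{n}\cdot(\sum_{j=\lceil n/2\rceil}^{n-1}T(j)+\sum_{j=\lfloor n/2\rfloor}^{n-1}T(j))$ with $T(1)=1$ (as given in the equation list of this page) can be tabulated in linear time with prefix sums. The following is a minimal sketch; the function name and the choice of sample points are assumptions for illustration, and the numbers it produces are an empirical observation, not a proof of the bound.

```python
import math

def randomized_search_T(n_max: int) -> list:
    """Tabulate T(1..n_max) for the Randomized-Search recurrence using prefix sums."""
    t = [0.0] * (n_max + 1)
    prefix = [0.0] * (n_max + 1)   # prefix[k] = T(1) + ... + T(k)
    t[1] = prefix[1] = 1.0
    for n in range(2, n_max + 1):
        lo_up, lo_dn = (n + 1) // 2, n // 2          # ceil(n/2), floor(n/2)
        s_up = prefix[n - 1] - prefix[lo_up - 1]     # sum over j = ceil(n/2) .. n-1
        s_dn = prefix[n - 1] - prefix[lo_dn - 1]     # sum over j = floor(n/2) .. n-1
        t[n] = 6 + (s_up + s_dn) / n
        prefix[n] = prefix[n - 1] + t[n]
    return t

t = randomized_search_T(1000)
# Ratios T(n) / ln n at a few sample points, to observe the growth empirically.
ratios = [t[n] / math.log(n) for n in (10, 100, 1000)]
```

Printing `ratios` shows how $T(n)/\ln n$ behaves on the sampled inputs; the sound bound itself is what the paper's algorithm derives.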

Example 2 (Quick-Sort)

Consider the Quick-Sort algorithm [16, Chapter 7]. The recurrence relation for this example is:

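$\mathrm{T}(\mathfrak{n})=2\cdot\mathfrak{n}+2\cdot\left(\sum_{\mathfrak{j}=1}^{\mathfrak{n}-1}\mathrm{T}(\mathfrak{j})\right)/\mathfrak{n}$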

where $\mathrm{T}(\mathfrak{n})$ represents the maximal expected execution time where $\mathfrak{n}$ is the array length and the execution time of pivoting is represented by $2\cdot\mathfrak{n}$ . We note that the worst-case complexity for this algorithm is $\Theta(n^{2})$ .∎
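
As an illustration, the Quick-Sort recurrence $T(n)=2\cdot n+2\cdot(\sum_{j=1}^{n-1}T(j))/n$ with $T(1)=1$ can be evaluated exactly with rational arithmetic and compared against a closed form. The closed form $(n+1)\cdot(4H_{n}-\frac{19}{3})+6$ for $n\geq 2$, where $H_{n}$ is the $n$-th harmonic number, is not taken from the paper: it is derived here by subtracting consecutive instances of the recurrence, which gives $n\cdot T(n)=(n+1)\cdot T(n-1)+4n-2$ for $n\geq 3$, and telescoping. The function names are hypothetical.

```python
from fractions import Fraction

def quicksort_T(n_max: int) -> list:
    """Exact values T(1..n_max) of T(n) = 2n + 2 * (T(1)+...+T(n-1)) / n, T(1) = 1."""
    t = [Fraction(0)] * (n_max + 1)
    t[1] = Fraction(1)
    running = Fraction(1)              # running = T(1) + ... + T(n-1)
    for n in range(2, n_max + 1):
        t[n] = 2 * n + 2 * running / n
        running += t[n]
    return t

def closed_form(n: int) -> Fraction:
    """(n+1) * (4 * H_n - 19/3) + 6, a closed form derived for this sketch (n >= 2)."""
    harmonic = sum(Fraction(1, k) for k in range(1, n + 1))
    return (n + 1) * (4 * harmonic - Fraction(19, 3)) + 6

t = quicksort_T(50)
```

Since $H_{n}=\ln n+\mathcal{O}(1)$, the closed form is consistent with an almost-linear expected-runtime bound for this recurrence.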

Example 3 (Quick-Select)

Consider the Quick-Select algorithm (cf. [16, Chapter 9]). The recurrence relation for this example is

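$\mathrm{T}(\mathfrak{n})=4+2\cdot\mathfrak{n}+\frac{1}{\mathfrak{n}}\cdot\left(\sum_{\mathfrak{j}=\left\lfloor\mathfrak{n}/2\right\rfloor}^{\mathfrak{n}-1}\mathrm{T}(\mathfrak{j})+\sum_{\mathfrak{j}=\left\lceil\mathfrak{n}/2\right\rceil}^{\mathfrak{n}-1}\mathrm{T}(\mathfrak{j})\right)$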

We note that the worst-case complexity for this algorithm is $\Theta(n^{2})$ .∎

Example 4 (Diameter-Computation)

Consider the Diameter-Computation algorithm (cf. [40, Chapter 9]) to compute the diameter of an input finite set $S$ of three-dimensional points. Depending on whether the Euclidean or the $L_{1}$ metric is used, we obtain two different recurrence relations. For the Euclidean metric we have the following relation:

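$\mathrm{T}(\mathfrak{n})=2+\mathfrak{n}+2\cdot\mathfrak{n}\cdot\ln\mathfrak{n}+\left(\sum_{\mathfrak{j}=1}^{\mathfrak{n}-1}\mathrm{T}(\mathfrak{j})\right)/\mathfrak{n}$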

and for $L_{1}$ metric we have the following relation:

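$\mathrm{T}(\mathfrak{n})=2+\mathfrak{n}+2\cdot\mathfrak{n}+\left(\sum_{\mathfrak{j}=1}^{\mathfrak{n}-1}\mathrm{T}(\mathfrak{j})\right)/\mathfrak{n}$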

We note that the worst-case complexity for this algorithm is as follows: for Euclidean metric it is $\Theta(n^{2}\cdot\log n)$ and for the $L_{1}$ metric it is $\Theta(n^{2})$ .∎

Example 5 (Sorting with Quick-Select)

Consider a sorting algorithm which selects the median through the Quick-Select algorithm. The recurrence relation is directly obtained as follows:

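$\mathrm{T}(\mathfrak{n})=4+T^{*}(\mathfrak{n})+\mathrm{T}\left(\left\lfloor\mathfrak{n}/2\right\rfloor\right)+\mathrm{T}\left(\left\lceil\mathfrak{n}/2\right\rceil\right)$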

where $T^{*}(\centerdot)$ is an upper bound on the expected running time of Quick-Select (cf. Example 3). We note that the worst-case complexity for this algorithm is $\Theta(n^{2})$ .∎

2.3 Separable Bivariate Randomized Recurrences

We consider a generalization of the univariate recurrence relations to a class of bivariate recurrence relations called separable bivariate recurrence relations. Similar to the univariate situation, we use ‘ $\mathrm{T}$ ’ to represent the (only) function call and ‘ $\mathfrak{n}$ ’, ‘ $\mathfrak{m}$ ’ to represent the two integer parameters.

Separable Bivariate Recurrence Expressions. The syntax of separable bivariate recurrence expressions is illustrated by $\mathfrak{e},\mathfrak{h}$ and $\mathfrak{b}$ as follows:

[TABLE]

The differences are that (i) we have two independent parameters $\mathfrak{n},\mathfrak{m}$ , (ii) $\mathfrak{e}$ now represents an expression composed of only $\mathrm{T}$ -terms, and (iii) $\mathfrak{h}$ (resp. $\mathfrak{b}$ ) represents arithmetic expressions for $\mathfrak{n}$ (resp. for $\mathfrak{m}$ ). This class of separable bivariate recurrence expressions (often for brevity bivariate recurrence expressions) stresses a dominant role on $\mathfrak{m}$ and a minor role on $\mathfrak{n}$ , and is intended to model randomized algorithms where some parameter (to be represented by $\mathfrak{n}$ ) does not change value.

Substitution. The notion of substitution is similar to the univariate case. Consider a function $h:\mathbb{N}\times\mathbb{N}\rightarrow\mathbb{R}$ , and a bivariate recurrence expression ${\mathfrak{e}}$ . The substitution function, denoted by $\mathsf{Subst}({\mathfrak{e}},h)$ , is the function from $\mathbb{N}\times\mathbb{N}$ into $\mathbb{R}$ such that $\mathsf{Subst}({\mathfrak{e}},h)(n,m)$ is the real number evaluated through substituting $h,n,m$ for $\mathrm{T},\mathfrak{n},\mathfrak{m}$ , respectively. The substitution for $\mathfrak{h},\mathfrak{b}$ is defined in a similar way, with the difference that they both induce a univariate function.

Bivariate recurrence relations. We consider bivariate recurrence relations $G=(\mathsf{eq}_{1},\mathsf{eq}_{2})$ , which consists of two equalities of the following form:

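$\mathrm{T}(\mathfrak{n},\mathfrak{m})=\mathfrak{e}+\mathfrak{h}\cdot\mathfrak{b}$ ; $\mathrm{T}(\mathfrak{n},1)=\mathfrak{h}\cdot c$ (8)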

where $c\in(0,\infty)$ and $\mathfrak{e},\mathfrak{h},\mathfrak{b}$ are from the grammar above.

Solution to bivariate recurrence relations. The evaluation of a bivariate recurrence relation is similar to the univariate case. The unique solution $T_{G}:\mathbb{N}\times\mathbb{N}\rightarrow\mathbb{R}$ to a recurrence relation $G$ taking the form (8) is a function defined recursively as follows: (1) Base Step. $T_{G}(n,1):=\mathsf{Subst}({\mathfrak{h}})(n)\cdot c$ for all $n\in\mathbb{N}$ ; and (2) Recursive Step. $T_{G}(n,m):=\mathsf{Subst}({\mathfrak{e}},T_{G})(n,m)+\mathsf{Subst}(\mathfrak{h})(n)\cdot\mathsf{Subst}(\mathfrak{b})(m)$ for all $n\in\mathbb{N}$ and $m\geq 2$ . Again, the interesting algorithmic question is to reason about the asymptotic (infinite) behaviour of $T_{G}$ .

2.4 Motivating Classical Examples

In this section we present two classical examples of randomized algorithms where the randomized recurrence relations are bivariate. We give the detailed illustration of these two examples in Appendix 0.B.

Example 6 (Coupon-Collector)

Consider the Coupon-Collector problem [40, Chapter 3] with $n$ different types of coupons ( $n\in\mathbb{N}$ ). The randomized process proceeds in rounds: at each round, a coupon is collected uniformly at random from the coupon types, and the rounds continue until all the $n$ types of coupons are collected. We model the rounds as a recurrence relation with two variables $\mathfrak{n},\mathfrak{m}$ , where $\mathfrak{n}$ represents the total number of coupon types and $\mathfrak{m}$ represents the remaining number of uncollected coupon types. The recurrence relation is as follows:

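$\mathrm{T}(\mathfrak{n},\mathfrak{m})=\mathfrak{n}/\mathfrak{m}+\mathrm{T}(\mathfrak{n},\mathfrak{m}-1)$ ; $\mathrm{T}(\mathfrak{n},1)=\mathfrak{n}\cdot 1$ (9)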

where $\mathrm{T}(\mathfrak{n},\mathfrak{m})$ is the expected number of rounds. We note that the worst-case complexity for this process is $\infty$ .∎

Example 7 (Channel-Conflict Resolution)

We consider two network scenarios in which $n$ clients are trying to get access to a network channel. This problem is also called the Resource-Contention Resolution [36, Chapter 13]. In this problem, if more than one client tries to access the channel, then no client can access it, and if exactly one client requests access to the channel, then the request is granted. In the distributed setting, the clients do not share any information. In this scenario, in each round, every client requests access to the channel with probability $\frac{1}{n}$ . For this scenario, we obtain an over-approximating recurrence relation

[TABLE]

for the expected number of rounds until every client gets at least one access to the channel. In the concurrent setting, the clients share one variable, which is the number of clients that have not yet been granted access. Also in this scenario, once a client gets access, it does not request access again. For this scenario, we obtain an over-approximating recurrence relation

[TABLE]

We also note that the worst-case complexity for both the scenarios is $\infty$ .∎

3 Expected-Runtime Analysis

We focus on synthesizing logarithmic, linear, and almost-linear asymptotic bounds for recurrence relations. Our goal is to decide and synthesize asymptotic bounds in the simple form: $d\cdot\mathfrak{f}+\mathfrak{g},\mathfrak{f}\in\{\ln{\mathfrak{n}},\mathfrak{n},\mathfrak{n}\cdot\ln{\mathfrak{n}}\}$ . Informally, $\mathfrak{f}$ is the major term for time complexity, $d$ is the coefficient of $\mathfrak{f}$ to be synthesized, and $\mathfrak{g}$ is the time complexity for the base case specified in (1) or (8).

Univariate Case: The algorithmic problem in univariate case is as follows:

•

Input: a univariate recurrence relation $G$ taking the form (1) and an expression $\mathfrak{f}\in\{\ln{\mathfrak{n}},\mathfrak{n},\mathfrak{n}\cdot\ln{\mathfrak{n}}\}$ .

•

Output: Decision problem. Output “yes” if $T_{G}\in\mathcal{O}(\mathsf{Subst}(\mathfrak{f}))$ , and “fail” otherwise.

•

Output: Quantitative problem. A positive real number $d$ such that

$T_{G}(n)\leq d\cdot\mathsf{Subst}(\mathfrak{f})(n)+c$ (12)

for all $n\geq 1$ , or “fail” otherwise, where $c$ is from (1).

Remark 1

First, while the problem description treats the form $\mathfrak{f}$ as part of the input for simplicity, there are only three possibilities, so we can simply enumerate them and take only the recurrence relation as input. Second, in the algorithmic problem above, w.l.o.g., we consider that every $\mathfrak{e}$ in (1) or (8) involves at least one $\mathrm{T}(\centerdot)$ -term and one non- $\mathrm{T}(\centerdot)$ -term; this is natural since, for algorithms with recursion, at least one $\mathrm{T}(\centerdot)$ -term should be present for the recursive call and at least one non- $\mathrm{T}(\centerdot)$ -term for the non-recursive base step. ∎

Bivariate Case: The bivariate-case problem is an extension of the univariate one, and hence the problem definitions are similar, and we present them succinctly below.

•

Input: a bivariate recurrence relation $G$ taking the form (8) and an expression $\mathfrak{f}$ (similar to the univariate case).

•

Output: Decision problem. Output “yes” if $T_{G}\in\mathcal{O}(\mathsf{Subst}(\mathfrak{f}))$ , and “fail” otherwise;

•

Output: Quantitative problem. A positive real number $d$ such that $T_{G}(n,m)\leq d\cdot\mathsf{Subst}(\mathfrak{f})(n,m)+c\cdot\mathsf{Subst}(\mathfrak{h})(n)$ for all $n,m\geq 1$ , or “fail” otherwise, where $c,\mathfrak{h}$ are from (8). Note that in the expression above the term $\mathfrak{b}$ does not appear as it can be captured with $\mathfrak{f}$ itself.

Recall that in the above algorithmic problems obtaining the finite behaviour of the recurrence relations is easy (through evaluation of the recurrences using dynamic programming), and the interesting aspect is to decide the asymptotic infinite behaviour.

4 The Synthesis Algorithm

In this section, we present our algorithms to synthesize asymptotic bounds for randomized recurrence relations.

Main ideas. The main idea is as follows. Consider as input a recurrence relation taking the form (1) and a univariate recurrence expression $\mathfrak{f}\in\{\ln{\mathfrak{n}},\mathfrak{n},\mathfrak{n}\cdot\ln{\mathfrak{n}}\}$ which specifies the desired asymptotic bound. We first define the standard notion of a guess-and-check function, which provides a sound approach for asymptotic bounds. Based on the guess-and-check function, our algorithm executes the following steps for the univariate case.

1. First, the algorithm sets up a scalar variable $d$ and then constructs the template $h$ to be $n\mapsto d\cdot\mathsf{Subst}(\mathfrak{f})(n)+c$ for a univariate guess-and-check function.

2. Second, the algorithm computes an over-approximation $\mathsf{OvAp}(\mathfrak{e},h)$ of $\mathsf{Subst}(\mathfrak{e},h)$ such that the over-approximation $\mathsf{OvAp}(\mathfrak{e},h)$ will involve terms from $\mathfrak{n}^{k},\ln^{\ell}{\mathfrak{n}}$ (for $k,\ell\in\mathbb{N}_{0}$ ) only. Note that $k,\ell$ may be greater than $1$ , so the above expressions are not necessarily linear (they can be quadratic or cubic for example).

3. Finally, the algorithm synthesizes a value for $d$ such that $\mathsf{OvAp}(\mathfrak{e},h)(n)\leq h(n)$ for all $n\geq 2$ through truncation of $[2,\infty)\cap\mathbb{N}$ into a finite range and a limit behaviour analysis (towards $\infty$ ).

Our algorithm for bivariate cases is a reduction to the univariate case.

Guess-and-Check functions. We follow the standard guess-and-check technique to solve simple recurrence relations. Below we first fix a univariate recurrence relation $G$ taking the form (1). By an easy induction on $n$ (starting from the $N$ specified in Definition 1) we obtain Theorem 4.1.

Definition 1 (Univariate Guess-and-Check Functions)

Let $G$ be a univariate recurrence relation taking the form (1). A function $h:\mathbb{N}\rightarrow\mathbb{R}$ is a guess-and-check function for $G$ if there exists a natural number $N\in\mathbb{N}$ such that: (1) (Base Condition) $T_{G}(n)\leq h(n)$ for all $1\leq n\leq N$ , and (2) (Inductive Argument) $\mathsf{Subst}(\mathfrak{e},h)(n)\leq h(n)$ for all $n>N$ .

Theorem 4.1 (Guess-and-Check, Univariate Case)

If a function $h:\mathbb{N}\rightarrow\mathbb{R}$ is a guess-and-check function for a univariate recurrence relation $G$ taking the form (1), then $T_{G}(n)\leq h(n)$ for all $n\in\mathbb{N}$ .
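As a small illustration of Definition 1 and Theorem 4.1, the following Python sketch checks the base condition and the inductive argument numerically for the univariate recurrence $\mathrm{T}(\mathfrak{n})=\frac{1}{\mathfrak{n}}+\mathrm{T}(\mathfrak{n}-1)$ , $\mathrm{T}(1)=1$ (the recurrence of Example 13) against the candidate $h(n)=\ln{n}+1$ . The choices $d=1$ , $N=1$ and the finite range `LIMIT`, as well as all function names, are assumptions made for illustration; a finite numerical check is not a proof.

```python
import math

def solve(limit):
    """T(1) = 1 and T(n) = 1/n + T(n-1), evaluated bottom-up."""
    t = [0.0, 1.0]  # index 0 is unused
    for n in range(2, limit + 1):
        t.append(1.0 / n + t[n - 1])
    return t

def h(n, d=1.0, c=1.0):
    """Candidate guess-and-check function h(n) = d*ln(n) + c."""
    return d * math.log(n) + c

def subst_e(n):
    """Subst(e, h)(n): the recursive right-hand side with T replaced by h."""
    return 1.0 / n + h(n - 1)

LIMIT = 2000  # illustrative finite range (assumption)
N = 1         # threshold N of Definition 1 (assumption)
T = solve(LIMIT)

# (1) Base Condition: T(n) <= h(n) for 1 <= n <= N.
base_ok = all(T[n] <= h(n) for n in range(1, N + 1))
# (2) Inductive Argument: Subst(e, h)(n) <= h(n) for N < n <= LIMIT.
inductive_ok = all(subst_e(n) <= h(n) for n in range(N + 1, LIMIT + 1))
# Conclusion of Theorem 4.1 on the checked range: T(n) <= h(n).
bound_ok = all(T[n] <= h(n) for n in range(1, LIMIT + 1))
```

On the checked range the inductive argument reduces to $\ln{(n-1)}\leq\ln{n}-\frac{1}{n}$ , which is the upper bound stated in Proposition 2.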

We do not explicitly present the definition for guess-and-check functions in the bivariate case, since we will present a reduction of the analysis of separable bivariate recurrence relations to that of the univariate ones (cf. Section 4.2).

Overapproximations for Recurrence Expressions. We now develop tight overapproximations for logarithmic terms. In principle, we use Taylor’s Theorem to approximate logarithmic terms such as $\ln{(n-1)},\ln{\lfloor\frac{n}{2}\rfloor}$ , and integrals to approximate summations of logarithmic terms. All the results below are technical and depend on basic calculus (the detailed proofs are in Appendix 0.C).

Proposition 1

For every natural number $n\geq 2$ :

[TABLE]

Proposition 2

For every natural number $n\geq 2$ : $\ln{n}-\frac{1}{n-1}\leq\ln{(n-1)}\leq\ln{n}-\frac{1}{n}$ .

Proposition 3

For every natural number $n\geq 2$ :

•

$\int_{1}^{n}\frac{1}{x}\,\mathrm{d}x-\sum_{j=1}^{n-1}\frac{1}{j}\in\left[-0.7552,-\frac{1}{6}\right]$ ;

•

$\int_{1}^{n}\ln{x}\,\mathrm{d}x-\left(\sum_{j=1}^{n-1}\ln{j}\right)-\frac{1}{2}\cdot\int_{1}^{n}\frac{1}{x}\,\mathrm{d}x\in\left[-\frac{1}{12},0.2701\right]$ ;

•

$\int_{1}^{n}x\cdot\ln{x}\,\mathrm{d}x-\left(\sum_{j=1}^{n-1}j\cdot\ln{j}\right)-\frac{1}{2}\cdot\int_{1}^{n}\ln{x}\,\mathrm{d}x+\frac{1}{12}\cdot\int_{1}^{n}\frac{1}{x}\,\mathrm{d}x-\frac{n-1}{2}\in\left[-\frac{19}{72},0.1575\right]$ .
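The three bounds of Proposition 3 can be spot-checked numerically. The Python sketch below evaluates each difference with the closed forms $\int_{1}^{n}\frac{1}{x}\,\mathrm{d}x=\ln{n}$ , $\int_{1}^{n}\ln{x}\,\mathrm{d}x=n\ln{n}-n+1$ and $\int_{1}^{n}x\ln{x}\,\mathrm{d}x=\frac{n^{2}}{2}\ln{n}-\frac{n^{2}}{4}+\frac{1}{4}$ , and tests membership in the stated intervals on a finite range. The function names and the range are illustrative assumptions, and floating-point evaluation only gives evidence, not a proof.

```python
import math

def gaps(n):
    """Return the three differences of Proposition 3 for a given n >= 2."""
    int_inv = math.log(n)                                  # int_1^n 1/x dx
    int_log = n * math.log(n) - n + 1                      # int_1^n ln x dx
    int_xlog = n * n * math.log(n) / 2 - n * n / 4 + 0.25  # int_1^n x ln x dx
    js = range(1, n)
    sum_inv = math.fsum(1.0 / j for j in js)
    sum_log = math.fsum(math.log(j) for j in js)
    sum_xlog = math.fsum(j * math.log(j) for j in js)
    g1 = int_inv - sum_inv
    g2 = int_log - sum_log - 0.5 * int_inv
    g3 = int_xlog - sum_xlog - 0.5 * int_log + int_inv / 12 - (n - 1) / 2
    return g1, g2, g3

# Intervals stated in the three items of Proposition 3.
RANGES = [(-0.7552, -1 / 6), (-1 / 12, 0.2701), (-19 / 72, 0.1575)]

def check(limit):
    """True if all three differences lie in their intervals for 2 <= n <= limit."""
    for n in range(2, limit + 1):
        for g, (lo, hi) in zip(gaps(n), RANGES):
            if not lo <= g <= hi:
                return False
    return True
```

For instance, `check(300)` inspects all $n$ from $2$ to $300$ .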

Note that Proposition 3 is non-trivial since it approximates summation of reciprocal and logarithmic terms up to a constant deviation. For example, one may approximate $\sum_{j=1}^{n-1}\ln{j}$ directly by $\int_{1}^{n}\ln{x}\,\mathrm{d}x$ , but this approximation deviates up to a logarithmic term from Proposition 3. From Proposition 3, we establish a tight approximation for summation of logarithmic or reciprocal terms.

Example 8

Consider the summation $\sum_{j=\left\lceil\frac{n}{2}\right\rceil}^{n-1}\ln{j}+\sum_{j=\left\lfloor\frac{n}{2}\right\rfloor}^{n-1}\ln{j}\quad(n\geq 4)$ . By Proposition 3, we can over-approximate it as

[TABLE]

where $\Gamma_{\ln{\mathfrak{n}}}(n):=\int_{1}^{n}\ln{x}\,\mathrm{d}x-\frac{1}{2}\cdot\int_{1}^{n}\frac{1}{x}\,\mathrm{d}x=n\cdot\ln{n}-n-\frac{\ln{n}}{2}+1$ . By using Proposition 1, the above expression is roughly $n\cdot\ln{n}-(1-\ln{2})\cdot n+\frac{1}{2}\cdot\ln{n}+0.6672+\frac{1}{2\cdot n}$ (for details see Appendix 0.C).∎

Remark 2

Although we do approximation for terms related to only almost-linear bounds, Proposition 3 can be extended to logarithmic bounds with higher degree (e.g., $n^{3}\ln n$ ) since integration of such bounds can be obtained in closed forms.∎

4.1 Algorithm for Univariate Recurrence Relations

We present our algorithm to synthesize a guess-and-check function in form (12) for univariate recurrence relations. We present our algorithm in two steps. First, we present the decision version, and then we present the quantitative version that synthesizes the associated constant. The two key aspects are over-approximation and use of pseudo-polynomials, and we start with over-approximation. We relegate some technical details to Appendix 0.D.

Definition 2 (Overapproximation)

Let $\mathfrak{f}\in\{\ln{\mathfrak{n}},\mathfrak{n},\mathfrak{n}\cdot\ln{\mathfrak{n}}\}$ . Consider a univariate recurrence expression $\mathfrak{g}$ , constants $d$ and $c$ , and the function $h=d\cdot\mathsf{Subst}(\mathfrak{f})+c$ . We define the over-approximation function, denoted $\mathsf{OvAp}(\mathfrak{g},h)$ , recursively as follows.

•

Base Step A. If $\mathfrak{g}$ is one of the following: $c^{\prime},\mathfrak{n},\ln{\mathfrak{n}},\mathfrak{n}\cdot\ln{\mathfrak{n}},\frac{1}{\mathfrak{n}}$ , then $\mathsf{OvAp}(\mathfrak{g},h):=\mathsf{Subst}({\mathfrak{g}})$ .

•

Base Step B. If $\mathfrak{g}$ is a single term which involves $\mathrm{T}$ , then we define $\mathsf{OvAp}(\mathfrak{g},h)$ from the over-approximations of Propositions 1–3. In detail, $\mathsf{OvAp}(\mathfrak{g},h)$ is obtained from $\mathsf{Subst}(\mathfrak{g},h)$ by first over-approximating any summation through Proposition 3 (i.e., through those $\Gamma_{(\centerdot)}$ functions defined below Proposition 3), then over-approximating any $\ln{(\mathfrak{n}-1)},\left\lfloor\frac{\mathfrak{n}}{2}\right\rfloor,\left\lceil\frac{\mathfrak{n}}{2}\right\rceil,\ln{\left\lfloor\frac{\mathfrak{n}}{2}\right\rfloor},\ln{\left\lceil\frac{\mathfrak{n}}{2}\right\rceil}$ by Proposition 1 and Proposition 2. The details of the important over-approximations are illustrated explicitly in Table 1.

•

Recursive Step. We have two cases: (a) If $\mathfrak{g}$ is $\mathfrak{g}_{1}+\mathfrak{g}_{2}$ , then $\mathsf{OvAp}(\mathfrak{g},h)$ is $\mathsf{OvAp}(\mathfrak{g}_{1},h)+\mathsf{OvAp}(\mathfrak{g}_{2},h)$ . (b) If $\mathfrak{g}$ is $c^{\prime}\cdot\mathfrak{g}^{\prime}$ , then $\mathsf{OvAp}(\mathfrak{g},h)$ is $c^{\prime}\cdot\mathsf{OvAp}(\mathfrak{g}^{\prime},h)$ .

Example 9

Consider the recurrence relation for Sherwood’s Randomized-Search (cf. (2)). Choose $\mathfrak{f}=\ln{\mathfrak{n}}$ and then the template $h$ becomes $n\mapsto d\cdot\ln{n}+1$ . From Example 8, we have that the over-approximation for $6+\frac{1}{\mathfrak{n}}\cdot\left(\sum_{\mathfrak{j}=\left\lceil\frac{\mathfrak{n}}{2}\right\rceil}^{\mathfrak{n}-1}\mathrm{T}(\mathfrak{j})+\sum_{\mathfrak{j}=\left\lfloor\frac{\mathfrak{\mathfrak{n}}}{2}\right\rfloor}^{\mathfrak{\mathfrak{n}}-1}\mathrm{T}(\mathfrak{j})\right)$ when $n\geq 4$ is $7+d\cdot\left[\ln{n}-(1-\ln{2})+\frac{\ln{n}}{2\cdot n}+\frac{0.6672}{n}+\frac{1}{2\cdot n^{2}}\right]$ (the second summand comes from an over-approximation of $\frac{1}{\mathfrak{n}}\cdot\left(\sum_{\mathfrak{j}=\left\lceil\frac{\mathfrak{n}}{2}\right\rceil}^{\mathfrak{n}-1}d\cdot\ln{\mathfrak{j}}+\sum_{\mathfrak{j}=\left\lfloor\frac{\mathfrak{\mathfrak{n}}}{2}\right\rfloor}^{\mathfrak{\mathfrak{n}}-1}d\cdot\ln{\mathfrak{j}}\right)$ ).∎
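To make the over-approximation of Example 9 concrete, the Python sketch below compares the exact value $\mathsf{Subst}(\mathfrak{e},h)(n)$ with the closed-form expression stated above for $n\geq 4$ . The function names, the value of $d$ and the finite range are illustrative assumptions; the check is numerical evidence only.

```python
import math

def exact_log_part(n):
    """(1/n) * (sum_{j=ceil(n/2)}^{n-1} ln j + sum_{j=floor(n/2)}^{n-1} ln j)."""
    lo_ceil, lo_floor = (n + 1) // 2, n // 2
    s = math.fsum(math.log(j) for j in range(lo_ceil, n))
    s += math.fsum(math.log(j) for j in range(lo_floor, n))
    return s / n

def over_approx_log_part(n):
    """Bracketed factor of d in the over-approximation of Example 9."""
    return (math.log(n) - (1 - math.log(2)) + math.log(n) / (2 * n)
            + 0.6672 / n + 1 / (2 * n * n))

def subst(n, d):
    """Subst(e, h)(n) for h(n) = d*ln(n) + 1. The two sums contain n terms in
    total, so the constant 1 of h contributes exactly 1, giving 6 + 1 = 7."""
    return 7 + d * exact_log_part(n)

def ovap(n, d):
    """OvAp(e, h)(n) as stated in Example 9 (valid for n >= 4)."""
    return 7 + d * over_approx_log_part(n)

def dominated(d, limit):
    """True if Subst(e, h)(n) <= OvAp(e, h)(n) for 4 <= n <= limit."""
    return all(subst(n, d) <= ovap(n, d) for n in range(4, limit + 1))
```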

Remark 3

Since integrations of the form $\int x^{k}\ln^{l}x\,\mathrm{d}x$ can be calculated in closed forms (cf. Remark 2), Table 1 can be extended to logarithmic expressions with higher order, e.g., $\mathfrak{n}^{2}\ln\mathfrak{n}$ .∎

Pseudo-polynomials. Our next step is to define the notion of (univariate) pseudo-polynomials which extends normal polynomials with logarithm. This notion is crucial to handle inductive arguments in the definition of (univariate) guess-and-check functions.

Definition 3 (Univariate Pseudo-polynomials)

A univariate pseudo-polynomial (w.r.t logarithm) is a function $p:\mathbb{N}\rightarrow\mathbb{R}$ such that there exist non-negative integers $k,\ell\in\mathbb{N}_{0}$ and real numbers $a_{i},b_{i}$ ’s such that for all $n\in\mathbb{N}$ ,

$p(n)=\sum_{i=0}^{k}a_{i}\cdot n^{i}\cdot\ln{n}+\sum_{i=0}^{\ell}b_{i}\cdot n^{i}$ (13)

W.l.o.g, we consider that in the form (13), it holds that (i) $a^{2}_{k}+b^{2}_{\ell}\neq 0$ , (ii) either $a_{k}\neq 0$ or $k=0$ , and (iii) similarly either $b_{\ell}\neq 0$ or $\ell=0$ .

Degree of pseudo-polynomials. Given a univariate pseudo-polynomial $p$ in the form (13), we define the degree $\mathrm{deg}(p)$ of $p$ by: $\mathrm{deg}(p)=k+\frac{1}{2}$ if $k\geq\ell\mbox{ and }a_{k}\neq 0$ and $\ell$ otherwise. Intuitively, if the term with highest degree involves logarithm, then we increase the degree by $1/2$ , else it is the power of the highest degree term.

Leading term $\overline{p}$ . The leading term $\overline{p}$ of a pseudo-polynomial $p$ in the form (13) is a function $\overline{p}:\mathbb{N}\rightarrow\mathbb{R}$ defined as follows: $\overline{p}(n)=a_{k}\cdot n^{k}\cdot\ln{n}\mbox{ if }k\geq\ell\mbox{ and }a_{k}\neq 0$ ; and $b_{\ell}\cdot n^{\ell}\mbox{ otherwise }$ ; for all $n\in\mathbb{N}$ . Furthermore, we define $C_{p}$ to be the (only) coefficient of $\overline{p}$ .
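A pseudo-polynomial can be represented by its two coefficient lists. The Python sketch below is one possible encoding (the class name `PseudoPoly` and its methods are hypothetical, not part of the paper) that computes $\mathrm{deg}(p)$ and $C_{p}$ following the definitions above. Trailing zero coefficients are trimmed so that conditions (ii) and (iii) hold.

```python
import math

def trim(coeffs):
    """Drop trailing zeros but keep at least one entry (index = power of n)."""
    coeffs = list(coeffs)
    while len(coeffs) > 1 and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs

class PseudoPoly:
    """p(n) = sum_i a[i] * n**i * ln(n) + sum_i b[i] * n**i."""

    def __init__(self, a, b):
        self.a, self.b = trim(a), trim(b)

    def __call__(self, n):
        log_part = sum(c * n**i for i, c in enumerate(self.a)) * math.log(n)
        return log_part + sum(c * n**i for i, c in enumerate(self.b))

    def _log_leads(self):
        # The leading term involves ln(n) iff k >= l and a_k != 0.
        k, l = len(self.a) - 1, len(self.b) - 1
        return k >= l and self.a[k] != 0

    def degree(self):
        k, l = len(self.a) - 1, len(self.b) - 1
        return k + 0.5 if self._log_leads() else l

    def leading_coeff(self):
        return self.a[-1] if self._log_leads() else self.b[-1]

# n * ln(n) has degree 1.5, while n**2 + n * ln(n) has degree 2.
N_LOG_N = PseudoPoly([0, 1], [0])
MIXED = PseudoPoly([0, 1], [0, 0, 1])
```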

With the notion of pseudo-polynomials, the inductive argument of guess-and-check functions can be soundly transformed into an inequality between pseudo-polynomials.

Lemma 1

Let $\mathfrak{f}\in\{\ln{\mathfrak{n}},\mathfrak{n},\mathfrak{n}\cdot\ln{\mathfrak{n}}\}$ and $c$ be a constant. For all univariate recurrence expressions $\mathfrak{g}$ , there exist pseudo-polynomials $p$ and $q$ such that coefficients (i.e., $a_{i},b_{i}$ ’s in (13)) of $q$ are all non-negative, $C_{q}>0$ and the following assertion holds: for all $d>0$ and for all $n\geq 2$ , with $h=d\cdot\mathsf{Subst}({\mathfrak{f}})+c$ , the inequality $\mathsf{OvAp}(\mathfrak{g},h)(n)\leq h(n)$ is equivalent to $d\cdot p(n)\geq q(n)$ .

Remark 4

In the above lemma, though we only refer to existence of pseudo-polynomials $p$ and $q$ , they can actually be computed in linear time, because $p$ and $q$ are obtained by simple rearrangements of terms from $\mathsf{OvAp}(\mathfrak{g},h)$ and $h$ , respectively.

Example 10

Let us continue with Sherwood’s Randomized-Search. Again choose $h=d\cdot\ln{\mathfrak{n}}+1$ . From Example 9, we obtain that for every $n\geq 4$ , the inequality

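$7+d\cdot\left[\ln{n}-(1-\ln{2})+\frac{\ln{n}}{2\cdot n}+\frac{0.6672}{n}+\frac{1}{2\cdot n^{2}}\right]\leq d\cdot\ln{n}+1$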

resulting from over-approximation and the inductive argument of guess-and-check functions is equivalent to $d\cdot\left[(1-\ln{2})\cdot n^{2}-\frac{n\cdot\ln{n}}{2}-0.6672\cdot n-\frac{1}{2}\right]\geq 6\cdot n^{2}$ .∎

As is indicated in Definition 1, our aim is to check whether $\mathsf{OvAp}(\mathfrak{g},h)(n)\leq h(n)$ holds for sufficiently large $n$ . The following proposition provides a sufficient and necessary condition for checking whether $d\cdot p(n)\geq q(n)$ holds for sufficiently large $n$ .

Proposition 4

Let $p,q$ be pseudo-polynomials such that $C_{q}>0$ and all coefficients of $q$ are non-negative. Then there exists a real number $d>0$ such that $d\cdot p(n)\geq q(n)$ for sufficiently large $n$ iff $\mathrm{deg}(p)\geq\mathrm{deg}(q)$ and $C_{p}>0$ .

Note that by Definition 1 and the special form (12) for univariate guess-and-check functions, a function in form (12) needs only to satisfy the inductive argument in order to be a univariate guess-and-check function: once a value for $d$ is synthesized for a sufficiently large $N$ , one can scale the value so that the base condition is also satisfied. Thus from the sufficiency of Proposition 4, our decision algorithm that checks the existence of some guess-and-check function in form (12) is presented below. Below we fix an input univariate recurrence relation $G$ taking the form (1) and an input expression $\mathfrak{f}\in\{\ln{\mathfrak{n}},\mathfrak{n},\mathfrak{n}\cdot\ln{\mathfrak{n}}\}$ .

Algorithm UniDec: Our algorithm, namely UniDec, for the decision problem of the univariate case, has the following steps.

1. Template. The algorithm establishes a scalar variable $d$ and sets up the template $d\cdot\mathfrak{f}+c$ for a univariate guess-and-check function.

2. Over-approximation. Let $h$ denote $d\cdot\mathsf{Subst}(\mathfrak{f})+c$ . The algorithm calculates the over-approximation function $\mathsf{OvAp}(\mathfrak{e},h)$ , where $\mathfrak{e}$ is from (1).

3. Transformation. The algorithm transforms the inequality $\mathsf{OvAp}(\mathfrak{e},h)(n)\leq h(n)~{}~{}(n\in\mathbb{N})$ for inductive argument of guess-and-check functions through Lemma 1 equivalently into $d\cdot p(n)\geq q(n)~{}~{}(n\in\mathbb{N})$ , where $p,q$ are pseudo-polynomials obtained in linear-time through rearrangement of terms from $\mathsf{OvAp}(\mathfrak{e},h)$ and $h$ (see Remark 4).

4. Coefficient Checking. The algorithm examines cases on $C_{p}$ . If $C_{p}>0$ and $\mathrm{deg}(p)\geq\mathrm{deg}(q)$ , then the algorithm outputs “yes” meaning that “there exists a univariate guess-and-check function”; otherwise, the algorithm outputs “fail”.
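The coefficient-checking step can be sketched in a few lines of Python. The sketch assumes that the transformation step has already produced $p$ and $q$ as pairs of coefficient lists (logarithmic part first, then polynomial part, indexed by the power of $n$ ); this encoding and the function names are hypothetical. The constants `P` and `Q` encode the pseudo-polynomials of Example 10.

```python
import math

def leading(a, b):
    """Return (degree, leading coefficient) of the pseudo-polynomial
    sum_i a[i]*n**i*ln(n) + sum_i b[i]*n**i, following Definition 3."""
    a, b = list(a), list(b)
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    while len(b) > 1 and b[-1] == 0:
        b.pop()
    k, l = len(a) - 1, len(b) - 1
    if k >= l and a[k] != 0:
        return k + 0.5, a[k]
    return l, b[l]

def coefficient_check(p, q):
    """Coefficient Checking: 'yes' iff C_p > 0 and deg(p) >= deg(q)."""
    deg_p, c_p = leading(*p)
    deg_q, _ = leading(*q)
    return "yes" if c_p > 0 and deg_p >= deg_q else "fail"

# Example 10: p(n) = (1 - ln 2)*n^2 - (n*ln n)/2 - 0.6672*n - 1/2, q(n) = 6*n^2.
P = ([0.0, -0.5], [-0.5, -0.6672, 1 - math.log(2)])
Q = ([0.0], [0.0, 0.0, 6.0])
RESULT = coefficient_check(P, Q)
```

Here $\mathrm{deg}(p)=\mathrm{deg}(q)=2$ and $C_{p}=1-\ln{2}>0$ , so the check succeeds; a constant $p$ against the same $q$ would give “fail”.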

Theorem 4.2 (Soundness for UniDec)

If UniDec outputs “yes”, then there exists a univariate guess-and-check function in form (12) for the inputs $G$ and $\mathfrak{f}$ . The algorithm is a linear-time algorithm in the size of the input recurrence relation.

Example 11

Consider Sherwood’s Randomized-Search recurrence relation (cf. (2)) and $\mathfrak{f}=\ln{\mathfrak{n}}$ as the input. As illustrated in Example 9 and Example 10, the algorithm asserts that the asymptotic behaviour is $\mathcal{O}(\ln{n})$ .∎

Remark 5

From the tightness of our over-approximation (up to only constant deviation) and the sufficiency and necessity of Proposition 4, the UniDec algorithm can handle a large class of univariate recurrence relations. Moreover, the algorithm is quite simple and efficient (linear-time). However, we do not know whether our approach is complete. We suspect that there are certain intricate recurrence relations that will make our approach fail.

Analysis of examples of Section 2.2. Our algorithm can decide the following optimal bounds for the examples of Section 2.2.

1. For Example 1 we obtain an $\mathcal{O}(\log n)$ bound (recall worst-case bound is $\Theta(n)$ ).

2. For Example 2 we obtain an $\mathcal{O}(n\cdot\log n)$ bound (recall worst-case bound is $\Theta(n^{2})$ ).

3. For Example 3 we obtain an $\mathcal{O}(n)$ bound (recall worst-case bound is $\Theta(n^{2})$ ).

4. For Example 4 we obtain an $\mathcal{O}(n\cdot\log n)$ (resp. $\mathcal{O}(n)$ ) bound for Euclidean metric (resp. for $L_{1}$ metric), whereas the worst-case bound is $\Theta(n^{2}\cdot\log n)$ (resp. $\Theta(n^{2})$ ).

5. For Example 5 we obtain an $\mathcal{O}(n\cdot\log n)$ bound (recall worst-case bound is $\Theta(n^{2})$ ).

In all cases above, our algorithm decides the asymptotically optimal bounds for the expected-runtime analysis, whereas the worst-case analysis grossly over-estimates the expected-runtime bounds.

Quantitative bounds. Above we have established that our linear-time decision algorithm can establish the asymptotically optimal bounds for the recurrence relations of several classical algorithms. We now take the next step and obtain explicit quantitative bounds, i.e., we synthesize the constants associated with the asymptotic complexity. To this end, we first explicitly construct a threshold $N_{\epsilon,p,q}$ for “sufficiently large numbers” (Definition 4), and then show in Proposition 5 that $N_{\epsilon,p,q}$ is indeed what we need.

Definition 4 (Threshold $N_{\epsilon,p,q}$ for Sufficiently Large Numbers)

Let $p,q$ be two univariate pseudo-polynomials $p(n)=\sum_{i=0}^{k}a_{i}\cdot n^{i}\cdot\ln{n}+\sum_{i=0}^{\ell}b_{i}\cdot n^{i}$ , $q(n)=\sum_{i=0}^{k^{\prime}}a^{\prime}_{i}\cdot n^{i}\cdot\ln{n}+\sum_{i=0}^{\ell^{\prime}}b^{\prime}_{i}\cdot n^{i}$ such that $\mathrm{deg}(p)\geq\mathrm{deg}(q)$ and $C_{p},C_{q}>0$ . Then given any $\epsilon\in(0,1)$ , the number $N_{\epsilon,p,q}$ is defined as the smallest natural number $N$ such that both $x,y$ (defined below) are smaller than $\epsilon$ :

•

$x=-1+\sum_{i=0}^{k}|a_{i}|\cdot\frac{N^{i}\cdot\ln{N}}{\overline{p}(N)}+\sum_{i=0}^{\ell}|b_{i}|\cdot\frac{N^{i}}{\overline{p}(N)}$ ;

•

$y=-\mathbf{1}_{\mathrm{deg}(p)=\mathrm{deg}(q)}\cdot\frac{C_{q}}{C_{p}}+\sum_{i=0}^{k^{\prime}}|a^{\prime}_{i}|\cdot\frac{N^{i}\cdot\ln{N}}{\overline{p}(N)}+\sum_{i=0}^{\ell^{\prime}}|b^{\prime}_{i}|\cdot\frac{N^{i}}{\overline{p}(N)}$ .

where $\mathbf{1}_{\mathrm{deg}(p)=\mathrm{deg}(q)}$ equals $1$ when ${\mathrm{deg}(p)=\mathrm{deg}(q)}$ and $0$ otherwise.

Proposition 5

Consider two univariate pseudo-polynomials $p,q$ such that $\mathrm{deg}(p)\geq\mathrm{deg}(q)$ , all coefficients of $q$ are non-negative and $C_{p},C_{q}>0$ . Then given any $\epsilon\in(0,1)$ , $\frac{q(n)}{p(n)}\leq\frac{\mathbf{1}_{\mathrm{deg}(p)=\mathrm{deg}(q)}\cdot\frac{C_{q}}{C_{p}}+\epsilon}{1-\epsilon}$ for all $n\geq N_{\epsilon,p,q}$ (for $N_{\epsilon,p,q}$ of Definition 4).
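The threshold of Definition 4 can be found by a simple search. The Python sketch below is a minimal version under stated assumptions: pseudo-polynomials are encoded as pairs of coefficient lists (logarithmic part, polynomial part), the search starts at $N=2$ to avoid dividing by $\ln{1}=0$ when the leading term involves a logarithm, and the search is capped by a hypothetical `limit`. The sample input encodes $p$ and $q$ of Example 10.

```python
import math

def leading(a, b):
    """(degree, coefficient, power, has_log) of the leading term (Definition 3)."""
    k = max((i for i, c in enumerate(a) if c != 0), default=0)
    l = max((i for i, c in enumerate(b) if c != 0), default=0)
    if k >= l and a[k] != 0:
        return k + 0.5, a[k], k, True
    return l, b[l], l, False

def abs_sum(a, b, n):
    """sum_i |a_i| * n^i * ln(n) + sum_i |b_i| * n^i."""
    return (sum(abs(c) * n**i for i, c in enumerate(a)) * math.log(n)
            + sum(abs(c) * n**i for i, c in enumerate(b)))

def threshold(eps, p, q, start=2, limit=10**6):
    """Smallest N in [start, limit) with x < eps and y < eps (Definition 4)."""
    deg_p, c_p, power, has_log = leading(*p)
    deg_q, c_q, _, _ = leading(*q)
    same = 1.0 if deg_p == deg_q else 0.0
    for n in range(start, limit):
        # Value of the leading term of p at n.
        lead = c_p * n**power * (math.log(n) if has_log else 1.0)
        x = -1.0 + abs_sum(*p, n) / lead
        y = -same * c_q / c_p + abs_sum(*q, n) / lead
        if x < eps and y < eps:
            return n
    raise ValueError("no threshold found below limit")

# Example 10: p(n) = (1 - ln 2)*n^2 - (n*ln n)/2 - 0.6672*n - 1/2, q(n) = 6*n^2.
P = ([0.0, -0.5], [-0.5, -0.6672, 1 - math.log(2)])
Q = ([0.0], [0.0, 0.0, 6.0])
N_09 = threshold(0.9, P, Q)
```

With $\epsilon=0.9$ the search stops at $N=6$ , in agreement with the value $N_{0.9,p,q}=6$ reported in Example 12.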

With Proposition 5, we describe our algorithm UniSynth which outputs explicitly a value for $d$ (in (12)) if UniDec outputs yes. Below we fix an input univariate recurrence relation $G$ taking the form (1) and an input expression $\mathfrak{f}\in\{\ln{\mathfrak{n}},\mathfrak{n},\mathfrak{n}\cdot\ln{\mathfrak{n}}\}$ . Moreover, the algorithm takes $\epsilon>0$ as another input, which is basically a parameter to choose the threshold for finite behaviour. For example, a smaller $\epsilon$ leads to a larger threshold, and vice versa. Thus we provide a flexible algorithm as the threshold can be varied with the choice of $\epsilon$ .

Algorithm UniSynth: Our algorithm for the quantitative problem has the following steps:

1. Calling UniDec. The algorithm calls UniDec, and if it returns “fail”, then return “fail”, otherwise execute the following steps. Obtain the following inequality $d\cdot p(n)\geq q(n)~{}~{}(n\in\mathbb{N})$ from the transformation step of UniDec.

2. Variable Solving. The algorithm calculates $N_{\epsilon,p,q}$ for a given $\epsilon\in(0,1)$ by e.g. repeatedly increasing $n$ (see Definition 4) and outputs the value of $d$ as the least number such that the following two conditions hold: (i) for all $2\leq n<N_{\epsilon,p,q}$ , we have $\mathsf{Eval}(G)(n)\leq d\cdot\mathsf{Subst}({\mathfrak{f}})(n)+c$ (recall $\mathsf{Eval}(G)(n)$ can be computed in linear time), and (ii) we have $d\geq\frac{\mathbf{1}_{\mathrm{deg}(p)=\mathrm{deg}(q)}\cdot\frac{C_{q}}{C_{p}}+\epsilon}{1-\epsilon}$ .

Theorem 4.3 (Soundness for UniSynth)

If the algorithm UniSynth outputs a real number $d$ , then $d\cdot\mathsf{Subst}(\mathfrak{f})+c$ is a univariate guess-and-check function for $G$ .

Example 12

Consider the recurrence relation for Sherwood’s Randomized-Search (cf. (2)) and $\mathfrak{f}=\ln{\mathfrak{n}}$ . Let $\epsilon:=0.9$ . From Example 9 and Example 10, the algorithm establishes the inequality $d\geq\frac{6}{(1-\ln{2})-\frac{\ln{n}}{2\cdot n}-\frac{0.6672}{n}-\frac{1}{2\cdot n^{2}}}$ and finds that $N_{0.9,p,q}=6$ . Then the algorithm finds $d=204.5335$ from the following conditions: (a) $\mathsf{Eval}(G)(2)=7\leq d\cdot\ln{2}+1$ ; (b) $\mathsf{Eval}(G)(3)=11\leq d\cdot\ln{3}+1$ ; (c) $\mathsf{Eval}(G)(4)=15\leq d\cdot\ln{4}+1$ ; (d) $\mathsf{Eval}(G)(5)=17.8\leq d\cdot\ln{5}+1$ ; (e) $d\geq\frac{\frac{6}{1-\ln{2}}+0.9}{1-0.9}$ . Thus, by Theorem 4.1, the expected running time of the algorithm has an upper bound $204.5335\cdot\ln{n}+1$ . Later in Section 5, we show that one can obtain a much better $d=19.762$ through our algorithms by choosing $\epsilon:=0.01$ , which is quite good since the optimal value lies in $[15.129,19.762]$ (cf. the first item R.-Sear. in Table 2).∎
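The Variable Solving step for this example can be replayed in Python. The sketch below assumes the recurrence $\mathrm{T}(\mathfrak{n})=6+\frac{1}{\mathfrak{n}}\cdot\left(\sum_{\mathfrak{j}=\lceil\mathfrak{n}/2\rceil}^{\mathfrak{n}-1}\mathrm{T}(\mathfrak{j})+\sum_{\mathfrak{j}=\lfloor\mathfrak{n}/2\rfloor}^{\mathfrak{n}-1}\mathrm{T}(\mathfrak{j})\right)$ taken from the expression in Example 9, together with the base value $\mathrm{T}(1)=1$ (an assumption consistent with the template $d\cdot\ln{n}+1$ ). The function names are hypothetical, and the threshold $N_{0.9,p,q}=6$ and the ratio $C_{q}/C_{p}=\frac{6}{1-\ln{2}}$ are taken as given from the example.

```python
import math

def eval_sherwood(limit):
    """T(1) = 1; T(n) = 6 + (1/n) * (sum_{j=ceil(n/2)}^{n-1} T(j)
    + sum_{j=floor(n/2)}^{n-1} T(j)), evaluated bottom-up."""
    t = [0.0, 1.0]  # index 0 is unused
    for n in range(2, limit + 1):
        upper = sum(t[(n + 1) // 2:n])  # j from ceil(n/2) to n-1
        lower = sum(t[n // 2:n])        # j from floor(n/2) to n-1
        t.append(6 + (upper + lower) / n)
    return t

def synthesize_d(eps, threshold, ratio, c=1.0):
    """Least d satisfying conditions (i) and (ii) of Variable Solving for
    f = ln n; `ratio` is C_q / C_p (the degrees of p and q are equal here)."""
    t = eval_sherwood(threshold)
    # (i) T(n) <= d*ln(n) + c for 2 <= n < threshold.
    finite = max((t[n] - c) / math.log(n) for n in range(2, threshold))
    # (ii) d >= (C_q/C_p + eps) / (1 - eps).
    limit_bound = (ratio + eps) / (1 - eps)
    return max(finite, limit_bound)

T = eval_sherwood(5)
D = synthesize_d(eps=0.9, threshold=6, ratio=6 / (1 - math.log(2)))
```

The evaluated values `T[2]` to `T[5]` reproduce $7,11,15,17.8$ from items (a) to (d), and condition (e) is the binding one, so `D` agrees with $204.5335$ up to rounding.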

4.2 Algorithm for Bivariate Recurrence Relations

In this part, we present our results for the separable bivariate recurrence relations. The key idea is to use separability to reduce the problem to univariate recurrence relations. There are two key steps which we describe below.

Step 1. The first step is to reduce a separable bivariate recurrence relation to a univariate one.

Definition 5 (From $G$ to $\mathsf{Uni}(G)$ )

Let $G$ be a separable bivariate recurrence relation taking the form (8). The univariate recurrence relation $\mathsf{Uni}(G)$ from $G$ is defined by eliminating any occurrence of $\mathfrak{n}$ and replacing any occurrence of $\mathfrak{h}$ with $1$ .

Informally, $\mathsf{Uni}(G)$ is obtained from $G$ by simply eliminating the roles of $\mathfrak{h},\mathfrak{n}$ . The following example illustrates the situation for the Coupon-Collector example.

Example 13

Consider $G$ to be the recurrence relation (9) for the Coupon-Collector example. Then $\mathsf{Uni}(G)$ is as follows: $\mathrm{T}(\mathfrak{n})=\frac{1}{\mathfrak{n}}+\mathrm{T}(\mathfrak{n}-1)$ and $\mathrm{T}(1)=1$ . ∎

Step 2. The second step is to establish the relationship between $T_{G}$ and $T_{\mathsf{Uni}(G)}$ , which is handled by the following proposition, whose proof is an easy induction on $m$ .

Proposition 6

For any separable bivariate recurrence relation $G$ taking the form (8), the solution $T_{G}$ is equal to $(n,m)\mapsto\mathsf{Subst}(\mathfrak{h})(n)\cdot T_{\mathsf{Uni}(G)}(m)$ .
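Proposition 6 can be illustrated on the Coupon-Collector example. The Python sketch below assumes that the bivariate recurrence has the form $\mathrm{T}(\mathfrak{n},\mathfrak{m})=\mathfrak{n}/\mathfrak{m}+\mathrm{T}(\mathfrak{n},\mathfrak{m}-1)$ , $\mathrm{T}(\mathfrak{n},1)=\mathfrak{n}$ , i.e., the form that Definition 5 maps to the univariate recurrence of Example 13 when $\mathfrak{h}=\mathfrak{n}$ ; this form, the function names and the tolerance are assumptions for illustration.

```python
def t_bivariate(n, m_max):
    """Assumed T_G: T(n, 1) = n and T(n, m) = n/m + T(n, m-1) for m >= 2."""
    t = [0.0, float(n)]  # index 0 is unused; t[m] holds T(n, m)
    for m in range(2, m_max + 1):
        t.append(n / m + t[m - 1])
    return t

def t_univariate(m_max):
    """Uni(G) of Example 13: T(1) = 1 and T(m) = 1/m + T(m-1)."""
    t = [0.0, 1.0]
    for m in range(2, m_max + 1):
        t.append(1.0 / m + t[m - 1])
    return t

def separable(n, m_max, tol=1e-9):
    """Check T_G(n, m) == Subst(h)(n) * T_Uni(G)(m) with Subst(h)(n) = n,
    up to a relative floating-point tolerance."""
    tg, tu = t_bivariate(n, m_max), t_univariate(m_max)
    return all(abs(tg[m] - n * tu[m]) <= tol * max(1.0, abs(tg[m]))
               for m in range(1, m_max + 1))
```

For example, `separable(7, 50)` compares the two sides for $n=7$ and all $1\leq m\leq 50$ .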

Description of the Algorithm. With Proposition 6, the algorithm for separable bivariate recurrence relations is straightforward: simply compute $\mathsf{Uni}(G)$ for $G$ and then call the algorithms for univariate case presented in Section 4.1.

Analysis of examples in Section 2.4. Our algorithm can decide the following optimal bounds for the examples of Section 2.4.

1. For Example 6 we obtain an $\mathcal{O}(n\cdot\log m)$ bound, whereas the worst-case bound is $\infty$ .

2. For Example 7 we obtain an $\mathcal{O}(n\cdot\log m)$ bound for distributed setting and $\mathcal{O}(m)$ bound for concurrent setting, whereas the worst-case bounds are both $\infty$ .

Note that for all our examples, $m\leq n$ , and thus we obtain $\mathcal{O}(n\cdot\log n)$ and $\mathcal{O}(n)$ upper bounds for expected-runtime analysis, which are the asymptotically optimal bounds. In all cases above, the worst-case analysis is completely ineffective as the worst-case bounds are infinite. Moreover, consider Example 7, where the optimal number of rounds is $n$ (i.e., one process every round, which centralized Round-Robin schemes can achieve). The randomized algorithm, with one shared variable, is a decentralized algorithm that achieves $O(n)$ expected number of rounds (i.e., the optimal asymptotic expected-runtime complexity).

5 Experimental Results

We consider the classical examples illustrated in Section 2.2 and Section 2.4. In Table 2 for experimental results we consider the following recurrence relations $G$ : R.-Sear. corresponds to the recurrence relation (2) for Example 1; Q.-Sort corresponds to the recurrence relation (3) for Example 2; Q.-Select corresponds to the recurrence relation (4) for Example 3; Diam. A (resp. Diam. B) corresponds to the recurrence relation (5) (resp. the recurrence relation (6)) for Example 4; Sort-Sel. corresponds to recurrence relation (7) for Example 5, where we use the result from setting $\epsilon=0.01$ in Q.-Select; Coupon corresponds to the recurrence relation (9) for Example 6; Res. A (resp. Res. B) corresponds to the recurrence relation (10) (resp. the recurrence relation (11)) for Example 7.

In the table, $\mathfrak{f}$ specifies the input asymptotic bound, ‘ $\epsilon$ and Dec’ is the input that specifies whether we use the algorithm UniDec or the synthesis algorithm UniSynth with the given $\epsilon$ value, and $d$ gives the value synthesized w.r.t. the given $\epsilon$ ( $\checkmark$ for yes). We describe $d_{100}$ below. We need approximation for constants such as $e$ and $\ln{2}$ , and use the interval $[2.7182,2.7183]$ (resp., $[0.6931,0.6932]$ ) for tight approximation of $e$ (resp., $\ln{2}$ ).

The value $d_{100}$ . For our synthesis algorithm we obtain the value $d$ . The optimal value of the associated constant with the asymptotic bound, denoted $d^{*}$ , is defined as follows. For $z\geq 2$ , let $d_{z}:=\max\left\{\frac{T_{G}(n)-c}{\mathsf{Subst}(\mathfrak{f})(n)}\mid 2\leq n\leq z\right\}$ ( $c$ is from (1)). Then the sequence $d_{z}$ is increasing in $z$ , and its limit is the optimal constant, i.e., $d^{*}=\lim_{z\to\infty}d_{z}$ . We consider $d_{100}$ as a lower bound on $d^{*}$ to compare against the value of $d$ we synthesize. In other words, $d_{100}$ is the minimal value such that (12) holds for $1\leq n\leq 100$ , whereas for $d^{*}$ it must hold for all $n$ , and hence $d^{*}\geq d_{100}$ . Our experimental results show that the $d$ values we synthesize for $\epsilon=0.01$ are quite close to the optimal value.
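The quantity $d_{z}$ is straightforward to compute by dynamic programming. The Python sketch below does so for Sherwood’s Randomized-Search with $\mathfrak{f}=\ln{\mathfrak{n}}$ , assuming the recurrence expression shown in Example 9 with base value $\mathrm{T}(1)=1$ and $c=1$ ; the function names are hypothetical and no particular numerical output is claimed here.

```python
import math

def eval_sherwood(limit):
    """T(1) = 1; T(n) = 6 + (1/n) * (sum_{j=ceil(n/2)}^{n-1} T(j)
    + sum_{j=floor(n/2)}^{n-1} T(j)), evaluated bottom-up (assumed form)."""
    t = [0.0, 1.0]  # index 0 is unused
    for n in range(2, limit + 1):
        upper = sum(t[(n + 1) // 2:n])  # j from ceil(n/2) to n-1
        lower = sum(t[n // 2:n])        # j from floor(n/2) to n-1
        t.append(6 + (upper + lower) / n)
    return t

def d_z(z, c=1.0):
    """d_z = max over 2 <= n <= z of (T(n) - c) / ln(n)."""
    t = eval_sherwood(z)
    return max((t[n] - c) / math.log(n) for n in range(2, z + 1))

D_100 = d_z(100)
```

By construction `d_z` is non-decreasing in `z`, so `D_100` is a lower bound on the optimal constant $d^{*}$ for this recurrence.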

We performed our experiments on Intel(R) Core(TM) i7-4510U CPU, 2.00GHz, 8GB RAM. All numbers in Table 2 are over-approximated up to $10^{-3}$ , and the running times of all experiments are less than $0.02$ seconds. From Table 2, we can see that the optimal $d$ values are effectively over-approximated. For example, for Quick-Sort (Eq. (3)) (i.e., Q.-Sort in the table), our algorithm detects $d=4.051$ and the optimal one lies somewhere in $[3.172,4.051]$ . The experimental results show that we obtain the results extremely efficiently (less than $1/50$ -th of a second). For further details see Table 3 in Appendix 0.E.

6 Related Work

Automated program analysis is a very important problem with a long tradition [46]. Various approaches have been considered for automated worst-case bounds, such as [28, 29, 30, 31, 32, 35, 34, 26, 5, 44] for amortized analysis and the SPEED project [23, 24, 22] for non-linear bounds using abstract interpretation. All these works focus on worst-case analysis and do not consider expected-runtime analysis.

Our main contribution is automated analysis of recurrence relations. Approaches for recurrence relations have also been considered in the literature. Wegbreit [46] considered solving recurrence relations through either simple difference equations or generating functions. Zimmermann and Zimmermann [49] considered solving recurrence relations by transforming them into difference equations. Grobauer [21] considered generating recurrence relations from DML for the worst-case analysis. Flajolet et al. [19] considered allocation problems. Flajolet et al. [20] considered solving recurrence relations for randomization of combinatorial structures (such as trees) through generating functions. The COSTA project [2, 3, 4] transforms Java bytecode into recurrence relations and solves them through ranking functions. Moreover, the PURRS tool [6] addresses finite linear recurrences (with bounded summation), and some restricted linear infinite recurrence relations (with unbounded summation). Our approach is quite different because we consider analyzing recurrence relations arising from randomized algorithms and expected-runtime analysis through over-approximation of unbounded summations through integrals, whereas previous approaches either consider recurrence relations for worst-case bounds or combinatorial structures, or use generating functions or difference equations to solve the recurrence relations.

For intraprocedural analysis, ranking functions have been widely studied [8, 9, 15, 42, 45, 17, 48, 43], and have then been extended to non-recursive probabilistic programs as ranking supermartingales [10, 18, 13, 12, 14, 11]. Such approaches are related to almost-sure termination, and do not derive optimal asymptotic expected-runtime bounds (such as $\mathcal{O}(\log n)$ , $\mathcal{O}(n\log n)$ ).

Proof rules have also been considered for recursive (probabilistic) programs in [25, 33, 41], but these methods cannot be automated and require manual proofs.

7 Conclusion

In this work we considered efficient algorithms for automated analysis of randomized recurrences for logarithmic, linear, and almost-linear bounds. Our work gives rise to a number of interesting questions. First, an interesting theoretical direction of future work would be to consider more general randomized recurrence relations (such as with more than two variables, or interaction between the variables). While the above problem is of theoretical interest, most interesting examples are already captured in our class of randomized recurrence relations as mentioned above. Another interesting practical direction would be automated techniques to derive recurrence relations from randomized recursive programs.

Acknowledgements

We thank all reviewers for valuable comments. The research is partially supported by Vienna Science and Technology Fund (WWTF) ICT15-003, Austrian Science Fund (FWF) NFN Grant No. S11407-N23 (RiSE/SHiNE), ERC Start grant (279307: Graph Games), the Natural Science Foundation of China (NSFC) under Grant No. 61532019 and the CDZ project CAP (GZ 1023).

Appendix 0.A Omitted Details for Section 2.2

Example 1. [Randomized-Search] Consider Sherwood’s Randomized-Search algorithm (cf. [39, Chapter 9]) depicted in Fig. 1. The algorithm checks whether an integer value $d$ is present within the index range $[i,j]$ ( $0\leq i\leq j$ ) in an integer array $ar$ which is sorted in increasing order and is without duplicate entries. The algorithm outputs either the index for $d$ in $ar$ or $-1$ meaning that $d$ is not present in the index range $[i,j]$ of $ar$ .

The description of the pseudo-code is as follows. The first four lines deal with the base case when there is only one index in the index range. The remaining lines deal with the recursive case: in line 6, an index $k$ is uniformly sampled from $\{i,i+1,\dots,j\}$ ; lines 7–8 check whether $k$ is the output; lines 9–12 perform the recursive calls depending on whether $ar[k]<d$ or not; finally, lines 13–14 handle the case when $d<ar[i]$ or $d>ar[j]$ .
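The steps described above can be sketched in Python as follows. This is an illustrative reading of the description and not a transcription of Fig. 1: the function name and the `rng` parameter are hypothetical, and the out-of-range test is moved to the front, which is a choice of this sketch.

```python
import random

def randomized_search(ar, i, j, d, rng=random):
    """Return the index of d in ar[i..j] (sorted, no duplicates), or -1."""
    # Values outside the range spanned by ar[i..j] cannot be present.
    if d < ar[i] or d > ar[j]:
        return -1
    # Base case: a single index remains.
    if i == j:
        return i if ar[i] == d else -1
    # Recursive case: sample k uniformly from {i, ..., j} (both ends inclusive).
    k = rng.randint(i, j)
    if ar[k] == d:
        return k
    if ar[k] < d:
        # ar[k] < d <= ar[j] implies k < j, so the right part is non-empty.
        return randomized_search(ar, k + 1, j, d, rng)
    # ar[i] <= d < ar[k] implies k > i, so the left part is non-empty.
    return randomized_search(ar, i, k - 1, d, rng)
```

Each recursive call works on a strictly smaller index range, so the search always terminates; the random choice of $k$ only affects how quickly the range shrinks.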

Let $T:\mathbb{N}\rightarrow\mathbb{N}$ be the function such that for any $n\in\mathbb{N}$ , we have $T(n)$ is the supremum of the expected execution times upon all inputs $(ar,i,j)$ with $j-i+1=n$ . We derive a recurrence relation for $T$ as follows. Let $n\in\mathbb{N}$ and $(ar,i,j),d$ be any input such that $n=j-i+1$ . We clarify two cases below:

1. there exists an $i\leq k^{*}<j$ such that $ar[k^{*}]\leq d<ar[k^{*}+1]$ , where $ar[j+1]$ is interpreted as $\infty$ here;

2. $ar[j]\leq d$ or $d<ar[i]$ .

In both cases, we have $T(1)=1$ . In Case 1, we deduce from the pseudo-code in Fig. 1 that

[TABLE]

for all $n\geq 2$ , where the maximum ranges over all $\ell^{*}:=k^{*}-i+1$ ’s. In Case 2, similarly we deduce that

[TABLE]

Thus a preliminary version $G^{\prime}$ of the recurrence relation is $\mathrm{T}(1)=1$ and

[TABLE]

for all $n\geq 2$ . Let $T^{\prime}:\mathbb{N}\rightarrow\mathbb{R}$ be the unique solution to $G^{\prime}$ . Then from the fact that $T^{\prime}(2)\geq T^{\prime}(1)$ , by induction $T^{\prime}$ is monotonically increasing. Thus the maximum

[TABLE]

is attained at $\ell^{*}=\left\lfloor\frac{n}{2}\right\rfloor$ for all $n\geq 2$ . Then $G^{\prime}$ is transformed into our final recurrence relation as follows:

[TABLE]

We note that the worst-case complexity for this algorithm is $\Theta(n)$ .∎

Example 2. [Quick-Sort] Consider the Quick-Sort algorithm [16, Chapter 7] depicted in Fig. 2, where every input $(ar,i,j)$ is assumed to satisfy that $0\leq i\leq j$ and $ar$ is an array of integers which does not contain duplicate numbers.

The description of the pseudo-code is as follows: first, line 2 samples an integer uniformly from $\{i,\dots,j\}$ ; then, line 3 calls a subroutine $\mathsf{pivot}$ which (i) rearranges $ar$ such that integers in $ar$ which are less than $ar[k]$ come first, then $ar[k]$ , and finally integers in $ar$ greater than $ar[k]$ , and (ii) outputs the new index $m$ of $ar[k]$ in $ar$ ; and finally, lines 4–7 handle recursive calls to sub-arrays.
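A minimal Python sketch of this scheme is given below. It is an illustration and not the pseudo-code of Fig. 2: the subroutine `pivot` is realised here with a Lomuto-style partition, which is one possible choice, and the explicit pivot index `k` and the `rng` parameter are assumptions of the sketch.

```python
import random

def pivot(ar, i, j, k):
    """Partition ar[i..j] around ar[k]; return the pivot's final index m."""
    ar[k], ar[j] = ar[j], ar[k]          # park the pivot at the right end
    m = i
    for t in range(i, j):
        if ar[t] < ar[j]:                # smaller elements go to the left part
            ar[t], ar[m] = ar[m], ar[t]
            m += 1
    ar[m], ar[j] = ar[j], ar[m]          # move the pivot to its final place
    return m

def quick_sort(ar, i, j, rng=random):
    """Sort ar[i..j] in place using a uniformly sampled pivot."""
    if i >= j:
        return
    k = rng.randint(i, j)                # uniform over {i, ..., j}
    m = pivot(ar, i, j, k)
    quick_sort(ar, i, m - 1, rng)        # elements smaller than the pivot
    quick_sort(ar, m + 1, j, rng)        # elements greater than the pivot
```

The partition touches each element of the range once, which corresponds to a pivoting cost that is linear in the array length.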

From the pseudo-code, the following recurrence relation is easily obtained:

[TABLE]

where $\mathrm{T}(\mathfrak{n})$ represents the maximal expected execution time where $\mathfrak{n}$ is the array length and the execution time of pivoting is represented by $2\cdot\mathfrak{n}$ . We note that the worst-case complexity for this algorithm is $\Theta(n^{2})$ .∎

Example 3. [Quick-Select] Consider the Quick-Select algorithm (cf. [16, Chapter 9]) depicted in Fig. 3 which upon any input $(ar,i,j)$ and $d$ such that $0\leq i\leq j$ , $1\leq d\leq j-i+1$ and $ar$ contains no duplicate integers, finds the $d$ -th largest integer in $ar$ . Note that for an array of size $n$ , and $d=n/2$ , we have the Median-Find algorithm.

The description of the pseudo-code is as follows: line 1 handles the base case; line 3 starts the recursive case by sampling $k$ uniformly from $\{i,\dots,j\}$ ; line 4 rearranges $ar$ and returns an $m$ in the same way as $\mathsf{pivot}$ in Quick-Sort (cf. Example 2); line 5 handles the case when $ar[k]$ happens to be the $d$ -th largest integer in $ar$ ; and finally, lines 7–10 handle the recursive calls.
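The description can be turned into the following Python sketch. It is an illustration under stated assumptions, not the pseudo-code of the figure: the partition routine is a Lomuto-style stand-in for $\mathsf{pivot}$, the rank of the pivot is counted from the largest element of the current range, and all names are hypothetical.

```python
import random

def pivot(ar, i, j, k):
    """Partition ar[i..j] around ar[k]; return the pivot's final index m."""
    ar[k], ar[j] = ar[j], ar[k]
    m = i
    for t in range(i, j):
        if ar[t] < ar[j]:
            ar[t], ar[m] = ar[m], ar[t]
            m += 1
    ar[m], ar[j] = ar[j], ar[m]
    return m

def quick_select(ar, i, j, d, rng=random):
    """Return the d-th largest element of ar[i..j], for 1 <= d <= j - i + 1."""
    if i == j:                           # base case: one candidate left
        return ar[i]
    k = rng.randint(i, j)                # uniform over {i, ..., j}
    m = pivot(ar, i, j, k)
    rank = j - m + 1                     # the pivot is the rank-th largest in ar[i..j]
    if d == rank:
        return ar[m]
    if d < rank:
        # The answer is among the rank - 1 elements greater than the pivot.
        return quick_select(ar, m + 1, j, d, rng)
    # Otherwise skip the pivot and everything greater than it.
    return quick_select(ar, i, m - 1, d - rank, rng)
```

Unlike Quick-Sort, only one of the two sides is explored after pivoting, which is why the recurrence for this example involves a single recursive term.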

Let $T:\mathbb{N}\rightarrow\mathbb{N}$ be the function such that for any $n\in\mathbb{N}$ , we have $T(n)$ is the supremum of the expected execution times upon all inputs $(ar,i,j)$ with $j-i+1=n$ . By an analysis on where the $d$ -th largest integer lies in $ar$ which is similar to the analysis on $d$ in Example 1, a preliminary recurrence relation is obtained such that $\mathrm{T}(1)=1$ and

[TABLE]

By a monotonicity argument similar to that in Example 1, the maximum of the right-hand-side expression above is attained at $\ell^{*}=\left\lfloor\frac{n+1}{2}\right\rfloor$ for all $n\geq 2$ . By the fact that $\left\lfloor\frac{n+1}{2}\right\rfloor=\left\lceil\frac{n}{2}\right\rceil$ for all $n\geq 2$ , the following recurrence relation is obtained:

[TABLE]

To fit our univariate recurrence expression, we use over-approximation, and the final recurrence relation for this example is

[TABLE]

We note that the worst-case complexity for this algorithm is $\Theta(n^{2})$ .∎

Example 4. [Diameter-Computation] Consider the Diameter-Computation algorithm (cf. [40, Chapter 9]) to compute the diameter of an input finite set $S$ of three-dimensional points. A pseudo-code to implement this is depicted in Fig. 4. The description of the pseudo-code is as follows: lines 1–2 handle the base case; line 3 samples a point $p$ uniformly from $S$ ; line 4 calculates the maximum distance in $S$ from $p$ ; line 5 calculates the intersection of all balls centered at points in $S$ with uniform radius $d$ ; line 6 calculates the set of points outside $U$ ; lines 7–8 handle the situation $S^{\prime}=\emptyset$ which implies that $d$ is the diameter; lines 9–10 handle the recursive call to $S^{\prime}$ . Due to uniform choice of $p$ at line 3, the size of $S^{\prime}$ is uniformly in $[0,|S|-1]$ ; it then follows a pivoting (similar to that in Example 3 and Example 2) by line $5$ w.r.t. the linear order over $\{\max_{p^{\prime}\in S}{\mbox{\sl dist}(p,p^{\prime})}\mid p\in S\}$ . Lines 5–6 can be done in $\mathcal{O}(|S|\cdot\log{|S|})$ time for Euclidean distance, and $\mathcal{O}(|S|)$ time for $L_{1}$ metric [40].

Depending on the Euclidean or $L_{1}$ metric we obtain two different recurrence relations. For the Euclidean metric we have the following relation:

[TABLE]

with the execution time for lines 5–6 being taken to be $2\cdot\mathfrak{n}\cdot\ln{\mathfrak{n}}$ , and for the $L_{1}$ metric we have the following relation:

[TABLE]

with the execution time for lines 5–6 being taken to be $2\cdot\mathfrak{n}$ . We note that the worst-case complexity for this algorithm is as follows: for Euclidean metric it is $\Theta(n^{2}\cdot\log n)$ and for the $L_{1}$ metric it is $\Theta(n^{2})$ .∎

Example 5. [Sorting with Quick-Select] Consider a sorting algorithm depicted in Fig. 5 which selects the median through the Quick-Select algorithm. The recurrence relation is directly obtained as follows:

[TABLE]

where $T^{*}(\centerdot)$ is an upper bound on the expected running time of Quick-Select (cf. Example 3). We note that the worst-case complexity for this algorithm is $\Theta(n^{2})$ .∎

Appendix 0.B Omitted Details for Section 2.4

Example 6. [Coupon-Collector] Consider the Coupon-Collector problem [40, Chapter 3] with $n$ different types of coupons ( $n\in\mathbb{N}$ ). The randomized process proceeds in rounds: at each round, a coupon is collected uniformly at random from the coupon types (i.e., each coupon type is collected with probability $\frac{1}{n}$ ); and the rounds continue until all the $n$ types of coupons are collected. We model the rounds as a recurrence relation with two variables $\mathfrak{n},\mathfrak{m}$ , where $\mathfrak{n}$ represents the total number of coupon types and $\mathfrak{m}$ represents the remaining number of uncollected coupon types. The recurrence relation is as follows:

[TABLE]

where $\mathrm{T}(\mathfrak{n},\mathfrak{m})$ is the expected number of rounds, $\frac{\mathfrak{n}}{\mathfrak{m}}$ represents the expected number of rounds to collect a new (i.e., not-yet-collected) coupon type when there are still $\mathfrak{m}$ types of coupons to be collected, and $\mathfrak{n}$ (for $\mathrm{T}(\mathfrak{n},1)$ ) represents the expected number of rounds to collect a new coupon type when there is only one new coupon type to be collected. We note that the worst-case complexity for this process is $\infty$ .∎
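As a sanity check of this model, the Python sketch below unrolls a recurrence of the shape suggested by the explanation above, namely $\mathrm{T}(\mathfrak{n},1)=\mathfrak{n}$ and $\mathrm{T}(\mathfrak{n},\mathfrak{m})=\frac{\mathfrak{n}}{\mathfrak{m}}+\mathrm{T}(\mathfrak{n},\mathfrak{m}-1)$. This exact shape is an assumption read off from the prose, and the function names are hypothetical. Exact rational arithmetic is used so that the comparison with $\mathfrak{n}\cdot H_{\mathfrak{m}}$ (with $H_{\mathfrak{m}}$ the $\mathfrak{m}$-th harmonic number) involves no rounding.

```python
from fractions import Fraction

def coupon_rounds(n, m):
    """Expected rounds T(n, m) under the assumed recurrence, computed exactly."""
    total = Fraction(n)                  # assumed base case T(n, 1) = n
    for r in range(2, m + 1):
        total += Fraction(n, r)          # assumed step T(n, r) = n / r + T(n, r - 1)
    return total

def harmonic(m):
    """The m-th harmonic number H_m = 1 + 1/2 + ... + 1/m."""
    return sum(Fraction(1, r) for r in range(1, m + 1))

if __name__ == "__main__":
    n = 10
    print(coupon_rounds(n, n), n * harmonic(n))
```

Since $H_{\mathfrak{m}}$ grows like $\ln\mathfrak{m}$, the unrolled value with $\mathfrak{m}=\mathfrak{n}$ is of order $\mathfrak{n}\cdot\ln\mathfrak{n}$, which is the almost-linear regime targeted by the analysis.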

Example 7. [Channel-Conflict Resolution] We consider two network scenarios in which $n$ clients are trying to get access to a network channel. This problem is also called the Resource-Contention Resolution [36, Chapter 13]. In this problem, if more than one client tries to access the channel, then no client can access it, and if exactly one client requests access to the channel, then the request is granted. While centralized deterministic algorithms exist (such as Round-Robin) for the problem, to be implemented in a distributed or concurrent setting, randomized algorithms are necessary.

Distributed setting. In the distributed setting, the clients do not share any information. In this scenario, in each round, every client requests an access to the channel with probability $\frac{1}{n}$ . We are interested in the expected number of rounds until every client gets at least one access to the channel. At each round, let $m$ be the number of clients who have not got any access. Then the probability that a new client (from the $m$ clients) gets the access is $m\cdot\frac{1}{n}\cdot(1-\frac{1}{n})^{n-1}$ . Thus, the expected number of rounds until a new client gets the access is $\frac{n}{m}\cdot\frac{1}{(1-\frac{1}{n})^{n-1}}$ . Since the sequence $\left\{(1-\frac{1}{n})^{n-1}\right\}_{n\in\mathbb{N}}$ converges decreasingly to $\frac{1}{e}$ when $n\rightarrow\infty$ , this expected time is no greater than $e\cdot\frac{n}{m}$ . Then for this scenario, we obtain an over-approximating recurrence relation

[TABLE]

for the expected number of rounds until every client gets at least one access to the channel. Note that in this setting no client has any information about any other client.
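The over-approximation by $e$ rests on the behaviour of the sequence $(1-\frac{1}{n})^{n-1}$. The Python sketch below checks numerically, on a finite range only, that the sequence is decreasing and stays above $\frac{1}{e}$; it is a floating-point illustration with hypothetical function names and not a proof.

```python
import math

def success_factor(n):
    """(1 - 1/n)^(n - 1): probability that the other n - 1 clients stay silent."""
    return (1.0 - 1.0 / n) ** (n - 1)

def check_bound(max_n=1000):
    """Check monotonicity and the lower bound 1/e for n = 2, ..., max_n."""
    values = [success_factor(n) for n in range(2, max_n + 1)]
    decreasing = all(a > b for a, b in zip(values, values[1:]))
    above_limit = all(v > 1.0 / math.e for v in values)
    return decreasing, above_limit

if __name__ == "__main__":
    print(check_bound())
```

Because every term exceeds $\frac{1}{e}$ on the checked range, its reciprocal is below $e$ there, which is what allows the exact expected waiting time to be replaced by $e\cdot\frac{n}{m}$.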

Concurrent setting. In the concurrent setting, the clients share one variable, which is the number of clients that have not yet been granted access. Also in this scenario, once a client gets access, it does not request access again. Moreover, the shared variable represents the number of clients $m$ that have not yet got access. In this case, in each round a client that has not yet accessed the channel requests access to the channel with probability $\frac{1}{m}$ . Then the probability that a new client gets the access becomes $m\cdot\frac{1}{m}\cdot(1-\frac{1}{m})^{m-1}$ . It follows that the expected time until a new client gets the access becomes $\frac{1}{(1-\frac{1}{m})^{m-1}}$ which is smaller than $e$ . Then for this scenario, we obtain an over-approximating recurrence relation

[TABLE]

We also note that the worst-case complexity for both is $\infty$ .∎

Appendix 0.C Details for Overapproximations

To prove results for overapproximations for recurrence expressions, we need the following well-known theorem.

Theorem 0.C.1 (Taylor’s Theorem (with Lagrange’s Remainder) [7, Chapter 6])

For any function $f:[a,b]\rightarrow\mathbb{R}$ ( $a,b\in\mathbb{R}$ and $a<b$ ), if $f$ is ( $k+1$ )-order differentiable, then for all $x\in[a,b]$ , there exists a $\xi\in(a,x)$ such that

$$f(x)=\sum_{j=0}^{k}\frac{f^{(j)}(a)}{j!}\cdot(x-a)^{j}+\frac{f^{(k+1)}(\xi)}{(k+1)!}\cdot(x-a)^{k+1}.$$

We also recall that

[TABLE]

where $\alpha$ is Apéry’s constant, which lies in $[1.2020,1.2021]$ .

Moreover, we have the following result, obtained using integration by parts and the Newton-Leibniz formula.

Lemma 2

For all $a,b\in(0,\infty)$ such that $a<b$ , the following assertions hold:

[TABLE]

Furthermore, we need the following simple lemmas. The following lemma provides a tight approximation for floored expressions, the proof of which is a simple case distinction between even and odd cases.

Lemma 3

For all natural numbers $n$ , we have $\frac{n-1}{2}\leq\left\lfloor\frac{n}{2}\right\rfloor\leq\frac{n}{2}\leq\left\lceil\frac{n}{2}\right\rceil\leq\frac{n+1}{2}$ .
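The chain of inequalities in Lemma 3 can be checked mechanically for small $n$. The Python sketch below does so with exact rational arithmetic; the function names are hypothetical, and the check is an illustration on a finite range rather than a replacement for the case distinction.

```python
from fractions import Fraction

def floor_half(n):
    return n // 2

def ceil_half(n):
    return -(-n // 2)                    # ceiling via floor division

def lemma3_holds(n):
    """(n-1)/2 <= floor(n/2) <= n/2 <= ceil(n/2) <= (n+1)/2."""
    return (Fraction(n - 1, 2) <= floor_half(n) <= Fraction(n, 2)
            <= ceil_half(n) <= Fraction(n + 1, 2))

if __name__ == "__main__":
    print(all(lemma3_holds(n) for n in range(0, 1000)))
```

For even $n$ the three middle terms coincide, and for odd $n$ the floor and the ceiling meet the outer bounds, which is the case distinction mentioned above.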

The following lemma handles over-approximation of simple summations.

Lemma 4

For any natural number $n\geq 2$ and real number $c$ , one has that $\frac{\sum_{j=1}^{n-1}c}{n}\leq c\mbox{ and }\frac{\left(\sum_{j=\left\lceil\frac{n}{2}\right\rceil}^{n-1}c+\sum_{j=\left\lfloor\frac{n}{2}\right\rfloor}^{n-1}c\right)}{n}\leq c$ .

Then we prove the following two propositions.

Proposition 1. For any natural number $n\geq 2$ , we have

[TABLE]

Proof

Let $n\geq 2$ be a natural number. The first argument comes from the facts that

[TABLE]

and

[TABLE]

where we use the fact that

[TABLE]

and $\xi_{n}$ is obtained from Taylor’s Theorem. The second argument comes from the facts that

[TABLE]

and

[TABLE]

where the first inequality is due to the fact that

[TABLE]

and $\xi^{\prime}_{n}$ is obtained from Taylor’s Theorem.∎

Proposition 2. For any natural number $n\geq 2$ , we have

[TABLE]

Proof

The proposition follows directly from the fact that

[TABLE]

for some $\xi\in(n-1,n)$ , which can be obtained through Taylor’s Theorem.∎

Proposition 3. For any natural number $n\geq 2$ , we have:

[TABLE]

Proof

Let $n$ be a natural number such that $n\geq 2$ . We first estimate the difference

[TABLE]

To this end, we deduce the following equalities:

[TABLE]

where $\xi_{j,x}$ is a real number in $(j,j+x)$ obtained from Taylor’s Theorem with Lagrange’s Remainder. The first and fourth equalities come from the linear property of Riemann Integral; the second one follows from the variable substitution $x^{\prime}=x-j$ ; the third one follows from Taylor’s Theorem. Using the fact that $\xi_{j,x}\in(j,j+1)$ , one obtains that

[TABLE]

and

[TABLE]

Then (14) follows from the facts that

[TABLE]

and

[TABLE]

where in both situations we use the fact that $2\cdot j^{2}\leq 3\cdot j^{3}$ for all $j\in\mathbb{N}$ .

Then we consider the difference

[TABLE]

First, we derive that

[TABLE]

where $\xi_{j,x}$ is a real number in $(j,j+1)$ obtained from Taylor’s Theorem. Using the fact that $\xi_{j,x}\in(j,j+1)$ , one can obtain that

[TABLE]

where the second inequality follows from Inequality (18), and

[TABLE]

where the second inequality follows from Inequality (17). Then from Inequality (19) and Inequality (20), one has that

[TABLE]

and

[TABLE]

where in both situations we use the fact that $12\cdot j^{2}\leq 6\cdot j^{3}$ for all $j\geq 2$ . The inequalities above directly imply the inequalities in (15). Finally, we consider the difference

[TABLE]

Following similar approaches, we derive that for all natural numbers $n\geq 2$ ,

[TABLE]

where $\xi_{j,x}\in(j,j+1)$ . Thus, one obtains that

[TABLE]

and

[TABLE]

By plugging Inequalities in (18) and (20) into Inequality (21), one obtains that

[TABLE]

for all natural numbers $n\geq 2$ . Similarly, by plugging Inequalities in (17) and (19) into Inequality (22), one obtains

[TABLE]

Then the inequalities in (16) are clarified.∎

Example 8. Consider the summation

[TABLE]

By Proposition 3, we can over-approximate it as

[TABLE]

which is equal to

[TABLE]

Then using Proposition 1, we can further obtain the following over-approximation

[TABLE]

which is roughly $n\cdot\ln{n}-(1-\ln{2})\cdot n+\frac{1}{2}\cdot\ln{n}+0.6672+\frac{1}{2\cdot n}$ .∎
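The final expression can be compared numerically with a direct evaluation of the sum. The summation of this example is not reproduced in the text here, so the Python sketch below assumes it is $\sum_{j=\lceil n/2\rceil}^{n-1}\ln j+\sum_{j=\lfloor n/2\rfloor}^{n-1}\ln j$, a shape suggested by the coefficient $-(1-\ln 2)$ of $n$; this assumption and the function names are illustrative only.

```python
import math

def paired_log_sum(n):
    """Assumed summation: sum of ln j from ceil(n/2) and from floor(n/2) up to n - 1."""
    lo_ceil = -(-n // 2)
    lo_floor = n // 2
    return (sum(math.log(j) for j in range(lo_ceil, n))
            + sum(math.log(j) for j in range(lo_floor, n)))

def closed_form_bound(n):
    """n ln n - (1 - ln 2) n + (1/2) ln n + 0.6672 + 1/(2n), as stated in the text."""
    return (n * math.log(n) - (1.0 - math.log(2.0)) * n
            + 0.5 * math.log(n) + 0.6672 + 1.0 / (2.0 * n))

if __name__ == "__main__":
    for n in (4, 16, 64, 256):
        print(n, paired_log_sum(n), closed_form_bound(n))
```

Under the assumed summation, the closed form stays above the exact sum on the tested range, as an over-approximation should.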

Appendix 0.D Proofs for Sect. 4.1

Lemma 1. Let $\mathfrak{f}\in\{\ln{\mathfrak{n}},\mathfrak{n},\mathfrak{n}\cdot\ln{\mathfrak{n}}\}$ and $c$ be a constant. For all univariate recurrence expressions $\mathfrak{g}$ , there exist pseudo-polynomials $p$ and $q$ such that coefficients (i.e., $a_{i},b_{i}$ ’s in (13)) of $q$ are all non-negative, $C_{q}>0$ and the following assertion holds: for all $d>0$ and for all $n\geq 2$ , with $h=d\cdot\mathsf{Subst}({\mathfrak{f}})+c$ , the inequality $\mathsf{OvAp}(\mathfrak{g},h)(n)\leq h(n)$ is equivalent to $d\cdot p(n)\geq q(n)$ .

Proof

From Definition 2, $n\mapsto n\cdot(n-1)\cdot\mathsf{OvAp}(\mathfrak{g},h)(n)$ is a pseudo-polynomial. Simple rearrangement of terms in inequality $\mathsf{OvAp}(\mathfrak{g},h)(n)\leq h(n)$ gives the desired pseudo-polynomials. Moreover, the fact that all coefficients in $\mathfrak{g}$ (from (1)) are positive, is used to derive that all coefficients of $q$ are non-negative and $C_{q}>0$ .∎

Proposition 4. Let $p,q$ be pseudo-polynomials such that $C_{q}>0$ and all coefficients of $q$ are non-negative. Then there exists a real number $d>0$ such that $d\cdot p(n)\geq q(n)$ for sufficiently large $n$ iff $\mathrm{deg}(p)\geq\mathrm{deg}(q)$ and $C_{p}>0$ .

Proof

We present the two directions of the proof.

(“If”:) Suppose that $\mathrm{deg}(p)\geq\mathrm{deg}(q)$ and $C_{p}>0$ . Then the result follows directly from the facts that (i) $\frac{q(n)}{p(n)}>0$ for sufficiently large $n$ and (ii) $\lim\limits_{n\rightarrow\infty}\frac{q(n)}{p(n)}$ exists and is non-negative.

(“Only-if”:) Let $d$ be a positive real number such that $d\cdot p(n)\geq q(n)$ for sufficiently large $n$ . Then $C_{p}>0$ , or otherwise $d\cdot p(n)$ is either constantly zero or negative for sufficiently large $n$ . Moreover, $\mathrm{deg}(p)\geq\mathrm{deg}(q)$ , since otherwise $\lim\limits_{n\rightarrow\infty}\frac{q(n)}{p(n)}=\infty$ .∎
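The criterion of Proposition 4 amounts to comparing leading terms, which the following Python sketch illustrates. It assumes that a pseudo-polynomial is a finite sum of terms $a_{i}\cdot n^{i}\cdot\ln n$ and $b_{i}\cdot n^{i}$ (as the coefficients $a_{i},b_{i}$ of (13) suggest) and that the degree orders terms by growth rate, with $n^{i}\cdot\ln n$ ranked just above $n^{i}$. The dictionary encoding and the function names are hypothetical and are not taken from the implementation.

```python
def leading_term(poly):
    """poly maps (i, has_log) to the coefficient of n^i * (ln n if has_log else 1).

    Returns (degree_key, leading_coefficient). Tuples compare lexicographically,
    so (1, False) < (1, True) < (2, False), matching n < n ln n < n^2.
    """
    nonzero = {key: coef for key, coef in poly.items() if coef != 0}
    if not nonzero:                      # zero pseudo-polynomial
        return (-1, False), 0
    key = max(nonzero)
    return key, nonzero[key]

def bound_exists(p, q):
    """Criterion of Proposition 4: deg(p) >= deg(q) and C_p > 0."""
    deg_p, c_p = leading_term(p)
    deg_q, _ = leading_term(q)
    return deg_p >= deg_q and c_p > 0

if __name__ == "__main__":
    p = {(1, True): 1.0, (1, False): -3.0}   # n ln n - 3n
    q = {(1, False): 2.0, (0, False): 1.0}   # 2n + 1
    print(bound_exists(p, q))
```

Only the leading terms are inspected, so under this encoding the test runs in time linear in the number of stored terms.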

Proposition 5. Consider two univariate pseudo-polynomials $p,q$ such that $\mathrm{deg}(p)\geq\mathrm{deg}(q)$ , all coefficients of $q$ are non-negative and $C_{p},C_{q}>0$ . Then given any $\epsilon\in(0,1)$ ,

[TABLE]

for all $n\geq N_{\epsilon,p,q}$ (for $N_{\epsilon,p,q}$ of Definition 4).

Proof

Let $p,q$ be given in Definition 4. Fix an arbitrary $\epsilon\in(0,1)$ and let $N_{\epsilon,p,q}$ be given in Definition 4. Then for all $n\geq N_{\epsilon,p,q}$ , (i) both $p(n),q(n)$ are positive and (ii)

[TABLE]

and

[TABLE]

It follows that for all $n\geq N_{\epsilon,p,q}$ ,

[TABLE]

The desired result follows.∎

Theorem 4.2. [Soundness for UniDec] If UniDec outputs “yes”, then there exists a univariate guess-and-check function in form (12) for the inputs $G$ and $\mathfrak{f}$ . The algorithm is a linear-time algorithm in the size of the input recurrence relation.

Proof

From Definition 1 and the special form (12) for univariate guess-and-check functions, a function in form (12) which satisfies the inductive argument of Definition 1 can be modified to satisfy also the base condition of Definition 1 by simply raising $d$ to a sufficiently large amount. Then the correctness of the algorithm follows from Theorem 4.1 and the sufficiency of Proposition 4. Furthermore, the algorithm runs in linear time since the transformation from the inequality $\mathsf{OvAp}(\mathfrak{g},h)(n)\leq h(n)$ into $d\cdot p(n)\geq q(n)$ (cf. Lemma 1) takes linear time in the size of the input recurrence relation. ∎

Theorem 4.3. [Soundness for UniSynth] If the algorithm UniSynth outputs a real number $d$ , then $d\cdot\mathsf{Subst}(\mathfrak{f})+c$ is a univariate guess-and-check function for $G$ .

Proof

Directly from the construction of the algorithm, Theorem 4.1, Proposition 4 and Proposition 5.∎

Appendix 0.E Detailed Experimental Results

The detailed experimental results are given in Table 3. We use $\checkmark$ to represent yes and $\times$ for fail. In addition to Table 2, we include values for $N_{\epsilon,p,q}$ in Definition 4. For the separable bivariate examples, recall that $n$ does not change, and in these examples, the reduction to the univariate case is the function of $m$ .

Bibliography

[1] Akra, M.A., Bazzi, L.: On the solution of linear recurrence equations. Comp. Opt. and Appl. 10(2), 195–210 (1998), http://dx.doi.org/10.1023/A:1018373005182
[2] Albert, E., Arenas, P., Genaim, S., Gómez-Zamalloa, M., Puebla, G., Ramírez-Deantes, D.V., Román-Díez, G., Zanardini, D.: Termination and cost analysis with COSTA and its user interfaces. Electr. Notes Theor. Comput. Sci. 258(1), 109–121 (2009), http://dx.doi.org/10.1016/j.entcs.2009.12.008
[3] Albert, E., Arenas, P., Genaim, S., Puebla, G.: Automatic inference of upper bounds for recurrence relations in cost analysis. In: Alpuente, M., Vidal, G. (eds.) Static Analysis, 15th International Symposium, SAS 2008, Valencia, Spain, July 16-18, 2008. Proceedings. Lecture Notes in Computer Science, vol. 5079, pp. 221–237. Springer (2008), http://dx.doi.org/10.1007/978-3-540-69166-2_15
[4] Albert, E., Arenas, P., Genaim, S., Puebla, G., Zanardini, D.: Cost analysis of Java bytecode. In: Nicola, R.D. (ed.) Programming Languages and Systems, 16th European Symposium on Programming, ESOP 2007, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2007, Braga, Portugal, March 24 - April 1, 2007, Proceedings. Lecture Notes in Computer Science, vol. 4421, pp. 157–172. Springer (2007), http://dx.doi.org/10.1007/978-3-540-71316-6_12
[5] Avanzini, M., Lago, U.D., Moser, G.: Analysing the complexity of functional programs: higher-order meets first-order. In: Fisher, K., Reppy, J.H. (eds.) Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming, ICFP 2015, Vancouver, BC, Canada, September 1-3, 2015. pp. 152–164. ACM (2015), http://doi.acm.org/10.1145/2784731.2784753
[6] Bagnara, R., Pescetti, A., Zaccagnini, A., Zaffanella, E.: PURRS: Towards computer algebra support for fully automatic worst-case complexity analysis. Technical report, University of Parma (2005), https://arxiv.org/abs/cs/0512056
[7] Bartle, R.G., Sherbert, D.R.: Introduction to Real Analysis. John Wiley & Sons, Inc., 4th edn. (2011)
[8] Bournez, O., Garnier, F.: Proving positive almost-sure termination. In: RTA. pp. 323–337 (2005)
