Workflow Scheduling in the Cloud with Weighted Upward-rank Priority Scheme Using Random Walk and Uniform Spare Budget Splitting

Hang Zhang, Xiaoying Zheng, Ye Xia, and Mingqi Li

arXiv:1903.01154·cs.DC·March 5, 2019

TL;DR

This paper introduces a novel workflow scheduling method in cloud environments that combines Markovian importance-based prioritization with uniform spare budget splitting, outperforming existing algorithms in various workflow scenarios.

Contribution

The paper proposes a prioritization scheme based on the stationary probabilities of a Markov chain, together with a uniform spare budget splitting strategy, for cloud workflow scheduling.

Findings

01. Markovian prioritization improves workflow makespan.

02. Uniform spare budget splitting outperforms proportional splitting.

03. The proposed algorithms outperform state-of-the-art algorithms on diverse workflows.

Tables

Table 1. Table I: Leasing cost of the VMs in the example of Fig. 1.

VM	${VM}_{1}$	${VM}_{2}$	${VM}_{3}$
Price	3	5	6

Table 2. Table II: Execution time of the jobs on each VM.

$n_{i}$	${VM}_{1}$	${VM}_{2}$	${VM}_{3}$
1	16	14	7
2	19	13	16
3	17	11	10
4	13	8	15
5	12	13	8
6	13	16	7
7	6	16	9
8	12	11	5
9	8	9	11
10	21	7	14
11	12	8	16
12	21	7	14

Table 3. Table III: Rank values and scheduling order for jobs under two different priority generation schemes.

Job	Upward Rank	Scheduling Order (upward rank)	Weighted Rank	Scheduling Order (weighted rank)
$n_{1}$	67	1	73.15	1
$n_{2}$	54	2	58.29	2
$n_{3}$	38	6	40.72	8
$n_{4}$	50	3	54.17	3
$n_{5}$	49	5	53.14	5
$n_{6}$	50	3	54.17	3
$n_{7}$	25	11	27.33	11
$n_{8}$	38	6	41.81	6
$n_{9}$	38	6	41.81	6
$n_{10}$	28	9	31.23	9
$n_{11}$	26	10	28.36	10
$n_{12}$	14	12	16.00	12

Table 4. Table IV: Final scheduling results using HBCS with the plain upward-rank priority scheme (budget = 500).

$n_{i}$	Budget	Cost	Saved Budget	Start Time	Finish Time	VM Assigned
1	100	42	58	0	7	3
2	115	65	50	7	20	2
4	89	39	50	7	20	1
6	89	39	50	20	33	1
5	86	65	21	20	33	2
3	72	55	17	33	44	2
8	47	36	11	33	45	1
9	35	24	11	45	53	1
10	46	35	11	53	60	2
11	47	36	11	53	65	1
7	29	18	11	65	71	1
12	46	35	11	71	78	2
Actual cost = 489, makespan = 78.

Table 5. Table V: Final scheduling results using HBCS with our weighted upward-rank priority scheme (budget = 500).

$n_{i}$	Budget	Cost	Saved Budget	Start Time	Finish Time	VM Assigned
1	100	42	58	0	7	3
2	115	65	50	7	20	2
4	89	39	50	7	20	1
6	89	42	47	7	14	3
5	83	48	35	14	22	3
8	65	55	10	20	31	2
9	34	24	10	22	30	1
3	61	55	6	31	42	2
10	41	35	6	42	49	2
11	42	40	2	49	57	2
7	20	18	2	42	48	1
12	37	35	2	57	64	2
Actual cost = 498, makespan = 64.

Table 6. Table VI: Final scheduling results using the plain upward-rank priority scheme and uniform spare budget splitting (budget = 500).

$n_{i}$	Budget	Cost	Saved Budget	Start Time	Finish Time	VM Assigned
1	47	42	5	0	7	3
2	67	65	2	7	20	2
4	46	39	7	7	20	1
6	50	42	8	7	14	3
5	49	48	1	14	22	3
3	57	55	2	20	31	2
8	37	30	7	22	27	3
9	36	24	12	22	30	1
10	52	35	17	31	38	2
11	57	36	21	30	42	1
7	44	18	26	42	48	1
12	66	35	31	48	55	2
Actual cost = 469 $\leq$ Budget, makespan = 55.

Table 7. Table VII: VM types in the experiments.

VM Type	vCPU	Memory(GiB)	Price($/hour)
t2.micro	1	1	0.0116
t2.medium	2	4	0.0464
m5.xlarge	4	16	0.192
m5.2xlarge	8	32	0.384
m5.4xlarge	16	64	0.768
m5.12xlarge	48	192	2.304
c5.large	2	4	0.085
c5.xlarge	4	8	0.17
c5.2xlarge	8	16	0.34
c5.4xlarge	16	32	0.68
c5.9xlarge	36	72	1.53
c5.18xlarge	72	144	3.06
r4.large	2	15.25	0.133
r4.xlarge	4	30.5	0.266
r4.2xlarge	8	61	0.532
r4.4xlarge	16	122	1.064
r4.8xlarge	32	244	2.128
i3.xlarge	4	30.5	0.312
i3.2xlarge	8	61	0.624
i3.4xlarge	16	122	1.248
i3.8xlarge	32	244	2.496
g3.4xlarge	16	122	1.14
g3.8xlarge	32	244	2.28

Table 8. Table VIII: Shorthands for different algorithms.

BAVE	Algorithm 1 with the plain upward-rank priority scheme
BAVE_M	Algorithm 1 with the weighted upward-rank priority scheme
MSLBL	Algorithm in [12] with the plain upward-rank priority scheme
MSLBL_M	Algorithm in [12] with the weighted upward-rank priority scheme
HBCS	Algorithm in [15]
BHEFT	Algorithm in [14]
Gurobi	the optimal baseline solution generated by Gurobi

Table 9. Table IX: Ranking counts for the FFT workflow, 75 test cases

RANK	1	2	3	AR
Gurobi	15	0	0	/
BAVE	64	9	2	1.17
BAVE_M	63	8	3	1.23
MSLBL	31	35	8	1.72
MSLBL_M	32	34	9	1.69
HBCS	13	9	10	/
BHEFT	9	13	3	/
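
The AR column can be cross-checked against the rank counts. The sketch below applies the average-rank formula listed under Equations, $AR=(R_{1}+2R_{2}+3R_{3}+4R_{4})/N_{cases}$. The tables do not list $R_{4}$, so the code assumes it equals the number of test cases not counted in ranks 1 to 3; the function name `average_rank` is hypothetical.

```python
def average_rank(r1, r2, r3, n_cases):
    """AR = (R1 + 2*R2 + 3*R3 + 4*R4) / N_cases.

    ASSUMPTION: R4 is not listed in the tables, so it is taken to be the
    number of test cases not counted in ranks 1 to 3.
    """
    r4 = n_cases - (r1 + r2 + r3)
    return (r1 + 2 * r2 + 3 * r3 + 4 * r4) / n_cases


# Rank counts copied from the FFT table (75 test cases).
fft_counts = {
    "BAVE": (64, 9, 2),
    "BAVE_M": (63, 8, 3),
    "MSLBL": (31, 35, 8),
    "MSLBL_M": (32, 34, 9),
}
for name, counts in fft_counts.items():
    print(name, round(average_rank(*counts, 75), 2))
```

Under this assumption the computed values round to 1.17, 1.23, 1.72 and 1.69, which agree with the AR column of the FFT table.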

Table 10. Table X: Ranking counts for the Gaussian workflow, 60 test cases

RANK	1	2	3	AR
Gurobi	15	0	0	/
BAVE	35	14	5	1.70
BAVE_M	41	16	3	1.36
MSLBL	15	10	20	2.60
MSLBL_M	18	14	22	2.26
HBCS	12	0	0	/
BHEFT	3	10	0	/

Table 11. Table XI: Ranking counts for the CyberShake workflow, 15 test cases

RANK	1	2	3	AR
BAVE	6	3	4	2.13
BAVE_M	10	4	1	1.40
MSLBL	6	4	5	1.93
MSLBL_M	8	5	2	1.60
HBCS	3	0	0	/
BHEFT	3	0	0	/

Table 12. Table XII: Ranking counts for the Epigenomics workflow, 15 test cases

RANK	1	2	3	AR
BAVE	6	2	4	2.27
BAVE_M	10	4	1	1.40
MSLBL	4	2	6	2.53
MSLBL_M	7	6	2	1.67
HBCS	3	0	0	/
BHEFT	2	1	0	/

Table 13. Table XIII: Ranking counts for the Inspiral workflow, 15 test cases

RANK	1	2	3	AR
BAVE	7	0	3	2.40
BAVE_M	8	5	1	1.67
MSLBL	4	1	6	2.67
MSLBL_M	6	7	2	1.73
HBCS	3	0	0	/
BHEFT	1	0	0	/

Table 14. Table XIV: Ranking counts for the Montage workflow, 15 test cases

RANK	1	2	3	AR
BAVE	8	3	1	1.93
BAVE_M	11	1	3	1.47
MSLBL	3	4	4	2.60
MSLBL_M	5	3	6	2.20
HBCS	3	0	0	/
BHEFT	1	1	0	/

Table 15. Table XV: Ranking counts for the Sipht workflow, 15 test cases

RANK	1	2	3	AR
BAVE	7	1	0	2.47
BAVE_M	6	3	6	2.00
MSLBL	4	5	4	2.33
MSLBL_M	8	5	1	1.67
HBCS	3	0	0	/
BHEFT	1	1	0	/

Table 16. Table XVI: Ranking counts for the random workflow, 45 test cases

RANK	1	2	3	AR
Gurobi	15	0	0	/
BAVE	32	12	1	1.31
BAVE_M	32	12	1	1.31
MSLBL	11	8	19	2.48
MSLBL_M	11	18	15	2.13
HBCS	8	1	0	/
BHEFT	9	1	0	/

Table 17. Table XVII: Ranking counts for the workflow obtained from an Internet company, 45 test cases

RANK	1	2	3	AR
Gurobi	15	0	0	/
BAVE	20	19	3	1.75
BAVE_M	33	12	0	1.26
MSLBL	10	8	18	2.60
MSLBL_M	10	13	18	2.35
HBCS	7	2	0	/
BHEFT	0	4	1	/

Table 18. Table XVIII: Multiple large-sized workflows; VM Sufficiency: Scarce, $N=9744$

$\phi$	BAVE	BAVE_M	ROUND ROBIN	RANDOM
0.0	10334	10334	10334	10334
0.25	7083	6912	7114	7244.06
0.50	3740	3385	3876	4219.71
0.75	1018	1936	1788	2124.2
1.00	878	855	879	1091.17

Table 19. Table XIX: Multiple large-sized workflows; VM Sufficiency: Normal, $N=9744$

$\phi$	BAVE	BAVE_M	ROUND ROBIN	RANDOM
0.0	8791	8791	8791	8791
0.25	6015	5843	6400	6344.45
0.50	2908	3195	3342	3211.83
0.75	834	815	1278	1348.84
1.00	815	815	891	894.06

Table 20. Table XX: Multiple large-sized workflows; VM Sufficiency: Sufficient, $N=9744$

$\phi$	BAVE	BAVE_M	ROUND ROBIN	RANDOM
0.0	6989	6989	6989	6989
0.25	4971	4616	4447	4709.12
0.50	1900	1707	2195	2179.32
0.75	725	695	970	984.96
1.00	725	695	777	777.74

Equations

$$\sum_{k\in\cal V}\sum_{t\in\cal T}x_{jk}^{t}=1,\quad\forall j\in{\cal J}.$$

$$\sum_{k\in\cal V}\sum_{t\in\cal T}C_{k}x_{jk}^{t}\geq c_{j},\quad\forall j\in{\cal J}.$$

$$\sum_{k\in\cal V}\sum_{t\in\cal T}M_{k}x_{jk}^{t}\geq m_{j},\quad\forall j\in{\cal J}.$$

$$\Big(\sum_{k\in\cal V}\sum_{t\in\cal T}t\,x_{ik}^{t}-\sum_{k\in\cal V}\sum_{t\in\cal T}(t+R_{jk})\,x_{jk}^{t}\Big)\,l_{ij}\geq 0,\quad\forall i,j\in{\cal J}.$$

$$\sum_{i\in\cal J}\sum_{r=\max(0,t-R_{ik}+1)}^{t}x_{ik}^{r}\leq 1,\quad\forall k\in{\cal V},\ t\in{\cal T}.$$

$$\sum_{i\in\cal J}\sum_{r=\max(0,t-R_{ik}+1)}^{t}x_{ik}^{r}=1.$$

$$\sum_{k\in\cal V}\sum_{t\in\cal T}(t+R_{jk})\,x_{jk}^{t}\leq d,\quad\forall j\in{\cal J}.$$

$$\sum_{k\in\cal V}\sum_{j\in\cal J}p_{k}R_{jk}\sum_{t\in\cal T}x_{jk}^{t}\leq D.$$

$$\min\ \ldots\quad\text{s.t.}\ \ldots\quad x,\,y\ \text{binary},\ d\ \text{integer}.$$

$$\bar{R}_{j}$$

$$w_{j_{exit}}$$

$$w_{j}$$

$$p_{ji}=\frac{1}{|succ(j)|},\quad\forall i,j\ \text{where}\ l_{ij}=1.$$

$$\sum_{j\in\cal J}\pi_{j}=1,$$

$$\sum_{i}p_{ij}\pi_{i}=\pi_{j},\quad\forall j.$$

$$D_{j}^{min}=\min_{k\in{\cal V}_{j}}\{p_{k}R_{jk}\}.$$

$$D\geq\sum_{j\in\cal J}D_{j}^{min}.$$

$$D_{j}^{reserve}=D_{j}^{min}+\frac{D-\sum_{j\in\cal J}D_{j}^{min}}{|{\cal J}|}.$$

$$\min\ \ldots\quad\text{s.t.}\ \ldots\quad k\in{\cal V}_{j},$$

$$D_{remain}=D_{remain}+(D_{j}^{reserve}-p_{k}R_{jk}).$$

$$D=D_{min}+\phi\,(D_{max}-D_{min}),$$

$$AR=\frac{R_{1}+2R_{2}+3R_{3}+4R_{4}}{N_{cases}},$$

Taxonomy

Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Scheduling and Optimization Algorithms

Full text

Workflow Scheduling in the Cloud with Weighted Upward-rank Priority Scheme Using Random Walk and Uniform Spare Budget Splitting

Hang Zhang, Xiaoying Zheng (corresponding author), Ye Xia, and Mingqi Li

H. Zhang is with the School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China. H. Zhang is also with Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai, 201210, China. E-mail: ([email protected])

X. Zheng and M. Li are with Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai, 201210, China. E-mail: (zhengxy, [email protected])

Y. Xia is with the Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611, USA. E-mail: ([email protected])

Abstract

We study the difficult problem of scheduling complex workflows with precedence constraints under a limited budget in the cloud environment. We first formulate the scheduling problem as an integer programming problem, which can be solved and used as a performance baseline. We then consider the traditional approach of scheduling jobs in a prioritized order based on the upward rank of each job. For jobs with no precedence constraints among themselves, the plain upward-rank priority scheme assigns priorities in an arbitrary way. We propose a job prioritization scheme that uses the stationary probabilities of a Markov chain as a measure of the importance of jobs. The scheme keeps the precedence order for jobs that have precedence constraints between each other, and assigns priorities according to the jobs’ importance for jobs without precedence constraints. Finally, we design a uniform spare budget splitting strategy, which splits the spare budget uniformly across all the jobs. We test our algorithms on a variety of workflows, including FFT, Gaussian elimination, typical scientific workflows, randomly generated workflows, and workflows from an in-production cluster of an online streaming service company. We compare our algorithms with state-of-the-art algorithms. The empirical results show that, on average, the uniform spare budget splitting scheme outperforms the scheme that splits the budget in proportion to extra demand in most cases, and that the Markovian-based prioritization further improves the workflow makespan.

Index Terms:

Workflow Scheduling, Heterogeneous clouds, Budget constraints, Precedence constraints, Schedule length.

I Introduction

There is an increasing trend to use the cloud for complex workflows, such as scientific computing workflows and big-data analytics [1] [2] [3]. Customers submit their workflow processing requests together with their budgets to the cloud. The workflow management system in the cloud assigns the processing requests to appropriate virtual machines (VMs) by jointly considering the requests, the VM capabilities, and the budget. Ideally, the customers’ service level agreements are met and the objective of the cloud provider is optimized. However, current workflow management systems are inadequate for scheduling complex workflows with diverse requirements on heterogeneous virtual machines. This results in long processing latency, wasted cloud resources, and poor return on investment.

This paper investigates a workflow scheduling problem in the cloud with budget constraints. More specifically, a set of workflows is to be placed in the cloud. Each workflow has multiple computation jobs with precedence constraints among them. For each workflow, we use a directed acyclic graph (DAG) to represent the precedence constraints of the jobs. A job has an execution time, which depends on where the job is placed and how many computing resources are allocated to it. A job also has minimum computation resource requirements, including CPU and memory requirements. The jobs are placed on a limited set of VMs. The customer is charged only for the period when a VM is used, i.e., on a pay-as-you-go basis. This describes the use case of on-demand VMs in Amazon EC2. The decision problem we consider in this paper is where and when to place each job, i.e., which VM will execute each job and when the execution starts. The precedence constraints and the budget constraints must be satisfied. Furthermore, all the resource capacity constraints at the placement targets must be respected. The optimization objective is to minimize the processing time of the set of workflows, i.e., the makespan of the workflows.

Scheduling a workflow represented by a directed task graph is a well-known NP-complete problem in general [4] [5]. The precedence constraints among jobs make the scheduling hard, and many efforts have been made to find efficient heuristics in the areas of parallel computing and grid computing. Topcuoglu et al. proposed an upward-rank based heuristic in [6] to tackle the precedence constraints. In the upward-rank based approach, each job computes its accumulated processing time from the exit job upward to itself along the critical (i.e., the longest) path as its upward rank. Jobs are then scheduled in non-increasing order of their ranks. For jobs with precedence constraints, the upward-rank based scheme assigns priorities in a reasonable way; but for jobs with no precedence constraints among themselves, it assigns priorities in an arbitrary fashion. In this work, we propose to assign priorities to those unrelated jobs by considering the jobs’ importance in the global DAG topology. We construct a random walk on the (extended) workflow DAG, and use the stationary distribution of the random walk as the jobs’ importance (i.e., weights). The rationale is that the stationary probabilities are computed recursively across the global topology and carry the global information of all states (jobs) propagated back to each state (job); therefore, the resulting stationary probabilities reflect the jobs’ importance in the global topology. The other issue is that, in parallel computing and grid computing, workflow scheduling often aims to minimize the makespan without considering the cost of the computing facility. In the era of the cloud, the leasing cost of the cloud facility brings a new challenge to scheduling DAG-based workflows. Since jobs are scheduled in a prioritized order and often greedily, how the budget is split and reserved for each job remains a heuristic. In this work, we propose to reserve the minimum required budget for each job, and to assign the spare budget uniformly across the jobs.
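
To make the random-walk idea concrete, here is a minimal Python sketch that computes stationary probabilities on a small digraph. It uses the transition rule listed under Equations, where the walk moves from job $j$ to each of its successors with probability $1/|succ(j)|$. Everything else is an assumption made for illustration: the diamond-shaped workflow is hypothetical, the exit job is linked back to the entry job to make the chain irreducible (the paper's actual DAG extension may differ), and the function name `stationary_distribution` is invented. The iteration uses a lazy walk (stay put with probability 1/2), which has the same stationary distribution as the original walk but avoids oscillation on periodic graphs.

```python
def stationary_distribution(succ, iterations=500):
    """Stationary probabilities of a random walk on a workflow digraph.

    succ maps each job to the non-empty list of jobs the walk can step to.
    From job j the walk moves to each successor with probability 1/|succ(j)|.
    """
    jobs = sorted(succ)
    pi = {j: 1.0 / len(jobs) for j in jobs}  # start from the uniform distribution
    for _ in range(iterations):
        nxt = {j: 0.5 * pi[j] for j in jobs}  # lazy step: stay with probability 1/2
        for j in jobs:
            share = 0.5 * pi[j] / len(succ[j])
            for i in succ[j]:
                nxt[i] += share
        pi = nxt
    return pi


# Hypothetical diamond workflow: job 0 -> {1, 2} -> job 3.
# ASSUMPTION: exit job 3 is linked back to entry job 0 so the chain is
# irreducible; this is not necessarily the extension used in the paper.
succ = {0: [1, 2], 1: [3], 2: [3], 3: [0]}
pi = stationary_distribution(succ)
print({j: round(p, 4) for j, p in pi.items()})
```

In this toy graph the entry and exit jobs each receive probability 1/3 and the two parallel jobs 1/6 each, so jobs through which more paths pass receive larger weights.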

We summarize the contributions of our work.

• We formulate an integer programming model of the DAG-based workflow scheduling problem with budget constraints. The model can be solved by integer programming solvers such as Gurobi [7], and the solution can be used as the performance baseline for different heuristics.

• We propose a weighted upward-rank priority scheme that assigns the scheduling priorities to the jobs. On average, it leads to improved performance compared with the plain upward-rank priority scheme in [6]. The weights in our scheme are the stationary probabilities of a random walk on the workflow digraphs.

• We assign the spare budget uniformly across all the jobs. The empirical results show that, in most cases, the uniform spare-budget-splitting scheme outperforms on average the scheme that splits the budget in proportion to extra demand.
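
The uniform splitting rule can be illustrated with the data of Tables I and II. The sketch below reserves for each job its cheapest possible cost, $D_{j}^{min}=\min_{k}\{p_{k}R_{jk}\}$, and then adds an equal share of the spare budget, following the reserve formula listed under Equations. It assumes that every job may run on every VM (the eligible set ${\cal V}_{j}$ is not restricted), and the function name `uniform_reserve` is hypothetical.

```python
# Price per unit time of each VM (Table I) and execution times (Table II).
PRICE = {1: 3, 2: 5, 3: 6}
EXEC_TIME = {
    1: (16, 14, 7), 2: (19, 13, 16), 3: (17, 11, 10), 4: (13, 8, 15),
    5: (12, 13, 8), 6: (13, 16, 7), 7: (6, 16, 9), 8: (12, 11, 5),
    9: (8, 9, 11), 10: (21, 7, 14), 11: (12, 8, 16), 12: (21, 7, 14),
}


def uniform_reserve(budget, price, exec_time):
    """Reserve each job's cheapest cost plus an equal share of the spare budget."""
    # ASSUMPTION: every VM is eligible for every job.
    d_min = {
        j: min(price[k] * times[k - 1] for k in price)
        for j, times in exec_time.items()
    }
    spare = budget - sum(d_min.values())
    if spare < 0:
        raise ValueError("budget is below the minimum feasible cost")
    share = spare / len(d_min)
    reserve = {j: d_min[j] + share for j in d_min}
    return reserve, d_min


reserve, d_min = uniform_reserve(500, PRICE, EXEC_TIME)
print(sum(d_min.values()))   # total minimum cost
print(round(reserve[1], 2))  # reserved budget of job 1
```

With a budget of 500, the minimum total cost is 442, so each job receives an extra 58/12. Job 1 gets about 46.83, which appears consistent with the rounded budget of 47 listed for job 1 in Table VI.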

The remainder of the paper is organized as follows. In Section II, we discuss related work. In Section III, we formulate the workflow scheduling problem as an integer programming problem. We describe the weighted upward-rank priority scheme based on a random walk and the uniform spare budget splitting heuristic in Section V. We evaluate the heuristics on empirical test cases in Section VI. Finally, we draw conclusions in Section VII.

II Related Works

DAG-based workflow scheduling has been extensively studied in the literature of parallel computing and grid computing. The survey paper [8] summarizes a wide spectrum of algorithms for DAG-based workflow scheduling in a multi-processor environment, including branch-and-bound, integer programming, searching, randomization, and genetic algorithms. Topcuoglu et al. proposed the Heterogeneous Earliest-Finish-Time (HEFT) algorithm in [6]. The HEFT algorithm first computes the upward rank of each task by traversing the task graph; it then sorts the tasks in non-increasing order of their upward-rank values, and assigns each task in the sorted list to the fastest available processor. Upward-rank based task prioritization achieves good performance and has become an important approach to DAG-based workflow scheduling. Daoud and Kharma studied a similar problem in [9] and designed the longest dynamic critical path (LDCP) algorithm. The LDCP algorithm introduces a DAG for each processor, named DAGP, with the sizes of all the tasks set to their computation costs on that processor. It computes the upward rank of each task within a DAGP to obtain more precise task priorities. The task with the highest upward rank among all DAGPs is assigned with priority to the proper processor, and all DAGPs are updated after the assignment. Ties are broken by choosing the task with the largest number of outgoing edges. LDCP has better scheduling performance than HEFT, but higher complexity. The work in [10] studies the problem of minimizing the execution time of a workflow in heterogeneous environments and designs an ant-colony based heuristic algorithm. The heuristic generates task sequences considering both the forward and backward (i.e., global) dependency of tasks, where the forward dependency is defined as the number of predecessors and the backward dependency as the number of successors. In each round of searching, the algorithm looks for a suitable machine with a greedy minimum strategy. The work in [10] aligns with our opinion that not only the jobs on the critical path but also other jobs should be accounted for when computing the scheduling priority.
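
The upward-rank computation that HEFT-style heuristics rely on can be sketched in a few lines of Python. The workflow, the average execution times, and all names below are hypothetical, and communication costs between tasks are left out as a simplification, so this is an illustration of the ranking idea rather than the exact procedure of [6].

```python
from functools import lru_cache

# Hypothetical workflow: task -> list of successor tasks.
SUCC = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
# Hypothetical average execution time of each task over the candidate machines.
AVG_TIME = {"a": 4.0, "b": 6.0, "c": 3.0, "d": 5.0}


@lru_cache(maxsize=None)
def upward_rank(task):
    """Length of the longest path from `task` to an exit task, inclusive."""
    tail = max((upward_rank(s) for s in SUCC[task]), default=0.0)
    return AVG_TIME[task] + tail


# Schedule in non-increasing rank order. A predecessor always outranks its
# successors because execution times are positive.
order = sorted(SUCC, key=upward_rank, reverse=True)
print([(t, upward_rank(t)) for t in order])
```

Here the ranks are 15 for `a`, 11 for `b`, 8 for `c`, and 5 for `d`. Tasks `b` and `c` have no precedence relation with each other, so their relative order is decided only by their path lengths.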

As more and more workflows are moved to the cloud, scheduling DAG-based workflows faces the new challenge of scheduling tasks under budget constraints. Recently, several studies have addressed the budget-constrained workflow makespan minimization problem in the cloud environment [2] [11] [12]. Wang and Shi [2] consider a special $\kappa$-stage MapReduce-like workflow where each stage consists of a batch of concurrent jobs. Their approach first greedily allocates budget to the slowest job of each stage across all the stages, aiming to minimize the execution time of each stage. It then gradually refines the budget allocation across the stages and schedules the concurrent jobs of each stage based on the budget. Shu and Wu [11] study a workflow mapping problem to minimize the workflow makespan under a budget constraint in public clouds. The work assumes that a job consists of homogeneous tasks and that there is an unlimited number of VMs in the cloud. It pre-computes the most expensive schedule and the cheapest schedule based on the concept of the critical path, and applies binary search to find an approximate solution. The work in [12] considers a budget-constrained workflow scheduling heuristic in a heterogeneous cloud environment. The heuristic schedules the tasks in a prioritized order based on the upward rank of each task [6]. The main idea of the algorithm is to split the budget and reserve a part for each individual task. It first assigns each task the minimum budget, equal to the cost of using the cheapest VM; then, the remaining budget is split so that each task gets an additional share in proportion to the cost difference between using the cheapest VM and using the most expensive VM. By reserving the minimum budget for each task, the algorithm is guaranteed to find a feasible solution. By splitting the extra budget in proportion to each task’s extra cost demand, the heuristic reserves more spare budget for the tasks with lower priorities. These tasks enjoy more flexibility in selecting better VMs. Sakellariou et al. considered the facility cost in a grid environment [13]. They propose two approaches, LOSS and GAIN, to find a minimum-makespan solution under a budget constraint. The LOSS approach starts with the schedule produced by the HEFT algorithm, and keeps swapping tasks to cheaper machines until the budget constraint is satisfied. The GAIN approach starts with the cheapest solution, and keeps swapping tasks to faster machines whenever there is available budget. The work in [14] extends the HEFT algorithm in [6] and proposes a Budget-constrained HEFT algorithm (BHEFT). The BHEFT algorithm assigns scheduling priorities based on the upward rank. It splits the budget among the tasks based on each task’s average cost over different resources; if there is additional spare budget, it is assigned to each task in proportion to its demand. Given the budget for each individual task, the BHEFT algorithm always assigns the fastest affordable resource to a task. Arabnejad and Barbosa worked on a similar DAG scheduling problem in [15] and proposed the HBCS algorithm. Its task prioritization is also based on the upward rank. The HBCS algorithm computes a worthiness indicator that jointly considers the cost, the remaining budget, and the speed of each processor, and assigns a task to the processor with the highest worthiness.
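
For comparison with uniform splitting, the proportional splitting rule described above for [12] can be sketched as follows. The code follows only the prose description (minimum cost per task plus a share of the spare budget proportional to the gap between the most expensive and the cheapest VM cost); it is not taken from [12], and the cost figures and the name `proportional_reserve` are hypothetical.

```python
def proportional_reserve(budget, cost_min, cost_max):
    """Minimum cost per task plus a share of the spare budget that is
    proportional to (most expensive cost - cheapest cost)."""
    spare = budget - sum(cost_min.values())
    if spare < 0:
        raise ValueError("budget is below the minimum feasible cost")
    extra = {j: cost_max[j] - cost_min[j] for j in cost_min}
    total_extra = sum(extra.values())
    if total_extra == 0:
        return {j: float(c) for j, c in cost_min.items()}
    return {j: cost_min[j] + spare * extra[j] / total_extra for j in cost_min}


# Hypothetical per-task costs on the cheapest and the most expensive VM.
cost_min = {"a": 10, "b": 20, "c": 30}
cost_max = {"a": 20, "b": 60, "c": 80}
shares = proportional_reserve(100, cost_min, cost_max)
print(shares)
```

With a budget of 100 the spare budget is 40, and the tasks receive 14, 36, and 50 respectively. A uniform split of the same spare budget would instead add 40/3 to every task regardless of its cost gap.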

Some studies consider the min-cost workflow scheduling problem under a processing deadline constraint. Abrishami et al. proposed the IaaS cloud partial critical paths (IC-PCP) algorithm in [16] to minimize the execution cost of a workflow under a deadline constraint. The key ideas are the critical parent and partial critical paths (PCPs). The critical parent of a task is its unassigned parent that has the latest finish time. A PCP consists of a task and its critical parents. The algorithm schedules the tasks in a PCP as a pack, and assigns the pack to the cheapest VM that can meet the sub-deadline of the PCP. Sahni and Vidyarthi proposed the just-in-time (JIT-C) algorithm as a follow-up to IC-PCP [17]. It first checks the feasibility of the customer’s deadline requirement. With a feasible deadline, the algorithm starts from the entry tasks and steps into a monitoring control loop. Within each control loop, it identifies the tasks whose parent tasks have been scheduled and are running, and assigns each of these tasks to the cheapest VM satisfying its sub-deadline requirement.

Regarding the scheduling of multiple workflows, several different scheduling strategies have been proposed. The work in [18] focuses on how to schedule multiple workflows onto a set of heterogeneous resources and minimize the makespan. It proposes four policies to create a composite DAG: common entry and common exit node, level-based ordering, alternating DAGs, and ranking-based composition. It defines a slowdown metric as the ratio of the finish time achieved when a workflow is scheduled individually to the finish time achieved when the workflow is scheduled together with other workflows. It aims to achieve fairness across workflows by minimizing the largest slowdown value when scheduling jobs. The work in [19] uses a heterogeneous priority rank value that includes the out-degree of a task as a weight in the evaluation of task priorities. It further proposes three scheduling strategies across multiple workflows: round-robin, priority-based, and a trade-off between round-robin and priority. Rodriguez and Buyya [20] proposed an elastic resource provisioning and scheduling algorithm for multiple workflows, which aims to minimize the overall cost of leasing resources while meeting the independent deadline constraints of the workflows.

Wang and Xia explored using mixed integer programming (MIP) to formulate and solve complex workflow scheduling problems as building blocks of large-scale scheduling problems [21]. The scheduling problems considered in [21] minimize the cost under a deadline constraint. Meena et al. [22] aimed at finding schedules that minimize the execution cost while meeting the deadline in a cloud computing environment. They employed a PerVar parameter to record the performance variation of VMs and proposed a Cost Effective Genetic Algorithm (CEGA) to generate schedules. Li et al. [23] focused on a problem similar to that of [22] and captured dynamic performance fluctuations of VMs with a time-series-based approach. With the VM performance forecast information, they designed a genetic algorithm that fulfills the Service-Level-Agreement. The work in [24] develops a scheduling system to minimize the expected monetary cost given user-specified probabilistic deadline guarantees in IaaS clouds. It focuses on dealing with the price and performance dynamics in clouds and does not assume precedence constraints in workflows. Zheng et al. [25] studied the problem of improving the utility of cloud computing by allowing partial execution of jobs. The workflows in their setting consist of parallel homogeneous preemptable tasks without precedence constraints, and the work proposes efficient online multi-resource allocation algorithms. Champati and Liang considered the job-machine assignment problem in a setting where jobs have placement constraints and machines are heterogeneous [26]; there are no precedence constraints either. They developed an efficient algorithm to minimize the sum-cost.

III Problem Formulation

In this section, we describe the cloud system and the problem formulation. The formulation here overlaps with the one in [21]. Assume there is a set of cloud computing workflows denoted by ${\cal W}=\{1,2,\ldots,W\}$. Each workflow $w\in\cal W$ contains one or more jobs. The total pool of jobs is denoted by ${\cal J}=\{1,2,\ldots,J\}$. Each job $j\in\cal J$ belongs to exactly one workflow $w\in\cal W$. Let ${\cal J}_{w}$ denote the set of jobs belonging to workflow $w$. The minimum CPU requirement of job $j$ is denoted by $c_{j}$, and its minimum memory requirement by $m_{j}$. In a workflow, a job can depend on other jobs, i.e., a job cannot start until some other jobs finish execution. The job dependency is usually captured by a workflow DAG. Each job in the workflow is a node in the graph, and the dependency relations are represented by directed edges between nodes. It is more convenient for us to represent the job dependency DAG as a matrix $L=(l_{ij})$, $\forall i,j\in\cal J$. If job $i$ depends on job $j$, we set $l_{ij}=1$; $l_{ij}=0$ means that job $i$ does not depend on job $j$. If $l_{ij}=1$, then the start time of job $i$ should be no earlier than the finish time of job $j$, which is a precedence constraint.

For the cloud system resources, we consider a set of virtual machines (VMs) ${\cal V}=\{1,2,\ldots,V\}$, possibly of different types and capabilities. Let $C_{k}$ represent the number of vCPUs of VM $k$, and $M_{k}$ the amount of memory of VM $k$. We assume a discrete time model, where time is divided into a sequence of time slots $1,2,\ldots,T$ of, for instance, $5$ minutes each. At any time slot $t$, at most one job can be allocated to any VM. We also assume non-preemptive scheduling of jobs. We characterize the amount of computation of job $j$ in terms of vCPU-time-slots, denoted by $h_{j}$. Therefore, when job $j$ runs on VM $k$, the running time of job $j$, $R_{jk}$, can be computed as $R_{jk}=h_{j}/C_{k}$, which is measured in number of time slots. We consider the popular pay-as-you-go cloud computing model, which charges based on the operating time of VMs. Suppose that after running VM $k$ for a unit of time, the user is charged a cost of $p_{k}$. Suppose all the workflows in question belong to the same user, who has a total budget of $D$. We consider the problem of minimizing the finish time of all the workflows, i.e., the makespan, subject to the budget constraint and various other constraints. More specifically, for each job, we decide the VM and the starting time slot to which the job is assigned. The goal is to keep the overall VM leasing cost within the budget $D$ while minimizing the makespan of all the workflows.
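
As a small worked example of the running-time and cost model, the Python sketch below evaluates $R_{jk}=h_{j}/C_{k}$ and the leasing cost $p_{k}R_{jk}$ for two VM types from Table VII. The job size of 24 vCPU-time-slots is hypothetical, the 5-minute slot length is the example value mentioned in the text, and converting the hourly price to a per-slot price is an assumption about how $p_{k}$ would be derived.

```python
SLOT_MINUTES = 5  # example slot length mentioned in the text


def running_time(h_j, c_k):
    """R_jk = h_j / C_k, measured in time slots."""
    return h_j / c_k


def leasing_cost(h_j, c_k, price_per_hour):
    """Cost p_k * R_jk, with the hourly price converted to a per-slot price."""
    # ASSUMPTION: p_k is the hourly price scaled to one time slot.
    p_k = price_per_hour * SLOT_MINUTES / 60
    return p_k * running_time(h_j, c_k)


# Hypothetical job needing 24 vCPU-time-slots, on two VM types from Table VII.
for name, vcpus, price in [("t2.medium", 2, 0.0464), ("m5.xlarge", 4, 0.192)]:
    print(name, running_time(24, vcpus), round(leasing_cost(24, vcpus, price), 4))
```

In this example the 4-vCPU machine halves the running time (6 slots instead of 12) but roughly doubles the cost (about 0.096 instead of 0.0464), which is the time-versus-budget trade-off that the scheduler has to resolve.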

Next, we specify the various constraints. Let us denote the job-VM assignment decision by the binary variables $x_{jk}^{t}$ . We set $x_{jk}^{t}=1$ if and only if job $j$ is assigned to VM $k$ and it starts at time slot $t$ . For each job $j$ , only one of the $x_{jk}^{t}$ is equal to $1$ .

$$\sum_{k\in\cal V}\sum_{t\in\cal T}x_{jk}^{t}=1,\quad\forall j\in{\cal J}. \tag{1}$$

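When we choose the appropriate VM for job $j$, job $j$’s minimum resource requirements must be satisfied.

$$\sum_{k\in\cal V}\sum_{t\in\cal T}C_{k}x_{jk}^{t}\geq c_{j},\quad\forall j\in{\cal J}. \tag{2}$$

$$\sum_{k\in\cal V}\sum_{t\in\cal T}M_{k}x_{jk}^{t}\geq m_{j},\quad\forall j\in{\cal J}. \tag{3}$$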

Let us discuss the precedence constraint. We note that the precedence constraint is active only if $l_{ij}=1$ . The start time of job $i$ can be defined as $\sum_{k\in\cal V}\sum_{t\in\cal T}tx_{ik}^{t}$ . The finish time of job $j$ can be described as $\sum_{k\in\cal V}\sum_{t\in\cal T}(t+R_{jk})x_{jk}^{t}$ . The precedence constraint says that if job $i$ depends on job $j$ , then job $i$ cannot start earlier than the finish time of job $j$ .

[TABLE]

There is one additional constraint that at most one job runs on a VM at any time.

[TABLE]

We explain constraint (5) in more detail. If job $i$ ’s execution occupies time slot $t$ of VM $k$ , then job $i$ ’s start time is in the set $\{\max(0,t-R_{ik}+1),\ldots,t\}$ . This is equivalent to saying that $\sum_{r=\max(0,t-R_{ik}+1)}^{t}x_{ik}^{r}=1$ for job $i$ . Since at most one job can run on VM $k$ at any time slot $t$ , we have (5). We now show that (5) is sufficient to guarantee the existence of a non-preemptive schedule. Suppose for job $j$ , $x_{jk}^{s}=1$ for some time slot $s$ and some VM $k$ . For each time slot $t$ from $s$ to $s+R_{jk}-1$ , together with (5) and $x_{jk}^{s}=1$ , we have

[TABLE]

Thus, for each $i\not=j$ , $x_{ik}^{r}=0$ for $r\in\{\max(0,t-R_{ik}+1),\ldots,t\}$ . By varying $t$ from $s$ to $s+R_{jk}-1$ , we see that job $i$ cannot start in $\{\max(0,s-R_{ik}+1),\ldots,s+R_{jk}-1\}$ . We conclude that no other job can interfere with job $j$ ’s execution.
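To make the window form of constraint (5) concrete, the following minimal Python sketch checks a candidate schedule slot by slot. It is an illustration only: the function names `window_load` and `is_non_overlapping` and the dictionary layout are assumptions introduced here, time slots are indexed from $0$, and running times are taken to be integers.

```python
def window_load(starts, run_time, vm, t):
    """Left-hand side of the capacity constraint for VM `vm` at time slot `t`.

    starts:   dict job -> (vm, start_slot), i.e. the pairs with x_{jk}^{t} = 1
    run_time: dict (job, vm) -> running time R_jk in time slots (integers)
    """
    load = 0
    for job, (k, s) in starts.items():
        if k != vm:
            continue
        lo = max(0, t - run_time[(job, k)] + 1)
        if lo <= s <= t:  # the job started inside the window, so it occupies slot t
            load += 1
    return load


def is_non_overlapping(starts, run_time, vms, horizon):
    """True if no VM hosts more than one job in any slot 0..horizon."""
    return all(window_load(starts, run_time, k, t) <= 1
               for k in vms for t in range(horizon + 1))


# Hypothetical data: two jobs on VM 1 with running times 3 and 2.
run_time = {("a", 1): 3, ("b", 1): 2}
print(is_non_overlapping({"a": (1, 0), "b": (1, 3)}, run_time, [1], 10))  # True
print(is_non_overlapping({"a": (1, 0), "b": (1, 2)}, run_time, [1], 10))  # False
```

In the second call, job `b` starts at slot $2$ while job `a` still occupies it, so the window sum at $t=2$ equals $2$ and the check fails.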

Let the variable $d$ denote an upper bound of the finish time of all the workflows. We have

[TABLE]

The budget constraint of executing the workflows can be written as:

[TABLE]

The workflow scheduling problem with the pay-as-you-go pricing model can be written as follows:

[TABLE]

Note that data transfer costs between jobs are not directly considered in the formulation (9). We assume that data transfer takes place in the internal network of a datacenter, and the transfer rate is stable. Therefore, the data transfer time between each pair of jobs is a constant and can be included as a part of the job’s running time $R_{jk}$ [17] [22] [27].
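As an illustration of how a candidate solution relates to the quantities in this section, the sketch below computes the makespan and leasing cost of a schedule and checks the precedence and budget constraints. It is a hedged example, not part of the formulation: the function name `evaluate_schedule` and the data layout are assumptions, and it further assumes that a job is billed $p_{k}R_{jk}$ for the slots it occupies on VM $k$ . Resource requirements and VM overlap are not checked here.

```python
def evaluate_schedule(starts, run_time, price, depends_on, budget):
    """Return (makespan, cost, feasible) for a candidate schedule.

    starts:     dict job -> (vm, start_slot)
    run_time:   dict (job, vm) -> R_jk in time slots
    price:      dict vm -> p_k, cost per time slot
    depends_on: iterable of (i, j) pairs meaning job i depends on job j
    budget:     total budget D
    """
    # Finish time of each job: start slot plus running time on its VM.
    finish = {j: s + run_time[(j, k)] for j, (k, s) in starts.items()}
    makespan = max(finish.values())
    # Assumed billing model: each job pays p_k for every slot it runs.
    cost = sum(price[k] * run_time[(j, k)] for j, (k, _) in starts.items())
    # Job i may not start before the job j it depends on has finished.
    precedence_ok = all(starts[i][1] >= finish[j] for i, j in depends_on)
    return makespan, cost, precedence_ok and cost <= budget


# Hypothetical instance: job "b" depends on job "a".
run_time = {("a", 1): 3, ("b", 2): 2}
price = {1: 2, 2: 5}
starts = {"a": (1, 0), "b": (2, 3)}
print(evaluate_schedule(starts, run_time, price, [("b", "a")], budget=20))
```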

III-A Solve the problem by MIP software

The Min-Makespan problem (9) is a complex integer programming problem and is usually hard to solve. Gurobi is a state-of-the-art MIP solver and is capable of solving small to medium-sized problems. We will use Gurobi to solve some instances of the Min-Makespan problem. The goal, however, is to provide a baseline for performance comparison with the heuristic algorithm that we will propose in Section V.

IV A Motivating Example

Consider a workflow with $12$ jobs shown in Fig. 1. There are $3$ VMs and the leasing cost of each VM is shown in Table I. The execution time of each job on each VM is shown in Table II.

In the well-known priority-based greedy algorithm in [6], each job is assigned an upward-rank, which is a numerical value. The jobs are sorted in non-increasing order of upward-rank, and the resulting ordered list gives the priorities according to which the jobs are assigned to the VMs.

IV-A Job scheduling priorities

The upward-rank of a job $j$ is recursively defined as

[TABLE]

In (10), ${\cal V}_{j}=\{k|k\in{\cal V}\text{ and }C_{k}\geq c_{j},M_{k}\geq m_{j}\}$ is the set of the VMs that have the capacity to accept job $j$ . Then, $\bar{R}_{j}$ is the average processing time of job $j$ over the VMs in the set ${\cal V}_{j}$ . The set $\text{succ}(j)$ is the set of all successor jobs of job $j$ in the workflow DAG. The upward-rank of the exit job in the DAG, $w_{j_{\text{exit}}}$ , is defined as its average processing time. The upward-rank of any other job, $w_{j}$ , can be computed recursively by traversing from the exit job upward as in (12). In fact, the upward-rank of a job is the aggregated upward-ranks along the critical (the longest in terms of upward-rank) path from the exit job to the current job. In the upward-rank-based job scheduling in [6], all the jobs are sorted in non-increasing order of upward-rank; the job with the highest upward-rank is scheduled first, and will be assigned a VM by a separate job-VM matching algorithm, such as the HBCS algorithm in [15]. We will call the priority generation scheme in [6] the plain upward-rank priority scheme.
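The recursion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: it uses the usual form in which a job's rank is its average processing time $\bar{R}_{j}$ plus the largest rank among its immediate successors, with data transfer time already folded into the running time as assumed in Section III. The name `upward_ranks` and the sample DAG are hypothetical.

```python
from functools import lru_cache


def upward_ranks(succ, avg_time):
    """Plain upward-ranks of all jobs.

    succ:     dict job -> list of immediate successors (empty for exit jobs)
    avg_time: dict job -> average processing time over the eligible VMs
    """
    @lru_cache(maxsize=None)
    def rank(j):
        # An exit job has no successors, so its rank is its own average time.
        return avg_time[j] + max((rank(i) for i in succ.get(j, [])), default=0)

    return {j: rank(j) for j in avg_time}


# Hypothetical 4-job workflow: a -> b -> c and a -> d.
succ = {"a": ["b", "d"], "b": ["c"], "c": [], "d": []}
avg_time = {"a": 2, "b": 3, "c": 4, "d": 10}
ranks = upward_ranks(succ, avg_time)
priority_list = sorted(ranks, key=ranks.get, reverse=True)
print(ranks)          # {'a': 12, 'b': 7, 'c': 4, 'd': 10}
print(priority_list)  # ['a', 'd', 'b', 'c']
```

Because every average processing time is positive, a job always ranks above all of its successors, which is the property the priority list relies on.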

In the plain upward-rank priority scheme, equation (12) ensures that the upward-rank of a job is higher than all its successors (including the non-immediate successors). Therefore, a job is selected with a higher priority than all its successors for VM assignment. However, for the jobs that have no precedence constraints among each other, the upward-rank is not a good enough indicator of a job’s scheduling priority.

For instance, in Fig. 1, jobs $n_{3}$ , $n_{8}$ and $n_{9}$ do not depend on each other. As shown in Table III, the plain upward-ranks of $n_{3}$ , $n_{8}$ and $n_{9}$ are all $38$ . Thus, the tie among jobs $n_{3}$ , $n_{8}$ and $n_{9}$ needs to be broken arbitrarily in scheduling. But, based on the DAG in Fig. 1, jobs $n_{8}$ and $n_{9}$ are more intricately related to other jobs in the workflow, and, to shorten the workflow makespan, it might be worthwhile to assign higher priorities to $n_{8}$ and $n_{9}$ . We will later propose a weighted upward-rank priority scheme in Section V. In Table III, we show the ranks and the corresponding order of the jobs. With the weighted ranks, jobs $n_{8}$ and $n_{9}$ have higher priorities than job $n_{3}$ , and will be scheduled earlier than $n_{3}$ . After generating the priority list, we apply the HBCS algorithm from [15] to assign each job to a VM. Tables IV and V show the final scheduling results for the two priority generation schemes, respectively. In Table IV, job $n_{3}$ is scheduled before $n_{8}$ and $n_{9}$ . Job $n_{3}$ occupies the faster $\text{VM}_{2}$ , and the final makespan is $78$ . In Table V, $n_{8}$ and $n_{9}$ are assigned higher priorities because of our new priority generation scheme. Job $n_{8}$ can choose the faster VM, which results in a makespan of $64$ .

Hence, in assigning job scheduling priorities, we need to evaluate the importance of a job by considering not only the jobs on its critical path but also its relationship with other jobs.

IV-B Budget splitting

In HBCS, the spare budget is preferentially assigned to the jobs with higher priorities. Because of the greedy nature of HBCS, the jobs with higher priorities tend to use more expensive and faster VMs, whereas the jobs with lower priorities often have few options because the remaining balance is more limited.

From Table IV and Table V, it can be seen that the available budget for the jobs with lower priorities is very limited under both priority generation schemes. If we split the spare budget evenly as shown in Table VI, more budget will be allocated to jobs with lower priorities. These jobs will enjoy more flexibility in selecting better VMs, which results in a shorter makespan, as shown in Table VI. The conclusion is that the spare budget should be split across the jobs more evenly.

V A Heuristic Algorithm

Motivated by the example in Section IV, we develop a heuristic algorithm to solve the Min-Makespan problem. The algorithm has two key components. One is the weighted upward-rank priority scheme, which uses the stationary distribution of a random walk on the DAG as the weights. The other is uniform spare budget splitting. For scheduling multiple workflows, we make an extended DAG by adding pseudo entry and exit nodes to connect multiple DAGs. The scheduling priorities of the jobs across all the workflows are computed based on the extended DAG. In Fig. 2, we show two typical workflow DAGs. By adding pseudo entry and exit nodes, jobs $n_{0}$ and $n_{13}$ , we have an extended DAG shown in Fig. 3.
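The construction of the extended DAG can be sketched as follows. This is a minimal illustration: the function name `extend_dags`, the node labels `"entry"` and `"exit"`, and the adjacency-list layout are assumptions introduced here, and job names are assumed to be unique across workflows.

```python
def extend_dags(dags, entry="entry", exit_="exit"):
    """Merge workflow DAGs into one DAG with pseudo entry and exit nodes.

    dags: list of dicts job -> list of immediate successors
    """
    merged = {}
    for dag in dags:
        for job, children in dag.items():
            merged.setdefault(job, []).extend(children)
            for child in children:
                merged.setdefault(child, [])
    # Entry jobs have no predecessor; exit jobs have no successor.
    has_parent = {c for children in merged.values() for c in children}
    entries = [j for j in merged if j not in has_parent]
    exits = [j for j, children in merged.items() if not children]
    for j in exits:
        merged[j].append(exit_)   # every original exit job feeds the pseudo exit
    merged[entry] = entries       # the pseudo entry precedes every original entry job
    merged[exit_] = []
    return merged


# Two hypothetical single-edge workflows.
print(extend_dags([{"a": ["b"]}, {"c": ["d"]}]))
```

In such a sketch the pseudo nodes would naturally carry zero processing time so that they do not change any makespan; that choice is also an assumption.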

V-A Weighted upward-rank priority scheme using random walk

According to the discussion in Section IV, when we compute a job’s scheduling priority, we need to consider both the jobs on the critical path and the other jobs as well. We follow the upward-rank based priority scheme originally proposed in [6]. We propose to construct a random walk on the (extended) workflow DAG, and extend the plain scheme by applying the random walk stationary distribution probabilities as weights to the plain ranks. More specifically, for each job $j$ , the plain upward rank represents the accumulated processing time of successors on its critical path, and its weight (i.e., the stationary probability $\pi_{j}$ ) represents the importance of job $j$ in the global DAG topology. The rationale is that if a job is more intricately related to other jobs in the topology, the job is more important and deserves a higher priority, as discussed in Section IV. The stationary probability vector $\pi$ of the random walk on the workflow DAG can be interpreted as the recurrence probability of each state in the limiting distribution. Generally, if a state $j$ ’s stationary probability $\pi_{j}$ is higher than that of other states, it implies that the system tends to transit from other states to state $j$ , and state $j$ is more important. Hence the vector $\pi$ is a good indicator of the importance of jobs and can be used as weights of the plain upward-rank.

We describe the detailed procedure of constructing the random walk. Because the DAG is acyclic, we add directed edges to the DAG from each exit node to each entry node. In the new graph, the set of successors of any node $j$ is not empty. The random walk is on this digraph. Let the transition probability from job (state) $j$ to job (state) $i$ be denoted by $p_{ji}$ . We set

[TABLE]

Thus, from state $j$ , the random walk will visit its immediate successors with equal probabilities. Note that, if job $i$ does not depend on job $j$ , then $p_{ji}=0$ . We show the transition probabilities of an example DAG in Fig. 3.

Let $\pi_{j}$ denote the stationary probability for state $j$ . The stationary probabilities can be computed by solving the following equations.

[TABLE]
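For a small graph, the stationary probabilities can be obtained by solving the balance equations $\pi P=\pi$ together with $\sum_{j}\pi_{j}=1$ directly. The sketch below does this with exact rational arithmetic. It is an illustration under assumptions: the function names are hypothetical, the successor lists are assumed to already contain the added exit-to-entry edges, and the chain is assumed to be irreducible so that the solution is unique.

```python
from fractions import Fraction


def transition_matrix(succ, nodes):
    """Row-stochastic matrix: from each node, move to an immediate successor uniformly."""
    index = {n: a for a, n in enumerate(nodes)}
    P = [[Fraction(0)] * len(nodes) for _ in nodes]
    for j in nodes:
        for i in succ[j]:
            P[index[j]][index[i]] += Fraction(1, len(succ[j]))
    return P


def stationary(P):
    """Solve pi P = pi, sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(P)
    # Rows 0..n-2 are balance equations; the last row is the normalisation.
    A = [[P[j][i] - (1 if i == j else 0) for j in range(n)] + [Fraction(0)]
         for i in range(n - 1)]
    A.append([Fraction(1)] * n + [Fraction(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][n] for r in range(n)]


# Hypothetical diamond DAG 0 -> {1, 2} -> 3, plus the added edge 3 -> 0.
succ = {0: [1, 2], 1: [3], 2: [3], 3: [0]}
pi = stationary(transition_matrix(succ, [0, 1, 2, 3]))
print(pi)  # [Fraction(1, 3), Fraction(1, 6), Fraction(1, 6), Fraction(1, 3)]
```

Solving the linear system avoids the convergence issue that plain power iteration would face here, since the walk on this graph is periodic.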

We use the stationary probability $\pi_{j}$ as a measure of importance of job $j$ . The weighted upward-ranks are defined recursively as follows.

[TABLE]
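To illustrate how stationary probabilities can break ties among jobs that are not ordered by precedence, the sketch below scales each job's average processing time by its stationary probability before accumulating ranks along the DAG. This particular form, $w_{j}=\pi_{j}\bar{R}_{j}+\max_{i\in\text{succ}(j)}w_{i}$ , is an assumption made for the example and should not be read as the exact definition used in this paper; the function name and the numbers are likewise hypothetical.

```python
def weighted_upward_ranks(succ, avg_time, pi):
    """Weighted upward-ranks under the ASSUMED form w_j = pi_j * R_j + max_i w_i.

    succ:     dict job -> immediate successors in the DAG (no exit-to-entry edges)
    avg_time: dict job -> average processing time
    pi:       dict job -> stationary probability used as the weight
    """
    ranks = {}

    def rank(j):
        if j not in ranks:
            below = max((rank(i) for i in succ.get(j, [])), default=0)
            ranks[j] = pi[j] * avg_time[j] + below  # assumed weighting
        return ranks[j]

    for j in avg_time:
        rank(j)
    return ranks


# Jobs 1 and 2 tie under the plain scheme (same time, same successor).
succ = {0: [1, 2], 1: [3], 2: [3], 3: []}
avg_time = {0: 5, 1: 10, 2: 10, 3: 5}
pi = {0: 0.3, 1: 0.1, 2: 0.3, 3: 0.3}   # illustrative weights, not derived from this DAG
ranks = weighted_upward_ranks(succ, avg_time, pi)
print(sorted(ranks, key=ranks.get, reverse=True))
```

With positive weights and processing times, a job still ranks above each of its successors under this assumed form, so the precedence order is preserved while the tie between jobs 1 and 2 is resolved in favour of the job with the larger weight.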

V-B Uniform spare budget splitting

After the jobs’ scheduling priorities are determined, we need to split the budget across the jobs. In order to guarantee that each job can rent a VM, a job $j$ needs to receive a minimum budget, denoted by $D^{\min}_{j}$ , given by

[TABLE]

Thus, a feasible budget $D$ should be no less than the aggregate minimum budget of each job, i.e.,

[TABLE]

For the spare budget $D-\sum_{j\in\cal J}D^{\min}_{j}$ , we propose to split it evenly across the jobs. Hence, the reserved budget of each job $j$ is computed as

[TABLE]
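The uniform splitting rule can be sketched as follows. This is a hedged example: it assumes that the minimum budget of a job is the cost of running it on its cheapest eligible VM, i.e., $\min_{k\in{\cal V}_{j}}p_{k}R_{jk}$ , and the function names and sample numbers are hypothetical.

```python
def minimum_budget(job, eligible_vms, price, run_time):
    """Assumed minimum budget: cost of the cheapest eligible VM for this job."""
    return min(price[k] * run_time[(job, k)] for k in eligible_vms)


def reserved_budgets(min_cost, budget):
    """Give every job its minimum budget plus an equal share of the spare budget.

    min_cost: dict job -> minimum budget of the job
    budget:   total budget D
    """
    total_min = sum(min_cost.values())
    if budget < total_min:
        raise ValueError("budget is below the aggregate minimum budget")
    share = (budget - total_min) / len(min_cost)
    return {j: c + share for j, c in min_cost.items()}


# Hypothetical two-job example with a total budget of 16.
print(reserved_budgets({"a": 4, "b": 6}, 16))  # {'a': 7.0, 'b': 9.0}
```

The reserved budgets always sum to the total budget, so no part of the spare budget is left unassigned at this stage.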

We summarize the overall scheduling algorithm in Algorithm 1. Note that in Step $1(e)$ , we can also use the plain upward-rank priority scheme. The resulting algorithm is still a new algorithm, compared with the algorithm in [6], because of the new way of splitting the spare budget, namely uniform splitting.

VI Experiments

In this section, we present the comparative evaluation results of our algorithms, the algorithms of MSLBL [12], HBCS [15] and BHEFT [14], and the optimal baseline solution generated by Gurobi. In Table VIII, we list the shorthands for these algorithms, which will be used throughout this section. We first describe a single workflow scenario, where various experimental cases and algorithms are tested and results are reported. Then, we move to a multiple workflow scenario, where we compare our algorithm with random and round-robin priority generation schemes. In the experiments, we use a broad range of workloads, including workflows from real applications and randomly generated workflows.

VI-A Workflow setup

We use four types of real-world workflows including the Fast Fourier transform parallel application (FFT), Gaussian elimination parallel application [6], scientific workflows, and real in-production workflows from an Internet streaming service company in China.

In generating the FFT workflows, we use a parameter $m$ to set the size of the FFT application. The number of jobs is $N=2m-1+m\log_{2}m$ , where $m=2^{k}$ for some integer $k$ . Furthermore, an FFT workflow is symmetric: the aggregated execution time of the jobs on any path from the starting job to any of the exit jobs is equal. Thus, any path in an FFT DAG is a critical path. For the Gaussian elimination workflows, the number of jobs is set to be $N=\frac{n^{2}+n-2}{2}$ , where $n$ is the number of rows of a square matrix. We also evaluate other scientific workflows including Montage, CyberShake, Epigenomics, LIGO Inspiral Analysis and SIPHT, which are generated by an open source scientific workflow generator [28].
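The two size formulas can be checked against the workflow sizes used later in Section VI-D. The short sketch below is only a convenience for that check, with $m$ taken to be a power of two; the function names are hypothetical.

```python
def fft_jobs(m):
    """Number of jobs N = 2m - 1 + m * log2(m) for an FFT workflow, m a power of two."""
    if m < 2 or m & (m - 1):
        raise ValueError("m must be a power of two")
    k = m.bit_length() - 1          # k = log2(m)
    return 2 * m - 1 + m * k


def gaussian_jobs(n):
    """Number of jobs N = (n^2 + n - 2) / 2 for an n-row Gaussian elimination."""
    return (n * n + n - 2) // 2


print([fft_jobs(m) for m in (4, 16, 32, 128, 256)])  # [15, 95, 223, 1151, 2559]
print(gaussian_jobs(48))                             # 1175
```

The FFT sizes $15$ , $95$ , $223$ , $1151$ and $2559$ thus correspond to $m=4,16,32,128$ and $256$ , and the Gaussian elimination workflow with $N=1175$ jobs corresponds to $n=48$ .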

Finally, we obtained $19$ hours of logs of an in-production cluster from an Internet streaming service company in China. The cluster carried $2347$ workflows, including MapReduce, Spark, Hive and Shell workflows, during the $19$ hours. A workflow may contain multiple jobs, and a job may contain multiple parallel tasks. The logs show that $82.7\%$ of the workflows contain no more than $5$ jobs; the remaining $17.3\%$ of the workflows contain between $6$ and $375$ jobs, and these workflows occupy more than $60\%$ of the CPU and memory resources. We evaluate the algorithms on $5$ typical workflows with different numbers of jobs.

VI-B Other parameters

We resort to simulation to compare the algorithms. All simulation experiments are conducted on a PC platform with an Intel Core $i7$ $2.60$ GHz CPU and $8$ GB memory. We use $23$ VM types as tabulated in Table VII, which follow the VM setup in Amazon’s EC2 as closely as we can [29]. To see the influence of the number of available VMs, we test with three different levels of VM sufficiency: Scarce, Normal and Sufficient. In the Scarce case, the number of VMs is half of the number of jobs; in the Normal case, the two numbers are equal; in the Sufficient case, the number of VMs is $1.5$ times the number of jobs. In all three cases, $2/3$ of the VM instances are assigned to the VM types with no more than $8$ vCPUs in Table VII, and the other $1/3$ of the VM instances are assigned to the VM types with more than $8$ vCPUs. Finally, the number of instances of each VM type is generated randomly.

We also vary the budget as in (23).

[TABLE]

where $D_{\min}$ is the cost of using the cheapest schedule, and $D_{\max}$ is the cost obtained by the HEFT algorithm. The budget level factor $\varphi\in\{0,0.25,0.5,0.75,1.0\}$ is used to vary the budget level.
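As a small illustration of how the budget level factor maps to a concrete budget, the sketch below assumes that the budget interpolates linearly between the two bounds, $D=D_{\min}+\varphi(D_{\max}-D_{\min})$ . The function name and the sample costs are hypothetical.

```python
def budget_for_level(d_min, d_max, phi):
    """Budget for level phi, assuming linear interpolation between d_min and d_max."""
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    return d_min + phi * (d_max - d_min)


# Hypothetical bounds: cheapest schedule costs 100, the HEFT schedule costs 200.
for phi in (0, 0.25, 0.5, 0.75, 1.0):
    print(phi, budget_for_level(100, 200, phi))
```

Under this reading, $\varphi=0$ leaves no spare budget beyond the cheapest schedule, while $\varphi=1$ allows as much spending as the HEFT schedule.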

Finally, sometimes an algorithm may fail to find a feasible schedule, either because of the high complexity of the algorithm, or due to the greedy nature. When that happens, a failure is reported. We report the algorithm success rates in the results.

VI-C Summary of performance ranking

We first summarize the overall performance of Gurobi, BAVE, BAVE_M, MSLBL, MSLBL_M, HBCS and BHEFT by counting their ranks in terms of the obtained makespans. For each test case, we order the algorithms in the increasing order of makespans; then we count their ranks for each type of workflows. We also use an average ranked value (AR) proposed in [14] to evaluate the performance of algorithms. The value AR is defined as

[TABLE]

where $N_{cases}$ is the number of test cases, and $R_{i}$ is the count for rank $i$ . A smaller AR value of an algorithm stands for a better performance on average. In Tables IX-XVII, the results of rank counting and the associated AR values are reported. For brevity, only the counts of the first three places are shown in the tables. Because the Gurobi, HBCS and BHEFT algorithms sometimes fail to find feasible solutions, their AR values are not reported. By inspecting the AR values of all workflow types, we draw the following conclusions.

• For FFT and randomly generated workflows, the BAVE algorithm achieves the best performance on average. The uniform extra budget splitting scheme outperforms the scheme of splitting budget in proportion to extra demand. The weighted priority scheme cannot further improve the makespan when it is combined with the uniform extra budget splitting scheme. For FFT workflows, the weighted scheme even results in longer makespans in several test cases. Nevertheless, when we apply the weighted priority scheme to the MSLBL algorithm, the makespans are reduced.

• For Gaussian elimination and other scientific workflows, and the workflows obtained from the in-production cluster, the BAVE_M algorithm achieves the best performance on average. Both the weighted priority and uniform extra budget splitting schemes help to improve the makespans.

We conclude that the weighted priority scheme using random walk and the uniform spare budget splitting strategy help to improve the makespans on average for most of the test cases.

Finally, in order to separate out the improvement achieved by the weighted priority scheme and that by the uniform spare budget splitting scheme, we compare the algorithm ranking results between BAVE and MSLBL, and the results between BAVE and BAVE_M in Fig. 4 and Fig. 5, respectively. In Fig. 4, algorithm BAVE outperforms algorithm MSLBL on average for most workflow types except the workflow types CyberShake and Sipht. The results can be interpreted as showing the advantage of the uniform spare budget splitting scheme. Fig. 5 shows that the weighted priority scheme improves the average performance compared with the plain priority scheme by decreasing AR for most workflow types except the workflow types FFT and random.
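The AR value used throughout this subsection can be computed from the rank counts as sketched below. The sketch assumes that AR is the rank-weighted mean $\sum_{i}i\cdot R_{i}/N_{cases}$ and that the counts for all ranks are available, not only the first three places shown in the tables; the function name and sample counts are hypothetical.

```python
def average_rank(rank_counts, n_cases):
    """Average ranked value, assuming AR = sum_i i * R_i / N_cases.

    rank_counts: dict rank i -> number of test cases R_i in which the algorithm placed i-th
    n_cases:     total number of test cases
    """
    if sum(rank_counts.values()) != n_cases:
        raise ValueError("rank counts must add up to the number of test cases")
    return sum(i * r for i, r in rank_counts.items()) / n_cases


# Hypothetical algorithm: first in 6 cases, second in 3, third in 1.
print(average_rank({1: 6, 2: 3, 3: 1}, 10))  # 1.5
```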

VI-D Detailed experimental results

In this section, we plot and discuss the detailed performance of each test case.

VI-D1 FFT

In Fig. 6, we show the normalized makespans for the FFT workflows. In each test case, the makespans obtained by different algorithms are normalized with respect to the smallest one and are plotted to show the performance. For instance, in Fig. 6(a), we show an FFT workflow with $15$ jobs. All the algorithms are tested with different levels of VM sufficiency and budgets. Gurobi achieves the best makespan with sufficient VMs and budget level of $\varphi=1.0$ , and hence the makespans obtained by other algorithms and settings are normalized with respect to this specific optimal value in the plot shown in Fig. 6(a).

In the small experiment with $N=15$ jobs, Gurobi always achieves the best makespan, which can be used as the performance baseline. In the VM sufficiency case of Scarce, the results show that the budget level $\varphi$ has a great impact on the resulting makespan, which drops quickly as the budget level $\varphi$ increases. In the VM sufficiency case of Normal, the makespan improves with $\varphi$ when $\varphi$ is small. The difference in makespan between $\varphi=0.0$ and $\varphi=1.0$ is significantly narrowed. When there are Sufficient supplies of VMs, the makespan is further reduced when the budget is plentiful. We notice that Gurobi produces solutions with slightly better makespan only in the cases of Scarce, and the case of $\varphi=0.0$ and VM sufficiency Normal. The algorithms BAVE and BAVE_M achieve almost the same performance as Gurobi. The makespans produced by MSLBL and MSLBL_M are slightly worse in the cases of Scarce. The HBCS and BHEFT algorithms cannot always find a solution.

When the size of workflow increases to $N=95$ jobs, Gurobi fails to find a solution. There are $52,448$ binary variables in the problem formulation, which is extremely large for an integer programming problem. In the experiments with $N=95,223,1151$ and $2559$ jobs, the algorithms BAVE and BAVE_M achieve the best makespan in almost all the test cases. The algorithms HBCS and BHEFT cannot always find a solution.

VI-D2 Gaussian elimination

For the Gaussian elimination workflows, the BAVE and BAVE_M algorithms always achieve the best makespan for all test cases with budget level at $\varphi=0.0$ or $\varphi=1.0$ . For other budget levels, $\varphi=0.25,0.5,0.75$ , BAVE always outperforms MSLBL, and BAVE_M always outperforms MSLBL_M. The HBCS and BHEFT algorithms perform poorly in most cases. For most cases, the algorithms with the weighted upward-rank priority generation scheme achieve better makespan. Overall, the proposed BAVE and BAVE_M work well for Gaussian elimination workflows.

We also observe an unusual test case. In the test case of $N=1175$ , VM Sufficient and $\varphi=0.0$ , the makespans obtained by BAVE, BAVE_M, MSLBL and MSLBL_M are significantly larger than those of the test case where VM sufficiency is Normal. This is because the set of VM instances is generated randomly and can be substantially different for different levels of VM sufficiency. An increase in VM sufficiency does not necessarily lead to performance improvement. Such rare cases happen occasionally in the other tests.

VI-D3 Other scientific workflows

We show the evaluation results of other scientific workflows in Fig. 8. For the CyberShake workflow with $N=1000$ jobs, the BAVE algorithm performs no better than MSLBL. When the weighted priority scheme is applied, the makespans are reduced and BAVE_M performs the best on average. For the Sipht workflow, though the BAVE and BAVE_M algorithms perform no better than MSLBL and MSLBL_M, the weighted priority scheme produces shorter makespans than the plain one in most cases. For other kinds of workflows, the BAVE and BAVE_M algorithms perform the best in most test cases. The performance of HBCS and BHEFT is poor when the budget is not sufficient.

VI-D4 Randomly generated workflows

We tested various randomly generated workflows and the main results are shown in Fig. 9. Both the BAVE and BAVE_M algorithms perform the best compared with other algorithms. There is no significant performance difference between the plain and weighted priority schemes.

VI-D5 Workflows from an Internet streaming service company

Finally, we tested workflows obtained from an Internet streaming service company. The workflows obtained from the in-production cluster contain multiple jobs, and each job may contain multiple parallel tasks. Therefore, we evaluate the scale of each workflow by the number of tasks it carries. The workflow with $N=39$ tasks is the largest test case that Gurobi can solve. The workflow with $N=1453$ is a medium-sized one that has the highest occurrence rate among all medium-sized workflows. The workflow with $N=9113$ is the largest workflow we obtained. The results in Fig. 10 show that BAVE and BAVE_M outperform other algorithms, and the weighted priority scheme achieves better performance than the plain one.

VI-D6 Success rate of finding a schedule

For all the workflows we tested, algorithms HBCS and BHEFT cannot find solutions when the budget is limited or the available VMs are limited. The other four algorithms can always produce a solution. In Fig. 11, we plot the scheduling success rates for the FFT workflows. The success rates of other workflows have a similar pattern.

VI-E Multiple workflows

We conducted tests on multiple workflows. For each workflow type discussed in Section VI-A, we generate a workflow with at least $1000$ jobs. All workflows are combined to create a mixed set with $N=9744$ jobs. In the test, we vary the number of VMs and test four algorithms: BAVE, BAVE_M, round robin, and random. For the random strategy, we run $1000$ random tests for each test case and report the average results. The achieved makespans are summarized in Tables XVIII, XIX and XX. The results show that with more flexible budgets, the BAVE and BAVE_M algorithms achieve better makespan than the round robin and random strategies. The BAVE_M algorithm outperforms the plain BAVE algorithm in most cases.

VII Conclusion

DAG-based complex workflows are becoming a significant workload in the cloud. In scheduling workflows, the budget constraint is an important factor of consideration due to the pay-as-you-go nature of the cloud. In this paper, we formulate the workflow scheduling problem with budget constraints as an integer programming model. Improving upon the plain upward-rank priority scheme, we propose a weighted scheme using the stationary probabilities of a random walk on the digraph as the weights. We further design a uniform spare budget splitting strategy, which assigns the spare budget uniformly across all the jobs. The empirical results show that the uniform spare budget splitting scheme outperforms the earlier scheme that splits the spare budget in proportion to extra demand, and the weighted priority scheme further improves the workflow makespan. The advantage of the weighted priority scheme is due to its ability to evaluate the jobs’ global importance in the workflow, by considering not only the jobs on the critical path but also those off the critical path. Because of the diversity and complexity of workflow types in production, there may be other unknown factors yet to be studied. Deep analysis of the structural characteristics of different workflows may lead to new discoveries and help design a further improved task priority assignment strategy. For instance, we can borrow the idea proposed in LDCP [9] that assigns a higher priority to a job with more children whenever there is a tie. Refinements of this kind, which rely on deep analysis of the workflow topologies, will be a direction of future research.

Acknowledgment

This work was supported by the Shanghai Committee of Science and Technology, China (Grant No. 14510722300, 18DZ2203900).

Bibliography

[1] G. Juve, E. Deelman, G. B. Berriman, B. P. Berman, and P. Maechling, “An evaluation of the cost and performance of scientific workflows on Amazon EC2,” J. Grid Comput., vol. 10, no. 1, pp. 5–21, Mar. 2012.
[2] Y. Wang and W. Shi, “Budget-driven scheduling algorithms for batches of MapReduce jobs in heterogeneous clouds,” IEEE Transactions on Cloud Computing, vol. 2, no. 3, pp. 306–319, July 2014.
[3] M. A. Rodriguez and R. Buyya, “Deadline based resource provisioning and scheduling algorithm for scientific workflows on clouds,” IEEE Transactions on Cloud Computing, vol. 2, no. 2, pp. 222–235, April 2014.
[4] J. Lenstra, A. R. Kan, and P. Brucker, “Complexity of machine scheduling problems,” in Studies in Integer Programming, ser. Annals of Discrete Mathematics, P. Hammer, E. Johnson, B. Korte, and G. Nemhauser, Eds. Elsevier, 1977, vol. 1, pp. 343–362.
[5] A. S. Schulz, “Scheduling to minimize total weighted completion time: Performance guarantees of LP-based heuristics and lower bounds,” in Integer Programming and Combinatorial Optimization, W. H. Cunningham, S. T. McCormick, and M. Queyranne, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 1996, pp. 301–315.
[6] H. Topcuoglu, S. Hariri, and M.-Y. Wu, “Performance-effective and low-complexity task scheduling for heterogeneous computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, pp. 260–274, Mar. 2002.
[7] Gurobi Optimization: The state-of-the-art mathematical programming solver for prescriptive analytics, Gurobi, http://www.gurobi.com/, accessed on 20.09.2018.
[8] Y.-K. Kwok and I. Ahmad, “Static scheduling algorithms for allocating directed task graphs to multiprocessors,” ACM Comput. Surv., vol. 31, no. 4, pp. 406–471, Dec. 1999.

$n_{i}$	${VM}_{1}$	${VM}_{2}$	${VM}_{3}$
1	16	14	7
2	19	13	16
3	17	11	10
4	13	8	15
5	12	13	8
6	13	16	7
7	6	16	9
8	12	11	5
9	8	9	11
10	21	7	14
11	12	8	16
12	21	7	14
