Linear Time Algorithms for Multiple Cluster Scheduling and Multiple Strip Packing

Klaus Jansen; Malin Rau

arXiv:1902.03428·cs.DS·February 12, 2019

TL;DR

This paper introduces an optimal linear-time approximation algorithm for multiple cluster scheduling and strip packing problems, improving computational efficiency while maintaining a 2-approximation ratio, and explores practical heuristics with better ratios in specific cases.

Contribution

The authors develop an $O(n)$ time algorithm achieving a 2-approximation for both problems, improving over previous algorithms with much higher running times, and propose practical heuristics with improved ratios for certain instances.

Findings

01

An $O(n)$ algorithm with a 2-approximation ratio for both problems.

02

A practical $O(n \log n)$ algorithm with a 9/4-approximation for specific instances.

03

The approach of scheduling on one cluster then distributing can be applied broadly.

Abstract

We study the Multiple Cluster Scheduling problem and the Multiple Strip Packing problem. For both problems, there is no algorithm with approximation ratio better than $2$ unless $\mathrm{P}=\mathrm{NP}$. In this paper, we present an algorithm with approximation ratio $2$ and running time $\mathcal{O}(n)$ for both problems. While a $2$-approximation was known before, the running time of the algorithm is at least $\Omega(n^{256})$ in the worst case. Therefore, an $\mathcal{O}(n)$ algorithm is surprising and the best possible. We achieve this result by calling an AEPTAS with approximation guarantee $(1+\varepsilon)\mathrm{OPT}+p_{\max}$ and running time of the form $\mathcal{O}(n\log(1/\varepsilon)+f(1/\varepsilon))$ with a constant $\varepsilon$ to schedule the jobs on a single cluster. This schedule is then distributed on the $N$ clusters in $\mathcal{O}(n)$. Moreover, this distribution technique can be applied to any variant of Multi Cluster…

Tables

Table 1: Overview of the results for MCS and MSP.

Problem	Ratio	Remarks	Source
MCS, MSP	$2 + ε$	Needs solving of Scheduling on Identical Machines with ratio $1 + ε / 2$	[24]
MCS	$2$	Worst case running time at least $Ω (n^{256})$ ; can handle clusters with different sizes	[17]
MSP	$2$	Worst case running time at least $Ω (n^{256})$	[2]
MCS, MSP	AFPTAS	Additive constant in $𝒪 (1 / ε^{2})$ , and $𝒪 (1)$ for large values for $N$	[2]
MCS	$3$	Fast algorithm that can handle clusters with different sizes	[19]
MCS	$5 / 2$	Fast algorithm	[3]
MCS	$7 / 3$	Fast algorithm	[5]
MCS	$2$	Fast algorithm; requires $\max_{j \in 𝒥} q (j) \leq 1 / 2 \cdot m$	[5]
MCS, MSP	$2\mathrm{OPT}$	Running time $𝒪 (n)$ for $N \geq 3$ and $𝒪 (n \log (n))$ for MCS and $N = 2$ , $𝒪 (n \log^{2} (n))$ for MSP and $N = 2$	This paper
MCS, MSP	AEPTAS	Additive term $p_{\max}$ ; linear in $n$	This paper
MCS		Approximation ratio $9 / 4$ if $N \bmod 3 = 0$ and if $N$ is large	This paper
PTS, SP	AEPTAS	Additive term $p_{\max}$ ; linear in $n$	This paper

Equations

$\sum_{p\in P_{L}}\sum_{s\in S(p)_{\leq s'},\,s+p>s'}x_{s,p}$

$\sum_{s\in S(p)}x_{s,p}$

$x_{s,p}$

$T(n,1)$

$T(n,d)$

$\leq c(n/2)(\log(d')+1)+c(n/2)(\log(d'')+1)+cn$

$\leq c(n/2)\cdot(\log(d')+1+\log(d'')+1)+cn$

$=c(n/2)(\log(d'd''))+2cn$

$\leq c(n/2)(\log((d/2)^{2}))+2cn$

$=cn\log(d)+cn$

$\sum_{C\in C_{m}}x_{C,|S|+1}$

$\sum_{C\in C_{m_{l}}}x_{C,l}$

$\sum_{l=1}^{|S|}\sum_{C\in C_{m_{l}}}x_{C,l}a_{j,C}$

$x_{C,l}$

$\sum_{C\in C_{m_{l}}}x_{C,l}=(1+\varepsilon^{2})\varepsilon\delta T\quad\forall l=1,\dots,|S|,$

$\sum_{C\in C_{m}}x_{C,|S|+1}=(1+\varepsilon^{2})\varepsilon T/2.$

$B:=\left\{x\in\mathbb{R}_{+}^{|S||C_{m}|}\;\middle|\;\sum_{C\in C_{m_{l}}}x_{C,l}=\varepsilon\delta T,\ \forall l=1,\dots,|S|+1\right\}.$

$\max\sum_{j\in J}\frac{q_{j}}{p_{j}^{l}}$

$\sum_{j\in J}q_{j}a_{j}$

$a_{j}$

$\mathcal{O}(M(\ln(M)+\rho^{-2})|S|(|\bar{J}_{S,W}|+(\log(1/\rho))^{3}/\rho^{2}))$

$=\mathcal{O}(|\bar{J}_{S,W}||S|(\ln(|\bar{J}_{S,W}|)+1/\varepsilon^{4})(|\bar{J}_{S,W}|+(\log(1/\varepsilon))^{3}/\varepsilon^{4}))$

$=\mathcal{O}((\log(1/\varepsilon))^{3}/\varepsilon^{11}\delta),$

$\mathcal{O}((|S||\bar{J}_{S,W}|(\ln(|\bar{J}_{S,W}|)+\varepsilon^{-4}))(|S|+|\bar{J}_{S,W}|)^{1.5356})\leq\mathcal{O}(1/\varepsilon^{12}\delta^{3})$

$\mathcal{O}((|S||\bar{J}_{S,W}|(\ln(|\bar{J}_{S,W}|)+\varepsilon^{-4}))((|S|+|\bar{J}_{S,W}|)^{1.5356}+(\log(1/\varepsilon))^{3}/\varepsilon^{4}))\leq\mathcal{O}(1/\varepsilon^{12}\delta^{3}).$

$\mathrm{NFDH}(L)\leq 2W(L)/m+p_{\max}\leq 2\cdot\mathrm{OPT}(L)+p_{\max}.$

$\mathcal{O}(\log(n)/\varepsilon^{2}+n\log(1/\varepsilon)+\log(1/\varepsilon\delta)((|S|^{1/\delta\alpha}(|S||P_{L}|/\alpha)^{|S|+|P_{L}|}))(1/\varepsilon^{10}\delta^{2}))$

$\mathrm{idle}(\tau):=m-\sum_{j\in J,\,\sigma(j)\leq\tau<\sigma(j)+p(j),\,\rho(j)=i}q(j)$

$\tau m-\sum_{j\in J,\,\sigma(j)+p(j)\leq\tau}q(j)p(j)+\sum_{j\in J,\,\sigma(j)\leq\tau<\sigma(j)+p(j)}q(j)(\tau-\sigma(j)).$

$\sum_{C\in C_{W}}x_{C,W}$

$\sum_{C\in C_{w}}x_{C,w}$

$\sum_{C\in C_{w}}x_{C,w}=(1+\varepsilon^{2})\varepsilon\delta T\quad\forall w\in W_{B}$

$\sum_{C\in C_{W}}x_{C,W}=(1+\varepsilon^{2})\varepsilon T.$

$y_{0}$

$y_{|Y|+1}$

$y_{j+1}-y_{j}$

$y_{\pi(i,r)}-y_{\pi(i,l)}$

Full text

Linear Time Algorithms for Multiple Cluster Scheduling and Multiple Strip Packing

Research was supported by German Research Foundation (DFG) project JA 612/20-1.

Klaus Jansen, Malin Rau

Institute of Computer Science, University of Kiel, 24118 Kiel, Germany

{kj,mra}@informatik.uni-kiel.de

Abstract

We study the Multiple Cluster Scheduling problem and the Multiple Strip Packing problem. For both problems, there is no algorithm with approximation ratio better than $2$ unless $\mathrm{P}=\mathrm{NP}$. In this paper, we present an algorithm with approximation ratio $2$ and running time $\mathcal{O}(n)$ for both problems. While a $2$-approximation was known before, the running time of the algorithm is at least $\Omega(n^{256})$ in the worst case. Therefore, an $\mathcal{O}(n)$ algorithm is surprising and the best possible. We achieve this result by calling an AEPTAS with approximation guarantee $(1+\varepsilon)\mathrm{OPT}+p_{\max}$ and running time of the form $\mathcal{O}(n\log(1/\varepsilon)+f(1/\varepsilon))$ with a constant $\varepsilon$ to schedule the jobs on a single cluster. This schedule is then distributed on the $N$ clusters in $\mathcal{O}(n)$. Moreover, this distribution technique can be applied to any variant of Multi Cluster Scheduling for which there exists an AEPTAS with additive term $p_{\max}$.

While the above result is strong from a theoretical point of view, it might not be very practical due to a large hidden constant caused by calling an AEPTAS with a constant $\varepsilon\geq 1/8$ as a subroutine. Nevertheless, we point out that the general approach of first finding a schedule on one cluster and then distributing it onto the other clusters might come in handy in practical approaches. We demonstrate this by presenting a practical algorithm with running time $\mathcal{O}(n\log(n))$, without hidden constants, that is a $9/4$-approximation for one third of all possible instances, i.e., all instances where the number of clusters is divisible by $3$, and has an approximation ratio of at most $2.3$ for all instances with at least $9$ clusters.

Abbreviations: MCS: Multiple Cluster Scheduling; PTS: Parallel Task Scheduling; MSP: Multiple Strip Packing; SP: Strip Packing.

1 Introduction

In this paper, we study two problems: Multiple Cluster Scheduling and Multiple Strip Packing. In the optimization problem Multiple Cluster Scheduling (MCS), we are given $n\in\mathbb{N}$ parallel jobs $\mathcal{J}$ and $N\in\mathbb{N}$ clusters. Each cluster consists of $m\in\mathbb{N}$ identical machines and each job $j\in\mathcal{J}$ has a processing time $p(j)\in\mathbb{N}$ as well as a machine requirement $q(j)\in\mathbb{N}_{\leq m}$. We define the work of a job $j$ as $\mathcal{W}(j):=p(j)\cdot q(j)$ and define the work of a set of jobs $\mathcal{J}^{\prime}$ as $\mathcal{W}(\mathcal{J}^{\prime}):=\sum_{j\in\mathcal{J}^{\prime}}\mathcal{W}(j)$. A schedule $S$ of the jobs consists of two functions: $\sigma:\mathcal{J}\rightarrow\mathbb{N}$, which assigns jobs to starting points, and $\rho:\mathcal{J}\rightarrow\{1,\dots,N\}$, which assigns jobs to the clusters. The objective is to find a feasible schedule of all the jobs which minimizes the makespan, i.e., which minimizes $\max\{\sigma(j)+p(j)\mid j\in\mathcal{J}\}$. A schedule is feasible if at every time $\tau\in\mathbb{N}$ and on every cluster $i\in\{1,\dots,N\}$ the number of used machines is bounded by $m$, i.e., if $\sum_{j\in\mathcal{J},\sigma(j)\leq\tau<\sigma(j)+p(j),\rho(j)=i}q(j)\leq m$ for all $i\in\{1,\dots,N\}$ and $\tau\in\mathbb{N}$. If there is only one cluster, the problem is called Parallel Task Scheduling (PTS). Note that we can assume that $n>N$, since otherwise an optimal schedule would place each job alone on its own cluster and thus the problem is not hard.
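The definition above can be made concrete with a small checker. The following is only an illustrative sketch: the names `Job`, `makespan` and `is_feasible` are hypothetical and not taken from the paper, and a schedule is represented by two plain lists `sigma` and `rho` indexed like the job list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    p: int  # processing time p(j)
    q: int  # machine requirement q(j)


def makespan(jobs, sigma):
    """Largest completion time sigma(j) + p(j) over all jobs."""
    return max(s + job.p for job, s in zip(jobs, sigma))


def is_feasible(jobs, sigma, rho, m, N):
    """Check that no cluster ever uses more than m machines."""
    if any(not 1 <= r <= N for r in rho) or any(job.q > m for job in jobs):
        return False
    for i in range(1, N + 1):
        events = []
        for job, s, r in zip(jobs, sigma, rho):
            if r == i:
                events.append((s, job.q))           # job starts: machines taken
                events.append((s + job.p, -job.q))  # job ends: machines freed
        used = 0
        # At equal times, frees (negative) sort before takes, because a job
        # occupies the half-open interval [sigma(j), sigma(j) + p(j)).
        for _, delta in sorted(events):
            used += delta
            if used > m:
                return False
    return True
```

For example, with `m = 4` and `N = 2`, starting jobs with requirements 3 and 2 at the same time on the same cluster is rejected, while running them one after the other is accepted.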

The other problem that we consider is a closely related variant of MCS, called Multiple Strip Packing (MSP). The main difference is that the jobs have to be allocated on contiguous machines. In the problem MSP, we are given $n\in\mathbb{N}$ rectangular items $\mathcal{I}$ and $N\in\mathbb{N}$ strips. Each strip has an infinite height and the same width $W\in\mathbb{N}$. Each item $i\in\mathcal{I}$ has a width $w(i)$ and a height $h(i)$. The objective is to find a feasible packing of the items into the strips such that the packing height is minimized. A packing is feasible if all the items are placed into the strips without overlapping. If there is only one strip, the problem is called Strip Packing (SP).

Strip Packing and Parallel Task Scheduling are classical optimization problems and the extension of these problems to multiple strips or clusters comes naturally. Furthermore, these problems can be motivated by real-world problems. One example, as stated in [24], is the following: In operating systems, MSP arises in the computer grid and server consolidation [18]. In the system supporting server consolidation on many-core chip multiprocessors, multiple server applications are deployed onto virtual machines. Every virtual machine is allocated several processors and each application might require a number of processors simultaneously. Hence, a virtual machine can be regarded as a cluster and server applications can be represented as parallel tasks. Similarly, in the distributed virtual machines environment, each physical machine can be regarded as a strip while virtual machines are represented as rectangles. It is quite natural to investigate the packing algorithm by minimizing the maximum height of the strips. This is related to the problem of maximizing the throughput, which is commonly used in the area of operating systems.

In this paper, we consider approximation algorithms for MCS and Multiple Strip Packing (MSP). We say an approximation algorithm $A$ has an (absolute) approximation ratio $\alpha$ , if for each instance $I$ of the problem it holds that $A(I)\leq\alpha\mathrm{OPT}(I)$ . If an algorithm $A$ has an approximation ratio of $\alpha$ , we say its result is an $\alpha$ -approximation. A family of algorithms consisting of algorithms with approximation ratio $(1+\varepsilon)$ is called polynomial time approximation scheme (PTAS), and a PTAS whose running time is bounded by a polynomial in both the input length $\mathrm{SIZE}(I)$ and $1/\varepsilon$ is called fully polynomial (FPTAS). If the running time of a PTAS is bounded by a function of the form $\mathrm{poly}(\mathrm{SIZE}(I))\cdot f(1/\varepsilon)$ , where $f$ is an arbitrary function, we say the running time is efficient and call it an efficient PTAS or EPTAS. An algorithm $A$ has an asymptotic approximation ratio $\alpha$ if there is a constant $c$ such that $A(I)\leq\alpha\mathrm{OPT}(I)+c$ and we denote a polynomial time approximation scheme with respect to the asymptotic approximation ratio as an A(E)PTAS.

Zhuk [26] proved that MCS and MSP cannot be approximated better than $2$ unless $\mathrm{P}=\mathrm{NP}$. There is an algorithm by Ye, Han and Zhang [24] which finds a $(2+\varepsilon)$-approximation to the optimal solution for each instance of MCS or MSP. This algorithm needs to call an EPTAS for Scheduling on Identical Machines as a subroutine. The algorithm with the best running time for this problem is currently given by [15] and it is bounded by $2^{\mathcal{O}(1/\varepsilon\log^{2}(1/\varepsilon))}+\mathrm{poly}(n)$. As a result, the running time of the algorithm from Ye, Han and Zhang [24] is bounded by $2^{\mathcal{O}(1/\varepsilon\log^{2}(1/\varepsilon))}+\mathrm{poly}(n)$, using [15] and corresponding 2-approximation algorithms for Parallel Task Scheduling, e.g., the List-Scheduling algorithm by Garey and Graham [9], and Strip Packing, e.g., Steinberg's algorithm [20]. For MCS, the approximation ratio of $(2+\varepsilon)$ was improved by Jansen and Trystram [17] to an algorithm with approximation ratio of $2$, and it has a worst case running time of $\Omega(n^{256})$ since it uses an algorithm with running time $n^{\Omega(1/\varepsilon^{1/\varepsilon})}$ with constant $\varepsilon=1/4$ as a subroutine. Furthermore, for MSP there is an algorithm by [2] that has a ratio of $2$ as well. The worst case running time of this algorithm is of the form $\Omega(n^{256})$ as well, for the same reasons.

However, since the worst-case running time for these algorithms with an approximation ratio close to or exactly $2$ is so large, work has been done to improve the runtime at the expense of the approximation ratio. There is a faster algorithm by Bougeret et al. [3] which guarantees an approximation ratio of $5/2$ and has a running time of $\mathcal{O}(\log(np_{\max})n(N+\log(n)))$. Note that the Multifit algorithm for Scheduling on Identical Machines has an approximation ratio of $13/11$ and a running time of at most $\mathcal{O}(n\log(n)+n\log(N)\log(\mathcal{A}(\mathcal{I})/N))$, see [25]. Hence, using this algorithm as a subroutine in [24], we find a $26/11\approx 2.364$ approximation. In [5], the authors present an algorithm with approximation ratio $7/3$ with running time $\mathcal{O}(\log(np_{\max})N(n+\log(n)))$. Furthermore, they present a fast algorithm with approximation ratio $2$ and the same running time for the case that the job with the largest machine requirement needs less than $m/2$ machines. For MCS and MSP, we present $2$-approximations, where we managed to improve the running time drastically with regard to the $\mathcal{O}$-notation.

Theorem 1:

There is an algorithm for MCS with approximation ratio $2$ and running time $\mathcal{O}(n)$ if $N>2$ , and running time $\mathcal{O}(n\log(n))$ if $N\in\{1,2\}$ .

Theorem 2:

There is an algorithm for MSP with approximation ratio $2$ and running time $\mathcal{O}(n)$ if $N>2$ , and running time $\mathcal{O}(n\log^{2}(n)/\log(\log(n)))$ if $N\in\{1,2\}$ .

Note that the running time of these algorithms is the best possible from a theoretical point of view with respect to the $\mathcal{O}$-notation for $N\geq 3$. Since we need to assign a start point to each job, every algorithm for MCS needs a running time of $\Omega(n)$.

To achieve these results, we use as a subroutine an AEPTAS for the optimization problem Parallel Task Scheduling (PTS) and Strip Packing (SP) respectively. PTS is similar to the problem MCS for the special case that only one cluster is given, while Strip Packing (SP) corresponds to MSP where $N=1$ . Regarding PTS, we improved the running time of an algorithm by Jansen [11] and developed an AEPTAS. For Strip Packing (SP), we find an AEPTAS as well. However, the running time depending on $1/\varepsilon$ is worse than in the AEPTAS for PTS. Note that this algorithm is the first AEPTAS for SP that has an additive term of $h_{\max}$ .

Theorem 3:

There is an algorithm for PTS with ratio $(1+\varepsilon)\mathrm{OPT}+p_{\max}$ and running time $\mathcal{O}(n\log(1/\varepsilon)+\log(n)/\varepsilon^{2})+\mathcal{O}_{\varepsilon}(1)$ .

Theorem 4:

There is an algorithm for SP with ratio $(1+\varepsilon)\mathrm{OPT}+h_{\max}$ and running time $\mathcal{O}(n\log(1/\varepsilon)+\log(n)/\varepsilon^{2})+\mathcal{O}_{\varepsilon}(1)$ .

These algorithms can be used to find an AEPTAS for MCS and MSP as well by cutting the solution for one cluster or strip into segments of height $(1+\varepsilon)\mathrm{OPT}$. The jobs overlapping the cluster borders add a further $p_{\max}$ to the makespan, resulting in an algorithm for MCS with approximation guarantee $(1+\varepsilon)\mathrm{OPT}+p_{\max}$.

Theorem 5:

There are algorithms for MCS and MSP with ratio $(1+\varepsilon)\mathrm{OPT}+p_{\max}$ and running time $\mathcal{O}(n\log(1/\varepsilon)+\log(n)/\varepsilon^{2})+\mathcal{O}_{\varepsilon}(1)$ .
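The cutting argument behind Theorem 5 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `cut_into_clusters` and the list-based representation are assumptions, and the segment height `h` is taken as given. Each job goes to the segment in which it starts and keeps its relative position, so a job that crosses a segment border sticks out by less than its own processing time.

```python
def cut_into_clusters(sigma, N, h):
    """Cut a single-cluster schedule into N segments of height h.

    sigma: start times on the single cluster (assumed to be < N * h).
    Returns (new_sigma, rho): start times within the assigned cluster and
    the 1-based cluster index of every job.
    """
    new_sigma, rho = [], []
    for s in sigma:
        k = min(int(s // h), N - 1)   # segment in which the job starts
        rho.append(k + 1)
        new_sigma.append(s - k * h)   # keep the position relative to the segment
    return new_sigma, rho
```

Jobs that run at the same time in a new cluster also ran at the same time in the original schedule, so the machine constraint is preserved, and every cluster has makespan at most `h` plus the largest processing time.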

The algorithm from Theorem 1 uses the algorithm from Theorem 3 as a subroutine with a constant value $\varepsilon=1/8$ if $N=2$ , $\varepsilon=1/5$ if $N=5$ , and $\varepsilon\in[1/4,1/3]$ otherwise. As a result, the running time of the algorithm can be rather large, while the $\mathcal{O}$ -notation suggests otherwise since it hides all the constants. Due to this fact, we have developed a truly fast algorithm where the most expensive part is sorting the jobs. However, this improved running time yields a slight loss in the approximation factor.

Theorem 6:

There is a fast $\mathcal{O}(n\log(n))$ algorithm for MCS with approximation ratio $9/4$ if $N=3i$ , $(9i+5)/(4i+2)$ if $N=3i+1$ , and $(9i+10)/(4i+4)$ if $N=3i+2$ for some $i\in\mathbb{N}$ .

Note that the approximation ratio of the algorithm from Theorem 6 is worse than $7/3$ for the cases that $N\in\{2,5\}$ and exactly $7/3$ for the case that $N\in\{4,8\}$. However, if $N\geq 9$, the approximation ratio is bounded by $2.3$, and $((9i+5)/(4i+2))\mathrm{OPT}$ as well as $((9i+10)/(4i+4))\mathrm{OPT}$ converge to $(9/4)\mathrm{OPT}$ for $i\rightarrow\infty$.

1.1 Related Work

We repeat and summarize the results for the variant of MCS and MSP studied in this paper in Table 1.

MCS has also been studied for the case that clusters do not need to have the same number of machines. It is still $NP$-hard to approximate this problem better than $2$ [26]. Furthermore, it was proven in [19] and [21] that List Scheduling cannot even guarantee a constant approximation ratio for this problem.

The first algorithm was presented by Tchernykh et al. [21] and has an approximation ratio of $10$ . This ratio was improved to a $3$ -approximation by Schwiegelshohn et al. [19], which is given by an online non-clairvoyant algorithm where the processing times are not known beforehand. Later, the algorithm was extended by Tchernykh et al. [22] to the case where jobs have release dates changing the approximation ratio to $2e+1$ . Bougeret et al. [4] developed an algorithm with approximation ratio $2.5$ for this case. This algorithm needs the constraint that the largest machine requirement of a job is smaller than the smallest number of machines available in any given cluster. This ratio was improved by Dutot et al. [8] by presenting an algorithm with approximation ratio $(2+\varepsilon)$ . The currently best algorithm for this problem matches the lower bound of $2$ [17], but has a large running time of $\Omega(n^{256})$ .

Organization of this Paper

The $\mathcal{O}(n)$ algorithm consists of two steps. First, we use an AEPTAS for MCS or MSP to find a schedule on two clusters, one with makespan at most $(1+\varepsilon)N\mathrm{OPT}$ and the other with makespan at most $p_{\max}\leq\mathrm{OPT}$. This schedule on the two clusters is then distributed onto the $N$ clusters using a partitioning technique, as we call it. This partitioning technique is the main accomplishment of this paper and is presented in Section 2. The AEPTAS for MCS can be found in Section 3, while the AEPTAS for Multiple Strip Packing can be found in Section 5. In Section 4, we present the algorithm from Theorem 6, which finds an approximation without the need to call the AEPTAS as a subroutine but uses the partitioning technique as well.

2 Partitioning Technique

In this section, we describe the central idea which leads to a linear running time algorithm. Indeed this technique can be used for any problem setting where there is an AEPTAS with approximation ratio $(1+\varepsilon)\mathrm{OPT}+p_{\max}$ for the single cluster version. In this context $p_{\max}$ is the largest occurring size in the minimization dimension, e.g. the maximal processing time or maximal height of the packing.

Instead of scheduling the jobs on $N$ clusters, we first schedule them on two clusters $C_{1}$ and $C_{2}$ . In a second step, we distribute the scheduled jobs to $N$ clusters. In the following, let $\mathrm{OPT}$ be the height of an optimal schedule on $N$ clusters for a given instance $I$ . Since there is a schedule with makespan $\mathrm{OPT}$ on $N$ clusters, there exists a schedule on one cluster with makespan at most $N\cdot\mathrm{OPT}$ . Assume there is an algorithm Alg which schedules the jobs on two clusters $C_{1}$ and $C_{2}$ such that the makespan of $C_{1}$ is at most $(1+\varepsilon)N\cdot\mathrm{OPT}$ and $C_{2}$ has a makespan of at most $\mathrm{OPT}$ . The algorithm mentioned in Theorem 3 is an example of such an algorithm and we will present it in Section 3.
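The observation that a schedule on $N$ clusters with makespan $\mathrm{OPT}$ yields a single-cluster schedule with makespan at most $N\cdot\mathrm{OPT}$ amounts to stacking the cluster schedules on top of each other. A minimal sketch, assuming hypothetical names and a list-based representation that are not part of the paper:

```python
def stack_on_one_cluster(p, sigma, rho, N):
    """Stack the schedules of clusters 1..N on top of each other.

    p, sigma, rho: processing times, start times and 1-based cluster indices.
    Returns the start times on the single cluster and the resulting makespan bound.
    """
    T = max(s + pj for s, pj in zip(sigma, p))   # makespan on N clusters
    stacked = [s + (r - 1) * T for s, r in zip(sigma, rho)]
    return stacked, N * T
```

Since cluster $r$ occupies the time window $[(r-1)T, rT)$ in the stacked schedule, jobs from different clusters never overlap in time, and the machine constraint carries over from the original schedule.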

Lemma 1:

Let an algorithm Alg be given that schedules the jobs on two clusters $C_{1}$ and $C_{2}$ such that the makespan of $C_{1}$ is at most $(1+\varepsilon)N\mathrm{OPT}$ and $C_{2}$ has a makespan of at most $\mathrm{OPT}$ and which has a running time of $\mathcal{O}(n\cdot f(\varepsilon))$. Furthermore, let Alg2 be an algorithm that finds for the single cluster variant a schedule or packing with height at most $2\cdot\max\{\mathcal{W}(\mathcal{J}^{\prime})/m,p_{\max}\}$ in $\mathcal{O}(\texttt{Alg2})$ time for any given set of jobs $\mathcal{J}^{\prime}$.

We can find a schedule on $N\geq 2$ clusters with makespan $2\mathrm{OPT}$ in $\mathcal{O}(n+n\cdot f(\lfloor N/3\rfloor/N))=\mathcal{O}(n)$ operations if $N>2$ and $\mathcal{O}(\texttt{Alg2}+n\cdot f(1/8))=\mathcal{O}(\texttt{Alg2})$ operations if $N=2$. (Note that $\lfloor N/3\rfloor/N\in[1/5,1/3]$ for $N>2$, and hence can be handled as a constant.)

The case $N>2$

In the following, we will describe how to distribute a schedule given by Alg to $N$ new clusters, and which value we have to choose for $\epsilon$ in Alg to get the desired approximation ratio of $2$ . The partitioning algorithm distinguishes three cases: $N=3i,N=3i+1$ and $N=3i+2$ for some $i\in\mathbb{N}_{\geq 1}$ and chooses the value for $\varepsilon$ dependent on this $N$ , such that $\varepsilon\in[1/5,1/3]$ . In the following, when speaking of a schedule the processing time is on the vertical axis while the machines are displayed on the horizontal axis, see Figure 1.
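The choice of $\varepsilon$ as a function of $N$ can be checked with exact arithmetic. The helper names below are hypothetical; the values $\lfloor N/3\rfloor/N$ for $N>2$ and $1/8$ for $N=2$ are the ones stated in the text.

```python
from fractions import Fraction


def choose_epsilon(N):
    """Accuracy passed to Alg: 1/8 for N = 2, floor(N/3)/N for N > 2."""
    if N < 2:
        raise ValueError("the partitioning technique needs N >= 2")
    if N == 2:
        return Fraction(1, 8)
    return Fraction(N // 3, N)


def c1_makespan_factor(N):
    """(1 + eps) * N, the bound on the makespan of C1 in units of OPT."""
    return (1 + choose_epsilon(N)) * N
```

For $N>2$ the factor simplifies to $N+\lfloor N/3\rfloor$, which is $4i$, $4i+1$ or $4i+2$ for $N=3i$, $3i+1$ or $3i+2$, matching the three cases discussed next.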

In the following distributing algorithm, we draw horizontal lines at each multiple of $2T_{A}$ , where $T_{A}\leq\mathrm{OPT}$ is a value which depends on the makespan of the schedule defined by Alg and will be specified dependent on $N$ in the later paragraphs. Let $i\in\mathbb{N}$ and consider the jobs which start at or after $2iT_{A}$ and end at or before $2(i+1)T_{A}$ . We remove these jobs from $C_{1}$ and schedule them on a new cluster such that they keep their relative position. We say these new clusters have type $A$ .

Next, consider the set of jobs cut by the horizontal line at $2iT_{A}$. All these jobs have a processing time of at most $p_{\max}\leq\mathrm{OPT}$ and they can be scheduled at the same time without violating the machine constraint. In a new cluster, we can schedule two of these sets of jobs with makespan $2p_{\max}\leq 2\mathrm{OPT}$, by letting the first set of jobs start at $0$ and the second set start at $p_{\max}$. We say these clusters have type B.

Case 1: $N=3i$ .

If $N=3i$ , we choose $\varepsilon:=\lfloor N/3\rfloor/N=1/3$ : As a result, the schedule on $C_{1}$ given by Alg has a makespan of $T\leq(4/3)N\mathrm{OPT}=4i\mathrm{OPT}$ and we define $T_{A}:=T/(4i)\leq\mathrm{OPT}$ . We partition the given schedule as described above. Since it has a height of $4iT_{A}$ , we get $2i$ clusters of type A, see Figure 1. There are $4iT_{A}/(2T_{A})-1=2i-1$ lines at multiples of $2T_{A}$ . Hence, we get $\left\lfloor\frac{2i-1}{2}\right\rfloor=i-1$ clusters of type B. The jobs intersecting the last line can be scheduled on one new cluster with makespan $T_{A}$ . On this last cluster after the point in time $T_{A}$ , we schedule the jobs from the Cluster $C_{2}$ . Remember, the schedule on $C_{2}$ has a makespan of at most $\mathrm{OPT}$ and, hence, the makespan of this last cluster is bounded by $2\mathrm{OPT}$ as well. In total, we have partitioned the schedule into $2i+i-1+1=3i=N$ clusters each with makespan at most $2\mathrm{OPT}$ .

Case 2: $N=3i+1$ .

If $N=3i+1$ for some $i\in\mathbb{N}$ , we choose $\varepsilon:=\lfloor N/3\rfloor/N=i/(3i+1)\geq 1/4$ . As a result, the makespan of $C_{1}$ generated by the algorithm Alg is given by $T\leq(1+i/(3i+1))N\mathrm{OPT}=(4i+1)\mathrm{OPT}$ and we define $T_{A}:=T/(4i+1)\leq\mathrm{OPT}$ . There are $\lceil(4i+1)/2\rceil-1=2i$ multiples of $2T_{A}$ smaller than $(4i+1)T_{A}$ , see Figure 2. Above the last multiple of $2T_{A}$ smaller than $(4i+1)T_{A}$ namely $4iT_{A}$ , the schedule has a height of at most $T_{A}\leq\mathrm{OPT}$ left. Hence using the above-described partitioning, we generate $2i$ clusters of type A. The jobs intersecting the $2i$ multiples of $2T_{A}$ can be placed into $i$ clusters of type B. We have left the jobs above $4iT_{A}$ , which can be scheduled in a new cluster with makespan $T_{A}\leq\mathrm{OPT}$ . Last, we place the jobs from cluster $C_{2}$ on top of the schedule in the new cluster, such that it has a makespan of at most $T_{A}+\mathrm{OPT}\leq 2\mathrm{OPT}$ in total. Altogether, we have distributed the given schedule on $2i+i+1=3i+1=N$ clusters, such that each of them has a makespan bounded by $2\mathrm{OPT}$ .

Case 3: $N=3i+2$ .

If $N=3i+2$, we choose $\varepsilon=\lfloor N/3\rfloor/N=i/(3i+2)\geq 1/5$. As a result, the makespan on $C_{1}$ generated by Alg is bounded by $T\leq(1+i/(3i+2))N\mathrm{OPT}=(4i+2)\mathrm{OPT}$ and we define $T_{A}:=T/(4i+2)\leq\mathrm{OPT}$. Thus, there are $(4i+2)/2-1=2i$ horizontal lines at the multiples of $2T_{A}$ which are strictly larger than $0$ and strictly smaller than $(4i+2)T_{A}$, see Figure 3. As a consequence, we construct $2i+1$ clusters of type A and $i$ clusters of type B. The cluster $C_{2}$ defines one additional cluster of this new schedule. In total, we have a schedule on $2i+1+i+1=N$ clusters with makespan bounded by $2\mathrm{OPT}$.

This distribution can be made in $\mathcal{O}(n)$ steps since we have to relocate each job at most once. Therefore the algorithm has a running time of at most $\mathcal{O}(n+n\cdot f(\lfloor N/3\rfloor/N))=\mathcal{O}(n)$ since $\lfloor N/3\rfloor/N$ is a constant of size at least $1/5$ .
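The three cases can be combined into one routine. The following is a rough sketch under stated assumptions, not the authors' code: the function name `partition`, the tuple layout `(name, start, p)` and the use of exact fractions are illustrative choices. Machine requirements are omitted because every set of jobs that is moved together also ran together in the given schedule, and the leftover set of cut jobs in the case $N=3i$ is given a window of height $p_{\max}$ before the jobs of $C_{2}$ start.

```python
from fractions import Fraction


def partition(c1, c2, N, T, p_max):
    """Distribute a two-cluster schedule onto N >= 3 clusters.

    c1, c2: lists of (name, start, p) for the jobs on C1 and C2.
    T: makespan of C1; p_max: largest processing time.
    Returns N lists of (name, new_start).
    """
    K = N + N // 3                   # 4i, 4i+1 or 4i+2
    TA = Fraction(T, K)              # T_A
    n_slabs = -(-K // 2)             # ceil(K / 2) slabs of height 2 * T_A
    slabs = [[] for _ in range(n_slabs)]
    cuts = [[] for _ in range(n_slabs - 1)]  # cuts[k]: jobs crossing line 2(k+1)T_A
    for name, s, p in c1:
        k = int(s // (2 * TA))       # slab in which the job starts
        if s + p <= 2 * (k + 1) * TA:
            slabs[k].append((name, s - 2 * k * TA))  # type A: keep relative position
        else:
            cuts[k].append((name, 0))                # cut job: restart at time 0

    def shifted(jobs, offset):
        return [(name, s + offset) for name, s in jobs]

    # Type B: two cut sets per cluster, the second one starting at p_max.
    pairs = [cuts[a] + shifted(cuts[a + 1], p_max)
             for a in range(0, len(cuts) - 1, 2)]
    on_c2 = [(name, s) for name, s, _ in c2]
    if N % 3 == 0:   # leftover cut set below, jobs of C2 on top
        return slabs + pairs + [cuts[-1] + shifted(on_c2, p_max)]
    if N % 3 == 1:   # short last slab (height at most T_A) below, C2 on top
        return slabs[:-1] + pairs + [slabs[-1] + shifted(on_c2, TA)]
    return slabs + pairs + [on_c2]   # N = 3i + 2: C2 is its own cluster
```

Each job is inspected once, so the routine runs in $\mathcal{O}(n)$ time apart from allocating the $N$ output lists.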

The case $N=2$

To find a distribution for this case, we need to make a stronger assumption on the solution of the algorithm Alg. Namely, we assume that the second cluster $C_{2}$ has just $\varepsilon m$ machines. As a consequence, the total work of the jobs contained on $C_{2}$ is bounded by $\varepsilon m\mathrm{OPT}$.

Let us consider the schedule on cluster $C_{1}$ with makespan $T\leq(1+\varepsilon)2\mathrm{OPT}$. In the following, we will assume that $T>2p_{\max}$ since otherwise we have $T\leq 2\mathrm{OPT}$ and we do not need to reorder the schedule any further. We draw horizontal lines at $\varepsilon T$ and at $T-\varepsilon T$. Next, we define two sets of jobs $J_{1}$ and $J_{2}$. $J_{1}$ contains all jobs starting before $\varepsilon T$ and $J_{2}$ contains all jobs ending after $T-\varepsilon T$. Note that since $T\leq(1+\varepsilon)2\mathrm{OPT}$, we have that $(1-\varepsilon)T<2\mathrm{OPT}$. Furthermore, $J_{1}$ and $J_{2}$ are disjoint if $\varepsilon\leq 1/4$ since $p_{\max}\leq T/2$ and therefore $\varepsilon T+p_{\max}\leq T/4+T/2\leq\tfrac{3}{4}T\leq(1-\varepsilon)T$. Note that the total work of the jobs is bounded by $2\mathrm{OPT}m$ and, hence, $\mathcal{W}(\mathcal{J})/(2m)\leq\mathrm{OPT}$. We distinguish two cases:

Case 4: $\mathcal{W}(J_{1})\leq(1-\varepsilon)\mathcal{W}(\mathcal{J})/2$ or $\mathcal{W}(J_{2})\leq(1-\varepsilon)\mathcal{W}(\mathcal{J})/2$ .

W.l.o.g., let $\mathcal{W}(J_{2})\leq(1-\varepsilon)\mathcal{W}(\mathcal{J})/2\leq(1-\varepsilon)m\mathrm{OPT}$. We remove all jobs in $J_{2}$ from the cluster $C_{1}$. As a result, this cluster has a makespan of at most $(1-\varepsilon)T<2\mathrm{OPT}$. The total work of the jobs contained in $C_{2}$ combined with the jobs in $J_{2}$ is at most $m\mathrm{OPT}$. Therefore, we can use the algorithm Alg2 (for example the List-Scheduling algorithm by Garey and Graham [9]) to find a schedule for these jobs with makespan at most $2\max\{p_{\max},\mathcal{W}(J_{2}\cup J(C_{2}))/m\}\leq 2\mathrm{OPT}$, where $J(C_{2})$ denotes the set of jobs scheduled on $C_{2}$. Hence, we can find a schedule on two clusters in at most $\mathcal{O}(\texttt{Alg2}+n\cdot f(\varepsilon))$ for this case.

Case 5: $\mathcal{W}(J_{1})>(1-\varepsilon)\mathcal{W}(\mathcal{J})/2$ and $\mathcal{W}(J_{2})>(1-\varepsilon)\mathcal{W}(\mathcal{J})/2$ .

Consider the set of jobs $J_{3}$ scheduled on $C_{1}$ but not contained in $J_{1}$ or $J_{2}$. Since the total work of the jobs is at most $\mathcal{W}(\mathcal{J})\leq 2m\mathrm{OPT}$, it holds that $\mathcal{W}(J_{3})\leq\mathcal{W}(\mathcal{J})-\mathcal{W}(J_{1})-\mathcal{W}(J_{2})<\varepsilon\mathcal{W}(\mathcal{J})\leq 2\varepsilon m\mathrm{OPT}$. Let $J(C_{1})$ be the set of jobs scheduled on $C_{1}$ and $J(C_{2})$ be the set of jobs scheduled on $C_{2}$. We define $J_{4}:=\{j\in J(C_{1})\mid\sigma(j)+p(j)\leq\varepsilon T\}$ and $J_{5}:=\{j\in J(C_{1})\mid\sigma(j)\geq(1-\varepsilon)T\}$. Clearly, each of these two sets has a total work of at most $\varepsilon mT\leq 2(\varepsilon+\varepsilon^{2})m\mathrm{OPT}$ and therefore $\mathcal{W}(J_{3}\cup J_{4}\cup J_{5}\cup J(C_{2}))\leq(7\varepsilon+4\varepsilon^{2})m\mathrm{OPT}$. If $\varepsilon=\frac{1}{8}$, these jobs have a total work of at most $m\mathrm{OPT}$ and are scheduled with the algorithm Alg2 to find a schedule on one cluster with makespan at most $2\max\{p_{\max},\mathcal{W}(J_{3}\cup J_{4}\cup J_{5}\cup J(C_{2}))/m\}\leq 2\mathrm{OPT}$.
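For reference, the work bound used in this case can be spelled out term by term. The individual bounds are the ones stated in the text (including the assumption that $C_{2}$ has only $\varepsilon m$ machines); only the arrangement is added here.

```latex
\begin{align*}
\mathcal{W}(J_{3}) &\leq 2\varepsilon\, m\mathrm{OPT},\\
\mathcal{W}(J_{4}),\ \mathcal{W}(J_{5}) &\leq \varepsilon m T
  \leq 2(\varepsilon+\varepsilon^{2})\, m\mathrm{OPT}
  \quad\text{since } T\leq 2(1+\varepsilon)\mathrm{OPT},\\
\mathcal{W}(J(C_{2})) &\leq \varepsilon\, m\mathrm{OPT},\\
\mathcal{W}(J_{3}\cup J_{4}\cup J_{5}\cup J(C_{2}))
  &\leq \bigl(2\varepsilon + 4(\varepsilon+\varepsilon^{2}) + \varepsilon\bigr)\, m\mathrm{OPT}
  = (7\varepsilon+4\varepsilon^{2})\, m\mathrm{OPT},\\
\varepsilon=\tfrac{1}{8}:\quad 7\varepsilon+4\varepsilon^{2}
  &= \tfrac{7}{8}+\tfrac{4}{64} = \tfrac{15}{16} \leq 1.
\end{align*}
```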

Up to this point, we have scheduled all jobs except the ones cut by the line $\varepsilon T$ and the jobs cut by the line $(1-\varepsilon)T$ . We schedule them in the second cluster by starting all the jobs cut by the first line at start point $0$ and the second set of jobs at the start point $p_{\max}\leq\mathrm{OPT}$ . Note that the partition into the sets $J_{1},\dots,J_{5}$ can be done in $\mathcal{O}(n)$ and hence the partitioning step is dominated by the running time of the algorithm Alg2.

In both cases for $N=2$ , we choose $\varepsilon=1/8$ and, hence, can bound the running time of the algorithm by $\mathcal{O}(\texttt{Alg2}+n\cdot f(1/8))=\mathcal{O}(\texttt{Alg2})$ .

This concludes the proof of Lemma 1. However, to prove Theorem 1, we need to prove the existence of the algorithm Alg, which finds the schedule on the clusters $C_{1}$ and $C_{2}$ . In the next section, we will see one example of such an algorithm.

As Alg2, we can choose Steinberg's algorithm [20] in the case of SP. It has a running time that is bounded by $\mathcal{O}(n\log^{2}(n))$ . For PTS, on the other hand, we can use the algorithm by Garey and Graham [9], which was optimized by Turek et al. [23] to have a running time of $\mathcal{O}(n\log(n))$ .

A direct conclusion of the lemma is the following corollary.

Corollary 1:

For all $N\geq 3$ , given a schedule on two clusters $C_{1}$ and $C_{2}$ such that the makespan of $C_{1}$ is at most $(1+\lfloor N/3\rfloor/N)N\mathrm{OPT}$ and $C_{2}$ has a makespan of at most $\mathrm{OPT}$ , we can find a schedule on $N$ clusters with makespan at most $2\mathrm{OPT}$ in at most $\mathcal{O}(n)$ additional steps.

Instead of using the algorithm in the next section, we can first try to use any heuristic or other (fast) approximation algorithm. More precisely, we can do the following: Given a schedule by any heuristic, we remove all the jobs that end after the point in time at which the last job is started and place them on the cluster $C_{2}$ , by starting them all at the same time. The schedule on $C_{2}$ obviously has a makespan bounded by $p_{\max}\leq\mathrm{OPT}$ . Next, we check whether the residual schedule on $C_{1}$ has a makespan of at most $(1+\lfloor N/3\rfloor/N)N\mathrm{OPT}$ . For example, this can be done by comparing the makespan $T$ on $C_{1}$ to the lower bound on the optimal makespan $\max\{p_{\max},\mathcal{W}(\mathcal{J})/m,p(\mathcal{J}_{>m/2})\}$ , where $\mathcal{J}_{>m/2}$ is the set of all jobs with machine requirement larger than $m/2$ . If the makespan $T$ is small enough, i.e., if $T\leq(1+\lfloor N/3\rfloor/N)\max\{p_{\max},\mathcal{W}(\mathcal{J})/m,p(\mathcal{J}_{>m/2})\}$ , we will find a $2$ -approximation by using the partitioning technique from above. Otherwise, we need to use the algorithm from the next section.
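The check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a schedule is taken to be a list of `(start, p, q)` triples with integer entries, the helper names `split_tail`, `lower_bound`, and `residual_is_short_enough` are hypothetical, and the factor from Corollary 1 is passed in by the caller as `ratio` rather than fixed here.

```python
from fractions import Fraction


def split_tail(schedule):
    """Split a schedule on C1 into the residual part and the jobs moved to C2.

    schedule: list of (start, p, q) triples (start time, processing time,
    machine requirement). Jobs ending after the last start time are moved.
    """
    last_start = max(s for s, _, _ in schedule)
    residual = [(s, p, q) for s, p, q in schedule if s + p <= last_start]
    moved = [(s, p, q) for s, p, q in schedule if s + p > last_start]
    return residual, moved


def lower_bound(jobs, m):
    """max{p_max, W(J)/m, p(J_{>m/2})} for jobs given as (p, q) pairs."""
    p_max = max(p for p, _ in jobs)
    work = sum(p * q for p, q in jobs)
    wide = sum(p for p, q in jobs if 2 * q > m)  # jobs needing more than m/2 machines
    return max(Fraction(p_max), Fraction(work, m), Fraction(wide))


def residual_is_short_enough(schedule, m, ratio):
    """Compare the residual makespan on C1 with ratio * lower bound.

    ratio is an assumption of this sketch: the caller supplies the factor
    required by Corollary 1.
    """
    residual, _ = split_tail(schedule)
    makespan = max((s + p for s, p, _ in residual), default=0)
    bound = lower_bound([(p, q) for _, p, q in schedule], m)
    return makespan <= ratio * bound
```

All moved jobs are running at the moment the last job starts, so together they need at most $m$ machines and can be started simultaneously on $C_{2}$ .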

3 An AEPTAS for Parallel Task Scheduling

In this section, we will present an $AEPTAS$ for Parallel Task Scheduling (PTS) with an approximation ratio $(1+\varepsilon)\mathrm{OPT}+p_{\max}$ and running time $\mathcal{O}(n\log(1/\varepsilon))+\mathcal{O}_{\varepsilon}(1)$ . We can use this algorithm to find a schedule on the two clusters $C_{1}$ and $C_{2}$ needed for the algorithm in Section 2. It is inspired by the algorithm in [11] but contains some improvements. Furthermore, note the fact that in the following algorithm the processing times of the jobs do not have to be integral. Instead, we will discretize them by rounding.

The algorithm works roughly in the following way. The set of jobs is partitioned into large, medium, and small jobs, depending on their processing times. The medium jobs have a small total work and therefore can be scheduled at the end of the schedule using a 3-approximation algorithm without doing too much harm. The large jobs are partitioned into two sets: wide jobs and narrow jobs depending on their machine requirement. There are few large wide jobs which makes it possible to guess their starting times. The narrow jobs are placed with a linear program for which we guess the number of required machines for each occurring processing time at each possible start point of the schedule. After solving this linear program, a few jobs are scheduled fractionally. These jobs have a total number of required machines of at most $\gamma m$ for any chosen value $\gamma\in(0,1]$ . Notice that the choice of $\gamma$ will affect the running time. We place these jobs on top of the schedule to gain a $(1+\varepsilon)\mathrm{OPT}+p_{\max}$ approximation, or into an extra cluster to find a solution needed for the algorithm in Section 2. The small jobs are scheduled with a linear program. An overview of the algorithm can be found in Section 3.5.

We will now present a more detailed approach. We use an improved rounding strategy for large jobs compared to [11], which enables us to improve the running time. Further, we present a different linear programming approach to schedule the narrow tall jobs.

3.1 Simplify

Let an instance $I=(\mathcal{J},m)$ be given. Note that the value $T:=\max\{p_{\max},W(\mathcal{J})/m\}$ is a lower bound on the makespan of the schedule. On the other hand, we know by Turek et al. [23] that $2T$ is an upper bound on the optimal makespan. We can find $T$ in $\mathcal{O}(n)$ .

Let $\delta$ and $\mu$ be values dependent on $\varepsilon$ . We partition the set of jobs $\mathcal{J}$ into small $\mathcal{J}_{S}:=\{j\in\mathcal{J}|p_{j}\leq\mu T\}$ , medium $\mathcal{J}_{M}:=\{j\in\mathcal{J}|\mu T<p_{j}<\delta T\}$ , and large ones $\mathcal{J}_{L}:=\{j\in\mathcal{J}|\delta T\leq p_{j}\}$ . Consider the sequence $\sigma_{0}=\varepsilon$ , $\sigma_{i+1}=\sigma_{i}\varepsilon^{3}$ . By the pigeonhole principle there exists an $i\in\{0,\dots,1/\varepsilon-1\}$ such that $W(\mathcal{J}_{M})\leq\varepsilon m\mathrm{OPT}$ , when defining $\delta:=\sigma_{i}$ and $\mu:=\sigma_{i+1}$ . We can find these values for $\delta$ and $\mu$ in $\mathcal{O}(n+1/\varepsilon)$ . Note that $\mu=\varepsilon^{3}\delta\geq\varepsilon^{3/\varepsilon+3}$ .
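The pigeonhole argument can be made concrete with a short sketch. It assumes exact arithmetic with `Fraction`, that $1/\varepsilon$ is an integer, and that jobs are `(p, q)` pairs; the function name `choose_delta_mu` is hypothetical. For readability, each job scans all candidate intervals, so this version runs in $\mathcal{O}(n/\varepsilon)$ rather than the $\mathcal{O}(n+1/\varepsilon)$ stated above.

```python
from fractions import Fraction


def choose_delta_mu(jobs, T, eps):
    """Pick delta = sigma_i and mu = sigma_{i+1} with the least medium work.

    jobs: (p, q) pairs; T and eps are Fractions, 1/eps is an integer.
    The open intervals (sigma_{i+1} T, sigma_i T) are pairwise disjoint, so the
    bucket totals sum to at most W(J) and the smallest is at most eps * W(J).
    """
    k = int(1 / eps)
    sigma = [eps ** (1 + 3 * i) for i in range(k + 1)]  # sigma_0 = eps, sigma_{i+1} = sigma_i * eps^3
    bucket = [Fraction(0)] * k
    for p, q in jobs:
        for i in range(k):
            if sigma[i + 1] * T < p < sigma[i] * T:
                bucket[i] += p * q  # job would be medium for this choice of i
                break
    i = min(range(k), key=lambda idx: bucket[idx])
    return sigma[i], sigma[i + 1], bucket[i]
```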

At a loss of at most $\varepsilon T$ in the approximation ratio, we can assume that the smallest processing time is at least $\varepsilon T/n$ , since adding $\varepsilon T/n$ to each processing time adds at most $n\cdot\varepsilon T/n=\varepsilon T$ to the total makespan. Therefore, the largest $l$ such that $p_{j}\in(\varepsilon^{l}T,\varepsilon^{l-1}T]$ for some job $j$ is bounded by $\mathcal{O}(\log(n))$ and we know $\delta\geq\min\{\varepsilon/n,\varepsilon^{3/\varepsilon}\}$ . We round the sizes of the jobs by using the following lemma.

Lemma 2 (See [14]):

At a loss of a factor of at most $(1+2\varepsilon)$ in the approximation ratio, we can ensure that each job $j\in\mathcal{J}$ with $\varepsilon^{l}T<p_{j}\leq\varepsilon^{l-1}T$ for some $l\in\mathbb{N}$ has processing time $p_{j}=k_{j}\varepsilon^{l+1}T$ for $k_{j}=\lceil p_{j}/(\varepsilon^{l+1}T)\rceil\in\{1/\varepsilon+1,\dots,1/\varepsilon^{2}\}$ and a starting time, which is a multiple of $\varepsilon^{l+1}T$ as well.

This rounding can be done in $\mathcal{O}(n)$ . Afterward, there are at most $1/\varepsilon^{2}$ different processing times between $\varepsilon^{l}T$ and $\varepsilon^{l-1}T$ for each $l\in\{1,\dots,3/\varepsilon+3\}$ . Therefore, the number of different processing times of large jobs is bounded by $1/\varepsilon^{2}\cdot 3/\varepsilon=3/\varepsilon^{3}$ since $\delta\geq\varepsilon^{3/\varepsilon}$ . Further, the number of different processing times for medium jobs is bounded by $3/\varepsilon^{2}$ since the medium jobs have processing times in $(\mu=\varepsilon^{3}\delta,\delta)$ . Note that the number of different processing times of small jobs is bounded by $\mathcal{O}(\min\{\log(n)/\varepsilon^{2},n/\varepsilon\})$ since the smallest job has processing time $\varepsilon T/n$ . Additionally, there are at most $1/(\varepsilon\delta)$ possible starting points for the large jobs. We denote the set of starting points for large jobs as $\mathcal{S}$ and the set of their processing times as $P_{L}$ . After this step, we will only consider the rounded processing times and will denote them as $p_{j}$ for each job $j\in\mathcal{J}$ .
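The rounding of Lemma 2 for a single processing time can be sketched as below. The sketch assumes exact `Fraction` arithmetic and $0<p\leq T$ ; the function name `round_processing_time` is hypothetical, and the level $l$ is found by a simple loop instead of the constant-time computation that a linear-time implementation would need.

```python
from fractions import Fraction
from math import ceil


def round_processing_time(p, T, eps):
    """Round p up to a multiple of eps^(l+1) * T, where eps^l T < p <= eps^(l-1) T.

    p, T, eps are Fractions with 0 < p <= T and 0 < eps < 1.
    Returns (rounded p, level l, multiplier k).
    """
    l = 1
    while p <= eps ** l * T:
        l += 1
    # Here eps^l * T < p <= eps^(l-1) * T holds.
    unit = eps ** (l + 1) * T
    k = ceil(p / unit)
    return k * unit, l, k
```

For $\varepsilon=1/2$ and $T=1$ , a job with $p=3/10$ lies in $(1/4,1/2]$ , so $l=2$ , the grid unit is $1/8$ , and the job is rounded up to $3/8$ .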

3.2 Large Jobs

Let $\gamma m\leq m$ be the width of the second cluster $C_{2}$ and let $\alpha$ be a constant dependent on $\varepsilon$ and $\gamma$ , which we will specify later on. We say a job $j\in\mathcal{J}_{L}$ is wide if it uses at least $\alpha m$ machines, and we denote the set of large wide jobs by $\mathcal{J}_{L,W}$ . Note that large wide jobs have a processing time of at least $\delta T$ and need at least $\alpha m$ machines while the total work of all jobs in $\mathcal{J}$ is bounded by $mT$ . Hence, the total number of them is bounded by $1/(\delta\alpha)$ . Therefore, there are at most $|\mathcal{S}|^{1/(\delta\alpha)}$ possibilities to schedule the jobs in $\mathcal{J}_{L,W}$ . In the algorithm, we will try each of these options.

In the next step, we deal with the large narrow jobs $\mathcal{J}_{L,N}:=\mathcal{J}_{L}\setminus\mathcal{J}_{L,W}$ . Consider an optimal schedule $S=(\sigma,\rho)$ , where we have rounded the processing times of the jobs as described in Lemma 2. For the schedule $S$ and each starting time $s\in\mathcal{S}$ , let $m_{s}$ be the number of machines used by jobs in $\mathcal{J}_{L,N}$ that are processed (not only started) at that time, i.e., we define $m_{s}:=\sum_{j\in\mathcal{J}_{L,N,s}}q_{j}$ where $\mathcal{J}_{L,N,s}$ is the set of jobs $j\in\mathcal{J}_{L,N}$ , which have both a start point $s_{j}\leq s$ and an endpoint $e_{j}:=s_{j}+p_{j}>s$ . Note that jobs ending at $s$ , i.e., jobs with $e_{j}=s_{j}+p_{j}=s$ , are not part of the set $\mathcal{J}_{L,N,s}$ .
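The definition of $m_{s}$ translates directly into code. The following sketch assumes the large narrow jobs are given as `(s_j, p_j, q_j)` triples; the function name `machines_in_use` is hypothetical.

```python
def machines_in_use(jobs, start_points):
    """Return a dict mapping each start point s to m_s.

    jobs: (s_j, p_j, q_j) triples of large narrow jobs.
    A job counts at s if s_j <= s < s_j + p_j, so a job ending exactly
    at s is not counted.
    """
    return {
        s: sum(q for sj, pj, q in jobs if sj <= s < sj + pj)
        for s in start_points
    }
```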

For each processing time $p\in P_{L}$ let $q(p)$ be the total number of machines used by jobs with this processing time, i.e., $q(p):=\sum_{j\in\mathcal{J}_{L,N},p_{j}=p}q_{j}$ . Consider the following linear program $LP_{large}$ :

[TABLE]

The variable $x_{s,p}$ defines for each start point $s\in\mathcal{S}$ and each processing time $p\in P$ how many machines are used by jobs with processing time $p$ starting at $s$ . The first inequality ensures that the number of machines required by jobs scheduled at a start point $s$ , i.e., jobs from the set $\mathcal{J}_{L,N,s}$ , equals the number of used machines in the considered optimal schedule. The second inequality ensures that for each processing time all the jobs are scheduled. Given the considered optimal solution, we generate a solution to this linear program by counting for each starting time $s\in\mathcal{S}$ and each processing time $p\in P_{L}$ how many machines are used by jobs with processing time $p$ starting at $s$ . This linear program has $|\mathcal{S}|+|P_{L}|$ conditions and $|\mathcal{S}||P_{L}|$ variables. Since we have $|\mathcal{S}|+|P_{L}|$ conditions, there are at most $|\mathcal{S}|+|P_{L}|$ non zero components in a basic solution and for each $p\in P_{L}$ there has to be at least one non zero component.

In the algorithm, we guess (i.e., we try out all the possibilities) which variables are nonzero in the basic solution. There are at most $\mathcal{O}((|\mathcal{S}||P_{L}|)^{|\mathcal{S}|+|P_{L}|})$ options. We cannot guess the exact values of the variables $x_{s,p}$ in polynomial time. Instead, we guess for each nonzero variable $x_{s,p}$ the smallest multiple of $\alpha m$ that is larger than the value of $x_{s,p}$ in the basic solution. This can be done in $\mathcal{O}(1/\alpha^{|\mathcal{S}|+|P_{L}|})$ . So to find a schedule for the large jobs $\mathcal{J}_{L}$ , we use at most $\mathcal{O}(|\mathcal{S}|^{|\mathcal{J}_{L,W}|}\cdot(|\mathcal{S}||P_{L}|/\alpha)^{|\mathcal{S}|+|P_{L}|})$ guesses in total.

Note that this optimistic guessing, i.e., using the rounded up values for $m_{s}$ , on the one hand ensures that all the narrow large jobs can be scheduled but on the other hand can cause violations of the machine constraints. To prevent this machine violation, the algorithm tests for each guess whether the job condition (2) is fulfilled for each processing time. If this is the case, each value of a nonzero component is reduced by $\alpha m$ . For these down-sized values, the algorithm tests the machine constraint (1) for each starting point $s\in S$ . Note that validating whether the constraints are fulfilled is possible in $\mathcal{O}((|P_{L}|+|\mathcal{S}|)^{2})$ since for each of the $(|P_{L}|+|\mathcal{S}|)$ constraints, we have to add at most $(|P_{L}|+|\mathcal{S}|)$ values. If both conditions are fulfilled, the algorithm tries to schedule the small jobs, see Subsection 3.3. If the small jobs can be scheduled, the guess was feasible.

The actual narrow large jobs from the set $\mathcal{J}_{L,N}$ are scheduled only once in the final phase of the algorithm. When scheduling the jobs in $\mathcal{J}_{L,N}$ , we use the reduced guessed values. We greedily fill the jobs into the guessed starting positions $x_{s,p}$ , while slicing jobs vertically if they do not fit totally at that starting position (i.e., if the total number of machines required by jobs with processing time $p$ starting at $s$ is larger than $x_{s,p}$ when adding the machine requirement of the currently considered job) and placing the rest of the job at the next starting position for the processing time $p$ . We schedule the jobs which cannot be placed at the starting points defined by the values of $x_{s,p}$ (because we reduced these values) either on top of the schedule or on the second cluster $C_{2}$ depending on what is wanted: the algorithm described in Theorem 3 or the algorithm needed for Lemma 1. The total width of these jobs shifted to the end of the schedule or to Cluster $C_{2}$ is bounded by $\alpha m\cdot(|S|+|P_{L}|)$ since there are at most $|S|+|P_{L}|$ nonzero components and before the reduction by $\alpha m$ all the jobs could be scheduled because the job constraint (2) was fulfilled.
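The greedy filling with vertical slicing can be sketched for a single processing time $p$ as follows. This is an illustration only: the function name `fill_slots` and the data layout (a list of machine requirements and a list of `(start, capacity)` pairs holding the reduced values $x_{s,p}$ ) are assumptions of the sketch.

```python
def fill_slots(widths, slots):
    """Greedily place jobs of one processing time into the guessed slots.

    widths: machine requirements q_j of the jobs with processing time p.
    slots: (start, capacity) pairs, capacity being the reduced value x_{s,p}.
    Returns (placements, leftover): placements are (job index, start, width of
    the placed piece); leftover are (job index, width) parts that found no slot.
    A job appearing in more than one entry has been sliced vertically.
    """
    placements, leftover = [], []
    k = 0
    free = slots[0][1] if slots else 0
    for j, w in enumerate(widths):
        remaining = w
        while remaining > 0 and k < len(slots):
            if free == 0:
                # Current slot is full: move on to the next starting position.
                k += 1
                if k < len(slots):
                    free = slots[k][1]
                continue
            piece = min(remaining, free)
            placements.append((j, slots[k][0], piece))
            free -= piece
            remaining -= piece
        if remaining > 0:
            leftover.append((j, remaining))
    return placements, leftover
```

In the algorithm, sliced jobs and jobs with a leftover part are afterwards removed as a whole and moved to the top of the schedule or to $C_{2}$ ; the sketch only reports where the slicing happens.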

In the described placement of the narrow large jobs, we have introduced at most one fractional job for each non zero variable and it has a width of at most $\alpha m$ . We remove all these fractional jobs and place them next to the jobs which did not fit. The machine requirement of the removed fractional jobs can be bounded by $(|S|+|P_{L}|)\alpha m=(1/2\varepsilon\delta+3/\varepsilon^{3})\alpha m$ . Hence, if $\alpha\leq\varepsilon\delta/4$ , we have $2(|S|+|P_{L}|)\alpha m\leq m$ , and we can schedule all the removed jobs (non-fitting ones and fractional ones) at the same time at the end of the schedule without violating the machine constraint, adding at most $p_{\max}$ to the makespan. On the other hand, if $\alpha\leq\gamma\varepsilon\delta/4$ , it follows that $2(|S|+|P_{L}|)\alpha m\leq\gamma m$ , and we can schedule all the removed jobs inside the extra cluster with makespan at most $p_{\max}$ and machine requirement at most $\gamma m$ . In the algorithm, we choose $\alpha$ as needed for the corresponding application. We need at most $\mathcal{O}(n+|\mathcal{S}|+|P_{L}|)$ operations to place the narrow large jobs.

3.3 Small Jobs

We define a layer as the horizontal strip between two consecutive starting points in $\mathcal{S}$ and say layer $l$ is the layer between $l\varepsilon\delta T$ and $(l+1)\varepsilon\delta T$ . Note that during the processing time of a layer $l$ the machine requirement of large jobs will not change since large jobs start and end at multiples of $\varepsilon\delta T$ . Let $m_{l}$ be the number of machines left for small jobs in layer $l$ . Note that this number is fixed by the guesses for the large jobs.

We will partition the small jobs into wide and narrow jobs. A small job is wide if it requires at least $\varepsilon m$ machines and narrow otherwise. Let $\mathcal{J}_{S,W}$ be the set of small wide jobs and $\mathcal{J}_{S,N}$ be the set of small narrow jobs. We will round the machine requirements of the wide jobs using linear grouping, which was first introduced by Fernandez de la Vega [7]. The idea of this technique is to sort all the wide jobs by size and stack them on top of each other, such that the widest job is at the bottom and the narrowest job is at the top, see Figure 5. Let $P(\mathcal{J}_{S,W})$ be the total processing time of all the small wide jobs. We will round the machine requirements of the wide jobs to $1/\varepsilon^{2}$ sizes. For this purpose consider the multiples of $\varepsilon^{2}P(\mathcal{J}_{S,W})$ . We draw a horizontal line at each of these multiples of $\varepsilon^{2}P(\mathcal{J}_{S,W})$ and define for each job intersected by one of these lines two new jobs, by cutting this job at that line in two parts (for the analysis and description of the rounding; in the algorithm no job will be cut). The jobs between two consecutive lines at $i\varepsilon^{2}P(\mathcal{J}_{S,W})$ and $(i+1)\varepsilon^{2}P(\mathcal{J}_{S,W})$ are called jobs of group $i$ . For each group $i$ , we generate one new job that has processing time $\varepsilon^{2}P(\mathcal{J}_{S,W})$ and the machine requirement of the widest job in this group. We call this job a size defining job for the group. Let $\bar{\mathcal{J}}_{S,W}$ be the set of rounded small wide jobs.
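A direct, sorting-based version of this linear grouping can be sketched as follows. It assumes jobs are `(p, q)` pairs with positive processing times, that $1/\varepsilon^{2}$ is an integer, and that `eps` is a `Fraction`; the function name `linear_grouping` is hypothetical. Because the stack is ordered from widest to narrowest, the widest job of group $i$ is the job covering height $i\varepsilon^{2}P(\mathcal{J}_{S,W})$ in the stack.

```python
from fractions import Fraction


def linear_grouping(jobs, eps):
    """Return the machine requirements of the size defining jobs, group 0 first.

    jobs: non-empty list of (p, q) pairs of small wide jobs with p > 0.
    eps: Fraction such that 1/eps^2 is an integer.
    """
    stack = sorted(jobs, key=lambda job: job[1], reverse=True)  # widest at the bottom
    total = sum(Fraction(p) for p, _ in stack)
    step = eps ** 2 * total            # height of one group
    groups = int(1 / eps ** 2)
    widths = []
    idx = 0
    top = Fraction(stack[0][0])        # upper end of job idx within the stack
    for i in range(groups):
        cut = i * step
        while top <= cut:              # advance to the job covering height `cut`
            idx += 1
            top += stack[idx][0]
        widths.append(stack[idx][1])
    return widths
```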

When we round the wide jobs as described, we need $\Omega(n\log(n))$ operations, since we sort the jobs. However, we do not need to sort the jobs since we are just interested in the size defining job of each group.

Lemma 3:

We can generate the rounded jobs $\bar{\mathcal{J}}_{S,W}$ in $\mathcal{O}(n\log(1/\varepsilon))$ operations.

Proof.

Define for a given job $j$ the set $\mathcal{J}_{S,W,>q_{j}}:=\{i\in\mathcal{J}_{S,W}|q_{i}>q_{j}\}$ and analogously $\mathcal{J}_{S,W,\geq q_{j}}:=\{i\in\mathcal{J}_{S,W}|q_{i}\geq q_{j}\}$ . For each group, we find the size defining job by using a modified median algorithm with running time $\mathcal{O}(n)$ . Instead of searching for the $i$ th largest job, we search for a job $j$ with $P(\mathcal{J}_{S,W,>q_{j}})\leq i\varepsilon^{2}P(\mathcal{J}_{S,W})$ and $P(\mathcal{J}_{S,W,\geq q_{j}})>i\varepsilon^{2}P(\mathcal{J}_{S,W})$ for each $i$ in $\{1,\dots,1/\varepsilon^{2}-1\}$ . Simply using this modified median algorithm for each group leads to $\mathcal{O}(n/\varepsilon^{2})$ operations.
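One way to realize such a weighted selection is sketched below. Note the swapped-in technique: the sketch uses a randomized quickselect with expected linear running time, whereas the proof relies on a deterministic median algorithm with worst-case $\mathcal{O}(n)$ . It searches for a machine requirement $q^{*}$ such that the jobs strictly wider than $q^{*}$ have total processing time at most `target`, while the jobs of width at least $q^{*}$ have total processing time larger than `target`. The function name `weighted_select` is hypothetical.

```python
import random


def weighted_select(jobs, target):
    """Find q* with P(jobs wider than q*) <= target < P(jobs of width >= q*).

    jobs: (p, q) pairs with p > 0; requires 0 <= target < total processing time.
    Randomized pivot choice: expected linear time, not worst-case linear.
    """
    items = list(jobs)
    above = 0  # processing time of discarded jobs that are wider than all candidates
    while True:
        pivot = random.choice(items)[1]
        wider = [(p, q) for p, q in items if q > pivot]
        p_wider = sum(p for p, _ in wider)
        p_equal = sum(p for p, q in items if q == pivot)
        if above + p_wider > target:
            items = wider                      # answer is strictly wider than the pivot
        elif above + p_wider + p_equal > target:
            return pivot                       # pivot width covers the target height
        else:
            above += p_wider + p_equal         # answer is strictly narrower
            items = [(p, q) for p, q in items if q < pivot]
```

Calling it with `target` equal to $i\varepsilon^{2}P(\mathcal{J}_{S,W})$ yields the width of the size defining job of group $i$ .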

However, we improve this approach. First, we search for the job with the median machine requirement in $\mathcal{O}(|\mathcal{J}_{S,W}|)$ . Afterward, we search for the group size of the group containing this job in $\mathcal{O}(|\mathcal{J}_{S,W}|/2)$ and the group size above this group (if existing) in $\mathcal{O}(|\mathcal{J}_{S,W}|/2)$ as well. The set of jobs, where we do not know the rounded sizes, is now partitioned into two sets containing at most $|\mathcal{J}_{S,W}|/2$ jobs each. We iterate the process on both sets separately until each group size is found.

Since there are at most $\mathcal{O}(1/\varepsilon^{2})$ groups, this search can be done in $\mathcal{O}(n\log(1/\varepsilon))$ operations. To see this, we consider the following recurrence equation

[TABLE]

where $n$ denotes the number of jobs, $d$ denotes the number of values we search for, and $c\in\mathbb{N}$ . To find the job with the median machine requirement and the group sizes of the group containing this item and the group above, we need $\mathcal{O}(n)$ operations, and hence there is a $c\in\mathbb{N}$ with these properties. Afterward, the set of jobs is partitioned into two sets such that each set contains at most $n/2$ jobs. The total number of sizes we search for is reduced by at least one since in this step we find one or two of them. However, the values we search for do not have to be distributed evenly to the sets. Therefore, this recurrence equation represents the running time of the described algorithm adequately.

We claim that $T(n,d)\leq cn(\log(d)+1)$ . We have $T(n,1)=cn=cn(\log(1)+1)$ and hence the claim is true for $d=1$ . For $d\in\mathbb{N}_{\geq 2}$ it follows that

[TABLE]

Since in our case we have $d=1/\varepsilon^{2}$ , this concludes the proof. ∎

Remark 1:

If we schedule the rounded jobs $\bar{\mathcal{J}}_{S,W}$ fractionally instead of the original jobs $\mathcal{J}_{S,W}$ , we need to add at most $\varepsilon T$ to the makespan of the schedule.

Proof.

Consider an optimal schedule of the original small jobs. We can schedule the new jobs fractionally, by replacing all jobs contained in group $i$ by the new job generated for the jobs in the group $(i+1)$ . The widest rounded job cannot be scheduled instead of the original jobs, because the machine requirement might be too large. We schedule this job at the end of the schedule. This job has a processing time of $\varepsilon^{2}P(\mathcal{J}_{S,W})$ . We know that $P(\mathcal{J}_{S,W})\cdot\varepsilon m\leq mT$ since $\mathcal{W}(\mathcal{J})\leq mT$ and each wide job needs at least $\varepsilon m$ machines to be scheduled. Hence, it holds that $\varepsilon^{2}P(\mathcal{J}_{S,W})\leq\varepsilon T$ . ∎

We say a configuration of wide jobs is a multiset of wide jobs $C:=\{a_{j,C}:j|j\in\bar{\mathcal{J}}_{S,W}\}$ . We say a configuration $C$ requires at most $q$ machines, if $\sum_{j\in\bar{\mathcal{J}}_{S,W}}a_{j,C}q_{j}\leq q$ and define $q(C):=\sum_{j\in\bar{\mathcal{J}}_{S,W}}a_{j,C}q_{j}$ . Let $\mathcal{C}_{q}$ be the set of configurations with machine requirement at most $q$ , i.e., $\mathcal{C}_{q}:=\{C\in\mathcal{C}\,|\,q(C)\leq q\}$ .
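To make the definition concrete, the following sketch enumerates $\mathcal{C}_{q}$ for a small instance. It assumes the rounded wide jobs are represented only by their (positive integer) machine requirements and encodes a configuration as a tuple of multiplicities $a_{j,C}$ ; the names `configurations` and `requirement` are hypothetical. The algorithm itself never enumerates all configurations, so this is purely illustrative.

```python
def configurations(widths, q):
    """Yield all multiplicity tuples (a_1, ..., a_k) with sum a_i * widths[i] <= q.

    widths: positive integer machine requirements of the rounded wide jobs.
    """
    def extend(i, remaining):
        if i == len(widths):
            yield ()
            return
        for a in range(remaining // widths[i] + 1):
            for rest in extend(i + 1, remaining - a * widths[i]):
                yield (a,) + rest

    yield from extend(0, q)


def requirement(config, widths):
    """q(C): total machine requirement of a configuration."""
    return sum(a * w for a, w in zip(config, widths))
```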

Consider the following linear program $LP_{small}$ .

[TABLE]

The variable $x_{C,l}$ defines the processing time of the configuration $C$ in layer $l$ . The condition (5) ensures that we do not give a too large processing time to the configurations used in Layer $l$ , while condition (6) ensures that the processing time of each job is covered. Condition (4) is added to place the rounded jobs inside the extra box. This linear program has $|S|+|\bar{\mathcal{J}}_{S,W}|$ conditions and at most $|S||\mathcal{C}_{m}|$ variables. If the values $m_{l}$ are derived from an optimal solution (or are larger than in the corresponding optimal solution), the linear program above has a solution.

To speed up our algorithm, we do not find a solution to $LP_{small}$ . Instead, we find a solution to a relaxed version of the linear program, where we allow a slightly increased processing time per layer. This linear program is called $LP_{small,rel}$ and is the same as $LP_{small}$ but we replace equation (5) by

[TABLE]

while equation (4) is replaced by

[TABLE]

Lemma 4:

If there is a solution to $LP_{small}$ , we can find a basic solution to $LP_{small,rel}$ in $\mathcal{O}((|S||\bar{\mathcal{J}}_{S,W}|(\ln(|\bar{\mathcal{J}}_{S,W}|)+\varepsilon^{-4}))((|S|+|\bar{\mathcal{J}}_{S,W}|)^{1.5356}+(\log(1/\varepsilon))^{3}/\varepsilon^{4}))\leq\mathcal{O}(1/\varepsilon^{12}\delta^{3})$ operations.

Proof.

To solve this linear program, we translate it to a Max-Min-Resource-Sharing problem and solve it with approximation ratio $(1-\rho)$ for $\rho=\mathcal{O}(\varepsilon^{2})$ such that $1/(1-\rho)=(1+\varepsilon^{2})$ .

In the Max-Min-Resource-Sharing problem, we are given a nonempty convex compact set $B$ , and a vector $f$ of $M\in\mathbb{N}$ non-negative continuous concave functions $f:B\rightarrow\mathbb{R}_{+}^{M}$ . The objective is to find the value $\lambda^{*}:=\max\{\lambda\,|\,f(x)\geq\lambda 1_{M},x\in B\}$ , where $1_{M}$ is the vector of dimension $M$ with all entries one. In our translation, we define $f_{j}:=\sum_{l=1}^{|S|}\sum_{C\in\mathcal{C}_{m_{l}}}x_{C,l}a_{j,C}/p_{j}$ for all $j\in\bar{\mathcal{J}}_{S,W}$ , i.e., $M=|\bar{\mathcal{J}}_{S,W}|$ and

[TABLE]

We use the algorithm by Grigoriadis et al. [10] to solve this problem. This algorithm finds an $x\in B$ that satisfies $f(x)\geq(1-\rho)\lambda^{*}1_{M}$ . To find this solution, a so-called approximate block solver ( $\mathcal{ABS}(p,\rho/6)$ ) has to be provided, where $p\in\mathbb{R}^{M}_{+}$ . $\mathcal{ABS}(p,\rho/6)$ has to solve for each $l\in\{1,\dots,|S|\}$ the problem

[TABLE]

Intuitively, $\mathcal{ABS}(p,\rho/6)$ computes one configuration for each layer, which is added to the solution $x$ in the next step of the algorithm.

The above integer program is equivalent to the integer program of the Unbounded Knapsack problem and can therefore be solved approximately with approximation ratio $(1-\rho/6)$ in $\mathcal{O}(|\bar{\mathcal{J}}_{S,W}|+(\log(1/\rho))^{3}/\rho^{2})$ operations [12]. The algorithm needs at most $\mathcal{O}(M(\ln(M)+\rho^{-2}))$ steps, in each of which it calls $\mathcal{ABS}(p,\rho/6)$ exactly $|S|$ times. Hence the total running time to find $x$ is bounded by

[TABLE]

since $|S|=1/(\varepsilon\delta)$ and $|\bar{\mathcal{J}}_{S,W}|=1/\varepsilon^{2}$ .

Note that if the linear program $LP_{small}$ has a solution, there exists an $x^{\prime}\in B$ with $f_{j}(x^{\prime})\geq 1$ for each $j\in\bar{\mathcal{J}}_{S,W}$ . However, we solved the Max-Min-Resource-Sharing problem just approximately, i.e., if there exists such an $x^{\prime}$ with $f_{j}(x^{\prime})\geq 1$ , it holds for the calculated $x$ that $f_{j}(x)\geq(1-\rho)$ . We scale $x$ with $1/(1-\rho)$ and call it $\tilde{x}$ . If we have that $f_{j}(\tilde{x})<1$ for at least one $j\in\bar{\mathcal{J}}_{S,W}$ , we know that the linear program $LP_{small}$ has no feasible solution and stop. This scaling step extends each layer to $\varepsilon\delta T/(1-\rho)=(1+\varepsilon^{2})\varepsilon\delta T$ and therefore it extends the generated schedule by at most $\varepsilon^{2}T$ .

Another reason why the computed solution is not yet a solution to the linear program $LP_{small}$ is that the total reserved processing time for a job $j\in\bar{\mathcal{J}}_{S,W}$ in $\tilde{x}$ could be too large, i.e., it could be that $\sum_{l=1}^{|S|}\sum_{C\in\mathcal{C}_{m_{l}}}\tilde{x}_{C,l}a_{j,C}>p_{j}$ for some $j\in\bar{\mathcal{J}}_{S,W}$ . To remove this surplus, we subtract a total processing time of $\sum_{l=1}^{|S|}\sum_{C\in\mathcal{C}_{m_{l}}}\tilde{x}_{C,l}a_{j,C}-p_{j}$ from the configurations for each $j\in\bar{\mathcal{J}}_{S,W}$ . By this step, we create at most one more configuration for each job in $\bar{\mathcal{J}}_{S,W}$ . The vector changed in this way, from now on called $\bar{x}$ , is a solution to $LP_{small,rel}$ .

Since the algorithm in [10] calls the block solver at most $\mathcal{O}(|S||\bar{\mathcal{J}}_{S,W}|(\ln(|\bar{\mathcal{J}}_{S,W}|)+\varepsilon^{-4}))$ times, the generated solution $\bar{x}$ uses at most $\mathcal{O}(|S||\bar{\mathcal{J}}_{S,W}|(\ln(|\bar{\mathcal{J}}_{S,W}|)+\varepsilon^{-4})+|\bar{\mathcal{J}}_{S,W}|)$ configurations in total. We use the algorithm by Beling and Megiddo [1] to find a basic solution with at most $|S|+|\bar{\mathcal{J}}_{S,W}|$ non zero components in

[TABLE]

operations. Hence the total running time needed to find the basic solution to $LP_{small,rel}$ is bounded by

[TABLE]

This concludes the proof. ∎

We will find a schedule of the jobs $\mathcal{J}_{S,W}$ by placing the configurations into the corresponding layers and greedily filling the jobs into the configurations, see Figure 6. To ensure that each job can be scheduled integrally, we extend each configuration by $\mu T$ , which is the tallest height a small job can have. Since there are at most $|S|+|\bar{\mathcal{J}}_{S,W}|$ configurations, we extend the schedule by at most $(|S|+|\bar{\mathcal{J}}_{S,W}|)\mu T\leq(1/\varepsilon\delta+1/\varepsilon^{2})\delta\varepsilon^{3}T\leq 2\varepsilon^{2}T$ . Note that after this extension the size defining job, which might have been cut for the analysis, can be scheduled in the group where it first appears.

To schedule the small jobs, we use the next fit decreasing height (NFDH) algorithm to place them next to the configurations. We can sort the small jobs by height in $\mathcal{O}(n+\log(n)/\varepsilon^{2})$ since there are at most $\mathcal{O}(\log(n)/\varepsilon^{2})$ possible processing times.

Note that the total work of the small jobs has to fit next to the configurations. The reason is that the configurations have a total work which equals the total work of the wide jobs. Furthermore after scheduling the large jobs, the total idle time of the machines was at least as large as the total work of the small jobs.

The NFDH algorithm sorts the small jobs by height and places them into shelves starting with the tallest job, see Figure 6. In each shelf there are at most $\varepsilon m$ machines which are completely idle since each narrow job requires at most $\varepsilon m$ machines. If there were more idle machines, another job would have fit in this shelf.
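A minimal sketch of NFDH on a single strip of $m$ machines is given below. It ignores the configurations next to which the small jobs are actually placed and only shows the shelf mechanism; jobs are assumed to be `(p, q)` pairs with $q\leq m$ , and the function name `nfdh` is hypothetical.

```python
def nfdh(jobs, m):
    """Next Fit Decreasing Height on m machines.

    jobs: (p, q) pairs with 1 <= q <= m.
    Returns (makespan, placements), where each placement is
    (start time, first machine index, p, q).
    """
    order = sorted(jobs, key=lambda job: job[0], reverse=True)  # tallest first
    placements = []
    shelf_start = 0    # time at which the current shelf begins
    shelf_height = 0   # processing time of the first (tallest) job in the shelf
    used = 0           # machines already occupied in the current shelf
    for p, q in order:
        if used + q > m:
            # Job does not fit: close the shelf and open a new one on top of it.
            shelf_start += shelf_height
            shelf_height = 0
            used = 0
        if used == 0:
            shelf_height = p
        placements.append((shelf_start, used, p, q))
        used += q
    return shelf_start + shelf_height, placements
```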

Furthermore, there can be machines that start to idle before the starting time of the next shelf, namely at the moment when a job with a processing time smaller than the first job in this shelf has finished its processing time. Let $p_{\max,i}$ be the largest processing time in shelf $i$ , then the idle time of the machines which start to idle in shelf $i$ is bounded by $p_{\max,i}-p_{\max,i+1}$ . Therefore, in total the processing time of machines starting to idle over all shelves is bounded by $p_{\max}\cdot m$ while the total idle time of machines being idle during the whole shelf is bounded by $\varepsilon m\cdot T$ . Hence the total work of narrow small jobs that cannot be scheduled next to the configurations is bounded by $\varepsilon m\cdot T+\mu T\cdot m$ . By using NFDH again to schedule these jobs, we add at most $(\mu+\varepsilon)mT/((1-\varepsilon)m)+p_{\max}\leq 2\varepsilon T$ to the makespan.

3.4 Medium Jobs

In the last step, we schedule the medium sized jobs. First, we sort them by their processing time. This can be done in $\mathcal{O}(n+1/\varepsilon^{2})$ since there are at most $3/\varepsilon^{2}$ different processing times between $\mu=\varepsilon^{3}\delta$ and $\delta$ . Afterward, we use the NFDH algorithm to place the jobs. Hence, we start with the tallest one and place the jobs one by one in shelves. Coffman et al. [6] have shown the following (slightly adapted) Lemma:

Lemma 5 (See [6]):

For any list $L$ ordered by nonincreasing height,

[TABLE]

We know that $W(\mathcal{J}_{M})$ is bounded by $\varepsilon mT$ and $p_{\max}$ is bounded by $\delta T\leq\varepsilon T$ . Therefore, we add at most $\mathrm{NFDH}(\mathcal{J}_{M})\leq 2\varepsilon T+\varepsilon T\leq 6\varepsilon\mathrm{OPT}$ to the makespan, by scheduling the medium sized jobs this way.

3.5 Summary

Given $\mathcal{J}$ , $m$ and $\varepsilon$ the algorithm can be summarized as follows:

1. In the first step of the algorithm, we simplify the instance. We define the lower bound $T:=\max\{p_{\max},(\sum_{j\in\mathcal{J}}p_{j}q_{j})/m\}$ and round the processing times such that they are multiples of $\varepsilon T/n$. Next, we find the correct values for $\delta$ and $\mu$ and partition the jobs into $\mathcal{J}_{L,W}\operatorname{\dot{\cup}}\mathcal{J}_{L,N}\operatorname{\dot{\cup}}\mathcal{J}_{M}\operatorname{\dot{\cup}}\mathcal{J}_{S,W}\operatorname{\dot{\cup}}\mathcal{J}_{S,N}$ accordingly. Afterward, we round the processing times of all jobs using Lemma 2 and generate $\bar{\mathcal{J}}_{L,N}$. Last, we generate $\bar{\mathcal{J}}_{S,W}$, i.e., we round the machine requirements of the horizontal jobs.

2. After the simplification steps are done, we start a binary search for the correct size of $\mathrm{OPT}$ for the rounded instance. Note that to find the correct value for $\mathrm{OPT}$ for the rounded instance, we are only interested in the number of layers needed to place the jobs in $\mathcal{J}_{L,W}\operatorname{\dot{\cup}}\bar{\mathcal{J}}_{L,N}\operatorname{\dot{\cup}}\bar{\mathcal{J}}_{S,W}$. We know that we need at least $l=1/(\varepsilon\delta)$ layers but at most $u=2(1+\varepsilon)(1+2\varepsilon)/(\varepsilon\delta)$ layers. We start our binary search using $L=\lfloor(l+u)/2\rfloor$ layers.

3. Given a number of layers $L$, we try each possibility to schedule $\mathcal{J}_{L,W}\cup\bar{\mathcal{J}}_{L,N}$ using at most this number of layers. For each of these possibilities, we try to solve $LP_{small}$ for $\bar{\mathcal{J}}_{S,W}$ with last allowed layer $L$. If $LP_{small}$ is solvable, we save the $LP$-solution and the choice for $\mathcal{J}_{L,W}\cup\bar{\mathcal{J}}_{L,N}$, set the upper bound $u=L-1$, and update $L$ accordingly. Otherwise, we try the next choice for $\mathcal{J}_{L,W}\cup\bar{\mathcal{J}}_{L,N}$. If all the possibilities to schedule these jobs fail, we set $l=L+1$ and update $L$ accordingly.

4. The binary search part is finished as soon as $u<l$. When this is the case, we consider the last solution for $LP_{small}$ and the corresponding choice for $\mathcal{J}_{L,W}$ and $\bar{\mathcal{J}}_{L,N}$. We scale back all the processing times, assign the jobs $\mathcal{J}_{L,N}$ and $\mathcal{J}_{S,W}$ to this solution, and shift the fractionally placed jobs to the top or the extra cluster $C_{2}$ as described in Section 3.2 and Section 3.3. We schedule the jobs $\mathcal{J}_{S,N}$ next to the configurations for $\bar{\mathcal{J}}_{S,W}$ using the NFDH algorithm. Finally, we schedule the medium sized jobs on top of the schedule using the NFDH algorithm.
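The binary search over the number of layers can be sketched as follows. This is only an illustration of the control flow: the name `search_min_layers` and the callback `try_layers` are hypothetical, and `try_layers(L)` stands in for the enumeration of placements of the large jobs together with the solving of $LP_{small}$, returning a saved solution or `None`. The sketch assumes that feasibility is monotone in $L$.

```python
from typing import Callable, Optional, Tuple, TypeVar

S = TypeVar("S")

def search_min_layers(lo: int, hi: int,
                      try_layers: Callable[[int], Optional[S]]
                      ) -> Optional[Tuple[int, S]]:
    """Smallest L in [lo, hi] for which try_layers(L) returns a solution.

    try_layers is a hypothetical stand-in for "try every placement of the
    large jobs and solve LP_small with last allowed layer L".
    Assumes: if L layers suffice, then L + 1 layers suffice as well.
    """
    best = None
    while lo <= hi:              # the search stops as soon as u < l
        mid = (lo + hi) // 2
        solution = try_layers(mid)
        if solution is not None:
            best = (mid, solution)   # remember the last feasible solution
            hi = mid - 1             # u = L - 1
        else:
            lo = mid + 1             # l = L + 1
    return best
```

The returned pair corresponds to the last saved $LP$-solution and its number of layers.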

In most of the simplification steps, we have some loss in the approximation ratio of size $\mathcal{O}(\varepsilon T)$ . Since $T\leq\mathrm{OPT}$ it holds that the algorithm has an approximation ratio of the form $(1+\mathcal{O}(\varepsilon))\mathrm{OPT}+p_{\max}$ . To reach a $(1+\varepsilon)\mathrm{OPT}+p_{\max}$ algorithm the value $\varepsilon$ has to be scaled accordingly. Note that for the sake of simplicity, we did not optimize the above algorithm to guarantee the best possible running time with regard to the added $\mathcal{O}(\varepsilon)$ .

The total running time of the algorithm is bounded by

[TABLE]

which concludes the proof of Theorem 3.

Next, we describe how this leads to the algorithm from Theorem 1. As described, we can use this algorithm to find the schedule on the clusters $C_{1}$ and $C_{2}$ as needed for the algorithm described in the proof of Lemma 1. Both algorithms combined have the properties of the algorithm needed for the proof of Theorem 1. The algorithm from Lemma 1 will call the above algorithm with $\varepsilon=1/8$ if $N=2$ or $\varepsilon\geq 1/4$ for the case that $N\geq 6$. Hence, in the worst case the additive constant becomes something like $\Omega(8^{8^{8}})$ for $N=2$ and $\Omega(4^{256})$ for $N\geq 6$. However, note that the above running time is a worst case running time and that, depending on the instance, we might have $\delta=\varepsilon$ rather than $\delta=\varepsilon^{3/\varepsilon}$, which will reduce the additive constant significantly.

To prove the MCS part of Theorem 5, note that we can use the algorithm described in this section to find a schedule on $N$ clusters with ratio $(1+\varepsilon)\mathrm{OPT}+p_{\max}$ in the same running time. Let $\mathrm{OPT}$ be the makespan of an optimal schedule on $N\geq 2$ clusters. Consider a solution for an instance of Parallel Task Scheduling generated by the algorithm above. It has a makespan of $T_{\texttt{Alg}}\leq(1+\varepsilon)N\mathrm{OPT}+p_{\max}$. Define $T_{\texttt{Alg}}^{\prime}:=T_{\texttt{Alg}}-p_{\max}$. We partition the schedule at multiples of $T_{\texttt{Alg}}^{\prime}/N$, and schedule each job starting between two of these multiples on the same cluster, such that the jobs retain their relative starting positions. Since $T_{\texttt{Alg}}^{\prime}\leq(1+\varepsilon)N\mathrm{OPT}$, each of these parts has a height of at most $(1+\varepsilon)\mathrm{OPT}+p_{\max}$. This concludes the proof of Theorem 5.
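The distribution step of this proof can be illustrated with a short sketch. The `Job` record and the function name `distribute` are assumptions made for the example; the input is a feasible single-cluster schedule, and each job is moved to the cluster indexed by the window of length $T_{\texttt{Alg}}^{\prime}/N$ in which it starts.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Job:            # hypothetical record, not notation from the text
    start: float      # start time in the single-cluster schedule
    p: float          # processing time
    q: int            # machine requirement

def distribute(jobs: List[Job], n_clusters: int) -> List[List[Job]]:
    """Cut a single-cluster schedule at multiples of T'/N, T' = makespan - p_max."""
    p_max = max(j.p for j in jobs)
    makespan = max(j.start + j.p for j in jobs)
    t_prime = makespan - p_max
    clusters: List[List[Job]] = [[] for _ in range(n_clusters)]
    if t_prime <= 0:
        # The whole schedule already has height p_max; keep it on one cluster.
        clusters[0] = [Job(j.start, j.p, j.q) for j in jobs]
        return clusters
    part = t_prime / n_clusters
    for j in jobs:
        # Jobs starting at or after T' stay with the last window.
        k = min(int(j.start // part), n_clusters - 1)
        # Relative starting positions inside the window are kept.
        clusters[k].append(Job(j.start - k * part, j.p, j.q))
    return clusters
```

Every job placed on a cluster ends no later than $T_{\texttt{Alg}}^{\prime}/N+p_{\max}$ relative to the start of its window, which is the bound used in the proof.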

4 A Faster Algorithm for a Practical Number of Jobs

Note that in the algorithm described above, we have a running time of $\mathcal{O}(n)$, but the hidden constant can be extremely large. Hence, in practical applications it can be more useful to use an algorithm with running time $\mathcal{O}(n\log(n))$ or $\mathcal{O}(n^{2})$ to find an $\alpha\mathrm{OPT}+p_{\max}$ approximation for Parallel Task Scheduling (PTS). For $N\geq 6$, we use $\varepsilon\in[1/4,1/3]$ and, hence, a fast $\mathrm{poly}(n)$ algorithm without large hidden constants and approximation ratio $(5/4)\mathrm{OPT}+p_{\max}$ would bring a significant improvement for the vast majority of cluster numbers, with $2$ and $5$ being the only exceptions. Even an algorithm with approximation ratio $(4/3)\mathrm{OPT}+p_{\max}$ would speed up the algorithm for one third of all the possible instances, namely all the instances where the number of clusters is divisible by three.

To this point, we did not find either of the algorithms, and we leave this as an open question. Instead, we present a fast algorithm with approximation ratio $(3/2)\mathrm{OPT}+p_{\max}$ . This algorithm for PTS leads to an algorithm for MCS with approximation ratio $9/4$ for all instances where $N\bmod 3=0$ .

In the description of the following algorithm, we need the concept of an idle machine. A machine is idle at a time $\tau$ if it does not process any job at that time. Given a point in time $\tau$, the number of idle machines at that time is given by

[TABLE]

and the total idle time up to $\tau$ is defined by

[TABLE]
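A minimal sketch of these two quantities, under stated assumptions: a schedule is given as a list of hypothetical `(start, p, q)` triples, a job occupies its machines during the half-open interval `[start, start + p)`, and the total idle time up to $\tau$ is read as the idle machine count accumulated over $[0,\tau)$.

```python
def idle(schedule, m, tau):
    """Number of machines that process no job at time tau.

    schedule: list of (start, p, q) triples; representation assumed here.
    """
    busy = sum(q for (s, p, q) in schedule if s <= tau < s + p)
    return m - busy

def total_idle(schedule, m, tau):
    """Idle machine count accumulated over [0, tau).

    Computed as m * tau minus the work processed before tau.
    """
    work = sum(q * max(0, min(s + p, tau) - s) for (s, p, q) in schedule)
    return m * tau - work
```

For the schedule `[(0, 2, 3), (1, 2, 2)]` on `m = 6` machines, `idle` gives 3, 1 and 4 at times 0, 1 and 2, so the total idle time up to 3 is 8.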

Lemma 6:

There is an algorithm for PTS with approximation guarantee $(3/2)\mathrm{OPT}+p_{\max}$ and running time $\mathcal{O}(n\log(n))$ . This schedule can be divided into two clusters $C_{1}$ and $C_{2}$ , where the schedule on $C_{1}$ has a makespan of at most $(3/2)\mathrm{OPT}$ and the makespan of $C_{2}$ is bounded by $p_{\max}$ .

Proof.

In the following, we describe the steps of the algorithm. The first part of the algorithm is to find a schedule for the jobs with machine requirement larger than $m/3$. In the second part, we schedule the jobs with machine requirement at most $m/3$ in a best fit manner. This second part depends on one property from the schedule for the jobs with machine requirement larger than $m/3$, as we will see later. This algorithm uses the following optimized variant of List-Scheduling as described in Turek et al. [23]: Starting at time $\tau=0$, for every endpoint of a job, schedule the widest job that can be started at this point if there is one; otherwise, go to the next endpoint and proceed as before.
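The optimized List-Scheduling rule can be sketched as follows, assuming an initially empty schedule that starts at time $0$ and jobs given as hypothetical `(p, q)` pairs with $p>0$ and $1\leq q\leq m$. For readability the sketch scans the unscheduled jobs linearly, so it runs in quadratic time; it does not use the search tree needed for the $\mathcal{O}(n\log(n))$ bound.

```python
import heapq

def widest_fit_list_schedule(jobs, m):
    """Start, at every job endpoint, the widest unscheduled job that fits.

    jobs: list of (p, q) pairs; returns the list of start times.
    Illustrative quadratic-time version, assuming 1 <= q <= m and p > 0.
    """
    # Indices of unscheduled jobs, widest first.
    unscheduled = sorted(range(len(jobs)), key=lambda j: -jobs[j][1])
    starts = [None] * len(jobs)
    running = []          # min-heap of (end_time, q) for running jobs
    t, free = 0, m        # current endpoint and idle machines at t
    while unscheduled:
        pick = next((j for j in unscheduled if jobs[j][1] <= free), None)
        if pick is not None:
            unscheduled.remove(pick)
            starts[pick] = t
            free -= jobs[pick][1]
            heapq.heappush(running, (t + jobs[pick][0], jobs[pick][1]))
        else:
            # Nothing fits: advance to the next endpoint and release
            # every job that ends there.
            t, q = heapq.heappop(running)
            free += q
            while running and running[0][0] == t:
                free += heapq.heappop(running)[1]
    return starts
```

In the algorithm of this section the rule is applied on top of jobs that are already placed; the sketch leaves out that part.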

The first part of the algorithm can be summarized as follows:

1. For a given set of jobs $\mathcal{J}$, first consider the jobs $j\in\mathcal{J}$ with $q(j)\in[m/3,m]$ and sort them by decreasing size of the machine requirement $q(j)$.

2. We stack all the jobs $j\in\mathcal{J}$ with $q(j)>m/2$ ordered by their machine requirement such that the largest starts at time $0$, see Figure 7.

3. Look at the job with the smallest requirement of machines larger than $m/3$ and place it at the first possible point in the schedule next to the jobs with machine requirement larger than $m/2$. We call this point in time $\tau$.

4. Schedule all the other jobs with machine requirement at least $m/3$ with the optimized List-Schedule starting at $\tau$. The List-Schedule includes the endpoints of the already scheduled jobs.

Let $\tau^{\prime}$ be the point in time, where the last job ends, which needs more than $m/2$ machines and define $\tau^{\prime\prime}$ to be the first point in time where both jobs scheduled at $\tau^{\prime}$ have ended. Furthermore, let $T^{\prime}$ be the last point in the schedule where two jobs are processed and define $T:=\max\{T^{\prime},\tau^{\prime}\}$ . Note that at each point in the schedule between $\tau^{\prime}$ and $T$ there will be scheduled exactly two jobs with machine requirement in $[m/3,m/2]$ , while between $\tau^{\prime}$ and $\tau$ it can happen that there is no job from this set.

We claim that $T\leq\mathrm{OPT}$. If $T=\tau^{\prime}$, this is obvious since we can never schedule two jobs with machine requirement larger than $m/2$ at the same time. Consider the case that $T=T^{\prime}$. Between $\tau^{\prime}$ and $T$ there will be scheduled two jobs at each point in the processing time. Hence, if $T$ is larger than $\mathrm{OPT}$, there has to be a schedule where one more of the jobs between these points in time is scheduled below $\tau^{\prime}$. But since each job is scheduled as early as possible, there can be no such job, which proves the claim.

In the next step, we are going to schedule the residual jobs, which have a machine requirement of at most $m/3$. In order to schedule these jobs, we might reconstruct the schedule generated so far. This reconstruction is necessary if the schedule generated so far has too large an amount of idle time on the machines. As a result of this large amount of idle time, we cannot guarantee a small approximation ratio when scheduling the residual jobs. Furthermore, note that if there are no jobs with machine requirement at most $m/3$, we do not need to add further steps and have found a schedule with approximation guarantee $\mathrm{OPT}+p_{\max}$.

Let $a$ be the total processing time before $T$ , where only one job is scheduled. This job has to be a job with machine requirement larger than $m/2$ . Let $b$ be the total processing time, where just two jobs are scheduled. We will now consider two cases: $a>b$ and $a\leq b$ . In the first case, we have to reconstruct the schedule found so far, while in the second case this is not necessary.

We can summarize the second part of the algorithm, where we schedule the jobs with machine requirement at most $m/3$ , as follows:

5. Find $a$ and $b$.

6. If $a>b$, dismantle the schedule and stack all the jobs with machine requirement larger than $m/3$ on top of each other, sorted by machine requirement such that the widest one starts at $0$. Schedule the residual jobs with the optimized List-Schedule starting at $0$ and using the endpoints of all jobs.

7. Else if $a\leq b$, determine $\tau^{\prime\prime}$ and use the optimized List-Schedule to schedule the remaining jobs starting at $\tau^{\prime\prime}$ while using the endpoints of all scheduled jobs.

In the following, we will argue that the second part of the described algorithm delivers a schedule with approximation guarantee $(3/2)\mathrm{OPT}+p_{\max}$.

Case 6: $a>b$ .

In this case, the algorithm performs the following steps: We stack all the jobs with machine requirement larger than $m/3$ on top of each other sorted by decreasing number of required machines. This stack has a height of at most $a+2b+p_{\max}$ and the last job of this stack is starting before or at $a+2b$. For the remaining jobs, i.e., the jobs with machine requirement at most $m/3$, we use the improved List-Schedule algorithm as described in Turek et al. [23]. This means, we go through the schedule from the bottom to the top and look, for each end point $t$ of a job, starting with $t=0$, at the number of idle machines $\mathrm{idle}(t)$. We search for the widest unscheduled job with $q(j)\leq\mathrm{idle}(t)$ and start it at this time, if one exists, and calculate the new number of idle machines at this point in time. If no such job exists, we go to the next end point of a job since the number of idle machines only changes at these points.

We claim that this schedule has a makespan of at most $(3/2)\mathrm{OPT}+p_{\max}$ . Let $\rho$ be the last starting point of a job in this schedule. If this point is larger than $a+2b$ , the last scheduled job has a machine requirement of at most $m/3$ . By construction of the schedule, this job could not be scheduled at any earlier time. Hence at each time in the schedule before $\rho$ , we use at least $(2/3)m$ machines and therefore $\rho(2/3)m\leq W(\mathcal{J})$ . Furthermore, we know that $\mathrm{OPT}\geq W(\mathcal{J})/m$ . As a consequence it holds that $\rho\leq(3/2)\mathrm{OPT}$ . Since $\rho$ is the last starting position of all jobs, the makespan of the schedule is bounded by $(3/2)\mathrm{OPT}+p_{\max}$ .

On the other hand if $\rho\leq a+2b$ , the last starting job can be a job with machine requirement larger than $m/3$ . However, the schedule is then bounded by $a+2b+p_{\max}$ . Since $a>b$ and $a+b\leq\mathrm{OPT}$ it holds that $b\leq\mathrm{OPT}/2$ and therefore $a+2b+p_{\max}\leq(3/2)\mathrm{OPT}+p_{\max}$ .

Case 7: $a\leq b$ .

We now consider the case that $a\leq b$. In this scenario, we do not dismantle the given schedule as we do in the case $a>b$. Instead, we use the improved List-Schedule algorithm as described in Turek et al. [23] to schedule the remaining jobs. To prove that the resulting algorithm has an approximation guarantee of $(3/2)\mathrm{OPT}+p_{\max}$, we analyze the total idle time up to the point $T$ before we schedule the residual jobs.

Let $t_{a}$ be an arbitrary point in time before $T$ where only one job is scheduled and let $t_{b}$ be an arbitrary point in time where two jobs are scheduled. Note that $\mathrm{idle}(t_{b})<m/3$ since both jobs scheduled at this time have a machine requirement of at least $m/3$. We differentiate two cases $t_{b}<\tau^{\prime}$ and $t_{b}\geq\tau^{\prime}$ and claim that in both cases the sum of numbers of idle machines at $t_{a}$ and $t_{b}$ is bounded by $\frac{2}{3}m$. As a consequence of this claim, the average number of idle machines at all of these pairs of points is bounded by $m/3$ and hence, the total idle time up to the point $T$ is bounded by $m/3\cdot(a+b)\leq Tm/3$ because $a\leq b$ and at each point $t_{b}$ the idle time is bounded by $m/3$.

Case 7.1: $t_{b}<\tau^{\prime}$ .

In the case that $t_{b}<\tau^{\prime}$ , the number of idle machines $\mathrm{idle}(t_{b})$ is bounded by $m/6$ since there is scheduled one job with machine requirement at least $m/2$ and one job with machine requirement at least $m/3$ . On the other hand, $\mathrm{idle}(t_{a})$ is bounded by $m/2$ . Therefore, the sum of free machines on both points is bounded by $\frac{2}{3}m$ and hence the average is bounded by $m/3$ .

Case 7.2: $t_{b}\geq\tau^{\prime}$ .

If $t_{b}\geq\tau^{\prime}$ , there are two jobs with machine requirement at least $m/3$ scheduled at this point in time and hence $\mathrm{idle}(t_{b})<m/3$ . Remember that $t_{a}<\tau^{\prime}$ since at each point in time after the point $\tau^{\prime}$ up to the point $T$ there will be two jobs scheduled. Therefore, $t_{a}<t_{b}$ and the jobs scheduled at $t_{b}$ did not fit at the time $t_{a}$ since otherwise they would have been scheduled there. As a consequence, it holds that $\mathrm{idle}(t_{a})\leq(m-\mathrm{idle}(t_{b}))/2$ because the job with the smaller machine requirement scheduled at $t_{b}$ has a machine requirement of at most $(m-\mathrm{idle}(t_{b}))/2$ . Hence it holds that $\mathrm{idle}(t_{a})+\mathrm{idle}(t_{b})\leq m/2+\mathrm{idle}(t_{b})/2$ . Since $\mathrm{idle}(t_{b})\leq m/3$ , we have $\mathrm{idle}(t_{a})+\mathrm{idle}(t_{b})\leq\frac{2}{3}m$ .

In conclusion, we have $\mathrm{idle}(t_{a})+\mathrm{idle}(t_{b})\leq\frac{2}{3}m$ in both cases $t_{b}<\tau^{\prime}$ and $t_{b}\geq\tau^{\prime}$ . Hence the average number of idle machines for each pair of two points $t_{a}$ and $t_{b}$ is bounded by $m/3$ . Since $a\leq b$ and at each point $t_{b}$ , where two jobs are scheduled, there are at most $m/3$ machines idle, the total idle time below $T$ is bounded by $Tm/3$ . The residual jobs are scheduled by the best fit algorithm in [23]. Let $\tau^{\prime\prime}$ be the first point in time where both jobs scheduled at $\tau^{\prime}$ have ended. Note that after this point in time the number of idle machines is monotonically increasing per time step. Hence, we can use the improved List-Schedule algorithm without constructing any machine conflicts.

To analyze the approximation ratio after adding the residual jobs, let $\rho$ be the last point in the schedule where a job is started. If this job has a width of at most $m/3$, then at every time before $\rho$ and after $T$ the number of idle machines is at most $m/3$, since otherwise this job would have been started earlier. If this job has a machine requirement larger than $m/3$, it has been started before $T$. In both cases the total idle time up to $\rho$ is bounded by $\rho m/3$. As a consequence, we have $\rho\leq 3/2\cdot\mathrm{OPT}$ since all jobs start before $\rho$ and $m\mathrm{OPT}\geq\sum_{j\in\mathcal{J}}p_{j}q_{j}\geq(2/3)\rho m$. Therefore, the schedule has a makespan of at most $(3/2)\mathrm{OPT}+p_{\max}$.

We have proven that in both cases $a>b$ and $a\leq b$ the described algorithm produces a schedule with makespan at most $(3/2)\mathrm{OPT}+p_{\max}$. This algorithm has a running time of the form $\mathcal{O}(n\log(n))$: The sorting of the items is possible in $\mathcal{O}(n\log(n))$; each of the values $a$, $b$ and $\tau^{\prime\prime}$ can be found in $\mathcal{O}(n)$; and last, the optimized List-Schedule can be implemented to be in $\mathcal{O}(n\log(n))$ by organizing the relevant points in time as well as the set of items inside a search tree.

Last, we describe how to partition this schedule into the schedule on the two clusters $C_{1}$ and $C_{2}$ as needed for the algorithm in Lemma 1. Note that in all the described cases the additional $p_{\max}$ is added by the last started job. To partition this schedule such that it is scheduled on the two clusters $C_{1}$ and $C_{2}$ , we look at the starting time $\rho$ of the last started job. We remove this last started job and all the jobs which end strictly after $\rho$ and place them into the second cluster $C_{2}$ and leave the rest untouched to be the schedule for $C_{1}$ . As we noted before the schedule up to $\rho$ has a height of at most $(3/2)\mathrm{OPT}$ . Furthermore, since the last job starts at $\rho$ , all the removed jobs have a total machine requirement of at most $m$ , and, hence, we can start them all at the same time. The resulting schedule on $C_{2}$ has a height of at most $p_{\max}$ . ∎
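The split into $C_{1}$ and $C_{2}$ described above can be sketched in a few lines, assuming the schedule is given as hypothetical `(start, p, q)` triples with $p>0$. All removed jobs are running at time $\rho$, which is why they fit next to each other on $C_{2}$.

```python
def split_at_last_start(schedule):
    """Split a schedule at the start time rho of the last started job.

    schedule: list of (start, p, q) triples (representation assumed here).
    Returns (c1, c2): c1 keeps the jobs ending by rho untouched, c2 holds
    the jobs ending strictly after rho, all restarted at time 0.
    """
    rho = max(s for (s, p, q) in schedule)
    c1 = [(s, p, q) for (s, p, q) in schedule if s + p <= rho]
    c2 = [(0, p, q) for (s, p, q) in schedule if s + p > rho]
    return c1, c2
```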

In the next step, we present the technique to divide this schedule on $C_{1}$ and $C_{2}$ to $N$ clusters and prove Theorem 6 in this way. The used technique is similar to the technique in Section 2. However, it is no longer possible to partition the schedule into sections of height at most $2\mathrm{OPT}$ .

4.1 Proof of Theorem 6

In this section, we prove Theorem 6. We start with the schedule given by the $(3/2)\mathrm{OPT}+p_{\max}$ algorithm from Lemma 6 and its partition onto the two clusters $C_{1}$ and $C_{2}$ . To partition the schedule on $C_{1}$ onto the different clusters, we differentiate the three cases $N=3i$ , $N=3i+1$ and $N=3i+2$ .

Case 8: $N=3i$

In this case, the schedule on $C_{1}$ has a height of $T\leq(9i/2)\mathrm{OPT}$. We partition it into $2i$ parts of equal height $T/(2i)\leq(9/4)\mathrm{OPT}$. During this partition step, we cut the schedule $2i-1$ times. The jobs intersected by these cuts have to be scheduled separately using height $p_{\max}$. Together with the jobs in $C_{2}$, we have $2i$ sets of jobs with height bounded by $p_{\max}$ and machine requirement bounded by $m$. We schedule these sets pairwise in $i$ additional clusters analogously to the clusters of type B in Section 2. In total, we use $3i=N$ clusters and the largest one has a height of at most $(9/4)\mathrm{OPT}=2.25\mathrm{OPT}$.

Case 9: $N=3i+1$

In this case, the schedule on $C_{1}$ has a height of $T\leq(3(3i+1)/2)\mathrm{OPT}=((9i+3)/2)\mathrm{OPT}$. We partition the schedule into $2i$ parts of equal height and one part with a smaller height. On this part, we schedule the jobs from $C_{2}$ as well. Let $T_{A}:=(2/(9i+3))T\leq\mathrm{OPT}$. The $2i$ parts of equal height have a size of $((9i+5)/(4i+2))T_{A}$ and the last part has a height of $((5i+3)/(4i+2))T_{A}$. It is easy to verify that $2i\cdot((9i+5)/(4i+2))T_{A}+((5i+3)/(4i+2))T_{A}=((9i+3)/2)T_{A}=T$ and hence we have partitioned the complete schedule on $C_{1}$. By partitioning the schedule on $C_{1}$ into these parts, we have cut the schedule $2i$ times. Therefore, together with the jobs on $C_{2}$, we have to schedule $2i+1$ parts of height $p_{\max}$. We schedule $C_{2}$ on the cluster with current makespan $((5i+3)/(4i+2))T_{A}$, resulting in a schedule of height $((5i+3)/(4i+2))T_{A}+p_{\max}\leq((9i+5)/(4i+2))\mathrm{OPT}$ (since $p_{\max}\leq\mathrm{OPT}$). We pair the other $2i$ parts and schedule them on $i$ distinct clusters. In total, we generate $2i+1+i=3i+1$ clusters and the largest occurring makespan is bounded by $((9i+5)/(4i+2))\mathrm{OPT}$.

Case 10: $N=3i+2$

In this case, the schedule on $C_{1}$ has a height of $T\leq(3(3i+2)/2)\mathrm{OPT}=((9i+6)/2)\mathrm{OPT}$ . Again, we partition this schedule into $2i+1$ parts of equal height and one part with a smaller height. On top of this part, we will schedule two parts with processing time $p_{\max}$ . Let $T_{A}:=(2/(9i+6))T\leq\mathrm{OPT}$ . The first $2i+1$ parts of $C_{1}$ have a height of $((9i+10)/(4i+4))T_{A}$ and the last part has a height of at most $((i+2)/(4i+4))T_{A}$ . It is easy to verify that $(2i+1)((9i+10)/(4i+4))T_{A}+((i+2)/(4i+4))T_{A}=((9i+6)/2)T_{A}=T$ and, hence, we have scheduled all parts of $C_{1}$ . Since $((i+2)/(4i+4))T_{A}+2p_{\max}\leq((9i+10)/(4i+4))\mathrm{OPT}$ , we can schedule two parts with processing time at most $p_{\max}$ on this cluster. We have cut the schedule on $C_{1}$ exactly $2i+1$ times. Together with the jobs from $C_{2}$ , we have $2i+2$ parts with processing time at most $p_{\max}$ we have to schedule inside the other clusters. Since we already have scheduled two of these parts, we pair the residual $2i$ parts and generate $i$ new clusters with makespan at most $2p_{\max}\leq 2\mathrm{OPT}$ . In total, we generated $2i+2+i=3i+2$ clusters and the largest makespan occurring on the clusters is bounded by $((9i+10)/(4i+4))\mathrm{OPT}$ .

For each of the three cases $N=3i$ , $N=3i+1$ , and $N=3i+2$ , we have presented a partitioning strategy which partitions the schedule from clusters $C_{1}$ and $C_{2}$ onto $N$ clusters such that each cluster has a makespan of at most $(9/4)\mathrm{OPT}$ , $((9i+5)/(4i+2))\mathrm{OPT}$ or $((9i+10)/(4i+4))\mathrm{OPT}$ respectively. Hence, we have proven Theorem 6.
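The arithmetic identities used in the cases $N=3i+1$ and $N=3i+2$ can be checked mechanically with exact rational arithmetic. The sketch below measures all heights in units of $T_{A}$ and uses the worst case $p_{\max}=T_{A}$; the function names `check_partition` and `ratio` are illustrative only.

```python
from fractions import Fraction as F

def check_partition(i):
    """Verify the part heights of the cases N = 3i+1 and N = 3i+2."""
    # N = 3i + 1: 2i large parts plus one small part that also takes C_2.
    big1, small1 = F(9 * i + 5, 4 * i + 2), F(5 * i + 3, 4 * i + 2)
    ok1 = (2 * i * big1 + small1 == F(9 * i + 3, 2)) and (small1 + 1 == big1)
    # N = 3i + 2: 2i + 1 large parts plus one small part taking two p_max parts.
    big2, small2 = F(9 * i + 10, 4 * i + 4), F(i + 2, 4 * i + 4)
    ok2 = ((2 * i + 1) * big2 + small2 == F(9 * i + 6, 2)) and (small2 + 2 == big2)
    return ok1 and ok2

def ratio(n_clusters):
    """Makespan bound (as a multiple of OPT) derived above for N clusters."""
    i, r = divmod(n_clusters, 3)
    if r == 0:
        return F(9, 4)
    if r == 1:
        return F(9 * i + 5, 4 * i + 2)
    return F(9 * i + 10, 4 * i + 4)
```

For example, `ratio(4)` evaluates to $7/3$ and `ratio(5)` to $19/8$, and both bounds approach $9/4$ as $i$ grows.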

5 An AEPTAS for Strip Packing

In this section, we present a $(1+\varepsilon)\mathrm{OPT}+h_{\max}$ algorithm for Strip Packing (SP) with running time $\mathcal{O}(n/\varepsilon)+\mathcal{O}_{\varepsilon}(1)$, proving Theorem 4. It is inspired by the algorithm in [16]. However, we made some improvements to guarantee an efficient running time. In the description of this algorithm, we will assume that $1/\varepsilon\in\mathbb{N}$.

This algorithm combined with the techniques from Section 2 delivers a 2-approximation algorithm for Multiple Strip Packing (MSP). To prove Theorem 2, we need to place some of the jobs on another strip named $C_{2}$, which has a width of at most $\gamma W$. We have either $\gamma=1$ for the case $N\geq 3$ or $\gamma=1/8$ for the case that $N=2$. In the following description of the algorithm, we prove that the total width of the items placed on top of the packing can be bounded by $\gamma W$, and hence it is possible to place them inside the extra cluster instead of at the top of the packing. When interested solely in the AEPTAS, the value $\gamma$ can be set to $1$.

To find the algorithm for Theorem 5 for the MSP case, we call the algorithm from this section with $\gamma=1$ and cut the resulting packing with height $T\leq(1+\varepsilon)N\mathrm{OPT}+h_{\max}$ into $N$ parts of height $(T-h_{\max})/N$, such that the items overlapping a cut stick together with the part below the cut. As a result, each part has a height of at most $(T-h_{\max})/N+h_{\max}\leq(1+\varepsilon)\mathrm{OPT}+h_{\max}$.

5.1 Simplify

Similarly to Section 3, we start by defining an upper and a lower bound for the optimal packing height. Let $\mathcal{A}(\mathcal{I}):=\sum_{i\in\mathcal{I}}w(i)h(i)$ be the total area of all the items and let $h_{\max}$ be the largest occurring height in $\mathcal{I}$. By Steinberg [20], we know that $\max\{\mathcal{A}(\mathcal{I}),h_{\max}\}\leq\mathrm{OPT}\leq 2\max\{\mathcal{A}(\mathcal{I}),h_{\max}\}$ and we define $T:=\max\{\mathcal{A}(\mathcal{I}),h_{\max}\}$.

In the first step, we partition the items by their size. Unlike in the algorithm for Parallel Task Scheduling (PTS), we need a gap between wide and narrow items as well. Hence, we partition the items into large $\mathcal{L}:=\{i\in\mathcal{I}|h(i)\geq\delta T,w(i)\geq\delta W\}$, vertical $\mathcal{V}:=\{i\in\mathcal{I}|h(i)\geq\delta T,w(i)\leq\mu W\}$, horizontal $\mathcal{H}:=\{i\in\mathcal{I}|h(i)\leq\mu T,w(i)\geq\delta W\}$, small $\mathcal{S}:=\{i\in\mathcal{I}|h(i)\leq\mu T,w(i)\leq\mu W\}$ and medium sized items $\mathcal{M}:=\mathcal{I}\setminus(\mathcal{L}\cup\mathcal{V}\cup\mathcal{H}\cup\mathcal{S})$ for some $\delta,\mu\leq\varepsilon$, see Figure 8.

We will discard the medium sized items and place them at the end of the packing. To make this possible the total area of the medium sized items has to be small. In the next lemma, we show that we can find values for $\delta$ and $\mu$ which guarantee this property.

Lemma 7:

Consider the sequence $\sigma_{0}:=\varepsilon$, $\sigma_{i+1}:=\sigma_{i}^{3}\varepsilon^{5}\gamma/x$. There exists a $j\in\{0,\dots,2/(\varepsilon\gamma)-1\}$ such that when defining $\delta=\sigma_{j}$ and $\mu=\sigma_{j+1}$ the total area of the medium sized items $\mathcal{I}_{M}$ is bounded by $\gamma\varepsilon WT$.

Proof.

This lemma follows by a direct application of the pigeonhole principle. Let $\mathcal{M}_{j}$ be the set of medium sized items when defining $\delta=\sigma_{j}$ and $\mu=\sigma_{j+1}$. Each item $i\in\mathcal{I}$ can appear in at most two of these sets, in the first because its width is between $\mu W$ and $\delta W$ and in the second because its height is between $\mu T$ and $\delta T$. Assume that all the sets $\mathcal{M}_{j}$ have an area $\mathcal{A}(\mathcal{M}_{j})>\gamma\varepsilon\cdot W\cdot T$. As a consequence, the total area of all these sets is $\sum_{j=0}^{2/(\varepsilon\gamma)-1}\mathcal{A}(\mathcal{M}_{j})>2\cdot W\cdot T$, a contradiction, since each item is counted at most twice and the total area of all the items is bounded by $W\cdot T$. ∎
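The pigeonhole argument translates directly into a selection procedure: build the sequence $\sigma_{j}$ and take the index whose medium set has the smallest area. The following sketch assumes exact rational inputs, items given as hypothetical `(w, h)` pairs, and $2/(\varepsilon\gamma)\in\mathbb{N}$; it evaluates every candidate by brute force and makes no attempt to match the running time of the algorithm.

```python
from fractions import Fraction as F

def choose_delta_mu(items, W, T, eps, gamma, x):
    """Pick (delta, mu) = (sigma_j, sigma_{j+1}) with the smallest medium area.

    items: list of (w, h) pairs; eps, gamma, x, W, T are exact numbers
    (int or Fraction). By the pigeonhole argument the minimum is at most
    gamma * eps * W * T when the total item area is at most W * T.
    """
    rounds = int(F(2) / (eps * gamma))
    sigma = [F(eps)]
    for _ in range(rounds):
        sigma.append(sigma[-1] ** 3 * eps ** 5 * gamma / x)

    def medium_area(delta, mu):
        # An item is medium if its width or its height falls into the gap.
        return sum(w * h for (w, h) in items
                   if mu * W < w < delta * W or mu * T < h < delta * T)

    j = min(range(rounds), key=lambda k: medium_area(sigma[k], sigma[k + 1]))
    return sigma[j], sigma[j + 1]
```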

Furthermore, it holds that $\delta\geq(\varepsilon\gamma/x)^{3^{\mathcal{O}(1/(\varepsilon\gamma))}}$. We define $\delta^{\prime}:=\varepsilon^{k}$ as the maximum number such that $\delta^{\prime}/\varepsilon\geq\delta\geq\delta^{\prime}$. Note that $k\in 3^{\mathcal{O}(1/(\varepsilon\gamma))}$. We use $\delta^{\prime}$ for the partitioning of the items. As a consequence, the area of the medium items is still at most $\gamma\varepsilon WT$, but the distance between $\delta^{\prime}$ and $\mu$ is reduced, i.e., we have $\mu=\delta^{3}\varepsilon^{5}\gamma/x\leq(\delta^{\prime}/\varepsilon)^{3}\varepsilon^{5}\gamma/x=\delta^{\prime 3}\varepsilon^{2}\gamma/x$. However, for simplicity of notation, we will write $\delta$ instead of $\delta^{\prime}$ in the following and use $\mu\leq\delta\varepsilon^{2}\gamma/x$ respectively.

In the second step, we round the heights of the items. By increasing the packing height by at most $\varepsilon T$, we can round the heights of the items to multiples of $\varepsilon T/n$, because adding $\varepsilon T/n$ to each item height lengthens the packing by at most $n\cdot\varepsilon T/n$. Hence after this rounding step, we have $T\leq\mathrm{OPT}\leq(2+\varepsilon)T$. Since each item has a height of at most $T$, there are at most $n/\varepsilon$ different item sizes, and hence, sorting them by height can be done in $\mathcal{O}(n/\varepsilon)$ using Bucket-Sort. Furthermore, the largest $l$ such that $p_{j}\in\{\varepsilon^{l}T,\varepsilon^{l-1}T\}$ is bounded by $\mathcal{O}(\log(n))$.

In the next step, we scale the instance by $n/(\varepsilon T)$. As a result, all the items have a height that is one of the integral values $\{1,2,\dots,n/\varepsilon\}$ and the optimal packing height for this scaled instance is one of the integral values $\{n/\varepsilon,n/\varepsilon+1,\dots,2n/\varepsilon+n\}$, because for the rounded instance it holds that $T\leq\mathrm{OPT}\leq 2T+\varepsilon T$ and the optimal packing height has to be integral since all the item heights are integral. We scale $T$ accordingly such that $T=n/\varepsilon$. In the algorithm, we will do a binary search over the packing heights.
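Rounding the heights up to multiples of $\varepsilon T/n$ and scaling by $n/(\varepsilon T)$ can be combined into a single ceiling operation. The sketch below assumes exact rational heights with $0<h\leq T$; the function name is illustrative.

```python
import math
from fractions import Fraction as F

def round_and_scale_heights(heights, T, eps):
    """Round each height up to a multiple of eps*T/n, then scale to integers.

    Returns integers in {1, ..., n/eps} whenever 0 < h <= T and 1/eps is
    an integer.
    """
    n = len(heights)
    unit = F(eps) * T / n        # rounding granularity eps * T / n
    return [math.ceil(F(h) / unit) for h in heights]
```

With three items, $T=1$ and $\varepsilon=1/2$ the granularity is $1/6$, so a height of $2/5$ is rounded up to $3/6$ and becomes the integer $3$.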

In the next step, we use the same geometric rounding as above to round the heights of the items to fewer different sizes using Lemma 2 and lose a factor of at most $(1+2\varepsilon)$ in the approximation ratio with regard to the scaled instance. Now the items have at most $\mathcal{O}(\min\{n/\varepsilon,\log(n)/\varepsilon^{2}\})$ possible different sizes and, without any further loss, we can assume that all large and vertical items start at multiples of $\varepsilon\delta^{\prime}T$. We call the area between two consecutive multiples of $\varepsilon\delta^{\prime}T$ a layer and number them starting at zero. To ensure the integrality of the item heights, we scale the instance with $1/(\varepsilon\delta)$ before the rounding step and scale $T$ accordingly such that $T=n/(\varepsilon^{2}\delta)$. Note that $1/(\varepsilon\delta)\in\mathbb{N}$ since $1/\varepsilon\in\mathbb{N}$. Up to this point, we know that, ignoring all the scaling steps, it holds that $T\leq\mathrm{OPT}\leq(1+2\varepsilon)(2+\varepsilon)T$. Hence the number of layers $L$ in an optimal solution is at least $1/(\varepsilon\delta)$ and at most $(1+2\varepsilon)(2+\varepsilon)/(\varepsilon\delta)\leq 5/(\varepsilon\delta)$ for $\varepsilon\leq 1/2$.

In the next step, we remove all small and medium sized items from the optimal packing and use a Lemma from [13] which states that we can partition any optimal packing into a constant number of sub areas, such that each subarea contains just one type of item.

Lemma 8 (See [14]):

We can partition the area $W\times(1+2\varepsilon)\mathrm{OPT}$ into $\mathcal{O}(1/(\varepsilon\delta^{\prime 2}))$ rectangular areas called boxes.

• Each large item $i\in\mathcal{L}$ is contained in its personal box of height $h(i)$ and width $w(i)$.

• There are at most $\mathcal{O}(1/(\varepsilon\delta^{2}))$ many boxes containing horizontal items $i\in\mathcal{H}$. Each of them has a height of $\varepsilon\delta^{\prime}T$ and a width larger than $\delta^{\prime}W$.

• There are at most $\mathcal{O}(1/(\varepsilon\delta^{2}))$ many boxes containing vertical items $i\in\mathcal{V}$.

• No item in $\mathcal{H}$ is intersected vertically by any box border, but can be intersected horizontally.

• No item in $\mathcal{V}$ is intersected horizontally by any box border, but can be intersected vertically.

• Each box's lower and upper borders are at multiples of $1/(\varepsilon\delta)\mathrm{OPT}$.

In the algorithm, we cannot try each of these partitions since then the width of the strip $W$ would appear linear in the running time. Instead, we are interested in the relative positioning of the large items and the boxes for horizontal items.

5.2 Boxes for horizontal rectangles

The last simplification step is the rounding of widths of the horizontal items. We call the set of generated rounded items $\bar{\mathcal{H}}$ .

Lemma 9:

We can round the width of the horizontal items to $\mathcal{O}(\log(1/\delta)/\varepsilon)$ different sizes in at most $\mathcal{O}(n\log(1/\varepsilon))$ operations. These rounded items can be placed fractionally instead of the horizontal items and an extra box of height $\varepsilon T$ .

Proof.

To round the width of the items, we use a similar technique as for rounding the machine requirements of the small wide jobs in Lemma 3 called geometric grouping. This technique was first introduced by [7] as well. The difference to linear grouping is an additional partitioning step prior to the steps of the linear grouping, as described below.

We first partition the set of horizontal items into the following $\mathcal{O}(\log(1/\delta))$ sets $\mathcal{I}_{H,i}:=\{i\in\mathcal{I}_{H}\,|\,W/2^{i+1}<w_{i}\leq W/2^{i}\}$ . For each of these sets, we perform the steps of linear grouping with a customized adjustment to the height of the segments per set. For these adjusted heights, we use the fact that it is possible to place at least $2^{i}$ and at most $2^{i+1}$ items from the set $\mathcal{I}_{H,i}$ next to each other into the strip.

For each $\mathcal{I}_{H,i}$ , we stack the contained items in order of decreasing width and partition this stack into $1/\varepsilon$ segments of size $\varepsilon h(\mathcal{I}_{H,i})$ , where $h(\mathcal{I}_{H,i})$ is the total height of the items in $\mathcal{I}_{H,i}$ using the original item heights. We define a new job for each segment which has height $\varepsilon h(\mathcal{I}_{H,i})$ and width of the widest item intersecting this segment, see Figure 5. The widest item will be placed at the end of the schedule inside a new box. Since we are allowed to place this item fractionally and we can place at least $2^{i}$ of these fractions next to each other, we need at most $(\varepsilon h(\mathcal{I}_{H,i}))/2^{i}=\varepsilon h(\mathcal{I}_{H,i})/2^{i}$ additional height to place this item.

To place all the largest rounded items from each set $\mathcal{I}_{H,i}$, we introduce a new box for horizontal items. We define the box's height as $\sum_{i=0}^{\mathcal{O}(\log(1/\delta))}\varepsilon h(\mathcal{I}_{H,i})/2^{i}$. For each $i\in\mathbb{N}$, the total width of $2^{i+1}$ items from the set $\mathcal{I}_{H,i}$ is larger than $W$ and hence $2\sum_{i=0}^{\mathcal{O}(\log(1/\delta))}h(\mathcal{I}_{H,i})/2^{i+1}\leq 2T$ since $T\cdot W\leq 2\mathcal{A}(\mathcal{I})$. Therefore, the height of the introduced box is bounded by $\varepsilon T$. The total number of different item widths is bounded by $\mathcal{O}(\log(1/\delta)/\varepsilon)$.

Regarding the running time, as seen above in the proof of Lemma 3, the size-defining items can be found in $\mathcal{O}(|\mathcal{I}_{H,i}|\log(1/\varepsilon))$ for each set $\mathcal{I}_{H,i}$. Therefore, all the sizes can be found in $\mathcal{O}(\sum_{i=0}^{\mathcal{O}(2^{x/\varepsilon}\log(1/\varepsilon))}|\mathcal{I}_{H,i}|\log(1/\varepsilon))=\mathcal{O}(n\log(1/\varepsilon))$. ∎
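The grouping described in the proof of Lemma 9 can be sketched as follows. This is a minimal illustration and not the authors' implementation: the names `geometric_grouping`, `linear_grouping`, and `num_segments` (standing for $1/\varepsilon$, assumed integral) are introduced here, items are assumed to be `(width, height)` pairs with positive width at most `W`, and the sketch sorts each class, so it does not achieve the $\mathcal{O}(n\log(1/\varepsilon))$ bound of the lemma.

```python
from collections import defaultdict


def linear_grouping(stack, num_segments):
    """Round a stack of (width, height) items, sorted by decreasing width.

    The stack is cut into `num_segments` segments of equal height; each
    segment yields one rounded item whose width is that of the widest
    original item intersecting the segment.
    """
    total = sum(h for _, h in stack)
    seg_h = total / num_segments
    rounded = []
    idx, top = 0, stack[0][1]  # `top` is the upper end of item `idx` in the stack
    for k in range(num_segments):
        start = k * seg_h
        # Advance to the item that contains the lower end of segment k.
        while top <= start and idx < len(stack) - 1:
            idx += 1
            top += stack[idx][1]
        rounded.append((stack[idx][0], seg_h))
    return rounded


def geometric_grouping(items, W, num_segments):
    """Partition items into width classes W/2^(i+1) < w <= W/2^i, then
    apply linear grouping inside every class."""
    classes = defaultdict(list)
    for w, h in items:
        i = 0
        while w <= W / 2 ** (i + 1):
            i += 1
        classes[i].append((w, h))
    return {
        i: linear_grouping(sorted(c, key=lambda it: -it[0]), num_segments)
        for i, c in classes.items()
    }


items = [(8, 1), (7, 1), (6, 1), (5, 1), (4, 2), (3, 2)]
rounded = geometric_grouping(items, W=8, num_segments=2)
```

In each class, the first rounded item in the returned list is the widest one, which corresponds to the item that the proof moves into the extra box.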

In the next step, we show that it is possible to reduce the number of widths for horizontal boxes to be constant depending on $\delta$ . We do this in order to make it possible for the algorithm to guess their sizes in polynomial time.

Lemma 10:

Given a partition of the optimal solution into boxes, we can reduce the number of possible widths for the boxes to $|\bar{\mathcal{I}}_{H}|^{1/\delta}$ and guarantee that at most $\mathcal{O}(1/(\varepsilon\delta))$ of these sizes are used in the partition by exactly $1/\delta$ boxes each. This rounding step adds at most $\varepsilon T$ to the packing height.

Proof.

We reduce the number of box sizes in two steps. First, we reduce the possible number of box sizes, by shrinking the boxes to be a combination of widths of the rounded horizontal items. In the second step, we reduce the number of different box sizes per solution by using a linear grouping step.

Look at one box $B$ for horizontal items. We can shift all the horizontal items in this box to the left as much as possible such that all the left borders of the horizontal items are touching either the box border or the right side of another horizontal item. If the left border of the box does not touch the leftmost item, we can move this border to the left until it does. Now the box for horizontal items has a width which is the sum of widths of rounded horizontal items, i.e. $w(B)\in\{\sum_{i=1}^{1/\delta-1}w_{i}|i\in\bar{\mathcal{I}}_{H}\}$ . As a result the total number of possible box widths is bounded by $|\bar{\mathcal{I}}_{H}|^{1/\delta}$ .

Given such a set of boxes, we can use linear grouping to reduce the total number of different box widths. Since the optimal packing has a height of at most $(1+2\varepsilon)(1+\varepsilon)2T$ and each box has a height of $\varepsilon\delta T$ and there are at most $1/\delta$ boxes for horizontal items in each layer, a sorted stack of all the boxes has a total height of at most $\varepsilon\delta T\cdot(1+2\varepsilon)(1+\varepsilon)2/(\varepsilon\delta^{2})\leq(1+2\varepsilon)(1+\varepsilon)2T/\delta$. We partition the set of boxes such that the $1/\delta$ widest boxes are contained in the first set, the $1/\delta$ next widest boxes are contained in the second set, and so on. As a result, the total height of each set of boxes is bounded by $\varepsilon T$ and the set of boxes is partitioned into at most $(1+2\varepsilon)(1+\varepsilon)2/(\varepsilon\delta)=\mathcal{O}(1/(\varepsilon\delta))$ groups. Note that the last group might contain fewer than $1/\delta$ boxes. To enforce that after the rounding there are $1/\delta$ boxes of each width, we assume that the last group has additional boxes with width zero. We round the box widths to the largest box width of the corresponding set. Again, the last rounded group of boxes has to be positioned at the end of the packing, adding at most $\varepsilon T$ to the packing height. ∎
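The second step of this proof, grouping the boxes by width and rounding each group up to its widest member, can be illustrated with a short sketch. The function name `round_box_widths` and the parameter `group_size` (standing for $1/\delta$, assumed integral) are introduced here for illustration only.

```python
def round_box_widths(widths, group_size):
    """Linear grouping of box widths.

    Boxes are sorted by decreasing width and cut into consecutive groups
    of `group_size` boxes. The last group is padded with boxes of width
    zero, and every box is rounded up to the largest width in its group.
    """
    ordered = sorted(widths, reverse=True)
    padding = (-len(ordered)) % group_size
    ordered += [0] * padding
    rounded = []
    for start in range(0, len(ordered), group_size):
        # ordered[start] is the widest box of this group.
        rounded += [ordered[start]] * group_size
    return rounded


rounded_widths = round_box_widths([5, 9, 7, 3, 4], group_size=2)
```

After the rounding, every distinct width occurs a multiple of `group_size` times, which is the property the lemma uses when guessing the set of box widths.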

Let $\mathcal{W}_{B}$ be the set of rounded widths of the boxes. Note that $\mathcal{W}_{B}$ can contain fewer than $2(1+2\varepsilon)(1+\varepsilon)/(\varepsilon\delta)$ sizes if there are fewer than $2(1+2\varepsilon)(1+\varepsilon)/(\varepsilon\delta^{2})-1/(2\delta)$ boxes in the partition of the optimal instance. To place the horizontal items, we first guess the set $\mathcal{W}_{B}$. There are at most $\mathcal{O}((|\bar{\mathcal{I}}_{H}|^{1/\delta})^{\mathcal{O}(1/(\varepsilon\delta))})\leq\mathcal{O}((\log(1/\delta)/\varepsilon)^{\mathcal{O}(1/(\varepsilon\delta^{2}))})$ possibilities for this set.

After we guessed the set of boxes, we check with a linear program whether all the rounded horizontal items can be placed into the boxes. Similar to the placing of small jobs in Section 3.3, we use configurations to place the horizontal items into the boxes. A configuration of horizontal items is a multiset $C:=\{a_{i,C}:i|i\in\bar{\mathcal{I}}_{H}\}$ . Let $\mathcal{C}$ be the set of all configurations. We say a configuration $C$ has width $w(C):=\sum_{i\in\bar{\mathcal{I}}_{H}}a_{i,C}w(i)$ . Let $\mathcal{C}_{w}$ be the set of configurations with width at most $w$ , i.e., $\mathcal{C}_{w}:=\{C\in\mathcal{C}|w(C)\leq w\}$ .
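To make the notion of a configuration concrete, the following sketch enumerates the set $\mathcal{C}_{w}$ for a small list of rounded widths. It is an illustration under assumptions: the function name `configurations` is introduced here, widths are assumed to be positive numbers, and a configuration is represented as a tuple of multiplicities $a_{i,C}$ in the order of the input list.

```python
def configurations(widths, max_width):
    """Return all multisets of the given widths with total width <= max_width.

    Each configuration is a tuple (a_1, ..., a_k), where a_i is the number
    of copies of widths[i] it contains. The empty configuration is included.
    """

    def extend(index, remaining):
        if index == len(widths):
            yield ()
            return
        max_count = int(remaining // widths[index])
        for count in range(max_count + 1):
            rest_width = remaining - count * widths[index]
            for rest in extend(index + 1, rest_width):
                yield (count,) + rest

    return list(extend(0, max_width))


configs = configurations([3, 2], max_width=5)
```

The number of configurations grows quickly with the number of distinct widths, which is why the rounding steps above first reduce the number of widths to a constant depending on $\varepsilon$ and $\delta$.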

Consider the following linear program $LP_{small}$ .

[TABLE]

The variables $x_{C,w}$ represent the height of a configuration $C$ inside the boxes of width $w$. The sum of these heights should equal the total height of the boxes having this width, which is ensured by equation (9). Equation (8) is introduced to represent the extra box for the horizontal items we need due to the rounding of these items. On the other hand, each horizontal item should be covered by the configurations, which is ensured by equation (10).

As for placing the small narrow jobs in Section 3.3, we solve a relaxed version of this linear program called $LP_{small,rel}$. In this relaxed version, we replace equation (9) by the equation

[TABLE]

and, similarly, we replace the equation (8) by

[TABLE]

Lemma 11:

If there is a solution to $LP_{small}$ , we can find a basic solution to $LP_{small,rel}$ in $\mathcal{O}((|\mathcal{W}_{B}||\bar{\mathcal{I}}_{H}|(\ln(|\bar{\mathcal{I}}_{H}|)+\varepsilon^{-4}))^{1.5356}(|\mathcal{W}_{B}|+|\bar{\mathcal{I}}_{H}|+(\log(1/\varepsilon))^{3}/\varepsilon^{4}))\leq\mathcal{O}(\log(1/\delta)^{1.5356}/\varepsilon^{6}\delta^{6})$ operations.

Proof.

Note that the described linear program and the described configurations are equivalent to the ones for the small narrow jobs. Hence, we can use the algorithm proposed in Lemma 4 to find the desired basic solution. ∎

We call the set of guessed boxes for horizontal items $\mathcal{B}_{H}$. At the end of the algorithm, we place the configurations inside the boxes and the horizontal items (fractionally) into the configurations, similar to the placement of small wide jobs in Section 3.3. A basic solution of the above linear program has at most $|\mathcal{W}_{B}|+|\bar{\mathcal{I}}_{H}|+1$ nonzero components. When filling the configurations inside the boxes $\mathcal{B}_{H}$, we have to cut the configurations at the box borders of boxes with the same size. Hence, inside the boxes, we have at most $|\mathcal{B}_{H}|+|\bar{\mathcal{I}}_{H}|+1$ configurations. At each configuration border, we generate fractionally placed horizontal items. However, these items all fit next to each other since they are inside one configuration. Hence, we can remove the cut items and shift them up to the top of the packing. This step adds at most $\mu T\cdot(|\mathcal{B}_{H}|+|\bar{\mathcal{I}}_{H}|+1)\leq\mu T(\log(1/\delta)/\varepsilon+\mathcal{O}(1/\varepsilon\delta^{2}))=\mathcal{O}(\varepsilon T)$ to the packing height.

5.3 Positioning containers as well as large and vertical rectangles

In this section, we handle the positioning of the boxes for horizontal items and the placement of large and vertical items. These boxes and items are positioned by guessing the x-coordinate of the lower left corner, which has to be a multiple of $\varepsilon\delta T$ . Afterward, we guess the order from left to right in which these items and boxes will appear. The technique described in this section is inspired by the techniques described in [16] Chapter 4.

In the first step, we guess the position of the lower corners of the items and boxes in $\mathcal{I}_{L}$ and $\mathcal{B}_{H}$ . Note that since the boxes have an area of at least $\varepsilon\delta T\cdot\delta W$ and the large items have an area of at least $\delta T\cdot\delta W$ and the packing has an area of at most $(1+2\varepsilon)(1+\varepsilon)TW$ , there are at most $\mathcal{O}(1/(\varepsilon\delta^{2}))$ boxes and items. Hence, the total number of possible guesses for positions of their bottom edges is bounded by $(1/\varepsilon\delta)^{\mathcal{O}(1/(\varepsilon\delta^{2}))}$ .
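The counting argument in this paragraph can be written out as follows. Each box and each large item covers an area of at least $\varepsilon\delta^{2}TW$ (large items even cover $\delta^{2}TW$), so dividing the area of the packing by this minimum area bounds their number; the bound on the number of start positions per object uses that these positions are multiples of $\varepsilon\delta T$ within the packing height $(1+2\varepsilon)(1+\varepsilon)T$.

```latex
\[
  |\mathcal{I}_{L}\cup\mathcal{B}_{H}|
  \;\leq\; \frac{(1+2\varepsilon)(1+\varepsilon)\,TW}{\varepsilon\delta T\cdot\delta W}
  \;=\; \frac{(1+2\varepsilon)(1+\varepsilon)}{\varepsilon\delta^{2}}
  \;=\; \mathcal{O}\!\left(\frac{1}{\varepsilon\delta^{2}}\right),
\]
\[
  \#\text{positions per object}
  \;\leq\; \frac{(1+2\varepsilon)(1+\varepsilon)\,T}{\varepsilon\delta T}
  \;=\; \mathcal{O}\!\left(\frac{1}{\varepsilon\delta}\right)
  \quad\Longrightarrow\quad
  \#\text{guesses}
  \;\leq\; \left(\frac{1}{\varepsilon\delta}\right)^{\mathcal{O}(1/(\varepsilon\delta^{2}))}.
\]
```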

Consider an optimal packing where all the items are rounded and the horizontal items are positioned in the rounded boxes. For each large item or box $i\in\mathcal{I}_{L}\cup\mathcal{B}_{H}$ , we can determine the value of the $y$ -coordinates of their left and right borders $y_{i,l}$ and $y_{i,r}$ . Let $\mathcal{Y}$ be the set of all these $y$ -coordinates $y_{i,l}$ and $y_{i,r}$ . We order $\mathcal{Y}$ by value of the coordinates in the optimal packing. This gives us a permutation $\pi:\mathcal{Y}\to\{1,\dots,|\mathcal{Y}|\}$ from the left and right corners of items and boxes to positions in the ordered list. Since the value of $W$ is not logarithmically bounded in the input size, we cannot guess the values of the $y$ -coordinates in polynomial time. However, it is possible to guess the correct permutation $\pi$ in $|\mathcal{Y}|!\in(1/(\varepsilon\delta^{2}))^{\mathcal{O}(1/(\varepsilon\delta^{2}))}$ guesses. For a given item or box $i\in\mathcal{I}_{L}\cup\mathcal{B}_{H}$ , we write $\pi(i,l)$ to refer to the position of $y_{i,l}$ and analogously $\pi(i,r)$ for the position of $y_{i,r}$ and write $y_{j}$ to refer to the $y$ -coordinate which is mapped to position $j$ in the ordered list.

After these two guesses, the guess of the positions of the lower borders and the guess of the order of the items, the algorithm tests whether the combined guess is feasible, i.e., whether it is possible at all to position the items as forced by this guess. This can be done in $\mathcal{O}(n)$ by starting with the leftmost item and positioning the items one by one in order of the $y$-coordinates as far to the left as the guessed constraints allow. As soon as a constraint has to be violated, we stop and discard the guess. Possible violations of the constraints are, e.g., that an item's left border has to be placed between the left and the right border of another item although both items overlap the same horizontal line, that an item has to be placed such that it overlaps the right border of the strip, or that $\pi(i,l)>\pi(i,r)$.
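Two of the violations named above can be checked directly on the guess, as the following sketch shows. It is a simplified pairwise test and not the left-to-right placement procedure of the text: it does not check the right border of the strip, and the name `guess_is_consistent` as well as the tuple layout `(bottom, top, left_pos, right_pos)` are assumptions made here. `bottom` and `top` delimit the half-open range of layers an object occupies, and `left_pos`, `right_pos` are the values $\pi(i,l)$ and $\pi(i,r)$.

```python
def guess_is_consistent(objects):
    """Necessary consistency test for a guess of start layers and border order.

    objects: list of (bottom, top, left_pos, right_pos) tuples.
    Returns False if some object has its left border after its right border,
    or if two objects sharing a layer have interleaved border positions.
    """
    for _, _, left_pos, right_pos in objects:
        if left_pos >= right_pos:
            return False
    for idx, (bot_a, top_a, l_a, r_a) in enumerate(objects):
        for bot_b, top_b, l_b, r_b in objects[idx + 1:]:
            share_layer = bot_a < top_b and bot_b < top_a
            disjoint_in_order = r_a < l_b or r_b < l_a
            if share_layer and not disjoint_in_order:
                return False
    return True


ok = guess_is_consistent([(0, 2, 1, 2), (1, 3, 3, 4)])
bad = guess_is_consistent([(0, 2, 1, 3), (1, 3, 2, 4)])
```

Because there are only $\mathcal{O}(1/(\varepsilon\delta^{2}))$ boxes and large items, the quadratic pairwise loop still runs in time depending only on $\varepsilon$ and $\delta$.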

Consider a feasible guess of starting positions and permutation. The next step of the algorithm is to find values for the $y$-coordinates of the left and right borders. It determines these values by using a linear program as described below. Since the vertical items have to be placed correctly as well, the linear program is concerned not only with determining the $y$-coordinates but also with placing the vertical items. Consider two consecutive $y$-coordinates $y_{j}$ and $y_{j+1}$ and the segments of the layers between these. Some of them are occupied by an item or a box in $\mathcal{I}_{L}\cup\mathcal{B}_{H}$ and some are not. We will use the unoccupied layers to place the vertical items. We scan the area between $y_{j}$ and $y_{j+1}$ from bottom to top and fuse each set of contiguous unoccupied layers into a box for vertical items. Let $\mathcal{B}_{V,j}$ be the set of constructed boxes for the area between the coordinates $y_{j}$ and $y_{j+1}$. Note that there can be at most $\mathcal{O}(1/\varepsilon\delta)$ of them.
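The bottom-to-top scan that fuses contiguous unoccupied layers can be sketched as below. The function name `vertical_boxes` and the boolean-list encoding of the layers between $y_{j}$ and $y_{j+1}$ are assumptions made for this illustration.

```python
def vertical_boxes(occupied):
    """Fuse maximal runs of unoccupied layers into boxes for vertical items.

    occupied[k] is True if layer k (counted from the bottom) is covered by
    a large item or a box for horizontal items in the considered segment.
    Returns a list of (first_layer, number_of_layers) pairs.
    """
    boxes = []
    start = None  # first layer of the currently open run, if any
    for layer, used in enumerate(occupied):
        if not used and start is None:
            start = layer
        elif used and start is not None:
            boxes.append((start, layer - start))
            start = None
    if start is not None:
        boxes.append((start, len(occupied) - start))
    return boxes


boxes_j = vertical_boxes([False, False, True, False, True, True, False])
```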

As for the horizontal items, we define configurations for the vertical items. However, instead of placing these items next to each other, we will stack the items inside a configuration for vertical items on top of each other. Note that in each optimal packing a vertical line through the packing intersects at most $1/\delta$ of these items and hence configurations should contain at most this number of items. We define a new set of vertical items called $\bar{\mathcal{I}}_{V}$. For each appearing item height $h\in\{h(i)|i\in\mathcal{I}_{V}\}$, the set $\bar{\mathcal{I}}_{V}$ contains one job of height $h$ and width $\sum_{i\in\mathcal{I}_{V},h(i)=h}w(i)$. To reduce the running time, we will schedule the jobs in the set $\bar{\mathcal{I}}_{V}$ fractionally instead of the original vertical items. Note that $|\bar{\mathcal{I}}_{V}|\leq\log_{\varepsilon}(1/\delta)/\varepsilon^{2}$ due to the rounding of the vertical items.
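The construction of $\bar{\mathcal{I}}_{V}$ amounts to summing the widths of all vertical items that share a height. A minimal sketch, assuming items are given as `(width, height)` pairs and using the illustrative name `merge_by_height`:

```python
from collections import defaultdict


def merge_by_height(vertical_items):
    """Build one merged item per distinct height.

    Returns a dict mapping each appearing height h to the total width of
    all vertical items of height h.
    """
    total_width = defaultdict(int)
    for w, h in vertical_items:
        total_width[h] += w
    return dict(total_width)


merged = merge_by_height([(2, 5), (3, 5), (1, 7)])
```

Since the heights were rounded beforehand, the number of keys in the result is bounded independently of $n$, and the merge itself is a single pass over the items.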

A configuration for vertical items is a multiset $C:=\{a_{i,C}:i|i\in\bar{\mathcal{I}}_{V}\}$ such that $\sum_{i\in\bar{\mathcal{I}}_{V}}a_{i,C}\cdot h(i)\leq 1/\delta$ and we define its height as $h(C):=\sum_{i\in\bar{\mathcal{I}}_{V}}a_{i,C}\cdot h(i)$. Let $\mathcal{C}_{V}$ be the set of all these configurations and let $\mathcal{C}_{V,h}$ be the set of all configurations with height at most $h$. These configurations for vertical items are combined into hyper configurations which represent the distribution of vertical items in a vertical line through the packing. For each segment between two coordinates $y_{j}$ and $y_{j+1}$, we define a configuration $C_{j}$ as a tuple of configurations, such that there is exactly one configuration for each of the boxes in $\mathcal{B}_{V,j}$, i.e., $C_{j}=(C\in\mathcal{C}_{V,h(b)}:b\in\mathcal{B}_{V,j})$. Let $\mathcal{C}_{V,j}$ be the set of all configurations for the section between the coordinates $y_{j}$ and $y_{j+1}$. We define $a_{i}(C)$ as the number of appearances of item $i\in\bar{\mathcal{I}}_{V}$ inside the configuration $C\in\mathcal{C}_{V,j}$. Note that the configurations for the boxes each have a maximum number of vertical items they can contain and the sum of these numbers is bounded by $1/\delta$. Hence, the total number of different configurations in $\mathcal{C}_{V,j}$ is bounded by $|\bar{\mathcal{I}}_{V}|^{1/\delta}$. To find fitting values for the $y$-coordinates, the algorithm solves the following linear program:

[TABLE]

In this linear program there are three types of variables: $x$, $y$ and $w$. The variables $y_{j}$ for $j\in\{0,\dots,|\mathcal{Y}|+1\}$ represent the values of the $y$-coordinates of the item and box borders in $\mathcal{I}_{L}\cup\mathcal{B}_{H}$, where $y_{0}$ represents the left border of the strip and $y_{|\mathcal{Y}|+1}$ represents the right border of the strip. The variables $w_{j}$ for $j\in\{0,\dots,|\mathcal{Y}|\}$ represent the distance between the consecutive $y$-coordinates $y_{j}$ and $y_{j+1}$. Last, the variables $x_{C,j}$ represent the width of the configuration $C$ in box $b$ which is positioned between $y_{j}$ and $y_{j+1}$.

The first three constraints (12) to (14) ensure that the $y$-coordinates are positioned in the right order and that we use exactly the width of the strip. Furthermore, the variables $w_{j}$ for the width between the $y$-coordinates are defined. Equation (15) ensures that the $y$-coordinates of the items and boxes in $\mathcal{I}_{L}\cup\mathcal{B}_{H}$ are positioned such that their distance equals the width of the corresponding item. Equations (16) and (17) ensure that the vertical items are placed correctly. The first equation ensures that we do not use too large a width for the configurations inside the boxes, while the second equation ensures that all the vertical items can be placed.

The total number of constraints is bounded by

[TABLE]

While the total number of variables is bounded by

[TABLE]

Furthermore, all appearing values in the linear program are integer, the largest one on the left hand side is bounded by $1/\delta$ while the right hand side is bounded by $W$ . We can solve this linear program by guessing the right set of at most $\mathcal{O}(1/(\varepsilon\delta^{2}))$ non-zero components and then solving the corresponding equation system using Gauß-Jordan elimination in $\mathcal{O}((2^{\mathcal{O}(1/\varepsilon\delta)})^{\mathcal{O}(1/(\varepsilon\delta^{2}))}\cdot(1/(\varepsilon\delta^{2}))^{3})=2^{\mathcal{O}(1/(\varepsilon^{2}\delta^{3}))}$ .

After we have found such a solution, we fix the values for the variables $y_{j}$ and $w_{j}$ for each $j\in\{1,\dots,|\mathcal{Y}|+1\}$ and find a basic solution to the linear program consisting just of the equations (16), (17), and (19). Such a basic solution has at most $|\bar{\mathcal{I}}_{V}|+|\mathcal{Y}|\in\mathcal{O}(1/(\varepsilon\delta^{2}))$ nonzero components and hence uses at most this number of configurations.

At the very end of the algorithm, these configurations are filled (fractionally) with the rounded vertical items, analogously to how the small wide jobs are filled into their configurations, see Section 3.3. Since each configuration contains at most $1/\delta$ items and we use at most $\mathcal{O}(1/(\varepsilon\delta^{2}))$ of them, there are at most $\mathcal{O}(1/(\varepsilon\delta^{3}))$ fractionally placed vertical items which have a total width of at most $\mathcal{O}(\mu W/(\varepsilon\delta^{3}))$. Since $\mu\leq\gamma\delta^{3}\varepsilon/x$ for a large enough constant $x$, it holds that the total width of the discarded items is smaller than $\gamma W/2$. These items are placed on top of the packing, adding at most $p_{\max}$ to the packing height, or in the additional container $C_{2}$.

5.4 Placing the Small Items

Note that the configurations for vertical and horizontal items might be smaller in height or width than the box they are placed inside, i.e., if a configuration $C$ for vertical items is placed inside a box $b\in\mathcal{B}_{V,j}$, there is a box of free area of width $X_{C,b,j}$ and height $h(b)-h(C)$. We will use this area to place the small items. The total free area of this kind has to have a size of at least $\mathcal{A}(\mathcal{I}_{S})$, since the configurations contain exactly the total area of the corresponding items and the total area of all items is at most $(1+2\varepsilon)(1+\varepsilon)TW$ while the packing has a height of at least $(1+2\varepsilon)(1+\varepsilon)T$.

Since we use at most $\mathcal{O}(1/(\varepsilon\delta^{3}))$ configurations for vertical items and at most $|\mathcal{B}_{H}|+|\bar{\mathcal{I}}_{H}|=\mathcal{O}(1/(\varepsilon\delta^{2}))$ configurations for horizontal items, there are at most $\mathcal{O}(1/(\varepsilon\delta^{3}))$ boxes for small items. We call the set of these boxes $\mathcal{B}_{S}$ .

Lemma 12:

We can place the small items inside the $\mathcal{O}(1/(\varepsilon\delta^{3}))$ boxes $\mathcal{B}_{S}$ and one additional box of width $W$ and height $2\varepsilon T+\mu T$ .

Proof.

Remember that the total area of the boxes is at least $\mathcal{A}(\mathcal{I}_{S})$. The algorithm first sorts the small items by height in $\mathcal{O}(n+\log(n)/\varepsilon^{2})$ time since the small items have at most $\mathcal{O}(\log(n)/\varepsilon^{2})$ different sizes. Afterward, it considers the boxes for the small items $\mathcal{B}_{S}$ one by one and fills the small items inside them using the NFDH algorithm. If an item does not fit inside the considered box, because the item is too wide or too tall, the algorithm is finished with this box and considers the next. All the items that cannot be placed inside the boxes $\mathcal{B}_{S}$ are placed inside the newly introduced box of width $W$ and height $2\varepsilon T+\mu T$.

Let us consider the boxes next to the configurations and the free area inside them. Let $B$ be such a box. In $B$ there is a free area of at most $\mu W\cdot h(B)$ on one side of $B$ since the small items have a width of at most $\mu W$. Additionally, there can be free area of at most $\mu T\cdot w(B)$ on the top of the box since the items have a height of at most $\mu T$. Lastly, there can be free area between the items. However, as indirectly shown by Coffman et al. in [6] in the proof of Lemma 5, the free area provoked this way over all the boxes is bounded by $\mu T\cdot W$ since the items have a maximal height of at most $\mu T$ and the boxes have a maximal width of at most $W$. In total, the free area inside the boxes $\mathcal{B}_{S}$ is bounded by $\mu TW+\mu T\sum_{B\in\mathcal{B}_{S}}w(B)+\mu W\sum_{B\in\mathcal{B}_{S}}h(B)\leq\mu TW\cdot\mathcal{O}(1/(\varepsilon\delta^{3}))$. Since it holds that $\mu\leq\varepsilon^{2}\delta^{3}/x$ for a suitably large constant $x$, the total area of the unplaced small items has to be bounded by $\varepsilon TW$. Using Lemma 5, we can place these unplaced items with a total height of at most $2\varepsilon T+\mu T$ inside the extra box. ∎
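The box-by-box filling used in this proof can be sketched with a small Next Fit Decreasing Height routine. This is an illustration under assumptions: the names `nfdh_into_box` and `fill_boxes` are introduced here, items are `(width, height)` pairs with positive dimensions, boxes are `(width, height)` pairs, and the sketch sorts the items with a comparison sort instead of exploiting the bounded number of distinct sizes.

```python
def nfdh_into_box(items, box_width, box_height):
    """Pack items (sorted by non-increasing height) into one box with NFDH.

    Items are placed left to right on shelves; a new shelf is opened on top
    of the current one when the next item does not fit next to the others.
    Packing stops at the first item that is too wide or too tall.
    Returns (placements, leftover), placements being (x, y, w, h) tuples.
    """
    placements = []
    shelf_y, shelf_h, x = 0, 0, 0
    for idx, (w, h) in enumerate(items):
        if x + w > box_width:
            # The current shelf is full: open a new shelf above it.
            shelf_y += shelf_h
            shelf_h, x = 0, 0
        if shelf_h == 0:
            # The first item on a shelf defines the shelf height.
            if w > box_width or shelf_y + h > box_height:
                return placements, items[idx:]
            shelf_h = h
        placements.append((x, shelf_y, w, h))
        x += w
    return placements, []


def fill_boxes(items, boxes):
    """Fill the boxes one by one; return per-box placements and unplaced items."""
    remaining = sorted(items, key=lambda it: -it[1])
    packing = []
    for box_width, box_height in boxes:
        placed, remaining = nfdh_into_box(remaining, box_width, box_height)
        packing.append(placed)
    return packing, remaining


packing, unplaced = fill_boxes(
    [(2, 3), (3, 4), (1, 1), (4, 2), (2, 3)], [(5, 6), (5, 5)]
)
```

Items returned in `unplaced` correspond to the items that the proof moves into the additional box of width $W$.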

5.5 Packing medium sized items

To place the medium sized items, we partition them into two sets, $\mathcal{I}_{M,V}$ which contains all the items taller than $2\varepsilon T$ and $\mathcal{I}_{M,S}:=\mathcal{I}_{M}\setminus\mathcal{I}_{M,V}$ . Since the total area of the medium sized items is bounded by $\gamma\varepsilon TW$ , the total width of the items in $\mathcal{I}_{M,V}$ is bounded by $\gamma W/2$ . Hence, we can place all these items at the end of the schedule next to the discarded vertical items. In total this adds at most $h_{\max}$ to the schedule.
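The width bound for the tall medium items follows from a short area argument. Writing $w(\mathcal{I}_{M,V})$ for the total width of the items in $\mathcal{I}_{M,V}$ (a notation assumed here, analogous to $h(\mathcal{I}_{H,i})$ above), every such item is taller than $2\varepsilon T$, so:

```latex
\[
  2\varepsilon T\cdot w(\mathcal{I}_{M,V})
  \;<\; \sum_{i\in\mathcal{I}_{M,V}} h(i)\,w(i)
  \;\leq\; \mathcal{A}(\mathcal{I}_{M})
  \;\leq\; \gamma\varepsilon TW
  \quad\Longrightarrow\quad
  w(\mathcal{I}_{M,V}) \;<\; \frac{\gamma W}{2}.
\]
```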

The items in $\mathcal{I}_{M,S}$ have a height of at most $2\varepsilon T$ and an area of at most $\varepsilon TW$. Hence, by Lemma 5, when using the NFDH algorithm to place these items, we add at most $4\varepsilon T$ to the packing height.

5.6 Summary of the algorithm

In the following, we summarize the steps of the algorithm and give a short overview of the running time. An overview of the generated packing can be found in Figure 9.

1. In the first step of the algorithm, we perform the simplification steps. We define $T:=\max\{h_{\max},(\sum\nolimits_{i\in\mathcal{I}}h(i)w(i))/W\}$, find the correct values for $\delta$ and $\mu$ as described in Lemma 7, and partition the set of items into $\mathcal{L}$, $\mathcal{V}$, $\mathcal{H}$, $\mathcal{S}$, and $\mathcal{M}$ accordingly. Afterward, we round the heights and the widths of the items. First, we round the height of the items to multiples of $\varepsilon T/n$ and scale the items, such that they have heights in $\{1,\dots,n/\varepsilon\}\subseteq\mathbb{N}$, and scale $T$ accordingly such that $T=n/\varepsilon$. Next, we scale the instance and $T$ again with $1/\varepsilon\delta$ and use Lemma 2 to round the heights of the items in $\mathcal{L}\cup\mathcal{V}$, such that we can assume that they start at multiples of $1/\varepsilon\delta T$. Furthermore, we introduce the set of rounded items $\bar{\mathcal{H}}$ using Lemma 9.

2. In the next step, we do a binary search over all the possible numbers of layers $L\in[1/(\varepsilon\delta),5/(\varepsilon\delta)]\cap\mathbb{N}$. Let $T^{\prime}$ be the currently considered number of layers. For this number of layers, we try to find a packing by performing the following steps.

3. For each guess of the set $\mathcal{W}_{B}$ and each guess of y-coordinates and permutation for boxes and large items: try to solve the configuration linear program $LP_{small}$ to place the horizontal items. If this is not possible, try the next guess; otherwise, try to solve the $LP$ to find the correct positions for the boxes, large items, and vertical items. If this LP is solvable, save the guess and the LP solutions and try the next smaller value for $T^{\prime}$ in binary search fashion; otherwise, try the next guess. If all guesses fail, try the next larger value for $T^{\prime}$ in binary search fashion.

4. Afterward, use the saved guess and LP solutions to assign the corresponding items. First, we revert the scaling of the items and scale the solution and guess accordingly. Then, we place the large, vertical, and horizontal items inside the guess as described in Section 5.3. Afterward, we place the small items inside the resulting boxes for small items as described in Section 5.4. Finally, we place the medium sized items as described in Section 5.5.
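The outer framework of these steps, computing $T$ and binary searching over the number of layers, can be sketched as follows. The callback `feasible` is hypothetical: it stands for the whole guessing and LP-solving procedure of the third step and is assumed to be monotone in the number of layers. The function names are likewise introduced only for this illustration, and items are assumed to be `(width, height)` pairs.

```python
def lower_bound_T(items, W):
    """T := max(h_max, total item area / W), computed in one pass."""
    h_max = max(h for _, h in items)
    area = sum(w * h for w, h in items)
    return max(h_max, area / W)


def smallest_feasible_layers(lo, hi, feasible):
    """Binary search for the smallest L in [lo, hi] with feasible(L) true.

    `feasible` is a placeholder for trying all guesses for a fixed number
    of layers. Returns None if no value in the range is feasible.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            best = mid      # remember the saved guess, then try fewer layers
            hi = mid - 1
        else:
            lo = mid + 1    # all guesses failed, try more layers
    return best


T = lower_bound_T([(2, 3), (4, 1)], W=4)
L = smallest_feasible_layers(10, 50, lambda layers: layers >= 23)
```

The search interval has $\mathcal{O}(1/(\varepsilon\delta))$ candidates, so the loop performs $\mathcal{O}(\log(1/(\varepsilon\delta)))$ calls to `feasible`.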

Step 1 takes $\mathcal{O}(n\log(1/\varepsilon)+1/\varepsilon\gamma)$ operations: The set of items needs to be enumerated once to find $T$, i.e., it can be found in $\mathcal{O}(n)$. The correct values for $\delta$ and $\mu$ can be found in $\mathcal{O}(n+1/\varepsilon\gamma)$ and the corresponding partition can be found in $\mathcal{O}(n)$. The scaling and rounding of the item heights can be done in $\mathcal{O}(n)$. Finally, the rounding of the item widths can be done in $\mathcal{O}(n\log(1/\varepsilon))$.

The binary search described in Step 2 can be done in $\mathcal{O}(\log(1/(\varepsilon\delta)))$ . For each of the values given by the binary search framework, there are at most $\mathcal{O}((\log(1/\delta)/\varepsilon)^{\mathcal{O}(1/(\varepsilon\delta^{2}))})$ possibilities to guess $\mathcal{W}_{B}$ , at most $(1/\varepsilon\delta)^{\mathcal{O}(1/(\varepsilon\delta^{2}))}$ possibilities to guess y-coordinates, and at most $(1/(\varepsilon\delta^{2}))^{\mathcal{O}(1/(\varepsilon\delta^{2}))}$ possibilities to guess the right permutation for boxes and large items. The resulting LP can be solved in $2^{\mathcal{O}(1/(\varepsilon^{2}\delta^{3}))}$ . Therefore the total running time of steps 2 and 3 can be summarized as

[TABLE]

In the final step, we place the original items inside the packing. The placement of large, vertical, and horizontal items can be done in $\mathcal{O}(n+1/(\varepsilon\delta^{3}))$ since there are at most $1/(\varepsilon\delta^{3})$ places for vertical and horizontal items. To place the small items, we use the NFDH algorithm and hence have a running time of at most $\mathcal{O}(1/(\varepsilon\delta^{3})+n+\log(n)/\varepsilon^{2})$ since the items have at most $\log(n)/\varepsilon^{2}$ sizes and are placed inside at most $\mathcal{O}(1/(\varepsilon\delta^{3}))$ boxes. The medium sized items can be placed in at most $\mathcal{O}(n+1/\varepsilon^{2})$ since they have at most $\mathcal{O}(1/\varepsilon^{2})$ (possible) different sizes. Hence the total running time of the algorithm is bounded by $\mathcal{O}(n\log(1/\varepsilon)+1/\varepsilon\gamma+2^{\mathcal{O}(1/\varepsilon^{2}\delta^{3})}+1/(\varepsilon\delta^{3})+n+\log(n)/\varepsilon^{2})\leq\mathcal{O}(n\log(1/\varepsilon)+\log(n)/\varepsilon^{2})+2^{1/(\varepsilon\gamma)^{3^{\mathcal{O}(1/(\varepsilon\gamma))}}}$ .

As a consequence, we end up with a running time of $\mathcal{O}(n\log(1/\varepsilon)+\log(n)/\varepsilon^{2})+2^{(1/\varepsilon)^{3^{\mathcal{O}(1/(\varepsilon))}}}$ for the AEPTAS and when using it as a subroutine for MSP for $N=3$ because we can choose $\gamma=1$ in these cases. On the other hand, when using this algorithm as a subroutine for MSP for $N=2$, we end up with a running time of $\mathcal{O}(n\log(1/\varepsilon)+\log(n)/\varepsilon^{2})+2^{(1/\varepsilon)^{3^{\mathcal{O}(1/\varepsilon^{2})}}}$ because we have to choose $\gamma=\varepsilon$ in this case.

6 Conclusion

In this paper, we presented an algorithm for Multiple Cluster Scheduling (MCS) and Multiple Strip Packing (MSP) with best possible absolute approximation ratio of $2$ and best possible running time $\mathcal{O}(n)$ for the case $N\geq 3$. It remains open whether, for the case $N=2$, the running time of $\mathcal{O}(n\log(n))$ or $\mathcal{O}(n\log^{2}(n)/\log(\log(n)))$ for MCS and MSP, respectively, can be improved to $\mathcal{O}(n)$.

Furthermore, we presented a truly fast algorithm for Multiple Cluster Scheduling (MCS) with running time $\mathcal{O}(n\log(n))$ that does not have any hidden constants. Since the running time of the $\mathcal{O}(n)$ algorithm hides large constants, it would be interesting to improve the running time of the underlying $AEPTAS$ or even to find a faster asymptotic algorithm with approximation guarantee $(5/4)\mathrm{OPT}+p_{\max}$ .

Bibliography


[1] Peter A. Beling and Nimrod Megiddo. Using fast matrix multiplication to find basic solutions. Theor. Comput. Sci., 205(1-2):307–316, 1998. URL: https://doi.org/10.1016/S0304-3975(98)00003-6, doi:10.1016/S0304-3975(98)00003-6.
[2] Marin Bougeret, Pierre-François Dutot, Klaus Jansen, Christina Otte, and Denis Trystram. Approximation algorithms for multiple strip packing. In Approximation and Online Algorithms, 7th International Workshop, WAOA 2009, Copenhagen, Denmark, September 10-11, 2009. Revised Papers, pages 37–48, 2009. URL: https://doi.org/10.1007/978-3-642-12450-1_4, doi:10.1007/978-3-642-12450-1_4.
[3] Marin Bougeret, Pierre-François Dutot, Klaus Jansen, Christina Otte, and Denis Trystram. Approximating the non-contiguous multiple organization packing problem. In Theoretical Computer Science - 6th IFIP TC 1/WG 2.2 International Conference, TCS 2010, Held as Part of WCC 2010, Brisbane, Australia, September 20-23, 2010. Proceedings, pages 316–327, 2010. URL: https://doi.org/10.1007/978-3-642-15240-5_23, doi:10.1007/978-3-642-15240-5_23.
[4] Marin Bougeret, Pierre-François Dutot, Klaus Jansen, Christina Otte, and Denis Trystram. A fast 5/2-approximation algorithm for hierarchical scheduling. In Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31 - September 3, 2010, Proceedings, Part I, pages 157–167, 2010. URL: https://doi.org/10.1007/978-3-642-15277-1_16, doi:10.1007/978-3-642-15277-1_16.
[5] Marin Bougeret, Pierre-François Dutot, Denis Trystram, Klaus Jansen, and Christina Robenek. Improved approximation algorithms for scheduling parallel jobs on identical clusters. Theor. Comput. Sci., 600:70–85, 2015. URL: https://doi.org/10.1016/j.tcs.2015.07.003, doi:10.1016/j.tcs.2015.07.003.
[6] Edward G. Coffman Jr., Michael R. Garey, David S. Johnson, and Robert Endre Tarjan. Performance bounds for level-oriented two-dimensional packing algorithms. SIAM Journal on Computing, 9(4):808–826, 1980. doi:10.1137/0209062.
[7] Wenceslas Fernandez de la Vega and George S. Lueker. Bin packing can be solved within 1+epsilon in linear time. Combinatorica, 1(4):349–355, 1981. URL: https://doi.org/10.1007/BF02579456, doi:10.1007/BF02579456.
[8] Pierre-François Dutot, Klaus Jansen, Christina Robenek, and Denis Trystram. A $(2+\epsilon)$-approximation for scheduling parallel jobs in platforms. In Euro-Par 2013 Parallel Processing - 19th International Conference, Aachen, Germany, August 26-30, 2013. Proceedings, pages 78–89, 2013. URL: https://doi.org/10.1007/978-3-642-40047-6_11, doi:10.1007/978-3-642-40047-6_11.


Research was supported by German Research Foundation (DFG) project JA 612/20-1.
