HEATS: Heterogeneity- and Energy-Aware Task-based Scheduling
Isabelly Rocha, Christian Göttel, Pascal Felber, Marcelo Pasin, Romain Rouvoy, Valerio Schiavoni

TL;DR
HEATS is a novel energy-aware task scheduling system for heterogeneous cloud clusters that learns hardware performance and energy profiles to optimize container deployment, reducing energy consumption with minimal impact on runtime.
Contribution
This work introduces HEATS, a new energy-aware, task-based scheduler integrated into Kubernetes that exploits hardware heterogeneity and energy profiles for improved efficiency.
Findings
Up to 8.5% energy savings achieved.
Runtime impact limited to 7%.
Effective in heterogeneous cloud environments.
| CPU | Arch. | Cores | Frequency | TDP (W) | Mem. |
|---|---|---|---|---|---|
| ARM Cortex-A53 | big.LITTLE | 4 | | 5 | |
| AMD Epyc 7281 | amd64 | 32 | | 155 | |
| Intel Xeon E3-1270 v6 | x86 | 4 | | 72 | |
| Intel Xeon E5-2683 v4 | x86 | 32 | | 120 | |
Heats: Heterogeneity- and Energy-Aware Task-based Scheduling
Isabelly Rocha¹, Christian Göttel¹, Pascal Felber¹, Marcelo Pasin¹, Romain Rouvoy² and Valerio Schiavoni¹
¹ University of Neuchâtel, Switzerland
² Inria Lille – Nord Europe
Abstract
Cloud providers usually offer diverse types of hardware for their users. Customers exploit this option to deploy cloud instances featuring GPUs, FPGAs, architectures other than x86 (e.g., ARM, IBM Power8), or featuring certain specific extensions (e.g., Intel SGX). We consider in this work the instances used by customers to deploy containers, nowadays the de facto standard for micro-services, or to execute computing tasks. In doing so, the underlying container orchestrator (e.g., Kubernetes) should be designed so as to take into account and exploit this hardware diversity. In addition, besides the feature range provided by different machines, there is an often overlooked diversity in the energy requirements introduced by hardware heterogeneity, which is simply ignored by the default placement strategies of container orchestrators. We introduce Heats, a new task-oriented and energy-aware orchestrator for containerized applications targeting heterogeneous clusters. Heats allows customers to trade performance vs. energy requirements. Our system first learns the performance and energy features of the physical hosts. Then, it monitors the execution of tasks on the hosts and opportunistically migrates them onto different cluster nodes to match the customer-required deployment trade-offs. Our Heats prototype is implemented within Google’s Kubernetes. The evaluation with synthetic traces in our cluster indicates that our approach can yield considerable energy savings (up to 8.5%) and only marginally affects the overall runtime of deployed tasks (by at most 7%). Heats is released as open-source.
I Introduction
Cloud providers nowadays provide access to a wide range of heterogeneous resources to their customers. Hence, the diversity of resources encourages application developers and deployers to program for, and offload even more workloads to, the cloud. There, specialized hardware (e.g., GPU, FPGA) can be rented for a limited time, reducing upfront costs and allowing for better scalability.
To illustrate this diversity, Table I shows an overview of the commercial offering of heterogeneous resources at 6 major public cloud providers. For each, we list the CPU architecture (x86, IBM Power, ARM), and the availability of GPU, FPGA or ASIC units. We further indicate if such resources can be accessed using bare metal (BM) or virtual machine (VM) instances. Additionally, we show whether the operating frequency of the processor can be dynamically scaled up or down, a feature that could be leveraged to reduce the generated energy costs of a node. This quick survey reveals that it is possible to combine a very heterogeneous ensemble of machines, each offering specific hardware feature sets. This capability represents the ideal case for applications that have different resource demands, as it is sometimes better to migrate the execution from a machine of one kind to a different one, in order to better match the expected trade-off requested by the customer. Resource diversity can also be exploited to deploy applications and workloads of different nature.
Containers (e.g., Docker [1]) have recently become the de facto standard to deploy applications on the cloud, executed by specialized container orchestrators, such as Google’s Kubernetes [2]. Current policies of container orchestrators often ignore the diversity found in hardware, leading to subtle trade-offs between energy and performance. To better understand this aspect and motivate our work, we conducted a simple experimental study (Figure 1). We set up an on-premise cluster composed of 4 different types of nodes: three server-grade machines (two Intel and one AMD) and one ARM-based low-energy device (a Raspberry Pi). Each machine has different hardware characteristics (e.g., number and type of CPU cores, memory and operating frequency) and energy requirements.
While these properties are known by the cluster owner at deployment time, the energy requirements as well as the raw computing power of the machines for a specific workload are not. Typically, customers are only able to evaluate those at runtime, while executing their applications. Because of that, they can face unexpected costs or missed deadlines upon completion of tasks.
In our scenario, we developed and deployed a simple task implementing the popular k-means clustering algorithm. At first, the task is deployed on the AMD node (Figure 1, top-most plot): given our cluster settings, the default Kubernetes scheduler places it on the machine with the most cores and memory. When it remains on the same host, the task completes after 69 seconds, consuming 1,047 Joules.
Next, we consider customers wishing to trade running time for lower energy costs. This requires a dynamic container rescheduling policy that can migrate a task to the ARM node after it has made some progress but before completion (e.g., 30 seconds after startup, as highlighted by the vertical line in each plot). In doing so, the net energy savings are substantial (up to 34%), but they come at the cost of an increase in the task’s running time.
Such trade-offs are often desirable (especially for deadline-free, low-priority workloads), but difficult to achieve in practice. As a matter of fact, a task (or container) orchestrator would need to be aware of several factors and able to: (1) know or learn the characteristics of the underlying cluster and its hardware resources; (2) understand the trade-off that a customer is willing to accept; (3) observe if a better placement opportunity exists for the currently executing tasks; and (4) migrate the task accordingly. In this paper we introduce Heats, a scheduling system geared toward heterogeneous clusters that achieves these goals.
The key mechanism used by Heats consists in offering to clients the ability to indicate, at deployment time, their intended energy-performance ratio (the acceptable trade-off), in the form of an H value. Thereafter, Heats continuously matches the demanded H value to the available resources, considering the resources themselves, pre-built performance and energy models, and the possibly conflicting requirements from other concurrent tasks. As shown in Section V, this has consequences on the task throughput of the underlying cluster.
In summary, our contributions are as follows:
- We present a probing framework, which we use to build a model of the underlying hardware resources;
- We design and implement Heats, a new container scheduler system that, by leveraging the underlying model, places application tasks onto the best matching nodes among the currently available hardware resources for the intended energy/performance ratio;
- We thoroughly evaluate our prototype by means of an in-depth experimental evaluation.
The rest of this paper is organized as follows. The rationale of the Heats scheduling policy is presented in Section II. We describe the architecture of Heats in Section III. We then provide insights on the implementation of the Heats prototype in Section IV. We extensively evaluate the performance of our prototype in Section V, where we also detail the synthetic traces used to show the benefits of Heats. We survey related work in Section VI, before concluding in Section VII.
II Heats Scheduling Policy
In this section we describe the scheduling algorithm implemented by Heats. Algorithm 1 describes the main functions, which we detail next.
The resource requirements of a task, such as memory or number of cores, are specified before submission. Resource availability in the hardware nodes is monitored (in our practical experiment we used Heapster [9]) and reported to the Heats monitoring module. Then, Heats computes suitable nodes for execution, considering the resource requirements of all previously running tasks as well as the availability reported by the underlying system. Next, the algorithm executes a profiling phase and estimates the performance and energy requirements of the given task on each of the previously computed available nodes. Finally, the scheduling module relies on these estimations to compute scores for each node, weighted by the energy/performance ratio defined by the client (the energy and performance weights in Algorithm 1). The best fitting node is chosen to deploy the given task.
In summary, the Heats strategy attempts to place tasks on the most efficient host that still has enough resources to run the given task. We define the most efficient host as the closest match to the demanded energy/performance trade-off. However, the ideal node for a task will not always be available at scheduling time. Therefore, we periodically recompute our scheduling decision. When a better fit than the current host of a task is found, the scheduler performs a migration.
The scheduling phase is triggered for the queue of all pending tasks. The algorithm starts by finding the best fit for the next task (lines 4 and 11–15). It identifies its resource requirements, e.g., CPU and memory, as well as the nodes on which these resources are available (lines 12–13). Then, it computes the score for each of the nodes (lines 16–22). The model (described in Section III) is used for the profiling of nodes (line 18). The scores are computed by normalizing the predictions and applying the demanded weights (line 20). The rescheduling phase is triggered periodically, at a fixed interval, for the set of all running tasks. If the re-execution of the best fit decides on a different target node, the task is migrated to the new host and removed from the current one (lines 9–10). We show in our evaluation that this rescheduling interval, for our specific workload and cluster settings, has minimal impact on the runtime or the energy efficiency of Heats. We will study this further in future work.
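The scoring step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Heats implementation: the names `normalize` and `best_fit`, the min-max normalization, and the convention that the lowest weighted score wins are all hypothetical. The text only states that predictions are normalized and weighted by the client's energy/performance ratio.

```python
from typing import Dict, Tuple

# Hypothetical per-node predictions: node name -> (energy in joules, runtime in seconds).
Predictions = Dict[str, Tuple[float, float]]


def normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize to [0, 1]; if all values are equal, map them to 0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def best_fit(predictions: Predictions, w_energy: float, w_perf: float) -> str:
    """Return the node with the lowest weighted sum of normalized energy and runtime."""
    energy = normalize({n: e for n, (e, _) in predictions.items()})
    runtime = normalize({n: t for n, (_, t) in predictions.items()})
    scores = {n: w_energy * energy[n] + w_perf * runtime[n] for n in predictions}
    return min(scores, key=scores.get)


# Illustrative numbers only, not measurements from the paper.
candidates = {"amd": (1000.0, 60.0), "intel": (800.0, 90.0), "arm": (300.0, 400.0)}
print(best_fit(candidates, w_energy=0.0, w_perf=1.0))  # amd: shortest predicted runtime
print(best_fit(candidates, w_energy=1.0, w_perf=0.0))  # arm: lowest predicted energy
```

With intermediate weights the choice can fall on a node that is neither the fastest nor the most frugal, which is the behaviour a client-defined trade-off is meant to expose.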
III Architecture
The architecture of Heats is composed of several interacting components. Figure 2 depicts these interactions. We describe each of them in detail in the remainder of this section.
Modeling. The modeling component executes two main operations, namely probing and learning, described below.
The probing phase discovers the properties and capabilities of the cluster, i.e., of the machines composing it. This probing phase is executed upon the initial setup of Heats, as well as after every major hardware reconfiguration (such as the integration of new machine types in the cluster pool). We implemented this probing so that it also explores the performance of the nodes by scaling the frequency of the CPUs up and down [10]. In a typical setup, producing an accurate model of a new machine usually requires a few hours. Figure 3 shows the characterizations that this phase produces when applied to the machines of our cluster. In particular, it outputs the runtime and energy requirements of two different families of probing tasks. The energy requirements reported here exclude the idle consumption of the machines and account for the task itself only. In this way we can better understand the tasks' energy requirements on the different types of hardware. We show the results for two such CPU-bound tasks: the aforementioned k-means clustering algorithm, as well as a typical matrix multiplication operation. For both types of probing tasks, we observe that the energy requirements can be reduced at a given performance cost for almost every machine type. The framework further executes these probing tasks under frequency scaling of the underlying CPUs. We achieve this by leveraging two different Linux CPU governors [11], powersave and performance, respectively running the CPU at the minimum and maximum frequency. We observe that, within the same machine type, the energy and performance are largely affected by scaling the CPU frequency. The output of this phase is used next.
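Switching between the two governors mentioned above can be sketched through the standard Linux cpufreq sysfs files. This is an assumption about how such probing could be driven, not a description of the Heats probing framework: the helper names `governor_files` and `set_governor` are hypothetical, and writing to these files requires root privileges on a machine whose driver exposes cpufreq.

```python
from pathlib import Path
from typing import List

# Standard Linux cpufreq sysfs location; writing to it requires root privileges.
CPU_SYSFS = Path("/sys/devices/system/cpu")


def governor_files(root: Path = CPU_SYSFS) -> List[Path]:
    """List the scaling_governor file of every CPU that exposes cpufreq."""
    return sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor"))


def set_governor(name: str, root: Path = CPU_SYSFS) -> int:
    """Write the governor name for each CPU and return how many CPUs were updated."""
    if name not in ("powersave", "performance"):
        raise ValueError(f"unsupported governor for probing: {name}")
    files = governor_files(root)
    for path in files:
        path.write_text(name)
    return len(files)
```

A probing run would then call `set_governor("powersave")`, execute the probing task while recording runtime and energy, and repeat with `set_governor("performance")`. The `root` parameter exists only so the helpers can be exercised against a scratch directory.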
The data collected by the probing phase is used to train a multiple linear regression model [12]. Given a task and its CPU and memory requirements, a fitted regression model is used to predict its energy and performance for each machine type available in the cluster. We did a preliminary analysis of different machine learning techniques and, for the workload used, TensorFlow [13] presented better results. We plan a more extensive comparative evaluation of different machine learning algorithms as future work. While the probing component constantly records new data, Heats uses it to refine the predictions at a given frequency. In our evaluation, we execute the learning phase every 24 hours.
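To make the learning step concrete, the sketch below fits a multiple linear regression that maps a task's CPU and memory requirements to an energy estimate for one machine type. It is only an illustration: it solves the normal equations in plain Python rather than using TensorFlow as the prototype does, the function names are hypothetical, and the training samples are synthetic.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(features: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations."""
    rows = [[1.0, *f] for f in features]  # prepend the intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef: Sequence[float], feature: Sequence[float]) -> float:
    """Evaluate the fitted linear model on one feature vector."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], feature))


# Synthetic probing samples for one machine type: (cores, memory in GiB) -> energy in joules.
samples = [(1, 1), (2, 1), (1, 2), (4, 2), (2, 4)]
energy = [100 + 50 * c + 10 * m for c, m in samples]  # exactly linear by construction
coef = fit(samples, energy)
print(round(predict(coef, (3, 3))))  # 280 for this synthetic data
```

In a setup like the one described, one such model would be fitted per machine type and per target (energy and runtime), and refitted whenever new probing data is incorporated.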
Monitoring. Kubernetes is equipped with several tools to monitor resources: cAdvisor [14] has been partially integrated into Kubernetes’ node agent kubelet [15], and it is capable of measuring resources used by containers. Heapster [16] exploits the measurements from cAdvisor, aggregates them and provides means to analyze and monitor the state of the Kubernetes cluster using Grafana [17]. Furthermore, Heapster allows us to store the aggregated data in InfluxDB [18], a time-series database that supports SQL-like queries to retrieve historical resource measurements of the Kubernetes cluster. Future versions of Heats will support metrics-server [19].
In order to decide whether a task has to be migrated from one node to a different heterogeneous node, the Heats scheduler has to be able to rely on a fine grained resource monitoring system. Despite the potential capability to gather resource measurements every , we found out that Heapster cannot reliably deliver these resource measurements at a fixed rate. A custom resource measurement system was therefore implemented and installed on the Kubernetes nodes, which queries every second the local Docker instances for up-to-date resources used by the containers. These resource measurements can then be aggregated and used by the Heats scheduler to provide the needed support for migrating tasks.
The monitoring component is responsible for actively gathering information regarding the resources currently being consumed at each node by the tasks in execution. This information is required by the scheduling component (described below) to know which node has sufficient resources for the pending tasks. Heats leverages some default software probes from Heapster to continuously fetch the hardware resources available on any given node.
Additionally, to access in real-time the current power and energy levels of a node, we assume the availability of hardware monitors that are remotely accessible. We experimented with two different types of energy monitors, one for server-grade machines and one for low-energy profiles (see Section V).
Scheduling. Finally, the scheduling component is in charge of orchestrating the inputs received from the modeling and monitoring components. To that end, it first ensures that a prediction of the resources used by the task on the different machines is available. Then, it combines this prediction with the energy and performance trade-offs, as defined by the end-user, to decide on the best fitting node. Periodically, the scheduling component reconsiders its past decisions: when a better fitting node is found, a migration decision is taken and the corresponding task is moved to the target node.
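The periodic reconsideration of past decisions can be sketched as a pure planning step. The types and function names below are hypothetical, and `best_node_for` stands in for whatever best-fit decision the scheduler makes; the sketch only shows how migrations follow from comparing the current placement with a freshly computed one.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a placement maps task name -> node name.
Placement = Dict[str, str]
Move = Tuple[str, str, str]  # (task, source node, target node)


def plan_migrations(placement: Placement, best_node_for: Callable[[str], str]) -> List[Move]:
    """Re-run the best-fit decision for each running task and list the resulting moves."""
    moves = []
    for task, current in placement.items():
        target = best_node_for(task)
        if target != current:
            moves.append((task, current, target))
    return moves


def apply_migrations(placement: Placement, moves: List[Move]) -> Placement:
    """Return the placement after each moved task runs on its target instead of its source."""
    updated = dict(placement)
    for task, _source, target in moves:
        updated[task] = target
    return updated


# Illustrative state: t1 started on the AMD node, but the ARM node is now the better fit.
running = {"t1": "amd", "t2": "arm"}
fresh_decision = {"t1": "arm", "t2": "arm"}
moves = plan_migrations(running, fresh_decision.__getitem__)
print(moves)                             # [('t1', 'amd', 'arm')]
print(apply_migrations(running, moves))  # {'t1': 'arm', 't2': 'arm'}
```

Keeping the planning step free of side effects makes it easy to inspect which migrations a given rescheduling round would trigger before any container is actually moved.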
IV Implementation
We base our implementation on Kubernetes (v1.8), itself implemented in Go [20]. Custom schedulers can, however, be implemented in any programming language and connected to the main orchestrator engine via the Kubernetes Scheduler API [21]. Heats is implemented in Python (v3.6.3) and leverages the Kubernetes Python Client [22], a client library for the Kubernetes API [23]. The modeling component (Section III) leverages the Python bindings for TensorFlow (v1.11). Heats is released as open-source and is readily available at https://github.com/legato-project/heats-scheduler.
V Evaluation
This section presents the experimental evaluation of our Heats prototype. We first describe the experimental settings. Then, we describe the synthetic trace used to compare Heats against the default Kubernetes (k8s) settings. We compare both schedulers in terms of energy and resource utilization. We analyze how the user demands (energy/performance ratios) affect the observed performance. Finally, we look at the impact of the rescheduling frequency on the overall job runtime.
Evaluation settings. We deploy and conduct our experiments over a cluster composed of 4 different types of machines (see Table II). Our cluster is composed of 9 machines, where one is the Kubernetes master, orchestrating the deployments and the remaining nodes are workers executing the tasks. The 8 worker nodes consist of one AMD, 3 Intel and 4 ARM machines. The energy consumption is measured using a LINDY iPower Control 2x6M power distribution unit (PDU) for the server type machines and PowerSpy [24] devices for the three Raspberry Pi. The PDU records up-to-date measurements for the active power at a resolution of and with a precision of 1.5%. We query it up to every second via HTTP.
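Since the power meters report active power while the evaluation is expressed in energy, the conversion can be sketched as a numerical integration of the sampled power over time. The function name and the trapezoidal rule are assumptions made for illustration; the text does not state how the samples are integrated.

```python
from typing import Sequence, Tuple


def energy_joules(samples: Sequence[Tuple[float, float]]) -> float:
    """Integrate (timestamp in s, active power in W) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (t1 - t0) * (p0 + p1) / 2.0
    return total


# Illustrative one-second readings, not measurements from the paper.
readings = [(0.0, 10.0), (1.0, 14.0), (2.0, 14.0), (3.0, 12.0)]
print(energy_joules(readings))  # 39.0
```

Using explicit timestamps rather than assuming a fixed one-second step keeps the estimate meaningful when a query is delayed or dropped.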
Synthetic trace. We use a synthetic trace to evaluate the gains and trade-offs of our system. Figure 4 shows the workload injected by this trace. We use it to deploy multithreaded tasks executing an iterative implementation of the k-means algorithm in the C programming language. The program, shipped as statically linked binary for Alpine Linux [25], executes over a predefined dataset of data points along dimensions. Once deployed, the tasks will compute clusters by splitting the dataset into blocks processed by two worker threads for a specified maximum number of iterations, chosen randomly in the range of . The result is stored as file inside the container’s image. In total, k-means jobs are deployed following four bursts over minutes, executed randomly within a timeframe of seconds. The same sequence of pseudo-random numbers is ensured upon every run of a trace by using a fixed random seed.
Kubernetes vs. Heats. First, we compare the CPU load induced on the cluster by Heats against the default scheduling policy of Kubernetes. Figure 6 shows these results. We observe that the load patterns are very similar and closely follow the arrival pattern of the tasks in the trace (Figure 4). We conclude that the Heats scheduler does not deteriorate the lifetime of the processors by artificially stressing them. Next, we look at the memory usage across the cluster. Figure 6 shows this for two different Heats configurations, one for performance (energy weight = 0, performance weight = 1), the other for energy efficiency (energy weight = 1, performance weight = 0). The two schedulers induce similar memory load patterns.
We then compare the energy efficiency of the default Kubernetes scheduler against Heats. Figure 5 presents the total runtime and the cumulative energy consumption of the cluster throughout the execution of the trace, including the idle-state requirements of the machines. We compare five different approaches: (1) the default scheduler (k8s), (2) Heats configured to deploy tasks on the fastest possible machines, ignoring any energy concerns (e0p1), and (3) Heats trying to be as energy-efficient as possible. Moreover, for the sake of comparison, we include the results achieved by (4) a fixed H value chosen out of our practical experience (rand, with energy weight = 0.618 and performance weight = 0.382) and (5) other variations of the H value. Compared to the default scheduler (k8s), approach (2) runs faster at an energy cost of 1.5%, while approach (3) runs slower but saves 7.1% of energy. Compared to each other, (2) is faster while (3) is more energy efficient. Finally, for approach (4), both the runtime and the energy consumption lie between the observations for approaches (2) and (3). We therefore conclude that our observations follow the expected behaviour.
Energy vs. performance weights. The value chosen for the H parameter is of paramount importance, especially when considering the resulting energy costs and impact on the overall runtime of the jobs. To better understand this aspect, we choose 6 different configurations (from 0 to 1, by increments of 0.2) for different energy/performance ratios, 0 being the least and 1 the most energy-efficient version. We compare the achieved results with a Heats configuration that randomly selects the value of H, mimicking a customer with no particular requirements. Figure 5 shows our results. For each configuration, we show the cumulative energy costs (in kJ) and the achieved runtime, respectively on the left and right vertical bars. We observe that the configurations achieve similar results, with a noticeable deviation only for the least energy-efficient variant. While these results require further investigation, we believe them to be of practical interest for end-users. We intend to confirm them by evaluating the same configurations on real-world traces, where the variations of the H parameter might have more impact.
VI Related Work
There is a large body of literature on scheduling, deployment and migration policies, mainly driven by the strong momentum on green computing [26]. Here we focus on energy-related scheduling policies in a container/task-based deployment setting, leaving out research on optimization problems specifically geared toward reducing energy costs [27].
Wang et al. [28] propose two polynomial-time algorithms, one for energy-aware heterogeneous data allocation and a task-ratio greedy algorithm. The algorithms schedule real-time constrained tasks of applications on heterogeneous multiprocessor systems using integer linear programming. Simulations compare these two algorithms against a greedy algorithm on two heterogeneous multiprocessor systems. Heats does not support real-time constraints. On the other hand, we fully implemented Heats and perform deployments of real code, leveraging software and hardware monitors to gather energy-related metrics.
GenPack [29] proposes an energy-saving mechanism inspired by the JVM’s garbage-collectors. It migrates containers from young (unstable) to old (stable) generations of machines. Its architecture, based on Docker Swarm, is similar to the one built for Heats. However, GenPack ignores the user-demanded trade-offs of the jobs, and containers are migrated across the cluster only by observing the stability of the jobs.
Partial Optimal Slacking (POS) [30] is an energy-efficient scheduling approach based on the concept of task slacking with the objective to lower the processing speed of a processor executing a task without affecting other tasks. POS achieves this by using DVFS techniques [31]. The frequency scaling support in Heats is currently exploited during the modeling phase. We intend to further leverage this feature, for instance while migrating tasks for isolation within the processor, in case two unlike tasks are running on the same core but require different frequency and voltage levels.
VII Conclusion and Future Work
We presented Heats, a novel task-based scheduling system for heterogeneous clusters. Heats learns about the properties of the machines in the cluster, schedules and possibly migrates tasks to the best-fitting node currently available. Our experimental evaluation reveals that Heats can yield considerable energy savings depending on the type of resources at hand, the workload and the desired energy/performance ratio.
We plan to extend this work along the following directions. First, we will explore how per-core pinning and per-core frequency scaling can further improve the achievable energy savings. This will have side-effects on the learning (i.e., probing) phase, which will need to be extended. Second, we will extend the design and implementation of our prototype to account for migrations between heterogeneous devices, e.g., from an x86 processor onto a GPU. This will require producing binaries targeting different architectures and managed by the same scheduler. Finally, we will evaluate Heats against real-world traces, e.g., Borg [32] and Azure [33].
Acknowledgments
The research leading to these results has received funding from the European Union’s Horizon 2020 research and innovation programme under the LEGaTO Project (legato-project.eu), grant agreement No 780681.
References
- [1] Dirk Merkel. “Docker: lightweight Linux containers for consistent development and deployment.” In Linux Journal 2014.239, Belltown Media, 2014, pp. 2.
- [2] Medel et al. “Modelling performance & resource management in Kubernetes.” In UCC, IEEE, 2016, pp. 257–262.
- [3] Amazon Web Services, Inc. “Amazon EC2 Instance Types.” Available: https://aws.amazon.com/ec2/instance-types, 2018.
- [4] Microsoft Corporation. “Pricing - Linux Virtual Machines.” Available: https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux, 2018.
- [5] Google LLC. “Google Compute Engine Pricing.” Available: https://cloud.google.com/compute/pricing, 2018.
- [6] IBM. “Bare metal servers.” Available: https://www.ibm.com/cloud/bare-metal-servers, 2018.
- [7] Oracle Corporation. “Bare Metal Cloud Computing.” Available: https://cloud.oracle.com/compute/bare-metal/features, 2018.
- [8] Scaleway. “Bare Metal SSD Cloud Servers.” Available: https://www.scaleway.com/baremetal-cloud-servers, 2018.
