Choosing an effective setup for stream processing

Federico Ruilova; Aleksandar Yonchev

arXiv:2302.14463·cs.DC·March 1, 2023


TL;DR

This study compares edge computing and public cloud for stream data processing in IoT manufacturing, finding edge computing more cost-effective and slightly better in latency and resource utilization.

Contribution

It provides an empirical evaluation of edge versus cloud computing for IoT stream processing, highlighting cost and performance benefits of edge solutions.

Findings

01

Edge computing achieves slightly higher throughput.

02

Edge node uses fewer resources than cloud node.

03

Edge setup is more cost-efficient with IaaS providers.


Tables

Table 1: Throughput measured in each scenario

Scenario	Throughput (Mbps)
IoT Source to Edge Node	3.03
IoT Source to Cloud	3.06

Table 2: Comparison of system performance, IoT source in Scenarios 1 and 2

Scenario	CPU Usage	Memory Usage
IoT Source (Scenario 1)	41%	20 MB
IoT Source (Scenario 2)	41.34%	20.09 MB

Table 3: Comparison of system performance, Edge vs. Cloud

Node	CPU Usage	Memory Usage
Edge Node (Scenario 1)	42.67%	21.41 MB
Cloud Node (Scenario 2)	77.87%	16.95 MB

Table 4: Comparison of the costs for the different nodes

Computational Node	Cost (USD/mo)	Cost (SEK/mo)	Cost (EUR/mo)
IoT Source	-	215.60	19.18
Edge-Node	-	402.88	35.94
Cloud-Node	44.15	-	47.42

Equations

$$\text{Throughput (Mbps)} = \frac{\text{Total data transferred (MB)}}{\text{Total time elapsed (seconds)}} \times 8\ \text{bits/byte}$$

$$\frac{378.0\ \text{MB}}{1000\ \text{seconds}} \times 8\ \text{bits/byte} = 3.03\ \text{Mbps}$$


Code & Models

Repositories

fredocr/mosquitto-edge-cloud-experiment (Official)


Taxonomy

Topics: IoT and Edge/Fog Computing

Full text

Choosing an effective setup for stream processing

Federico Ruilova, Aleksandar Yonchev

[email protected] | yonchev @kth.se

Abstract

This project aims to study the feasibility and cost-effectiveness of using edge computing for stream data processing in the context of Internet of Things (IoT) in manufacturing in Europe. Two scenarios were considered: using edge computing to reduce latency and using a popular public cloud provider. Both scenarios demonstrated high throughput, with the edge computing scenario slightly outperforming the public cloud scenario. The impact on resource utilization was also measured, with the edge node showing slightly lower resource usage than the cloud node. The experiment concluded that running the system at the edge is more cost-efficient, but only using any Infrastructure as a Service (IaaS) provider acting as the infrastructure provider. IaaS providers will be crucial in offering edge solutions and identifying geographical areas where regional data centers could be used as points of presence for low-latency applications.

1 List of Acronyms and Abbreviations

Acronyms

AWS	Amazon Web Services
GCP	Google Cloud Platform
IaaS	Infrastructure as a Service
IIoT	Industrial Internet of Things
IoT	Internet of Things
KB	Kilobytes
MQTT	Message Queuing Telemetry Transport
ms	milliseconds
VM	Virtual Machine

2 Introduction

2.1 Background

The number of devices connected to the Internet has grown exponentially over the last decades [1]. Advances in communication technologies, such as the evolution of cellular networks into 5th-generation networks, together with cloud computing, edge computing and widely deployed high-speed fiber-optic infrastructure, have opened the door for new technologies to emerge.

In essence, edge computing, or fog computing (the terms are used interchangeably [2]), brings cloud computing services closer to the client by offloading some computation to a nearby computing node. It is widely recommended for low latency and fast processing [3], [4], [5]. However, there are different approaches to designing the most appropriate solution. Three main architectures are commonly used: a single edge-computing node, a hybrid edge-cloud scenario, and cloud computing only.

2.2 Literature review

In [5], an experiment is described in which sample data is sent to the cloud, either directly or through an edge gateway. The two approaches are compared on metrics such as latency, throughput and available bandwidth, and several benefits of edge computing are outlined at the end of the article. However, that experiment does not take into consideration the operating costs, nor the geographical location of the endpoints. In [6], a similar experiment is run, comparing the latency, scalability and performance of edge-only, edge-cloud and cloud-only architectures. Similarly, strategies for selecting the best infrastructure using an organization and management goal-based approach are recommended in [7]. Furthermore, [8] uses the cloud-only scenario to evaluate the performance of transfer protocols over long distances.

The costs of using edge and cloud resources have been studied in other research projects. A detailed cost analysis of edge computing is performed in [9]. Similarly, [10] conducts a cost analysis of the usage of cloud computing resources. In addition, [11] compares the Amazon EC2 service to running in-house facilities for high-performance computing. We will refer to these resources when conducting the cost analysis.

The novelty of our approach comes from the cost analysis of running the different computing resources combined with the consideration of the geographical distances between the source of data and the computational resources in Europe. This project can serve as the basis for a broader research project which examines the demand for edge computing in relation to the requirements of the use case, the geographical distances (regardless of the continent) and the costs.

2.3 Problem statement

The continent of Europe is the target of this research project because of its specific economic and geographical characteristics. Geographically speaking, Europe is the second-smallest continent on Earth, with an area of 10.18 million km2 [12]. The distances between the extreme points in Europe are illustrated in Figures 1 and 2.

On the economic side, Europe is a very industrialized and technologically developed area. Furthermore, all the major public cloud providers have one or more points of presence on the continent. Figure 3 shows the points of presence of Azure, Amazon Web Services (AWS), Google Cloud Platform (GCP), IBM and Oracle in Europe. It can be clearly seen that there is a high density of public cloud data centers in the western, central and northern parts of Europe. However, there are no points of presence in Eastern Europe. This means that in Europe there still could be distances of more than 1000 kilometers between the client and the cloud provider.

According to Statista, there are 2904 data centers in total in Europe [13]. This means that there is a large number of regional data centers which are not part of the big public cloud providers.

To summarize, three main observations are made: (a) cloud providers cover the main geographical areas, (b) the major public cloud providers lack presence in Eastern Europe, (c) there is a high density of data centers within Europe which do not belong to the popular public cloud providers.

As mentioned earlier, edge computing brings cloud services closer to the client. This is done to ensure ultra-low latencies of less than 10 milliseconds (ms) for the client. However, we will explore the applicability of edge computing when the distance between the client and the cloud is short. This leads to the main research question: “Is edge computing applicable in Europe?”

Our hypothesis is that edge computing is not cost-efficient in Europe because of the short distances and easy access to public cloud services.

3 Method(s)

To answer the research question, a quantitative experiment was conducted. In this experiment, two IoT environments were simulated – an edge and a cloud environment.

To build these environments, the authors used three Virtual Machines (VMs)—one serving as the source of sensor data, another one serving as the edge node and a third node running in the cloud.

In the Industrial Internet of Things (IIoT), it is common to have an edge processing node on-premises or in a nearby location. For the edge scenario, both machines were placed within the same data center, but they were not in the same network segment and had to use the public internet to communicate. The IaaS provider for this setup is Glesys, a Swedish provider with a point of presence in Stockholm.

In Section 2.3 it was mentioned that the major public cloud providers are still not present in Eastern Europe. For that reason, a more distant GCP point of presence was selected for the cloud scenario, to simulate a realistic case where the client is not in the center of Europe. In this scenario, the target node ran on GCP, specifically in its point of presence in St. Ghislain, Belgium. The distance between the source and the cloud nodes is 1,338 kilometers.

Through scripting, we could observe and gather information on latency, throughput, and resource utilization in each scenario.

3.1 Technical specifications

The “IoT Source” server runs on the KVM virtualization platform and has 2 CPU cores, 2048 MB of memory, and 20 GB of storage. It is also running the Ubuntu 22.04 LTS operating system.

The “Edge-Node” server runs on the VMWare virtualization platform and has 2 CPU cores, 2048 MB of memory, and 30 GB of storage, and it is running the Ubuntu 22.04 LTS operating system.

The “Cloud-Node” is a virtual machine with 2 virtual CPUs and 8 GB of memory, running the Ubuntu 22.04 LTS operating system. It has a machine type of “n2-standard-2” and costs USD 62.39 monthly. Google Compute Engine uses the KVM hypervisor to create and run VMs.

In our experiment we used the Eclipse Mosquitto message broker, which implements the Message Queuing Telemetry Transport (MQTT) protocol, for stream processing. Mosquitto provides lightweight communication between devices using a publish/subscribe messaging model and is a good match for IIoT scenarios [14] [15]. It also provides the ability to subscribe to multiple topics, authentication, encrypted communication and more. Furthermore, we used the Eclipse Paho client libraries because the broker has only a command-line interface. Paho gave us the ability to manipulate the flow of MQTT messages using Python code [16]. For this experiment, we created two Python scripts: one running in the data source and the second one running in the target, which is either the edge or the cloud node. The first script generates exemplary sensor data and sends it to the target, while the second captures the incoming messages, measures the latency for each message, and outputs the throughput and resource utilization at the end of the data transfer.
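The division of work between the two scripts can be sketched as follows. This is a minimal illustration, not the authors' code: the payload layout (a JSON object with a `sent_at` timestamp), the helper names and the injected `publish` callable are all assumptions, and the latency calculation only makes sense if the clocks of source and target are synchronized.

```python
import json
import random
import time

TOPICS = ("sensor1", "sensor2", "sensor3")  # topic names used in the paper


def make_message(now=None):
    """Build one synthetic sensor reading (hypothetical payload layout)."""
    sent_at = time.time() if now is None else now
    topic = random.choice(TOPICS)  # messages are assigned to a random sensor
    payload = json.dumps({"sent_at": sent_at, "value": random.random()})
    return topic, payload


def send_one(publish, now=None):
    """Source side: create a reading and hand it to a publish callable."""
    topic, payload = make_message(now)
    publish(topic, payload)
    return topic, payload


def latency_ms(payload, received_at):
    """Target side: latency of one message from its embedded timestamp.

    Assumes the clocks of sender and receiver are synchronized.
    """
    sent_at = json.loads(payload)["sent_at"]
    return (received_at - sent_at) * 1000.0
```

With the Paho Python client, `publish` could be the `publish` method of a connected `paho.mqtt.client.Client`, and `latency_ms` could be called from the subscriber's message callback with `time.time()` as `received_at`. Injecting the callable keeps the sketch runnable without a broker.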

The remote brokers at the edge and the cloud subscribed to three topics called “sensor1”, “sensor2” and “sensor3”. We ran the scripts with different configurations to measure the different variables. Furthermore, we used a standard bandwidth configuration in both scenarios with a 100 Mbps connection.

First, we ran the experiment for 900 seconds (or 15 minutes) with a 1-second interval between each message transmission. The messages were randomly assigned to each sensor. This way, we measured the latency between the communicating sides.

Second, we simulated 1000 data transfer iterations at the IoT source, and on each iteration 200 messages were published to the target. The experiment ran for 1000 seconds, and each message was 10 kilobytes (KB) in size. In this scenario, the interval between the messages was set to 0, but there was a 1-second interval after each iteration.
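The burst pattern described above (messages sent back to back, then a pause after each iteration) can be sketched as a simple loop. The function name, the fixed topic, the injected `publish` and `sleep` callables, and the choice of 1 KB = 1024 bytes are assumptions made for illustration only.

```python
import time

# Assumption: 1 KB is taken as 1024 bytes; the paper does not specify.
PAYLOAD_10_KB = b"x" * (10 * 1024)


def run_bursts(publish, iterations, messages_per_iteration,
               payload=PAYLOAD_10_KB, topic="sensor1",
               pause_s=1.0, sleep=time.sleep):
    """Publish bursts with no delay between messages, pausing after each one."""
    sent = 0
    for _ in range(iterations):
        for _ in range(messages_per_iteration):
            publish(topic, payload)  # interval between messages is 0
            sent += 1
        sleep(pause_s)  # 1-second interval after each iteration
    return sent
```

Calling `run_bursts(publish, 1000, 200)` would follow the configuration quoted in the text. Passing a no-op `sleep` makes the loop easy to exercise in a dry run.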

For analyzing the data, we used Python because it is well suited for data analysis and scripting. The libraries we used are Matplotlib and NumPy to create the different plots (e.g. Figure 4) and find patterns in the data. Furthermore, for analyzing the distances and the position of the public cloud data centers, we used JavaScript with the open-source library Leaflet, which is perfect for creating interactive maps [17]. Leaflet uses the open-source geographic database OpenStreetMap [18].

4 Results

4.1 Latency

The distribution of the latencies measured in the experiment is visualized in Figure 4. In the first scenario, the median of the values is 0.87 ms and the maximum latency recorded is 1.65 ms. In the other scenario, however, the median is 13.19 ms and the maximum recorded value is 14.94 ms.
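The two statistics reported here, median and maximum, can be derived from a list of per-message latencies as in the sketch below. The function name is hypothetical, and the sample values are illustrative placeholders, not measurements from the experiment.

```python
import statistics


def summarize_latencies(latencies_ms):
    """Return the median and maximum of per-message latencies (in ms)."""
    return {
        "median": statistics.median(latencies_ms),
        "max": max(latencies_ms),
    }


# Illustrative values only, not data from the paper.
samples_ms = [0.9, 1.1, 0.8, 1.6, 1.0]
summary = summarize_latencies(samples_ms)
```

The median is less sensitive than the mean to occasional slow messages, which may be why it is the statistic reported alongside the maximum.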

4.2 Throughput and Resource utilization

For Scenario 1 (IoT Source to Edge Node), the authors measured the resource utilization impact on the IoT Source: an average CPU usage of 40.24 percent, an average memory usage of 18.87 MB, and an average running time of 1094 seconds (Table 2). The performance impact on the Edge node was also measured, with an average CPU usage of 42.67 percent, an average memory usage of 21.41 MB, and an average running time of 1002 seconds (Table 3).

The system’s throughput in Scenario 1 (IoT Source to Edge Node) was approximately 3780 messages per second, equivalent to approximately 3.03 Mbps (Table 1). The authors calculated the throughput by dividing the total data transferred (378.0 MB) by the total time elapsed (1000 seconds) and multiplying the result by 8 to convert it to bits per second:

$$\text{Throughput (Mbps)} = \frac{\text{Total data transferred (MB)}}{\text{Total time elapsed (seconds)}} \times 8\ \text{bits/byte}$$

$$\frac{378.0\ \text{MB}}{1000\ \text{seconds}} \times 8\ \text{bits/byte} = 3.03\ \text{Mbps}$$
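The throughput calculation described above is a one-line conversion, sketched here in Python. The function name is an assumption. With the rounded inputs quoted in the text (378.0 MB over 1000 seconds) it evaluates to about 3.02 Mbps, close to the reported 3.03 Mbps; the small gap presumably reflects rounding of the transferred volume.

```python
def throughput_mbps(total_mb, elapsed_s):
    """Throughput in megabits per second from megabytes and seconds."""
    return total_mb / elapsed_s * 8  # 8 bits per byte


scenario_1 = throughput_mbps(378.0, 1000)
```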

For Scenario 2 (IoT Source to Cloud), the performance impact on the IoT Source was an average CPU usage of 41.34 percent, an average memory usage of 20.09 MB, and an average running time of 1001 seconds (Table 2). The performance impact on the Cloud node was also measured, with an average CPU usage of 77.87 percent, an average memory usage of 16.95 MB, and an average running time of 1032 seconds (Table 3).

The system’s throughput in Scenario 2 (IoT Source to Cloud) was approximately 3820 messages per second, equivalent to approximately 3.06 Mbps (Table 1). The authors calculated the throughput with the same method as in Equations 1 and 2.

4.3 Cost

As presented in Table 4, the “IoT Source” costs 215.60 SEK/mo, or approximately €19.18, and the “Edge-Node” costs 402.88 SEK/mo, or approximately €35.94 (both converted at an exchange rate of 1 SEK = 0.0886354 EUR). According to the provider, the price difference lies in the virtualization platform: VMWare compared to KVM.

The “Cloud-Node” cost was €44.15 (using an exchange rate of 1 EUR = 1.06449 USD).
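To make the cost gap concrete, the sketch below computes the relative saving of the edge node over the cloud node. It takes the EUR column of Table 4 as input (35.94 for the Edge-Node, 47.42 for the Cloud-Node); the function and constant names are assumptions.

```python
def relative_saving(cheaper, pricier):
    """Fraction of the higher price saved by choosing the cheaper option."""
    return (pricier - cheaper) / pricier


EDGE_EUR_PER_MONTH = 35.94   # Table 4, Edge-Node
CLOUD_EUR_PER_MONTH = 47.42  # Table 4, Cloud-Node

saving = relative_saving(EDGE_EUR_PER_MONTH, CLOUD_EUR_PER_MONTH)
```

On these figures the edge node comes out roughly 24 percent cheaper per month than the cloud node.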

5 Discussion

Looking at the measured latencies, it is realistic to experience such low values (median of 0.87 ms) for the source-edge environment, as both Virtual Machines (VMs) are positioned in the same data center in Stockholm. In the other scenario, the median latency measured is 13.19 ms. This is a reasonable value, considering the 1,338 km distance between the hosts.
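A rough physical lower bound supports this reading. The sketch below estimates the one-way propagation delay over the straight-line distance, assuming signals travel through optical fiber at about 200,000 km/s (roughly two thirds of the speed of light in vacuum); this speed and the function name are assumptions not taken from the paper.

```python
# Assumption: propagation speed in optical fiber, about 2/3 of c.
FIBER_KM_PER_S = 200_000.0


def propagation_delay_ms(distance_km, speed_km_per_s=FIBER_KM_PER_S):
    """One-way propagation delay in milliseconds over a given distance."""
    return distance_km / speed_km_per_s * 1000.0


one_way_ms = propagation_delay_ms(1338)
```

For 1,338 km this gives a one-way delay of about 6.7 ms. The measured median of 13.19 ms lies above that bound, which is plausible given that real network routes are longer than the straight-line distance and that routing and broker processing add delay.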

While there are different scenarios in IoT streaming platforms or other stream processing use cases, we have identified an important player in the cloud computing ecosystem: the IaaS providers, which do not own the infrastructure but offer public-cloud-like products such as VMs and make use of regional data centers to operate and offer their services.

We also noticed while conducting the experiment that it is more cost-efficient to run the infrastructure with an IaaS provider that uses KVM virtualization. There could be different providers offering their clouds on top of virtualization platforms similar to the ones used by the major public cloud providers.

5.1 Conclusion

Based on the observations, both scenarios demonstrated high throughput, with a slightly higher throughput in Scenario 2 (IoT Source to Cloud).

These high throughput scenarios are likely due to the efficient design and implementation of the system, as well as favorable network conditions and workload.

In both scenarios, the performance impact on the IoT Source was relatively similar, with an average CPU usage of around 41 percent and an average memory usage of around 20 MB.

The performance impact on the Edge node and Cloud node was also measured, with the Edge node having slightly lower resource utilization compared to the Cloud node. It is important to note that the throughput of a system can be affected by various factors, including the hardware and software used, network conditions, and workload.

We were able to replicate a low-latency setup with a nearby point of presence, outside the public cloud, without incurring acquisition costs. We achieved this by using the services of an IaaS provider with the desired point of presence.

Our hypothesis was proven wrong. Under the experiment’s circumstances, we conclude that running the system at the edge is more cost-efficient, but only without incurring the cost of acquisition or ownership, meaning that an IaaS provider needs to act as the infrastructure provider. The point of processing needs to be identified by the system architects without relying only on major public cloud providers.

IaaS providers will be essential in offering edge solutions and identifying geographical areas where regional data centers could be used as points of presence for low-latency applications.

5.2 Future work

The research project could be extended by comparing a hybrid architecture, consisting of an edge and a cloud computing node. In addition, the experiment could be broadened by simulating a real-world scenario where Apache Flink is used for processing the data stream. Real IoT devices, like Raspberry Pi or Arduino nodes, could also be used in the future.

Further experimentation and analysis are necessary to fully understand the factors influencing the system’s performance. In this experiment, we compared KVM and VMWare, but were unable to see if that affected the slight difference in throughput in favor of the KVM platform at the cloud node.

Another consideration for the future would be to increase the available bandwidth. Currently, there are VMs in GCP specifically designed for streaming applications. We could utilize such a VM and increase the bandwidth on the source and edge nodes to see whether this provides new findings. However, because of cost constraints, we could not afford to use the best-performing resources in this project.

Bibliography

[1] “Edge computing deployments by region 2028.” [Online]. Available: https://www.statista.com/statistics/1104059/worldwide-edge-computing-infrastructure-region/
[2] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, “Edge Computing: Vision and Challenges,” IEEE Internet of Things Journal, vol. 3, no. 5, pp. 637–646, Oct. 2016. doi: 10.1109/JIOT.2016.2579198. [Online]. Available: http://ieeexplore.ieee.org/document/7488250/
[3] W. Z. Khan, E. Ahmed, S. Hakak, I. Yaqoob, and A. Ahmed, “Edge computing: A survey,” Future Generation Computer Systems, vol. 97, pp. 219–235, Aug. 2019. doi: 10.1016/j.future.2019.02.050. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0167739X18319903
[4] L.-A. Phan, D.-T. Nguyen, M. Lee, D.-H. Park, and T. Kim, “Dynamic fog-to-fog offloading in SDN-based fog computing systems,” Future Generation Computer Systems, vol. 117, pp. 486–497, Apr. 2021. doi: 10.1016/j.future.2020.12.021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167739X20330831
[5] P. Silva, A. Costan, and G. Antoniu, “Investigating Edge vs. Cloud Computing Trade-offs for Stream Processing,” in 2019 IEEE International Conference on Big Data (Big Data). Los Angeles, CA, USA: IEEE, Dec. 2019. doi: 10.1109/BigData47090.2019.9006139. ISBN 978-1-72810-858-2, pp. 469–474. [Online]. Available: https://ieeexplore.ieee.org/document/9006139/
[6] F. Carpio, M. Delgado, and A. Jukan, “Engineering and Experimentally Benchmarking a Container-based Edge Computing System,” in ICC 2020 - 2020 IEEE International Conference on Communications (ICC). Dublin, Ireland: IEEE, Jun. 2020. doi: 10.1109/ICC40277.2020.9148636. ISBN 978-1-72815-089-5, pp. 1–6. [Online]. Available: https://ieeexplore.ieee.org/document/9148636/
[7] M. Vitali, P. Plebani, D. Bermbach, and E. Elmroth, “Special issue on co-design of data and computation management in Fog Computing,” Future Generation Computer Systems, vol. 129, pp. 423–424, Apr. 2022. doi: 10.1016/j.future.2021.11.001. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167739X21004301
[8] F. Al Mobayed, “Efficient High Performance Protocols For Long Distance Big Data File Transfer,” p. 134.
