Cloud Resource Optimization for Processing Multiple Streams of Visual Data
Zohar Kapach, Andrew Ulmer, Daniel Merrick, Arshad Alikhan, Yung-Hsiang Lu, Anup Mohan, Ahmed S. Kaseb, George K. Thiruvathukal

TL;DR
This paper presents a cloud resource management approach for analyzing real-time visual data streams from network cameras, achieving significant cost savings by optimizing instance types and allocations.
Contribution
It introduces a novel resource allocation strategy that considers analysis types, stream counts, and camera locations to optimize cloud resource usage for real-time visual data processing.
Findings
Over 50% cost reduction demonstrated on AWS
Effective resource allocation for real-time camera data analysis
Optimized selection of cloud instance types
| Vendor | Instance | Cores | Memory (GiB) | GPUs | Price per Hour (US$) | | |
|---|---|---|---|---|---|---|---|
| | | | | | Virginia | London | Singapore |
| EC2 | c4.2xlarge | 8 | 15 | 0 | 0.398 | 0.476 | 0.462 |
| EC2 | c4.8xlarge | 36 | 60 | 0 | 1.591 | 1.902 | 1.848 |
| EC2 | g3.8xlarge | 32 | 244 | 2 | 2.280 | N/A | 3.340 |
| | | | | | US East | West Europe | East Asia |
| Azure | D8 v3 | 8 | 32 | 0 | 0.384 | 0.480 | 0.625 |
| Azure | NC24r | 24 | 224 | 4 | 3.960 | 5.132 | N/A |
Zohar Kapach, Andrew Ulmer, Daniel Merrick, Arshad Alikhan, Yung-Hsiang Lu
Purdue University, West Lafayette, IN, USA
{zkapach, ulmera, dmerrick, aalikhan, yunglu}@purdue.edu
Anup Mohan
Intel, Santa Clara, CA, USA
[email protected]
Ahmed S. Kaseb
Cairo University, Giza, Egypt
[email protected]
George K. Thiruvathukal
Loyola University Chicago, Chicago, IL, USA; Argonne National Laboratory, Argonne, IL, USA
Abstract
Hundreds of millions of network cameras have been installed throughout the world. Each is capable of providing a vast amount of real-time data. Analyzing the massive data generated by these cameras requires significant computational resources and the demands may vary over time. Cloud computing shows the most promise to provide the needed resources on demand. In this article, we investigate how to allocate cloud resources when analyzing real-time data streams from network cameras. A resource manager considers many factors that affect its decisions, including the types of analysis, the number of data streams, and the locations of the cameras. The manager then selects the most cost-efficient types of cloud instances (e.g. CPU vs. GPGPU) to meet the computational demands for analyzing streams. We evaluate the effectiveness of our approach using Amazon Web Services. Experiments demonstrate more than 50% cost reduction for real workloads.
Introduction
Visual data is projected to grow exponentially in the coming years. Video is expected to grow 26% annually and account for 82% of consumer Internet traffic; live video on the Internet is expected to grow 1,500% from 2016 to 2021 [1]. Surveillance cameras are expected to see similar growth. Today, more than 240 million surveillance cameras have been installed globally [2], and Stratistics MRC predicts that the market will continue to grow 18.3% annually. These video surveillance cameras can produce vast amounts of real-time data. With the rapid progress of machine learning and computer vision, it is possible to analyze these real-time streams. Two key technologies are essential to harness the potential of these real-time data streams: (1) analyzing the data using computer vision, and (2) employing scalable resources to meet the demands of computation.
This paper focuses on solving the second problem. This research group adopts both supercomputing and cloud computing to address the problem from different perspectives. Supercomputing systems are optimized for speed and I/O throughput but are limited when it comes to meeting fluctuating demands because of job scheduling requirements. Cloud computing offers the potential for high-end computing resources and also allows for on-demand operation. Proponents of cloud computing are willing to make trade-offs when it comes to exchanging CPU and I/O speed in favor of high availability and flexible scheduling. This paper describes our experiences with cloud computing to analyze many video streams from network cameras.
Many factors affect the resource requirements for analyzing visual data streams, including the complexity of the analysis programs, the content being analyzed, the size (number of pixels) of each image or video frame, and the frame rates. The requirements may vary over time. For example, a program that analyzes video streams from traffic cameras to detect congestion may run during rush hours only. Cloud computing is the best option to meet these varying needs because it allows users to allocate virtual machines (called instances) on demand. Cloud vendors, such as Amazon EC2 (Elastic Compute Cloud), Microsoft Azure, and IBM Cloud, provide many types of cloud instances with different amounts of memory, numbers of cores, and numbers of GPUs (graphics processing units) at different prices (US dollars per hour). These instances reside in data centers distributed across North and South America, Europe, Asia, and Australia. The challenge is minimizing the cost of cloud instances without sacrificing performance.
To drive the research in cloud resource optimization for processing multiple visual data streams, a software infrastructure named Continuous Analysis of Many Cameras (CAM2) has been established at Purdue University [3]. CAM2 uses network cameras that provide real-time visual data publicly available on the Internet. These cameras observe traffic intersections, metropolitan areas, university campuses, tourist attractions, etc. Some cameras provide videos and others show snapshots. The analysis of real-time data streams can be used in many applications, such as urban planning and emergency response.
Resource Management of Cloud Instances
As mentioned earlier, cloud instances are well suited to meeting the varying demands of video analytics. Cloud vendors provide many options; some instances have GPUs while others do not. Amazon EC2 offers instances optimized for processing, memory, or networking. IBM Cloud gives users the option of selecting virtual machines or physical machines (called "bare metal servers"); within each category there are dozens of options with different types of processors and amounts of memory. Microsoft Azure also has many configurations to choose from. Table I shows the prices of several types of cloud instances at different locations.
Because of the wide range of prices, instance types, and locations, a resource manager is essential when optimizing large scale cloud usage. Figure 1 illustrates the role of a resource manager. The resource manager has to consider the following factors when selecting the most appropriate cloud instances:
- Characteristics of analysis programs. Different programs have different resource requirements: some programs benefit from more cores, some need more memory, and some need GPUs.
- Desired frame rates. Some analysis programs (such as tracking moving objects) need high frame rates. For some other programs (such as observing air quality or traffic congestion), low frame rates are sufficient.
- Image or frame sizes, in terms of pixels. If an image has more pixels, more computation is needed.
- Content of the data. The execution time and resource requirements depend on the complexity of the content. Complex content (e.g., many moving objects in a video stream) may require more computing resources than simple content.
- Locations of cameras and cloud instances. When analyzing video streams, the distances between cloud instances and cameras (measured by the round-trip time) can affect the frame rates [5]. Therefore, there may be restrictions on the location of an instance.
Based on these inputs, the resource manager selects the cloud instances. An instance's configuration (also called the type) corresponds to the number of cores, the amount of memory, the presence of GPUs, and the geographical location. The choice is made to meet the resource requirements at the lowest possible cost. This resource manager is dynamic and its decisions may change over time because the demands may vary. This paper presents two optimization strategies: one manages CPU and GPU usage, and the other manages the locations of instances.
Adaptive Resource Management for Video Analysis in the Cloud
One solution to cloud resource management, proposed by Mohan et al. [6], is Adaptive Resource Management for Video Analysis in the Cloud (ARMVAC). This method does the following: (1) reads the inputs necessary for modeling the problem as a Vector Bin Packing Problem, (2) selects the locations of cloud instances to be considered for the given analysis, (3) determines the types and number of cloud instances needed for the analysis, and (4) employs an adaptive resource management solution to adjust resource requirements during runtime. Kaseb et al. [7] improve ARMVAC by considering cloud instances with both CPU and GPU. Mohan et al. [8] extend ARMVAC by considering instances' locations. The following sections explain these improvements.
CPU and GPU Management in the Cloud
Differences between CPUs and GPUs
Central processing units (CPUs) and graphics processing units (GPUs, also called general-purpose graphics processing units, or GPGPUs) have different characteristics and capabilities. CPUs, with several to dozens of processing cores, are the brains of computers. CPU cores can handle complex program flows with many control statements. In contrast, GPUs have thousands of smaller and simpler cores. GPUs can be significantly faster when performing similar tasks on many pieces of data, such as videos and images. GPUs adopt SIMD (single instruction, multiple data) style parallelism: the same computation runs on different elements of arrays.
Another key difference between CPUs and GPUs is price. For example, Amazon EC2's c5d.9xlarge CPU instance has 36 virtual CPUs with 72 GB of memory and costs $3.06 per hour. A GPU instance, p3.8xlarge, has 32 virtual CPUs and 244 GB of memory and costs $12.24 per hour.
Select Instances with CPU and GPU
The significant differences in price and performance between CPU-only and GPU-equipped instances make it critical to select the most cost-effective instances when analyzing many real-time data streams. To select CPU and GPU instances for this task, Kaseb et al. [7] formulated the problem as a multi-dimensional multi-choice packing problem (please see the sidebar for explanation). This is illustrated in Figure 2 (a). In this figure, there are three types of data streams and four types of cloud instances. The goal is to place the streams so that each fits completely inside a cloud instance while as little capacity as possible is wasted. Figure 2 (b) shows possible solutions that pack different combinations of the data streams.
Kaseb's solution organizes the resource requirements into four dimensions: CPU, memory size, GPU, and GPU memory size. The method considers the frame sizes and frame rates of the video streams for determining the resource requirements needed to run analysis on different data streams. Due to the fluctuations in executing the analysis programs, the study discovers that when any dimension is more than 90% utilized, the performance starts to degrade. Thus, the method keeps the utilization of each dimension below 90%. This multi-dimension, multi-choice optimization solution demonstrates considerable cost savings in different experimental settings.
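The four resource dimensions and the 90% utilization cap can be illustrated with a small packing sketch. This is not the algorithm from [7]: it uses a simple first-fit heuristic as a stand-in, gives each stream a single fixed demand vector (so the multi-choice aspect is reduced to choosing among instance types), and every instance type, capacity, price, and demand below is hypothetical.

```python
from dataclasses import dataclass, field

# Dimension order follows the text: CPU, memory, GPU, GPU memory.
UTILIZATION_CAP = 0.9  # each dimension may be filled to at most 90% of capacity


@dataclass
class InstanceType:
    name: str
    price: float     # US$ per hour (hypothetical)
    capacity: tuple  # (cpu_cores, mem_gb, gpus, gpu_mem_gb) (hypothetical)


@dataclass
class Instance:
    kind: InstanceType
    used: list = field(default_factory=lambda: [0.0, 0.0, 0.0, 0.0])

    def fits(self, demand):
        # A stream fits only if no dimension would exceed the cap.
        return all(u + d <= UTILIZATION_CAP * c
                   for u, d, c in zip(self.used, demand, self.kind.capacity))

    def add(self, demand):
        self.used = [u + d for u, d in zip(self.used, demand)]


def pack(streams, types):
    """First-fit packing: reuse an open instance when possible, otherwise
    open the cheapest instance type that can host the stream."""
    opened = []
    # Place the most demanding streams first (sum of dimensions is a crude proxy).
    for demand in sorted(streams, key=sum, reverse=True):
        target = next((inst for inst in opened if inst.fits(demand)), None)
        if target is None:
            candidates = [t for t in sorted(types, key=lambda t: t.price)
                          if Instance(t).fits(demand)]
            if not candidates:
                raise ValueError(f"no instance type can host {demand}")
            target = Instance(candidates[0])
            opened.append(target)
        target.add(demand)
    return opened


cpu_small = InstanceType("cpu_small", 1.0, (8, 16, 0, 0))
gpu_small = InstanceType("gpu_small", 3.0, (8, 64, 1, 8))
streams = [(2, 4, 0, 0)] * 3 + [(1, 2, 0.5, 2)]
plan = pack(streams, [cpu_small, gpu_small])
print([inst.kind.name for inst in plan], sum(inst.kind.price for inst in plan))
```

Because of the cap, four CPU-only streams that each need two cores do not fit on one eight-core instance in this sketch: the fourth would push CPU use to 100%, so a second instance is opened.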
Evaluation results from [7] are shown in Figure 3. The experiments use ten network cameras from CAM2's database, with frame rates varying from 0.2 frames per second to 8 frames per second. Two object detection programs are used to analyze the data: VGG16 [11] and ZF [12]. At the highest frame rates, GPUs can accelerate these two analysis programs by up to 16 times. At the lowest frame rates, the improvement falls below 5%. In other words, the benefits of GPUs are apparent only when the frame rates are high. At low frame rates, CPUs are preferred because of their lower costs. This solution can reduce costs by as much as 61% by matching the resource requirements of the analysis programs to the cloud instances' capabilities.
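The trade-off between frame rate and instance choice can be made concrete by comparing hourly costs. The sketch below is only an illustration: the two prices are the c5d.9xlarge and p3.8xlarge figures quoted earlier, while the CPU throughput and the assumption that the GPU instance always achieves the full 16x speedup are hypothetical.

```python
import math

CPU_PRICE, GPU_PRICE = 3.06, 12.24  # US$ per hour, as quoted earlier in the article
CPU_FPS = 2.0                        # hypothetical throughput of one CPU instance
GPU_FPS = CPU_FPS * 16               # assumes the best-case 16x speedup


def hourly_cost(required_fps, fps_per_instance, price_per_hour):
    """Cost of enough identical instances to sustain required_fps."""
    instances = math.ceil(required_fps / fps_per_instance)
    return instances * price_per_hour


def cheaper_choice(required_fps):
    """Return the cheaper option and its hourly cost for an aggregate frame rate."""
    cpu = hourly_cost(required_fps, CPU_FPS, CPU_PRICE)
    gpu = hourly_cost(required_fps, GPU_FPS, GPU_PRICE)
    return ("CPU", cpu) if cpu <= gpu else ("GPU", gpu)


print(cheaper_choice(0.2))  # one slow stream
print(cheaper_choice(80))   # ten streams at 8 frames per second
```

With these prices the GPU instance costs four times as much per hour, so under these assumptions it pays off only once it replaces more than four CPU instances.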
This study provides deep insights on how to offer computer vision services at lower costs as they become widely available in the cloud. This study, however, does not consider the geographical locations of network cameras. The next section explains how cameras' locations impact network distances and the cloud resource management of different types of instances.
Optimizing Instance Type and Location
In order to determine the most effective configuration of resources, the resource manager considers the cost of an instance in the context of its location. To make these considerations in practice, the location and type are first evaluated independently as follows.
Local Optimization
Table I shows that the same type of cloud instance can have different costs at different locations. Sometimes this cost disparity exceeds 60%. For example, the Azure D8 v3 instance costs 63% more in Singapore than in Virginia, USA ($0.625 versus $0.384 per hour). A natural question is whether the data from network cameras should be sent to the cloud instances with the lowest prices. A prior study [5] shows that the observed frame rate is reduced when the distance (measured by network round-trip time) between a network camera and a cloud instance increases. Thus, when analyzing data streams from worldwide network cameras, the locations of the cameras and cloud instances must be considered.
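The price disparity can be checked directly from the Table I values. The snippet below is a small worked calculation; the dictionary layout and the function name `max_premium` are illustrative choices, not part of the original work.

```python
# Hourly prices in US$ from Table I; None marks a region where the type is N/A.
TABLE_I = {
    "c4.2xlarge": [0.398, 0.476, 0.462],
    "c4.8xlarge": [1.591, 1.902, 1.848],
    "g3.8xlarge": [2.280, None, 3.340],
    "D8 v3": [0.384, 0.480, 0.625],
    "NC24r": [3.960, 5.132, None],
}


def max_premium(prices):
    """Relative gap between the most and least expensive available region."""
    available = [p for p in prices if p is not None]
    return (max(available) - min(available)) / min(available)


for name, prices in TABLE_I.items():
    print(f"{name}: {max_premium(prices):.0%}")
```

For the D8 v3 row this gives about 63%, matching the figure quoted above.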
Existing video cameras are designed for human viewing. For this purpose, 30 frames per second provides a seamless experience. When video streams are analyzed by computers, the needed frame rates depend on the purpose. Though high frame rates are needed for tracking fast-moving objects, low frame rates are sufficient for observing phenomena such as weather. Mohan et al. [13] study the frame rates necessary to track objects such as people walking, jogging, and cycling. The study finds that for cameras watching pedestrians walking, the frame rates can be reduced to as low as six frames per second. For objects that are far away from the cameras, even lower frame rates suffice.
Figure 4 illustrates the relationships between frame rates and geographical locations. In this figure, a small circle indicates a high frame rate. When a high frame rate is desired, the data stream can be sent only a short distance, measured by the round-trip time (RTT). This requires the resource manager to analyze the data stream at a cloud instance near the network camera. In Figure 4 (a), six separate cloud instances are needed because the circles do not overlap. If a lower frame rate is acceptable, the acceptable RTT is higher and the circles can be larger, as shown in Figure 4 (b). One cloud instance can then analyze multiple data streams. As a result, only the three boxed instances are needed and the cost can be reduced.
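The reasoning behind Figure 4 can be sketched as a coverage problem: each candidate location can serve the cameras whose RTT is within the acceptable bound, and the manager wants few locations that together serve every camera. The sketch below uses a greedy set-cover heuristic as a stand-in, not the method from the cited work; it assumes the acceptable RTT for the desired frame rate is already known, and all RTT values and names are hypothetical.

```python
# Hypothetical round-trip times in milliseconds: location -> camera -> RTT.
RTT_MS = {
    "virginia": {"cam_new_york": 15, "cam_london": 80, "cam_singapore": 230},
    "london": {"cam_new_york": 80, "cam_london": 5, "cam_singapore": 170},
    "singapore": {"cam_new_york": 230, "cam_london": 170, "cam_singapore": 5},
}


def choose_locations(rtt_ms, max_rtt_ms):
    """Greedy set cover: add locations until every camera has one in range."""
    cameras = {cam for per_cam in rtt_ms.values() for cam in per_cam}
    reach = {loc: {cam for cam, rtt in per_cam.items() if rtt <= max_rtt_ms}
             for loc, per_cam in rtt_ms.items()}
    uncovered, chosen = set(cameras), []
    while uncovered:
        # Pick the location that serves the most still-uncovered cameras;
        # iterating in sorted order makes ties deterministic.
        best = max(sorted(reach), key=lambda loc: len(reach[loc] & uncovered))
        if not reach[best] & uncovered:
            raise ValueError(f"cameras out of range: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= reach[best]
    return chosen


for bound in (30, 100, 250):
    print(bound, choose_locations(RTT_MS, bound))
```

A tight RTT bound (high frame rate) forces one location per camera in this data, while looser bounds let a single location serve several cameras, which mirrors the change from Figure 4 (a) to Figure 4 (b).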
Instance Type Optimization
Typically, when multiple data streams are analyzed at one instance, additional cores, more memory, or GPUs are required. This results in higher costs. Thus, to optimize cloud instance usage, the resource manager has to consider the number of instances as well as their capabilities. Consider the example in Figure 5. Here, data from eight network cameras is analyzed, and the cloud manager must decide which instances to use. The cloud manager can choose among three types of instances at $1, $2, and $3 per hour. The first type has the fewest cores and the least memory; consequently, it can analyze only two data streams. The third type, despite its higher price, can analyze eight data streams at the lowest cost per stream.
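The arithmetic behind this example can be written out as follows, assuming the three prices are $1, $2, and $3 per hour. The capacities of the first and third types come from the text; the capacity of the second type is not stated, so the value of four streams used here is a hypothetical placeholder.

```python
import math

# name -> (price in US$ per hour, streams one instance can analyze)
INSTANCE_TYPES = {
    "type 1": (1.0, 2),  # capacity stated in the text
    "type 2": (2.0, 4),  # capacity is a hypothetical value for illustration
    "type 3": (3.0, 8),  # capacity stated in the text
}


def homogeneous_cost(num_streams, price, capacity):
    """Hourly cost when all streams run on instances of a single type."""
    return math.ceil(num_streams / capacity) * price


def cheapest_type(num_streams):
    """Return the cheapest single type and the hourly cost of every type."""
    costs = {name: homogeneous_cost(num_streams, price, capacity)
             for name, (price, capacity) in INSTANCE_TYPES.items()}
    return min(costs, key=costs.get), costs


print(cheapest_type(8))
print(cheapest_type(2))
```

For eight streams the largest type is cheapest, while for two streams the smallest type wins, so the best choice depends on how many streams can share an instance.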
Considering instances' types and locations simultaneously makes cloud resource management a complex optimization problem. Mohan et al. [8, 6] propose to first eliminate instance locations that are outside the acceptable RTT range. This method, named ARMVAC, then selects the lowest-cost instance from the remaining pool and sends as many data streams as possible to this instance while meeting the desired frame rates. This strategy performs well for high and low frame rates. Streams with more than 20 frames per second perform well because few instances can meet the processing requirements. Analyzing data streams at less than one frame per second also performs well because there are few restrictions on instance requirements. The method does not perform well, however, when the desired frame rates are between one and twenty frames per second. In this range, too many instance selections can analyze the data. Mohan et al. [8] resolve this issue by formulating it as a multi-dimensional, multi-choice packing problem that accounts for the camera to cloud instance price ratio. This method, named Globally Cheapest Location (GCL), can reduce cost by as much as 56% compared with a resource manager that always selects the Nearest Location (NL) instances, and by 31% compared with the ARMVAC method. Figure 6 compares the cost of the ARMVAC, GCL, and NL solutions at different frame rates. As explained earlier, the analysis programs' resource demands may vary for a wide range of reasons. These methods can make resource decisions quickly and can be applied during runtime. An experiment in which the adaptive solutions, implemented in Amazon EC2, respond to changing needs is presented in [14].
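The difference between a nearest-location policy and a price-aware policy can be illustrated with a toy comparison. This is not an implementation of ARMVAC or GCL: it assumes one c4.2xlarge instance per camera, uses the Table I prices for that type, and relies on hypothetical RTT values and camera names.

```python
# c4.2xlarge hourly prices in US$ from Table I.
PRICE = {"virginia": 0.398, "london": 0.476, "singapore": 0.462}

# Hypothetical round-trip times in milliseconds: camera -> location -> RTT.
RTT_MS = {
    "cam_paris": {"virginia": 90, "london": 10, "singapore": 170},
    "cam_dublin": {"virginia": 80, "london": 15, "singapore": 190},
    "cam_tokyo": {"virginia": 160, "london": 230, "singapore": 70},
}


def nearest_location(cam):
    """Nearest Location policy: ignore price, minimize RTT."""
    return min(RTT_MS[cam], key=RTT_MS[cam].get)


def cheapest_feasible_location(cam, max_rtt_ms):
    """Price-aware policy: cheapest location whose RTT is acceptable."""
    feasible = [loc for loc, rtt in RTT_MS[cam].items() if rtt <= max_rtt_ms]
    if not feasible:
        raise ValueError(f"no location within {max_rtt_ms} ms of {cam}")
    return min(feasible, key=PRICE.get)


def total_cost(assignment):
    """Hourly cost assuming one instance per camera (a simplification)."""
    return sum(PRICE[loc] for loc in assignment.values())


nearest = {cam: nearest_location(cam) for cam in RTT_MS}
cheapest = {cam: cheapest_feasible_location(cam, 100) for cam in RTT_MS}
print(nearest, round(total_cost(nearest), 3))
print(cheapest, round(total_cost(cheapest), 3))
```

With a 100 ms bound, the two European cameras in this data can be served from the cheaper Virginia location, while the Tokyo camera has only one feasible location, so the saving comes entirely from the cameras with more than one acceptable choice.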
Summary
With the rise of the "Internet of Video Things" [15] comes the possibility of making use of the massive amount of visual data. Analyzing the data requires a large amount of cloud computing resources. With the variety of cloud services available, it is important to optimize cloud-instance utilization to save money. This work proposes a cloud resource manager that makes cost-effective use of both the real-time video data available on the Internet and the wide variety of cloud services available. The resource manager determines cost-effective ways to analyze video streams using cloud instances. It considers the geographic location of an instance relative to a camera, as well as the resources available in particular instances. By taking these factors into consideration, costs can be reduced by more than 50% when using a commercial cloud vendor.
Acknowledgements
This research project is supported by the National Science Foundation under grants OAC-1535108, IIP-1530914, OISE-1427808, and CNS-0958487. We also acknowledge the Lynn CSE Fellowship at Purdue University, Amazon Web Services, Microsoft Azure, Google, Facebook, and Intel for their financial or technical support. Thiruvathukal has a director's discretionary allocation from the Argonne National Laboratory to support the supercomputing aspects of this research. We thank the owners of the data for the permission to conduct the experiments. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.
About The Authors
Zohar Kapach is pursuing his B.S. degree in computer engineering at Purdue University. He is leading the transfer learning research team in the CAM2 project. His research interests include cloud resource optimization, deep learning, and computer vision.
Email: [email protected]
Andrew Ulmer is a senior undergraduate student studying computer engineering and statistics at Purdue University. His interests include deep learning, computer vision, and entrepreneurship.
Email: [email protected]
Daniel Merrick is pursuing his B.S. degree in electrical engineering at Purdue University, West Lafayette, with an expected graduation date of Fall 2019. His research interests include machine learning and computer vision.
Email: [email protected]
Arshad Alikhan is currently a BS student in Computer Science at Purdue University with a concentration in Machine Learning and a Mathematics minor. He is currently doing research with CAM2 in the area of transfer learning. His research interests include machine learning and cloud computing.
Email: [email protected]
Yung-Hsiang Lu is a professor in the School of Electrical and Computer Engineering and (by courtesy) the Department of Computer Science of Purdue University. He is an ACM distinguished scientist and ACM distinguished speaker. Dr. Lu is a co-founder and the scientific adviser of a technology company using video analytics to improve shoppers' experience in physical stores.
Email: [email protected]
Anup Mohan obtained his Ph.D. from the School of Electrical and Computer Engineering at Purdue University in 2017. His research interests include large-scale video analysis, cloud computing, and big data analysis. Anup Mohan is currently working at Intel Corporation, Santa Clara, U.S.A.
Email: [email protected]
Ahmed S. Kaseb is an assistant professor of computer engineering in the Faculty of Engineering at Cairo University. He obtained the Ph.D. in computer engineering from Purdue University in 2016. He obtained the M.S. and B.E. in computer engineering from Cairo University in 2013 and 2010 respectively.
Email: [email protected]
George K. Thiruvathukal is a Professor of Computer Science at Loyola University Chicago and visiting faculty at Argonne National Laboratory in the Argonne Leadership Computing Facility.
Email: [email protected]
- [1] Cisco. Cisco visual networking index: Forecast and methodology, 2016-2021. Document ID:1465272001663118, September 15 2017.
- [2] Niall Jenkins. 245 million video surveillance cameras installed globally in 2014. IHS Markit Insight, June 11 2015.
- [3] A. S. Kaseb, Y. Koh, E. Berry, K. McNulty, Y. H. Lu, and E. J. Delp. Multimedia content creation using global network cameras: The making of CAM2. In 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 15–18, Dec 2015.
- [4] A. S. Kaseb, E. Berry, Y. Koh, A. Mohan, W. Chen, H. Li, Y. H. Lu, and E. J. Delp. A system for large-scale analysis of distributed cameras. In 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 340–344, Dec 2014.
- [5] W. Chen, A. Mohan, Y. H. Lu, T. Hacker, W. T. Ooi, and E. J. Delp. Analysis of large-scale distributed cameras using the cloud. IEEE Cloud Computing, 2(5):54–62, Sept 2015.
- [6] Anup Mohan, Ahmed S Kaseb, Yung-Hsiang Lu, and Thomas Hacker. Adaptive resource management for analyzing video streams from globally distributed network cameras. IEEE Transactions on Cloud Computing, 2018.
- [7] Ahmed S Kaseb, Bo Fu, Anup Mohan, Yung-Hsiang Lu, Amy Reibman, and George K Thiruvathukal. Analyzing real-time multimedia content from network cameras: Using CPUs and GPUs in the cloud. arXiv preprint arXiv:1802.08176, 2018.
- [8] Anup Mohan, Ahmed S Kaseb, Yung-Hsiang Lu, and Thomas J Hacker. Location based cloud resource management for analyzing real-time videos from globally distributed network cameras. In Cloud Computing Technology and Science (CloudCom), 2016 IEEE International Conference on, pages 176–183. IEEE, 2016.
