Opportunistic Data Ferrying in Areas with Limited Information and Communications Infrastructure
Ihab Mohammed, Shadha Tabatabai, Ala Al-Fuqaha, Junaid Qadir

TL;DR
This paper introduces an ensemble-based opportunistic data ferrying algorithm that leverages existing vehicles to reduce data delivery delays in smart city environments, demonstrated through real-world taxi trace experiments.
Contribution
It proposes a novel ensemble online hiring algorithm for vehicle-based data ferrying, improving delay performance over existing methods in urban smart city scenarios.
Findings
Reduces data delivery delay by up to 258% compared to baseline algorithms.
Effectively utilizes real-world taxi traces for empirical evaluation.
Outperforms four state-of-the-art online hiring algorithms.
Abstract
Interest in smart cities is rapidly rising due to the global rise in urbanization and the wide-scale instrumentation of modern cities. Due to the considerable infrastructural cost of setting up smart cities and smart communities, researchers are exploring the use of existing vehicles on the roads as "message ferries" for the transport data for smart community applications to avoid the cost of installing new communication infrastructure. In this paper, we propose an opportunistic data ferry selection algorithm that strives to select vehicles that can minimize the overall delay for data delivery from a source to a given destination. Our proposed opportunistic algorithm utilizes an ensemble of online hiring algorithms, which are run together in passive mode, to select the online hiring algorithm that has performed the best in recent history. The proposed ensemble based algorithm is…
| Time | Algorithm A | Algorithm B | Proposed | |
|---|---|---|---|---|
| Avg. delay | Avg. delay | Avg. delay | Selection | |
| 0 | 0 | 0 | Algorithm B | |
| 8 | 10 | 10 | Algorithm A | |
| 12 | 22 | 14 | Algorithm A | |
| 18 | 26 | 20 | Algorithm B | |
| 30 | 28 | 22 | Algorithm B | |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Opportunistic Algorithm for Data Ferrying in Smart Communities with Limited Communications Infrastructure
Ihab Mohammed , Shadha Tabatabai , Ala Al-Fuqaha , and Junaid Qadir
Opportunistic Data Ferrying in Areas with Limited Information and Communications Infrastructure
Ihab Mohammed , Shadha Tabatabai , Ala Al-Fuqaha , and Junaid Qadir
Abstract
Interest in smart cities is rapidly rising due to the global rise in urbanization and the wide-scale instrumentation of modern cities. Due to the considerable infrastructural cost of setting up smart cities and smart communities, researchers are exploring the use of existing vehicles on the roads as “message ferries” for the transport data for smart community applications to avoid the cost of installing new communication infrastructure. In this paper, we propose an opportunistic data ferry selection algorithm that strives to select vehicles that can minimize the overall delay for data delivery from a source to a given destination. Our proposed opportunistic algorithm utilizes an ensemble of online hiring algorithms, which are run together in passive mode, to select the online hiring algorithm that has performed the best in recent history. The proposed ensemble-based algorithm is evaluated empirically using real-world traces from taxies plying routes in Shanghai, China, and its performance is compared against a baseline of four state-of-the-art online hiring algorithms. A number of experiments are conducted and our results indicate that the proposed algorithm can reduce the overall delay compared to the baseline by an impressive 13% to 258%.
Index Terms:
Data ferrying, opportunistic online algorithm, smart communities, limited information and communications infrastructure, hiring algorithms.
I Introduction
It is estimated that about 60% of the world’s population will live in cities by 2030 [1]. Additionally, by the year 2020, around 20.4 billion devices are expected to be connected to the Internet [2]. To cope with the trend of people moving to urban centers and to provide high-quality services to their residents, municipalities are increasingly turning to Information and Communications Technologies (ICT)—such as cloud computing, Internet of Things (IoT), Wireless Sensor Networks (WSNs), and Cyber-Physical Systems (CPS) [3]—for the deployment of smart community applications that provide value-added services in diverse fields such as healthcare, transportation, entertainment, and governance [4].
But such smart community deployments are often prohibitively expensive, especially for smaller communities where the deployment of smart community applications is hindered by the unavailability of appropriate communication infrastructure. One way to address this concern is to exploit existing infrastructure in innovative ways. In particular, modern vehicles that abundantly ply the roads of urban cities can be exploited to obviate the need for an expensive communications infrastructure. In recent times, with increased interest in vehicular ad-hoc networks (VANETs) and self-driving cars, vehicles are increasingly becoming more sophisticated and it is expected that by the year 2020, 90% of vehicles will be equipped with a hardware-based On-Board Unit (OBU) [5] that has processing and communications capabilities. Therefore developing an approach for opportunistically accessing these smart vehicles in an efficient delay-tolerant manner becomes a promising approach towards the deployment of cost-effective and efficient smart community applications with limited ICT infrastructure overhead. For example, rural areas that lack the funds to deploy ICT infrastructure can benefit from our proposed approach.
One approach of delivering data in sparse networks is Message Ferrying (MF). In this approach, devices are classified as message ferries (or ferries) or regular nodes. Ferries are mobile devices that move around to collect messages from regular nodes to deliver them to their destination [6]. In this paper, we use this approach and utilize vehicles as ferries to transfer data collected from the smart devices (i.e., regular nodes) to the Smart Community Management Center (SCMC). One example of the SCMC is the Traffic Management Center (TMC), which is used for managing traffic in support of intelligent transportation applications.
In this paper, we envision a smart community application architecture where the service area is divided into blocks, each having at least a single Local Community Broker (LCB), where an LCB is a cloudlet that is deployed in the block or hosted on a vehicle that has processing and communications capabilities. Each LCB serves as a block manager responsible for selecting vehicles to transfer data collected from IoT devices scattered across the block to the Smart Community Management Center (SCMC). When a vehicle passes through the block, the block manager (i.e., LCB) acquires the estimated delivery delay from the vehicle and decides on whether to utilize it as a ferry to transfer collected data to the SCMC. Consequently, the vehicles themselves are used as ferries to transfer data between the different blocks of the service area. Actually, sparsely deployed LCBs are the only required infrastructure in our proposed architecture and no communications infrastructure (optical, microwave, or 5G base stations) is required to relay the data from the LCBs to the SCMC.
The main challenge in this architecture lies in the ‘intelligent’ selection of vehicles. If the block manager is not selective, it may end up using vehicles with high delivery delay as any vehicle might be selected even if it has a high delivery delay—delivery delay is the time taken to transfer a data bundle on a vehicles selected to serve as a data ferry from the LCB to the SCMC. On the other hand, if it is very selective and picks vehicles that have a short delivery delay, it would incur high waiting delay as it needs to wait longer to find such vehicles—waiting delay is the time taken to select a vehicle to serve as a data ferry. In fact, this problem is of an online nature because once a vehicle leaves the block, the block manager can no longer utilize the vehicle. Consequently, the block manager may regret its decision when the delivery delay of future vehicles is shorter than that of previous ones.
To solve this problem, we propose an online algorithm that utilizes an ensemble of online hiring algorithms. The proposed algorithm opportunistically selects the best performing algorithm from its ensemble based on the recent observed performance. The average overall delay is the sum of the average waiting delay and the average delivery delay as detailed later in the paper. The proposed algorithm selects one algorithm from its ensemble to be active with the objective of minimizing the average overall delay. All other algorithms in the ensemble are set to be passive (i.e., inactive). This way, the active algorithm alone selects the vehicles to be used as data ferries. Although other algorithms in the ensemble will not be utilized to select the data ferries, their performance is utilized in order to choose the best performing algorithm in the ensemble by choosing the one that demonstrated the lowest average overall delay in the recent history (i.e., greedy approach). Consequently, different hiring algorithms from the ensemble might be utilized over time.
To the best of our knowledge, this is the first research effort that utilizes online hiring algorithms for the selection of data ferries in the specific setting of smart community applications. We perform a thorough evaluation of our work and show that our proposed algorithm performs better than other state-of-the-art online hiring algorithms in a wide variety of settings (including for different traffic volumes).
II Related Work
The literature is rich with research that deals with network issues in smart communities. Jawhar et al. [3] presented the networking requirements for different smart city applications and additionally presented network architectures for different smart city systems. In [4], the authors discussed the networking and communications challenges encountered in smart cities. Paradells et al. [7] state that deploying wireless sensor networks along with the aggregation network in different locations in the smart city is very costly and consequently propose an infrastructure-less approach in which vehicles equipped with sensors is used to collect data.
Bouroumine et al. [8] present a system where public and semi-public vehicles are used for transporting data between stations distributed around the city and the main server. Aloqaily et al. [9] introduce the concept of Smart Vehicle as a Service (SVaaS). They predict the future location of the vehicle in order to guarantee a continuous vehicle service in smart cities. In another work [10], the authors indicate that cars will be the building blocks for future smart cities due to their mobility, communications, and processing capabilities. They propose Car4ICT, an architecture that uses cars as the main ICT resource in a smart city. The authors in [11] propose an algorithm for collecting and forwarding data through vehicles in a multi-hop fashion in smart cities. They proposed a ranking system in which vehicles are ranked based on the connection time between the OBU and the RSU. The authors claim that their ranking system results in a better delivery ratio and decrease the number of replicated messages.
In [12], authors state that existing network infrastructure in smart cities can not sustain the traffic generated by sensors. To overcome this problem, an investment in telecommunication infrastructure is required. However, authors proposed to exploit buses in a Delay Tolerant Network (DTN) to transfer data in smart cities. In [13], the authors introduce mobile cloud servers by installing servers on vehicles and use them in relief efforts of large-scale disasters to collect and share data. These mobile cloud servers convey data among isolated shelters while traveling and finally returning to the disaster relief headquarters. Vehicles exchange data while waiting in the disaster relief headquarters, which is connected to the Internet.
Bonola et al. [14] conduct a study on using taxi cabs as oblivious data mules for data collection and delivery in smart cities. They have no guarantee on data communications since they are using taxi cabs without any selection criteria. They use real taxi traces in the city of Rome and divide the city into blocks of size meter2. Depending only on opportunistic connections between vehicles and nodes, the authors claim achieving a coverage of 80% of the downtown area over a 24 hour period.
The aforementioned papers mostly utilize multiple relays for transferring data between source-destination locations. Furthermore, these papers do not approach the ferry selection problem from an online perspective. Conversely, in this paper we propose an approach where each vehicle transfers a data bundle from source to destination without having to use relays and decisions are made in an online fashion—these assumptions are practical as more vehicles utilize OBU and GPS units that provide exact or probabilistic information about the path of the vehicle. Additionally, this paper considers online hiring algorithms for data ferry selection.
III System Model
To utilize data ferrying in a given area, we divide the service area (e.g., city) into blocks. Each block has a number of smart devices (e.g., sensor nodes) that generate data. In addition, one of these blocks hosts the SCMC as shown in Figure 1. Vehicles are used to transfer data collected from the smart devices to the SCMC. Also, each block has one LCB positioned near the center of the block. Any vehicle that enters a given block is within the communications range of its LCB.
In this paper, we make two assumptions about incoming vehicles. First, it is known if the vehicle will pass through the SCMC in the future. Second, for vehicles that pass through the SCMC, the expected arrival time is known. Our assumptions are based on many research efforts that appeared in the recent literature [15, 16, 17, 18] to predict those parameters.
Each LCB serves as a block manager that contacts incoming vehicles to acquire two pieces of information; namely, whether the vehicle is going to pass through the SCMC block, and at what time (i.e., expected arrival time at the SCMC block). If a vehicle is going to visit the SCMC block, the block manager computes its expected arrival time. Using the expected arrival time, the block manager computes the delivery delay , which is the time required by the vehicle to transfer the data from the current block to the SCMC block. Besides, the block manager computes the waiting delay , which is the time between the selection of the last vehicle and the selection of the current vehicle. In other words, is the time the data must wait until a vehicle is selected to serve as a data ferry.
When a vehicle passes through a block, the block manager, running the proposed algorithm, makes a decision on whether to accept or reject the current vehicle to serve as a data ferry. The block manager computes the delivery delay and the waiting delay and passes them to the proposed algorithm. Once the proposed algorithm makes a decision, it is impossible for that decision to be altered. The proposed algorithm utilizes an ensemble of online hiring algorithm as explained in Section V. If more than one vehicle exists in a block, and these vehicles are going to pass through the SCMC block some time in the future, the block manager considers only the vehicle with the minimum .
Once a vehicle is selected, the LCB uploads data of block to the vehicle. Next, the vehicle continues its trip and eventually passing through block , which has the SCMC. Once in block , the OBU of the vehicle uploads data collected from block to the LCB of block , which in turn conveys it to the SCMC. These steps are illustrated in Fig 1.
IV Online Hiring Algorithms
Many companies around the world use hiring algorithms to select employees instead of the traditional manual selection process. Actually, there are many flavors of the hiring algorithm. In [19] [20], researchers investigate the performance of the different heuristics for the classical secretary problem that select the best candidate out of multiple candidates. The problem involves an interviewer interviewing candidates one at a time for a position and then deciding after each interview if the interviewee is the best candidate. The overall goal in this problem that seeks to decide under uncertainty is to maximize the probability of choosing the best candidate. To select the best candidate, the authors introduce three hiring algorithms [19] including (1) hire above a threshold, (2) hire above minimum or maximum, and (3) hire above mean or median (Lake Wobegon111Lake Wobegon refers to a fictional town, where “all the women are strong, all the men are good looking, and all the children are above average.”. A Lake Wobegon strategy refers to one that hires above the average (mean or median).). We provide a brief description of these algorithms next.
- •
Hiring Above a Threshold: In this version of the hiring algorithm, vehicle is selected only if the delivery delay is less than or equal to a fixed threshold .
- •
Hiring Above the Mean: In this hiring algorithm version, vehicle is selected only if the delivery delay is less than (the average delivery delay of selected vehicles in block ). Initially, the algorithm accepts the first vehicle that enters block and then sets to , and subsequently decreases gradually as the algorithm accepts more vehicles.
- •
Hiring Above the Median: Hiring above the median uses (the median of the delivery delay of all selected vehicles in block ). Like hiring above the mean, this algorithm initially accepts the first vehicle that enters block . This algorithm needs an odd number of selected vehicles before recomputing since is the value in the middle after sorting. Therefore, after selecting a vehicle, if the number of selected vehicles is even, the algorithm does not update postponing the update of to the situation where the number of vehicles is odd.
V Heuristic Solution
In this work, we propose an algorithm that strives to minimize the average overall delay for transporting data from one block to the SCMC block. The overall delay is the sum of the waiting delay and the delivery delay. The idea is simply to run an ensemble of online hiring algorithms in passive mode while selecting only one of them to be active at any point in time. By passive, we mean an algorithm makes a decision for whether a given vehicle should be selected to serve as a data ferry but the decision is not executed. This is done in order to collect performance metrics needed to compare the performance of the different algorithms in the ensemble.
The proposed algorithm utilizes four hiring algorithms; namely, low threshold, high threshold, mean, and median. These algorithms can only consider the delivery delay and cannot take the waiting delay into consideration. In other words, they cannot make a decision based on the overall delay which includes the waiting delay. This stems from the fact that the waiting delay can be larger than the threshold used by those algorithms. Consequently, those algorithms will reject all requests after that time and will be stuck in this state forever. For example, if the threshold of the low threshold algorithm is set to 50 minutes and the algorithm is waiting for more than 50 minutes (i.e., waiting delay is greater than 50) then the overall delay will always be greater than 50 even if the delivery delay is zero. Thus, the low threshold algorithm will reject vehicles from serving as data ferries indefinitely. However, the proposed algorithm is capable of analyzing the history of all algorithms in its ensemble in terms of the overall delay. Moreover, to be efficient, the proposed algorithm has the chance to switch between the four algorithms in its ensemble every time units in case the performance of the already selected algorithm deteriorates.
Algorithm (1) shows the three parts of the proposed algorithm. One of the four algorithms in the ensemble is selected randomly to serve as the active algorithm in the initialization part, which is executed once when the algorithm starts. The second part is executed whenever a vehicle arrives and a decision needs to be made on whether to select it as a data ferry. In the second part, the four algorithms in the ensemble are executed and the overall delay of each one is saved for later analysis in the third part. Moreover, the decision made by the active algorithm is committed while decisions of other algorithms are ignored. The last part is only executed after time units had passed. Furthermore, the average overall delay of all algorithms is computed based on saved data and the algorithm with the minimum average overall delay is set as the active algorithm while setting the other algorithms as passive.
VI Illustrative Example
Let Algorithm A and Algorithm B be two online hiring algorithms with Algorithm A initialized in passive mode and Algorithm B initialized in active mode (i.e., selected randomly by the proposed algorithm) at . Table I shows the average overall delay per algorithm recorded every minutes (see Algorithm 1), which is computed as the average of overall delays in the period to , where . The performance of the proposed algorithm is not the best at since it randomly selects Algorithm B as the active algorithm between and , which performs worst than Algorithm A. However, the proposed algorithm switches to Algorithm A at and got an average overall delay of 4 minutes between and , which is the same value gained by Algorithm A. Between and , Algorithm B performs better than Algorithm A leading the proposed algorithm to get 6 minutes instead of 2 minutes. In , the proposed algorithm switches to Algorithm B and get the minimum average overall delay of 2 minutes between and . By , the proposed algorithm achieves less average overall delay compared to both algorithms.
VII Experimental Results
In this section, we describe the dataset used in our experiments; explain the experiments’ settings; evaluate the performance of the proposed online algorithm by comparing it with four baseline online hiring algorithms using real vehicular traces; and finally, discuss the results and present the major insights learned from our experiments.
VII-A Dataset & Experimental Settings
In our experiments, we make use of the Shanghai dataset consists of taxi traces collected in the city of Shanghai in China. Each taxi has a GPS unit and a GPRS wireless communications modem. Vehicles send their GPS location along with other information to a data center every minute. Around 2,109 taxis participated in this dataset in 2007. Information sent by taxis includes ID, timestamp, longitude and latitude, speed, and heading direction [21].
In order to utilize the Shanghai dataset, we have encoded the geographical location that encompasses longitude and latitude into a string of 7 characters using the GeoHashing method [22]. Every string represents a grid (i.e., block) of the city. Actually, we used a GeoHashing of 7 characters because this allows for the division of the globe into blocks, each of meters, which is within the communications coverage of a typical LCB. Using these blocks, we position the SCMC in the block that is mostly visited by vehicles. Additionally, we filtered the dataset to remove blocks that have no traffic activity and focus on the active blocks. The dataset is based on a one-day observation. However, one day is a very short period for the proposed algorithm to work effectively. Therefore, we replicate the one-day data a number of times to have datasets for 5, 10, 15, 20, and 25 continuous days.
To study the performance of the proposed algorithm under different traffic scenarios, we divide the city into three areas based on the traffic volume. We computed the average number of vehicles per block along with the standard deviation and found that the standard deviation is greater than the average. Therefore, we categorized each block based on , the number of vehicles in block , as follows:
- •
Light traffic area: average
- •
Medium traffic area: average standard deviation
- •
High traffic area: standard deviation
We set to 30 minutes in all of the experiments to be consistent. Also, to derive the low and high threshold values for the threshold algorithms in every block, we used a percentile of the delivery delay in the block. The dataset used in the experiment has a heavy-tailed distribution and particularly a long-tail distribution and we resort to extremely low and high threshold values—\nth2 percentile for the low threshold algorithm and \nth95 percentile for the high threshold algorithm—to fully explore the space of values in such a distribution.
VII-B Results Discussion
VII-B1 Evaluation of the proposed algorithm for different traffic volumes
Considering the four baseline hiring algorithms only, it can be clearly seen that a different one outperforms in each area depending on the traffic volume. The mean algorithm is the best in light traffic areas, the high threshold is the best in medium traffic areas, and the low threshold is the best in high traffic areas as shown in Fig. 2.
The low threshold algorithm suffers from a high overall delay in the light traffic areas because the waiting delay is very high since it is very selective. However, the low threshold algorithm outperforms in the area of the high traffic since it only selects vehicles with low delivery delay and there are plenty of vehicles to pick from. On the other hand, the high threshold algorithm performs best in the medium traffic areas, which provide a balance between waiting delay and delivery delay. Since the high threshold algorithm accepts the majority of vehicles, it benefits from this balance. As for the mean and median algorithms, their performance is best in the areas of light traffic. This is because the thresholds of these algorithms decrease with more vehicles. The more these algorithms accept, the more greedy they become towards a lower threshold.
The proposed algorithm achieves the best results in all areas with up to 258% less overall delay albeit at a cost. To understand this cost, we focus on the 10 days results and record the number of selected vehicles, average delivery delay, average waiting delay, and average overall delay.
The proposed algorithm outperforms the baselines algorithms regardless of the traffic volume by either performing better on the waiting delay or on the delivery delay but not both as indicated in Fig. 3.
It should be noted that the proposed algorithm does not only perform better in terms of the average overall delay but it also accepts more vehicles to serve as data ferries as shown in Fig. 4 (with the exception of the high threshold algorithm since it accepts the majority of vehicles in all areas).
VII-B2 Evaluation of the proposed algorithm for different time periods
To assess the performance of proposed algorithm relative to the four baseline hiring algorithms, we run the algorithms for different number of days. Results are collected in terms of the average overall delay in each of the three different areas as shown in Fig. 5. The figure shows the consistent behaviour of the proposed algorithm regardless of the number of days.
VII-B3 Switching activity of the proposed algorithm
To show the switching activity of the proposed algorithm, we record the average overall delay every hour for one block over 10 days as illustrated in Fig. 6. The figure shows how some algorithms perform better for a period of time and how the proposed algorithm follows the one with the minimum average overall delay based on performance collected from the recent history.
VIII Conclusions and Future Work
In this paper, the problem of selecting vehicles to serve as data ferries in support of smart community applications is considered. The selection process strives to achieve the minimum average overall delay. An online algorithm is proposed that utilizes four online hiring algorithms by running all of them together in passive mode and selecting the one that has performed the best in recent history. The proposed algorithm is evaluated using real taxi traces from the city of Shanghai in China and compared against a baseline of four online hiring algorithms. Experiments with these traces demonstrate that the proposed algorithm outperforms online hiring algorithms presented in the literature regardless of the traffic volume by either performing better on the waiting delay or on the delivery delay but not both.
In the future, we plan to evaluate the proposed algorithm analytically to provide performance guarantees, in terms of competitive ratio, in worst-case scenarios.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] United Nations Population Division, “Data booklet - the world’s cities in 2016,” 2016. [Online]. Available: http://tinyurl.com/World Cities 2016
- 2[2] S. Tabatabai, I. Mohammed, A. Al-Fuqaha, and M. A. Salahuddin, “Managing a cluster of Io T brokers in support of smart city applications,” in IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC) , Oct 2017, pp. 1–6.
- 3[3] I. Jawhar, N. Mohamed, and J. Al-Jaroodi, “Networking and communication for smart city systems,” in IEEE Smart World, Ubiquitous Intelligence Computing, Advanced Trusted Computed, Scalable Computing Communications, Cloud Big Data Computing, Internet of People and Smart City Innovation , Aug 2017, pp. 1–7.
- 4[4] A. Gharaibeh, M. A. Salahuddin, S. J. Hussini, A. Khreishah, I. Khalil, M. Guizani, and A. Al-Fuqaha, “Smart cities: A survey on data management, security, and enabling technologies,” IEEE Communications Surveys Tutorials , vol. 19, no. 4, pp. 2456–2501, Fourthquarter 2017.
- 5[5] Z. Su, Y. Hui, and Q. Yang, “The next generation vehicular networks: A content-centric framework,” IEEE Wireless Communications , vol. 24, no. 1, pp. 60–66, February 2017.
- 6[6] W. Zhao, M. Ammar, and E. Zegura, “A message ferrying approach for data delivery in sparse mobile ad hoc networks,” in Proceedings of the 5th ACM international symposium on Mobile ad hoc networking and computing . ACM, 2004, pp. 187–198.
- 7[7] J. Paradells, C. Gomez, I. Demirkol, J. Oller, and M. Catalan, “Infrastructureless smart cities. use cases and performance,” in International Conference on Smart Communications in Network Technologies (Sa Co Ne T) , June 2014, pp. 1–6.
- 8[8] A. Bouroumine, M. Zekraoui, and M. Abdelilah, “The influence of the opportunistic vehicular networks on smart cities management study case on Agdal district in Rabat city,” in 4th IEEE International Colloquium on Information Science and Technology (Ci St) , Oct 2016, pp. 830–834.
