Temporal Data Fusion at the Edge

Linfu Yang; Bin Liu

arXiv:1907.12042 · cs.DC · September 9, 2019

TL;DR

This paper introduces GPTDF, a Gaussian process-based method for real-time temporal data fusion at the edge, enhancing privacy, reducing latency, and saving bandwidth in IoT applications.

Contribution

It proposes a novel Gaussian process-based approach for temporal data fusion directly at the edge, addressing privacy, latency, and bandwidth issues in IoT systems.

Findings

1. The best GPTDF variant achieves prediction accuracy comparable to a GP trained on 150 target data points, while predicting with zero start-up delay.

2. It reduces latency and bandwidth consumption in edge computing.

3. Experiments on real PeMS traffic data show timely, accurate predictions at the network edge.

Tables

TABLE I: Prediction performance comparison

Method	NLL	MAE	MSE	Delay (time steps)
GPTDF-All	0.2839	0.2472	0.1041	0
GPTDF-I	0.1591	0.2081	0.0777	0
GPTDF-II	3.4316	0.3098	0.1673	0
GP-I ($N=50$)	5.7005	0.2954	0.1652	50
GP-II ($N=100$)	1.8171	0.2398	0.1078	100
GP-III ($N=150$)	0.1560	0.2063	0.0788	150

TABLE II: Hyper-parameter values of the candidate GP models employed by “GPTDF-I”

Model	$\sigma_{l}$	$\sigma_{f}$	$\sigma_{n}$
$\mathcal{M}_{1}$	2.0752	0.8215	0.1001
$\mathcal{M}_{2}$	2.4335	0.8069	0.1000
$\mathcal{M}_{3}$	2.2916	0.8096	0.1001
$\mathcal{M}_{4}$	2.1494	0.8206	0.1000

TABLE III: Hyper-parameter values of the candidate GP models employed by “GPTDF-II”

Model	$\sigma_{l}$	$\sigma_{f}$	$\sigma_{n}$
$\mathcal{M}_{1}$	7.3899	0.7773	0.1000
$\mathcal{M}_{2}$	4.5846	0.7778	0.1007
$\mathcal{M}_{3}$	9.6141	0.7897	0.1001
$\mathcal{M}_{4}$	7.5284	0.8471	0.1003

Equations

$$y(t) = f(t), \quad f \sim \mathcal{GP}(\mu, k_{\theta}), \tag{1}$$

$$\mathbf{K}_{\theta}(\mathbf{t},\mathbf{t})=\begin{pmatrix} k_{\theta}(t_{1},t_{1}) & k_{\theta}(t_{1},t_{2}) & \cdots & k_{\theta}(t_{1},t_{i})\\ k_{\theta}(t_{2},t_{1}) & k_{\theta}(t_{2},t_{2}) & \cdots & k_{\theta}(t_{2},t_{i})\\ \vdots & \vdots & \ddots & \vdots\\ k_{\theta}(t_{i},t_{1}) & k_{\theta}(t_{i},t_{2}) & \cdots & k_{\theta}(t_{i},t_{i}) \end{pmatrix}, \tag{2}$$

$$p(\mathbf{y}(\mathbf{t})) = \mathcal{N}\big(\boldsymbol{\mu}(\mathbf{t}), \mathbf{K}_{\theta}(\mathbf{t},\mathbf{t})\big), \tag{3}$$

$$y(t) = f(t) + \eta, \tag{4}$$

$$\mathbf{V}_{\theta}(\mathbf{t},\mathbf{t}) = \mathbf{K}_{\theta}(\mathbf{t},\mathbf{t}) + \sigma_{n}^{2}\mathbf{I}, \tag{5}$$

$$k_{\theta}(t_{i},t_{j}) = h^{2}\exp\left[-\left(\frac{t_{i}-t_{j}}{\lambda}\right)^{2}\right], \tag{6}$$

$$k_{\theta}(t_{i},t_{j}) = \sigma_{f}^{2}\left(1+\frac{\sqrt{5}\,r}{\sigma_{l}}+\frac{5r^{2}}{3\sigma_{l}^{2}}\right)\exp\left(-\frac{\sqrt{5}\,r}{\sigma_{l}}\right), \tag{7}$$

$$\log p(\mathbf{y}\mid\mathbf{t}) = -\frac{1}{2}\big(\mathbf{y}-\boldsymbol{\mu}(\mathbf{t})\big)^{T}\mathbf{V}_{\theta}(\mathbf{t},\mathbf{t})^{-1}\big(\mathbf{y}-\boldsymbol{\mu}(\mathbf{t})\big) - \frac{1}{2}\log\left|\mathbf{V}_{\theta}(\mathbf{t},\mathbf{t})\right| - \frac{n}{2}\log 2\pi, \tag{8}$$

$$m_{*} = \mu(t_{*}) + \mathbf{K}_{\theta}(t_{*},\mathbf{t})\,\mathbf{V}_{\theta}(\mathbf{t},\mathbf{t})^{-1}\big(\mathbf{y}-\boldsymbol{\mu}(\mathbf{t})\big), \tag{9}$$

$$\sigma_{*}^{2} = k_{\theta}(t_{*},t_{*}) - \mathbf{K}_{\theta}(t_{*},\mathbf{t})\,\mathbf{V}_{\theta}(\mathbf{t},\mathbf{t})^{-1}\mathbf{K}_{\theta}(\mathbf{t},t_{*}), \tag{10}$$

$$\hat{\omega}_{j,i+1} = \frac{\omega_{j,i}^{\alpha}}{\sum_{k=1}^{M}\omega_{k,i}^{\alpha}}, \quad j=1,\ldots,M, \tag{11}$$

$$\omega_{j,i+1} = \frac{\hat{\omega}_{j,i+1}\,p\big(y(t_{i+1})\mid\mathcal{M}_{j}\big)}{\sum_{k=1}^{M}\hat{\omega}_{k,i+1}\,p\big(y(t_{i+1})\mid\mathcal{M}_{k}\big)}, \quad j=1,\ldots,M, \tag{12}$$

$$p(y(t_{i+1})) \propto \prod_{j=1}^{M}\big[p_{j}(y(t_{i+1}))\big]^{\hat{\omega}_{j,i+1}}, \tag{13}$$

$$m_{i+1} = \sigma_{i+1}^{2}\sum_{j=1}^{M}\hat{\omega}_{j,i+1}P_{j}\,m_{j,i+1}, \tag{14}$$

$$\sigma_{i+1}^{2} = \left(\sum_{j=1}^{M}\hat{\omega}_{j,i+1}P_{j}\right)^{-1}. \tag{15}$$

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Air Quality Monitoring and Forecasting · Distributed Sensor Networks and Detection Algorithms

Full text

*Corresponding author. This work was partly supported by the National Natural Science Foundation of China (Nos. 61571238, 61906099), the Scientific Research Foundation of Nanjing University of Posts and Telecommunications (No. NY218072), and a research fund from Yancheng Big Data Research Institute.

Linfu Yang

School of Computer Science

Nanjing University of Posts and Telecommunications

Nanjing, China

[email protected]

Bin Liu*

School of Computer Science

Jiangsu Key Lab of Big Data Security & Intelligent Processing

Nanjing University of Posts and Telecommunications

Nanjing, China

[email protected]

Abstract

As an enabling technique, data fusion has gained great attention in the context of the Internet of things (IoT). In traditional settings, data fusion is done at the cloud server, so the data to be fused must first be transferred from the sensor nodes to the cloud server. This mode of data fusion inherits troubling concerns from the cloud computing framework, e.g., privacy leakage, large latency between data capture and computation, and excessive ingress bandwidth consumption. We consider how to perform temporal data fusion at the edge to bypass these issues. We present a Gaussian process based temporal data fusion (GPTDF) method targeted at the problem of sequential online prediction at the edge. The GPTDF method fits the edge computing framework and thus inherits desirable properties of edge computing, such as privacy preservation, low latency between data capture and computation, and low bandwidth consumption. Through a real-data experiment using archived traffic datasets from the Caltrans Performance Measurement System (PeMS), we demonstrate that GPTDF can provide more timely and accurate real-time predictions at the network edge.

Index Terms:

Internet of things, edge computing, temporal data fusion, Gaussian process, traffic flow prediction

I Introduction

In recent years, the Internet of things (IoT) has become ubiquitous, driven by advances in sensor and computing technologies and by commercial needs in areas ranging from manufacturing and smart farming to autonomous vehicles [1, 2, 3, 4, 5]. As a result, the number of network nodes connected to the Internet has increased exponentially, generating enormous amounts of data that must be stored and analyzed in a timely fashion. As an enabling technique for data analysis, data fusion has recently gained great attention in the field of IoT [6, 7, 8, 9].

Broadly speaking, data fusion refers to the theory, techniques and tools used to combine relevant information from multiple sources so as to support better decisions or actions than would be possible if any of these sources were used individually. Cloud computing has been integrated with IoT to handle such massive data [10, 11]: the cloud server provides elastic virtual resource management, storage capacity, and computation facilities. Currently, most data fusion processing for IoT is done on the cloud server.

The traditional cloud-based data fusion procedure requires that all data be transferred from the data sources to the cloud server before fusion. This raises troubling concerns, e.g., privacy leakage, large latency between data capture and computation, and excessive ingress bandwidth consumption.

Fog computing and edge computing have emerged as new alternative paradigms to cloud computing. They make it possible to process data near or at the data source rather than transferring it to the cloud [12, 13]. Fig. 1 gives a schematic diagram of edge computing: each sensor node has at least one edge server deployed close to it, and that edge server is responsible for processing the data generated at the sensor node. In contrast with cloud computing, which suffers from inherent speed-of-light latency, edge computing can enable real-time data processing with negligible latency, owing to the short distance between the sensor node and the edge server. This is a desirable property for time-sensitive applications such as autonomous vehicles. In addition, processing data at the edge preserves data privacy and saves the bandwidth that would otherwise be used for relaying data.

In this paper, we consider how to perform temporal data fusion at the network edge, with the goal of combining the strengths of temporal data fusion and edge computing. As coined in [14], temporal data fusion refers to the fusion of data or information acquired over a period of time. Unlike traditional fusion methods, which only fuse sensor data at a single point in time, temporal fusion aims at inferring the dynamic patterns of a system rather than just its state at one point in time. For clarity, we take traffic flow prediction as one application instance of temporal data fusion. Specifically, we consider how to achieve more timely and accurate real-time predictions at a target edge node by borrowing knowledge from data analyzed at other edge nodes. The challenge lies in a dilemma: on one hand, we would like to borrow as much related knowledge as possible to overcome the cold-start problem when launching the prediction algorithm at the target edge node; on the other hand, we hope to transmit as little data as possible among network nodes to save communication bandwidth, reduce data processing latency, and preserve data privacy. We resolve this dilemma through a novel algorithm termed Gaussian process (GP) based temporal data fusion (GPTDF), whose efficacy and accuracy are demonstrated in an experiment based on real data.

To summarize, the main contribution of this paper is two-fold. First, we propose the concept of temporal data fusion at the edge. To the best of our knowledge, this is the first paper that introduces this concept to the literature. Second, we propose a novel algorithm design, namely GPTDF, which works at the edge to provide sequential online prediction service. The remainder of this paper is organized as follows. Section II briefly introduces the Gaussian process (GP) based approach for capturing the temporal feature from time-series data. Section III presents the GPTDF method. Section IV summarizes the connections and differences between our GPTDF method and the relevant works in the literature. Section V provides experimental results on the application of GPTDF for sequential online traffic flow prediction. Finally, Section VI concludes the paper.

II Gaussian Process based Temporal Feature Capturing

We treat a time series $\{t_{i},y(t_{i})\},i=1,2,\ldots,n$, as a random sample drawn from a Gaussian process (GP). Here $y(t_{i})$ denotes the $i$th data point in the time series, observed at time $t_{i}$. A GP can be seen as a distribution over functions, fully specified by a mean function and a covariance kernel function. For more details on GPs and their applications, readers are referred to [15].

Here we use a GP to model the mapping from the time variable $t$ to the observation $y(t)$ as follows

$$y(t) = f(t), \quad f \sim \mathcal{GP}(\mu, k_{\theta}), \tag{1}$$

where $\mathcal{GP}\left(\mu,k_{\theta}\right)$ denotes a GP specified by the mean function $\mu(\cdot)$ and the covariance kernel function $k_{\theta}(\cdot,\cdot)$ parameterized by $\theta$, and $f$ is a random function drawn from this GP. Given a set of input locations $\mathbf{t}=\{t_{1},\ldots,t_{i}\}$, the covariances between all pairs of input locations can be collected in the covariance matrix

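$$\mathbf{K}_{\theta}(\mathbf{t},\mathbf{t})=\begin{pmatrix} k_{\theta}(t_{1},t_{1}) & k_{\theta}(t_{1},t_{2}) & \cdots & k_{\theta}(t_{1},t_{i})\\ k_{\theta}(t_{2},t_{1}) & k_{\theta}(t_{2},t_{2}) & \cdots & k_{\theta}(t_{2},t_{i})\\ \vdots & \vdots & \ddots & \vdots\\ k_{\theta}(t_{i},t_{1}) & k_{\theta}(t_{i},t_{2}) & \cdots & k_{\theta}(t_{i},t_{i}) \end{pmatrix}, \tag{2}$$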

where $k_{\theta}\left(t_{j},t_{k}\right)$ denotes the covariance between $t_{j}$ and $t_{k}$, $(j,k)\in\{1,\ldots,i\}^{2}$. The evaluations of $f$ at the input locations in $\mathbf{t}$ can then be taken as a draw from a multivariate Gaussian distribution,

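$$p(\mathbf{y}(\mathbf{t})) = \mathcal{N}\big(\boldsymbol{\mu}(\mathbf{t}), \mathbf{K}_{\theta}(\mathbf{t},\mathbf{t})\big). \tag{3}$$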

Here $\mathbf{y}(\mathbf{t})=\left\{y_{1},y_{2},\ldots,y_{i}\right\}$ denotes the function values evaluated at $t_{1},t_{2},\ldots,t_{i}$, respectively, and $\boldsymbol{\mu}(\mathbf{t})$ denotes the mean vector consisting of the mean function values evaluated at the same locations. To account for observation noise, we can add a noise term $\eta$ as follows

$$y(t) = f(t) + \eta, \tag{4}$$

where $\eta$ is assumed to be Gaussian distributed, namely $\eta\sim\mathcal{N}\left(0,\sigma_{n}^{2}\right)$, with noise variance $\sigma_{n}^{2}$. The covariance matrix then becomes

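$$\mathbf{V}_{\theta}(\mathbf{t},\mathbf{t}) = \mathbf{K}_{\theta}(\mathbf{t},\mathbf{t}) + \sigma_{n}^{2}\mathbf{I}, \tag{5}$$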

where $\mathbf{I}$ denotes the identity matrix.
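To make $\mathbf{K}_{\theta}(\mathbf{t},\mathbf{t})$ and $\mathbf{V}_{\theta}(\mathbf{t},\mathbf{t})$ concrete, here is a minimal NumPy sketch. The helper names `cov_matrix` and `noisy_cov` are hypothetical, any vectorized kernel `k(a, b)` can be passed in, and the kernel used in the example is an arbitrary illustrative choice; the Cholesky call only serves as a check that $\mathbf{V}_{\theta}$ is positive definite.

```python
import numpy as np

def cov_matrix(k, ta, tb):
    """Matrix whose (j, l) entry is k(ta[j], tb[l]); k must accept NumPy arrays."""
    ta = np.asarray(ta, dtype=float)
    tb = np.asarray(tb, dtype=float)
    return k(ta[:, None], tb[None, :])      # broadcasting forms all pairs at once

def noisy_cov(k, t, sigma_n):
    """V(t, t) = K(t, t) + sigma_n^2 I, the covariance of the noisy observations."""
    return cov_matrix(k, t, t) + sigma_n**2 * np.eye(len(t))

# Illustrative kernel (an assumption): k(a, b) = exp(-(a - b)^2).
k = lambda a, b: np.exp(-((a - b) ** 2))
t = np.arange(5.0)
V = noisy_cov(k, t, sigma_n=0.1)
np.linalg.cholesky(V)   # raises LinAlgError unless V is positive definite
```

Adding $\sigma_{n}^{2}\mathbf{I}$ raises every eigenvalue by $\sigma_{n}^{2}$, which is also why the noisy covariance is numerically easier to factorize than $\mathbf{K}_{\theta}$ alone.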

The covariance kernel function can take different forms. For example, the squared exponential (SE) function, often adopted as the covariance kernel function, is given by [15]

$$k_{\theta}(t_{i},t_{j}) = h^{2}\exp\left[-\left(\frac{t_{i}-t_{j}}{\lambda}\right)^{2}\right]. \tag{6}$$

Its hyper-parameters $\theta\triangleq[h,\lambda]$ describe general properties of the function $f$ [15]: $h$ governs the output scale of $f$, while $\lambda$ determines its input scale and thus its smoothness. Fig. 2 shows four random functions sampled from the GP, each corresponding to a specific setting of the hyper-parameter values. As the figure illustrates, the GP hyper-parameters provide a quantitative and succinct description of the associated time-series data.
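The effect of $h$ and $\lambda$ can be reproduced by drawing random functions from a zero-mean GP with the SE kernel, in the spirit of Fig. 2. This is a minimal sketch: the time grid, the small diagonal jitter, and the four $(h,\lambda)$ settings are our own illustrative choices, not the values behind Fig. 2.

```python
import numpy as np

def se_kernel(ti, tj, h, lam):
    """Squared exponential kernel h^2 * exp(-((ti - tj) / lam)^2)."""
    return h**2 * np.exp(-(((ti - tj) / lam) ** 2))

def sample_gp(t, h, lam, rng, jitter=1e-6):
    """Draw one function f(t) from a zero-mean GP with the SE kernel, on the grid t."""
    K = se_kernel(t[:, None], t[None, :], h, lam)
    # A small jitter on the diagonal keeps the Cholesky factorization stable.
    L = np.linalg.cholesky(K + jitter * np.eye(len(t)))
    return L @ rng.standard_normal(len(t))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
settings = [(1.0, 0.5), (1.0, 2.0), (3.0, 0.5), (3.0, 2.0)]  # (h, lambda) pairs
samples = {s: sample_gp(t, *s, rng) for s in settings}
```

Plotting `samples` should show curves whose amplitude grows with $h$ and that become smoother as $\lambda$ increases.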

Here we adopt the Matern $5/2$ kernel function [15], given by

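$$k_{\theta}(t_{i},t_{j}) = \sigma_{f}^{2}\left(1+\frac{\sqrt{5}\,r}{\sigma_{l}}+\frac{5r^{2}}{3\sigma_{l}^{2}}\right)\exp\left(-\frac{\sqrt{5}\,r}{\sigma_{l}}\right), \tag{7}$$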

where $r=\sqrt{\left(t_{i}-t_{j}\right)^{T}\left(t_{i}-t_{j}\right)}$ is the Euclidean distance between $t_{i}$ and $t_{j}$, and $\theta\triangleq[\sigma_{f},\sigma_{l}]$. Here $\sigma_{f}$ governs the output scale of the function, and $\sigma_{l}$ describes its smoothness. Given a GP approximation of the time series $\{t_{i},y(t_{i})\},i=1,2,\ldots,n$, we use the parameters $\epsilon\triangleq\{\sigma_{f},\sigma_{l},\sigma_{n}\}$ to describe its temporal structure. Given an observed dataset $\{\mathbf{t},\mathbf{y}\}$, we set the value of $\epsilon$ by maximizing the log marginal likelihood [15]:

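$$\log p(\mathbf{y}\mid\mathbf{t}) = -\frac{1}{2}\big(\mathbf{y}-\boldsymbol{\mu}(\mathbf{t})\big)^{T}\mathbf{V}_{\theta}(\mathbf{t},\mathbf{t})^{-1}\big(\mathbf{y}-\boldsymbol{\mu}(\mathbf{t})\big) - \frac{1}{2}\log\left|\mathbf{V}_{\theta}(\mathbf{t},\mathbf{t})\right| - \frac{n}{2}\log 2\pi. \tag{8}$$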
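The fitting step can be sketched in a few lines: evaluate the Matérn 5/2 kernel (with the standard $\sqrt{5}$ factor), compute the log marginal likelihood through a Cholesky factorization, and keep the best $\epsilon=\{\sigma_{f},\sigma_{l},\sigma_{n}\}$. The zero mean function, the function names, and especially the coarse grid search are our assumptions; a gradient-based optimizer of the kind surveyed in [16] would normally take the place of the grid.

```python
import itertools
import numpy as np

def matern52(ti, tj, sigma_f, sigma_l):
    """Matern 5/2 kernel sigma_f^2 (1 + s + s^2/3) exp(-s), with s = sqrt(5) r / sigma_l."""
    s = np.sqrt(5.0) * np.abs(ti - tj) / sigma_l
    return sigma_f**2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)

def log_marginal_likelihood(t, y, sigma_f, sigma_l, sigma_n):
    """log p(y | t) of a zero-mean GP with Matern 5/2 kernel and Gaussian noise."""
    V = matern52(t[:, None], t[None, :], sigma_f, sigma_l) + sigma_n**2 * np.eye(len(t))
    L = np.linalg.cholesky(V)                              # V = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # alpha = V^{-1} y
    return (-0.5 * y @ alpha                               # data-fit term
            - np.sum(np.log(np.diag(L)))                   # equals -0.5 * log|V|
            - 0.5 * len(t) * np.log(2.0 * np.pi))          # normalization constant

def fit_epsilon(t, y, grid_f, grid_l, grid_n):
    """Grid search for the (sigma_f, sigma_l, sigma_n) triple maximizing log p(y | t)."""
    return max(itertools.product(grid_f, grid_l, grid_n),
               key=lambda eps: log_marginal_likelihood(t, y, *eps))

# Usage on a synthetic series (illustrative data, not PeMS).
rng = np.random.default_rng(1)
t = np.arange(100.0)
y = np.sin(t / 5.0) + 0.1 * rng.standard_normal(100)
eps_hat = fit_epsilon(t, y, [0.5, 1.0, 2.0], [1.0, 5.0, 20.0], [0.05, 0.1, 0.5])
```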

For approaches to solving this optimization problem, see [16]. As shown above, a GP provides a way to capture temporal features from time-series data; it also provides a way to make predictions. Let us consider predicting the data point $y_{*}$ that will be observed at a future time $t_{*}$ based on an observed dataset $\{\mathbf{t},\mathbf{y}\}$. From the definition of a GP, it follows that the distribution of $y_{*}$ conditional on $\{\mathbf{t},\mathbf{y}\}$ is Gaussian with mean [15]

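$$m_{*} = \mu(t_{*}) + \mathbf{K}_{\theta}(t_{*},\mathbf{t})\,\mathbf{V}_{\theta}(\mathbf{t},\mathbf{t})^{-1}\big(\mathbf{y}-\boldsymbol{\mu}(\mathbf{t})\big) \tag{9}$$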

and variance

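$$\sigma_{*}^{2} = k_{\theta}(t_{*},t_{*}) - \mathbf{K}_{\theta}(t_{*},\mathbf{t})\,\mathbf{V}_{\theta}(\mathbf{t},\mathbf{t})^{-1}\mathbf{K}_{\theta}(\mathbf{t},t_{*}). \tag{10}$$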
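The predictive mean and variance translate almost line for line into NumPy. This sketch assumes a zero mean function, accepts any vectorized kernel `k`, and uses Cholesky solves instead of forming $\mathbf{V}_{\theta}^{-1}$ explicitly; the name `gp_predict` and the kernel in the usage lines are hypothetical. As in the equations, the returned variance is that of the latent $f(t_{*})$; adding $\sigma_{n}^{2}$ would give the variance of a noisy observation.

```python
import numpy as np

def gp_predict(k, t, y, t_star, sigma_n):
    """Predictive mean m_* and variance sigma_*^2 of a zero-mean GP at times t_star."""
    t, y, t_star = (np.asarray(a, dtype=float) for a in (t, y, t_star))
    V = k(t[:, None], t[None, :]) + sigma_n**2 * np.eye(len(t))   # V(t, t)
    K_s = k(t_star[:, None], t[None, :])                           # K(t_*, t)
    L = np.linalg.cholesky(V)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))            # V^{-1} y
    mean = K_s @ alpha                                             # m_* with mu = 0
    W = np.linalg.solve(L, K_s.T)                                  # L^{-1} K(t, t_*)
    var = k(t_star, t_star) - np.sum(W**2, axis=0)                 # sigma_*^2
    return mean, var

# Usage with an illustrative SE kernel (h = 1, lambda = 2).
se = lambda a, b: np.exp(-((a - b) / 2.0) ** 2)
t_obs = np.arange(10.0)
m, v = gp_predict(se, t_obs, np.sin(t_obs), [3.0, 10.0], sigma_n=0.01)
```

At $t_{*}=3$, which coincides with an observation, the mean is close to $\sin 3$ and the variance is tiny; at $t_{*}=10$, one step beyond the data, the variance is noticeably larger.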

III The Proposed GPTDF Method for Sequential Online Prediction at the Edge

The GPTDF algorithm targets the problem of sequential online prediction at the edge, which we take as one application instance of the concept of temporal data fusion at the edge.

To make predictions with a learning algorithm, enough labeled data must be collected beforehand to train a prediction model. For the GP model used here, the hyper-parameters must be determined in a training procedure before any prediction is made. This leads to the so-called cold-start problem: when a new edge server is added to the system, it cannot make predictions immediately, because it has no stored data with which to train the prediction model.

We propose the GPTDF method to solve this problem (see Algorithm 1 for pseudo-code). First, the target edge server queries the cloud server for the temporal features of historical datasets that have been processed at other edge servers. The cloud server then sends the related temporal feature data back to the target edge server. As described in Section II, the temporal feature data corresponding to a time-series dataset consist of only three parameters, namely $\sigma_{f}$, $\sigma_{l}$, and $\sigma_{n}$.

Suppose that, after the query, the cloud server sends back to the target edge server $M$ groups of hyper-parameter values, $\{\sigma_{f,j},\sigma_{l,j},\sigma_{n,j}\}_{j=1}^{M}$ , each group standing for a report made by another edge server to the cloud server. Now there are $M$ candidate GP models that can be used for sequential online prediction at the target edge server, each model being characterized by a group of hyper-parameter values, namely, $\{\sigma_{f,j},\sigma_{l,j},\sigma_{n,j}\}$ , $j\in\{1,\cdots,M\}$ .
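The exchange between the target edge server and the cloud server can be pictured as a lookup in a registry of reported feature triples. This sketch is purely illustrative: the `FeatureRegistry` class, its methods, and the edge-server identifiers are hypothetical, and the stored triples are the $\mathcal{M}_{1},\ldots,\mathcal{M}_{4}$ values of Table II.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TemporalFeature:
    """Hyper-parameters (sigma_f, sigma_l, sigma_n) reported by one edge server."""
    sigma_f: float
    sigma_l: float
    sigma_n: float

@dataclass
class FeatureRegistry:
    """Hypothetical cloud-side store mapping edge-server id -> reported feature."""
    reports: dict = field(default_factory=dict)

    def report(self, edge_id: str, feature: TemporalFeature) -> None:
        self.reports[edge_id] = feature

    def query(self, exclude: str) -> list:
        """Return the M feature groups reported by all edge servers except `exclude`."""
        return [f for eid, f in self.reports.items() if eid != exclude]

cloud = FeatureRegistry()
# Table II rows, given as (sigma_l, sigma_f, sigma_n) in the table's column order.
for eid, (sl, sf, sn) in {"edge-1": (2.0752, 0.8215, 0.1001),
                          "edge-2": (2.4335, 0.8069, 0.1000),
                          "edge-3": (2.2916, 0.8096, 0.1001),
                          "edge-4": (2.1494, 0.8206, 0.1000)}.items():
    cloud.report(eid, TemporalFeature(sigma_f=sf, sigma_l=sl, sigma_n=sn))

candidates = cloud.query(exclude="target")   # M = 4 candidate GP models
```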

We use a weighted mixture of these models to capture the non-stationary temporal structure of the data that will be observed at the target edge server. To tune the model weights automatically in a data-driven manner, we resort to the dynamic model averaging (DMA) technique [17, 18, 19, 20]. Suppose that, at time $t_{i}$, the weight of model $\mathcal{M}_{j}$ is $\omega_{j,i}>0$, $j\in\{1,\ldots,M\}$, with $\sum_{j=1}^{M}\omega_{j,i}=1$. The predictive weights of the models at time $t_{i+1}$ are then defined as

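$$\hat{\omega}_{j,i+1} = \frac{\omega_{j,i}^{\alpha}}{\sum_{k=1}^{M}\omega_{k,i}^{\alpha}}, \quad j=1,\ldots,M, \tag{11}$$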

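where $0<\alpha<1$ is termed the forgetting parameter. Raising the weights to the power $\alpha<1$ flattens them toward the uniform distribution, so that a model that performed poorly in the past can regain weight if the data dynamics change. Upon the arrival of the observation $y(t_{i+1})$, the model weights are updated according to Bayes' rule as follows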

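$$\omega_{j,i+1} = \frac{\hat{\omega}_{j,i+1}\,p\big(y(t_{i+1})\mid\mathcal{M}_{j}\big)}{\sum_{k=1}^{M}\hat{\omega}_{k,i+1}\,p\big(y(t_{i+1})\mid\mathcal{M}_{k}\big)}, \quad j=1,\ldots,M, \tag{12}$$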

where $p\left(y(t_{i+1})|\mathcal{M}_{j}\right)$ denotes the likelihood of the hypothesis $\mathcal{M}_{j}$ given $y(t_{i+1})$ , $j=1,\ldots,M$ .
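The forgetting and Bayes steps of DMA can be sketched as two small functions. They assume the per-model log-likelihoods $\log p(y(t_{i+1})\mid\mathcal{M}_{j})$ have already been computed, for example from each model's Gaussian predictive distribution; working in the log domain is our own choice to avoid underflow when likelihoods are tiny, and the function names are hypothetical.

```python
import numpy as np

def predictive_weights(w, alpha=0.9):
    """Forgetting step: w_hat_j = w_j^alpha / sum_k w_k^alpha."""
    logw = alpha * np.log(w)
    logw -= logw.max()                      # shift for numerical stability
    w_hat = np.exp(logw)
    return w_hat / w_hat.sum()

def update_weights(w_hat, log_lik):
    """Bayes step: w_j proportional to w_hat_j * p(y(t_{i+1}) | M_j)."""
    logw = np.log(w_hat) + np.asarray(log_lik, dtype=float)
    logw -= logw.max()
    w = np.exp(logw)
    return w / w.sum()

w = np.array([0.7, 0.2, 0.1])
w_hat = predictive_weights(w, alpha=0.9)
w_new = update_weights(w_hat, np.log([0.1, 0.5, 0.2]))
```

With $\alpha=0.9$ the weights $[0.7, 0.2, 0.1]$ are first pulled toward uniform; the second model, which assigns the highest likelihood to the new observation, then receives the largest posterior weight. Setting $\alpha=1$ switches forgetting off.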

To combine the predictions provided by $\mathcal{M}_{1},\ldots,\mathcal{M}_{M}$ into a fused prediction, we resort to a weighted version of the product of experts (PoE) model. Denote the predictive distribution of $y(t_{i+1})$ corresponding to $\mathcal{M}_{j}$ as $p_{j}(y(t_{i+1})|y(t_{i-\tau+1}),\ldots,y(t_{i}))$ (or $p_{j}(y(t_{i+1}))$ for short), where $\tau$ denotes the length of the time window. The fused predictive distribution of $y(t_{i+1})$ is then defined as

$$p(y(t_{i+1})) \propto \prod_{j=1}^{M}\big[p_{j}(y(t_{i+1}))\big]^{\hat{\omega}_{j,i+1}}. \tag{13}$$

Since $p_{j}(y(t_{i+1}))$ , $j=1,\ldots,M$ , are Gaussian, $p(y(t_{i+1}))$ calculated with (13) is still Gaussian, with its mean and variance given by [21]

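$$m_{i+1} = \sigma_{i+1}^{2}\sum_{j=1}^{M}\hat{\omega}_{j,i+1}P_{j}\,m_{j,i+1}, \tag{14}$$

$$\sigma_{i+1}^{2} = \left(\sum_{j=1}^{M}\hat{\omega}_{j,i+1}P_{j}\right)^{-1}, \tag{15}$$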

where $P_{j}=\left(\sigma_{j,i+1}^{2}\right)^{-1}$ is the precision of expert $j$, and $m_{j,i+1}$ and $\sigma_{j,i+1}^{2}$ denote the mean and variance of $p_{j}(y(t_{i+1}))$, respectively. The mean $m_{i+1}$ is taken as the prediction of $y(t_{i+1})$ made at time $t_{i}$, and a confidence interval for this prediction is also available. For example, $[m_{i+1}-3\sigma_{i+1},m_{i+1}+3\sigma_{i+1}]$ is an approximately 99.73% confidence interval.
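Because every expert is Gaussian, the weighted product has a closed form: the fused precision is $\sum_{j}\hat{\omega}_{j,i+1}P_{j}$ and the fused mean is the precision-weighted average of the $m_{j,i+1}$. The sketch below implements that closed form together with the $\pm3\sigma$ interval; the function name is hypothetical. The last line checks the coverage of $\pm3\sigma$ with the Gaussian CDF, $\operatorname{erf}(3/\sqrt{2})\approx 0.9973$.

```python
import math
import numpy as np

def fuse_poe(w_hat, means, variances):
    """Weighted product of Gaussian experts -> (fused mean, fused variance)."""
    w_hat, means, variances = (np.asarray(a, dtype=float) for a in (w_hat, means, variances))
    precisions = 1.0 / variances                      # P_j
    fused_var = 1.0 / np.sum(w_hat * precisions)      # sigma_{i+1}^2
    fused_mean = fused_var * np.sum(w_hat * precisions * means)   # m_{i+1}
    return fused_mean, fused_var

m, v = fuse_poe([0.5, 0.5], [1.0, 3.0], [1.0, 1.0])
interval = (m - 3 * math.sqrt(v), m + 3 * math.sqrt(v))
coverage = math.erf(3 / math.sqrt(2))                 # mass within +/- 3 sigma
```

With equal weights and equal variances the fused mean is simply the average of the expert means (here 2.0); an expert with a smaller variance pulls the fused mean toward its own prediction.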

In GPTDF (Algorithm 1), the forgetting parameter $\alpha$ is initialized to 0.9, and $\omega_{j,1}$ is set to $1/M$ for $j=1,\ldots,M$.
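Algorithm 1 is not reproduced in this text, so the following is only our reconstruction of how the pieces could fit together: each candidate model predicts the next observation from the last $\tau$ observations using its own hyper-parameters, the weights follow the forgetting and Bayes steps, and the experts are fused with the weighted PoE. Assumptions not stated in the source: a zero mean function, unit-spaced time stamps, a window of `tau=20`, and scoring each observation with the predictive variance plus $\sigma_{n,j}^{2}$. With no observations yet, each model falls back to its prior $\mathcal{N}(0,\sigma_{f,j}^{2})$, which is why the first prediction incurs no delay.

```python
import numpy as np

def matern52(ti, tj, sigma_f, sigma_l):
    """Matern 5/2 kernel with the standard sqrt(5) factor."""
    s = np.sqrt(5.0) * np.abs(ti - tj) / sigma_l
    return sigma_f**2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)

def gp_predict_one(t, y, t_star, sf, sl, sn):
    """Mean/variance of f(t_star) given the window (t, y); the GP prior if the window is empty."""
    if len(t) == 0:
        return 0.0, sf**2
    V = matern52(t[:, None], t[None, :], sf, sl) + sn**2 * np.eye(len(t))
    k_s = matern52(t_star, t, sf, sl)                      # K(t_*, t), shape (n,)
    L = np.linalg.cholesky(V)
    mean = k_s @ np.linalg.solve(L.T, np.linalg.solve(L, y))
    w = np.linalg.solve(L, k_s)
    return float(mean), max(float(sf**2 - w @ w), 1e-12)

def gptdf(y_stream, models, alpha=0.9, tau=20):
    """One-step-ahead fused predictions for y_stream (hypothetical reconstruction).

    models: list of (sigma_f, sigma_l, sigma_n) triples received from the cloud.
    Returns fused means, fused variances and the final model weights.
    """
    y_stream = np.asarray(y_stream, dtype=float)
    M = len(models)
    w = np.full(M, 1.0 / M)                                # omega_{j,1} = 1/M
    t_all = np.arange(len(y_stream), dtype=float)          # unit-spaced time stamps
    noise_var = np.array([mdl[2] ** 2 for mdl in models])
    means, variances = [], []
    for i in range(len(y_stream)):
        lo = max(0, i - tau)
        t_win, y_win = t_all[lo:i], y_stream[lo:i]         # last tau observations
        w_hat = w**alpha / np.sum(w**alpha)                # forgetting step
        preds = np.array([gp_predict_one(t_win, y_win, t_all[i], *mdl) for mdl in models])
        m_j, v_j = preds[:, 0], preds[:, 1]
        prec = np.sum(w_hat / v_j)                         # weighted PoE precision
        means.append(np.sum(w_hat * m_j / v_j) / prec)
        variances.append(1.0 / prec)
        # Bayes step once y is observed; scoring with v_j + sigma_n^2 is an assumption.
        s2 = v_j + noise_var
        log_lik = -0.5 * (np.log(2.0 * np.pi * s2) + (y_stream[i] - m_j) ** 2 / s2)
        logw = np.log(w_hat) + log_lik
        w = np.exp(logw - logw.max())
        w /= w.sum()
    return np.array(means), np.array(variances), w

# Usage on synthetic data; (sigma_f, sigma_l, sigma_n) taken from M_1 of Tables II and III.
rng = np.random.default_rng(0)
y = np.sin(np.arange(200) / 8.0) + 0.1 * rng.standard_normal(200)
models = [(0.8215, 2.0752, 0.1001), (0.7773, 7.3899, 0.1000)]
means, variances, w_final = gptdf(y, models)
```

Note that the tuples are ordered $(\sigma_f,\sigma_l,\sigma_n)$, whereas Tables II and III list $\sigma_l$ first.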

III-A Algorithm Analysis

In GPTDF, only the temporal features $\sigma_{f},\sigma_{l},\sigma_{n}$ of each dataset are transferred between the cloud server and the edge servers. Since these are just three scalars, the feature data are far smaller than the raw dataset, so transferring them between an edge server and the cloud server consumes negligible bandwidth and takes little time. In addition, because only temporal features are transferred, the raw data remain invisible to the cloud server and to all edge servers except the one connected to the data source. In this sense, data privacy is preserved.
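A back-of-the-envelope comparison makes the bandwidth argument concrete. The sketch assumes each value is sent as a 64-bit float and ignores protocol overhead; the series length of 10,000 samples is an arbitrary illustration, not a figure from the experiment.

```python
import struct

def payload_bytes(values):
    """Size in bytes of the values packed as little-endian 64-bit floats."""
    return len(struct.pack(f"<{len(values)}d", *values))

feature = [0.8215, 2.0752, 0.1001]      # (sigma_f, sigma_l, sigma_n) of M_1 in Table II
raw_series = [0.0] * 10_000             # placeholder for a raw time series

feature_size = payload_bytes(feature)   # 3 * 8 = 24 bytes
raw_size = payload_bytes(raw_series)    # 10,000 * 8 = 80,000 bytes
ratio = raw_size / feature_size
```

For this illustrative series the raw payload is roughly 3,333 times larger than the feature triple, and the gap grows linearly with the series length while the feature size stays fixed.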

IV Connections to Existing Works

The GPTDF method presented here is related to other GP-based time-series analysis methods, e.g., those in [22, 23, 20], among which the instant temporal structure learning (INTEL) algorithm of [20] is the most relevant. Both GPTDF and INTEL use multiple GP models. The crucial difference is that, in GPTDF, each GP model is associated with one specific edge server together with the time series that has been analyzed there; that is, each GP model in GPTDF has a unique training dataset associated with it. In INTEL, by contrast, all GP models other than a template model are built from the template model: they are variants of it, and no training dataset is associated with any of them. Conceptually, GPTDF provides a way to fuse different temporal datasets collected from different sensor nodes, while INTEL provides a way to make use of prior knowledge when processing a single temporal dataset.

Our temporal data fusion method is also related to existing data fusion methods developed for IoT applications, e.g., [6, 7, 8]. The biggest difference is that our approach fuses temporal data and runs at the edge server, whereas most of these existing methods run at the cloud server and fuse non-temporal data.

V Experiment

In this section, we focus on an edge computing application scenario, namely real-time traffic flow prediction at the edge. We seek to experimentally validate that the proposed GPTDF method can provide more timely and accurate predictions at the edge.

In our experiment, we used archived traffic datasets from the Caltrans Performance Measurement System (PeMS) [24]. These data are collected in real time from over 39,000 individual sensors spanning the freeway system across all major metropolitan areas of the State of California. We selected 19 time-series segments from the PeMS dataset: 18 of them are treated as historical datasets stored at 18 edge servers, respectively, and the remaining one is treated as the dataset observed and processed at the target edge server. Each dataset is normalized with the same pre-processing operation, so that the normalized dataset has zero mean and unit standard deviation.
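The normalization step amounts to a z-score transform. A minimal sketch, assuming the population standard deviation (`ddof=0`) is used, since the text does not say which estimator was chosen; the traffic counts are made up for illustration.

```python
import numpy as np

def zscore(x):
    """Rescale a series to zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

flow = np.array([120.0, 135.0, 150.0, 160.0, 140.0])  # made-up traffic counts
z = zscore(flow)
```

Applying the same transform to every dataset puts all 19 series on a common scale, which is what allows hyper-parameters learned at one edge server to be reused at another.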

The metrics used for performance evaluation are the negative log likelihood (NLL), the mean absolute error (MAE), and the mean square error (MSE); for each metric, a smaller value indicates better prediction performance. The resulting metrics are presented in Table I. “GPTDF-All” in Table I denotes the GPTDF method that fuses all 18 historical datasets when making predictions at the target edge server, while “GPTDF-I” and “GPTDF-II” each fuse only 4 of the 18 historical datasets. The 4 datasets associated with “GPTDF-I” are plotted in Fig. 3, and those used by “GPTDF-II” are plotted in Fig. 4. The observations processed at the target edge server, which we call the target dataset, are shown in Fig. 5. “GP” in Table I denotes the GP-based prediction method that uses the first $N$ data points of the target dataset to train the GP model; for “GP-I”, “GP-II” and “GP-III”, $N$ takes the values 50, 100 and 150, respectively.
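For reference, the three metrics can be computed from a sequence of Gaussian predictive distributions as below. We assume that NLL is the average negative log predictive density of each observation under $\mathcal{N}(m_{i},\sigma_{i}^{2})$; the text does not state whether an average or a sum is reported, nor exactly which variance enters the NLL.

```python
import numpy as np

def prediction_metrics(y, means, variances):
    """Average NLL, MAE and MSE of Gaussian one-step predictions."""
    y, means, variances = (np.asarray(a, dtype=float) for a in (y, means, variances))
    err = y - means
    nll = np.mean(0.5 * np.log(2 * np.pi * variances) + 0.5 * err**2 / variances)
    mae = np.mean(np.abs(err))
    mse = np.mean(err**2)
    return nll, mae, mse

nll, mae, mse = prediction_metrics([1.0, 2.0], [0.0, 0.0], [1.0, 1.0])
```

Unlike MAE and MSE, the NLL also penalizes miscalibrated variances, which helps explain why “GP-I” and “GPTDF-II” have much larger NLL values in Table I than their MAE and MSE alone would suggest.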

As shown in Table I, compared with the non-data-fusion GP method, the most notable feature of GPTDF is that it makes predictions with zero delay. This benefit comes from the temporal data fusion operation, which removes the need to gather training data and train the model beforehand. In contrast, the non-data-fusion GP method requires a number of training data points to be available before the GP model can be trained.

Among the GPTDF variants, “GPTDF-I” gives the best prediction performance. Its accuracy is comparable with that of “GP-III” (slightly lower MSE, slightly higher NLL and MAE), yet it incurs zero delay; in contrast, adopting “GP-III” at the edge means bearing the largest delay (150 time steps). “GPTDF-I” also outperforms “GPTDF-All”, which indicates that, for GPTDF, using more models does not necessarily lead to better prediction performance.

Although “GPTDF-I” and “GPTDF-II” fuse temporal feature information from the same number of historical datasets, their prediction performance differs significantly, as is also confirmed in Fig. 6. The GP model hyper-parameter values used in “GPTDF-I” and “GPTDF-II” are presented in Tables II and III, respectively. The two subsets differ mainly in the length scale: $\sigma_{l}$ lies between about 2.1 and 2.4 for “GPTDF-I”, versus about 4.6 to 9.6 for “GPTDF-II”, while $\sigma_{f}$ and $\sigma_{n}$ are similar in both. It is this difference in hyper-parameter values that leads to the significant difference in prediction performance. In practice, how to select the best subset from the full set of available candidate models remains an open question. A simple solution is to use “GPTDF-All”, which gives moderate prediction accuracy with zero delay.

As for the “GP” method, Table I shows that the more training data are used, the higher the prediction accuracy, but also the larger the delay before the first prediction. Since the “GP” method uses the first $N$ data points of the target dataset for model training before predicting the subsequent observations, its delay in making the first prediction is exactly $N$ time steps.

VI Concluding Remarks

In this paper, we proposed, for the first time, the concept of temporal data fusion at the edge. Our goal is to combine the strengths of edge computing and temporal data fusion through novel algorithm design. We focused on one application scenario, namely temporal-data-fusion-assisted sequential online prediction at the edge, and proposed the GPTDF method, which inherits desirable properties of edge computing such as privacy preservation, low latency between data capture and computation, and low bandwidth consumption. We experimentally validated that GPTDF can provide more timely and accurate predictions at the edge, thereby giving a proof of concept for temporal data fusion at the edge.

Currently, we only consider the fusion of homogeneous temporal data at the edge; an interesting open question is how to fuse heterogeneous temporal data at the edge, for which the fusion mechanism may be application dependent. In addition, how to make use of context information, such as the relative locations of edge servers, in the fusion process is also worth further investigation.

Bibliography

[1] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of things: A survey on enabling technologies, protocols, and applications,” IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015.
[2] E. Ahmed, I. Yaqoob, A. Gani, M. Imran, and M. Guizani, “Internet-of-things-based smart environments: state of the art, taxonomy, and open research challenges,” IEEE Wireless Communications, vol. 23, no. 5, pp. 10–16, 2016.
[3] Z. Bi, L. Da Xu, and C. Wang, “Internet of things for enterprise systems of modern manufacturing,” IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1537–1546, 2014.
[4] A. Kamilaris, F. Gao, F. X. Prenafeta-Boldú, and M. I. Ali, “Agri-IoT: A semantic framework for internet of things-enabled smart farming applications,” in 2016 IEEE 3rd World Forum on Internet of Things (WF-IoT). IEEE, 2016, pp. 442–447.
[5] M. Gerla, E. Lee, G. Pau, and U. Lee, “Internet of vehicles: From intelligent grid to autonomous cars and vehicular clouds,” in 2014 IEEE World Forum on Internet of Things (WF-IoT). IEEE, 2014, pp. 241–246.
[6] M. Wang, C. Perera, P. Jayaraman, M. Zhang, P. Strazdins, R. Shyamsundar, and R. Ranjan, “City data fusion: Sensor data fusion in the internet of things,” International Journal of Distributed Systems and Technologies (IJDST), vol. 7, no. 1, pp. 15–36, 2016.
[7] R. Dautov and S. Distefano, “Distributed data fusion for the Internet of things,” in Int’l Conf. on Parallel Computing Technologies. Springer, 2017, pp. 427–432.
[8] R. Dautov and S. Distefano, “Three-level hierarchical data fusion through the IoT, edge, and cloud computing,” in Proc. of the 1st Int’l Conf. on Internet of Things and Machine Learning. ACM, 2017, p. 1.