Learn to Sense: a Meta-learning Based Sensing and Fusion Framework for Wireless Sensor Networks
Hui Wu, Zhaoyang Zhang, Chunxu Jiao, Chunguang Li, and Tony Q.S. Quek

TL;DR
This paper introduces a meta-learning based framework for adaptive sensing and data fusion in wireless sensor networks, significantly reducing data redundancy and communication costs while improving field reconstruction accuracy.
Contribution
It proposes a novel two-layer meta-learning framework combining SGD and reinforcement learning for adaptive sensing in WSNs, enhancing efficiency and robustness.
Findings
Reduces spatial samples needed for accurate field reconstruction
Improves convergence rate of sensing algorithms
Outperforms conventional sensing schemes in robustness
Copyright (c) 20xx IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].
A preliminary version of this work was presented at the 10th International Conference on Wireless Communications and Signal Processing (WCSP 2018) and was published in its Proceedings (DOI: 10.1109/WCSP.2018.8555926). Corresponding author: Zhaoyang Zhang (Email: [email protected]).
This work was supported in part by National Natural Science Foundation of China under Grant 61725104 and 61631003, and Huawei Technologies Co., Ltd, under Grant YBN2018115223.
H. Wu and C. Jiao were with the College of Information Science and Electronic Engineering, Zhejiang University (ZJU), Hangzhou 310027, China, and are now with Huawei Technologies Co., Ltd, Shanghai, China. Z. Zhang and C. Li are with the College of Information Science and Electronic Engineering, Zhejiang University (ZJU), Hangzhou 310027, China. Tony Quek is with the ISTD Pillar, Singapore University of Technology and Design (SUTD). The authors are also with the ZJU-SUTD IDEA Center of Network Intelligence.
Abstract
Wireless sensor networks (WSNs) act as the backbone of Internet of Things (IoT) technology. In WSNs, field sensing and fusion are the most common problems; they involve collecting and processing a huge volume of spatial samples in an unknown field to reconstruct the field or extract its features. One of the major concerns is how to reduce the communication overhead and data redundancy while meeting a prescribed fusion accuracy. In this paper, an integrated communication and computation framework based on meta-learning is proposed to enable adaptive field sensing and reconstruction. It consists of a stochastic-gradient-descent (SGD) based base-learner, which predicts the field model and aims to minimize the average prediction error, and a reinforcement meta-learner, which optimizes the sensing decision by simultaneously rewarding the error reduction achieved with the samples obtained so far and penalizing the corresponding communication cost. An adaptive sensing algorithm based on this two-layer meta-learning framework is presented. It actively determines the next most informative sensing location, and thus considerably reduces the number of spatial samples and yields superior performance and robustness compared with conventional schemes. The convergence behavior of the proposed algorithm is also comprehensively analyzed and simulated. The results reveal that the proposed field sensing algorithm significantly improves the convergence rate.
Index Terms:
Learn to sense, meta-learning, wireless sensor networks, field sensing and reconstruction, stochastic gradient descent (SGD), reinforcement learning.
I Introduction
The Internet of Things (IoT) is one of the most promising technologies to have emerged in recent decades. By connecting massive numbers of communication terminals, it provides ubiquitous access to almost everything in the world. Among the IoT techniques, the wireless sensor network (WSN) is regarded as the backbone due to its capability of collecting, storing, querying and understanding raw sensor data. For instance, the automated switching control of street lamps depends on the monitoring of light intensity via light sensors deployed across the region.
Advanced and intelligent WSN-based IoT has attracted a lot of research interest. Generally, sensing and fusion are its two basic problems. In many applications such as environment monitoring, a huge volume of spatial samples has to be collected and processed by the fusion center (FC) to extract some field features or reconstruct the field. In such field sensing and reconstruction scenarios, rapid, accurate and efficient fusion often requires deploying either many less-capable sensors or fewer sensors, each of which is capable of collecting many spatial samples, especially when the area of interest is relatively large. In either case, in addition to the hardware cost of the sensors and the computation cost at the FC, another cost, namely the intensive communication between all the sensors and the FC, has become one of the major concerns in system design and realization as spectrum and/or energy resources become scarce. In this regard, how to redesign the sensing, communication and computing processes of a WSN, so as to downsize the dataset and reduce the communication cost while meeting a prescribed fusion accuracy, has attracted much attention from both academia and industry.
Much research effort has been devoted to the trade-off between energy consumption and data fusion accuracy for specific field sensing and reconstruction problems. The goal is to design energy-efficient algorithms that effectively reduce the communication cost while maintaining a desired sensing quality. One line of investigation conducts offline sensor selection based on information-theoretic criteria before gathering the data. For example, [1, 2, 3, 4] mainly focus on the problem of optimally selecting a subset of the available sensors. Among them, [3] exploits the marginal entropy of the sample, and [4] optimizes the mutual information between the unknown system state and the stochastic output samples to make the choice. In these algorithms the sensors are filtered before new measurements are gathered, so the choices have to be made based on some prior knowledge of the sensing target. It is also worth noting that the offline algorithms suffer from two serious drawbacks: a) high computational complexity, given that the combinatorial selection problem is NP-hard; b) poor adaptation to dynamic changes of the sensing target.
In contrast to the offline approach, another line of work reduces data redundancy in an online fashion through active sampling techniques [5, 6, 7, 8, 10, 11, 12, 13], in which the next sample location is computed from the previous measurements. Such active online sampling is of particular interest in field sensing and reconstruction problems because it adapts to unknown or dynamic field properties. Moreover, from the IoT perspective, it is extremely appealing because many real-time applications require the FC to process a significant amount of streaming sensor data online with low latency. For such applications, the aforementioned offline sampling approach is no longer suitable.
In fact, the idea of active sampling is not totally new. For example, [5] employed a recursive dynamic partition (RDP) based hierarchical approach called “backcasting” to reduce communication costs. [6, 7, 8] analytically demonstrated that such a sensing method achieves a faster convergence rate. Yet these works are restricted to the sensing of some specific non-parametric inhomogeneous fields. For more general field sensing problems, mobile robotic sensors were preferred [9, 10, 11, 12]. In these works, the key problem lies in how to steer the mobile sensor to the next sensing location in the field based on the information gathered so far. Specifically, for a non-parametric Gaussian process (GP) modeled field, information-theoretic criteria are often used to choose the next sensing location. For instance, using a single sensing robot, Suh et al. proposed an environmental monitoring navigation strategy that effectively maximizes the information gain along the robot's trajectory [9]. Considering a team of sensing agents, [10] proposed an adaptive sampling strategy that picks the next location by minimizing the uncertainty, i.e., the conditional entropy at the unobserved locations. As for a parametric field model, Popa et al. proposed an extended Kalman filter (EKF) [12, 13] based adaptive sampling approach to optimally estimate the parameters.
In summary, online algorithms aim to reduce the uncertainty in the knowledge of the field distribution. Despite their contributions, these active sampling approaches face challenges on multiple fronts. First, active sampling inevitably requires complex coordination and frequent communication between sensors [14], which induces extra communication cost and computational complexity. Second, robot-like sensors have constrained mobility, which confines the next sensing location to a limited region and degrades the overall convergence rate. Last but not least, the algorithms above are mostly task-specific and cannot be transferred to other field sensing tasks. When the field fluctuates or the task changes, they have to re-execute the entire sensing procedure, which is time-consuming and energy-inefficient. In the IoT paradigm, the current active sampling algorithms therefore incur an extra implementation burden, and may not be able to satisfy the requirements of fast and intelligent ambient sensing.
In this paper, we improve the performance of active sampling algorithms by exploiting the intrinsic interaction between communication and computation. Intuitively, communication provides additional data for more accurate computation, and in the meantime, computation has the potential to enable more selective sensing and more effective communication along the process. Hence, the two should be tightly integrated to develop an efficient sensing algorithm based on integrated communication and computation. In particular, with the help of online reinforcement learning and state-of-the-art meta-learning techniques, a robust two-layer learning and sensing algorithm, which adaptively determines the most informative sensing location, is presented. It consists of a stochastic-gradient-descent (SGD) based base-learner, which predicts the field model and aims to minimize the average prediction error, and a reinforcement meta-learner, which optimizes the sensing decision by simultaneously rewarding the error reduction achieved with the samples obtained so far and penalizing the corresponding communication cost. It significantly reduces the communication overhead and lays a good foundation for future sensing and fusion systems.
To summarize, the contributions of this paper are listed as follows:
- A two-layer sensing and learning framework based on meta-learning is proposed for field sensing and reconstruction problems. This two-layer meta-learning framework implies a smart explore-and-exploit strategy, which guides the sensing (exploration) by active learning (exploitation), and in turn improves the learning (exploitation) with effective sensing (exploration).
- An adaptive sensing algorithm based on the above two-layer meta-learning framework is presented. It actively determines the most informative sensing location, and thus considerably reduces the number of spatial samples and yields superior performance and robustness compared with conventional schemes.
- The convergence behaviour of the proposed algorithm is also comprehensively analyzed and simulated. The results reveal that for typical scenarios, the proposed field sensing algorithm significantly improves the convergence rate.
The rest of the paper is organized as follows. Section II describes the system model for the specific field sensing and reconstruction problem, and defines the main objective to be achieved. The adaptive two-layer meta-learning based sensing and learning framework is presented in Section III, where the algorithm design of the meta-learner and the base-learner is also discussed. The asymptotic performance of the proposed framework, including its convergence behavior, is analyzed in detail in Section IV. Section V describes the simulation settings and presents the comprehensive simulation results. Finally, Section VI concludes the paper and provides a brief discussion of future work.
II System Model and Problem Formulation
II-A System model
Suppose $N$ sensors are randomly deployed in a $d$-dimensional field to measure a scalar quantity determined by an unknown field function $f(\cdot)$, as shown in Fig. 1. The noisy measurement at the $i$-th sensor is thus given by
$$y_i = f(\mathbf{x}_i) + n_i, \qquad (1)$$
where $\mathbf{x}_i$ is the coordinate of the $i$-th sensor and $n_i$ is the Gaussian noise. An FC is deployed to collect the measurements from the potential sensors, based on which the field function is then reconstructed.
In general, an unknown field can be represented by a combination of parameterized basis (kernel) functions such as Radial Basis Functions (RBFs) [15], which capture the local characteristics of almost all nonlinear fields well and thus effectively approximate the whole field. Invoking the RBF kernel representation, the field function can be represented as
[TABLE]
where denotes the known RBF kernels and is the corresponding weight vector. In this paper, for useful insights and ease of treatment, we use the isotropic Gaussian kernel-weighted model and select as a Gaussian RBF with center and constant width . Specifically, we have
[TABLE]
where represents the squared Euclidean distance between sensor location and the -th kernel center . is empirically chosen and characterizes the locally-decaying speed of the spatial phenomenon. Intuitively, a larger potentially leads to a smoother field or vice versa. Given that is known and fixed, now the FC only needs to find a good estimation of the weight vector .
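The kernel-weighted field model above can be illustrated with a short NumPy sketch. Everything specific here is an assumption made for illustration only: the kernel is taken to be exp(-||x - c_k||^2 / (2 sigma^2)), and the names `rbf_features`, `field_value`, `centers`, `weights` and `noise_std` are hypothetical rather than taken from the paper.

```python
import numpy as np

def rbf_features(x, centers, sigma):
    """Return the (n_points, n_kernels) matrix of Gaussian RBF activations.

    Assumed kernel form: exp(-||x - c_k||^2 / (2 * sigma^2)).
    """
    x = np.atleast_2d(x)
    # Squared Euclidean distance between every location and every kernel center.
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def field_value(x, weights, centers, sigma):
    """Evaluate the kernel-weighted field model at the given locations."""
    return rbf_features(x, centers, sigma) @ weights

rng = np.random.default_rng(0)
centers = rng.uniform(0.0, 1.0, size=(5, 2))     # hypothetical kernel centers in a 2-D field
weights = np.array([1.0, 0.0, -0.5, 0.0, 2.0])   # hypothetical sparse weight vector
sensors = rng.uniform(0.0, 1.0, size=(100, 2))   # randomly deployed sensor coordinates
noise_std = 0.05                                 # assumed Gaussian noise level

# Noisy scalar measurement at each sensor: field value plus Gaussian noise.
measurements = field_value(sensors, weights, centers, sigma=0.2) \
    + noise_std * rng.normal(size=len(sensors))
```

Because the kernels and their width are fixed, the feature matrix can be computed once, and the FC only has to estimate the weight vector.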
II-B Field Reconstruction Based on Sensed Samples
Given enough sensed data, the optimal can be obtained by solving the following optimization problem which minimizes the overall loss function:
[TABLE]
where is the square error at location w.r.t. , is the average loss over training data samples with , and is the regularizer which ensures the sparsity of , and is the regularization parameter.
As a typical variant of stochastic approximation [16], the above L1-norm regularized optimization problem (4) can be recursively solved by a well known method called proximal gradient descent, which can be described by the following update rule for [17]:
[TABLE]
where is the initial weight vector. Defining the proximal operator as , the above rule means that the -th iterate satisfies:
[TABLE]
where denotes an estimate of the gradient at Step , and is the learning rate. Further, given , the proximal mapping is the following shrinkage operation, also known as the “soft-threshold operator”:
[TABLE]
with .
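As a minimal sketch of the shrinkage step, the soft-threshold operator and one proximal gradient update can be written as follows. The function names and the argument names `lr` (learning rate) and `lam` (regularization parameter) are chosen here for illustration.

```python
import numpy as np

def soft_threshold(u, tau):
    """Elementwise soft-thresholding: shrink each entry towards zero by tau."""
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def proximal_step(w, grad, lr, lam):
    """One proximal gradient update for an L1-regularized objective.

    A plain gradient step is followed by the shrinkage operation with
    threshold lr * lam, which sets small weights exactly to zero.
    """
    return soft_threshold(w - lr * grad, lr * lam)

shrunk = soft_threshold(np.array([3.0, -0.5, -2.0]), 1.0)   # entries become 2, 0, -1
```

The shrinkage is what produces sparse weight vectors: any coordinate whose magnitude after the gradient step is below the threshold is zeroed.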
To calculate , one commonly appeals to batch gradient descent (BGD) or stochastic gradient descent (SGD), as in traditional statistical learning approaches. Since BGD requires the whole dataset to estimate the gradients (i.e., for Step ), while SGD only uses the data sensed by a randomly selected sensor (i.e., for Step ), it is generally less efficient for BGD to be applied to the studied scenario considering its larger overall communication cost and delay needed for the reconstruction. As such, we mainly focus on the SGD approach in this paper, which is illustrated in Fig. 2. Specifically, the environment corresponds to the field which generates data samples at each sensor. In each step, after a new data is sensed, the gradients are calculated based on the previous weight vector and are then used to produce the next one.
II-C How to sense efficiently: uniform or non-uniform sampling?
Because the parameters of the field are largely unknown, reconstructing the field at the FC from purely randomly selected data (as in vanilla SGD) generally still requires a large dataset and thus consumes plenty of communication and computing resources, which makes field sensing and reconstruction costly and practically inefficient. One intuitive remedy is to explore and exploit the most informative data samples from all potential sensors based on the samples already observed, i.e., to let the field sensing be driven by statistical learning and prediction so as to reduce the overall cost of sensing and communication.
In other words, to enable fast and accurate reconstruction, it is rather crucial to do efficient selective sensing / sampling, i.e., the sampling distribution in the -step SGD process as illustrated in Fig. 2 should be carefully designed. Note that for vanilla SGD, the sampling distribution for Step , are in fact constant, or namely, . By using such a uniform sampling, on one hand, fast startup could be achieved since it does not need all the data to be ready in advance. On the other hand, only single derivative is calculated thus the per iterate computational cost is reduced to in comparison with BGD. However, imaginably, the purely random sampling also inevitably introduces larger data redundancy due to the intrinsic spatial correlation of the field, as well as poorer convergence due to the large deviation of with the index .
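For reference, the uniform-sampling baseline discussed above can be sketched as proximal SGD in which every step queries one sensor chosen uniformly at random. This is an illustrative sketch only: the per-sample squared-error loss, the one-dimensional synthetic field, and all numeric settings are assumptions, not the paper's experimental setup.

```python
import numpy as np

def soft_threshold(u, tau):
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def uniform_prox_sgd(Phi, y, steps, lr, lam, rng):
    """Proximal SGD where each step uses one uniformly sampled sensor."""
    n, k = Phi.shape
    w = np.zeros(k)
    for _ in range(steps):
        i = rng.integers(n)                          # every sensor has probability 1/n
        grad = 2.0 * (Phi[i] @ w - y[i]) * Phi[i]    # gradient of the squared error at sensor i
        w = soft_threshold(w - lr * grad, lr * lam)
    return w

# Synthetic 1-D field built from five Gaussian kernels (hypothetical values).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=200)
centers = np.linspace(0.0, 1.0, 5)
Phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * 0.15 ** 2))
w_true = np.array([1.0, 0.0, -0.5, 0.0, 2.0])
y = Phi @ w_true + 0.05 * rng.normal(size=200)

w_hat = uniform_prox_sgd(Phi, y, steps=2000, lr=0.1, lam=1e-3, rng=rng)
initial_mse = float(np.mean(y ** 2))                 # error of the all-zero model
final_mse = float(np.mean((Phi @ w_hat - y) ** 2))
```

Each iteration touches a single measurement, so the per-step cost is low, but nothing prevents the sampler from repeatedly visiting sensors in regions the model already predicts well.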
One recent promising approach to improve the SGD performance is to incorporate adaptive non-uniform sampling with it. As shown in many recent studies (see [17, 18] and references therein), sampling at a probability distribution in proportion to the relative importance of a data sample with respect to the entire dataset, as represented by its relative norm
[TABLE]
can achieve certain optimum in terms of efficiency and prediction error. Unfortunately, such importance-based sampling has to evaluate the gradients based on the entire dataset and relies on the full knowledge of the target model, which is largely unknown until the sensors are really chosen to sense and send its data to the FC.
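A small sketch of importance-based sampling, under the assumption that the importance of a sample is the norm of its per-sample gradient for a squared-error loss on a linear-in-weights model. The helper names are hypothetical.

```python
import numpy as np

def per_sample_gradient_norms(w, Phi, y):
    """Norm of the squared-error gradient at each sample.

    The gradient at sample i is 2 * (phi_i . w - y_i) * phi_i, so its norm
    is 2 * |residual_i| * ||phi_i||.
    """
    residual = Phi @ w - y
    return 2.0 * np.abs(residual) * np.linalg.norm(Phi, axis=1)

def importance_distribution(norms):
    """Sampling probabilities proportional to the given gradient norms."""
    total = norms.sum()
    if total == 0.0:
        # All gradients vanish: fall back to the uniform distribution.
        return np.full(len(norms), 1.0 / len(norms))
    return norms / total

norms = np.array([0.0, 1.0, 3.0])
p_is = importance_distribution(norms)   # probabilities 0, 0.25, 0.75
```

Note that `per_sample_gradient_norms` needs the measurement of every sample it scores, which is exactly the practical obstacle described above: the FC does not have those measurements until the sensors have been queried.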
Therefore, it is desirable to develop some adaptive sampling algorithm which is able to exploit the information incrementally gathered so far while keeping the capability to explore the unknown portion of the field as the algorithm proceeds. Motivated by this, we enforce the following Markovian greedy sampling scheme:
[TABLE]
where denotes the fixed uniform sampling distribution as used in vanilla SGD, is the importance-based sampling distribution varying with as defined in (8) and calculated over the data sampled so far, i.e., , and is a tuning parameter. The rationale behind the above equation can be interpreted as follows: the resultant sampling distribution is a mixture of two laws of distribution that stand for two complementary tendencies of sampling strategy, respectively. The former tends to fully explore the unknown field while the latter tends to effectively exploit the gathered information so far. As such, the above sampling process can be regarded to work in two different states, referred to as “exploration” and “exploitation”, respectively, as depicted in the Markov chain in Fig.3.
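The mixture of the two sampling laws can be sketched as follows. Which of the two distributions the tuning parameter multiplies is an assumption here: `alpha` is taken as the probability of the exploration (uniform) state, and the function names are hypothetical.

```python
import numpy as np

def mixed_distribution(p_uniform, p_importance, alpha):
    """Convex mixture of the uniform and the importance-based distributions."""
    return alpha * p_uniform + (1.0 - alpha) * p_importance

def sample_two_state(p_uniform, p_importance, alpha, rng):
    """Equivalent two-state view: pick a state first, then sample within it."""
    explore = rng.random() < alpha
    p = p_uniform if explore else p_importance
    state = "exploration" if explore else "exploitation"
    return int(rng.choice(len(p), p=p)), state

rng = np.random.default_rng(2)
p_u = np.full(4, 0.25)
# Importance weights exist only for sensors sampled so far (here sensors 0 and 2).
p_is = np.array([0.7, 0.0, 0.3, 0.0])
p_mix = mixed_distribution(p_u, p_is, alpha=0.4)
index, state = sample_two_state(p_u, p_is, alpha=0.4, rng=rng)
```

Since the importance-based part only covers sensors already observed, the uniform part is what keeps a non-zero probability on unobserved sensors.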
III Two-layer Learning and Sensing Algorithm
The Markov chain based sampling framework in the previous section realizes a trade-off between exploration and exploitation, yet it is not smart enough to avoid all redundant or less significant samples: when the weighting factor is fixed, uniform random sampling always persists, even if the number of samples is large enough and convergence is being approached.
As a remedy, we can use a time-varying weight to tune the proportion of the two distributions for a more flexible trade-off between exploration and exploitation. To this end, we resort to a recently emerged idea in machine learning known as “learn to learn” or “meta-learning” [19, 20, 21, 22], which enables a learning machine to learn its own learning process so as to learn more intelligently. Based on it, we propose the so-called “learn to sense” framework, which enables smarter online adjustment of the sensing strategy and faster field reconstruction with much less sensing / communication effort.
III-A Two-layer learning and sensing framework
In particular, we propose a two-layer learning and sensing framework, which includes a conventional SGD-based base learner and a high-level reinforcement-learning-based meta-learner, as shown in Fig. 4. The meta-learner aims to generate an optimal sampling policy that minimizes the “meta-loss” (or equivalently, maximizes the overall reward) which serves as an integrated measure of both the processing gain of the base learner and the sensing / communication cost of the environment, which is defined as follows:
[TABLE]
In the above equation, the first term in the brackets quantifies the increment of average loss (prediction error) at Step w.r.t. the time of start, while the second term denotes the overall sensing / communication cost paid for the sampled dataset with a unit price of . Note that the first term is in general negative and tends smaller as the algorithm proceeds.
In the above two-layer meta-learning framework, the iterative interaction between the meta-learner and the base learner is crucial to obtain the optimal sampling policy as well as the desired reconstruction performance. A reinforcement learning algorithm based on partially-observable Markov decision process (POMDP) is employed for this purpose with details described below.
III-B Meta-learner and base-learner design
The basic algorithms of the meta-learner and the base-learner are designed in the following. As shown in Fig. 4, this two-layer learner can be further transformed into a POMDP-based policy training machine. The basic factors of the related tuple are defined in detail as follows:
- State: the entire dataset together with the status of each data point at the current time, indicating whether it has been observed or not. Note that the fusion center observes the field incrementally, so only part of the data is known to it at any time.
- Observation: the set of data observed up to the current time.
- Action: the decision made by the meta-learner at the current time, here the index of the next sensor location to be sampled.
- Policy: the sampling policy at the current time, here the conditional probability distribution over the action space given the currently observed measurements.
- Reward: the value that the previous action gains at the current state. Invoking (10), it combines the “Loss Gap”, i.e., the prediction error reduced by the model update, with the unit cost induced by extra sensing / communication applied to the set of sensor indices corresponding to the observed data.
The system works in an iterative manner. In iteration , the agent activates the -th sensor to get its data sample according to the current sampling policy . The sampled data is then passed to the base-learner which makes the prediction based on SGD and results in an updated model . With the set of sensed data or the field information partially observed by the agent grown from to , a state transition occurs, which triggers an update of the reward to . With the new observation and reward , the agent generates a new sampling policy for the next iteration.
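One iteration of the loop described above can be sketched as below. This is an illustration under stated assumptions, not the paper's implementation: the reward is taken as the loss reduction minus a unit cost times the number of sensors observed so far, the base-learner update is a plain SGD step (the shrinkage step is omitted for brevity), and all names are hypothetical.

```python
import numpy as np

def reward(loss_before, loss_after, n_observed, unit_cost):
    """Assumed reward: loss gap minus the sensing / communication cost."""
    return (loss_before - loss_after) - unit_cost * n_observed

def mean_sq_loss(w, Phi, y):
    return float(np.mean((Phi @ w - y) ** 2))

def sense_step(w, Phi, y, observed, policy, lr, unit_cost, rng):
    """Sample a sensor from the policy, update the model, and score the step."""
    i = int(rng.choice(len(y), p=policy))     # action: index of the next sensor
    observed = observed | {i}                 # the observation set grows
    idx = sorted(observed)
    loss_before = mean_sq_loss(w, Phi[idx], y[idx])
    grad = 2.0 * (Phi[i] @ w - y[i]) * Phi[i]
    w_new = w - lr * grad                     # base-learner update (plain SGD here)
    loss_after = mean_sq_loss(w_new, Phi[idx], y[idx])
    return w_new, observed, reward(loss_before, loss_after, len(observed), unit_cost)

rng = np.random.default_rng(3)
Phi = rng.uniform(0.0, 1.0, size=(20, 3))     # hypothetical feature matrix
y = Phi @ np.array([1.0, 0.0, -1.0])
policy = np.full(20, 1.0 / 20)                # start from the uniform policy
w1, obs1, r1 = sense_step(np.zeros(3), Phi, y, set(), policy,
                          lr=0.1, unit_cost=0.01, rng=rng)
```

In the full scheme, the reward returned here would be fed back to the meta-learner, which then produces the sampling policy for the next iteration.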
Now let us elaborate more on the meta-learner which tries to learn the optimal sampling policy with a goal maximizing the expected reward accumulated over time, i.e.,
[TABLE]
in which the expectation is taken over the sequence of states and actions . And for effectiveness and learnability, is often supposed to be within a pmf family parameterized by as follows:
[TABLE]
where refers to the fixed uniform sampling distribution over the whole dataset, while is the importance sampling distribution associated with the dataset varying over time and is defined as follows:
[TABLE]
where
[TABLE]
Moreover, is the dynamic weight varying as the learning proceeds. From (12) we can see that the optimal , i.e., , can be solely determined from . Therefore, the goal reduces to finding the best parameterized non-linear mapping from to . For simplicity, a three-layer neural network [22] is chosen to model this non-linear mapping, but note that other deep neural networks can also be used. As shown in Fig. 5, the input layer of the three-layer neural network includes the following features:
[TABLE]
In which denotes the average loss associated with the already sampled sensors in and the current model . In particular, indicates whether is directly sampled from the existing observed dataset or not, shows how much weighs on the model estimation in the previous step, whereas indicates how much it does on the current step, and finally indicates the relative loss change after the model update. The hidden units in the middle layer are activated by , whereas are used in the output layer to ensure .
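A minimal sketch of such a three-layer network is given below. The specific choices are assumptions made for illustration: a tanh hidden activation, a sigmoid output that keeps the dynamic weight strictly between 0 and 1, four input features, and the names `init_params` and `mixing_weight`.

```python
import numpy as np

def init_params(n_features, n_hidden, rng):
    """Small random initialization of a one-hidden-layer network."""
    return {
        "W1": 0.1 * rng.normal(size=(n_hidden, n_features)),
        "b1": np.zeros(n_hidden),
        "w2": 0.1 * rng.normal(size=n_hidden),
        "b2": 0.0,
    }

def mixing_weight(features, params):
    """Map the input features to a scalar weight in the open interval (0, 1)."""
    hidden = np.tanh(params["W1"] @ features + params["b1"])   # assumed hidden activation
    z = params["w2"] @ hidden + params["b2"]
    return 1.0 / (1.0 + np.exp(-z))                            # sigmoid output

rng = np.random.default_rng(4)
params = init_params(n_features=4, n_hidden=8, rng=rng)
features = np.array([1.0, 0.3, 0.5, -0.2])                     # hypothetical feature vector
alpha_t = mixing_weight(features, params)
```

The output is then used as the time-varying weight that mixes the uniform and the importance-based sampling distributions.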
Based on (12), the original policy optimization problem (11) reduces to
[TABLE]
where is the state-action value function. Although is non-differentiable w.r.t. ,
[TABLE]
where an episode is considered as a trajectory , and is the cumulative reward obtained from one episode. By summing over the expected rewards at all steps, the above equation can be rewritten as:
[TABLE]
Invoking the famous Monte-Carlo policy gradient algorithm REINFORCE [23] (see Algorithm 1), the above can be empirically estimated as:
[TABLE]
Here is the sampled estimation of from one episode execution of the current sampling policy : , in which is the instantaneous reward at step and is the discount factor.
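The Monte-Carlo estimate can be sketched in two small helpers: one computes the discounted return at every step of an episode, the other forms the REINFORCE gradient estimate as the return-weighted sum of the gradients of the log-policy. The array layout (one row of log-policy gradients per step) is an assumption of this sketch.

```python
import numpy as np

def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each step t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def reinforce_gradient(grad_log_probs, returns):
    """Single-episode REINFORCE estimate: sum over steps of G_t * grad log pi(a_t | o_t).

    grad_log_probs has shape (n_steps, n_params).
    """
    return (returns[:, None] * grad_log_probs).sum(axis=0)

returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)   # 1.75, 1.5, 1.0
grads = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])     # hypothetical log-policy gradients
estimate = reinforce_gradient(grads, returns)              # 2.75, 2.5
```

The policy parameters are then moved a small step along this estimate (gradient ascent on the expected return); averaging the estimate over several episodes reduces its variance.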
IV Asymptotic Performance Analysis
In this section, we provide an asymptotic behavior and performance analysis of the proposed algorithm.
IV-A Preliminaries
Here, we briefly introduce some key definitions and propositions that are used in the remainder of this section.
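Definition 1: A function $f$ is $L$-Lipschitz if for all $\mathbf{u}, \mathbf{v}$ in its domain we have
$$\|f(\mathbf{u}) - f(\mathbf{v})\| \le L \|\mathbf{u} - \mathbf{v}\|,$$
where $\|\cdot\|$ is a norm.
Definition 2: A function $f$ is $\beta$-smooth if it is differentiable and its gradient is $\beta$-Lipschitz.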
Definition 3: A function is -strongly convex if for all we have
[TABLE]
Based upon the above definitions, we give the following lemma.
Lemma 1: i) is -smooth for , ii) is -strongly convex.
Proof:
Recall that , thus the gradient equals
[TABLE]
Due to the Gaussian kernel-based model assumption, we have . Then for all , the following inequality holds.
[TABLE]
Obviously, the gradient is -Lipschitz constant, therefore is -smooth.
To show is -strongly convex, it is necessary to prove the following inequality for all .
[TABLE]
With the substitution , , and by setting , we have
[TABLE]
Summing over , and invoking directly yields (27) and completes the proof. ∎
IV-B Main results
We now turn to the analysis of the asymptotic performance of the proposed sensing algorithm.
Define the optimal solution, and introduce the sampling probability
[TABLE]
where is the ideal sampling distribution defined by (8), and is the optimal weight parameter learned by our policy training scheme.
Lemma 2: Denote and the Hessian at point , and set , then we have:
1. The sequence converges to a zero-mean Gaussian variable , where the covariance matrix is the solution to the following Lyapunov equation
[TABLE]
2. The sequence converges to a random variable where is a Gaussian vector . The mean of is .
Proof:
Note that the above lemma is the direct result of [24], by utilizing Lemma 1 and the second-order delta-method. Detailed proof is omitted here due to lack of space. ∎
Intuitively, the optimal fixed importance-based sampling distribution in [17], should minimize the mean value , that is,
[TABLE]
Set , the following theorem directly follows from simple standard algebra and some necessary reorganization.
Theorem 1: Let be the asymptotic covariance matrix defined in Lemma 2. Then,
[TABLE]
Remark: Theorem 1 implies that the normalized error sequence is strictly bounded and, more importantly, that the asymptotic performance of the proposed algorithm is comparable with that of the best sampling distribution, provided that is close enough to zero. This, however, is not expected to happen in the early stage of the online algorithm, since the system needs a non-zero to explore the field. In contrast, as , all the samples have been collected, so can ideally be set to zero, which makes with probability 1.
In this sense, since (9) is a mixture of the uniform and the non-uniform sampling laws, it is safe to argue that its performance falls in between the two. The lower bound is associated with uniformly sampled vanilla SGD, which has already been shown to achieve a convergence rate of [25]. The upper bound corresponds to the non-uniform sampling defined by (8), whose optimality is shown below.
Theorem 2: When the non-uniform sampling probability satisfies (8), it maximizes the reduction of the objective value defined by the RHS of (4).
Proof:
Recall that for proximal SGD with non-uniform sampling, where denotes the sampling probability of the -th sample at step , the update rule is written as:
[TABLE]
By setting the derivative of optimization function above as zero, we can easily obtain the following implicit solution:
[TABLE]
Invoking Lemma 1, and setting , we have
[TABLE]
In addition, since is convex,
[TABLE]
Combining the above two inequalities, we can get the reduction on the objective function, bounded as:
[TABLE]
Therefore, in order to maximize the reduction on the objective value, it is straightforward that the third term in (37) should be minimized, so that the ideal choice of turns out to be
[TABLE]
∎
Although the above optimality cannot be achieved immediately, it can be gradually approximated using the accumulated samples in our proposed scheme.
Corollary 1: The proposed meta-learning based algorithm performs better if samples located at steep slopes of the original field are collected sooner.
Proof:
First, we write
[TABLE]
Since for all , (38) can be approximated by:
[TABLE]
Therefore, it is straightforward that the ideal sampling distribution is approximately determined by the distribution of residual error, over the entire field. By taking derivative of w.r.t , we have
[TABLE]
where stands for the field estimation at step .
The above equation reveals that the shape of the error function at step , is well captured by the difference between the first-order derivative of the original field and that of the time-varying estimated field . In general, it is desirable to quickly build the estimate of , using the samples accumulated over steps, so as to converge to the ideal as soon as possible. Thus, larger are preferred. Further, we argue that will gradually increase from [math] to for all during the SGD process. This implies that the proposed algorithm performs better if sensors located at steeper slopes of the original field are selected sooner, since they provide more information about the real error function and thus contribute more to subsequent learning. ∎
Other than the stochastic sampling behavior, the characteristics of the original unknown field also has a great impact on the performance of the proposed algorithm.
Here we introduce , normalized by .
Theorem 3: Compared to uniform sampling in vanilla SGD, the proposed meta-learning based algorithm improves the convergence rate if .
Proof:
The proof is motivated by steps in [17].
[TABLE]
Invoking Lemma 2 where is 2-strongly convex, the first term on the right-hand side in the above equation satisfies
[TABLE]
Next, due to the convexity of , we have
[TABLE]
Substituting (43) and (44) into (42) yields
[TABLE]
Taking the expectation of both sides, we obtain
[TABLE]
Summing the above inequality over and using , gives rise to the following:
[TABLE]
Further, since
[TABLE]
which shows that each is upper-bounded by . Accordingly, . Together with (47) we write
[TABLE]
As such, we can view vanilla SGD with uniform sampling as a special case where , and the distribution , so the above inequality becomes
[TABLE]
Taking the ratio between (50) and (49) yields
[TABLE]
which implies an improvement in the convergence rate, especially when . ∎
Remark: Theorem 3 indicates that the performance gain provided by the proposed algorithm is more pronounced when the field contains fewer but more distinctive features. For better illustration, we plot three truncated windows of the same length in the one-dimensional case. The X-axis represents the one-dimensional location and the Y-axis the corresponding field value. Note that the peaks within are all set to the same height, so fields (a) and (b) differ only in the number of features, while fields (b) and (c) differ in the shape of the feature, i.e., the spread. Comparing s of each field, obviously, , hence the proposed algorithm tends to gain the most in field (b).
V Experiments
In this section, we evaluate the performance of the proposed algorithm and compare it with the conventional ones in terms of convergence properties and communication cost.
V-A Experiment Setup
The simulation is conducted for a WSN with distributedly deployed sensors measuring some unknown environmental quantity. Practically, it can stand for typical environment monitoring scenarios in WSN-based IoT. For instance, the indoor/outdoor temperature field of a residence needs to be estimated through deployed sensors, in order to enable a variety of “Smart Home” applications, such as automated air-conditioning, floor heating, etc.
V-A1 Task Generation
For simplicity and clearer insight, we consider only the one-dimensional scenario for illustration. The extension to two or more dimensions is direct.
The 1-D spatial domain is confined to , and the target field function is assumed to be a weighted sum of potential Gaussian kernels with equally spaced centers and identical width . To introduce a certain degree of sparsity as well as randomness into the unknown field, out of entries of the parameter vector are randomly chosen to be nonzero i.i.d. Gaussian variables. Intuitively, a larger leads to a more complex and fluctuating field.
Next, to simulate the sensing scenario, we assume that a total of sensors are randomly deployed throughout the field, and each sensor gets its observation according to , where is zero-mean Gaussian noise with variance . The FC then begins to access these sensors one at a time. At each access, the FC collects an observation and stores it in the buffer. Meanwhile, the sampled observations are used to train the field model.
As explained in Section II, this particular field reconstruction problem corresponds to finding the optimal parameter satisfying:
[TABLE]
based on all potential measurements .
The above field is generated 200 times, and the above procedure is run once for each generated field. The resulting 200 tasks are then used to meta-train an optimal sampling policy in the sequel.
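The task generation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the unit interval as the spatial domain, the specific Gaussian kernel form, and every numeric default (number of kernels, number of active kernels, width, number of sensors, noise level) are hypothetical placeholders, since the corresponding values are not recoverable from the text. Only the count of 200 tasks comes from the setup above.

```python
import numpy as np

def kernel_features(x, centers, width):
    """Gaussian kernel features, one column per kernel center (assumed form)."""
    x = np.atleast_1d(x)
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))

def generate_task(rng, num_kernels=20, num_active=4, width=0.05,
                  num_sensors=200, noise_std=0.05):
    """Draw one sparse Gaussian-kernel field and its noisy sensor observations."""
    centers = np.linspace(0.0, 1.0, num_kernels)        # equally spaced centers
    weights = np.zeros(num_kernels)
    active = rng.choice(num_kernels, size=num_active, replace=False)
    weights[active] = rng.normal(size=num_active)        # i.i.d. Gaussian nonzeros
    locations = rng.uniform(0.0, 1.0, size=num_sensors)  # random sensor deployment
    clean = kernel_features(locations, centers, width) @ weights
    observations = clean + rng.normal(scale=noise_std, size=num_sensors)
    return centers, weights, locations, observations

rng = np.random.default_rng(0)
tasks = [generate_task(rng) for _ in range(200)]         # 200 meta-training tasks
```

Increasing `num_active` in this sketch produces a field with more bumps, which matches the intuition that more nonzero weights give a more complex field.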
V-A2 Strategies
Due to the real-time processing nature of the above task, the Proximal SGD mechanism in (5) is leveraged to solve the problem in (52), where the gradient at each step is computed from an individual observation, and the learning rate and the penalty factor are set to and 0.08, respectively. On this basis, the only factor that influences the task performance is the sampling strategy used along the process. Here we compare the proposed meta-learning based sampling with its conventional counterpart, uniform sampling, by applying them to the above SGD framework. In detail, the two are described below:
- Meta-based Sampling: The optimal sampling policy based on the proposed two-layer learning and sensing algorithm, meta-trained upon the 200 tasks generated above. The meta-training process on a particular task is listed in Algorithm 2. To avoid over-fitting, we freeze the parameter after episodes of training on a single task, and then evaluate its performance on the other tasks. We choose the policy that achieves the best expected reward as the final optimal policy. Fig. 7 qualitatively compares the field reconstruction performance before and after policy optimization. The X-axis represents the one-dimensional spatial location, , and the Y-axis denotes the corresponding field value at each location. As shown, with the red solid line being the originally generated field, the blue curve, representing the field reconstruction after policy optimization, is much more accurate than that before optimization, yet it incurs a much lower communication cost (i.e., fewer samples). More interestingly, most of the sensed samples (the green crosses in the figure) lie near abrupt changes of the curve, such as peaks or valleys, which capture the most critical features of the field, whereas the FC allocates less sensing effort to the smooth regions.
- Uniform Sampling: The Proximal SGD algorithm with uniform sampling, i.e., is randomly picked from .
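For concreteness, the uniform-sampling baseline can be sketched as below. The sketch assumes an L1 penalty handled by soft-thresholding and a squared-error loss per observation; the penalty type, the learning rate of 0.1, and the toy data are assumptions made only for illustration, while the penalty factor 0.08 is the value stated above. It also counts how many distinct sensors were accessed, as a rough proxy for communication cost.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_1 (assumed penalty type)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_sgd_uniform(features, targets, steps, lr=0.1, penalty=0.08, seed=0):
    """Proximal SGD on squared error with uniform sensor sampling."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    w = np.zeros(d)
    accessed = set()
    for _ in range(steps):
        i = int(rng.integers(n))                 # uniform pick over the n sensors
        accessed.add(i)
        residual = features[i] @ w - targets[i]
        grad = residual * features[i]            # gradient of 0.5 * residual**2
        w = soft_threshold(w - lr * grad, lr * penalty)
    return w, len(accessed)

# Hypothetical toy field: two active Gaussian kernels out of ten.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=200)
centers = np.linspace(0.0, 1.0, 10)
features = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * 0.1 ** 2))
true_w = np.zeros(10)
true_w[[2, 7]] = [1.5, -1.0]
targets = features @ true_w + rng.normal(scale=0.05, size=200)

w_hat, num_accessed = proximal_sgd_uniform(features, targets, steps=2000)
mse_init = float(np.mean(targets ** 2))
mse_final = float(np.mean((features @ w_hat - targets) ** 2))
print(mse_init, mse_final, num_accessed)
```

Under uniform sampling nearly every sensor ends up being accessed once the number of steps is several times the number of sensors, which is the redundancy that an adaptive policy aims to avoid.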
V-B Experiment Results
We directly apply the meta-trained policy, as well as the uniform sampling strategy, to 500 testing tasks with various , i.e., to cases where the field has changed to become either more fluctuating or smoother. From Fig. 8(a) to Fig. 8(c), it is observed that the meta-learner is capable of providing the base-learner with more crucial and informative training samples instead of redundant ones, thus yielding much better reconstruction performance.
In terms of communication cost, as shown in Fig. 9, the number of samples sensed under the proposed meta-learning based sampling policy grows much more slowly over time than that under uniform sampling (note that the benchmark importance sampling always needs to evaluate all samples over the field). Moreover, it gradually converges to some upper bounds that increase with , indicating that no more samples are needed to meet a certain fusion accuracy. This can be interpreted as the meta-learning based sampling policy enabling the FC to adaptively shift between exploration and exploitation. The former explores the unobserved portion of the field, whereas the latter makes use of the existing samples without inducing extra communication cost. In this way, the FC "intelligently" decides whether the number of samples is sufficient. Intuitively, the number of samples increases for a more complex field with larger , in order to guarantee the required reconstruction performance.
We further compare the convergence rates of the different sampling strategies by evaluating the averaged mean squared error (MSE), i.e., the first loss component in (52), at each iteration step. Note that an additional strategy, known as "Importance-based Sampling" [17], is also included here for reference. Specifically, it samples ideally according to importance over the entire dataset (whether observed or not), as reflected by the norm of the gradient, namely,
[TABLE]
As shown in Fig. 10, on the one hand, the performance of the meta-based sampling scheme is upper-bounded by that of importance-based sampling, where the gap between them reflects the incremental learning process of gradually accumulating information about the field. In this sense, we conclude that meta-based sampling can reach a sub-optimal convergence, but without suffering the communication expense of collecting all the samples beforehand.
On the other hand, it beats the uniform sampling used in vanilla SGD with a faster average loss reduction and a lower prediction error and variance. Nonetheless, as shown in Fig. 11, this convergence advantage diminishes with increasing , i.e., the actual number of effective kernels, and/or , the kernel width. This is because a larger usually induces more drastic fluctuations in the field, while a larger means a smoother field. In both cases, the importance of all potential locations tends to become equal, which makes the meta-based sampling scheme gradually reduce to uniform sampling. This also verifies the remark on Theorem 3.
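The importance-based benchmark discussed above can be sketched as follows. The sketch assumes a linear-in-parameters field model with squared-error loss, so that the per-sample gradient norm is the absolute residual times the feature norm, and it rescales each step by 1/(n p_i) to keep the gradient estimate unbiased, as is standard for importance-sampled SGD. The function names, the learning rate, and the toy data are hypothetical. Note that computing the probabilities requires the observations of all sensors, which is exactly the communication expense the meta-based scheme avoids.

```python
import numpy as np

def importance_probs(features, targets, w, eps=1e-12):
    """p_i proportional to the norm of the per-sample squared-error gradient."""
    residuals = features @ w - targets
    norms = np.abs(residuals) * np.linalg.norm(features, axis=1) + eps
    return norms / norms.sum()

def importance_sgd_step(features, targets, w, lr, rng):
    """One SGD step with importance sampling and 1/(n p_i) reweighting."""
    n = features.shape[0]
    probs = importance_probs(features, targets, w)
    i = int(rng.choice(n, p=probs))
    grad = (features[i] @ w - targets[i]) * features[i]
    return w - lr * grad / (n * probs[i]), i

# Hypothetical toy field: two active Gaussian kernels out of ten.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=200)
centers = np.linspace(0.0, 1.0, 10)
features = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * 0.1 ** 2))
true_w = np.zeros(10)
true_w[[2, 7]] = [1.5, -1.0]
targets = features @ true_w + rng.normal(scale=0.05, size=200)

w = np.zeros(10)
mse_init = float(np.mean((features @ w - targets) ** 2))
for _ in range(500):
    w, _ = importance_sgd_step(features, targets, w, lr=0.1, rng=rng)
mse_final = float(np.mean((features @ w - targets) ** 2))
print(mse_init, mse_final)
```

When all gradient norms are nearly equal, `importance_probs` returns an almost flat distribution, which mirrors the observation that the advantage over uniform sampling fades for very rough or very smooth fields.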
VI Conclusion and Future Works
In this paper, we study the WSN-based field sensing and reconstruction problem. We establish a two-layer learning framework based on reinforcement learning, and present the detailed design of an adaptive sampling policy that can actively determine the most informative sensing location and thus significantly reduce the communication cost. Numerical results show that the algorithm brings a remarkable improvement in reconstruction performance and efficiency compared to conventional ones, and that it also exhibits good robustness to both information dynamics and the variation of field parameters. However, there are still many interesting problems left open on this topic.
For example, we aim to further enhance the online learning framework to make it more adaptive to dynamic changes of field features, and to quantify the tradeoff between the computational complexity of learning and the sensing and/or communication cost in this framework. More fundamentally, we want to derive closed-form results on how many samples are needed to reconstruct the field using this framework. We believe these problems are of particular importance and leave them as our future work.
References
- [1] S. Joshi and S. Boyd, “Sensor selection via convex optimization,” IEEE Trans. on Signal Processing, vol. 57, no. 2, pp. 451–462, 2009.
- [2] F. Ghassemi and V. Krishnamurthy, “Separable approximation for solving the sensor subset selection problem,” IEEE Trans. on Aerospace and Electronic Systems, vol. 47, no. 1, pp. 557–568, 2011.
- [3] P. Sebastiani and P. Henry, “Maximum entropy sampling and optimal Bayesian experimental design,” J. of the Royal Statistical Society, vol. 62, no. 1, pp. 145–157, 2000.
- [4] L. Paninski, Asymptotic Theory of Information-Theoretic Experimental Design. MIT Press, 2005.
- [5] R. Willett, A. Martin and R. Nowak, “Backcasting: Adaptive sampling for sensor networks,” in Proc. of Int. Symposium on Information Processing in Sensor Networks, 2004, pp. 124–133.
- [6] R. Nowak and U. Mitra, “Boundary estimation in sensor networks: Theory and methods,” IPSN, vol. 2634, pp. 80–95, 2003.
- [7] C. Rui, R. Willett and R. Nowak, “Faster rates in regression via active learning,” in Proc. of Int. Conf. on Neural Information Processing Systems, 2005, pp. 179–186.
- [8] C. Rui and R. Nowak, Active Learning and Sampling. Springer US, 2008.
