Learn to Sense: a Meta-learning Based Sensing and Fusion Framework for Wireless Sensor Networks

Hui Wu; Zhaoyang Zhang; Chunxu Jiao; Chunguang Li; and Tony Q.S. Quek

arXiv:1906.07233·eess.SP·June 19, 2019


TL;DR

This paper introduces a meta-learning based framework for adaptive sensing and data fusion in wireless sensor networks, significantly reducing data redundancy and communication costs while improving field reconstruction accuracy.

Contribution

It proposes a novel two-layer meta-learning framework combining SGD and reinforcement learning for adaptive sensing in WSNs, enhancing efficiency and robustness.

Findings

01

Reduces spatial samples needed for accurate field reconstruction

02

Improves convergence rate of sensing algorithms

03

Outperforms conventional sensing schemes in robustness

Abstract

Wireless sensor networks (WSNs) act as the backbone of Internet of Things (IoT) technology. In WSNs, field sensing and fusion are the most commonly seen problems, which involve collecting and processing a huge volume of spatial samples in an unknown field to reconstruct the field or extract its features. One of the major concerns is how to reduce the communication overhead and data redundancy at a prescribed fusion accuracy. In this paper, an integrated communication and computation framework based on meta-learning is proposed to enable adaptive field sensing and reconstruction. It consists of a stochastic-gradient-descent (SGD) based base-learner used for the field model prediction aiming to minimize the average prediction error, and a reinforcement meta-learner aiming to optimize the sensing decision by simultaneously rewarding the error reduction with samples obtained so far and…

Equations

$$y_{i}=f(x_{i})+n_{i},$$

$$f(x)=\Phi(x)\boldsymbol{\omega},$$

$$\phi_{j}(x)=\exp\left(-\frac{\|x-c_{j}\|^{2}}{\beta^{2}}\right),$$

$$\min_{\boldsymbol{\omega}\in\mathbb{R}^{K}}P(\boldsymbol{\omega}):=\underbrace{\frac{1}{n}\sum_{i=1}^{n}L_{i}(\boldsymbol{\omega})}_{L(\boldsymbol{\omega})}+\gamma\Gamma(\boldsymbol{\omega}),$$

$$\boldsymbol{\omega}_{0}\in\mathbb{R}^{K},\quad\boldsymbol{\omega}_{t+1}=\textrm{prox}_{\gamma\eta_{t}\Gamma}\left(\boldsymbol{\omega}_{t}-\eta_{t}\hat{\xi}_{t}(\boldsymbol{\omega}_{t})\right),$$

$$\boldsymbol{\omega}_{t+1}=\arg\min_{\boldsymbol{\omega}}\left\{\gamma\eta_{t}\Gamma(\boldsymbol{\omega})+\frac{1}{2}\left\|\boldsymbol{\omega}-\left(\boldsymbol{\omega}_{t}-\eta_{t}\hat{\xi}_{t}(\boldsymbol{\omega}_{t})\right)\right\|^{2}\right\},$$

$$\textrm{prox}_{\gamma\eta_{t}\Gamma}(\boldsymbol{\omega})=\textrm{sign}(\boldsymbol{\omega})\odot\left[|\boldsymbol{\omega}|-\gamma\eta_{t}\right]_{+},$$

$$p_{i}^{(t)}=\frac{\|\nabla L_{i}(\boldsymbol{\omega}_{t})\|}{\sum_{j=1}^{n}\|\nabla L_{j}(\boldsymbol{\omega}_{t})\|},$$

$$\boldsymbol{\pi}_{t}=\rho\boldsymbol{\pi}_{u}+(1-\rho)\boldsymbol{\pi}_{v,t},$$

$$L_{\mathrm{meta}}=\left[L(\boldsymbol{\omega}_{T})-L(\boldsymbol{\omega}_{0})\right]+\mu|O_{T}|.$$

$$\boldsymbol{\pi}_{t}^{*}=\arg\max_{\boldsymbol{\pi}_{t}}E\left[\sum_{t=0}^{T}R(s_{t},a_{t})\right],$$

$$\boldsymbol{\pi}_{t}(\Theta)=P_{f}(a_{t}|o_{t})\rho_{t}(o_{t},\Theta)+P_{v}(a_{t}|o_{t})\left(1-\rho_{t}(o_{t},\Theta)\right),$$

$$P_{v}(a_{t}=i|o_{t})=\frac{g_{i}(t)}{\sum_{j=1}^{n}g_{j}(t)},$$

$$g_{i}(t)=\begin{cases}\|\nabla L_{i}(\boldsymbol{\omega}_{t})\|, & i\in O_{t},\\ 0, & \text{otherwise}.\end{cases}$$

$$F_{1}=\mathbf{1}(a_{t-1}\in O_{t-1}),$$

$$F_{2}=\frac{L_{O_{t-1}}(\boldsymbol{\omega}_{t})-L_{O_{t-1}}(\boldsymbol{\omega}_{t-1})}{L_{O_{t-1}}(\boldsymbol{\omega}_{t-1})},$$

$$F_{3}=\frac{L_{O_{t}}(\boldsymbol{\omega}_{t})-L_{O_{t-1}}(\boldsymbol{\omega}_{t-1})}{L_{O_{t-1}}(\boldsymbol{\omega}_{t-1})},$$

$$F_{4}=\frac{|O_{t}|L_{O_{t}}(\boldsymbol{\omega}_{t})-|O_{t-1}|L_{O_{t-1}}(\boldsymbol{\omega}_{t-1})}{|O_{t-1}|L_{O_{t-1}}(\boldsymbol{\omega}_{t-1})},$$

$$\max_{\Theta}J(\Theta)=\max_{\Theta}E_{P_{\Theta}(a|s)}R(s,a),$$

$$\nabla J(\Theta)=\sum_{\tau}R(\tau)P(\tau|\Theta)\nabla\log P(\tau|\Theta),$$

$$\nabla_{\Theta}J=\sum_{t=1}^{T}E_{P_{\Theta}(a_{1:T}|s)}\left[R(s_{t},a_{t})\nabla_{\Theta}\log P_{\Theta}(a_{t}|s_{t})\right],$$

$$\sum_{t=1}^{T}\nabla_{\Theta}\log P(a_{t}|s_{t})v_{t}.$$

$$\|\phi(u)-\phi(v)\|\leq L\|u-v\|,$$

$$\phi(u)\geq\phi(v)+\nabla\phi^{T}(v)(u-v)+\frac{\sigma}{2}\|u-v\|^{2}.$$

$$\nabla L_{i}(\boldsymbol{\omega})=2\left(\Phi(x_{i})\boldsymbol{\omega}-y_{i}\right)\Phi^{T}(x_{i}).$$

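$$\begin{aligned}\|\nabla L_{i}(\boldsymbol{\omega})-\nabla L_{i}(\boldsymbol{\omega}^{\prime})\| &\leq 2\|\Phi(x_{i})\|\,\|\boldsymbol{\omega}-\boldsymbol{\omega}^{\prime}\|\,\|\Phi^{T}(x_{i})\|\\ &=2\|\Phi(x_{i})\|^{2}\|\boldsymbol{\omega}-\boldsymbol{\omega}^{\prime}\|\\ &\leq 2n\|\boldsymbol{\omega}-\boldsymbol{\omega}^{\prime}\|.\end{aligned}$$

$$L(\boldsymbol{\omega})\geq L(\boldsymbol{\omega}^{\prime})+\nabla L^{T}(\boldsymbol{\omega}^{\prime})(\boldsymbol{\omega}-\boldsymbol{\omega}^{\prime})+\|\boldsymbol{\omega}-\boldsymbol{\omega}^{\prime}\|^{2}.$$

$$\begin{aligned}L_{i}(\boldsymbol{\omega}^{\prime})+\nabla L_{i}(\boldsymbol{\omega}^{\prime})^{T}(\boldsymbol{\omega}-\boldsymbol{\omega}^{\prime})+\|\boldsymbol{\omega}-\boldsymbol{\omega}^{\prime}\|^{2} &=\tilde{\omega}^{\prime 2}+2\tilde{\omega}^{\prime}(\tilde{\omega}-\tilde{\omega}^{\prime})+\frac{1}{\|\Phi(x_{i})\|^{2}}\|\tilde{\omega}-\tilde{\omega}^{\prime}\|^{2}\\ &\leq\tilde{\omega}^{\prime 2}+2\tilde{\omega}^{\prime}(\tilde{\omega}-\tilde{\omega}^{\prime})+\|\tilde{\omega}-\tilde{\omega}^{\prime}\|^{2}\end{aligned}$$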


Full text

Learn to Sense: a Meta-learning Based Sensing and Fusion Framework for Wireless Sensor Networks

Hui Wu, Zhaoyang Zhang†, Chunxu Jiao, Chunguang Li, and Tony Q.S. Quek

Copyright (c) 20xx IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

A preliminary version of this work was presented at the 10th International Conference on Wireless Communications and Signal Processing (WCSP 2018) and was published in its Proceedings (DOI: 10.1109/WCSP.2018.8555926). Corresponding author: Zhaoyang Zhang (Email: [email protected]).

This work was supported in part by National Natural Science Foundation of China under Grant 61725104 and 61631003, and Huawei Technologies Co., Ltd, under Grant YBN2018115223.

H. Wu and C. Jiao were with College of Information Science and Electronic Engineering, Zhejiang University (ZJU), Hangzhou 310027, China, and are now with Huawei Technologies Co., Ltd, Shanghai, China. Z. Zhang and C. Li are with the College of Information Science and Electronic Engineering, Zhejiang University (ZJU), Hangzhou 310027, China. Tony Quek is with the ISTD Pillar, Singapore University of Technology and Design (SUTD). The authors are also with the ZJU-SUTD IDEA Center of Network Intelligence.

Abstract

Wireless sensor networks (WSNs) act as the backbone of Internet of Things (IoT) technology. In WSNs, field sensing and fusion are the most commonly seen problems, which involve collecting and processing a huge volume of spatial samples in an unknown field to reconstruct the field or extract its features. One of the major concerns is how to reduce the communication overhead and data redundancy at a prescribed fusion accuracy. In this paper, an integrated communication and computation framework based on meta-learning is proposed to enable adaptive field sensing and reconstruction. It consists of a stochastic-gradient-descent (SGD) based base-learner used for the field model prediction aiming to minimize the average prediction error, and a reinforcement meta-learner aiming to optimize the sensing decision by simultaneously rewarding the error reduction with samples obtained so far and penalizing the corresponding communication cost. An adaptive sensing algorithm based on the above two-layer meta-learning framework is presented. It actively determines the next most informative sensing location, and thus considerably reduces the number of spatial samples and yields superior performance and robustness compared with conventional schemes. The convergence behavior of the proposed algorithm is also comprehensively analyzed and simulated. The results reveal that the proposed field sensing algorithm significantly improves the convergence rate.

Index Terms:

Learn to sense, meta-learning, wireless sensor networks, field sensing and reconstruction, stochastic gradient descent (SGD), reinforcement learning.

I Introduction

The Internet of Things (IoT) is one of the most promising technologies to have emerged in recent decades. By connecting massive numbers of communication terminals, it provides ubiquitous access to almost everything in the world. Among IoT techniques, the wireless sensor network (WSN) is regarded as the backbone owing to its capability of collecting, storing, querying and understanding raw sensor data. For instance, the automated switching control of street lamps depends on monitoring light intensity via light sensors deployed across the region.

Advanced and intelligent WSN-based IoT has attracted considerable research interest. Generally, sensing and fusion are the two basic problems. In many applications such as environment monitoring, a huge volume of spatial samples must be collected and processed by the fusion center (FC) to extract field features or reconstruct the field. In such field sensing and reconstruction scenarios, rapid, accurate and efficient fusion often requires deploying either many less-capable sensors or fewer sensors, each capable of collecting many spatial samples, especially when the area of interest is relatively large. In either case, in addition to the hardware cost of the sensors and the computation cost at the FC, the intensive communication between the sensors and the FC has become one of the major concerns in system design and realization as spectrum and/or energy resources become scarce. In this regard, how to redesign the sensing, communication and computing processes of a WSN, so as to downsize the dataset and reduce the communication cost at a prescribed fusion accuracy, has attracted much attention from both academia and industry.

Much research effort has been devoted to the trade-off between energy consumption and data fusion accuracy for specific field sensing and reconstruction problems. The goal is to design energy-efficient algorithms that effectively reduce the communication cost while maintaining a desired sensing quality. One line of investigation conducts offline sensor selection based on information-theoretic criteria before gathering the data. For example, [1, 2, 3, 4] mainly focus on the optimal $k$-out-of-$n$ sensor selection problem. Among them, [3] exploits the marginal entropy of the sample, and [4] optimizes the mutual information between the unknown system state and the stochastic output samples to make the choice. In these algorithms the sensors are filtered before new measurements are gathered, so the choices have to be made based on prior knowledge of the sensing target. The offline algorithms also suffer from two serious drawbacks: a) high computational complexity, since the combinatorial selection problem is NP-hard; b) poor adaptation to dynamic changes of the sensing target.

In contrast to the offline approach, another line of work reduces data redundancy in an online fashion through active sampling techniques [5, 6, 7, 8, 10, 11, 12, 13], in which the next sample location is computed based on the previous measurements. Such active online sampling is of particular interest in field sensing and reconstruction problems because it adapts to unknown or dynamic field properties. Moreover, from the IoT perspective, it is extremely appealing because many real-time applications require the FC to process a significant amount of streaming sensor data online with low latency, a setting for which the aforementioned offline sampling approach is no longer suitable.

In fact, the idea of active sampling is not entirely new. For example, [5] employed a recursive dynamic partition (RDP) based hierarchical approach called “backcasting” to reduce communication costs, and [6, 7, 8] analytically demonstrated that such a sensing method achieves a faster convergence rate. Yet these works are restricted to the sensing of some specific non-parametric inhomogeneous fields. For more general field sensing problems, mobile robotic sensors were preferred [9, 10, 11, 12]. In these works, the key problem is how to steer the mobile sensor to the next sensing location in the field based on the information gathered so far. Specifically, for a field modeled by a non-parametric Gaussian process (GP), information-theoretic criteria are often used to choose the next sensing location. For instance, using a single sensing robot, Suh et al. proposed an environmental monitoring navigation strategy that maximizes the information gain along the robot's trajectory [9]. Considering a team of sensing agents, [10] proposed an adaptive sampling strategy that picks the next location by minimizing the uncertainty, i.e., the conditional entropy at the unobserved locations. For a parametric field model, Popa et al. proposed an extended Kalman filter (EKF) based adaptive sampling approach [12, 13] to optimally estimate the parameters.

In summary, online algorithms aim to reduce the uncertainty in the knowledge of the field distribution. Despite their contributions, these active sampling approaches face challenges on multiple fronts. First, active sampling inevitably requires complex coordination and frequent communication between sensors [14], which incurs extra communication cost and computational complexity. Second, robot-like sensors have constrained mobility, which confines the next sensing location to a limited region and degrades the overall convergence rate. Last but not least, the algorithms above are mostly task-specific and cannot be transferred to other field sensing tasks. When the field fluctuates or the task changes, they have to re-execute the entire sensing procedure, which is time-consuming and energy-inefficient. In the IoT paradigm, current active sampling algorithms therefore impose an extra implementation burden and may not satisfy the requirements of fast and intelligent ambient sensing.

In this paper, we improve the performance of active sampling algorithms by exploiting the intrinsic interaction between communication and computation. Intuitively, communication provides additional data for more accurate computation, while computation can enable more selective sensing and more effective communication along the process. Hence, the two should be tightly integrated to develop an efficient sensing algorithm. In particular, with the help of online reinforcement learning and state-of-the-art meta-learning techniques, a robust two-layer learning and sensing algorithm, which adaptively determines the most informative sensing location, is presented. It consists of a stochastic-gradient-descent (SGD) based base-learner used for the field model prediction aiming to minimize the average prediction error, and a reinforcement meta-learner aiming to optimize the sensing decision by simultaneously rewarding the error reduction with samples obtained so far and penalizing the corresponding communication cost. It significantly reduces the communication overhead and lays a foundation for future sensing and fusion systems.

To summarize, the contributions of this paper are listed as follows:

• A two-layer sensing and learning framework based on meta-learning is proposed for field sensing and reconstruction problems. This two-layer meta-learning framework implies a smart explore-and-exploit strategy, which guides the sensing (exploration) by active learning (exploitation), and in turn improves the learning (exploitation) with effective sensing (exploration).

• An adaptive sensing algorithm based on the above two-layer meta-learning framework is presented. It actively determines the most informative sensing location, and thus considerably reduces the number of spatial samples and yields superior performance and robustness compared with conventional schemes.

• The convergence behaviour of the proposed algorithm is also comprehensively analyzed and simulated. The results reveal that for typical scenarios, the proposed field sensing algorithm significantly improves the convergence rate.

The rest of the paper is organized as follows. Section II describes the system model for the specific field sensing and reconstruction problem, and defines the main objective to be achieved. The adaptive two-layer meta-learning based sensing and learning framework is presented in Section III, where the algorithm design of the meta-learner and the base-learner is also discussed. The asymptotic performance of the proposed framework, including its convergence behavior, is analyzed in detail in Section IV. Section V presents the simulation settings and comprehensive simulation results. Finally, Section VI concludes the paper and provides a brief discussion of future work.

II System Model and Problem Formulation

II-A System model

Suppose $n$ sensors are randomly deployed in a $d$-dimensional field to measure a scalar quantity determined by an unknown field function $f(x):\mathbb{R}^{d}\rightarrow\mathbb{R}$, as shown in Fig. 1. The noisy measurement at the $i$-th sensor is thus given by

$$y_{i}=f(x_{i})+n_{i}, \tag{1}$$

where $x_{i}\in\mathbb{R}^{d}$ is the coordinate of the $i$-th sensor, and $n_{i}$ is the Gaussian noise. An FC is deployed to collect the measurements from the potential sensors, based on which the field function $f(x)$ is then reconstructed.

In general, an unknown field can be represented by a combination of parameterized basis (kernel) functions such as Radial Basis Functions (RBFs) [15], which capture the local characteristics of almost all nonlinear fields well and thereby effectively approximate the whole field. Invoking the RBF kernel representation, $f(x)$ can be represented as

$$f(x)=\Phi(x)\boldsymbol{\omega}, \tag{2}$$

where $\Phi(x)=\left[\phi_{1}(x),\phi_{2}(x),...,\phi_{K}(x)\right]$ denotes the known RBF kernels and $\boldsymbol{\omega}=\left[\omega_{1},\omega_{2},...,\omega_{K}\right]^{T}$ is the corresponding weight vector. In this paper, for useful insights and ease of treatment, we use the isotropic Gaussian kernel-weighted model and select $\phi_{j}(x)$, $j=1,2,...,K$, as a Gaussian RBF with center $c_{j}$ and constant width $\beta$. Specifically, we have

$$\phi_{j}(x)=\exp\left(-\frac{\|x-c_{j}\|^{2}}{\beta^{2}}\right), \tag{3}$$

where $\|x-c_{j}\|^{2}$ represents the squared Euclidean distance between sensor location $x$ and the $j$-th kernel center $c_{j}$. The width $\beta$ is chosen empirically and characterizes the local decay rate of the spatial phenomenon. Intuitively, a larger $\beta$ leads to a smoother field, and vice versa. Given that $\Phi(x)$ is known and fixed, the FC only needs to find a good estimate of the weight vector $\boldsymbol{\omega}$.
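As an illustration of the field model, the following minimal sketch evaluates the Gaussian RBF features and the weighted sum $f(x)=\Phi(x)\boldsymbol{\omega}$. The function names, the three kernel centers, the width and the weights are hypothetical values chosen only for the example; they are not taken from the paper.

```python
import math

def rbf_features(x, centers, beta):
    """Return [phi_1(x), ..., phi_K(x)] with phi_j(x) = exp(-||x - c_j||^2 / beta^2)."""
    feats = []
    for c in centers:
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        feats.append(math.exp(-sq_dist / beta ** 2))
    return feats

def field_value(x, centers, beta, omega):
    """Evaluate f(x) = Phi(x) . omega for a weight vector omega of length K."""
    return sum(p * w for p, w in zip(rbf_features(x, centers, beta), omega))

# Hypothetical 2-D example: K = 3 kernel centers, width beta = 1.0.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
omega = [1.0, 0.0, -0.5]          # sparse weights, illustrative only
phi = rbf_features((0.0, 0.0), centers, beta=1.0)
```

A sensor located exactly at a kernel center gets a feature value of 1 for that kernel, and the value decays with distance at a rate set by `beta`.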

II-B Field Reconstruction Based on Sensed Samples

Given enough sensed data, the optimal $\boldsymbol{\omega}$ can be obtained by solving the following optimization problem which minimizes the overall loss function:

$$\min_{\boldsymbol{\omega}\in\mathbb{R}^{K}}P(\boldsymbol{\omega}):=\underbrace{\frac{1}{n}\sum_{i=1}^{n}L_{i}(\boldsymbol{\omega})}_{L(\boldsymbol{\omega})}+\gamma\Gamma(\boldsymbol{\omega}), \tag{4}$$

where $L_{i}(\boldsymbol{\omega})=\left(\Phi(x_{i})\boldsymbol{\omega}-y_{i}\right)^{2}$ is the squared error at location $x_{i}$ w.r.t. $\boldsymbol{\omega}$, $L(\boldsymbol{\omega})$ is the average loss over $n$ training data samples $\{D_{1},...,D_{n}\}$ with $D_{i}\triangleq(x_{i},y_{i})$, $\Gamma(\boldsymbol{\omega})=\|\boldsymbol{\omega}\|_{1}$ is the regularizer which promotes the sparsity of $\boldsymbol{\omega}$, and $\gamma>0$ is the regularization parameter.

As a typical variant of stochastic approximation [16], the above L1-norm regularized optimization problem (4) can be solved recursively by the well-known proximal gradient descent method, which is described by the following update rule for $t=1,2,...$ [17]:

$$\boldsymbol{\omega}_{0}\in\mathbb{R}^{K},\quad\boldsymbol{\omega}_{t+1}=\textrm{prox}_{\gamma\eta_{t}\Gamma}\left(\boldsymbol{\omega}_{t}-\eta_{t}\hat{\xi}_{t}(\boldsymbol{\omega}_{t})\right), \tag{5}$$

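where $\boldsymbol{\omega}_{0}$ is the initial weight vector. Defining the proximal operator as $\textrm{prox}_{h}(x)=\arg\min_{\boldsymbol{\omega}}h(\boldsymbol{\omega})+\frac{1}{2}\|\boldsymbol{\omega}-x\|^{2}$, the above rule means that the $(t+1)$-th iterate satisfies:

$$\boldsymbol{\omega}_{t+1}=\arg\min_{\boldsymbol{\omega}}\left\{\gamma\eta_{t}\Gamma(\boldsymbol{\omega})+\frac{1}{2}\left\|\boldsymbol{\omega}-\left(\boldsymbol{\omega}_{t}-\eta_{t}\hat{\xi}_{t}(\boldsymbol{\omega}_{t})\right)\right\|^{2}\right\}, \tag{6}$$

where $\hat{\xi}_{t}$ denotes an estimate of the gradient $\nabla L(\boldsymbol{\omega}_{t})$ at Step $t$, and $\eta_{t}$ is the learning rate. Further, given $\Gamma({\boldsymbol{\omega}})=\|{\boldsymbol{\omega}}\|_{1}$, the proximal mapping is the following shrinkage operation, also known as the “soft-threshold operator”:

$$\textrm{prox}_{\gamma\eta_{t}\Gamma}(\boldsymbol{\omega})=\textrm{sign}(\boldsymbol{\omega})\odot\left[|\boldsymbol{\omega}|-\gamma\eta_{t}\right]_{+}, \tag{7}$$

with $[\cdot]_{+}\triangleq\max(\cdot,0)$.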

To calculate $\hat{\xi}_{t}$, one commonly appeals to batch gradient descent (BGD) or stochastic gradient descent (SGD), as in traditional statistical learning approaches. BGD requires the whole dataset to estimate the gradients (i.e., $\hat{\xi}_{t}(\boldsymbol{\omega}_{t})=\nabla L(\boldsymbol{\omega}_{t})$ for Step $t\geq 0$), while SGD only uses the data sensed by a randomly selected sensor (i.e., $\hat{\xi}_{t}(\boldsymbol{\omega}_{t})=\nabla L_{i_{t+1}}(\boldsymbol{\omega}_{t})$ for Step $t+1$). BGD is therefore generally less efficient in the studied scenario, given the larger overall communication cost and delay it needs for the reconstruction. As such, we mainly focus on the SGD approach in this paper, which is illustrated in Fig. 2. Specifically, the environment corresponds to the field which generates data samples at each sensor. In each step, after a new data sample is sensed, the gradients are calculated based on the previous weight vector and are then used to produce the next one.
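The proximal SGD recursion described above can be sketched as follows. This is only a minimal illustration with a constant learning rate and uniform sensor selection (i.e., vanilla SGD); the function names, the constant step size `eta` and the zero initialization are assumptions made for the example rather than details given in the paper.

```python
import random

def soft_threshold(w, thresh):
    """Elementwise sign(w) * max(|w| - thresh, 0), the prox of thresh * ||.||_1."""
    return [(1.0 if v > 0 else -1.0) * max(abs(v) - thresh, 0.0) for v in w]

def grad_sq_loss(phi, y, w):
    """Gradient of L_i(w) = (phi . w - y)^2 with respect to w."""
    resid = sum(p * v for p, v in zip(phi, w)) - y
    return [2.0 * resid * p for p in phi]

def prox_sgd(samples, K, gamma, eta, steps, seed=0):
    """Run `steps` proximal SGD updates; samples is a list of (phi, y) pairs."""
    rng = random.Random(seed)
    w = [0.0] * K                                  # assumed initial weight vector
    for _ in range(steps):
        phi, y = rng.choice(samples)               # one uniformly selected sensor
        g = grad_sq_loss(phi, y, w)                # stochastic gradient estimate
        w = soft_threshold([v - eta * gv for v, gv in zip(w, g)], gamma * eta)
    return w
```

Each iteration touches a single sample, which is what keeps the per-step communication and computation low compared with BGD; the soft-threshold step then shrinks small weights to zero, in line with the sparsity-promoting L1 regularizer.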

II-C How to sense efficiently: uniform or non-uniform sampling?

Because the field parameters are largely unknown, reconstructing $f(x)$ at the FC from data selected purely at random (as in vanilla SGD) still generally requires a large dataset and thus consumes substantial communication and computing resources, which makes field sensing and reconstruction costly and inefficient in practice. One intuitive remedy is to explore and exploit the most informative data samples from all potential sensors based on the samples already observed, i.e., to drive the field sensing by statistical learning and prediction so as to reduce the overall cost of sensing and communication.

In other words, to enable fast and accurate reconstruction, it is crucial to perform efficient selective sensing / sampling, i.e., the sampling distribution in the $T$-step SGD process illustrated in Fig. 2 should be carefully designed. Note that for vanilla SGD, the sampling distribution at Step $t$, $\boldsymbol{\pi}_{t}=[p_{1}^{(t)},\cdots,p_{n}^{(t)}]$, is in fact constant, namely $p_{i}^{(t)}=1/n$. Such uniform sampling has two benefits. On one hand, fast startup can be achieved since not all the data need to be ready in advance. On the other hand, only a single gradient is calculated per iteration, so the per-iteration computational cost is reduced to $1/n$ of that of BGD. However, purely random sampling also inevitably introduces larger data redundancy due to the intrinsic spatial correlation of the field, as well as poorer convergence due to the large variation of $\nabla L_{i}(\boldsymbol{\omega}_{t})$ across the index $i$.

One recent promising approach to improving SGD performance is to combine it with adaptive non-uniform sampling. As shown in many recent studies (see [17, 18] and references therein), sampling from a probability distribution proportional to the relative importance of a data sample with respect to the entire dataset, as represented by its relative gradient norm

$$p_{i}^{(t)}=\frac{\|\nabla L_{i}(\boldsymbol{\omega}_{t})\|}{\sum_{j=1}^{n}\|\nabla L_{j}(\boldsymbol{\omega}_{t})\|}, \tag{8}$$

can achieve a certain optimality in terms of efficiency and prediction error. Unfortunately, such importance-based sampling has to evaluate the gradients over the entire dataset and relies on full knowledge of the target model, which remains largely unknown until the sensors are actually chosen to sense and send their data to the FC.

Therefore, it is desirable to develop an adaptive sampling algorithm that exploits the information incrementally gathered so far while retaining the capability to explore the unknown portion of the field as the algorithm proceeds. Motivated by this, we adopt the following Markovian greedy sampling scheme:

$$\boldsymbol{\pi}_{t}=\rho\boldsymbol{\pi}_{u}+(1-\rho)\boldsymbol{\pi}_{v,t}, \tag{9}$$

where $\boldsymbol{\pi}_{u}$ denotes the fixed uniform sampling distribution as used in vanilla SGD, $\boldsymbol{\pi}_{v,t}$ is the importance-based sampling distribution varying with $t$ as defined in (8) and calculated over the data sampled so far, i.e., $D_{i_{1}},...,D_{i_{t}}$, and $\rho\in[0,1]$ is a tuning parameter. The rationale behind the above equation is as follows: the resultant sampling distribution $\boldsymbol{\pi}_{t}$ is a mixture of two distributions that stand for two complementary sampling tendencies. The former tends to fully explore the unknown field, while the latter tends to effectively exploit the information gathered so far. As such, the above sampling process can be regarded as working in two different states, referred to as “exploration” and “exploitation”, respectively, as depicted in the Markov chain in Fig. 3.
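The mixture of the uniform and importance-based distributions can be sketched as below. The helper names are hypothetical, and the fallback to a uniform distribution when no sample has been observed yet is an assumption made here so that the example is well defined; the paper does not specify that corner case.

```python
import random

def importance_distribution(grad_norms, observed):
    """pi_v: proportional to gradient norms over observed indices, zero elsewhere."""
    g = [grad_norms[i] if i in observed else 0.0 for i in range(len(grad_norms))]
    total = sum(g)
    if total == 0.0:                      # assumption: nothing usable yet, use uniform
        return [1.0 / len(g)] * len(g)
    return [v / total for v in g]

def mixed_distribution(grad_norms, observed, rho):
    """pi_t = rho * pi_u + (1 - rho) * pi_v, with pi_u uniform over all n sensors."""
    n = len(grad_norms)
    pi_v = importance_distribution(grad_norms, observed)
    return [rho / n + (1.0 - rho) * pv for pv in pi_v]

def sample_sensor(pi, rng):
    """Draw one sensor index according to the distribution pi."""
    return rng.choices(range(len(pi)), weights=pi, k=1)[0]

# Four sensors, of which sensors 0 and 1 have been observed so far.
pi = mixed_distribution([1.0, 3.0, 5.0, 7.0], {0, 1}, rho=0.5)
```

With `rho = 0.5`, half of the probability mass is spread evenly over all four sensors (exploration) and the other half is concentrated on the observed sensors in proportion to their gradient norms (exploitation).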

III Two-layer Learning and Sensing Algorithm

The Markov chain based sampling framework in the previous section realizes a trade-off between exploration and exploitation, yet it is not smart enough to avoid all redundant or less significant samples: when the weighting factor $\rho$ is fixed, uniform random sampling persists even when the number of samples is large enough and convergence is approached.

As a remedy, we can further use a time-varying weight $\rho_{t}$ to tune the proportion of the two distributions for a more flexible trade-off between exploration and exploitation. To this end, we resort to a recently emerged idea in machine learning known as “learn to learn” or “meta-learning” [19, 20, 21, 22], which enables a learning machine to learn its own learning process so as to learn more intelligently, and propose the so-called “learn to sense” framework, which enables smarter online adjustment of the sensing strategy and faster field reconstruction with much less sensing / communication effort.

III-A Two-layer learning and sensing framework

In particular, we propose a two-layer learning and sensing framework, which includes a conventional SGD-based base-learner and a high-level reinforcement-learning-based meta-learner, as shown in Fig. 4. The meta-learner aims to generate an optimal sampling policy $\boldsymbol{\pi}_{t}$ that minimizes the “meta-loss” (or equivalently, maximizes the overall reward). The meta-loss serves as an integrated measure of both the processing gain of the base-learner and the sensing / communication cost of the environment, and is defined as follows:

$$L_{\mathrm{meta}}=\left[L(\boldsymbol{\omega}_{T})-L(\boldsymbol{\omega}_{0})\right]+\mu|O_{T}|. \tag{10}$$

In the above equation, the first term in the brackets quantifies the change in average loss (prediction error) at Step $T$ relative to the start, while the second term denotes the overall sensing / communication cost paid for the sampled dataset $O_{T}$ at a unit price of $\mu$. Note that the first term is in general negative and decreases further as the algorithm proceeds.

In the above two-layer meta-learning framework, the iterative interaction between the meta-learner and the base learner is crucial to obtain the optimal sampling policy as well as the desired reconstruction performance. A reinforcement learning algorithm based on partially-observable Markov decision process (POMDP) is employed for this purpose with details described below.

III-B Meta-learner and base-learner design

The basic algorithms of the meta-learner and the base-learner are designed as follows. As shown in Fig. 4, this two-layer learner can be further transformed into a POMDP-based policy training machine. The elements of the related tuple $\langle s_{t},o_{t},a_{t},\boldsymbol{\pi}_{t},r_{t}\rangle$ are defined in detail as follows:

• $s_{t}$ is the state represented by the entire dataset and the corresponding status of each data sample at time $t$, indicating whether it is observed or not. Note that the fusion center observes the field incrementally, and thus only part of the data are known to it at each time.

• $o_{t}$ contains the set of observed data at time $t$.

• $a_{t}$ denotes the action made by the meta-learner at time $t$, which here is the index of the next sensor location to be sampled.

• $\boldsymbol{\pi}_{t}$ stands for the sampling policy at time $t$, which here is the conditional probability distribution over the action space $\{1,2,...,n\}$ given the currently observed measurements $o_{t}$.

• $r_{t}=R(s_{t},a_{t})$ denotes the reward value that the previous action gains at the current state. Invoking (10), here we let $R(s_{t},a_{t})=\text{Loss Gap}-\mathbf{1}(a_{t}\notin O_{t-1})\times\mu$, where “Loss Gap” is the prediction error reduced by the model update, $\mu$ is the unit cost induced by extra sensing / communication, and $O_{t}$ is the set of sensor indices corresponding to $o_{t}$.

The system works in an iterative manner. In iteration $t$ , the agent activates the $a_{t}$ -th sensor to get its data sample $D_{a_{t}}=(x_{a_{t}},y_{a_{t}})$ according to the current sampling policy $\boldsymbol{\pi}_{t}$ . The sampled data is then passed to the base-learner which makes the prediction based on SGD and results in an updated model $\boldsymbol{\omega}_{t}$ . With the set of sensed data or the field information partially observed by the agent grown from $o_{t}$ to $o_{t+1}$ , a state transition occurs, which triggers an update of the reward to $r_{t+1}=R(s_{t},a_{t})$ . With the new observation $o_{t+1}$ and reward $r_{t+1}$ , the agent generates a new sampling policy $\boldsymbol{\pi}_{t+1}$ for the next iteration.
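The per-step reward and its relation to the meta-loss in (10) can be illustrated with a short sketch. It assumes that the “Loss Gap” is the decrease of the same average loss $L$ at every step and that no sensor has been observed at the start; under these assumptions (which are ours, for illustration) the loss gaps telescope and the undiscounted sum of rewards equals $-L_{\mathrm{meta}}$. The function names are hypothetical.

```python
def step_reward(loss_before, loss_after, action, observed, mu):
    """R = loss gap - mu * 1(action not in observed)."""
    loss_gap = loss_before - loss_after            # error reduced by the model update
    cost = mu if action not in observed else 0.0   # pay only for newly sensed sensors
    return loss_gap - cost

def episode_return(losses, actions, mu):
    """Sum step rewards; losses[t] is the loss after t updates (losses[0] is initial)."""
    observed = set()                               # assumption: nothing observed at start
    total = 0.0
    for t, a in enumerate(actions):
        total += step_reward(losses[t], losses[t + 1], a, observed, mu)
        observed.add(a)
    return total, observed

# Three steps; sensor 2 is re-used in the last step, so it costs nothing then.
total, observed = episode_return([4.0, 2.0, 1.5, 1.0], [2, 5, 2], mu=0.25)
```

In this example the total reward is $(4.0-1.0)-0.25\times 2=2.5$, i.e., the overall loss reduction minus the cost of the two distinct sensors that were activated.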

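Now let us elaborate on the meta-learner, which tries to learn the optimal sampling policy $\boldsymbol{\pi}_{t}^{*}$ with the goal of maximizing the expected reward accumulated over time, i.e.,

$$\boldsymbol{\pi}_{t}^{*}=\arg\max_{\boldsymbol{\pi}_{t}}E\left[\sum_{t=0}^{T}R(s_{t},a_{t})\right], \tag{11}$$

in which the expectation is taken over the sequence of states and actions $\{s_{0},a_{0},s_{1},a_{1},...,s_{T}\}$. For effectiveness and learnability, $\boldsymbol{\pi}_{t}$ is often assumed to lie within a family of probability mass functions (pmfs) parameterized by $\Theta$ as follows:

$$\boldsymbol{\pi}_{t}(\Theta)=P_{f}(a_{t}|o_{t})\rho_{t}(o_{t},\Theta)+P_{v}(a_{t}|o_{t})\left(1-\rho_{t}(o_{t},\Theta)\right), \tag{12}$$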

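where $P_{f}(a_{t}|o_{t})\equiv\frac{1}{n}$ refers to the fixed uniform sampling distribution over the whole dataset, while $P_{v}(a_{t}=i|o_{t})$ is the importance sampling distribution associated with the dataset varying over time and is defined as follows:

$$P_{v}(a_{t}=i|o_{t})=\frac{g_{i}(t)}{\sum_{j=1}^{n}g_{j}(t)},$$

where

$$g_{i}(t)=\begin{cases}\|\nabla L_{i}(\boldsymbol{\omega}_{t})\|, & i\in O_{t},\\ 0, & \text{otherwise}.\end{cases}$$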

Moreover, $\rho_{t}(o_{t},\Theta)$ is the dynamic weight varying as the learning proceeds. From (12) we can see that the optimal $\Theta$ , i.e., $\Theta^{*}$ , can be solely determined from $\rho_{t}(o_{t},\Theta)$ . Therefore, the goal reduces to finding the best parameterized non-linear mapping from $o_{t}$ to $\rho_{t}\ \in[0,1]$ . For simplicity, a three-layer neural network [22] is chosen to model this non-linear mapping, but note that other deep neural works can also be used. As shown in Fig. 5, the input layer of the three-layer neural network includes the following features:

[TABLE]

Here, $L_{O_{t}}(\boldsymbol{\omega}_{t})$ denotes the average loss associated with the already sampled sensors in $O_{t}$ and the current model $\boldsymbol{\omega}_{t}$ . In particular, $F_{1}$ indicates whether $D_{t-1}$ is directly sampled from the existing observed dataset or not, $F_{2}$ shows how much $D_{t-1}$ weighs on the model estimation in the previous step, whereas $F_{3}$ indicates how much it weighs on the current step, and finally $F_{4}$ indicates the relative loss change after the model update. The hidden units in the middle layer are activated by $\textrm{tanh}(\cdot)$ , whereas $\textrm{sigmoid}(\cdot)$ is used in the output layer to ensure $\rho_{t}(o_{t},\Theta^{*})\in(0,1)$ .
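As a rough illustration of the mapping from the four input features to the weight $\rho_{t}$ , and of how that weight mixes the two sampling laws, a minimal Python sketch is given below. It is not the authors' implementation: the hidden-layer size, the random initialization, the feature values, and the helper names `rho_network` and `mixture_policy` are hypothetical. The mixture form $\rho\cdot\frac{1}{n}+(1-\rho)P_{v}$ is an assumption, inferred from the later remark that $\rho\to 0$ recovers the importance-based distribution.

```python
import math
import random


def rho_network(features, W1, b1, w2, b2):
    """Three-layer network: 4 input features -> tanh hidden layer -> sigmoid output."""
    hidden = [
        math.tanh(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(W1, b1)
    ]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid keeps rho strictly inside (0, 1)


def mixture_policy(rho, importance, n):
    """Assumed mixture: pi(i) = rho * (1/n) + (1 - rho) * P_v(i).

    `importance` is P_v, a pmf over all n sensors that is zero for sensors
    that have not been observed yet.
    """
    return [rho / n + (1.0 - rho) * importance[i] for i in range(n)]


# Hypothetical parameters Theta = (W1, b1, w2, b2) with 8 hidden units.
rng = random.Random(0)
hidden_units = 8
W1 = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(hidden_units)]
b1 = [0.0] * hidden_units
w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden_units)]
b2 = 0.0

features = [1.0, 0.2, 0.1, -0.05]          # placeholder values for F1..F4
rho = rho_network(features, W1, b1, w2, b2)

importance = [0.5, 0.5, 0.0, 0.0, 0.0]      # only sensors 0 and 1 observed so far
pi = mixture_policy(rho, importance, n=5)
print(rho, pi)
```

Because the uniform component has full support, every sensor keeps a non-zero sampling probability as long as $\rho>0$ , which is what allows exploration of unobserved locations.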

Based on (12), the original policy optimization problem (11) reduces to

[TABLE]

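where $R(s,a)$ is the state-action value function. Although $R(s,a)$ is non-differentiable w.r.t. $\Theta$ , the gradient of the objective w.r.t. $\Theta$ can still be expressed as an expectation over trajectories: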

[TABLE]

where an episode is considered as a trajectory $\tau=\{s_{1},a_{1},r_{1},...,s_{T},a_{T},r_{T}\}$ , and $R(\tau)$ is the cumulative reward obtained from one episode. By summing over the expected rewards at all $T$ steps, the above equation can be rewritten as:

[TABLE]

Invoking the famous Monte-Carlo policy gradient algorithm REINFORCE [23] (see Algorithm 1), the above can be empirically estimated as:

[TABLE]

Here $v_{t}$ is the sampled estimate of $R(s_{t},a_{t})$ from one episode execution of the current sampling policy $P_{\Theta}(a|s)$ : $v_{t}=r_{t}+\lambda r_{t+1}+...+\lambda^{T-t}r_{T}$ , in which $r_{t}$ is the instantaneous reward at step $t$ and $\lambda\in[0,1]$ is the discount factor.
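The discounted return $v_{t}$ and the resulting single-episode gradient estimate can be sketched as follows. This is a minimal illustration of the standard REINFORCE estimator $\sum_{t}v_{t}\nabla_{\Theta}\log P_{\Theta}(a_{t}|s_{t})$ , assumed here to match the empirical estimate above; the function names and the toy numbers are hypothetical, and the score vectors $\nabla_{\Theta}\log P_{\Theta}$ are taken as given inputs.

```python
def discounted_returns(rewards, lam):
    """v_t = r_t + lam * r_{t+1} + ... + lam^(T-t) * r_T, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + lam * running
        returns[t] = running
    return returns


def reinforce_gradient(grad_log_probs, returns):
    """Single-episode estimate: sum over t of v_t * grad log P_Theta(a_t | s_t)."""
    dim = len(grad_log_probs[0])
    grad = [0.0] * dim
    for g, v in zip(grad_log_probs, returns):
        for k in range(dim):
            grad[k] += v * g[k]
    return grad


rewards = [1.0, 0.0, 2.0]
v = discounted_returns(rewards, lam=0.5)
print(v)  # [1.5, 1.0, 2.0]

# Toy score vectors for a two-parameter policy (hypothetical values).
scores = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(reinforce_gradient(scores, v))  # [3.5, 3.0]
```

The backward recursion avoids recomputing powers of $\lambda$ and runs in time linear in the episode length.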

IV Asymptotic Performance Analysis

In this section, we provide an asymptotic behavior and performance analysis of the proposed algorithm.

IV-A Preliminaries

Here, we briefly introduce some key definitions and propositions that are used hereinafter.

Definition 1: A function $\phi:\mathbb{R}^{d}\rightarrow\mathbb{R}$ is $L$ -Lipschitz if for all $u,~{}v\in\mathbb{R}^{d}$ we have

$$|\phi(u)-\phi(v)|\leq L\|u-v\|,$$

where $\|\cdot\|$ is a norm.

Definition 2: A function $\phi:\mathbb{R}^{d}\rightarrow\mathbb{R}$ is $(1/\gamma)$ -smooth if it is differentiable and its gradient is $(1/\gamma)$ -Lipschitz.

Definition 3: A function $\phi:\mathbb{R}^{d}\rightarrow\mathbb{R}$ is $\sigma$ -strongly convex if for all $u,~{}v\in\mathbb{R}^{d}$ we have

[TABLE]

Based upon the above definitions, we give the following lemma.

Lemma 1: i) $L_{i}(\boldsymbol{\omega})$ is $(2n)$ -smooth for $i=1,2,...,n$ , ii) $L(\boldsymbol{\omega})=\frac{1}{n}\sum_{i=1}^{n}L_{i}(\boldsymbol{\omega})$ is $2$ -strongly convex.

Proof:

Recall that $L_{i}(\boldsymbol{\omega})=\left(\Phi(x_{i})\boldsymbol{\omega}-y_{i}\right)^{2}$ , thus the gradient equals

[TABLE]

Due to the Gaussian kernel-based model assumption, we have $\|\Phi(x_{i})\|^{2}\leq n^{2}$ . Then for all $\boldsymbol{\omega},~{}\boldsymbol{\omega}^{\prime}\in\mathbb{R}^{K}$ , the following inequality holds.

[TABLE]

Hence the gradient $\nabla L_{i}(\boldsymbol{\omega})$ is $(2n)$ -Lipschitz, and therefore $L_{i}(\boldsymbol{\omega})$ is $(2n)$ -smooth.

To show $L(\boldsymbol{\omega})$ is $\sigma$ -strongly convex, it is necessary to prove the following inequality for all $\boldsymbol{\omega},~{}\boldsymbol{\omega}^{\prime}\in\mathbb{R}^{K}$ .

[TABLE]

With the substitutions $\tilde{\boldsymbol{\omega}}:=y_{i}-\Phi(x_{i})\boldsymbol{\omega}$ and $\tilde{\boldsymbol{\omega}}^{\prime}:=y_{i}-\Phi(x_{i})\boldsymbol{\omega}^{\prime}$ , and by setting $\sigma=2$ , we have

[TABLE]

Summing over $i=1,\cdots,n$ , and invoking $L(\boldsymbol{\omega})=\frac{1}{n}\sum_{i=1}^{n}L_{i}(\boldsymbol{\omega})$ directly yields (27) and completes the proof. ∎

IV-B Main results

We now turn to the analysis of the asymptotic performance of our proposed sensing algorithm.

Let $\boldsymbol{\omega}^{*}$ denote the optimal solution, and introduce the sampling probability

[TABLE]

where $\boldsymbol{\bar{\pi}}^{*}$ is the ideal sampling distribution defined by (8), and $\rho^{*}$ is the optimal weight parameter learned by our policy training scheme.

Lemma 2: Denote $Q^{*}=\sum_{i=1}^{n}\nabla L_{i}(\boldsymbol{\omega}^{*})\nabla L_{i}(\boldsymbol{\omega}^{*})^{T}/(n^{2}\bar{p}_{i}^{*})$ and $H=\nabla^{2}L(\boldsymbol{\omega}^{*})$ the Hessian at point $\boldsymbol{\omega}^{*}$ , and set $\eta_{t}=\frac{1}{2t}$ . Then we have:

1. The sequence $(\boldsymbol{\omega}_{t}-\boldsymbol{\omega}^{*})/\sqrt{\eta_{t}}$ converges to a zero-mean Gaussian variable $V\sim N(0,\Sigma)$ , where the covariance matrix $\Sigma$ is the solution to the following Lyapunov equation

[TABLE]

2. The sequence $\left(L(\boldsymbol{\omega}_{t})-L(\boldsymbol{\omega}^{*})\right)/\sqrt{\eta_{t}}$ converges to a random variable $V^{\prime}=(1/2)Z^{T}\Sigma^{1/2}H\Sigma^{1/2}Z$ where $Z$ is a Gaussian vector $N(0,I_{K})$ . The mean of $V^{\prime}$ is $\mathbb{E}(V^{\prime})=tr(H\Sigma)/2$ .

Proof:

Note that the above lemma is a direct consequence of [24], obtained by utilizing Lemma 1 and the second-order delta method. The detailed proof is omitted here due to lack of space. ∎

Intuitively, the optimal fixed importance-based sampling distribution in [17] should minimize the mean value $\mathbb{E}(V^{\prime})$ , that is,

[TABLE]

Setting ${v^{*}}^{2}=tr(H\Sigma)|_{{\boldsymbol{\pi}}=\boldsymbol{\bar{\pi}}^{*}}$ , the following theorem follows directly from standard algebra and some reorganization.

Theorem 1: Let $\Sigma$ be the asymptotic covariance matrix defined in Lemma 2. Then,

[TABLE]

Remark: Theorem 1 implies that the normalized error sequence is strictly bounded and, more importantly, that the asymptotic performance of the proposed algorithm is comparable with that of the best sampling distribution, provided that $\rho^{*}$ is close enough to zero. This is not expected to happen in the early stage of the online algorithm, since the system needs a non-zero $\rho$ to explore the field. As $t\to\infty$ , however, all the samples can be considered collected, so $\rho$ can ideally be set to zero, which makes $\boldsymbol{\pi}_{t}\to\boldsymbol{\bar{\pi}}^{*}$ with probability 1.

In this sense, since (9) is a mixture of the uniform and the non-uniform sampling laws, it is safe to argue that its performance falls in between. The lower bound is associated with uniform-sampling vanilla SGD, which has been shown to achieve a convergence rate of $O(1/\sqrt{t})$ [25]. The upper bound, in turn, corresponds to the non-uniform sampling defined by (8), whose optimality is shown below.

Theorem 2: When the non-uniform sampling probability satisfies (8), it maximizes the reduction of the objective value defined by the RHS of (4).

Proof:

Recall that for proximal SGD with non-uniform sampling, where $p_{i}^{t}$ denotes the sampling probability of the $i_{t}$ -th sample at step $t$ , the update rule is written as:

[TABLE]

By setting the derivative of the objective above to zero, we can easily obtain the following implicit solution:

[TABLE]

Invoking Lemma 1, and setting $\eta_{t}\leq 1/{2n}$ , we have

[TABLE]

In addition, since $\Gamma$ is convex,

[TABLE]

Combining the above two inequalities, we can get the reduction on the objective function, bounded as:

[TABLE]

Therefore, in order to maximize the reduction on the objective value, it is straightforward that the third term in (37) should be minimized, so that the ideal choice of $\{p_{i}^{t}\}$ turns out to be

[TABLE]

∎

Although the above optimality cannot be achieved immediately, it can be gradually approximated using the accumulated samples in our proposed scheme.
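To make the gradual approximation concrete, the sketch below computes an importance distribution from the samples observed so far. Since (8) is not reproduced here, the code assumes the gradient-norm form $p_{i}\propto\|\nabla L_{i}(\boldsymbol{\omega}_{t})\|$ with $L_{i}(\boldsymbol{\omega})=(\Phi(x_{i})\boldsymbol{\omega}-y_{i})^{2}$ , as described for the importance-based benchmark in Section V-B. The function names and the toy data are hypothetical.

```python
import math


def gradient_norm(phi, omega, y):
    """Norm of grad L_i = 2 * (phi . omega - y) * phi for the squared loss."""
    residual = sum(p * w for p, w in zip(phi, omega)) - y
    return 2.0 * abs(residual) * math.sqrt(sum(p * p for p in phi))


def importance_distribution(features, omega, targets):
    """Assumed form p_i proportional to ||grad L_i(omega)|| over observed samples."""
    norms = [gradient_norm(phi, omega, y) for phi, y in zip(features, targets)]
    total = sum(norms)
    if total == 0.0:
        # All residuals vanish: fall back to uniform sampling.
        return [1.0 / len(norms)] * len(norms)
    return [g / total for g in norms]


# Three observed samples with two kernel features each (toy values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
targets = [1.0, 3.0, 0.0]
omega = [0.0, 0.0]

p = importance_distribution(features, omega, targets)
print(p)  # [0.25, 0.75, 0.0]
```

Samples that the current model already fits receive zero probability under this law, which is one reason a uniform component is useful while the field is still only partially observed.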

Corollary 1: The proposed meta-learning based algorithm performs better if samples located at steep slopes of the original field are collected sooner.

Proof:

First, we write

[TABLE]

Since $\|\Phi(x_{i})\|\approx\|\Phi(x_{j})\|$ for all $i,j\in\{1,...,n\}$ , (38) can be approximated by:

[TABLE]

Therefore, the ideal sampling distribution is approximately determined by the distribution of the residual error, $E^{t}=|y-\Phi(x)\boldsymbol{\omega}_{t}|$ , over the entire field. Taking the derivative of $E^{t}$ w.r.t. $x$ , we have

[TABLE]

where $\tilde{y}^{t}$ stands for the field estimation at step $t$ .

The above equation reveals that the shape of the error function at step $t$ is well captured by the difference between the first-order derivatives of the original field $y$ and the time-varying estimated field $\tilde{y}^{t}$ . In general, it is desirable to quickly build the estimate of $E^{t}$ , using the samples accumulated over steps, so that $p_{i}^{t}$ converges to the ideal $p_{i}^{t*}$ as soon as possible. Thus, larger values of $\frac{dE^{t}}{dx}$ are preferred. Further, we argue that $y_{i}^{t}$ will gradually increase from [math] to $y_{i}$ for all $i$ during the SGD process. Therefore, the proposed algorithm performs better if sensors located at steeper slopes of the original field are selected sooner, since they provide more information about the real error function and thus contribute more to subsequent learning. ∎

Other than the stochastic sampling behavior, the characteristics of the original unknown field also have a great impact on the performance of the proposed algorithm.

Here we introduce $\boldsymbol{\upsilon}=\left(\frac{|y_{1}|}{|y|_{\textrm{max}}},...,\frac{|y_{n}|}{|y|_{\textrm{max}}}\right)$ , normalized by $|y|_{\textrm{max}}=\max_{i}\left({|y_{1}|,...,|y_{n}|}\right)$ .

Theorem 3: Compared to uniform sampling in vanilla SGD, the proposed meta-learning based algorithm improves the convergence rate if $\|\boldsymbol{\upsilon}\|_{1}\ll n$ .

Proof:

The proof is motivated by steps in [17].

[TABLE]

Invoking Lemma 1, where $L(\boldsymbol{\omega})$ is 2-strongly convex, the first term on the right-hand side in the above equation satisfies

[TABLE]

Next, due to the convexity of $\Gamma(\cdot)$ , we have

[TABLE]

Substituting (43) and (44) into (42) yields

[TABLE]

By taking expectation of both sides, it can be straightforwardly derived that

[TABLE]

Summing the above inequality over $t=1,\cdots,T$ and using $\eta_{t}=\frac{1}{2t}$ , gives rise to the following:

[TABLE]

Further, we have

[TABLE]

which shows that each $L_{i}(\boldsymbol{\omega}_{t})$ is upper-bounded by $2|y_{i}|$ . Accordingly, $p_{i}^{t}\leq\frac{|y_{i}|}{\sum_{j=1}^{n}|y_{j}|}$ . Together with (47) we write

[TABLE]

As such, we can view vanilla SGD with uniform sampling as a special case where $|y_{i}|=|y|_{max}$ and the distribution is $p_{i}^{t}=\frac{|y|_{\textrm{max}}}{\sum_{j=1}^{n}|y|_{\textrm{max}}}=\frac{1}{n}$ , so the above inequality becomes

[TABLE]

Taking the ratio between (50) and (49) yields

[TABLE]

which implies the improvement on convergence rate, especially when $\|\boldsymbol{\upsilon}\|_{1}=\sum_{i=1}^{n}\frac{|y_{i}|}{|y|_{\textrm{max}}}\ll n$ . ∎

Remark: Theorem 3 indicates that the performance gain provided by the proposed algorithm is more pronounced when the field contains fewer but more distinctive features. For better illustration, we plot three truncated windows of the same length in the one-dimensional case. The X axis represents the one-dimensional location, and the Y axis stands for the corresponding field value. Note that the peaks are all set to the same height; thus fields (a) and (b) differ only in the number of features, while fields (b) and (c) differ in the shape of the feature, i.e., the spread. Comparing the $\boldsymbol{\upsilon}$ of each field, we have $\|\boldsymbol{\upsilon}_{b}\|_{1}<\|\boldsymbol{\upsilon}_{a}\|_{1}<\|\boldsymbol{\upsilon}_{c}\|_{1}$ , hence the proposed algorithm tends to gain the most in field (b).
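The ordering in the remark can be checked numerically on synthetic windows. The sketch below evaluates $\|\boldsymbol{\upsilon}\|_{1}=\sum_{i}|y_{i}|/|y|_{\textrm{max}}$ for three hypothetical fields built from Gaussian bumps of equal height: (a) three narrow peaks, (b) one narrow peak, and (c) one wide peak. The grid, peak positions, and widths are illustrative assumptions and are not the fields plotted in the paper.

```python
import math


def upsilon_l1(values):
    """||upsilon||_1 = sum_i |y_i| / max_j |y_j|."""
    peak = max(abs(v) for v in values)
    return sum(abs(v) / peak for v in values)


def bump_field(xs, centers, width):
    """Sum of unit-height Gaussian bumps (hypothetical field shape)."""
    return [
        sum(math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers)
        for x in xs
    ]


xs = [-5.0 + 0.1 * k for k in range(101)]           # 101 locations on [-5, 5]

field_a = bump_field(xs, [-3.0, 0.0, 3.0], 0.3)      # several narrow features
field_b = bump_field(xs, [0.0], 0.3)                 # one narrow feature
field_c = bump_field(xs, [0.0], 1.5)                 # one wide feature

l1_a = upsilon_l1(field_a)
l1_b = upsilon_l1(field_b)
l1_c = upsilon_l1(field_c)
print(l1_b, l1_a, l1_c, len(xs))
```

In this toy setting all three norms are well below $n=101$ , with field (b) the smallest, which is the regime where Theorem 3 predicts the largest gain over uniform sampling.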

V Experiments

In this section, we evaluate the performance of the proposed algorithm and compare it with the conventional ones in terms of convergence properties and communication cost.

V-A Experiment Setup

The simulation is conducted for a WSN with $n$ distributedly deployed sensors measuring some unknown environmental quantity. Practically, it can stand for typical environment monitoring scenarios in WSN-based IoT. For instance, the indoor/outdoor temperature field of a residence needs to be estimated through deployed sensors, in order to enable a variety of “Smart Home” applications, such as automated air-conditioning, floor heating, etc.

V-A1 Task Generation

Here, for useful insights and simplicity, we only take the one-dimensional scenario for illustration. Note that the extension to two or more dimensions is direct.

The 1-D spatial domain is confined to $x\in[-5,5]$ , and the target field function $f(x)$ is supposed to be a weighted sum of $K=50$ potential Gaussian kernels with equally-spaced centers $\{c_{k}\}$ and identical width $\beta=0.4$ . To introduce a certain degree of sparsity as well as randomness into the unknown field, $\kappa$ out of $K$ entries of the parameter vector $\boldsymbol{\omega}\in\mathbb{R}^{K\times 1}$ are randomly chosen to be nonzero, i.i.d. Gaussian variables. Intuitively, a larger $\kappa$ leads to a more complex and fluctuating field.

Second, to simulate the sensing scenario, we assume that a total of $n=500$ sensors are randomly deployed throughout the field, and each sensor gets its observation according to $y_{i}=f(x_{i})+n_{i}$ , where $n_{i}\sim N(0,\sigma_{n}^{2})$ is zero-mean Gaussian noise with $\sigma_{n}=0.1$ . Now, an FC begins to access these sensors one at a time. At each access, the FC collects an observation and stores it in the buffer. Meanwhile, the sampled observations $\{(x_{1},y_{1}),...,(x_{n},y_{n})\}$ are utilized to train the field model.

As explained in Section II, this particular field reconstruction problem corresponds to finding the optimal parameter satisfying:

[TABLE]

based on all potential measurements $\{(x_{1},y_{1}),...,(x_{n},y_{n})\}$ .

The above field generation is repeated 200 times, and the above procedure is run the same number of times; the resulting tasks are then used to meta-train an optimal sampling policy in the sequel.
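The task generation described above can be sketched as follows. This is only an illustration under stated assumptions: the kernel is taken as $\exp(-(x-c_{k})^{2}/(2\beta^{2}))$ , the nonzero weights are drawn with unit variance, sensors are placed uniformly at random, and $\sigma_{n}$ is treated as the noise standard deviation. None of these details is fixed by the text, and the function name `make_task` is hypothetical.

```python
import math
import random


def make_task(kappa, K=50, n=500, beta=0.4, sigma_n=0.1, seed=0):
    """Generate one sparse Gaussian-kernel field and its noisy sensor readings."""
    rng = random.Random(seed)

    # K equally spaced kernel centers on [-5, 5].
    centers = [-5.0 + 10.0 * k / (K - 1) for k in range(K)]

    # kappa randomly chosen nonzero weights (unit variance is an assumption).
    omega = [0.0] * K
    for k in rng.sample(range(K), kappa):
        omega[k] = rng.gauss(0.0, 1.0)

    def features(x):
        # Assumed kernel form with width beta.
        return [math.exp(-((x - c) ** 2) / (2.0 * beta ** 2)) for c in centers]

    def field(x):
        return sum(w * p for w, p in zip(omega, features(x)))

    # n sensors deployed uniformly at random, each with additive Gaussian noise.
    xs = [rng.uniform(-5.0, 5.0) for _ in range(n)]
    ys = [field(x) + rng.gauss(0.0, sigma_n) for x in xs]
    return centers, omega, xs, ys


centers, omega, xs, ys = make_task(kappa=5)
print(len(centers), sum(1 for w in omega if w != 0.0), len(xs), len(ys))
```

Calling `make_task` with 200 different seeds would give a set of meta-training tasks in the spirit of the setup above.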

V-A2 Strategies

Due to the real-time processing nature of the above task, the Proximal SGD mechanism in (5) is leveraged to solve the problem in (52), where the gradient at each step is computed from a single observation, and the learning rate $\eta_{t}$ and the penalty factor $\gamma$ are set to $1/(2t)$ and 0.08, respectively. On this basis, the only factor that influences the task performance is the sampling strategy along the process. Here we compare the proposed meta-learning based sampling with its conventional counterpart, uniform sampling, by applying both to the above SGD framework. In detail, the two are described below:

• Meta-based Sampling: The optimal sampling policy based on the proposed two-layer learning and sensing algorithm, meta-trained upon the 200 tasks generated above. The meta-training process on a particular task is listed in Algorithm 2. To avoid over-fitting, we freeze the parameter $\Theta$ after $L=10$ episodes of training on a single task, and then evaluate its performance on the other tasks. We choose the policy that achieves the best expected reward to be the final optimal policy. Fig. 7 qualitatively compares the field reconstruction performance before and after policy optimization. The X-axis represents the one-dimensional spatial location, $x\in[-5,5]$ , and the Y-axis denotes the corresponding field value at each location. As shown, with the red solid line being the originally generated field, the blue curve, representing the field reconstruction AFTER policy optimization, is much more accurate than that BEFORE, yet it incurs a much lower communication cost (i.e., fewer samples). More interestingly, most of the sensed samples (the green crosses in the figure) lie near the abrupt changes of the curve, such as peaks or valleys, which capture the most critical features of the field, whereas the FC tends to allocate less sensing effort to the smooth regions.

• Uniform Sampling: The Proximal SGD algorithm with uniform sampling, i.e., $i_{t+1}$ is picked uniformly at random from $\{1,2,...,n\}$ .
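The uniform-sampling baseline can be sketched as below, using the stated settings $\eta_{t}=1/(2t)$ and $\gamma=0.08$ . The exact forms of (5) and (52) are not reproduced here, so the sketch assumes a squared loss with an $\ell_{1}$ penalty $\gamma\|\boldsymbol{\omega}\|_{1}$ , whose proximal operator is soft-thresholding; this choice of penalty is an assumption made for illustration, and the function names are hypothetical.

```python
import random


def soft_threshold(v, tau):
    """Proximal operator of tau * |v| (assumed l1 penalty)."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def prox_sgd_step(omega, phi, y, eta, gamma):
    """One proximal SGD step on L_i(omega) = (phi . omega - y)^2."""
    residual = sum(w * p for w, p in zip(omega, phi)) - y
    # Gradient step with grad L_i = 2 * residual * phi, then the proximal map.
    return [
        soft_threshold(w - eta * 2.0 * residual * p, eta * gamma)
        for w, p in zip(omega, phi)
    ]


def run_uniform(features, targets, steps, gamma=0.08, seed=0):
    """Proximal SGD where the sample index is drawn uniformly at each step."""
    rng = random.Random(seed)
    omega = [0.0] * len(features[0])
    for t in range(1, steps + 1):
        i = rng.randrange(len(targets))              # uniform sampling
        omega = prox_sgd_step(omega, features[i], targets[i], 1.0 / (2 * t), gamma)
    return omega


# One hand-checkable step: omega = 0, phi = (1, 0), y = 1, eta = 0.5.
step = prox_sgd_step([0.0, 0.0], [1.0, 0.0], 1.0, 0.5, 0.08)
print(step)

result = run_uniform([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0], steps=50)
print(result)
```

Replacing the `rng.randrange` draw with a draw from a non-uniform distribution over sensor indices is the only change needed to plug in a different sampling strategy.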

V-B Experiment Results

We directly apply the meta-trained policy, as well as the uniform sampling strategy, to 500 testing tasks with various $\kappa$ , i.e., to the case where the field has changed, either to become more fluctuating or smoother. From Fig. 8(a) to Fig. 8(c), it is observed that the meta-learner is capable of providing the base-learner with more crucial and informative training samples, instead of redundant ones, thus yielding a much better reconstruction performance.

In terms of communication cost, as shown in Fig. 9, as time goes on, the number of samples sensed under the proposed meta-learning based sampling policy grows much more slowly than that under uniform sampling (note that the benchmark importance sampling always needs to evaluate all samples over the field). Moreover, it gradually converges to some upper bounds that increase with $\kappa$ , indicating that no more samples are needed to meet a certain fusion accuracy. This can be interpreted as the meta-learning based sampling policy enabling the FC to adaptively shift between exploration and exploitation. The former tends to explore the unobserved portion of the field, whereas the latter makes use of the existing samples without inducing extra communication cost. This way, it “intelligently” decides whether the number of samples is enough or not. Intuitively, the number of samples may increase accordingly given a more complex field with larger $\kappa$ , to guarantee the required reconstruction performance.

We further compare the convergence rate of the different sampling strategies by evaluating the averaged mean squared error (MSE), i.e., the first loss component in (52), at each iteration step. Note that here an additional strategy, known as the “Importance-based Sampling” [17], is also included for reference. Specifically, it samples ideally according to importance over the entire dataset (whether observed or not), as reflected by the norm of the gradient, namely,

[TABLE]

As shown in Fig. 10, on the one hand, the performance of the meta-based sampling scheme is upper-bounded by that of the importance-based sampling, where the gap in between reflects the incremental learning process of gradually accumulating information about the field. In this sense, we conclude that the meta-based sampling can reach a sub-optimal convergence, but without incurring the communication expense of collecting all the samples beforehand.

On the other hand, it beats the uniform sampling used in vanilla SGD with a faster average loss dropping rate and a lower prediction error and variance. Nonetheless, as shown in Fig. 11, this convergence margin diminishes with increasing $\kappa$ , i.e., the actual number of effective kernels, and/or $\beta$ , the kernel width. This is because a larger $\kappa$ usually induces more drastic fluctuations in the field, while a larger $\beta$ means a smoother field. In both cases, the importances of all potential locations tend to be equal, which makes the meta-based sampling scheme gradually boil down to uniform sampling. This also verifies the remark on Theorem 3.

VI Conclusion and Future Works

In this paper, we study the WSN-based field sensing and reconstruction problem. We establish a two-layer learning framework based on reinforcement learning, and present the detailed design of an adaptive sampling policy which can actively determine the most informative sensing location and thus significantly reduce the communication cost. Numerical results show that the algorithm brings a remarkable improvement in reconstruction performance and efficiency compared to conventional ones, and it also exhibits good robustness to both information dynamics and the variation of field parameters. However, there are still many interesting problems left open on this topic.

For example, we aim to further enhance the online learning framework to make it more adaptive to dynamic changes of the field features, and to quantify the tradeoff between the computational complexity of learning and the sensing and/or communication cost in this framework. More fundamentally, we want to derive closed-form results on how many samples are needed to reconstruct the field by using this framework. We believe these problems are of particular importance and leave them as our future work.

Bibliography

[1] S. Joshi and S. Boyd, “Sensor selection via convex optimization,” IEEE Trans. on Signal Processing, vol. 57, no. 2, pp. 451–462, 2009.
[2] F. Ghassemi and V. Krishnamurthy, “Separable approximation for solving the sensor subset selection problem,” IEEE Trans. on Aerospace and Electronic Systems, vol. 47, no. 1, pp. 557–568, 2011.
[3] P. Sebastiani and P. Henry, “Maximum entropy sampling and optimal bayesian experimental design,” J. of the Royal Statistical Society, vol. 62, no. 1, pp. 145–157, 2000.
[4] L. Paninski, Asymptotic Theory of Information-Theoretic Experimental Design. MIT Press, 2005.
[5] R. Willett, A. Martin and R. Nowak, “Backcasting: Adaptive sampling for sensor networks,” in Proc. of Int. Symposium on Information Processing in Sensor Networks, 2004, pp. 124–133.
[6] R. Nowak and U. Mitra, “Boundary estimation in sensor networks: Theory and methods,” IPSN, vol. 2634, pp. 80–95, 2003.
[7] C. Rui, R. Willett and R. Nowak, “Faster rates in regression via active learning,” in Proc. of Int. Conf. on Neural Information Processing Systems, 2005, pp. 179–186.
[8] C. Rui and R. Nowak, Active Learning and Sampling. Springer US, 2008.