On Bayesian Fisher Information Maximization for Distributed Vector Estimation

Mojtaba Shirazi; Azadeh Vosoughi

arXiv:1705.00803·cs.IT·May 1, 2020


TL;DR

This paper develops a Bayesian Fisher Information Maximization framework for distributed Gaussian vector estimation, optimizing sensor power allocation under various receiver models and demonstrating near-optimal performance compared to MSE minimization.

Contribution

It derives Bayesian FIM and WWB for different receiver types, formulates and solves power allocation optimization problems, and compares FIM-max schemes with MSE-min schemes in distributed estimation.

Findings

01

FIM-max power allocation outperforms uniform power allocation in terms of the MSE of the LMMSE estimator.

02

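The trace-maximizing solution can be computed by each sensor from its local parameters, whereas the log-determinant solution must be computed at the FC; both depend on the sensors' observation qualities, the physical layer parameters, and the network transmit power constraint.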

03

FIM-max schemes perform close to MSE-min schemes in practice.

Abstract

We consider the problem of distributed estimation of a Gaussian vector with linear observation model. Each sensor makes a scalar noisy observation of the unknown vector, quantizes its observation, maps it to a digitally modulated symbol, and transmits the symbol over orthogonal power-constrained fading channels to a fusion center (FC). The FC is tasked with fusing the received signals from sensors and estimating the unknown vector. We derive the Bayesian Fisher Information Matrix (FIM) for three types of receivers: (i) coherent receiver (ii) noncoherent receiver with known channel envelopes (iii) noncoherent receiver with known channel statistics only. We also derive the Weiss-Weinstein bound (WWB). We formulate two constrained optimization problems, namely maximizing trace and log-determinant of Bayesian FIM under network transmit power constraint, with sensors transmit powers being…


Equations

$$x_{k}=\mathbf{a}_{k}^{T}\boldsymbol{\theta}+n_{k},\quad k=1,\dots,K$$

$$m_{k}=m_{k,l},\quad\text{for }x_{k}\in[u_{k,l},u_{k,l+1}],\quad l=1,\dots,M_{k}$$

$$\boldsymbol{J}=\mathbb{E}\left\{\left(\frac{\partial\ln p(\boldsymbol{\hat{m}},\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right)\left(\frac{\partial\ln p(\boldsymbol{\hat{m}},\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right)^{T}\right\},$$

$$\mathop{\text{maximize}}_{P_{k},\forall k}\ \ \text{tr}(\boldsymbol{J})\quad\text{s.t.}\quad\sum_{k=1}^{K}P_{k}\leq P_{tot},\ \ P_{k}\in\mathbb{R}^{+},\ \forall k$$

$$\mathop{\text{maximize}}_{P_{k},\forall k}\ \ \log_{2}(|\boldsymbol{J}|)\quad\text{s.t.}\quad\sum_{k=1}^{K}P_{k}\leq P_{tot},\ \ P_{k}\in\mathbb{R}^{+},\ \forall k$$

$$I(\boldsymbol{\theta};\hat{\boldsymbol{\theta}})\geq\frac{1}{2}\left(\log_{2}(|\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}|)-\log_{2}(|\boldsymbol{\mathcal{D}}|)\right).$$

$$I(\boldsymbol{\theta};\hat{\boldsymbol{\theta}})\geq\frac{1}{2}\left(\log_{2}(|\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}|)+\log_{2}(|\boldsymbol{J}|)\right).$$

$$\boldsymbol{J}=\mathbb{E}\left\{\mathbb{E}\left\{\left(\frac{\partial\ln p(\boldsymbol{\hat{m}},\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right)\left(\frac{\partial\ln p(\boldsymbol{\hat{m}},\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right)^{T}\Big|\,\boldsymbol{\theta}\right\}\right\},$$

$$\boldsymbol{J}=\mathbb{E}\{\boldsymbol{\Omega}(\boldsymbol{\theta})\}+\mathbb{E}\Big\{\underbrace{\mathbb{E}\Big\{\Big(\frac{\partial\ln p(\boldsymbol{\hat{m}}\arrowvert\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\Big)\Big(\frac{\partial\ln p(\boldsymbol{\hat{m}}\arrowvert\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\Big)^{T}\Big\}}_{=\boldsymbol{\Lambda}(\boldsymbol{\theta})}\Big\},$$

$$[\boldsymbol{\Omega}(\boldsymbol{\theta})]_{ij}=-\frac{\partial^{2}\ln f(\boldsymbol{\theta})}{\partial\theta_{i}\partial\theta_{j}},\quad i,j=1,\dots,q$$

$$[\boldsymbol{\Lambda}(\boldsymbol{\theta})]_{ij}=-\mathbb{E}\left\{\frac{\partial^{2}\ln p(\boldsymbol{\hat{m}}\arrowvert\boldsymbol{\theta})}{\partial\theta_{i}\partial\theta_{j}}\right\},\quad i,j=1,\dots,q$$

$$[\boldsymbol{\Lambda}(\boldsymbol{\theta})]_{ij}=-\sum_{\hat{m}_{1}}\dots\sum_{\hat{m}_{K}}\sum_{k=1}^{K}\left[\frac{\partial^{2}p(\hat{m}_{k}\arrowvert\boldsymbol{\theta})}{\partial\theta_{i}\partial\theta_{j}}-\frac{1}{p(\hat{m}_{k}\arrowvert\boldsymbol{\theta})}\frac{\partial p(\hat{m}_{k}\arrowvert\boldsymbol{\theta})}{\partial\theta_{i}}\frac{\partial p(\hat{m}_{k}\arrowvert\boldsymbol{\theta})}{\partial\theta_{j}}\right]\prod_{n=1,\,n\neq k}^{K}p(\hat{m}_{n}\arrowvert\boldsymbol{\theta}).$$

$$\sum_{\hat{m}_{1}}\dots\sum_{\hat{m}_{k-1}}\sum_{\hat{m}_{k+1}}\dots\sum_{\hat{m}_{K}}\ \prod_{n=1,\,n\neq k}^{K}p(\hat{m}_{n}\arrowvert\boldsymbol{\theta})=1,$$

$$\sum_{k=1}^{K}\sum_{t=1}^{M_{k}}\frac{\partial^{2}p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})}{\partial\theta_{i}\partial\theta_{j}}=\sum_{k=1}^{K}\frac{\partial^{2}}{\partial\theta_{i}\partial\theta_{j}}\Big(\underbrace{\sum_{t=1}^{M_{k}}p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})}_{=1}\Big)=0,$$

$$[\boldsymbol{\Lambda}(\boldsymbol{\theta})]_{ij}=\sum_{k=1}^{K}\sum_{t=1}^{M_{k}}\left(\frac{1}{p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})}\frac{\partial p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})}{\partial\theta_{i}}\frac{\partial p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})}{\partial\theta_{j}}\right).$$

$$p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})=\sum_{l=1}^{M_{k}}\underbrace{p(\hat{m}_{k,t}\arrowvert m_{k,l})}_{=\alpha_{k,t,l}}\ \underbrace{p(m_{k,l}\arrowvert\boldsymbol{\theta})}_{=\beta_{k,l}(\boldsymbol{\theta})},\quad t=1,\dots,M_{k}.$$

$$\beta_{k,l}(\boldsymbol{\theta})\overset{(a)}{=}Q\left(\frac{u_{k,l}-\mathbf{a}_{k}^{T}\boldsymbol{\theta}}{\sigma_{n_{k}}}\right)-Q\left(\frac{u_{k,l+1}-\mathbf{a}_{k}^{T}\boldsymbol{\theta}}{\sigma_{n_{k}}}\right),$$

$$\frac{\partial p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})}{\partial\theta_{i}}=\sum_{l=1}^{M_{k}}\frac{a_{k_{i}}}{\sqrt{2\pi}\sigma_{n_{k}}}\alpha_{k,t,l}\dot{\beta}_{k,l}(\boldsymbol{\theta}),\quad i=1,\dots,q,$$

$$\dot{\beta}_{k,l}(\boldsymbol{\theta})=\exp\left(-\frac{(u_{k,l}-\mathbf{a}_{k}^{T}\boldsymbol{\theta})^{2}}{2\sigma_{n_{k}}^{2}}\right)-\exp\left(-\frac{(u_{k,l+1}-\mathbf{a}_{k}^{T}\boldsymbol{\theta})^{2}}{2\sigma_{n_{k}}^{2}}\right).$$

$$\gamma_{k}=\frac{P_{k}|h_{k}|^{2}}{2L_{k}\sigma_{w_{k}}^{2}}.$$

$$\alpha_{k,t,l}={\cal E}_{k}^{N_{e_{k,t,l}}}(1-{\cal E}_{k})^{L_{k}-N_{e_{k,t,l}}}.$$

$$\alpha_{k,t,l}=\prod_{i=1}^{L_{k}}\Big[\mathbf{1}_{\{b_{k,l,i}=\hat{b}_{k,t,i}=0\}}(1-{\cal E}_{1_{k}})+\mathbf{1}_{\{b_{k,l,i}=0,\hat{b}_{k,t,i}=1\}}({\cal E}_{1_{k}})+\mathbf{1}_{\{b_{k,l,i}=1,\hat{b}_{k,t,i}=0\}}({\cal E}_{2_{k}})+\mathbf{1}_{\{b_{k,l,i}=\hat{b}_{k,t,i}=1\}}(1-{\cal E}_{2_{k}})\Big],$$

$$y_{k,i}=\begin{cases}B_{k}h_{k}+w_{k,i},&{\cal H}_{1,i}:b_{k,l,i}=1\\ w_{k,i},&{\cal H}_{0,i}:b_{k,l,i}=0\end{cases}$$

$$\frac{f(r_{k,i}\mid{\cal H}_{1,i})}{f(r_{k,i}\mid{\cal H}_{0,i})}\ \underset{{\cal H}_{0,i}}{\overset{{\cal H}_{1,i}}{\gtrless}}\ \frac{p({\cal H}_{0,i})}{p({\cal H}_{1,i})},\quad i=1,\dots,L_{k},$$

$$f(r_{k,i}\mid{\cal H}_{0,i},|h_{k}|)=\frac{r_{k,i}}{\sigma_{w_{k}}^{2}}e^{-\frac{r_{k,i}^{2}}{2\sigma_{w_{k}}^{2}}},$$

$$f(r_{k,i}\mid{\cal H}_{1,i},|h_{k}|)=\frac{r_{k,i}}{\sigma_{w_{k}}^{2}}e^{-\left(\frac{r_{k,i}^{2}}{2\sigma_{w_{k}}^{2}}+2\gamma_{k}\right)}I_{0}\left(\sqrt{\frac{2P_{k}}{L_{k}}}\frac{|h_{k}|r_{k,i}}{\sigma_{w_{k}}^{2}}\right),$$

$${\cal E}_{1_{k}}$$

$${\cal E}_{2_{k}}$$

Full text

On Bayesian Fisher Information Maximization for Distributed Vector Estimation

Mojtaba Shirazi, Azadeh Vosoughi

Parts of this research were presented at the IEEE 25th Annual International Symposium on Personal, Indoor, and Mobile Radio Communication, 2014, and the 48th Asilomar Conference on Signals, Systems and Computers, 2014 [1], [2]. This research is supported by the NSF under grants CCF-1341966 and CCF-1319770.

Abstract

In this paper we consider the problem of bandwidth-constrained distributed estimation of a Gaussian vector with a linear observation model. Each sensor makes a scalar noisy observation of the unknown vector, employs a multi-bit scalar quantizer to quantize its observation, and maps it to a digitally modulated symbol. Sensors transmit their symbols over orthogonal power-constrained fading channels to a fusion center (FC). The FC is tasked with fusing the received signals from sensors and estimating the unknown vector. We derive the Bayesian Fisher Information Matrix (FIM) for three types of receivers: (i) coherent receiver, (ii) noncoherent receiver with known channel envelopes, and (iii) noncoherent receiver with known channel statistics only. We also derive the Weiss-Weinstein bound (WWB). We formulate two constrained optimization problems, namely maximizing the trace and the log-determinant of the Bayesian FIM under a network transmit power constraint, with the sensors’ transmit powers being the optimization variables (we refer to these as FIM-max schemes). We show that for the coherent receiver, these problems are concave. However, for noncoherent receivers, they are not necessarily concave. The solution to the trace maximization problem can be implemented in a distributed fashion, in the sense that each sensor calculates its own transmit power using its local parameters. On the other hand, the solution to the log-determinant maximization problem cannot be implemented in a distributed fashion: the FC needs to find the powers (using parameters of all sensors) and inform the active sensors of their transmit powers. We numerically investigate how the FIM-max power allocation across sensors depends on the sensors’ observation qualities and physical layer parameters as well as the network transmit power constraint.
Moreover, we evaluate the system performance in terms of MSE using the solutions of FIM-max schemes, and compare it with the solution obtained from minimizing the MSE of the LMMSE estimator (MSE-min scheme), and that of uniform power allocation. These comparisons illustrate that, although the WWB is tighter than the inverse of Bayesian FIM, it is still suitable to use FIM-max schemes, since the performance loss in terms of the MSE of the LMMSE estimator is not significant. Furthermore, comparing the performance of different receivers, our numerical results reveal that coherent receiver and noncoherent receiver with known channel statistics have the best and the worst performance, respectively.

Index Terms:

Bayesian Fisher information matrix, coherent versus noncoherent receiver, distributed estimation, Gaussian vector, LMMSE estimator, power allocation, multi-bit quantization, Weiss-Weinstein bound, classical Cramér-Rao bound, best linear unbiased estimator.

I Introduction

The plethora of wireless sensor network (WSN) applications, with practical constraints on network power and bandwidth, raises a series of challenging technical problems for system-level engineers [3, 4]. One of these is the bandwidth-constrained distributed parameter estimation problem, where geographically distributed battery-powered sensors are deployed over a sensing field to monitor physical or environmental conditions [5]. Each sensor makes a noisy observation of the unobservable parameter to be estimated, and transmits its locally processed observation to a fusion center (FC). The FC is tasked with estimating the unknown parameter by fusing the data received from the sensors in the WSN.

In this work, we consider bandwidth-constrained distributed estimation of a Gaussian vector $\boldsymbol{\theta}$ , where each sensor makes a scalar observation $x_{k}\!=\!\mathbf{a}_{k}^{T}\boldsymbol{\theta}\!+\!n_{k}$ , with $\mathbf{a}_{k}^{T}$ and $n_{k}$ being respectively, the observation vector and the scalar observation noise. We model the bandwidth constraint as limiting the number of quantization bits per observation period that a sensor can send to the FC. Each sensor applies a multi-bit scalar quantizer to quantize its observation, and maps it to a digitally modulated symbol. Sensors transmit their symbols to the FC over orthogonal power-constrained fading channels.

The bandwidth-constrained distributed estimation problem has a long and rich history in both the signal processing and information theory literature. Depending on how the bandwidth constraint is modeled, these works can be classified into two classes. The works in the first class model the bandwidth constraint as limiting the number of quantization bits per observation period that a sensor can send to the FC. The works in the second class model the bandwidth constraint as limiting the number of real-valued messages per observation period that a sensor can send to the FC [6, 7, 8, 9, 10]. In the second class, each sensor makes a noisy observation vector of (the entire or part of) vector $\boldsymbol{\theta}$ and locally compresses its observation vector; the focus is on finding the optimal compression matrices such that the mean square error (MSE) of reconstruction of $\boldsymbol{\theta}$ at the FC is minimized. While quantization is important in the works of the first class, compression is the critical component in the works of the second class. With respect to this classification, our work belongs to the first class.

The works in the first class mentioned above can be further categorized into several subclasses. The two most related subclasses to our work are the works that consider optimal quantization design strategies (dubbed subclass I) and the works that, given quantizers, optimize a network performance metric with respect to energy or power consumption during transmission (dubbed subclass II). Most of the works in subclass I assume that sensors’ quantized observations are sent over bandwidth constrained error-free communication channels. For example, [11, 12, 13, 14] studied this problem for estimating a deterministic scalar unknown parameter. The authors in [15, 16, 17, 18] studied this problem for erroneous bandwidth constrained channels. In particular, this problem was investigated for estimating a deterministic scalar in [15, 17, 18] and for estimating a zero-mean Gaussian scalar in [16]. When addressing the problem, these works have focused on the linear estimator at the fusion center (FC) and studied the MSE distortion pertaining to this linear estimator.

Among the works in subclass II, [15, 19] explored the optimal power allocation scheme that minimizes network transmission power subject to a target MSE constraint. In contrast, for estimating a deterministic scalar, [20, 21] minimized the MSE of the best linear unbiased estimator (BLUE) subject to a network transmit power constraint. The authors in [16, 22] proposed joint transmit power and rate allocation schemes for estimating a random scalar [16] and a random vector [22], where they minimized an upper bound on the MSE of the linear minimum mean square error (LMMSE) estimator.

As an alternative to the MSE of the best linear estimator (BLUE and LMMSE for estimating deterministic and random unknowns, respectively), one can consider the Cramér-Rao bound (CRB) and its inverse Fisher information, which are widely employed to explore the fundamental limits of a parameter estimation problem, to optimize the power consumption of a resource constrained WSN tasked with distributed estimation. According to the Cramér-Rao inequality [23], maximizing Fisher information minimizes the CRB and Bayesian (classic) CRB sets a lower bound on the MSE of any Bayesian (unbiased) estimator [24]. Within the context of distributed estimation, maximizing Bayesian Fisher information has been adopted before to address sensor selection [25] and optimal quantization design [26, 27]. In particular, [25] investigated the optimal sensor activation strategy with linear observation model, via maximizing trace of Bayesian Fisher information matrix (FIM) subject to energy constraints. [26] derived the optimality conditions of quantizers that maximize the Bayesian Fisher information for conditionally independent and dependent observations. [27] studied the quantizer designs that minimize the MSE of minimum mean square error (MMSE) and maximum a posteriori (MAP) estimators, and compared their performances with the quantizer design that maximizes Fisher information. In [1][2], we presented our preliminary results on deriving Bayesian CRB and studied its behavior with respect to the system parameters for distributed estimation of a Gaussian vector with linear and nonlinear observation models.

Our Contributions: Considering the distributed estimation of a Gaussian vector with linear observation model [22, 28], we formulate two constrained optimization problems, namely, maximization of trace and log-determinant of Bayesian FIM, subject to network transmit power constraint, where sensors’ transmit powers are the optimization variables. We link log-determinant of Bayesian FIM to the mutual information between the unknown vector and its Bayesian estimator. We derive Bayesian FIM and the Weiss-Weinstein bound (WWB), which is known to be one of the tightest Bayesian bounds [29]. We develop two transmit power allocation schemes from solving the two formulated problems (which we refer to as FIM-max schemes). We derive the MSE corresponding to the LMMSE estimator at the FC for coherent and noncoherent receivers. Our numerical results demonstrate the effectiveness of FIM-max schemes, as these power allocations perform close to the power allocation obtained from minimizing the MSE of LMMSE estimator, and outperform uniform power allocation. Based on these results, we draw the conclusion that although the WWB is tighter than the Bayesian CRB in our problem (and Bayesian CRB is not attainable), it is still appropriate to use FIM-max schemes, since the performance loss in terms of the MSE of the LMMSE estimator is not significant.

Notations: Matrices are denoted by bold uppercase letters, vectors by bold lowercase letters, and scalars by normal letters. $\mathbb{E}$ denotes the mathematical expectation operator, $||.||$ and $[.]^{T}$ represent the $L^{2}$ norm of a vector and the matrix-vector transpose operation, respectively. tr(.) and $|.|$ indicate trace and determinant of a matrix, respectively, and $|\cal A|$ is the cardinality of set $\cal A$ . $\boldsymbol{A}\!\succ\!\boldsymbol{0}$ ( $\boldsymbol{A}\!\succeq\!\boldsymbol{0}$ ) means that $\boldsymbol{A}$ is a positive (semi-)definite matrix. The Q-function is defined as $Q(x)\!\!=\!\!\frac{1}{\sqrt{2\pi}}\!\int_{x}^{\infty}\!e^{-\frac{u^{2}}{2}}du$ . The Marcum-Q function of nonnegative real numbers $a$ and $b$ , denoted as ${\cal Q}(a,b)$ , is defined as [30] ${\cal Q}(a,b)\!=\!\int_{b}^{\infty}\!xe^{-\frac{x^{2}+a^{2}}{2}}I_{0}(ax)dx$ , and the two dimensional Gaussian Q-function, denoted as ${\mathfrak{Q}}\left(x,y;\rho\right)$ , is defined as [31] ${\mathfrak{Q}}\left(x,y;\rho\right)\!=\!\frac{1}{2\pi\sqrt{1\!-\!\rho^{2}}}\!\int_{x}^{\infty}\!\!\int_{y}^{\infty}\!e^{-\frac{u^{2}+v^{2}-2\rho uv}{2\left(1-\rho^{2}\right)}}\!dudv$ . The notations $\mathcal{N}$ and $\mathcal{CN}$ represent Gaussian distribution and complex Gaussian distribution, respectively.

II System Model and Problem Formulation

Suppose there are $K$ spatially-distributed and inhomogeneous sensors, each making a noisy observation of a common unobservable zero-mean Gaussian vector $\theta$ = $[\theta_{1},\theta_{2},...,\theta_{q}]^{T}\!\in\!\mathbb{R}^{q}$ with covariance matrix $\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}=\mathbb{E}\{\boldsymbol{\theta}\boldsymbol{\theta}^{T}\}$ . Let $x_{k}$ denote the scalar noisy observation of sensor $k$ (see Fig. 1). Our linear observation model is:

$$x_{k}=\mathbf{a}_{k}^{T}\boldsymbol{\theta}+n_{k},\quad k=1,\dots,K$$

where $\mathbf{a}_{k}\!=\![a_{k_{1}},a_{k_{2}},...,a_{k_{q}}]^{T}\!\in\!\mathbb{R}^{q}$ is the known observation vector and $n_{k}$ denotes zero-mean Gaussian observation noise with variance $\sigma_{n_{k}}^{2}$ . We assume that $n_{k}$ ’s are uncorrelated across the sensors and also are uncorrelated with $\theta$ . Sensor $k$ employs a scalar quantizer with $M_{k}\!=\!2^{L_{k}}$ quantization levels $m_{k,l},\ l\!=\!1,...,M_{k}$ where $l$ is the index of the quantization level. In particular, the quantizer maps $x_{k}$ to one of the quantization levels $m_{k}\in\{m_{k,1},...,m_{k,M_{k}}\}$ as the following:

$$m_{k}=m_{k,l},\quad\text{for }x_{k}\in[u_{k,l},u_{k,l+1}],\quad l=1,\dots,M_{k}$$

where $u_{k,l},\ l\!=\!1,...,M_{k}\!+\!1$ , are the quantization boundaries. Following quantization, sensor $k$ employs a fixed length encoder, which encodes the index $l$ corresponding to the quantization level $m_{k,l}$ to a binary sequence of length $L_{k}=\log_{2}M_{k}$ according to natural binary encoding [16, 22] (natural binary encoding is needed for the derivations of the Bayesian FIM), and finally modulates these $L_{k}$ bits into $L_{k}$ binary symbols. Let $P_{k}$ denote the average transmit power corresponding to $L_{k}$ symbols from sensor $k$ , which is equally distributed among $L_{k}$ symbols. We consider two types of modulators, Binary Phase Shift Keying (BPSK) modulator, which maps each bit of $L_{k}$ -bit sequence into one symbol with transmit power $P_{k}/L_{k}$ , and On-Off Keying (OOK) modulator, which maps each “1” bit of $L_{k}$ -bit sequence into one symbol with transmit power $2P_{k}/L_{k}$ and sends no carrier for “0” bit.
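The quantization and encoding steps can be illustrated with a minimal Python sketch. The uniform boundaries, the midpoint levels, the half-open cell convention, the clipping of out-of-range observations, and the mapping of index $l\!=\!1$ to the all-zero codeword are assumptions made here for illustration; they are not specified in the text.

```python
from bisect import bisect_right

def quantize_index(x, boundaries):
    """Return the 1-based cell index l with boundaries[l-1] <= x < boundaries[l].

    `boundaries` holds u_{k,1} < ... < u_{k,M_k+1}. Observations outside the
    range are clipped to the first or last cell (an assumption of this sketch).
    """
    num_levels = len(boundaries) - 1
    l = bisect_right(boundaries, x)          # number of boundaries <= x
    return min(max(l, 1), num_levels)

def natural_binary(l, num_bits):
    """Natural binary code of the 0-based index l-1, most significant bit first."""
    return [((l - 1) >> shift) & 1 for shift in range(num_bits - 1, -1, -1)]

# Hypothetical 2-bit quantizer on [-2, 2]: M_k = 4 levels, L_k = 2 bits.
boundaries = [-2.0, -1.0, 0.0, 1.0, 2.0]     # u_{k,1}, ..., u_{k,5}
levels = [-1.5, -0.5, 0.5, 1.5]              # m_{k,l}, chosen as cell midpoints

x_k = 0.3
l = quantize_index(x_k, boundaries)
bits = natural_binary(l, num_bits=2)
print(l, levels[l - 1], bits)
```

Each of the resulting $L_{k}$ bits would then be mapped to one BPSK or OOK symbol as described above.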

Sensors send their modulated symbols to the FC over orthogonal flat fading channels, with fading coefficient $h_{k}=|h_{k}|e^{j\phi_{k}}$ . We assume that channel $h_{k}$ remains constant during the transmission of $L_{k}$ symbols. Denote $w_{k,i}$ as communication channel noise during the transmission of $i$ -th symbol of $L_{k}$ symbols corresponding to sensor $k$ . We assume $w_{k,i}$ ’s are independent across $k$ channels and independent and identically distributed (i.i.d.) across $L_{k}$ transmitted symbols, $w_{k,i}\sim\mathcal{CN}\left(0,2\sigma_{w_{k}}^{2}\right)$ . We further assume that there is a constraint on the network average transmit power, i.e., $\sum_{k=1}^{K}P_{k}\leq P_{tot}$ .

To describe the estimation operation at the FC, let $\hat{m}_{k}$ denote the recovered quantization level corresponding to sensor $k$ , where in general, $\hat{m}_{k}\neq m_{k}$ due to communication channel errors. The FC processes the channel output corresponding to sensor $k$ to recover the transmitted quantization levels $\hat{m}_{k}\in\{\hat{m}_{k,1},...,\hat{m}_{k,M_{k}}\}$ . We consider coherent and noncoherent receivers, corresponding to BPSK and OOK modulation schemes, respectively. For noncoherent receiver, we consider two scenarios: a) channel envelopes $|h_{k}|$ ’s are available at the FC [32], b) only statistics of complex Gaussian channel $h_{k}$ ’s are available at the FC [33]. Having $\{\hat{m}_{1},...,\hat{m}_{K}\}$ , the FC applies a Bayesian estimator to form the estimate $\hat{\boldsymbol{\theta}}$ . We define vector $\boldsymbol{m}=[m_{1},...,m_{K}]^{T}$ which consists of transmitted quantization levels, and vector $\boldsymbol{\hat{m}}=[\hat{m}_{1},...,\hat{m}_{K}]^{T}$ that includes recovered quantization levels at the FC. Let $p(\boldsymbol{\hat{m}},\boldsymbol{\theta})$ denote the joint probability distribution function (pdf) of the recovered quantization levels and the unknown vector $\boldsymbol{\theta}$ . Under certain regularity conditions that are satisfied by Gaussian vectors, the $q\times q$ Bayesian FIM, denoted as $\boldsymbol{J}$ , is defined based on the joint pdf $p(\boldsymbol{\hat{m}},\boldsymbol{\theta})$ as [23, 24, 34]:

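$$\boldsymbol{J}=\mathbb{E}\left\{\left(\frac{\partial\ln p(\boldsymbol{\hat{m}},\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right)\left(\frac{\partial\ln p(\boldsymbol{\hat{m}},\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right)^{T}\right\},\tag{2}$$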

where the expectation is taken over $p(\boldsymbol{\hat{m}},\boldsymbol{\theta})$ .

Our goals are to characterize $\boldsymbol{J}$ and study the transmit power allocation schemes that maximize either tr( $\boldsymbol{J}$ ) [25] or $\textnormal{log}_{\textnormal{2}}(|\boldsymbol{J}|)$ [35], subject to the network average transmit power constraint (which we refer to as FIM-max schemes). To motivate these two criteria, let CRB denote the Bayesian CRB matrix. We have tr $(\mbox{CRB})\!=\!\text{tr}(\mathbf{J}^{-1})\!\geq\!\frac{q^{2}}{\text{tr}(\mathbf{J})}$ [22] and $\text{log}_{2}(|\mbox{CRB}|)\!=\!\text{log}_{2}(|\mathbf{J}^{-1}|)\!=\!-\text{log}_{2}(|\mathbf{J}|)$ . Therefore, maximizing tr( $\mathbf{J}$ ) is equivalent to minimizing this lower bound on tr $(\mbox{CRB})$ , and maximizing $\text{log}_{2}(|\mathbf{J}|)$ is equivalent to minimizing $\text{log}_{2}(|\mbox{CRB}|)$ . We are thus interested in solving the following constrained optimization problems:

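$$\mathop{\text{maximize}}_{P_{k},\forall k}\ \ \text{tr}(\boldsymbol{J})\quad\text{s.t.}\quad\sum_{k=1}^{K}P_{k}\leq P_{tot},\ \ P_{k}\in\mathbb{R}^{+},\ \forall k$$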

and

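$$\mathop{\text{maximize}}_{P_{k},\forall k}\ \ \log_{2}(|\boldsymbol{J}|)\quad\text{s.t.}\quad\sum_{k=1}^{K}P_{k}\leq P_{tot},\ \ P_{k}\in\mathbb{R}^{+},\ \forall k$$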
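The trace bound $\text{tr}(\mathbf{J}^{-1})\!\geq\!q^{2}/\text{tr}(\mathbf{J})$ that motivates the trace criterion can be checked in one line. The following is a short sketch, assuming $\boldsymbol{J}\!\succ\!\boldsymbol{0}$ with eigenvalues $\lambda_{1},...,\lambda_{q}$ (notation introduced here only for illustration), and applying the Cauchy-Schwarz inequality:

```latex
q^{2}
  = \Big(\sum_{i=1}^{q}\sqrt{\lambda_{i}}\cdot\frac{1}{\sqrt{\lambda_{i}}}\Big)^{2}
  \leq \Big(\sum_{i=1}^{q}\lambda_{i}\Big)\Big(\sum_{i=1}^{q}\frac{1}{\lambda_{i}}\Big)
  = \text{tr}(\boldsymbol{J})\,\text{tr}(\boldsymbol{J}^{-1})
```

Equality holds only when all eigenvalues are equal, so in general the trace problem minimizes a lower bound on tr(CRB) rather than tr(CRB) itself.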

Interestingly, the constrained log-determinant maximization problem above can be linked to the constrained maximization of mutual information between the unknown $\boldsymbol{\theta}$ and its Bayesian estimator $\hat{\boldsymbol{\theta}}$ . Let $\tilde{\boldsymbol{\theta}}=\boldsymbol{\theta}-\hat{\boldsymbol{\theta}}$ , where $\tilde{\boldsymbol{\theta}}$ is the corresponding estimation error vector. Suppose $\boldsymbol{\mu}=\mathbb{E}\{\tilde{\boldsymbol{\theta}}\}$ and $\boldsymbol{\mathcal{D}}\!=\!\mathbb{E}\{\tilde{\boldsymbol{\theta}}\tilde{\boldsymbol{\theta}}^{T}\}$ are the error mean vector and the MSE matrix, respectively. According to inequality (6) in [24] and using the fact that $\boldsymbol{\theta}$ is Gaussian, we can write:

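$$I(\boldsymbol{\theta};\hat{\boldsymbol{\theta}})\geq\frac{1}{2}\left(\log_{2}(|\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}|)-\log_{2}(|\boldsymbol{\mathcal{D}}|)\right).\tag{5}$$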

On the other hand, under the regularity conditions [23], the inverse of Bayesian FIM establishes a lower bound on the MSE matrix $\boldsymbol{\mathcal{D}}$ . The Bayesian Cramér-Rao inequality states that $\boldsymbol{\mathcal{D}}\succeq\boldsymbol{J}^{-1}$ [23]. Using the concavity of the function log $(|.|)$ on the cone of positive definite Hermitian matrices [36], we conclude that $\textnormal{log}_{\textnormal{2}}(|\boldsymbol{\mathcal{D}}|)\!\geq\!\textnormal{log}_{\textnormal{2}}(|\boldsymbol{J}^{-1}|)\!=\!-\textnormal{log}_{\textnormal{2}}(|\boldsymbol{J}|)$ . Therefore, the lower bound on $I(\boldsymbol{\theta};\hat{\boldsymbol{\theta}})$ is maximized if we substitute $\textnormal{log}_{\textnormal{2}}(|\boldsymbol{\mathcal{D}}|)$ in (5) with $-\textnormal{log}_{\textnormal{2}}(|\boldsymbol{J}|)$ . In other words:

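$$I(\boldsymbol{\theta};\hat{\boldsymbol{\theta}})\geq\frac{1}{2}\left(\log_{2}(|\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}|)+\log_{2}(|\boldsymbol{J}|)\right).\tag{6}$$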

Based on (6), we observe that the log-determinant maximization problem is equivalent to constrained maximization of the mutual information lower bound.

III Characterization of Bayesian FIM

In this section, we characterize $\boldsymbol{J}$ in terms of the optimization parameters $P_{k},\forall k$ . The matrix $\boldsymbol{J}$ in (2) can be expressed as [24, 34]:

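$$\boldsymbol{J}=\mathbb{E}\left\{\mathbb{E}\left\{\left(\frac{\partial\ln p(\boldsymbol{\hat{m}},\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right)\left(\frac{\partial\ln p(\boldsymbol{\hat{m}},\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right)^{T}\Big|\,\boldsymbol{\theta}\right\}\right\},$$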

where the first and second expectations are taken over the pdf of $\boldsymbol{\theta}$ , denoted as $f(\boldsymbol{\theta})\!=\!\frac{1}{\sqrt{{(2\pi)}^{q}|\boldsymbol{\cal C}_{\boldsymbol{\theta}}|}}\exp{(-\frac{1}{2}{\boldsymbol{\theta}}^{T}{\boldsymbol{\cal C}_{\boldsymbol{\theta}}}^{-1}\boldsymbol{\theta})}$ and the conditional distribution $p(\boldsymbol{\hat{m}}\arrowvert\boldsymbol{\theta})$ , respectively. Using the Bayes’ rule $p(\boldsymbol{\hat{m}},\boldsymbol{\theta})\!=\!p(\boldsymbol{\hat{m}}\arrowvert\boldsymbol{\theta})f(\boldsymbol{\theta})$ , we can decompose $\boldsymbol{J}$ into two terms:

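$$\boldsymbol{J}=\mathbb{E}\{\boldsymbol{\Omega}(\boldsymbol{\theta})\}+\mathbb{E}\Big\{\underbrace{\mathbb{E}\Big\{\Big(\frac{\partial\ln p(\boldsymbol{\hat{m}}\arrowvert\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\Big)\Big(\frac{\partial\ln p(\boldsymbol{\hat{m}}\arrowvert\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\Big)^{T}\Big\}}_{=\boldsymbol{\Lambda}(\boldsymbol{\theta})}\Big\},$$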

in which the outer expectations are taken over $\boldsymbol{\theta}$ . The $q\!\times\!q$ matrix $\boldsymbol{\Omega}(\boldsymbol{\theta})$ only depends on $f(\boldsymbol{\theta})$ [24]. In particular, let $[\boldsymbol{\Omega}(\boldsymbol{\theta})]_{ij}$ denote the $(i,j)$ -th entry of matrix $\boldsymbol{\Omega}(\boldsymbol{\theta})$ . We have [23]:

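$$[\boldsymbol{\Omega}(\boldsymbol{\theta})]_{ij}=-\frac{\partial^{2}\ln f(\boldsymbol{\theta})}{\partial\theta_{i}\partial\theta_{j}},\quad i,j=1,\dots,q$$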

Since $\boldsymbol{\theta}$ is Gaussian with covariance matrix $\boldsymbol{\cal C}_{\boldsymbol{\theta}}$ , we obtain $\mathbb{E}\{\boldsymbol{\Omega}(\boldsymbol{\theta})\}={\boldsymbol{\cal C}_{\boldsymbol{\theta}}^{-1}}$ . Let $[\boldsymbol{\Lambda}(\boldsymbol{\theta})]_{ij}$ represent the $(i,j)$ -th entry of matrix $\boldsymbol{\Lambda}(\boldsymbol{\theta})$ . We can write [23]:

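$$[\boldsymbol{\Lambda}(\boldsymbol{\theta})]_{ij}=-\mathbb{E}\left\{\frac{\partial^{2}\ln p(\boldsymbol{\hat{m}}\arrowvert\boldsymbol{\theta})}{\partial\theta_{i}\partial\theta_{j}}\right\},\quad i,j=1,\dots,q\tag{8}$$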
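For the prior term, the identity $\mathbb{E}\{\boldsymbol{\Omega}(\boldsymbol{\theta})\}={\boldsymbol{\cal C}_{\boldsymbol{\theta}}^{-1}}$ stated above follows directly from the Gaussian pdf $f(\boldsymbol{\theta})$ and the definition $[\boldsymbol{\Omega}(\boldsymbol{\theta})]_{ij}=-\partial^{2}\ln f(\boldsymbol{\theta})/\partial\theta_{i}\partial\theta_{j}$ . A brief worked sketch of the intermediate steps (not spelled out in the text):

```latex
\ln f(\boldsymbol{\theta})
  = -\tfrac{1}{2}\ln\!\big((2\pi)^{q}|\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}|\big)
    -\tfrac{1}{2}\boldsymbol{\theta}^{T}\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}^{-1}\boldsymbol{\theta}
\\
\frac{\partial\ln f(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}
  = -\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}^{-1}\boldsymbol{\theta}
\\
[\boldsymbol{\Omega}(\boldsymbol{\theta})]_{ij}
  = -\frac{\partial^{2}\ln f(\boldsymbol{\theta})}{\partial\theta_{i}\partial\theta_{j}}
  = [\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}^{-1}]_{ij}
```

Since $\boldsymbol{\Omega}(\boldsymbol{\theta})$ does not depend on $\boldsymbol{\theta}$ in the Gaussian case, the outer expectation leaves it unchanged, and the transmit powers enter $\boldsymbol{J}$ only through $\mathbb{E}\{\boldsymbol{\Lambda}(\boldsymbol{\theta})\}$ .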

We note that the entries $[\boldsymbol{\Lambda}(\boldsymbol{\theta})]_{ij}$ depend on the parameters of the observation model as well as the physical layer parameters (e.g., modulation scheme, receiver type, channel gain, channel noise, transmit power, and quantization bits). To find $[\boldsymbol{\Lambda}(\boldsymbol{\theta})]_{ij}$ in (8), we need Lemma 1 below, which shows that, given $\boldsymbol{\theta}$ , the entries of vector $\boldsymbol{\hat{m}}$ are conditionally independent.

Lemma 1.

Given our system model we have $p(\boldsymbol{\hat{m}}\arrowvert\boldsymbol{\theta})=\prod_{k=1}^{K}p(\hat{m}_{k}\arrowvert\boldsymbol{\theta})$ .

Proof.

See Appendix A-A. ∎

Combining the result of Lemma 1 and (8) and recalling that the expectation in (8) is taken with respect to $p(\boldsymbol{\hat{m}}\arrowvert\boldsymbol{\theta})$ , we reach:

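$$[\boldsymbol{\Lambda}(\boldsymbol{\theta})]_{ij}=-\sum_{\hat{m}_{1}}\dots\sum_{\hat{m}_{K}}\sum_{k=1}^{K}\left[\frac{\partial^{2}p(\hat{m}_{k}\arrowvert\boldsymbol{\theta})}{\partial\theta_{i}\partial\theta_{j}}-\frac{1}{p(\hat{m}_{k}\arrowvert\boldsymbol{\theta})}\frac{\partial p(\hat{m}_{k}\arrowvert\boldsymbol{\theta})}{\partial\theta_{i}}\frac{\partial p(\hat{m}_{k}\arrowvert\boldsymbol{\theta})}{\partial\theta_{j}}\right]\prod_{n=1,\,n\neq k}^{K}p(\hat{m}_{n}\arrowvert\boldsymbol{\theta}).$$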

Using the following two facts:

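$$\sum_{\hat{m}_{1}}\dots\sum_{\hat{m}_{k-1}}\sum_{\hat{m}_{k+1}}\dots\sum_{\hat{m}_{K}}\ \prod_{n=1,\,n\neq k}^{K}p(\hat{m}_{n}\arrowvert\boldsymbol{\theta})=1,$$

$$\sum_{k=1}^{K}\sum_{t=1}^{M_{k}}\frac{\partial^{2}p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})}{\partial\theta_{i}\partial\theta_{j}}=\sum_{k=1}^{K}\frac{\partial^{2}}{\partial\theta_{i}\partial\theta_{j}}\Big(\underbrace{\sum_{t=1}^{M_{k}}p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})}_{=1}\Big)=0,$$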

where index $t$ indicates the quantization level corresponding to $\hat{m}_{k}$ , we find that $[\boldsymbol{\Lambda}(\boldsymbol{\theta})]_{ij}$ reduces to:

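$$[\boldsymbol{\Lambda}(\boldsymbol{\theta})]_{ij}=\sum_{k=1}^{K}\sum_{t=1}^{M_{k}}\left(\frac{1}{p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})}\frac{\partial p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})}{\partial\theta_{i}}\frac{\partial p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})}{\partial\theta_{j}}\right).\tag{10}$$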

Examining (10) we realize that we need to find two terms in order to fully characterize $[\boldsymbol{\Lambda}(\boldsymbol{\theta})]_{ij}$ : the probability term $p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})$ , and its first derivative with respect to $\theta_{i}$ , i.e., $\partial p(\hat{m}_{k,t}|\boldsymbol{\theta})/\partial\theta_{i}$ . In the following, we derive these two terms. According to the Bayes’ rule and the fact that $\boldsymbol{\theta},m_{k},\hat{m}_{k}$ form a Markov chain, we have:

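$$p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})=\sum_{l=1}^{M_{k}}\underbrace{p(\hat{m}_{k,t}\arrowvert m_{k,l})}_{=\alpha_{k,t,l}}\ \underbrace{p(m_{k,l}\arrowvert\boldsymbol{\theta})}_{=\beta_{k,l}(\boldsymbol{\theta})},\quad t=1,\dots,M_{k}.\tag{11}$$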

Considering $p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})$ in (11) we realize that each term inside the sum is the product of two probabilities: the first probability $\alpha_{k,t,l}$ does not depend on $\boldsymbol{\theta}$ ; it depends on the modulation scheme (BPSK or OOK) and the receiver type at the FC (coherent or noncoherent) as well as the physical layer parameters, i.e., channel errors due to fading and noise, transmit power $P_{k}$ , and number of transmitted bits $L_{k}$ . On the other hand, the second probability $\beta_{k,l}(\boldsymbol{\theta})$ depends on $\boldsymbol{\theta}$ , the observation model and its parameters as well as quantizer. In other words, the contributions of the observation model and quantization in each term inside the sum in (11) are decoupled from those of communication system.
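This decoupling means that, for a fixed sensor, (11) is a matrix-vector product: a channel transition matrix with entries $\alpha_{k,t,l}$ applied to the vector of quantizer cell probabilities $\beta_{k,l}(\boldsymbol{\theta})$ . A minimal Python sketch follows; the 1-bit quantizer, the flip probability 0.1, and the cell probabilities are hypothetical numbers chosen only to illustrate the structure.

```python
def received_level_probs(alpha, beta):
    """Evaluate p(m_hat_{k,t} | theta) = sum_l alpha[t][l] * beta[l] for every t.

    alpha[t][l] = p(m_hat_{k,t} | m_{k,l}) depends only on the communication link;
    beta[l]     = p(m_{k,l} | theta) depends only on the observation model and quantizer.
    """
    return [sum(a_tl * b_l for a_tl, b_l in zip(row, beta)) for row in alpha]

# Hypothetical 1-bit example: symmetric bit-flip probability of 0.1
eps = 0.1
alpha = [[1 - eps, eps],
         [eps, 1 - eps]]
beta = [0.7, 0.3]          # assumed cell probabilities for some fixed theta

p_hat = received_level_probs(alpha, beta)
print(p_hat, sum(p_hat))
```

Changing the transmit power would alter only `alpha`, while changing the quantizer or the observation noise would alter only `beta`.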

The probability $\beta_{k,l}(\boldsymbol{\theta})$ in (11) becomes:

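$$\beta_{k,l}(\boldsymbol{\theta})\overset{(a)}{=}Q\left(\frac{u_{k,l}-\mathbf{a}_{k}^{T}\boldsymbol{\theta}}{\sigma_{n_{k}}}\right)-Q\left(\frac{u_{k,l+1}-\mathbf{a}_{k}^{T}\boldsymbol{\theta}}{\sigma_{n_{k}}}\right),$$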

in which ( $a$ ) follows from the fact that the conditional pdf of $x_{k}$ given $\boldsymbol{\theta}$ is ${\cal N}(\mathbf{a}_{k}^{T}\boldsymbol{\theta},\sigma_{n_{k}}^{2})$ .
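The cell probabilities $\beta_{k,l}(\boldsymbol{\theta})$ are straightforward to evaluate numerically, since the Q-function can be written in terms of the complementary error function as $Q(x)=\frac{1}{2}\text{erfc}(x/\sqrt{2})$ . The sketch below uses only the Python standard library. The parameter values, and the choice of $\pm\infty$ for the outermost boundaries, are assumptions made for illustration.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def beta(theta, a_k, sigma_n, boundaries):
    """Return [beta_{k,1}(theta), ..., beta_{k,M_k}(theta)].

    Entry l is the probability that x_k ~ N(a_k^T theta, sigma_n^2)
    falls between boundaries[l] and boundaries[l+1] (0-based here).
    """
    mean = sum(a * t for a, t in zip(a_k, theta))      # a_k^T theta
    return [
        q_func((lo - mean) / sigma_n) - q_func((hi - mean) / sigma_n)
        for lo, hi in zip(boundaries[:-1], boundaries[1:])
    ]

# Hypothetical values: q = 2 and a 2-bit quantizer with unbounded outer cells
theta = [0.4, -0.2]
a_k = [1.0, 0.5]
boundaries = [-math.inf, -1.0, 0.0, 1.0, math.inf]

probs = beta(theta, a_k, sigma_n=0.5, boundaries=boundaries)
print(probs, sum(probs))
```

With unbounded outer cells the differences of Q-functions telescope, so the probabilities sum to one up to rounding.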

Next, we find ${\partial p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})}/{\partial\theta_{i}}$ in (10). Since $\alpha_{k,t,l}$ does not depend on $\boldsymbol{\theta}$ , from (11) we have:

[TABLE]

Now we characterize $\alpha_{k,t,l}$ in $p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})$ . As we mentioned before, $\alpha_{k,t,l}$ depends on the modulation scheme and the receiver type at the FC. In this section we derive $\alpha_{k,t,l}$ for BPSK modulation with coherent receiver and OOK modulation with noncoherent receiver. For OOK modulation with noncoherent receiver, we consider two scenarios: a) channel envelopes are available at the FC, b) only channel statistics are available at the FC. We assume that the FC performs a symbol-by-symbol demodulation. To enable derivations of $\alpha_{k,t,l}$ , we let indices $l$ and $t$ , respectively, indicate the quantization levels corresponding to $m_{k}$ and $\hat{m}_{k}$ , and $[b_{k,l,1},\dots b_{k,l,L_{k}}]$ and $[\hat{b}_{k,t,1},\dots\hat{b}_{k,t,L_{k}}]$ , respectively, be the transmitted bit sequence and recovered (received) bit sequence of sensor $k$ .

III-A Coherent Receiver

Suppose the Hamming distance between two bit sequences $[b_{k,l,1},\dots b_{k,l,L_{k}}]$ and $[\hat{b}_{k,t,1},\dots\hat{b}_{k,t,L_{k}}]$ is $N_{e_{k,t,l}}=\sum_{i=1}^{L_{k}}\hat{b}_{k,t,i}\oplus b_{k,l,i}$ , in which $\oplus$ is the Boolean sum operator. We define $\gamma_{k}$ as the channel signal to noise ratio (SNR) of sensor $k$ , where:

[TABLE]

We can model the channel between sensor $k$ and the FC as a binary symmetric channel (BSC) with the probability of flipping a bit ${\cal E}_{k}=Q\left(\sqrt{2\gamma_{k}}\right)$ , where ${\cal E}_{k}$ does not depend on the bit index. Hence, the probability $\alpha_{k,t,l}$ in (11) becomes:

[TABLE]
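The BSC model can be sketched numerically as follows. The sketch assumes independent bit errors, so that the probability of receiving level $t$ when level $l$ was sent is ${\cal E}_{k}^{N_{e_{k,t,l}}}(1-{\cal E}_{k})^{L_{k}-N_{e_{k,t,l}}}$ , and it assumes natural binary encoding of the level index. The function names and the value $\gamma_{k}=2$ are illustrative.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def alpha_bsc(gamma_k, num_bits):
    """alpha[t][l] = P(received level t | transmitted level l) over a BSC.

    Levels are mapped to bits by natural binary encoding (index -> bits),
    and bit errors are assumed independent across the num_bits symbols.
    """
    eps = q_function(math.sqrt(2.0 * gamma_k))  # bit-flip probability E_k
    num_levels = 2 ** num_bits
    alpha = [[0.0] * num_levels for _ in range(num_levels)]
    for t in range(num_levels):
        for l in range(num_levels):
            n_err = bin(t ^ l).count("1")  # Hamming distance N_e
            alpha[t][l] = eps ** n_err * (1.0 - eps) ** (num_bits - n_err)
    return alpha

alpha = alpha_bsc(gamma_k=2.0, num_bits=3)
```

For a fixed transmitted level $l$ the entries sum to one over $t$ , and the matrix is symmetric in $(t,l)$ because the Hamming distance is symmetric.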

III-B Noncoherent Receiver

The channel between sensor $k$ and the FC can no longer be modeled as a BSC. Instead, we can model it as a binary asymmetric channel, where ${\cal E}_{1_{k}}$ is the probability that “0” bit is flipped into “1” bit, and ${\cal E}_{2_{k}}$ is the probability that “1” bit is flipped into “0” bit. Therefore, the probability $\alpha_{k,t,l}$ in (11) becomes:

[TABLE]

where $\mathbf{1}_{\{X\}}$ is the indicator function with subscript $X$ describing the event of inclusion. Next, we compute probabilities ${\cal E}_{1_{k}}$ and ${\cal E}_{2_{k}}$ in (III-B). Note that ${\cal E}_{1_{k}}$ and ${\cal E}_{2_{k}}$ do not depend on the bit index. The problem of demodulating $L_{k}$ symbols (bits) sent by sensor $k$ , based on $L_{k}$ received signals, $y_{k,1},\dots,y_{k,L_{k}}$ can be cast into $L_{k}$ binary hypothesis testing problems, in which the channel output corresponding to each problem is:

[TABLE]

for $i\!=\!1,\dots L_{k}$ , where $B_{k}$ is transmitted signal amplitude for sensor $k$ . Denoting $r_{k,i}$ as the test statistics, the optimal likelihood ratio test (LRT) at the FC can be expressed as:

[TABLE]

where the probabilities $p\left({\cal H}_{1,i}\right)\!=\!p(b_{k,l,i}=1)$ and $p\left({\cal H}_{0,i}\right)\!=\!p(b_{k,l,i}=0)$ . Lemma 2 shows that for our system model, $p({\cal H}_{0,i})=p({\cal H}_{1,i})=1/2$ .

Lemma 2.

We have $p({\cal H}_{0,i})=p({\cal H}_{1,i})=1/2$ under the following two assumptions:

(i) the pdf of the noisy observation $x_{k}$ is smooth and symmetric,
(ii) sensor $k$ uses a symmetric mid-rise quantizer and encodes the quantization level $m_{k}$ according to the natural binary encoding rule.

Both assumptions hold true for our system model.

Proof. See Appendix A-B.
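The statement of Lemma 2 can be checked numerically with a short sketch. It assumes a zero-mean Gaussian $x_{k}$ , a uniform mid-rise quantizer whose outer cells are unbounded, and natural binary encoding of the level index; the standard deviation and step size are hypothetical. Symmetry makes level $l$ and level $M_{k}-1-l$ equally likely, and these two indices are bitwise complements, so every bit equals "1" with probability $1/2$ .

```python
import math

def level_probs(std_x, num_bits, step):
    """Cell probabilities of a symmetric mid-rise uniform quantizer applied
    to x ~ N(0, std_x^2); the outer cells are unbounded (an assumption)."""
    m = 2 ** num_bits
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / (std_x * math.sqrt(2.0))))
    # Interior thresholds are symmetric about zero: 0, +-step, +-2*step, ...
    thresholds = [(i - m // 2) * step for i in range(1, m)]
    cdf_vals = [0.0] + [cdf(u) for u in thresholds] + [1.0]
    return [cdf_vals[i + 1] - cdf_vals[i] for i in range(m)]

def bit_one_probs(probs, num_bits):
    """P(bit i = 1) under natural binary encoding of the level index."""
    return [sum(p for l, p in enumerate(probs) if (l >> i) & 1)
            for i in range(num_bits)]

probs = level_probs(std_x=1.5, num_bits=3, step=0.75)
p_one = bit_one_probs(probs, num_bits=3)
```

Every entry of `p_one` equals $1/2$ up to floating-point rounding, regardless of the chosen step size.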

According to Lemma 2, we can state that $\mathbb{E}\{B_{k}^{2}\}=2P_{k}/L_{k}$ , where $P_{k}$ is the average transmit power of sensor $k$ . In the following, we find probabilities ${\cal E}_{1_{k}}$ and ${\cal E}_{2_{k}}$ for our two types of noncoherent receivers.

$\bullet$ Noncoherent Receiver with Known Channel Envelopes: For this receiver, the test statistics of LRT at the FC is the envelope of channel output, i.e., $r_{k,i}=|y_{k,i}|$ and $|h_{k}|$ is known to the FC. Hence, given $|h_{k}|$ , the two conditional pdfs of the test statistics under hypotheses ${\cal H}_{0,i}$ and ${\cal H}_{1,i}$ are [37]:

[TABLE]

where $\gamma_{k}$ is defined in (14) and $I_{0}(.)$ is the zeroth-order modified Bessel function of the first kind. Since $w_{k,i}$ ’s are independent across $L_{k}$ transmitted symbols, the random variables $r_{k,i}$ conditioned on each hypothesis and $|h_{k}|$ are i.i.d. for $i\!=\!1,\dots,L_{k}$ . Therefore, the probabilities ${\cal E}_{1_{k}}$ and ${\cal E}_{2_{k}}$ do not depend on bit index $i$ . Based on equations (7-4-7) and (7-4-11) in [37], probabilities ${\cal E}_{1_{k}}$ and ${\cal E}_{2_{k}}$ are:

[TABLE]

where the decision threshold $\zeta_{k}$ depends on $p({\cal H}_{0,i})$ and $p({\cal H}_{1,i})$ . For $p({\cal H}_{0,i})=p({\cal H}_{1,i})=1/2$ , [37] provides an accurate approximation of $\zeta_{k}$ as $\zeta_{k}=\sqrt{2+\gamma_{k}}$ .

Finally, by substituting (18) in (III-B), we compute $\alpha_{k,t,l}$ for noncoherent receiver with known channel envelopes.

$\bullet$ Noncoherent Receiver with Known Channel Statistics: For this receiver, the test statistics of LRT at the FC is the power of channel output, i.e., $r_{k,i}=|y_{k,i}|^{2}$ . The FC only knows the channel statistics $h_{k}\sim\mathcal{CN}\left(0,2\sigma_{h_{k}}^{2}\right)$ . Let $\overline{\gamma}_{k}$ denote the average channel SNR of sensor $k$ , where:

[TABLE]

in which we have used the knowledge of channel statistics to obtain $\mathbb{E}\{|h_{k}|^{2}\}=2\sigma_{h_{k}}^{2}$ . Since $y_{k,i}$ is complex Gaussian, we have [33]:

[TABLE]

Note that $r_{k,i}$ ’s conditioned on each hypothesis are i.i.d. for $i\!=\!1,\dots,L_{k}$ and therefore the probabilities ${\cal E}_{1_{k}}$ and ${\cal E}_{2_{k}}$ do not depend on bit index $i$ . Hence:

[TABLE]

in which the decision threshold $\zeta_{k}$ for $p({\cal H}_{0,i})\!=\!p({\cal H}_{1,i})\!=\!1/2$ is $\zeta_{k}=2\sigma_{w_{k}}^{2}(1+\frac{1}{2\overline{\gamma}_{k}})\ln\left(1+2\overline{\gamma}_{k}\right)$ .
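The threshold above can be illustrated with a small sketch. It assumes that $r_{k,i}$ is exponentially distributed with mean $2\sigma_{w_{k}}^{2}$ under ${\cal H}_{0,i}$ and mean $2\sigma_{w_{k}}^{2}(1+2\overline{\gamma}_{k})$ under ${\cal H}_{1,i}$ ; this model is inferred from the threshold expression and should be checked against (19) and (20). The function names and numerical values are hypothetical.

```python
import math

def threshold_known_stats(sigma_w, gamma_bar):
    """Decision threshold zeta_k for equally likely bits (formula in the text)."""
    return (2.0 * sigma_w ** 2 * (1.0 + 1.0 / (2.0 * gamma_bar))
            * math.log(1.0 + 2.0 * gamma_bar))

def exp_pdf(r, mean):
    """pdf of an exponential random variable with the given mean."""
    return math.exp(-r / mean) / mean

def error_probs_known_stats(sigma_w, gamma_bar):
    """(E1, E2), assuming r is exponential with mean 2*sigma_w^2 under H0
    and 2*sigma_w^2*(1 + 2*gamma_bar) under H1 (assumption of this sketch)."""
    mean0 = 2.0 * sigma_w ** 2
    mean1 = mean0 * (1.0 + 2.0 * gamma_bar)
    zeta = threshold_known_stats(sigma_w, gamma_bar)
    e1 = math.exp(-zeta / mean0)        # "0" decided as "1": r > zeta under H0
    e2 = 1.0 - math.exp(-zeta / mean1)  # "1" decided as "0": r < zeta under H1
    return e1, e2

e1, e2 = error_probs_known_stats(sigma_w=1.0, gamma_bar=5.0)
```

Under this exponential model the two conditional pdfs are equal at $r=\zeta_{k}$ , which is the likelihood ratio test condition for equally likely hypotheses, and both error probabilities decrease as $\overline{\gamma}_{k}$ grows.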

Finally, by substituting (20) in (III-B), we compute $\alpha_{k,t,l}$ for noncoherent receiver with known channel statistics.$^{4}$

$^{4}$ When the ratio $\frac{p\left({\cal H}_{0,i}\right)}{p\left({\cal H}_{1,i}\right)}\!=\!\tau_{FC}\!\neq\!1$ in (17), the expressions for the decision threshold $\zeta_{k}$ change. For noncoherent receiver with known channel envelopes, one can analytically find for each $\gamma_{k}$ the value of $\zeta_{k}$ which minimizes the average error probability corresponding to demodulating the symbols of sensor $k$ given as $p_{e_{k}}=p\left({\cal H}_{0,i}\right){\cal E}_{1_{k}}+p\left({\cal H}_{1,i}\right){\cal E}_{2_{k}}$ . Equivalently, $\zeta_{k}$ satisfies $e^{-2\gamma_{k}}I_{0}\left(2\zeta_{k}\sqrt{\gamma_{k}}\right)=\tau_{FC}$ . For noncoherent receiver with known channel statistics, we obtain $\zeta_{k}=2\sigma_{w_{k}}^{2}(1+\frac{1}{2\overline{\gamma}_{k}})\ln\left(\tau_{FC}(1+2\overline{\gamma}_{k})\right)$ .

III-C Finding Bayesian FIM $\boldsymbol{J}$ in (III)

At this point, we have all the components to write the entries $[\boldsymbol{\Lambda}(\boldsymbol{\theta})]_{ij}$ in (8). Combining (10)-(13), we find the following compact form representation of $[\boldsymbol{\Lambda}(\boldsymbol{\theta})]_{ij}$ :

[TABLE]

where the scalar $G_{k}(\boldsymbol{\theta})$ is:

[TABLE]

Finally, we compute $\mathbb{E}\{\boldsymbol{\Lambda}(\boldsymbol{\theta})\}$ and substitute it in (III) to obtain matrix $\boldsymbol{J}$ as:

[TABLE]

where the columns of $\boldsymbol{\mathcal{A}}\!=\![\mathbf{a}_{1},...,\mathbf{a}_{K}]$ are observation vectors in (1) and the expectations over $\boldsymbol{\theta}$ in (23) are computed using numerical integration.

For $\boldsymbol{J}$ in (23) there exist two baselines. For the first baseline, suppose all sensors’ observations $x_{k}$ ’s are available at the FC with full precision (centralized estimation) and let $\boldsymbol{J}_{0}\!=\!{\boldsymbol{\cal C}_{\boldsymbol{\theta}}^{-1}}\!+\!\mathbb{E}\{{\boldsymbol{\Lambda}}_{0}(\boldsymbol{\theta})\}$ be the corresponding Bayesian FIM. To find $[{\boldsymbol{\Lambda}}_{0}(\boldsymbol{\theta})]_{ij}$ , we start from (8) and replace $p(\hat{m}_{k,t}\arrowvert\boldsymbol{\theta})$ with $f\!\left(x_{k}|\boldsymbol{\theta}\right)$ . Following the same procedure as we described to obtain (10) from (8), we reach:

[TABLE]

Since $\frac{\partial f\left(x_{k}|\boldsymbol{\theta}\right)}{\partial\theta_{i}}=\frac{a_{k_{i}}(x_{k}-\mathbf{a}_{k}^{T}\boldsymbol{\theta})}{\sigma_{n_{k}}^{2}}f\left(x_{k}|\boldsymbol{\theta}\right)$ , it is straightforward to show $[{\boldsymbol{\Lambda}}_{0}(\boldsymbol{\theta})]_{ij}=\sum_{k=1}^{K}\!\frac{a_{k_{i}}a_{k_{j}}}{\sigma_{n_{k}}^{2}}$ . Therefore:

[TABLE]

For the second baseline, suppose communication channels between sensors and the FC are error-free and hence vector $\boldsymbol{m}$ is available at the FC. Let $\boldsymbol{J}^{ideal}\!=\!{\boldsymbol{\cal C}_{\boldsymbol{\theta}}^{-1}}\!+\!\mathbb{E}\{{\boldsymbol{\Lambda}}^{ideal}(\boldsymbol{\theta})\}$ be the corresponding Bayesian FIM. To find $G_{k}^{ideal}(\boldsymbol{\theta})$ for entries $[{\boldsymbol{\Lambda}}^{ideal}(\boldsymbol{\theta})]_{ij}$ using (21) we note that $\alpha_{k,t,l}\!=\!1$ for $t\!=\!l$ and $\alpha_{k,t,l}\!=\!0$ otherwise, since the channel error probabilities ( ${\cal E}_{k}$ for coherent receiver, ${\cal E}_{1_{k}},{\cal E}_{2_{k}}$ for noncoherent receivers) are zero. Therefore, from (21) we find $G_{k}^{ideal}(\boldsymbol{\theta})\!=\!\sum_{t=1}^{M_{k}}\frac{{(\dot{\beta}_{k,t}(\boldsymbol{\theta}))}^{2}}{\beta_{k,t}(\boldsymbol{\theta})}$ . Clearly, $\boldsymbol{J}\preceq\boldsymbol{J}^{ideal}\preceq\boldsymbol{J}_{0}$ .
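The ordering of the two baselines can be illustrated per sensor: the scalar $G_{k}^{ideal}(\boldsymbol{\theta})$ should not exceed $1/\sigma_{n_{k}}^{2}$ , the corresponding weight in $\boldsymbol{J}_{0}$ . The sketch below assumes that $\dot{\beta}_{k,t}(\boldsymbol{\theta})$ denotes the derivative of $\beta_{k,t}$ with respect to the noiseless observation $s=\mathbf{a}_{k}^{T}\boldsymbol{\theta}$ (so that $\partial\beta_{k,t}/\partial\theta_{i}=a_{k_{i}}\dot{\beta}_{k,t}$ ), and it uses hypothetical quantizer thresholds with unbounded outer cells.

```python
import math

def g_ideal(s, sigma_n, thresholds):
    """Sum_t (beta_dot_t)^2 / beta_t for noiseless observation s = a_k^T theta.

    beta_dot_t is taken as d(beta_t)/ds (an assumption of this sketch).
    """
    edges = [-math.inf] + list(thresholds) + [math.inf]

    def cdf(u):
        if math.isinf(u):
            return 0.0 if u < 0 else 1.0
        return 0.5 * (1.0 + math.erf((u - s) / (sigma_n * math.sqrt(2.0))))

    def pdf(u):
        if math.isinf(u):
            return 0.0
        z = (u - s) / sigma_n
        return math.exp(-0.5 * z * z) / (sigma_n * math.sqrt(2.0 * math.pi))

    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        beta = cdf(hi) - cdf(lo)
        beta_dot = pdf(lo) - pdf(hi)  # derivative of beta with respect to s
        if beta > 0.0:
            total += beta_dot ** 2 / beta
    return total

coarse = g_ideal(0.2, 1.0, [0.0])                                 # 1-bit quantizer
fine = g_ideal(0.2, 1.0, [0.25 * i for i in range(-12, 13)])      # 25 thresholds
```

Refining the quantizer increases the value, but it stays below $1/\sigma_{n_{k}}^{2}=1$ , which is consistent with $\boldsymbol{J}^{ideal}\preceq\boldsymbol{J}_{0}$ .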

Remark 1.

If $\boldsymbol{\theta}$ has a known nonzero mean $\boldsymbol{\mu}_{\theta}$ , sensor $k$ subtracts $\mathbf{a}_{k}^{T}\boldsymbol{\mu}_{\theta}$ from its observation $x_{k}$ before quantization. At the FC, $\mathbf{a}_{k}^{T}\boldsymbol{\mu}_{\theta}$ is first added to $\hat{m}_{k}$ to generate $\tilde{m}_{k}\!=\!\hat{m}_{k}+\mathbf{a}_{k}^{T}\boldsymbol{\mu}_{\theta}$ and then the Bayesian estimator $\hat{\boldsymbol{\theta}}$ is formed using $\boldsymbol{\tilde{m}}=[\tilde{m}_{1},...,\tilde{m}_{K}]^{T}$ . Thus, the corresponding Bayesian FIM $\boldsymbol{\tilde{J}}$ becomes:

[TABLE]

where the joint pdf $p_{\boldsymbol{\tilde{m}}\boldsymbol{\theta}}(\boldsymbol{\tilde{m}},\boldsymbol{\theta})=p_{\boldsymbol{\hat{m}}\boldsymbol{\theta}}(\boldsymbol{\tilde{m}}-\boldsymbol{\mathcal{A}}^{T}\boldsymbol{\mu}_{\theta},\boldsymbol{\theta})$ . Noting that $\boldsymbol{\tilde{m}}-\boldsymbol{\mathcal{A}}^{T}\boldsymbol{\mu}_{\theta}=\boldsymbol{\hat{m}}$ , we follow the same procedure used to obtain $\boldsymbol{J}$ in (23) and find that $\boldsymbol{\tilde{J}}$ has the same expression as $\boldsymbol{J}$ , with the only difference that $\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}=\mathbb{E}\{\boldsymbol{\theta}\boldsymbol{\theta}^{T}\}-\boldsymbol{\mu}_{\theta}{\boldsymbol{\mu}_{\theta}}^{T}$ for nonzero-mean $\boldsymbol{\theta}$ .

IV WWB Bound: Derivation and Computation

The MSE matrix of any Bayesian estimator $\hat{\boldsymbol{\theta}}$ of random vector $\boldsymbol{\theta}\in\mathbb{R}^{q}$ satisfies the following inequality [29, 38]:

[TABLE]

where the columns of $q\times q$ matrix $\boldsymbol{R}={[\boldsymbol{r}_{1},\boldsymbol{r}_{2},...,\boldsymbol{r}_{q}]}^{T}$ , so-called test points, lie in the parameter space and their choices are left to the user [29, 38]. The $q\times q$ matrix ${\boldsymbol{G}}$ is defined by its entries $[{\boldsymbol{G}}]_{ij}$ , which are computed as follows [29]:

[TABLE]

The inequality in (25) holds for any $\boldsymbol{R}$ such that $\boldsymbol{G}$ is invertible [29, 38]. Maximizing the right side of (25) with respect to $\boldsymbol{R}$ leads to the tightest WWB, denoted as $\boldsymbol{W}\boldsymbol{W}\boldsymbol{B}$ . In other words:

[TABLE]

where the supremum operation is taken with respect to Loewner partial ordering [38]. To find $\boldsymbol{W}\boldsymbol{W}\boldsymbol{B}$ in our problem, first we need to derive the entries $[{\boldsymbol{G}}]_{ij}$ , or equivalently scalar $\mu(\boldsymbol{r})$ in (26). After deriving $\mu(\boldsymbol{r})$ , we discuss how to compute the supremum in (27).

IV-A Deriving $\mu(\boldsymbol{r})$ in (26) Based on Our System Model

Using equation (43) in [29] and the Bayes’ rule to write $p(\hat{\boldsymbol{m}},\boldsymbol{\theta}\!+\!\boldsymbol{r})\!=\!p(\hat{\boldsymbol{m}}|\boldsymbol{\theta}\!+\!\boldsymbol{r})f(\boldsymbol{\theta}\!+\!\boldsymbol{r})$ and $p(\hat{\boldsymbol{m}},\boldsymbol{\theta})\!=\!p(\hat{\boldsymbol{m}}|\boldsymbol{\theta})f(\boldsymbol{\theta})$ we find:

[TABLE]

where $V_{\theta}$ denotes the $q$ -dimensional volume over which we take the integral and $p^{\frac{1}{2}}(.,.)$ is the square root of the joint pdf. To characterize $\mu(\boldsymbol{r})$ in (IV-A) we need to find $p(\hat{\boldsymbol{m}}|\boldsymbol{\theta})$ , $p(\hat{\boldsymbol{m}}|\boldsymbol{\theta}+\boldsymbol{r})$ , and $f^{\frac{1}{2}}(\boldsymbol{\theta}+\boldsymbol{r})f^{\frac{1}{2}}(\boldsymbol{\theta})$ . Let index $t$ indicate the quantization level corresponding to $\hat{m}_{k}$ . According to Lemma 1, the following are evident:

[TABLE]

where $p(\hat{m}_{k,t}|\boldsymbol{\theta})$ is given in (11), and $p(\hat{m}_{k,t}|\boldsymbol{\theta}+\boldsymbol{r})$ can be computed with a simple substitution of $\boldsymbol{\theta}$ by $\boldsymbol{\theta}+\boldsymbol{r}$ in (11). Moreover, some easy manipulations yield:

[TABLE]

Substituting (29) and (30) in (IV-A) and some straightforward manipulations produce:

[TABLE]

where $c_{q}(\boldsymbol{r})=-\frac{q}{2}\ln(2\pi)-\frac{1}{2}\ln|\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}|-\frac{{\boldsymbol{r}}^{T}\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}^{-1}\boldsymbol{r}}{8}$ .
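The quadratic term in $c_{q}(\boldsymbol{r})$ comes from the Gaussian prior: completing the square gives $f^{\frac{1}{2}}(\boldsymbol{\theta}+\boldsymbol{r})f^{\frac{1}{2}}(\boldsymbol{\theta})=f(\boldsymbol{\theta}+\boldsymbol{r}/2)\,e^{-\boldsymbol{r}^{T}\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}^{-1}\boldsymbol{r}/8}$ , so the prior alone integrates to $e^{-\boldsymbol{r}^{T}\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}^{-1}\boldsymbol{r}/8}$ . The sketch below checks this in the scalar case $q=1$ by trapezoidal integration; the variance and test point values are hypothetical.

```python
import math

def prior_overlap_1d(r, var, num=20001, span=12.0):
    """Numerically integrate sqrt(f(theta + r) * f(theta)) for f = N(0, var)."""
    std = math.sqrt(var)
    lo, hi = -span * std - abs(r), span * std + abs(r)
    h = (hi - lo) / (num - 1)

    def f(t):
        return math.exp(-0.5 * t * t / var) / math.sqrt(2.0 * math.pi * var)

    total = 0.0
    for n in range(num):
        t = lo + n * h
        w = 0.5 if n in (0, num - 1) else 1.0  # trapezoidal end-point weights
        total += w * math.sqrt(f(t + r) * f(t))
    return total * h

r, var = 1.0, 4.0
numeric = prior_overlap_1d(r, var)
closed_form = math.exp(-r * r / (8.0 * var))  # exp(-r^T C^{-1} r / 8) for q = 1
```

The numerical and closed-form values agree to within the integration error, which supports the $-\boldsymbol{r}^{T}\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}^{-1}\boldsymbol{r}/8$ term.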

IV-B Computation of the Tightest WWB

In the following, we explain how we compute the supremum in (27). We note that the method to compute the supremum in (27) does not depend on the system model (it only depends on the parameter space). Therefore, we adopt the same method as in [38]. Let $\boldsymbol{W}(\boldsymbol{R})=\boldsymbol{R}{\boldsymbol{G}}^{-1}{\boldsymbol{R}}^{T}$ and define set:

[TABLE]

Then $\boldsymbol{W}\boldsymbol{W}\boldsymbol{B}$ is the supremum of set ${\cal W}$ , where the supremum operation is taken with respect to Loewner partial ordering [38]. It is worth mentioning the difference between the maximum and the supremum of the set $\cal W$ . The largest element of $\cal W$ , if it exists, is the matrix ${\boldsymbol{W}}^{*}\in{\cal W}$ satisfying $\boldsymbol{W}\preceq{\boldsymbol{W}}^{*},\forall\boldsymbol{W}\in{\cal W}$ . On the other hand, the supremum of $\cal W$ is a minimal upper bound on $\cal W$ that is not necessarily contained in $\cal W$ . This implies that the largest element of $\cal W$ may not exist, but if it exists, it is also the supremum.

According to Lemma 3 of [39] for any two positive definite matrices $\boldsymbol{A}$ and $\boldsymbol{B}$ we have $\boldsymbol{A}\!\succeq\!\boldsymbol{B}$ if and only if $\varepsilon(\boldsymbol{A})\!\supseteq\!\varepsilon(\boldsymbol{B})$ , in which the hyper-ellipsoid $\varepsilon(\boldsymbol{A})$ centered at the origin can be represented by the set $\varepsilon(\boldsymbol{A})\!=\!\{\boldsymbol{z}|{\boldsymbol{z}}^{T}\!{\boldsymbol{A}}^{-1}\boldsymbol{z}\!\leq\!1\}$ . Consequently, the supremum in (27) can be computed by finding the minimum volume hyper-ellipsoid $\varepsilon({\boldsymbol{W}}^{*})$ containing the set $\varepsilon_{\cal W}\!=\!\{\varepsilon(\boldsymbol{W})|\boldsymbol{W}\!\in\!{\cal W}\}$ , where the set $\varepsilon_{\cal W}$ itself consists of the hyper-ellipsoids generated by all matrices in ${\cal W}$ . The problem of finding the minimum volume ellipsoid $\varepsilon$ that contains the ellipsoids $\varepsilon_{1},...,\varepsilon_{m}$ (and therefore the convex hull of their union) has been formulated as a convex problem in [40]:

[TABLE]

where ${\boldsymbol{W}}_{i}\in{\cal W}$ and $|\cal W|$ is the cardinality of the set ${\cal W}$ . This problem can be solved efficiently using semidefinite programming. In particular, we solve this problem using CVX.
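The equivalence between Loewner ordering and ellipsoid containment can be illustrated for $2\times 2$ matrices without a solver. The sketch below does not solve the minimum volume ellipsoid problem itself; it only checks, for two hypothetical positive definite matrices, that $\boldsymbol{A}\succeq\boldsymbol{B}$ coincides with $\varepsilon(\boldsymbol{A})\supseteq\varepsilon(\boldsymbol{B})$ . Containment is tested by sampling the boundary of the inner ellipse, which is a numerical check rather than a proof.

```python
import math

def is_psd_2x2(m, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff its trace and determinant are >= 0."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return tr >= -tol and det >= -tol

def inv_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def quad_form(m, z):
    return (m[0][0] * z[0] * z[0] + (m[0][1] + m[1][0]) * z[0] * z[1]
            + m[1][1] * z[1] * z[1])

def ellipsoid_contains(a, b, num=360, tol=1e-9):
    """Check eps(B) subset of eps(A) by sampling the boundary of eps(B)."""
    a_inv, b_inv = inv_2x2(a), inv_2x2(b)
    for n in range(num):
        ang = 2.0 * math.pi * n / num
        u = (math.cos(ang), math.sin(ang))
        scale = 1.0 / math.sqrt(quad_form(b_inv, u))  # z lies on boundary of eps(B)
        z = (scale * u[0], scale * u[1])
        if quad_form(a_inv, z) > 1.0 + tol:
            return False
    return True

A = [[4.0, 0.5], [0.5, 1.0]]   # hypothetical matrices
B = [[2.0, 0.3], [0.3, 0.5]]
diff = [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]
```

Here `is_psd_2x2(diff)` and `ellipsoid_contains(A, B)` both hold, while `ellipsoid_contains(B, A)` does not.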

V Power Constrained Bayesian Fisher Information Maximization

In this section, we address the constrained optimization problems formulated in (II) and (II). We denote the solutions obtained from solving these two power constrained Fisher information maximization problems as FIM-max schemes. Note that due to the cap on the network average transmit power, only a subset of the sensors might be active during each task period, which we refer to as the set of active sensors $S_{\cal A}=\{k:P_{k}>0,\ k=1,\dots,K\}$ .

V-A Solving Optimization Problem in (II)

We adopt the Lagrange multipliers method to solve the problem. The Lagrangian $\mathcal{L}$ of this problem is:

[TABLE]

The Karush-Kuhn-Tucker (KKT) optimality conditions are:

[TABLE]

where $\lambda,\eta_{k}$ ’s are the Lagrange multipliers. According to (23) we find:

[TABLE]

Thus, to show $\frac{\partial\,\text{tr}(\boldsymbol{J})}{\partial P_{k}}\!>\!0$ , we need to show $\mathbb{E}\{\!\frac{\partial\,G_{k}(\boldsymbol{\theta})}{\partial P_{k}}\!\}\!>\!0$ . Although we were not able to prove this analytically, our extensive simulations for various system parameters indicate that $\mathbb{E}\{\!\frac{\partial\,G_{k}(\boldsymbol{\theta})}{\partial P_{k}}\!\}\!>\!0$ and thus $\frac{\partial\,\text{tr}(\boldsymbol{J})}{\partial P_{k}}\!>\!0$ . Fig. 2 summarizes our extensive simulations to demonstrate $\frac{\partial\,\text{tr}(\boldsymbol{J})}{\partial P_{k}}\!>\!0$ for the coherent receiver. To obtain this figure, we let $K\!=\!2$ and consider a zero-mean Gaussian vector $\boldsymbol{\theta}\!=\!\left[\theta_{1},\theta_{2}\right]^{T}$ with $\boldsymbol{\cal C}_{\boldsymbol{\theta}}\!=\![4,0.5;0.5,0.25]$ . We assume $L_{k}\!=\!3$ , $\mathbf{a}_{k}\!=\![0.6,0.8]^{T},\forall k$ , vary $|h_{k}|$ , $\sigma_{w_{k}}$ , $\sigma_{n_{k}}$ , and use the uniform quantizer described in Section IX. Let $\delta_{k}\!=\!\frac{|h_{k}|^{2}}{2\sigma_{w_{k}}^{2}}$ . For the coherent receiver, Fig. 2(a) and Fig. 2(b) depict $\frac{\partial\,\text{tr}(\boldsymbol{J})}{\partial P_{k}}$ versus $P_{k}$ for different values of $\delta_{k}$ and $\sigma_{n_{k}}$ , respectively. We observe that, for all values of $\delta_{k}$ and $\sigma_{n_{k}}$ considered, we have $\frac{\partial\,\text{tr}(\boldsymbol{J})}{\partial P_{k}}\!>\!0$ , $\forall P_{k}$ . Similar observations were made for both types of noncoherent receivers. However, due to lack of space we have omitted those plots.

Since tr $(\boldsymbol{J})$ is an increasing function of $P_{k}$ ’s, the Lagrange multiplier $\lambda$ in (32) should be determined such that it satisfies the network average transmit power constraint with equality, that is, $\sum_{k\in S_{\cal A}}P_{k}=P_{tot}$ . Furthermore, for the set of active sensors $S_{\cal A}$ the Lagrange multiplier $\eta_{k}=0$ . Hence, we can reformulate the KKT optimality conditions in (32) as:

[TABLE]

Let $\boldsymbol{P}\!=\![P_{1},\dots,P_{K}]$ be the vector of sensors’ transmit powers. The Hessian of $\text{tr}(\boldsymbol{J})$ with respect to $\boldsymbol{P}$ is a diagonal matrix, since using (33) we find $\frac{\partial^{2}\text{tr}(\boldsymbol{J})}{\partial P_{i}\partial P_{j}}\!=\!0,\ i,j\!=\!1,\dots,K,\ i\!\neq\!j$ . Fig. 3(a) and Fig. 3(b) depict $\frac{\partial^{2}\text{tr}(\boldsymbol{J})}{\partial P_{k}^{2}}$ versus $P_{k}$ for different values of $\delta_{k}$ and $\sigma_{n_{k}}$ , respectively, for the coherent receiver, showing that $\frac{\partial^{2}\text{tr}(\boldsymbol{J})}{\partial P_{k}^{2}}<0$ , which implies the Hessian matrix is negative definite. The negative definiteness of the Hessian matrix means that $\text{tr}(\boldsymbol{J})$ is jointly concave over $P_{k}$ ’s. Moreover, the constraints are linear, and thus, the problem in (II) is concave. For noncoherent receivers, unlike the coherent receiver, our simulations show that the sign of $\frac{\partial^{2}\text{tr}(\boldsymbol{J})}{\partial P_{k}^{2}}$ changes across system parameters, and thus, $\text{tr}(\boldsymbol{J})$ is not necessarily a concave function over $P_{k}$ ’s. The optimal solutions for $\lambda$ and $P_{k}$ for $k\in S_{\cal A}$ cannot be obtained in closed form. Therefore, we resort to the Newton-Raphson algorithm to solve the set of nonlinear equations in (V-A). For the coherent receiver, since the problem is concave, it is guaranteed that the numerical solution obtained via the algorithm is globally optimal. Therefore, only one (carefully chosen) initial point suffices to run the algorithm. However, for noncoherent receivers, since the problem is not concave, we consider multiple initial points to run the algorithm. The description of this algorithm for noncoherent receivers follows.

Let $\boldsymbol{z}\coloneqq\left[\boldsymbol{P},\lambda\right]^{T}$ be the vector that contains the vector of sensors’ transmit powers as well as the Lagrange multiplier $\lambda$ . We let $\boldsymbol{f}$ and $\boldsymbol{\mathcal{G}}$ , respectively be the gradient vector and the Jacobian matrix of the right side of the equality in (31) with respect to $\boldsymbol{z}$ . We have:

[TABLE]

Let $N_{i}$ be the total number of initial points. We choose $\boldsymbol{z}_{i}^{(j)},\ j\!=\!1,...,N_{i}$ initial points (solutions), where $j$ is the index of the initial points. The Newton-Raphson algorithm is carried out to obtain $\boldsymbol{z}_{f}^{(j)}$ and $T^{(j)}=\text{tr}(\boldsymbol{J}(\boldsymbol{z}_{f}^{(j)})),\ j\!=\!1,...,N_{i}$ , which respectively are the final solution and the final value of the objective function obtained when the algorithm terminates, corresponding to the initial point $\boldsymbol{z}_{i}^{(j)}$ . Suppose the algorithm runs for the initial point $\boldsymbol{z}_{i}^{(j)}$ . We initialize the iteration index $n\!=\!0$ and the initial point $\boldsymbol{z}_{0}\!=\!\boldsymbol{z}_{i}^{(j)}$ . We denote $\boldsymbol{z}_{n}$ as the solution at $n$ -th iteration, and $\boldsymbol{f}\left(\boldsymbol{z}_{n}\right)$ , $\boldsymbol{\mathcal{G}}\left(\boldsymbol{z}_{n}\right)$ , respectively, as the gradient vector and the Jacobian matrix evaluated at $\boldsymbol{z}_{n}$ . At iteration $n$ , if the Jacobian matrix $\boldsymbol{\mathcal{G}}\left(\boldsymbol{z}_{n}\right)$ becomes singular, or $\sum_{k\in S_{\cal A}}P_{k}>P_{tot}$ , the algorithm terminates. Otherwise, we let $\boldsymbol{z}_{n+1}=\boldsymbol{z}_{n}-\boldsymbol{\mathcal{G}}^{-1}\left(\boldsymbol{z}_{n}\right)\boldsymbol{f}\left(\boldsymbol{z}_{n}\right)$ . As the stopping criterion, we check whether $\frac{\left|\left|\boldsymbol{z}_{n}-\boldsymbol{z}_{n-1}\right|\right|}{\left|\left|\boldsymbol{z}_{n}\right|\right|}\leq\epsilon_{0}$ , where $\epsilon_{0}$ is a predetermined error tolerance, or whether the number of iterations exceeds a predetermined maximum $I_{max}$ . Let $\boldsymbol{z}^{*}\!=\!\left[\boldsymbol{P}^{*},\lambda^{*}\right]^{T}$ be the optimal solution to this constrained optimization problem. 
After finding all $\{T^{(j)}\}_{j=1}^{N_{i}}$ , $\boldsymbol{z}^{*}$ is $\boldsymbol{z}_{f}^{(j)}$ associated with the largest value among $T^{(j)},\ j\!=\!1,...,N_{i}$ .
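The multi-start Newton-Raphson procedure can be sketched as follows. Because $G_{k}(\boldsymbol{\theta})$ requires numerical integration, the sketch replaces $\text{tr}(\boldsymbol{J})$ with a hypothetical separable surrogate $\sum_{k}c_{k}\ln(1+d_{k}P_{k})$ ; the surrogate, its coefficients, the initial points, and all function names are assumptions made for illustration only. The structure of the iteration on $\boldsymbol{z}=[\boldsymbol{P},\lambda]^{T}$ follows the description above.

```python
import math

def solve_linear(mat, rhs):
    """Gaussian elimination with partial pivoting; returns None if singular."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-14:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def objective(p, c, d):
    """Hypothetical surrogate for tr(J): sum_k c_k * ln(1 + d_k * P_k)."""
    return sum(ck * math.log(1.0 + dk * pk) for ck, dk, pk in zip(c, d, p))

def kkt_residual(z, c, d, p_tot):
    """f(z): stationarity for each sensor, then the power constraint."""
    p, lam = z[:-1], z[-1]
    f = [ck * dk / (1.0 + dk * pk) - lam for ck, dk, pk in zip(c, d, p)]
    return f + [sum(p) - p_tot]

def kkt_jacobian(z, c, d):
    """Jacobian of f with respect to z = [P, lambda]."""
    p, k = z[:-1], len(z) - 1
    jac = [[0.0] * (k + 1) for _ in range(k + 1)]
    for i in range(k):
        jac[i][i] = -c[i] * d[i] ** 2 / (1.0 + d[i] * p[i]) ** 2
        jac[i][k] = -1.0
        jac[k][i] = 1.0
    return jac

def newton_raphson(z0, c, d, p_tot, eps0=1e-10, i_max=100):
    z = list(z0)
    for _ in range(i_max):
        step = solve_linear(kkt_jacobian(z, c, d), kkt_residual(z, c, d, p_tot))
        if step is None:  # singular Jacobian: terminate
            break
        z_new = [zi - si for zi, si in zip(z, step)]  # z_{n+1} = z_n - G^{-1} f
        rel = math.dist(z_new, z) / max(math.hypot(*z_new), 1e-300)
        z = z_new
        if rel <= eps0:   # relative-change stopping criterion
            break
    return z

def multi_start(starts, c, d, p_tot):
    """Run Newton-Raphson from each start; keep the best nonnegative solution."""
    best = None
    for z0 in starts:
        z = newton_raphson(z0, c, d, p_tot)
        if min(z[:-1]) < 0.0:  # discard solutions with negative powers
            continue
        if best is None or objective(z[:-1], c, d) > objective(best[:-1], c, d):
            best = z
    return best

z_star = multi_start([[1.5, 1.5, 0.5], [0.5, 2.5, 0.3]],
                     c=[1.0, 1.0], d=[1.0, 2.0], p_tot=3.0)
```

For this concave surrogate both initial points lead to the same solution, $P_{1}=1.25$ , $P_{2}=1.75$ , $\lambda=4/9$ , which matches the water-filling form $P_{k}=c_{k}/\lambda-1/d_{k}$ . With a non-concave objective, different starts may end at different stationary points, which is why the largest $T^{(j)}$ is retained.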

V-B Solving Optimization Problem in (II)

We follow the same procedure as we described in Section V-A to solve (II). Specifically, we have:

[TABLE]

where we have used (23) and the fact $\text{tr}(\boldsymbol{A}\boldsymbol{B}\boldsymbol{C})\!=\!\text{tr}(\boldsymbol{C}\boldsymbol{A}\boldsymbol{B})$ to reach (36). Since $\mathbb{E}\{\frac{\partial\,G_{k}(\boldsymbol{\theta})}{\partial P_{k}}\}\!>\!0$ and ${\boldsymbol{J}}^{-1}\!\succeq\!0$ we conclude $\frac{\partial\,\text{log}_{2}(|\boldsymbol{J}|)}{\partial P_{k}}>0$ and thus $\text{log}_{2}(|\boldsymbol{J}|)$ is an increasing function of $P_{k}$ ’s. The Lagrangian $\mathcal{L}$ of this problem is $\mathcal{L}(\lambda,\{\eta_{k},P_{k}\}_{k=1}^{K})\!=\!\text{log}_{2}(|\boldsymbol{J}|)\!-\!\sum_{k=1}^{K}P_{k}\left(\lambda-\eta_{k}\right)\!+\!\lambda P_{tot}$ . The corresponding KKT optimality conditions are:

[TABLE]
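The derivative of the log-determinant used above can be checked numerically. The sketch assumes that $\boldsymbol{J}=\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}^{-1}+\sum_{k}g_{k}\mathbf{a}_{k}\mathbf{a}_{k}^{T}$ , where $g_{k}$ stands in for $\mathbb{E}\{G_{k}(\boldsymbol{\theta})\}$ ; this form and the values of $g_{k}$ are assumptions. Under this form, $\partial\log_{2}|\boldsymbol{J}|/\partial g_{k}=\mathbf{a}_{k}^{T}\boldsymbol{J}^{-1}\mathbf{a}_{k}/\ln 2$ , and the chain rule then multiplies this by $\mathbb{E}\{\partial G_{k}(\boldsymbol{\theta})/\partial P_{k}\}$ .

```python
import math

def inv_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def det_2x2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def fim(c_theta, a_list, g_list):
    """J = C_theta^{-1} + sum_k g_k a_k a_k^T (assumed form, q = 2)."""
    j = inv_2x2(c_theta)
    for a, g in zip(a_list, g_list):
        for r in range(2):
            for s in range(2):
                j[r][s] += g * a[r] * a[s]
    return j

def dlog2det_dg(c_theta, a_list, g_list, k):
    """Analytic derivative a_k^T J^{-1} a_k / ln 2."""
    j_inv = inv_2x2(fim(c_theta, a_list, g_list))
    a = a_list[k]
    quad = sum(a[r] * j_inv[r][s] * a[s] for r in range(2) for s in range(2))
    return quad / math.log(2.0)

def finite_diff(c_theta, a_list, g_list, k, h=1e-6):
    """Central finite difference of log2|J| with respect to g_k."""
    up, dn = list(g_list), list(g_list)
    up[k] += h
    dn[k] -= h
    f_up = math.log2(det_2x2(fim(c_theta, a_list, up)))
    f_dn = math.log2(det_2x2(fim(c_theta, a_list, dn)))
    return (f_up - f_dn) / (2.0 * h)

C_THETA = [[4.0, 0.5], [0.5, 0.25]]     # covariance used for Fig. 2
A_LIST = [[0.6, 0.8], [0.6, 0.8]]       # a_k = [0.6, 0.8]^T for all k
G_LIST = [0.5, 0.3]                     # hypothetical values of E{G_k}
analytic = dlog2det_dg(C_THETA, A_LIST, G_LIST, 0)
numeric = finite_diff(C_THETA, A_LIST, G_LIST, 0)
```

The analytic and finite-difference values agree, and the derivative is positive because $\boldsymbol{J}^{-1}$ is positive definite.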

For the coherent receiver, our simulations show that the Hessian of $\text{log}_{2}(|\boldsymbol{J}|)$ with respect to $\boldsymbol{P}$ is a diagonal and negative definite matrix, and thus, $\text{log}_{2}(|\boldsymbol{J}|)$ is a jointly concave function over $P_{k}$ ’s. However, for noncoherent receivers the sign of $\frac{\partial^{2}\text{tr}(\boldsymbol{J})}{\partial P_{k}^{2}}$ varies for different system parameters and hence $\text{log}_{2}(|\boldsymbol{J}|)$ is not necessarily a concave function of $P_{k}$ ’s. We employ the Newton-Raphson algorithm with multiple initial points, as described in Section V-A, to solve the set of equations in (V-B). A remark on the difference between power allocation schemes based on maximization of tr $(\boldsymbol{J})$ and $\text{log}_{2}(|\boldsymbol{J}|)$ follows.

Remark 2.

Regarding the solution of (V-A) on constrained maximization of tr $(\boldsymbol{J})$ , we note that $\lambda^{*}$ is common and fixed for all active sensors and thus this power allocation scheme can be implemented in a distributed fashion, i.e., the FC sends $\lambda^{*}$ to the set of active sensors and each sensor calculates its own power $P_{k}^{*}$ using its local parameters. Unlike the solution of (V-A), the solution of (V-B) on constrained maximization of log ${}_{2}(|\boldsymbol{J}|)$ cannot be implemented in a distributed fashion. In other words, the FC needs to find $\{P_{k}^{*}\}_{k\in S_{\cal A}}$ and inform the active sensors of their transmit powers.

VI LMMSE Estimator and its MSE

Given $\boldsymbol{\hat{m}}$ , finding the optimal MMSE estimate of $\boldsymbol{\theta}$ in closed form is mathematically intractable, since it requires $q$ -dimensional integrals that cannot be simplified. To curb computational complexity, we assume that the FC employs the LMMSE estimator to process $\boldsymbol{\hat{m}}$ and form the estimate $\hat{\boldsymbol{\theta}}$ . We derive the LMMSE estimator $\hat{\boldsymbol{\theta}}$ and its corresponding MSE matrix $\boldsymbol{\mathcal{D}}$ . Let vector $\boldsymbol{\breve{m}}=\boldsymbol{\hat{m}}-\mathbb{E}\{\boldsymbol{\hat{m}}\}$ . We have:

[TABLE]

Since $\boldsymbol{\theta}$ is zero-mean, we obtain $\mathbb{E}\{\boldsymbol{\theta}\boldsymbol{\breve{m}}^{T}\}=\mathbb{E}\{\boldsymbol{\theta}(\boldsymbol{\hat{m}}-\mathbb{E}\{\boldsymbol{\hat{m}}\})^{T}\}=\mathbb{E}\{\boldsymbol{\theta}\boldsymbol{\hat{m}}^{T}\}$ . The $k$ -th column of the cross-covariance matrix $\mathbb{E}\{\boldsymbol{\theta}\boldsymbol{\hat{m}}^{T}\}$ describes the correlation between $\hat{m}_{k}$ and $\boldsymbol{\theta}$ . Using the Bayes’ rule we obtain:

[TABLE]

where $V_{\theta}$ denotes the $q$ -dimensional volume over which we take integral, and in the first equality we have used the fact that $\boldsymbol{\theta}$ , $m_{k}$ , $\hat{m}_{k}$ form a Markov chain and thus, given $m_{k}$ , $\boldsymbol{\theta}$ and $\hat{m}_{k}$ are conditionally independent. Since $p(\hat{m}_{k,t}|m_{k,l})=\alpha_{k,t,l}$ and $p(m_{k,l}|\boldsymbol{\theta})=\beta_{k,l}(\boldsymbol{\theta})$ , we reach:

[TABLE]

and the expression for vector $\boldsymbol{\mathcal{I}}_{k,l}^{1}$ is given in (42). By definition, the $(i,j)$ -th entry of matrix $\mathbb{E}\{\boldsymbol{\breve{m}}\boldsymbol{\breve{m}}^{T}\}$ is:

[TABLE]

Similar to what we did in (39), to obtain $\mathbb{E}\{\hat{m}_{k}\}$ and the diagonal entries of $\mathbb{E}\{\boldsymbol{\hat{m}}\boldsymbol{\hat{m}}^{T}\}$ (i.e., $\mathbb{E}\{\hat{m}_{k}^{2}\}$ ), we condition on $m_{k}$ ; however, for the non-diagonal entries of $\mathbb{E}\{\boldsymbol{\hat{m}}\boldsymbol{\hat{m}}^{T}\}$ (i.e., $\mathbb{E}\{\hat{m}_{i}\hat{m}_{j}\}$ ), we condition on $\boldsymbol{\theta}$ . Then using (11), we obtain:

[TABLE]

where ${\mathcal{I}}_{k,l}^{2}$ and ${\mathcal{I}}_{i,j,l_{1},l_{2}}^{3}$ are scalars. We find these integrals (see Appendix A-C for derivations) as below:

[TABLE]

in which:

[TABLE]

Substituting (39)-(43) in (VI), the MSE matrix $\boldsymbol{\mathcal{D}}$ is computed.

For $\boldsymbol{\mathcal{D}}$ in (VI) there exist two baselines. For the first baseline, we consider the centralized estimation case in Section III-C with the LMMSE estimator at the FC and let $\boldsymbol{\mathcal{D}}_{0}$ denote the corresponding MSE matrix. We have:

[TABLE]

where $\mathbb{E}\{\boldsymbol{x}\boldsymbol{x}^{T}\}$ and $\mathbb{E}\{\boldsymbol{\theta}\boldsymbol{x}^{T}\}$ are, respectively, the auto-covariance matrix of the noisy observations and the cross-covariance matrix between $\boldsymbol{\theta}$ and $\boldsymbol{x}$ . For the linear observation model in (1) we get:

[TABLE]
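For the linear Gaussian model, the centralized LMMSE error matrix coincides with the inverse of the centralized Bayesian FIM $\boldsymbol{J}_{0}$ from Section III-C, by the matrix inversion lemma. The sketch below checks this numerically using the standard expressions $\mathbb{E}\{\boldsymbol{\theta}\boldsymbol{x}^{T}\}=\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}\boldsymbol{\mathcal{A}}$ and $\mathbb{E}\{\boldsymbol{x}\boldsymbol{x}^{T}\}=\boldsymbol{\mathcal{A}}^{T}\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}\boldsymbol{\mathcal{A}}+\text{diag}(\sigma_{n_{k}}^{2})$ for $\boldsymbol{x}=\boldsymbol{\mathcal{A}}^{T}\boldsymbol{\theta}+\boldsymbol{n}$ , which are assumed here to match the expressions above. It is restricted to $q=K=2$ so that a $2\times 2$ inverse suffices, and the noise variances are hypothetical.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inv_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def centralized_lmmse_mse(c_theta, a_mat, noise_vars):
    """D_0 = C - C A (A^T C A + diag(noise_vars))^{-1} A^T C."""
    cross = matmul(c_theta, a_mat)            # E{theta x^T} = C A
    cov_x = matmul(transpose(a_mat), cross)   # A^T C A
    for k, v in enumerate(noise_vars):
        cov_x[k][k] += v                      # add the noise covariance
    gain = matmul(cross, inv_2x2(cov_x))
    corr = matmul(gain, transpose(cross))
    return [[c_theta[i][j] - corr[i][j] for j in range(2)] for i in range(2)]

def centralized_fim(c_theta, a_mat, noise_vars):
    """J_0 = C^{-1} + sum_k a_k a_k^T / sigma_{n_k}^2."""
    j0 = inv_2x2(c_theta)
    for k, v in enumerate(noise_vars):
        for i in range(2):
            for j in range(2):
                j0[i][j] += a_mat[i][k] * a_mat[j][k] / v
    return j0

C_THETA = [[4.0, 0.5], [0.5, 0.25]]   # covariance used for Fig. 2
A_MAT = [[0.6, 0.6], [0.8, 0.8]]      # columns are a_k = [0.6, 0.8]^T
NOISE_VARS = [1.0, 0.25]              # hypothetical sigma_{n_k}^2
d0 = centralized_lmmse_mse(C_THETA, A_MAT, NOISE_VARS)
j0_inv = inv_2x2(centralized_fim(C_THETA, A_MAT, NOISE_VARS))
```

The two matrices `d0` and `j0_inv` agree entry by entry, so in the centralized case the LMMSE estimator attains the Bayesian CRB.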

For the second baseline, suppose communication channels between sensors and the FC are error-free and hence vector $\boldsymbol{m}$ is available at the FC. Let vector $\boldsymbol{\mathring{m}}\!=\!\boldsymbol{m}\!-\!\mathbb{E}\{\boldsymbol{m}\}$ . Then, the corresponding MSE matrix is $\boldsymbol{\mathcal{D}}^{ideal}\!=\!\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}\!-\!\mathbb{E}\{\boldsymbol{\theta}\boldsymbol{\mathring{m}}^{T}\}(\mathbb{E}\{\boldsymbol{\mathring{m}}\boldsymbol{\mathring{m}}^{T}\})^{-1}\mathbb{E}\{\boldsymbol{\theta}\boldsymbol{\mathring{m}}^{T}\}^{T}$ . Since $\boldsymbol{\theta}$ is zero-mean, we obtain $\mathbb{E}\{\boldsymbol{\theta}\boldsymbol{\mathring{m}}^{T}\}\!=\!\mathbb{E}\{\boldsymbol{\theta}\boldsymbol{m}^{T}\}$ . We let $\mathbb{E}\{\boldsymbol{\theta}m_{k}\}$ and $[\mathbb{E}\{\boldsymbol{\mathring{m}}\boldsymbol{\mathring{m}}^{T}\}]_{ij}$ , respectively, be the $k$ -th column of matrix $\mathbb{E}\{\boldsymbol{\theta}\boldsymbol{m}^{T}\}$ , and the $(i,j)$ -th entry of matrix $\mathbb{E}\{\boldsymbol{\mathring{m}}\boldsymbol{\mathring{m}}^{T}\}$ . Taking steps similar to the ones we took to obtain (39)-(41), we find $\mathbb{E}\{\boldsymbol{\theta}m_{k}\}\!=\!\sum_{l=1}^{M_{k}}m_{k,l}\boldsymbol{\mathcal{I}}_{k,l}^{1}$ , $\mathbb{E}\{m_{k}\}\!=\!\sum_{l=1}^{M_{k}}m_{k,l}{\mathcal{I}}_{k,l}^{2}$ , $[\mathbb{E}\{\boldsymbol{\mathring{m}}\boldsymbol{\mathring{m}}^{T}\}]_{ij}\!=\!\mathbb{E}\{m_{i}m_{j}\}\!-\!\mathbb{E}\{m_{i}\}\mathbb{E}\{m_{j}\},\ i,j=1,...,K$ , in which $\mathbb{E}\{m_{i}m_{j}\}\!=\!\sum_{l=1}^{M_{k}}\!m^{2}_{k,l}{\mathcal{I}}_{k,l}^{2}$ for $i\!=\!j\!=\!k$ , and $\mathbb{E}\{m_{i}m_{j}\}\!=\!\sum_{l_{1}=1}^{M_{i}}\sum_{l_{2}=1}^{M_{j}}m_{i,l_{1}}\!m_{j,l_{2}}{\mathcal{I}}_{i,j,l_{1},l_{2}}^{3}$ for $i\!\neq\!j$ . Clearly, $\boldsymbol{\mathcal{D}}_{0}\!\preceq\!\boldsymbol{\mathcal{D}}^{ideal}\!\preceq\!\boldsymbol{\mathcal{D}}$ .

Remark 3.

If $\boldsymbol{\theta}$ has a known nonzero mean $\boldsymbol{\mu}_{\theta}$ , the expressions for the LMMSE estimator $\hat{\boldsymbol{\theta}}$ and its corresponding MSE matrix $\boldsymbol{\mathcal{D}}$ change as follows:

[TABLE]

where $\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}=\mathbb{E}\{\boldsymbol{\theta}\boldsymbol{\theta}^{T}\}-\boldsymbol{\mu}_{\theta}{\boldsymbol{\mu}_{\theta}}^{T}$ .

VII Discussion on Appropriateness and Achievability of Bayesian CRB

One may wonder how the FIM-max schemes in Section V compare with the power allocation obtained from constrained minimization of the MSE of the LMMSE estimator derived in Section VI. In addition, the literature [29] suggests that the WWB in Section IV is a tighter bound than the Bayesian CRB. This observation raises the question of whether using the WWB as the optimization metric would be a more appropriate choice. This section provides answers to these questions.

VII-A Appropriateness of Bayesian FIM as the Optimization Metric

Let $\mathcal{D}=\text{tr}(\boldsymbol{\mathcal{D}})$ , where $\boldsymbol{\mathcal{D}}$ is the MSE matrix of the LMMSE estimator given in (VI). We consider the following constrained optimization problem:

[TABLE]

In the absence of an analytical solution, we resort to the exhaustive search method to find the solution of the problem in (VII-A). We refer to this solution as the MSE-min scheme. For all three types of receivers, our extensive simulations show that $\frac{\partial\,\text{tr}(\boldsymbol{\mathcal{D}})}{\partial P_{k}}>0$ ; however, the sign of $\frac{\partial^{2}\text{tr}(\boldsymbol{\mathcal{D}})}{\partial P_{k}^{2}}$ changes across system parameters, and hence, tr $(\boldsymbol{\mathcal{D}})$ is not necessarily a convex function over $P_{k}$ ’s. Furthermore, the cost function in (VII-A) cannot be decoupled over the optimization variables $P_{k}$ ’s and thus $P_{k}$ ’s across sensors are related to each other. Because of this, finding MSE-min is computationally complex, and the solution cannot be implemented in a distributed fashion (i.e., sensor $k$ cannot find $P_{k}$ relying on its own local information only). This contrasts with the FIM-max scheme obtained from solving the problem in (II), where the cost function in (II) can be decoupled over $P_{k}$ ’s and thus $P_{k}$ ’s across sensors are not related to each other. Because of this, finding FIM-max is computationally simple, and the solution can be implemented in a distributed fashion. Figures 8 and 9 in Section IX illustrate the numerical evaluations of (i) trace of $\boldsymbol{\mathcal{D}}$ at the power allocation obtained from solving the problem in (VII-A), denoted as ${\cal D}_{m}\!=\!\text{tr}(\boldsymbol{\mathcal{D}}(\text{MSE-min}))$ and (ii) trace of $\boldsymbol{\mathcal{D}}$ at the power allocation obtained from solving the problem in (II), denoted as ${\cal D}_{t}\!=\!\text{tr}(\boldsymbol{\mathcal{D}}(\text{FIM-max}))$ , given $P_{tot}$ . The figures show that:

[TABLE]

where $a\!\lesssim\!b$ means that $a$ is less than $b$ , but very close to $b$ . Obviously, from the estimation theory we know ${\cal D}_{m}\!<\!{\cal D}_{t}$ . What our numerical results reveal is that in our problem they are very close to each other. This indicates the appropriateness of using Bayesian FIM as the optimization metric, since the loss in terms of the MSE performance is not significant.
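To make the cost of the exhaustive search concrete, the following minimal sketch enumerates every power allocation on a grid that satisfies $\sum_{k}P_{k}=P_{tot}$ and keeps the one with the smallest cost. The cost function `toy_mse_trace` is a hypothetical stand-in (the MSE expression of Section VI is not reproduced here); it was chosen only because, like tr $(\boldsymbol{\mathcal{D}})$ , it couples the $P_{k}$ ’s through a matrix inverse. The names `GAINS`, `A_VECS`, the saturating link model, and the grid resolution are all assumptions made for illustration.

```python
import itertools
import math

def toy_mse_trace(powers, gains, a_vecs, prior_inv):
    """Hypothetical coupled cost for q = 2:
    trace of (C_theta^-1 + sum_k w_k(P_k) a_k a_k^T)^-1."""
    j11, j12, j22 = prior_inv[0][0], prior_inv[0][1], prior_inv[1][1]
    for p, delta, a in zip(powers, gains, a_vecs):
        w = 1.0 - math.exp(-delta * p)  # assumed saturating link quality in [0, 1)
        j11 += w * a[0] * a[0]
        j12 += w * a[0] * a[1]
        j22 += w * a[1] * a[1]
    det = j11 * j22 - j12 * j12
    return (j11 + j22) / det  # trace of the inverse of a symmetric 2x2 matrix

def exhaustive_search(p_tot, n_sensors, steps, cost):
    """Evaluate every grid allocation with sum(P_k) = p_tot and return the best."""
    unit = p_tot / steps
    best_alloc, best_cost = None, math.inf
    # Enumerate grid counts for the first K-1 sensors; the last gets the remainder.
    for counts in itertools.product(range(steps + 1), repeat=n_sensors - 1):
        rest = steps - sum(counts)
        if rest < 0:
            continue
        alloc = [c * unit for c in counts] + [rest * unit]
        value = cost(alloc)
        if value < best_cost:
            best_alloc, best_cost = alloc, value
    return best_alloc, best_cost

PRIOR_INV = [[1 / 3, -2 / 3], [-2 / 3, 16 / 3]]  # inverse of [4, 0.5; 0.5, 0.25]
A_VECS = [(0.6, 0.8), (0.8, 0.6)]                # hypothetical observation vectors
GAINS = [2.0, 0.5]                               # hypothetical channel qualities

def cost(alloc):
    return toy_mse_trace(alloc, GAINS, A_VECS, PRIOR_INV)

best_alloc, best_cost = exhaustive_search(4.0, 2, 40, cost)
```

The loop visits on the order of `(steps + 1) ** (K - 1)` candidate allocations, which is why this kind of search becomes expensive as the number of sensors or the grid resolution grows.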

VII-B Tightness and Achievability of Bayesian CRB

Although the WWB is a tighter bound than the Bayesian CRB [29], finding the WWB matrix is computationally much more expensive than finding the Bayesian FIM, due to the matrix inversion ${\boldsymbol{G}}^{-1}$ required for each test point in (27). Consequently, finding the power allocation that minimizes the trace or log-determinant of the WWB is computationally much more expensive than finding the solutions for the problems in (II) or (II). Furthermore, (46) indicates that, by not using the power allocation obtained from minimizing the trace of the WWB matrix (which is tighter than the Bayesian CRB), we are not at a disadvantage in terms of the MSE performance.

According to [23], the Bayesian CRB is attainable if and only if the posterior probability density of $\boldsymbol{\theta}$ given the “observation” is Gaussian. In that case, the MMSE and MAP estimators coincide and both are efficient (i.e., their MSE matrices are equal to the Bayesian CRB matrix) [23]. This bound is attained in the limit as $K$ becomes infinite [29]. In our work, the recovered quantization levels for all sensors at the FC, denoted as vector $\hat{\boldsymbol{m}}$ , play the role of the “observation”. Since the posterior probability density of $\boldsymbol{\theta}$ given $\hat{\boldsymbol{m}}$ is not Gaussian, the Bayesian CRB is not attainable. However, as $K$ increases, we expect the MSE of the MMSE estimator to approach the Bayesian CRB. Let $\text{tr}(\text{CRB(FIM-max)})$ denote the trace of the Bayesian CRB matrix evaluated at the FIM-max power allocation, and let $\text{tr}(\text{CRB(MSE-min)})$ denote the trace of the Bayesian CRB matrix evaluated at the MSE-min power allocation. From estimation theory we know:

[TABLE]

Combining (47) and (46) we reach:

[TABLE]

This suggests that, although Bayesian CRB is not attainable, it is still proper to use Bayesian FIM for transmit power optimization, since the loss in terms of the MSE performance is not significant.

VIII Classical CRB and BLUE for Estimating Deterministic Vector $\boldsymbol{\theta}$

In this section, we derive the classical FIM (assuming vector $\boldsymbol{\theta}$ to be estimated is deterministic), the BLUE and its corresponding MSE matrix. We also discuss the behavior of the classical FIM and the MSE of BLUE in low-region and high-region of $P_{tot}$ . Finally, we discuss optimizing transmit power considering the classical FIM and the MSE of BLUE as the optimization metric.

VIII-A Characterization of Classical FIM

Let ${\boldsymbol{J}}_{c}$ denote the $q\times q$ classical FIM and let $[{\boldsymbol{J}}_{c}]_{ij}$ represent the $(i,j)$ -th entry of ${\boldsymbol{J}}_{c}$ . We have [23]:

[TABLE]

where $p(\boldsymbol{\hat{m}};\boldsymbol{\theta})$ is the joint probability distribution of $\hat{m}_{1},...,\hat{m}_{K}$ parameterized by $\boldsymbol{\theta}$ . Notice that $[{\boldsymbol{J}}_{c}]_{ij}$ in (48) is similar to $[\boldsymbol{\Lambda}(\boldsymbol{\theta})]_{ij}$ in (8), with the difference that for Bayesian FIM we deal with the conditional pdf $p(\boldsymbol{\hat{m}}|\boldsymbol{\theta})$ . Therefore, ${\boldsymbol{J}}_{c}$ has the same expression as $\boldsymbol{J}$ in (23), which depends on $\boldsymbol{\theta}$ . That is:

[TABLE]

in which $G_{k}(\boldsymbol{\theta})$ is defined in (22), and the probabilities $\alpha_{k,t,l}$ and $\beta_{k,l}(\boldsymbol{\theta})$ have the same expressions as for Bayesian FIM.

VIII-B Characterization of BLUE and its MSE Matrix

Recall that $\hat{\boldsymbol{m}}$ is the data at the FC based on which we wish to form the BLUE. To satisfy the unbiasedness requirement for BLUE, we need to have $\mathbb{E}\{\hat{\boldsymbol{m}}\}=\boldsymbol{H}\boldsymbol{\theta}$ , for a known matrix $\boldsymbol{H}$ [41]. The unbiasedness requirement is not satisfied in general for our system model. However, under three conditions (coherent receiver at the FC, uniform quantizer, and natural binary encoder at the sensors to map quantization levels to information bits), we can establish a linear relationship between $\hat{\boldsymbol{m}}$ and $\boldsymbol{\theta}$ , that is $\hat{\boldsymbol{m}}=\boldsymbol{H}\boldsymbol{\theta}+\boldsymbol{\nu}$ , where $\boldsymbol{\nu}$ is a zero-mean vector with covariance $\boldsymbol{\mathcal{C}}_{\boldsymbol{\nu}}$ , and show that for this linear model the unbiasedness requirement is met, i.e., $\mathbb{E}\{\hat{\boldsymbol{m}}\}=\boldsymbol{H}\boldsymbol{\theta}$ . (Footnote 5, on the uniform quantizer: for sensor $k$ , we define the quantization noise $\epsilon_{k}\!=\!x_{k}-m_{k}$ . Since $n_{k}$ ’s in (1) are uncorrelated Gaussian, $x_{k}$ ’s are uncorrelated Gaussian. [42] shows that when uncorrelated Gaussian random variables are quantized with uniform quantizers of quantization step sizes $\Delta_{k}$ ’s, $\epsilon_{k}$ ’s are independent zero-mean uniform random variables with variance $\sigma^{2}_{\epsilon_{k}}\!\approx\!\frac{\Delta_{k}^{2}}{12}$ . Also, $\epsilon_{k}$ ’s and $x_{k}$ ’s are uncorrelated.) Then, using this linear model, we derive BLUE and its corresponding MSE matrix as follows [41]:

[TABLE]

First we verify the unbiasedness requirement under the three stated conditions. Under these three conditions, we can use the approximations given in [43] and write:

[TABLE]

Equation (VIII-B) shows that the unbiasedness constraint is satisfied. Next, we establish the linear relationship $\hat{\boldsymbol{m}}=\boldsymbol{H}\boldsymbol{\theta}+\boldsymbol{\nu}$ , where $\boldsymbol{\nu}$ is a zero-mean vector with covariance $\boldsymbol{\mathcal{C}}_{\boldsymbol{\nu}}$ , and we find $\boldsymbol{\mathcal{C}}_{\boldsymbol{\nu}}$ . Knowing $\boldsymbol{\mathcal{C}}_{\boldsymbol{\nu}}$ and $\boldsymbol{H}$ we can then use (VIII-B) to express BLUE and its corresponding MSE. To establish the linear relationship, suppose:

[TABLE]

where $\nu_{k}$ is zero-mean with variance $var(\nu_{k})=var(\hat{m}_{k})$ . The equivalent vector-matrix representation of (52) becomes $\hat{\boldsymbol{m}}=\boldsymbol{H}\boldsymbol{\theta}+\boldsymbol{\nu}$ , in which $\boldsymbol{\nu}=[\nu_{1},...,\nu_{K}]^{T}$ , $\boldsymbol{\mathcal{C}}_{\boldsymbol{\nu}}=\boldsymbol{\mathcal{C}}_{\hat{\boldsymbol{m}}}$ , and $\boldsymbol{\mathcal{C}}_{\hat{\boldsymbol{m}}}$ denotes the covariance matrix of vector $\hat{\boldsymbol{m}}$ . Hence, to find $\boldsymbol{\mathcal{C}}_{\boldsymbol{\nu}}$ we need to find $\boldsymbol{\mathcal{C}}_{\hat{\boldsymbol{m}}}$ . Let $[\boldsymbol{\mathcal{C}}_{\hat{\boldsymbol{m}}}]_{kl}$ be the $(k,l)$ -th entry of matrix $\boldsymbol{\mathcal{C}}_{\hat{\boldsymbol{m}}}$ . Starting with the diagonal entries of $\boldsymbol{\mathcal{C}}_{\hat{\boldsymbol{m}}}$ , we find $[\boldsymbol{\mathcal{C}}_{\hat{\boldsymbol{m}}}]_{kk}=var(\hat{m}_{k})$ . Under the three stated conditions, we can use the approximations given in [43] and write:

[TABLE]

where ${\Delta_{k}}\!=\!\frac{2\tau_{k}}{(2^{L_{k}}-1)}$ . Next, we compute the non-diagonal elements ${[\boldsymbol{\mathcal{C}}_{\hat{\boldsymbol{m}}}]}_{kl}\!=\!\mathbb{E}\{\hat{m}_{k}\hat{m}_{l}\}\!-\!\mathbb{E}\{\hat{m}_{k}\}\mathbb{E}\{\hat{m}_{l}\}$ , where the mean $\mathbb{E}\{\hat{m}_{k}\}$ is given in (VIII-B). Hence, we need to find $\mathbb{E}\{\hat{m}_{k}\hat{m}_{l}\}$ as the following:

[TABLE]

in which ( $a$ ) follows from the fact that, given $m_{k},m_{l}$ , then $\hat{m}_{k},\hat{m}_{l}$ are independent, ( $b$ ) comes from (VIII-B), ( $c$ ) is obtained from the fact that the quantization noises $\epsilon_{k}$ ’s are uncorrelated from each other, and $\epsilon_{k}$ ’s and $x_{k}$ ’s are uncorrelated, and ( $d$ ) follows from (VIII-B). Recall according to (VIII-B) $var(\hat{m}_{k})\leq\Upsilon_{k}$ . Let $\boldsymbol{\mathcal{C}}_{\boldsymbol{Q}}=\text{diag}(\Upsilon_{1},...,\Upsilon_{K})$ be a diagonal matrix. Clearly by the construction of $\boldsymbol{\mathcal{C}}_{\boldsymbol{Q}}$ we have $\boldsymbol{\mathcal{C}}_{\hat{\boldsymbol{m}}}\preceq\boldsymbol{\mathcal{C}}_{\boldsymbol{Q}}$ and thus $\boldsymbol{\mathcal{C}}_{\boldsymbol{\nu}}\preceq\boldsymbol{\mathcal{C}}_{\boldsymbol{Q}}$ . Replacing $\boldsymbol{\mathcal{C}}_{\boldsymbol{\nu}}$ with its upper bound $\boldsymbol{\mathcal{C}}_{\boldsymbol{Q}}$ and substituting $\boldsymbol{H}$ in (VIII-B), we find quasi BLUE and its corresponding MSE matrix as shown in (54). The notion of quasi BLUE in the context of distributed estimation of an unknown deterministic scalar has been used before in [15, 20], where an upper bound on the variance of the data at the FC (based on which BLUE is formed) is utilized, instead of the variance of the data itself, to derive the unbiased estimator and its corresponding MSE.
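The approximation $\sigma^{2}_{\epsilon_{k}}\!\approx\!\frac{\Delta_{k}^{2}}{12}$ used in this section can be checked with a short Monte Carlo experiment. The sketch below quantizes zero-mean Gaussian samples with a uniform mid-rise quantizer of step ${\Delta_{k}}\!=\!\frac{2\tau_{k}}{(2^{L_{k}}-1)}$ and compares the empirical variance of $\epsilon_{k}\!=\!x_{k}-m_{k}$ with $\frac{\Delta_{k}^{2}}{12}$ . The unit observation variance, the choice $\tau_{k}=3$ , the sample size, and all function names are assumptions made for illustration.

```python
import random

def uniform_midrise_quantize(x, step, n_levels):
    """Return the level m_l = (2l - 1 - M) * step / 2 of the cell containing x.
    Cells are [(l - 1 - M/2) * step, (l - M/2) * step); the outer cells saturate."""
    idx = int(x // step) + n_levels // 2 + 1  # 1-based cell index l
    idx = min(max(idx, 1), n_levels)          # clip to l = 1..M
    return (2 * idx - 1 - n_levels) * step / 2.0

def quantization_noise_variance(n_bits, tau, sigma_x, n_samples, seed=0):
    """Empirical E[eps^2] versus the step^2 / 12 prediction."""
    rng = random.Random(seed)
    n_levels = 2 ** n_bits
    step = 2.0 * tau / (n_levels - 1)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma_x)
        eps = x - uniform_midrise_quantize(x, step, n_levels)
        total += eps * eps
    return total / n_samples, step ** 2 / 12.0

# Assumed setup: L_k = 3 bits, tau_k = 3, unit-variance observation.
empirical_var, predicted_var = quantization_noise_variance(3, 3.0, 1.0, 100_000)
```

The two numbers agree closely in this setup because the step is smaller than the standard deviation of the input and saturation is rare; the agreement would degrade for coarser quantizers or a smaller $\tau_{k}$ .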

VIII-C Behavior of the classical FIM and the MSE of BLUE in low-region and high-region of $P_{tot}$

Consider coherent receiver where we model the channel between sensor $k$ and the FC as a BSC with the probability of flipping a bit ${\cal E}_{k}=Q(2\gamma_{k})$ and $\gamma_{k}$ , defined in (14), depends on $P_{k}$ . In low-region of $P_{tot}$ (when $P_{k}\rightarrow 0$ ) we have ${\cal E}_{k}\rightarrow\frac{1}{2}$ (worst communication channel effect). Then (15) implies that $\alpha_{k,t,l}\approx\frac{1}{2^{L_{k}}}$ and one can show that $G_{k}(\boldsymbol{\theta})\rightarrow 0$ . Therefore ${\boldsymbol{J}}_{c}\rightarrow\boldsymbol{0}$ . On the contrary, in high-region of $P_{tot}$ (when $P_{k}\rightarrow\infty$ ) we have ${\cal E}_{k}\rightarrow 0$ . This implies that

[TABLE]

Then one can show that $G_{k}(\boldsymbol{\theta})\!\rightarrow\!G_{k}^{ideal}(\boldsymbol{\theta})$ and ${\boldsymbol{J}}_{c}\!\rightarrow\!{\boldsymbol{J}}_{c}^{ideal}$ , where $G_{k}^{ideal}(\boldsymbol{\theta})$ is given in Section III-C and ${\boldsymbol{J}}_{c}^{ideal}$ is obtained from (49) after substituting $G_{k}(\boldsymbol{\theta})$ with $G_{k}^{ideal}(\boldsymbol{\theta})$ . Similar discussions can be made and similar conclusions can be reached for both types of noncoherent receivers. For coherent receiver in low-region of $P_{tot}$ (when $P_{k}\rightarrow 0$ ) we have ${\cal E}_{k}\rightarrow\frac{1}{2}$ . Examining (54) we realize that this implies ${\boldsymbol{\mathcal{D}}}_{QBLUE}\rightarrow\boldsymbol{\infty}$ . On the contrary, in high-region of $P_{tot}$ (when $P_{k}\rightarrow\infty$ ) we have ${\cal E}_{k}\rightarrow 0$ and ${\boldsymbol{\mathcal{D}}}_{QBLUE}\rightarrow{\boldsymbol{\mathcal{D}}}_{QBLUE}^{ideal}={(\sum_{k=1}^{K}\frac{\mathbf{a}_{k}\mathbf{a}_{k}^{T}}{\sigma_{n_{k}}^{2}+\frac{\Delta_{k}^{2}}{12}})}^{-1}$ , where ${\boldsymbol{\mathcal{D}}}_{QBLUE}^{ideal}$ denotes ${\boldsymbol{\mathcal{D}}}_{QBLUE}$ when communication channels between sensors and the FC are error free.
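The error-free limit ${\boldsymbol{\mathcal{D}}}_{QBLUE}^{ideal}$ stated above is easy to evaluate numerically. The following sketch does so for $q=2$ using plain Python; the observation vectors, noise levels, and step sizes are hypothetical values chosen for illustration, and the function name is not from the paper.

```python
def qblue_ideal_mse(a_vecs, noise_std, steps):
    """Return the 2x2 matrix (sum_k a_k a_k^T / (sigma_n_k^2 + Delta_k^2 / 12))^-1."""
    s11 = s12 = s22 = 0.0
    for (a1, a2), sigma_n, delta in zip(a_vecs, noise_std, steps):
        w = 1.0 / (sigma_n ** 2 + delta ** 2 / 12.0)  # inverse effective noise variance
        s11 += w * a1 * a1
        s12 += w * a1 * a2
        s22 += w * a2 * a2
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("observation vectors do not span R^2; the sum is singular")
    # Closed-form inverse of a symmetric 2x2 matrix.
    return [[s22 / det, -s12 / det], [-s12 / det, s11 / det]]

# Hypothetical setup: K = 3 sensors, sigma_n = 1, L_k = 3 bits, tau_k = 3.
DELTA = 2.0 * 3.0 / (2 ** 3 - 1)
A_VECS = [(0.6, 0.8), (0.8, 0.6), (1.0, 0.0)]
D_ideal = qblue_ideal_mse(A_VECS, [1.0] * 3, [DELTA] * 3)
trace_ideal = D_ideal[0][0] + D_ideal[1][1]
```

Because $\boldsymbol{\theta}$ is deterministic here and there is no prior to regularize the sum, the matrix is invertible only when the $\mathbf{a}_{k}$ ’s include $q$ linearly independent vectors; the sketch raises an error otherwise.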

VIII-D Transmit Power Optimization Using MSE of Quasi BLUE and Classical FIM

One can consider the following constrained transmit power optimization problem, where trace (or log-determinant) of ${\boldsymbol{\mathcal{D}}}_{QBLUE}$ is minimized, subject to the network transmit power constraint as follows:

[TABLE]

It is straightforward to show $\frac{\partial\,\text{tr}({\boldsymbol{\mathcal{D}}}_{QBLUE})}{\partial P_{k}}\!<\!0$ . This implies tr $({\boldsymbol{\mathcal{D}}}_{QBLUE})$ is a decreasing function of $P_{k}$ ’s and the constraint holds with equality. Furthermore, we have $\frac{\partial^{2}\text{tr}({\boldsymbol{\mathcal{D}}}_{QBLUE})}{\partial P_{k}^{2}}\!>\!0$ , implying that the Hessian is a positive definite matrix and $\text{tr}({\boldsymbol{\mathcal{D}}}_{QBLUE})$ is jointly convex over $P_{k}$ ’s. Moreover, the constraints are linear, and thus, the problem in (55) is convex. We could not find a closed-form solution for $P_{k}$ ’s. One needs to solve (55) numerically to find the optimal $P_{k}$ ’s. Since the problem is convex, it is guaranteed that the numerical solution (obtained via the numerical search algorithm) is globally optimal. Since the cost function in (55) can be decoupled over $P_{k}$ ’s the solution can be implemented in a distributed fashion.

On the other hand, a constrained optimization problem based on maximizing the trace (or log-determinant) of the classical FIM ${\boldsymbol{J}}_{c}$ in (49) is not meaningful, since ${\boldsymbol{J}}_{c}$ depends on the unknown $\boldsymbol{\theta}$ and thus the resulting power allocation is not realizable.

IX Numerical Results

In this section, we corroborate our analytical results through simulations. Our analytical results are valid as long as sensors use symmetric mid-rise quantizers. We consider the uniform quantizer [16, 22, 28] and the Lloyd-Max quantizer [44]. For the uniform quantizer, quantization levels are $m_{k,l}\!=\!\frac{(2l-1-M_{k})\Delta_{k}}{2}$ for $l\!=\!1,...,M_{k}$ and quantization boundaries are $u_{k,l}\!=\!\frac{(2l-2-M_{k})\Delta_{k}}{2}$ for $l\!=\!2,...,M_{k}$ , where $\Delta_{k}$ denotes the quantization step size. Similar to [16], we assume $x_{k}$ lies in the interval $[-\tau_{k},\tau_{k}]$ with a high probability for some reasonably large $\tau_{k}$ , i.e., $p(|x_{k}|\!\geq\!\tau_{k})\!\approx\!0$ . (Footnote 6: Consider quantizing a zero-mean Gaussian $x_{k}$ . For $\tau_{k}\!=\!3\sigma_{x_{k}}$ we have $p(|x_{k}|\!\geq\!\tau_{k})\!=\!2\Phi(-3)\!\approx\!2.7\times{10}^{-3}$ and for $\tau_{k}\!=\!5\sigma_{x_{k}}$ we have $p(|x_{k}|\!\geq\!\tau_{k})\!=\!2\Phi(-5)\!\approx\!5.7\times{10}^{-7}$ , where $\Phi(.)$ is the cumulative distribution function of the standard Gaussian random variable. On the other hand, $\tau_{k}$ can be decided by the sensor’s sensing dynamic range, considering its hardware limitation and sensing capability [15].) To this end, we assume $\tau_{k}=3\sigma_{k}$ where $\sigma_{k}$ is defined in (43). Hence, we choose ${\Delta_{k}}\!=\!\frac{2\tau_{k}}{(2^{L_{k}}-1)}$ [16, 22]. For the Lloyd-Max quantizer, quantization levels are $m_{k,l}\!=\!\frac{\int_{u_{k,l}}^{u_{k,l+1}}x_{k}f(x_{k})dx_{k}}{\int_{u_{k,l}}^{u_{k,l+1}}f(x_{k})dx_{k}}$ for $l\!=\!1,...,M_{k}$ and quantization boundaries are $u_{k,l}\!=\!\frac{m_{k,l-1}+m_{k,l}}{2}$ for $l\!=\!2,...,M_{k}$ , which can be found via iterative design.
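The iterative Lloyd-Max design mentioned above alternates between the two conditions just stated: boundaries are midpoints of adjacent levels, and levels are centroids of their cells. A minimal sketch for a zero-mean, unit-variance Gaussian $x_{k}$ follows; the unit variance, the initial guess, the fixed iteration count, and the function names are assumptions made for illustration. For a Gaussian density the centroid integrals have closed forms, since $\int_{a}^{b}x\,\varphi(x)dx=\varphi(a)-\varphi(b)$ with $\varphi$ the standard normal pdf.

```python
import math

def phi(x):
    """Standard normal pdf, with value 0 at +/- infinity."""
    return 0.0 if math.isinf(x) else math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def midpoints(levels):
    """Boundaries u_l = (m_{l-1} + m_l) / 2, with -inf and +inf as the outer ones."""
    inner = [(levels[l - 1] + levels[l]) / 2.0 for l in range(1, len(levels))]
    return [-math.inf] + inner + [math.inf]

def lloyd_max_gaussian(n_levels, n_iter=500):
    """Alternate the midpoint and centroid conditions for a N(0, 1) input."""
    # Arbitrary symmetric initial guess: equally spaced levels on [-3, 3].
    levels = [-3.0 + 6.0 * (l + 0.5) / n_levels for l in range(n_levels)]
    for _ in range(n_iter):
        u = midpoints(levels)
        # Centroid of each cell: (phi(a) - phi(b)) / (Phi(b) - Phi(a)).
        levels = [(phi(u[l]) - phi(u[l + 1])) / (Phi(u[l + 1]) - Phi(u[l]))
                  for l in range(n_levels)]
    return levels, midpoints(levels)

levels, bounds = lloyd_max_gaussian(8)  # M_k = 8 levels, i.e., L_k = 3 bits
```

For a general observation variance the resulting levels and boundaries scale with the standard deviation of $x_{k}$ . A practical implementation would typically stop when the levels change by less than a tolerance rather than after a fixed number of iterations.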

IX-A Comparison of WWB, Bayesian CRB, and MSE of LMMSE Estimator

We numerically compare traces of the MSE matrix of LMMSE estimator, the WWB matrix and the Bayesian CRB matrix in Fig. 4 for various $P_{tot}$ , assuming $P_{tot}$ is uniformly distributed among sensors, and uniform quantization and coherent receiver are employed. The figure suggests that the WWB is a tighter bound, compared to the Bayesian CRB. Similar observations can be made for two types of noncoherent receivers, and also when we compare the determinant of these three matrices. Due to lack of space, we have omitted those plots.

IX-B Behavior of tr $(\boldsymbol{J})$ and $|\boldsymbol{J}|$ in terms of $P_{tot}$ and Quantizer

Without loss of generality and for the simplicity of presentation, we let $K\!\!=\!\!2$ and consider a zero-mean Gaussian vector $\boldsymbol{\theta}=\left[\theta_{1},\theta_{2}\right]^{T}$ with $\boldsymbol{\cal C}_{\boldsymbol{\theta}}=[4,0.5;0.5,0.25]$ . We assume $\mathbf{a}_{k}=[0.6,0.8]^{T}$ , $\sigma_{n_{k}}\!=\!1$ , $\sigma_{w_{k}}\!=\!1,L_{k}=3$ bits, $\forall k$ .

Assuming $|h_{k}|\!=\!0.5$ , Fig. 5 depicts tr $(\boldsymbol{J})$ and $|\boldsymbol{J}|$ versus $P_{tot}$ for coherent receiver, considering both uniform and Lloyd-Max quantizers. Fig. 5 shows that, as $P_{tot}$ increases, both metrics increase and asymptotically approach their corresponding baseline (i.e., centralized estimation when full precision observations are used to derive Bayesian FIM and form $\hat{\boldsymbol{\theta}}$ ). There is also a gap between each metric and its corresponding baseline, which is due to quantization. Note that this gap is smaller for the Lloyd-Max quantizer than for the uniform quantizer. Comparing Lloyd-Max and uniform quantizers, we observe that when $P_{tot}$ is less than a certain threshold (which depends on the network setup parameters), the latter slightly outperforms the former, and when $P_{tot}$ is greater than the threshold, the former outperforms the latter. As $L_{k}$ increases, this threshold becomes larger and the performances of the two quantizers get closer to each other. The behaviors of tr $(\boldsymbol{J})$ and $|\boldsymbol{J}|$ for noncoherent receivers are the same as those for the coherent receiver, and hence are omitted due to lack of space. Regarding the behaviors of the two metrics with respect to the observation model parameters, we note that tr $(\boldsymbol{J})$ and $|\boldsymbol{J}|$ increase as the variance of observation noise $\sigma_{n_{k}}^{2}$ decreases.
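For reference, the full-precision baseline can be sketched with the standard Bayesian FIM of a linear Gaussian model with a Gaussian prior, $\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}^{-1}+\sum_{k}\mathbf{a}_{k}\mathbf{a}_{k}^{T}/\sigma_{n_{k}}^{2}$ . Whether this is exactly the baseline plotted in Fig. 5 is an assumption; the function name is also hypothetical. The parameter values are those stated at the start of this subsection.

```python
def centralized_bayesian_fim(prior_cov, a_vecs, noise_std):
    """J = C_theta^-1 + sum_k a_k a_k^T / sigma_n_k^2 for q = 2."""
    (c11, c12), (_, c22) = prior_cov
    det_c = c11 * c22 - c12 * c12
    # Closed-form inverse of the symmetric 2x2 prior covariance.
    j11, j12, j22 = c22 / det_c, -c12 / det_c, c11 / det_c
    for (a1, a2), sigma_n in zip(a_vecs, noise_std):
        j11 += a1 * a1 / sigma_n ** 2
        j12 += a1 * a2 / sigma_n ** 2
        j22 += a2 * a2 / sigma_n ** 2
    return [[j11, j12], [j12, j22]]

# K = 2, C_theta = [4, 0.5; 0.5, 0.25], a_k = [0.6, 0.8]^T, sigma_n_k = 1.
J = centralized_bayesian_fim([[4.0, 0.5], [0.5, 0.25]], [(0.6, 0.8)] * 2, [1.0, 1.0])
trace_J = J[0][0] + J[1][1]
det_J = J[0][0] * J[1][1] - J[0][1] ** 2
```

With these values the sketch gives a trace of $23/3\approx 7.67$ and a determinant of $6.88$ , which under the stated assumption are the levels that the quantized metrics would approach from below.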

IX-C FIM-max vs. Uniform Power Allocation

We investigate how the behavior of tr $(\boldsymbol{J})$ changes as communication channel and observation model parameters vary. Let $\overline{\delta}_{k}=\frac{\sigma_{h_{k}}^{2}}{\sigma_{w_{k}}^{2}}$ . For coherent receiver, Fig. 6(a) plots tr $(\boldsymbol{J})$ evaluated at the corresponding optimal power allocation (i.e., $P_{k}$ ’s are the solutions of the problem in (II)) versus $P_{tot}$ , for both uniform and Lloyd-Max quantizers, when $\sigma_{n_{1}}\!=\!\sigma_{n_{2}}\!=\!1$ , $\overline{\delta}_{1}\!=\!2$ dB, $\overline{\delta}_{2}\!=\!14$ dB.

Fig. 6(b) plots the same, with the difference that $\sigma_{n_{1}}\!=\!4,\sigma_{n_{2}}\!=\!0.5$ , $\overline{\delta}_{1}\!=\!\overline{\delta}_{2}\!=\!4$ dB. To demonstrate the effectiveness of the proposed FIM-max schemes, we also include tr $(\boldsymbol{J})$ evaluated at uniform power allocation $P_{k}\!=\!P_{tot}/K$ in these figures. Overall, Fig. 6(a) and Fig. 6(b) show that for coherent receiver the proposed FIM-max schemes outperform uniform power allocation, for both quantizers and for all ranges of $P_{tot}$ . Moreover, it is evident that the Lloyd-Max quantizer outperforms the uniform quantizer in the moderate-region to high-region of $P_{tot}$ . Similar observations can be made for the two types of noncoherent receivers, and also when the optimization metric is $|\boldsymbol{J}|$ (i.e., $P_{k}$ ’s are the solutions of the problem in (II)). Due to lack of space, we have omitted those plots. Comparing the three types of receivers, our simulations demonstrate that, for a given $P_{tot}$ , the coherent receiver and the noncoherent receiver with known channel statistics have the best and the worst performance, respectively, in terms of tr $(\boldsymbol{J})$ and $|\boldsymbol{J}|$ .

IX-D Behavior of FIM-max Power Allocation Across Sensors

We study the behavior of the FIM-max power allocation across sensors as $P_{tot}$ increases. Recall $\delta_{k}=\frac{|h_{k}|^{2}}{2\sigma_{w_{k}}^{2}}$ . We let $K\!=\!3$ , $\delta_{1}\!=\!14,\delta_{2}\!=\!8,\delta_{3}\!=\!2$ , $\mathbf{a}_{k}=[0.6,0.8]^{T}$ , $\sigma_{n_{k}}\!=\!1$ , $L_{k}=3$ bits, $\forall k$ . Fig. 7 illustrates $\{10\text{log}_{10}(P_{k})\}_{k=1}^{3}$ versus $P_{tot}$ for coherent receiver, where $P_{k}$ ’s are the solutions of the problem in (II), for both uniform and Lloyd-Max quantizers. Regarding Fig. 7 we make the following four observations: 1) $P_{k}$ increases as $P_{tot}$ increases, 2) the power allocations obtained for Lloyd-Max quantizer are very close to those obtained for uniform quantizer, 3) when $P_{tot}$ is small, only sensor 1 is active, and as $P_{tot}$ increases, sensors 2 and 3 become active in a sequential order, 4) in low-region of $P_{tot}$ , a sensor with a larger $\delta_{k}$ is allotted a larger $P_{k}$ (water filling), and in high-region of $P_{tot}$ , a sensor with a smaller $\delta_{k}$ is allotted a larger $P_{k}$ (inverse of water filling). Although we don’t have a closed-form solution for $P_{k}$ ’s, our conjecture is that its change of behavior in terms of $P_{tot}$ , can be explained by examining the $P_{k}$ ’s solution provided in [22], where the authors have considered a related problem. In particular, [22] considered minimizing an upper bound on the MSE of LMMSE estimator, subject to a network transmit power constraint, given quantization bits. For coherent receiver, based on the closed-form solutions of $P_{k}$ ’s the authors in [22] found the following:

[TABLE]

Equation (58) shows that the behavior of $P_{k}$ ’s can change, depending on whether $\delta_{k}$ is larger or smaller than the threshold $\delta^{th}_{k}=\frac{e{\lambda}^{*}}{\alpha_{k}}$ . The parameter $\alpha_{k}$ in (58) depends on the observation vectors and quantization. The optimal value of the Lagrange multiplier ${\lambda}^{*}$ in (58) is related to $P_{tot}$ according to ${\lambda}^{*}=e^{a(-P_{tot}+b)}$ , where $a>0$ and $b$ are terms common to all sensors. Having revisited the results in [22], we now return to Fig. 7. Given the observation vectors and quantization (i.e., given $\alpha_{k}$ ) and given $\delta_{k}$ , suppose $P_{tot}$ increases. Increasing $P_{tot}$ implies that ${\lambda}^{*}$ , and thus the thresholds $\delta^{th}_{k}$ ’s, decrease. Therefore, $\delta_{k}$ ’s are compared against smaller thresholds. In the high-region of $P_{tot}$ the thresholds are so small that each $\delta_{k}$ exceeds $\delta^{th}_{k}$ (all channels can be viewed as “strong”). In this case, the allocation of power among sensors is such that, if $\delta_{1}\!<\!\delta_{2}\!<\!\delta_{3}$ , then $P_{3}\!<\!P_{2}\!<\!P_{1}$ (the sensor with the weaker channel is allocated more transmit power). Conversely, given $\alpha_{k}$ and given $\delta_{k}$ , suppose $P_{tot}$ decreases. Decreasing $P_{tot}$ implies that ${\lambda}^{*}$ , and thus the thresholds $\delta^{th}_{k}$ ’s, increase. Hence, $\delta_{k}$ ’s are compared against larger thresholds. In the low-region of $P_{tot}$ the thresholds are so large that each $\delta_{k}$ is below $\delta^{th}_{k}$ (all channels can be viewed as “weak”). In this case, the allocation of power among sensors is such that, if $\delta_{1}\!<\!\delta_{2}\!<\!\delta_{3}$ , then $P_{1}\!<\!P_{2}\!<\!P_{3}$ (the sensor with the stronger channel is allocated more transmit power).
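The threshold argument above can be traced numerically. The sketch below evaluates $\delta^{th}_{k}=\frac{e{\lambda}^{*}}{\alpha_{k}}$ with ${\lambda}^{*}=e^{a(-P_{tot}+b)}$ for a small and a large $P_{tot}$ and labels each channel as “weak” or “strong”. The values of $a$ , $b$ and $\alpha_{k}$ are hypothetical, $e$ is taken to be Euler's number (an assumption about the notation of [22]), and the function names are not from the paper; only the $\delta_{k}$ values are those listed for Fig. 7.

```python
import math

def channel_thresholds(p_tot, alphas, a, b):
    """delta_th_k = e * lambda_star / alpha_k, with lambda_star = exp(a * (b - p_tot))."""
    lam = math.exp(a * (b - p_tot))
    return [math.e * lam / alpha for alpha in alphas]

def classify(deltas, thresholds):
    """Label each channel by comparing delta_k with its threshold."""
    return ["strong" if d > t else "weak" for d, t in zip(deltas, thresholds)]

DELTAS = [14.0, 8.0, 2.0]  # delta_k values listed for Fig. 7
ALPHAS = [1.0, 1.0, 1.0]   # hypothetical alpha_k
A, B = 0.5, 10.0           # hypothetical a > 0 and b

low = classify(DELTAS, channel_thresholds(0.0, ALPHAS, A, B))
high = classify(DELTAS, channel_thresholds(20.0, ALPHAS, A, B))
```

Since ${\lambda}^{*}$ decays exponentially in $P_{tot}$ , every threshold decreases monotonically, so each channel switches from “weak” to “strong” exactly once as $P_{tot}$ grows.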

Note that the behavior of $P_{k}$ ’s as the solutions of the problem in (II) with respect to $P_{tot}$ is analogous to that depicted in Fig. 7. Moreover, the behavior of $P_{k}$ ’s for the two types of noncoherent receivers is similar to that for the coherent receiver. Due to lack of space, we have omitted those plots.

IX-E FIM-max vs. MSE-min Power Allocation

We explore how the FIM-max schemes compare with the power allocation obtained from constrained minimization of the MSE of the LMMSE estimator derived in Section VI. Let ${\cal D}_{l}\!=\!\text{tr}(\boldsymbol{\mathcal{D}}(\text{FIM-max}))$ and ${\cal D}_{unif}=\text{tr}(\boldsymbol{\mathcal{D}}(\{P_{k}\!=\!P_{tot}/K\}_{k=1}^{K}))$ denote the trace of $\boldsymbol{\mathcal{D}}$ at the $P_{k}$ ’s obtained from solving the problem in (II) and at uniform power allocation, respectively. Fig. 8(a) and Fig. 8(b) illustrate the numerical evaluations of ${\cal D}_{m}$ , ${\cal D}_{t}$ defined in Section VII-A, as well as ${\cal D}_{l}$ , ${\cal D}_{unif}$ , versus $P_{tot}$ for coherent receiver and two types of noncoherent receivers, respectively, and for the same setup parameters as Fig. 6(a). To fairly compare the performance of different receivers, we obtain the numerical results for coherent receiver and noncoherent receiver with known channel envelopes by taking expectation over the fading channel envelope vector $\boldsymbol{|h|}$ , such that $\mathbb{E}\left[|h_{k}|^{2}\right]=2\sigma_{h_{k}}^{2},\forall k$ . Fig. 8(c) and Fig. 8(d) plot the same as Fig. 8(a) and Fig. 8(b), but with different setup parameters (the same parameters as Fig. 6(b)). These figures show ${\cal D}_{m}\leq{\cal D}_{l}\approx{\cal D}_{t}\leq{\cal D}_{unif}$ for all three receivers and all ranges of $P_{tot}$ , i.e., the performance of both FIM-max schemes is very close to that of the MSE-min scheme (when we average over $\boldsymbol{|h|}$ ). We also plot tr(CRB(tr-FIM-max)) versus $P_{tot}$ for coherent receiver. Fig. 8(a) and Fig. 8(c) illustrate the inequality $\text{tr}(\text{CRB}({\text{FIM-max}}))\!<\!{\cal D}_{m}$ in (47). The same observation is made for the two types of noncoherent receivers. Due to lack of space, these plots are omitted.

It is worth mentioning that from the estimation theory we know ${\cal D}_{m}\!<\!{\cal D}_{t}$ and ${\cal D}_{m}\!<\!{\cal D}_{l}$ . What our simulations suggest is that in our problem they are indeed very close to each other. This observation is very important since it indicates that, although Bayesian CRB is not attainable in our problem and the WWB is tighter than Bayesian CRB, it is still proper to use FIM-max power allocation (instead of power allocation that minimizes the WWB or the MSE of the LMMSE estimator), since the differences ${\cal D}_{m}\!-\!{\cal D}_{t}$ and ${\cal D}_{m}\!-\!{\cal D}_{l}$ are small and not significant. While in low-region and high-region of $P_{tot}$ , ${\cal D}_{t}$ and ${\cal D}_{l}$ are much closer to ${\cal D}_{m}$ , in moderate-region of $P_{tot}$ , there is a small gap between them. Comparing three types of receivers for a given $P_{tot}$ , coherent receiver and noncoherent receiver with known channel statistics have the best and the worst performance. Similar observations can be made for Lloyd-Max quantizers. Due to lack of space we have omitted those plots.

IX-F Estimation Performance of a Randomly Deployed Network

We investigate the impact of network size $K$ on the MSE performance and compare tr(MSE) that is evaluated at different transmit power allocation. We assume $K\!=\!20$ sensors are randomly deployed in a $2m\times 2m$ field, where the origin is the center of the field, and compare the numerical results with $K\!=\!2$ sensors. We consider a zero-mean Gaussian vector $\boldsymbol{\theta}=\left[\theta_{1},\theta_{2}\right]^{T}$ with $\boldsymbol{\cal C}_{\boldsymbol{\theta}}=[4,0.5;0.5,0.25]$ . The distance between each external signal source $\theta_{i}$ located at $(x_{t_{i}},y_{t_{i}})$ and sensor $k$ located at $(x_{s_{k}},y_{s_{k}})$ is:

[TABLE]

Let $d_{0i}$ be the distance of source $\theta_{i}$ from the origin. Without loss of generality, we assume $d_{01}=d_{02}=1m$ . To characterize the observation gain vectors $\mathbf{a}_{k},\forall k$ in (1) we adopt an isotropic intensity attenuation model, where $\mathbf{a}_{k}\!=\![(\frac{d_{01}}{d_{k1}})^{n},(\frac{d_{02}}{d_{k2}})^{n}]^{T}$ and $n$ is the signal decay exponent which is approximately 2 for distances $\leq\!1km$ [45]. We assume $\sigma_{w_{k}}=1$ . For coherent receiver and noncoherent receiver with known channel envelopes we let $|h_{k}|\!=\!1,\forall k$ , and for noncoherent receiver with known channel statistics we let $\sigma_{h_{k}}\!=\!1,\forall k$ .
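The deployment described above can be reproduced in a few lines. The sketch below draws $K=20$ sensor positions uniformly over the $2m\times 2m$ field and evaluates $\mathbf{a}_{k}$ from the attenuation model with $n=2$ . The source coordinates (any points at distance $1m$ from the origin would do), the Euclidean form of the distance, the random seed, and the function name are assumptions made for illustration.

```python
import math
import random

def observation_gain(sensor_xy, sources_xy, d0=(1.0, 1.0), decay=2.0):
    """a_k = [(d_01 / d_k1)^n, (d_02 / d_k2)^n], with d_ki the Euclidean distance
    between source i and sensor k (assumed distance definition)."""
    gains = []
    for (xt, yt), d0i in zip(sources_xy, d0):
        d_ki = math.hypot(sensor_xy[0] - xt, sensor_xy[1] - yt)
        gains.append((d0i / d_ki) ** decay)
    return gains

rng = random.Random(1)
SOURCES = [(1.0, 0.0), (0.0, 1.0)]  # hypothetical positions, both 1 m from the origin
# K = 20 sensors placed uniformly in the 2 m x 2 m field centered at the origin.
sensors = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(20)]
a_vectors = [observation_gain(s, SOURCES) for s in sensors]
```

A sensor located at the origin has $\mathbf{a}_{k}=[1,1]^{T}$ under this normalization, and sensors closer to a source see a larger gain for it. A sensor that coincides exactly with a source would make the gain unbounded, which the sketch does not guard against.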

Fig. 9 plots ${\cal D}_{m},{\cal D}_{l},{\cal D}_{t},{\cal D}_{unif}$ versus $P_{tot}$ for coherent receiver, and both uniform and Lloyd-Max quantizers. Fig. 9 demonstrates the superiority of FIM-max schemes, compared to uniform power allocation for all ranges of $P_{tot}$ . Furthermore, the observation ${\cal D}_{l}\leq{\cal D}_{t}$ , suggests that log-det-FIM-max power allocation is closer to MSE-min power allocation, compared to tr-FIM-max power allocation (for a given realization of $\boldsymbol{|h|}$ ). This is intuitively appealing, since the Bayesian FIM $\boldsymbol{J}$ is not a diagonal matrix and log-det-FIM-max power allocation extracts and utilizes more information from $\boldsymbol{J}$ , compared to tr-FIM-max power allocation. Similar observations can be made for two types of noncoherent receivers. Due to lack of space we have omitted those plots.

X Conclusions

We derived the Bayesian FIM $\boldsymbol{J}$ and the WWB for distributed estimation of a Gaussian vector, when sensors transmit their digitally modulated quantized observations to the FC over power-constrained orthogonal noisy fading channels. We formulated and addressed constrained maximization of tr $(\boldsymbol{J})$ and log ${}_{2}(|\boldsymbol{J}|)$ under the constraint on $P_{tot}$ . We also derived the LMMSE estimator and its corresponding MSE. Through simulations we observed that both tr $(\boldsymbol{J})$ and $|\boldsymbol{J}|$ increase as $P_{tot}$ increases. Regarding the solutions of the formulated constrained maximization problems, we noticed that in the low-region and high-region of $P_{tot}$ , $P_{tot}$ is allotted among sensors in a water filling and an inverse water filling fashion, respectively. We also considered the power allocation solution obtained from minimizing the MSE of the LMMSE estimator (MSE-min scheme). Numerical results demonstrated the effectiveness of FIM-max schemes for different network setup parameters, as the MSE associated with the FIM-max schemes is very close to that of the MSE-min scheme and lower than that of uniform power allocation in all simulation scenarios. These results suggest that, although the WWB is tighter than the Bayesian CRB in our problem (and the Bayesian CRB is not attainable), it is still appropriate to use FIM-max schemes, since the performance loss in terms of the MSE of the LMMSE estimator is not significant. Comparing the performance of the three types of receivers, our numerical results revealed that the coherent receiver and the noncoherent receiver with known channel statistics have the best and the worst performance, respectively. Comparing uniform and Lloyd-Max quantizers, we observed that the latter outperforms the former in the moderate-region to high-region of $P_{tot}$ for all receivers.

Appendix A Appendix

A-A Proof of Lemma 1

By using the Bayes’ rule, we have:

[TABLE]

Since the communication channels are orthogonal and communication channel noises are independent, we can write:

[TABLE]

Moreover, given $\boldsymbol{m}$ , $\boldsymbol{\hat{m}}$ depends on communication channel noises and $\boldsymbol{\theta}$ depends on observation noises. However, observation and channel noises are two independent random processes. Hence, given $\boldsymbol{m}$ , $\boldsymbol{\theta}$ and $\boldsymbol{\hat{m}}$ are conditionally independent. That is, $\boldsymbol{\theta}$ , $\boldsymbol{m}$ and $\boldsymbol{\hat{m}}$ form a Markov chain (we say that random variables $x,y,z$ form a Markov chain, denoted by $x\rightarrow y\rightarrow z$ , if the Markov property $p(z|x,y)=p(z|y)$ holds [36]), and we conclude:

[TABLE]

Combining (60) and (61), $p(\boldsymbol{\hat{m}}\arrowvert\boldsymbol{m},\boldsymbol{\theta})$ in (59) becomes:

[TABLE]

Let $\boldsymbol{x}\!=\![x_{1},...,x_{K}]^{T}$ be the observation vector. Since Gaussian observation noises $n_{k}$ ’s are uncorrelated across the sensors and also uncorrelated with Gaussian $\theta$ , we have $f(\boldsymbol{x}\arrowvert\boldsymbol{\theta})=\prod_{k=1}^{K}f(x_{k}\arrowvert\boldsymbol{\theta})$ . This implies:

[TABLE]

Substituting (62) and (63) in (59), we reach (64) below, in which ( $a$ ) is obtained from some straightforward mathematical manipulations and $(b)$ is obtained using the Bayes’ rule and the fact that $\boldsymbol{\theta},m_{k},\hat{m}_{k}$ form a Markov chain.

A-B Proof of Lemma 2

Given the assumptions made in Lemma 2 and the number of quantization bits $L_{k}$ , Fig. 10 illustrates how the noisy observation $x_{k}$ is quantized and encoded. Define $p\left(u_{k,l}<x_{k}\leq u_{k,l+1}\right)\!=\!p_{k,l}$ , where $u_{k,l}$ ’s are the quantization boundaries specified in Section II. Since $x_{k}$ has a symmetric pdf and the quantizer is symmetric, we have:

[TABLE]

Define $o_{k,l}$ as the number of ones in encoded quantization index $l$ . When the quantization indices are encoded using natural binary coding we can show that $o_{k,l}=L_{k}-o_{k,M_{k}-l+1}$ . Therefore, the prior probability $p({\cal H}_{1,i}),\ i=1,...,L_{k}$ can be computed as:

[TABLE]

Similarly, we can show that $p({\cal H}_{0,i})=1/2$ .
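The two facts used in this proof can be checked numerically. The following is a minimal sketch under stated assumptions, not the paper's exact setup: it assumes a zero-mean Gaussian $x_{k}$, a symmetric uniform quantizer on $[-\tau,\tau]$ with $M_{k}=2^{L_{k}}$ cells (outer cells extended to $\pm\infty$), 0-based indices encoded in natural binary, and it interprets ${\cal H}_{1,i}$ as the event that the $i$-th encoded bit equals one. The values of `tau` and `sigma` and all function names are hypothetical.

```python
import math

def gaussian_cdf(x, sigma):
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def cell_probabilities(num_bits, tau, sigma):
    """Cell probabilities p_{k,l} of a symmetric uniform quantizer (assumed)."""
    M = 2 ** num_bits
    step = 2.0 * tau / M
    # Boundaries: -inf, M-1 interior points symmetric about zero, +inf.
    bounds = [-math.inf] + [-tau + step * j for j in range(1, M)] + [math.inf]
    return [gaussian_cdf(bounds[l + 1], sigma) - gaussian_cdf(bounds[l], sigma)
            for l in range(M)]

def ones_count(index):
    """Number of ones in the natural binary code of a 0-based index."""
    return bin(index).count("1")

def bit_one_priors(probs, num_bits):
    """P(bit i = 1) for each bit position i of the encoded index."""
    return [sum(p for l, p in enumerate(probs) if (l >> i) & 1)
            for i in range(num_bits)]

L_k, tau, sigma = 3, 3.0, 1.5          # hypothetical example values
M_k = 2 ** L_k
probs = cell_probabilities(L_k, tau, sigma)

# Mirrored indices have complementary codes: o_l + o_{M-1-l} = L_k (0-based).
complement_ok = all(ones_count(l) + ones_count(M_k - 1 - l) == L_k
                    for l in range(M_k))
priors = bit_one_priors(probs, L_k)
print(complement_ok, priors)
```

With 0-based indices, the paper's pair $(l,\,M_{k}-l+1)$ becomes $(l,\,M_{k}-1-l)$, whose binary codes are bitwise complements. Combined with the symmetry of the cell probabilities, each bit is one with probability $1/2$ in this example.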

A-C Calculation of $\boldsymbol{\mathcal{I}}_{k,l}^{1}$ in (39), and ${\mathcal{I}}_{k,l}^{2}$ and ${\mathcal{I}}_{i,j,l_{1},l_{2}}^{3}$ in (41)

We first calculate ${\mathcal{I}}_{k,l}^{2}$. We consider the eigenvalue decomposition $\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}\!=\!\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{U}^{T}$, where $|\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}|\!=\!|\boldsymbol{\Sigma}|$ and $|\boldsymbol{U}|\!=\!\pm 1$. We define $\boldsymbol{v}\!=\!\boldsymbol{U}^{T}\boldsymbol{\theta}$, and therefore $d\boldsymbol{v}\!=\!{|\boldsymbol{U}|}^{q}d\boldsymbol{\theta}$ [46], and also $\boldsymbol{\psi}_{k}\!=\!\boldsymbol{U}^{T}\mathbf{a}_{k}$, in which $\mathbf{a}_{k}$ is the observation gain vector of sensor $k$. Using these definitions and changes of variables along with the definition of $\beta_{k,l}(\boldsymbol{\theta})$ in (III), ${\mathcal{I}}_{k,l}^{2}$ becomes:

[TABLE]

where $s_{1k}\!=\!\frac{1}{\sqrt{(2\pi)^{q+1}|\boldsymbol{\Sigma}|}\sigma_{n_{k}}{|\boldsymbol{U}|}^{q}}$ and $V_{v}$ denotes the $q$-dimensional volume over which we integrate in the new coordinates. After expanding the argument of the exponential function in the integrand, completing the square, and defining:

[TABLE]

${\mathcal{I}}_{k,l}^{2}$ can be obtained as in (65), in which $s_{2k}\!=\!\frac{\sqrt{\left|{\boldsymbol{Q}}_{k}\right|}}{\sqrt{2\pi|\boldsymbol{\Sigma}|}\sigma_{n_{k}}}$, and for the second equality, we have used the fact that the integral of the pdf of the Gaussian random vector $\boldsymbol{v}$ over $V_{v}$ is equal to 1. The term ${|\boldsymbol{U}|}^{q}\!=\!\pm 1$ in the denominator of $s_{1k}$ is absorbed in the integration over $\boldsymbol{v}$, because the effects of the change of variable from $\boldsymbol{\theta}$ to $\boldsymbol{v}$ on $V_{\theta}$ to $V_{v}$ and on $d\boldsymbol{\theta}$ to $d\boldsymbol{v}$ cancel each other. Since $\left|{\boldsymbol{Q}}_{k}\right|\!=\!1\big/\left|{\boldsymbol{Q}}_{k}^{-1}\right|$, using the Matrix Determinant Lemma, which performs a rank-1 update to a determinant [47], we obtain:

[TABLE]

and therefore $s_{2k}\!=\!\frac{1}{\sqrt{2\pi\left(\sigma_{n_{k}}^{2}+{\boldsymbol{\psi}}_{k}^{T}\boldsymbol{\Sigma}\boldsymbol{\psi}_{k}\right)}}$ . One can also use the Binomial Inversion Lemma [47] to compute ${\boldsymbol{Q}}_{k}$ in (66) as:

[TABLE]
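The determinant and inverse identities invoked here can be verified numerically. Equation (66) is not reproduced in this text, so the sketch below assumes ${\boldsymbol{Q}}_{k}^{-1}=\boldsymbol{\Sigma}^{-1}+\boldsymbol{\psi}_{k}\boldsymbol{\psi}_{k}^{T}/\sigma_{n_{k}}^{2}$, a form chosen because it is consistent with the closed-form $s_{2k}$ stated above. The dimension `q`, the noise level `sigma_n`, and the random `Sigma` and `psi` are hypothetical test values.

```python
import numpy as np

rng = np.random.default_rng(0)
q = 4                                        # hypothetical dimension of theta
sigma_n = 0.7                                # hypothetical observation noise std
Sigma = np.diag(rng.uniform(0.5, 2.0, q))    # diagonal eigenvalue matrix
psi = rng.normal(size=q)                     # stands in for psi_k = U^T a_k

# Assumed form of Q_k^{-1}: Sigma^{-1} plus a rank-1 term (see lead-in).
Q_inv = np.linalg.inv(Sigma) + np.outer(psi, psi) / sigma_n**2
quad = psi @ Sigma @ psi                     # psi^T Sigma psi

# Matrix determinant lemma: |A + u v^T| = |A| (1 + v^T A^{-1} u).
det_direct = np.linalg.det(Q_inv)
det_lemma = (1.0 + quad / sigma_n**2) / np.linalg.det(Sigma)

# Rank-1 inverse update (Sherman-Morrison form of the inversion lemma).
Q_direct = np.linalg.inv(Q_inv)
Q_update = Sigma - (Sigma @ np.outer(psi, psi) @ Sigma) / (sigma_n**2 + quad)

# s_2k from its definition versus the closed form stated in the text.
s2_from_Q = np.sqrt(np.linalg.det(Q_direct)) / (
    np.sqrt(2.0 * np.pi * np.linalg.det(Sigma)) * sigma_n)
s2_closed = 1.0 / np.sqrt(2.0 * np.pi * (sigma_n**2 + quad))

print(det_direct, det_lemma)
print(s2_from_Q, s2_closed)
```

Under this assumed form, the direct and lemma-based determinants agree, the rank-1 update reproduces the direct inverse, and both expressions for $s_{2k}$ coincide.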

Substituting (67) in (66) and (65), we obtain:

[TABLE]

From the definition of $\boldsymbol{\psi}_{k}$ we have ${\boldsymbol{\psi}}_{k}^{T}\boldsymbol{\Sigma}\boldsymbol{\psi}_{k}\!=\!\mathbf{a}_{k}^{T}\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}\mathbf{a}_{k}$ . Having $\sigma_{k}$ from (43), we conclude:

[TABLE]
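The identity ${\boldsymbol{\psi}}_{k}^{T}\boldsymbol{\Sigma}\boldsymbol{\psi}_{k}=\mathbf{a}_{k}^{T}\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}\mathbf{a}_{k}$ follows directly from $\boldsymbol{\psi}_{k}=\boldsymbol{U}^{T}\mathbf{a}_{k}$ and $\boldsymbol{\mathcal{C}}_{\boldsymbol{\theta}}=\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{U}^{T}$. A short numerical check is sketched below, assuming a randomly generated symmetric positive definite covariance `C_theta` and gain vector `a_k` (both hypothetical, for illustration only).

```python
import numpy as np

rng = np.random.default_rng(1)
q = 3                                     # hypothetical dimension of theta
B = rng.normal(size=(q, q))
C_theta = B @ B.T + q * np.eye(q)         # hypothetical SPD covariance matrix
a_k = rng.normal(size=q)                  # hypothetical observation gain vector

# Eigenvalue decomposition C_theta = U Sigma U^T with orthogonal U.
eigvals, U = np.linalg.eigh(C_theta)
Sigma = np.diag(eigvals)
psi_k = U.T @ a_k

lhs = psi_k @ Sigma @ psi_k               # psi_k^T Sigma psi_k
rhs = a_k @ C_theta @ a_k                 # a_k^T C_theta a_k
abs_det_U = abs(np.linalg.det(U))         # equals 1 for an orthogonal matrix

print(lhs, rhs, abs_det_U)
```

The check also confirms $|\boldsymbol{U}|=\pm 1$, which is why the Jacobian factor drops out of the change of variables.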

Taking a similar approach, we can calculate $\boldsymbol{\mathcal{I}}_{k,l}^{1}$ in (39) and ${\mathcal{I}}_{i,j,l_{1},l_{2}}^{3}$ in (41).

Bibliography


[1] M. Shirazi and A. Vosoughi, “Bayesian Cramer-Rao bound for distributed vector estimation with linear observation model,” in 2014 IEEE 25th Annual International Symposium on Personal, Indoor, and Mobile Radio Communication (PIMRC), Sep 2014, pp. 712–716.
[2] M. Shirazi and A. Vosoughi, “Bayesian Cramer-Rao bound for distributed estimation of correlated data with non-linear observation model,” in 2014 48th Asilomar Conference on Signals, Systems and Computers, 2014, pp. 1484–1488.
[3] M. Hosseini, A. S. Maida, M. Hosseini, and G. Raju, “Inception-inspired LSTM for next-frame video prediction,” 2019.
[4] M. J. Moghaddam, M. Hosseini, and R. Safabakhsh, “Traffic light control based on fuzzy Q-learning,” in 2015 The International Symposium on Artificial Intelligence and Signal Processing (AISP), 2015, pp. 124–128.
[5] A. Sani and A. Vosoughi, “Resource allocation optimization for distributed vector estimation with digital transmission,” in 2014 48th Asilomar Conference on Signals, Systems and Computers, 2014.
[6] A. Amar, A. Leshem, and M. Gastpar, “Recursive implementation of the distributed Karhunen-Loève transform,” IEEE Transactions on Signal Processing, vol. 58, no. 10, pp. 5320–5330, Oct 2010.
[7] L. Gispan, A. Leshem, and Y. Be’ery, “Decentralized estimation of regression coefficients in sensor networks,” Digital Signal Processing, vol. 68, pp. 16–23, 2017.
[8] I. D. Schizas, G. B. Giannakis, and Z. Luo, “Distributed estimation using reduced-dimensionality sensor observations,” IEEE Transactions on Signal Processing, vol. 55, no. 8, pp. 4284–4299, Aug 2007.
