Tight Bounds on the Weighted Sum of MMSEs with Applications in Distributed Estimation
Michael Fauß, Abdelhak M. Zoubir, Alex Dytso, H. Vincent Poor, and K. G. Nagananda

TL;DR
This paper derives tight bounds on the weighted sum of MMSEs in Gaussian channels using KL divergence constraints, providing insights for robust distributed estimation in wireless systems.
Contribution
It introduces new tight bounds on MMSE sums constrained by KL divergence, characterizes the optimal Gaussian distributions, and demonstrates robustness of estimators against distribution deviations.
Findings
Derived tight upper and lower bounds on weighted MMSE sums.
Identified Gaussian distributions that attain these bounds.
Showed estimators that attain the upper bound are minimax robust.
Tight Bounds on the Weighted Sum of MMSEs
with Applications in Distributed Estimation
Michael Fauß, Abdelhak M. Zoubir
Signal Processing Group
Technische Universität Darmstadt
D-64283 Darmstadt, Germany
{mfauss, zoubir}@spg.tu-darmstadt.de
Alex Dytso, H. Vincent Poor
Dept. of Electrical Engineering
Princeton University
Princeton, NJ 08544, USA
{adytso, poor}@princeton.edu
Nagananda Kyatsandra
LTCI, Télécom ParisTech
Institut Mines - Télécom
Paris 75013, France
Abstract
In this paper, tight upper and lower bounds are derived on the weighted sum of minimum mean-squared errors for additive Gaussian noise channels. The bounds are obtained by constraining the input distribution to be close to a Gaussian reference distribution in terms of the Kullback–Leibler divergence. The distributions that attain these bounds are shown to be Gaussian whose covariance matrices are defined implicitly via systems of matrix equations. Furthermore, the estimators that attain the upper bound are shown to be minimax robust against deviations from the assumed input distribution. The lower bound provides a potentially tighter alternative to well-known inequalities such as the Cramér–Rao lower bound. Numerical examples are provided to verify the theoretical findings of the paper. The results derived in this paper can be used to obtain performance bounds, robustness guarantees, and engineering guidelines for the design of local estimators for distributed estimation problems which commonly arise in wireless communication systems and sensor networks.
Index Terms:
MMSE bounds, distributed estimation, robust estimation, convex optimization, Cramér–Rao bound.
I Introduction
The mean square error (MSE) is a natural and commonly used measure for the accuracy of an estimator. The minimum MSE (MMSE) plays a central role in statistics [1, 2], information theory [3, 4], and signal processing [5, 6, 7], and has close connections to entropy and mutual information [8, 9]. In [10] and [11], lower and upper bounds on the MMSE are derived when the random variable of interest is contaminated by additive Gaussian noise and its distribution is $\varepsilon$-close to a Gaussian reference distribution in terms of the Kullback–Leibler (KL) divergence. The estimator that attains this upper bound is shown to be minimax robust in the sense that it minimizes the maximum MSE over the set of feasible distributions. That is, within the specified KL divergence ball, it is robust to arbitrary deviations of the prior from the nominal Gaussian case. The lower bound provides a fundamental limit on the estimation accuracy and is a potentially tighter alternative to the Bayesian Cramér–Rao bound.
This paper extends the bounds in [10] and [11] to a weighted sum of MMSEs. As in [10] and [11], the bounds derived in this paper are obtained by constraining the input distribution to be $\varepsilon$-close to a Gaussian reference distribution in terms of the KL divergence. The estimators that attain the upper bound are minimax robust against deviations from the assumed input distribution. Finally, the proposed bounds are evaluated for generalized Gaussian distributions and the uniform distribution. It is shown that, in some cases, the lower bounds derived in this paper are tighter than the Cramér–Rao lower bound. Interestingly, the performance of the proposed bounds improves as the dimension of the input vector increases.
The weighted sum of MMSEs arises in various practical applications in signal processing and communications. For example, it has been shown that for a Gaussian prior the MMSE in the time domain can be expressed as a sum of MMSEs in the frequency domain [12]. In multiple-input multiple-output (MIMO) wireless communications, the MMSE is frequently expressed as a sum of MMSEs or a sum of inverse MMSEs of multiple parallel channels [13, 14, 15, 16]. In the context of distributed statistical inference, weighted sums of MMSEs play an important role in parameter estimation problems, where noisy measurements from multiple randomly deployed sensors are used to estimate the parameter of interest. In such scenarios, the estimates from the local sensors are typically fused to obtain a global estimate of the parameter. However, in practice, little is known about the analytical characterization of the optimal performance of the global estimator [17]. The study of the weighted sum of MMSEs reported in this paper is a step in this direction. The weighted sum of MMSEs is not only an appropriate objective function for distributed estimation, since it provides a platform to establish a global performance measure, but it also allows sensors to be prioritized by assigning them weights. Thus, a highly informative sensor will be assigned a higher weight in the linear combination of the MMSEs of all the sensors. This could have important applications in energy-efficient sensor networking, where only highly informative sensors transmit their local decisions while the sensors deemed less informative abstain from transmission, thus saving energy and time for decision making [18], [19].
The rest of the paper is organized as follows. In Section II, we provide a mathematical statement of the problem addressed in the paper. The upper and lower bounds on the weighted sum of MMSEs, which are the main results of this work, are stated in Section III. The usefulness of the proposed bounds in distributed estimation is illustrated via an example application and related details are discussed in Section IV. Concluding remarks are provided in Section V.
II Problem Formulation
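Let $(\mathbb{R}^K, \mathcal{B}^K)$ denote the $K$-dimensional Borel space. Consider $J$ additive-Gaussian-noise channels $Y_j = X + N_j$, $j = 1, \ldots, J$, where $X$ and $N_j$ are independent $\mathbb{R}^K$-valued random variables. All $N_j$ are assumed to be zero-mean Gaussian distributed, i.e., $N_j \sim \mathcal{N}(0, \Sigma_{N_j})$, where $\mathcal{N}(\mu, \Sigma)$ denotes the Gaussian distribution with mean $\mu$ and covariance $\Sigma$.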
We define the individual MSEs as functions of the estimator and the input distribution , i.e.,
[TABLE]
The individual MMSEs are accordingly defined as
[TABLE]
where denotes the set of all feasible estimators, i.e.,
[TABLE]
The following two optimization problems are investigated in this paper:
[TABLE]
where $\lambda_1, \ldots, \lambda_J$ are fixed positive weights and the set of feasible distributions is defined as
[TABLE]
Note that, is a KL divergence ball centered at of radius . As we proceed, it will become clear that it is useful to choose the reference distribution to be Gaussian:
[TABLE]
Finally, to allow for a compact notation, the following matrices are introduced:
[TABLE]
where and denotes the transpose of .
III Main Result
The main result of the paper is provided in the following theorem.
Theorem 1.
If , with positive definite and , solve
[TABLE]
then
[TABLE]
solves problem (1). Analogously, if , with positive definite and , solve (8) and (9), then
[TABLE]
solves problem (2).
Since the input distributions that attain these bounds are Gaussian of the form $P_X \sim \mathcal{N}(\mu_0, \Sigma_X)$, the individual MMSE estimators in both cases are given by
[TABLE]
where denotes the observed realization of . The corresponding MMSEs are given by
[TABLE]
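As a minimal sketch of what the Gaussian case looks like, the snippet below evaluates the linear MMSE estimator and the resulting MMSE for a scalar Gaussian input in Gaussian noise, using the standard textbook closed forms. The scalar restriction and the names `mu0`, `var_x`, `noise_vars`, and `weights` are assumptions made for illustration; they are not the paper's matrix expressions.

```python
def mmse_estimate(y, mu0, var_x, var_n):
    """Conditional-mean estimate of X ~ N(mu0, var_x) from y = x + n, n ~ N(0, var_n)."""
    gain = var_x / (var_x + var_n)  # scalar Wiener gain
    return mu0 + gain * (y - mu0)


def gaussian_mmse(var_x, var_n):
    """MMSE of a scalar Gaussian input in independent Gaussian noise."""
    return var_x * var_n / (var_x + var_n)


def weighted_mmse_sum(var_x, noise_vars, weights):
    """Weighted sum of the per-channel MMSEs for a common Gaussian input."""
    return sum(w * gaussian_mmse(var_x, v) for w, v in zip(weights, noise_vars))


# Hypothetical example: two channels observing the same unit-variance input.
total = weighted_mmse_sum(1.0, noise_vars=[1.0, 3.0], weights=[2.0, 4.0])
```

With these example values the two channels contribute 2 x 0.5 and 4 x 0.75, so the weighted sum is 4.0.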
The lower and upper bounds on the weighted sum of MMSEs in (1) and (2) are then given by
[TABLE]
where and are shorthand notations for in (6) evaluated at and , respectively.
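To make the structure of the result concrete, the following sketch treats the scalar case only. It takes from Theorem 1 the fact that the extremal inputs are Gaussian with the reference mean; in one dimension the weighted sum of Gaussian MMSEs is increasing in the input variance, so the two bounds are attained at the two variances that lie exactly on the KL boundary. The bisection solver, the scalar restriction, and the names `var0` and `eps` (reference variance and KL radius) are assumptions for illustration and do not reproduce the matrix equations (8) and (9).

```python
import math


def kl_same_mean(var, var0):
    """D(N(mu, var) || N(mu, var0)) in nats for scalar Gaussians with equal means."""
    ratio = var / var0
    return 0.5 * (ratio - 1.0 - math.log(ratio))


def solve_boundary(lo, hi, var0, eps):
    """Bisection for kl_same_mean(var, var0) == eps on a valid bracket [lo, hi]."""
    lo_above = kl_same_mean(lo, var0) > eps
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (kl_same_mean(mid, var0) > eps) == lo_above:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def boundary_variances(var0, eps):
    """Smaller and larger variance whose KL divergence to N(mu, var0) equals eps > 0."""
    small = var0 * math.exp(-(2.0 * eps + 1.0))  # KL at this point exceeds eps
    large = 2.0 * var0
    while kl_same_mean(large, var0) <= eps:  # grow until the bracket is valid
        large *= 2.0
    return (solve_boundary(small, var0, var0, eps),
            solve_boundary(var0, large, var0, eps))


def weighted_mmse_sum(var_x, noise_vars, weights):
    """Weighted sum of scalar Gaussian MMSEs var_x * v / (var_x + v)."""
    return sum(w * var_x * v / (var_x + v) for w, v in zip(weights, noise_vars))


# Hypothetical example values.
var0, eps = 1.0, 0.1
noise_vars, weights = [1.0, 2.0], [1.0, 1.0]
var_low, var_high = boundary_variances(var0, eps)
lower = weighted_mmse_sum(var_low, noise_vars, weights)
nominal = weighted_mmse_sum(var0, noise_vars, weights)
upper = weighted_mmse_sum(var_high, noise_vars, weights)
```

In the vector case the covariance is no longer determined by a single boundary condition, which is why the theorem characterizes it implicitly through a system of matrix equations.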
Proof.
The proof of the main result follows along the same lines as the proof in [10, Sec. 4]. Consider the maximization in (1), which can be written as the minimax problem
[TABLE]
where the infimum is taken jointly over all . A sufficient condition for and to solve (17), and hence (1), is that they satisfy the saddle point conditions [20, Exercise 3.14]
[TABLE]
for all and
[TABLE]
for all . The fact that the estimators in (12) minimize the right hand side of (18) follows directly from the definition of the MMSE [21, Chapter 10.4]. In the remainder of the proof, it is shown that in (10) satisfies (19).
First, the right hand side of (19) is written as
[TABLE]
where is independent of and given by
[TABLE]
with $c \coloneqq \sum_{j=1}^{J} \lambda_j \operatorname{tr}\bigl((I - W_j)\Sigma_N (I - W_j)^{\text{T}}\bigr)$ being a constant that is independent of . Using the auxiliary result on bounds on expectations under $f$-divergence constraints in [10, Sec. 4.1], it follows that the density of the optimal distribution is of the form
[TABLE]
where denotes the density of the reference distribution and , need to be chosen appropriately. Substituting in (25) with (24) and using (4) yields
[TABLE]
where, without loss of generality, has been scaled by and
[TABLE]
Multiplying both sides of (29) by from the left and the right and rearranging the terms yields (8).
Knowing that and are Gaussian distributions with identical means, their KL divergence is given by
[TABLE]
Equating (30) with yields the optimality condition (9). This concludes the proof of optimality of .
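For reference, the standard closed form of the KL divergence between two Gaussian distributions with identical means is sketched below. The symbols $\Sigma_1$, $\Sigma_0$ (the two covariance matrices) and $K$ (the dimension) are notation chosen here for illustration and may differ from the notation used in (30).

```latex
D_{\mathrm{KL}}\bigl(\mathcal{N}(\mu, \Sigma_1) \,\|\, \mathcal{N}(\mu, \Sigma_0)\bigr)
  = \frac{1}{2}\Bigl[\operatorname{tr}\bigl(\Sigma_0^{-1}\Sigma_1\bigr) - K
    - \log\det\bigl(\Sigma_0^{-1}\Sigma_1\bigr)\Bigr]
```

Setting this expression equal to the radius of the KL ball gives a single scalar constraint on the covariance of the extremal distribution.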
The proof of optimality of follows analogously, the only difference being that the sign of is reversed; relevant details are provided in [10, Sec. 4.1]. ∎
IV An application involving distributed estimation
In this section, we first present an example application involving distributed estimation where the weighted sum of MMSEs is relevant. We show how a simple modification of the conventional distributed processing leads to significant improvements in the performance characterization of practical distributed estimators. We then derive bounds on the weighted sum of MMSEs for distributed estimation with arbitrary distributions in Gaussian noise. Lastly, we specialize these bounds for the generalized Gaussian and uniform distributions. Numerical evaluations, presented to verify the theoretical findings, reveal surprising features of the proposed bounds.
Consider the problem of distributed estimation using wireless sensor networks (WSNs), where sensors are randomly distributed in the region of interest (ROI). The sensor is located at a distance from a target. By “target” we mean some activity, for example, a fire in the ROI. The target’s signal power is assumed to follow the isotropic power attenuation model [22]. The signal power at sensor is given by , , where is the target’s signal power at distance zero, is the signal decay exponent taking values between 2 and 3, and is a constant (larger implies faster power decay). In practice, the parameters and pertaining to the wireless medium are obtained by performing experiments before the WSN is deployed, though uncertainty is associated with this knowledge. For the purpose of this paper, let us consider the simple goal of estimating the distances based on the noisy observations made by the sensors. The knowledge of is typically used to infer the presence/absence of a target in the ROI (see [22]). In conventional distributed estimation, the estimate computed by each local sensor is transmitted to a central processing unit, which aggregates the local estimates to compute a system-level estimate of the distance vector. However, to the best of our knowledge, theoretical insights into the system-level estimator’s accuracy are lacking in the literature.
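The attenuation model described above can be sketched in code as follows. The functional form `p0 / (1 + alpha * d**n)` is an assumption: it is a commonly used isotropic attenuation model that matches the verbal description (power `p0` at distance zero, decay exponent `n`, and a constant `alpha` where larger values mean faster decay), but the exact expression used in the paper is not recoverable from the text. All names are hypothetical.

```python
def signal_power(distance, p0, alpha, n):
    """Assumed isotropic attenuation: received power at a given distance."""
    return p0 / (1.0 + alpha * distance ** n)


def distance_from_power(power, p0, alpha, n):
    """Invert the assumed model to recover the distance from a noise-free power."""
    return ((p0 / power - 1.0) / alpha) ** (1.0 / n)


# Hypothetical example: decay exponent 2, sensor 3 distance units from the target.
p0, alpha, n = 4.0, 0.5, 2.0
received = signal_power(3.0, p0, alpha, n)
recovered = distance_from_power(received, p0, alpha, n)
```

In practice the received power is observed in noise, so the inversion above only illustrates how distance and power are linked; it is not an estimator.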
Let us now consider a simple modification to the above scheme. Instead of the local estimates, if the sensors transmit their local MSEs to the central unit, then a weighted sum of MMSEs can be thoroughly analyzed using the findings of this paper. This new scheme provides a comprehensive view of the performance of system-level (or, global) estimators unlike existing distributed estimation wherein the local estimates are simply fused at the control unit without insightful performance guarantees. Our study provides engineering guidelines for the design and analysis of large-scale sensor networks which are important components in several critical infrastructures like the Smart Grid, IoT and other cyber-physical systems. Possible improvements in global system performance are demonstrated with the following two examples. We first derive upper and lower bounds on the linear combination of MMSEs for arbitrary distributions in Gaussian noise and then derive these bounds for the generalized Gaussian distribution.
IV-A Bounds on the linear combinations of MMSEs for arbitrary distributions in Gaussian noise
Consider the linear combination of MMSEs for an arbitrary distribution such that
[TABLE]
For a given , upper and lower bounds on the linear combination of MMSEs can be derived by the following steps:
1. Find the best Gaussian approximation of in terms of the KL divergence. In other words, find a Gaussian that minimizes and compute .
2. Use the value of found in Step 1 to compute the upper and lower bounds in (1) and (2), respectively.
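Step 1 can be sketched for scalar priors as follows, assuming the divergence is taken from the true prior to the Gaussian, that is, D(P || Q) with Q Gaussian. Under that assumption the minimizing Gaussian matches the mean and variance of P, and the minimum equals the entropy of the matched Gaussian minus the differential entropy of P. The function names and the two example priors are illustrative choices, not taken from the paper.

```python
import math


def gaussian_entropy(var):
    """Differential entropy (nats) of a scalar Gaussian with the given variance."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)


def kl_to_best_gaussian(var, entropy):
    """min over Gaussian Q of D(P || Q) for a scalar P with given variance and entropy."""
    return gaussian_entropy(var) - entropy


# Laplace prior with scale b (generalized Gaussian with shape parameter 1):
# variance 2 * b**2, differential entropy 1 + log(2 * b).
b = 1.5
eps_laplace = kl_to_best_gaussian(2.0 * b ** 2, 1.0 + math.log(2.0 * b))

# Uniform prior on [-a, a]: variance a**2 / 3, differential entropy log(2 * a).
a = 2.0
eps_uniform = kl_to_best_gaussian(a ** 2 / 3.0, math.log(2.0 * a))
```

In both examples the resulting radius does not depend on the scale parameter, because rescaling the prior rescales its best Gaussian approximation in the same way.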
We can evaluate the effectiveness of this procedure by comparing it to the bounds attained by individually bounding each term in the linear combination. The most popular bounds on the individual MMSEs are the following:
[TABLE]
where is the Fisher information of . The upper bound in (32) is obtained by using the best linear estimator instead of the optimal estimator. The lower bound in (33) is the Cramér–Rao lower bound. We refer to the bounds obtained by bounding individual MMSEs as local and the bounds that work directly on the linear combination as global.
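A scalar sketch of the two local bounds is given below. It assumes that (32) is the MSE of the best linear estimator and that (33) takes the Bayesian Cramér–Rao (Van Trees) form, in which the Fisher information of the prior and the inverse noise variance add; the exact expressions in the paper may differ. The names `var_x`, `fisher_x`, `noise_vars`, and `weights` are hypothetical.

```python
def lmmse_upper(var_x, var_n):
    """MSE of the best linear estimator; upper-bounds the MMSE for any prior with variance var_x."""
    return var_x * var_n / (var_x + var_n)


def crb_lower(fisher_x, var_n):
    """Assumed Bayesian Cramér-Rao form: 1 / (prior Fisher information + 1 / noise variance)."""
    return 1.0 / (fisher_x + 1.0 / var_n)


def local_bounds(var_x, fisher_x, noise_vars, weights):
    """Bound each MMSE separately, then combine the bounds with the weights."""
    lower = sum(w * crb_lower(fisher_x, v) for w, v in zip(weights, noise_vars))
    upper = sum(w * lmmse_upper(var_x, v) for w, v in zip(weights, noise_vars))
    return lower, upper


# Hypothetical example: Laplace prior with unit scale (variance 2, Fisher information 1).
lap_lower, lap_upper = local_bounds(2.0, 1.0, noise_vars=[1.0, 2.0], weights=[1.0, 1.0])

# For a Gaussian prior the Fisher information is 1 / variance and both bounds coincide.
gau_lower, gau_upper = local_bounds(2.0, 0.5, noise_vars=[1.0], weights=[1.0])
```

Because each term is bounded in isolation, these local bounds cannot exploit the fact that all channels share the same input distribution.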
IV-B Generalized Gaussian and Uniform distributions
Let us now consider distributions that are either concentrated or heavy-tailed. A classic example is the generalized Gaussian distribution, whose density is given by
[TABLE]
where is the normalization constant. The covariance matrix, Fisher information and the best Gaussian approximation for this distribution are given by
[TABLE]
To evaluate the performance of our bounds we set , , and
[TABLE]
and compare the resulting bounds in Fig. 1. Specifically, Fig. 1 comprises the following bounds:
1. lower bounds obtained via (2) (solid black line);
2. upper bounds obtained via (1) (dotted black line);
3. local upper bounds attained via (32) (solid gray line);
4. local lower bounds attained via (33) (dashed gray line);
5. local lower bounds attained via (2) (dashed-dotted black line). The local version of the lower bound in (2) is obtained by individually minimizing each MMSE with the same KL constraint. Hence, the resulting solution is independent of the weights; and
6. local upper bounds attained via (1) (dashed black line). The local version of the upper bound in (1) is obtained by individually maximizing each MMSE with the same KL constraint.
Another important feature of the proposed bounds is that they hold for prior distributions that do not necessarily have a well-defined Fisher information. Note that, as a consequence of Stam’s inequality, the finiteness of Fisher information implies finite . The converse statement, however, is not true, and there are distributions without well-defined Fisher information but with finite . The practical implication of this phenomenon is that our bounds hold for a larger set of distributions. This property of the proposed bounds has been discussed in detail in [11]. An example of such a prior distribution is a uniform distribution over a ball. We demonstrate this in Fig. 2, where our lower and upper bounds are plotted versus the radius of the ball using the parameters in (39). It is interesting to observe that the larger the radius of the ball, the better the performance of the proposed lower and upper bounds.
V Concluding remarks
Upper and lower bounds on the weighted sum of MMSEs for additive Gaussian noise channels have been derived. It has been shown that these bounds take the coupling between the individual MMSEs into account and thereby are significantly tighter than the existing bounds. Examples have been provided to show how the presented results can be particularly useful for the design and analysis of sensor networks for distributed estimation, where the weighted sum of MMSEs can be used to monitor the performance, establish operating regions, and design robust local estimators. Further insights into the robustness of the local estimators for distributed setups require a more careful analysis and will be the subject of future research.
Acknowledgment
The work of A. Dytso and H. V. Poor was supported by the U. S. National Science Foundation under Grant CCF–1513915. The work of Nagananda K. G. was supported by the European Research Council under grant agreement 715111.
References
[1] E. L. Lehmann and G. Casella, Theory of Point Estimation, Springer, New York City, New York, USA, 2nd edition, 1998.
[2] Y. Dodge, The Concise Encyclopedia of Statistics, chapter Criterion of Total Mean Squared Error, pp. 141–144, Springer, New York City, New York, USA, 2008.
[3] D. Guo, Y. Wu, S. Shamai (Shitz), and S. Verdú, “Estimation in Gaussian Noise: Properties of the Minimum Mean-Square Error,” IEEE Transactions on Information Theory, vol. 57, no. 4, pp. 2371–2385, 2011.
[4] A. Dytso, R. Bustin, D. Tuninetti, N. Devroye, H. V. Poor, and S. Shamai (Shitz), “On Communication Through a Gaussian Channel With an MMSE Disturbance Constraint,” IEEE Transactions on Information Theory, vol. 64, no. 1, pp. 513–530, 2018.
[5] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice-Hall, Upper Saddle River, NJ, USA, 1993.
[6] L. A. Dalton and E. R. Dougherty, “Exact Sample Conditioned MSE Performance of the Bayesian MMSE Estimator for Classification Error—Part I: Representation,” IEEE Transactions on Signal Processing, vol. 60, no. 5, pp. 2575–2587, 2012.
[7] L. A. Dalton and E. R. Dougherty, “Exact Sample Conditioned MSE Performance of the Bayesian MMSE Estimator for Classification Error—Part II: Consistency and Performance Analysis,” IEEE Transactions on Signal Processing, vol. 60, no. 5, pp. 2588–2603, 2012.
[8] D. Guo, S. Shamai (Shitz), and S. Verdú, “Mutual Information and Minimum Mean-Square Error in Gaussian Channels,” IEEE Transactions on Information Theory, vol. 51, no. 4, pp. 1261–1282, 2005.
