Capacity and Normalized Optimal Detection Error in Gaussian Channels
Edwin Hammerich

TL;DR
This paper establishes a precise differential relationship between channel capacity and normalized optimal detection error (NODE) in Gaussian channels, extending it to continuous-time channels and comparing it with the I-MMSE relationship.
Contribution
It introduces a new differential connection between capacity and NODE for Gaussian channels and extends this to continuous-time channels using waterfilling, comparing it with existing I-MMSE results.
Findings
Derived a differential relationship between capacity and NODE in vector Gaussian channels.
Extended the C-NODE relationship to continuous-time Gaussian channels with asymptotic accuracy.
Compared the C-NODE relationship with the I-MMSE relationship, highlighting similarities and differences.
Abstract
For vector Gaussian channels, a precise differential connection between channel capacity and a quantity termed normalized optimal detection error (NODE) is presented. Then, this C-NODE relationship is extended to continuous-time Gaussian channels drawing on a waterfilling characterization recently found for the capacity of continuous-time linear time-varying channels. In the latter case, the C-NODE relationship becomes asymptotic in nature. In either case, the C-NODE relationship is compared with the I-MMSE relationship due to Guo et al. connecting mutual information in Gaussian channels with the minimum mean-square error (MMSE) of estimation theory.
Ministry of Defence, 95030 Hof, Germany
E-mail: [email protected]
I Introduction
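The central result of Guo et al. in [1] is an identity connecting mutual information in Gaussian channels with the MMSE of estimation theory. In the case of a vector Gaussian channel, this I–MMSE relationship reads
\[
\frac{d}{d\,\mathrm{snr}}\, I\bigl(X;\sqrt{\mathrm{snr}}\,HX+N\bigr) = \frac{1}{2}\,\mathrm{mmse}(\mathrm{snr}), \tag{1}
\]
where $\mathrm{snr}\ge 0$, $N$ is a noise vector with independent standard Gaussian components, independent of the random vector $X$, $\mathbb{E}\|X\|^2<\infty$, and $H$ is a deterministic matrix of appropriate dimension; $\mathrm{mmse}(\mathrm{snr})$ is the MMSE in estimating $HX$ given $\sqrt{\mathrm{snr}}\,HX+N$,
\[
\mathrm{mmse}(\mathrm{snr}) = \mathbb{E}\bigl\|HX-\mathbb{E}[HX\mid\sqrt{\mathrm{snr}}\,HX+N]\bigr\|^2. \tag{2}
\]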
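As a sanity check, the identity (1) can be verified numerically when the input is Gaussian, since both sides then have closed forms. The sketch below assumes a zero-mean Gaussian input with a positive definite covariance matrix `Sigma` (a name introduced here for illustration). It uses the standard log-determinant formula for the mutual information and the trace formula for the linear MMSE, and compares a central finite difference with half the MMSE. The matrix sizes, the random seed, and the step size are arbitrary choices.

```python
import numpy as np

def mutual_information(snr, H, Sigma):
    """I(X; sqrt(snr) H X + N) in nats for X ~ N(0, Sigma) and N ~ N(0, I)."""
    m = H.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(m) + snr * H @ Sigma @ H.T)
    return 0.5 * logdet

def mmse(snr, H, Sigma):
    """MMSE in estimating H X from sqrt(snr) H X + N for Gaussian X."""
    # Covariance of the estimation error of X (Gaussian case).
    err_cov = np.linalg.inv(np.linalg.inv(Sigma) + snr * H.T @ H)
    return np.trace(H @ err_cov @ H.T)

rng = np.random.default_rng(0)
H = rng.standard_normal((3, 4))           # hypothetical channel matrix
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + np.eye(4)               # hypothetical positive definite covariance

snr, h = 2.0, 1e-5
lhs = (mutual_information(snr + h, H, Sigma)
       - mutual_information(snr - h, H, Sigma)) / (2 * h)
rhs = 0.5 * mmse(snr, H, Sigma)
print(lhs, rhs)
```

For Gaussian inputs the two printed numbers agree up to the finite-difference error. For other inputs with finite second moment the identity of [1] still holds, but neither side has such a simple closed form.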
In [2], for a particular, effectively finite-dimensional vector Gaussian channel, an identity analogous to (1) has been derived. There, the probability distribution of the input vector depends on such that the mutual information occurring in (1) achieves capacity of the vector Gaussian channel; however, the right-hand side (RHS) of that identity differs from (half) the MMSE as given in (1). In [3], the same vector Gaussian channel as in [2] arose from a particular continuous-time Gaussian channel through discretization by optimal detection [4] of the channel output signals with the use of matched filters, following the approach in [5] for linear time-invariant channels; after a certain normalization, the aforementioned RHS has been recognized as (half) the NODE (to be defined later) of the channel output signals. In this way, a first instance of the C–NODE relationship has been encountered.
The goal of the present paper is threefold: 1) to extend this C–NODE relationship to more general vector Gaussian channels, 2) to extend it to continuous-time Gaussian channels in the form of the linear time-varying (LTV) channels considered in [6], and 3) to compare the C–NODE relationship with the I–MMSE relationship in either case.
Notation
We use natural logarithms and so the unit nat for all information measures. is the Gaussian distribution with mean 0 and variance . is the Schwartz space of rapidly decreasing functions on ; is the set of all non-negative real-valued functions in . denotes the positive part of , . For any two functions the notation means as , where is the standard Landau little-o symbol (cf. [6]).
II Vector Gaussian Channels
II-A Detection, Capacity, and Parameter
Consider the vector Gaussian channel
\[
Y = HX + N, \tag{3}
\]
where $H$ is a deterministic real matrix and the noise vector $N$ has independent random components $N_k \sim \mathcal{N}(0,\theta^2)$ with the noise variance $\theta^2 > 0$, $X$ is the random input vector, and $Y$ the corresponding output. If $x$ and $n$ are realizations of the random vectors $X$ and $N$, resp., then the realization $y$ of $Y$ is determined by the equation
\[
y = Hx + n.
\]
has the singular value decomposition (SVD) with orthogonal matrices and and a diagonal matrix , where are the eigenvalues of (counting multiplicity). Occasionally, we shall use the invertible matrix obtained from by replacing zeros on the diagonal with some . Writing , with the column vectors , it then holds for every (column) vector that
[TABLE]
where . Since only the coefficients carry information, the linear combination would be a suitable channel input vector. At the receiver, the perturbed vector , is passed through a bank of matched filters . The matched filter output signals are , where , and the detection errors are realizations of independent identically distributed Gaussian random variables . From the detected values we get the estimates of the coefficients for the input vector , where are realizations of independent Gaussian random variables (put ). Thus, we are led to the new vector Gaussian channel
[TABLE]
where the random components of the noise vector are distributed as described. The vector Gaussian channels (3) and (5) are equivalent in the sense that for any average input energy their capacity is the same. Indeed, since mutual information is invariant with respect to invertible linear transformations, we have
[TABLE]
where is an arbitrary vector with the property that , and has independent components (as ); consequently,
[TABLE]
The capacity of the vector Gaussian channel (5) is computed by waterfilling on the noise variances [5, Th. 7.5.1]. Let be the noise variance in the st subchannel of the channel (5). Precluding the trivial case , the “water level” is then uniquely determined by the condition
[TABLE]
where is the number of active subchannels. The capacity is achieved, if the input vector has independent components for and else; then
[TABLE]
where is the signal-to-noise ratio. Since always is the smallest feasible (assumed when ).
Remark 1
Since, in the case of , only the portion contributes to the signal, is, then, rather a signal plus noise-to-noise ratio; we stick to the notation “snr” to conform with [1].
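The waterfilling step described in this subsection can be sketched in a few lines. The code below is a minimal illustration, not the paper's procedure. It assumes that the subchannel noise variances are a common noise variance `theta2` divided by the squared singular values `lambdas**2`, and that the signal-to-noise ratio is the water level divided by `theta2`. All names and numerical values are hypothetical.

```python
import numpy as np

def waterfill(noise_vars, energy):
    """Return (water_level, K) for waterfilling on the given noise variances."""
    if energy <= 0:
        raise ValueError("energy must be positive")
    nu2 = np.sort(np.asarray(noise_vars, dtype=float))
    # The number of active subchannels is the largest K whose level
    # still lies above the K-th smallest noise variance.
    for K in range(len(nu2), 0, -1):
        level = (energy + nu2[:K].sum()) / K
        if level > nu2[K - 1]:
            return level, K
    raise ValueError("no active subchannel found")

def capacity(noise_vars, energy):
    """Capacity in nats: half the sum of ln(level / noise variance) over active subchannels."""
    level, K = waterfill(noise_vars, energy)
    nu2 = np.sort(np.asarray(noise_vars, dtype=float))[:K]
    return 0.5 * np.sum(np.log(level / nu2))

theta2 = 1.0                              # assumed common noise variance
lambdas = np.array([2.0, 1.0, 0.5])       # assumed singular values of the channel matrix
noise_vars = theta2 / lambdas**2          # 0.25, 1.0, 4.0
energy = 1.0                              # assumed average input energy

level, K = waterfill(noise_vars, energy)
snr = level / theta2                      # assumed definition of snr
print(level, K, snr, capacity(noise_vars, energy))
```

With these numbers, two of the three subchannels are active and the water level is 1.125; the third subchannel, with noise variance 4.0, stays above the water level and receives no energy.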
II-B C–NODE Relationship in Vector Gaussian Channels
Because of Eq. (6), for the channel (5) it holds that
[TABLE]
where , and has independent standard Gaussian components, being the noise vector in (3). Capacity is achieved, if has independent components for and else. Putting , we obtain
[TABLE]
where has independent standard Gaussian components, independent of . Capacity is achieved, if where is distributed as above. Since the RHS of Eq. (8) then only depends on , we may write (with slight abuse of notation)
[TABLE]
The RHS of Eq. (9) is reminiscent of the mutual information occurring in the I–MMSE relationship (1). It is therefore tempting to take the derivative of the RHS of Eq. (10) with respect to . Before doing so, observe that depends on since (put ); on the other hand, is piecewise constant. Excluding those ’s where makes a jump, we thus obtain
[TABLE]
Due to the application of matched filters for detection, optimal detection has been performed [4]. Therefore, is the (total) optimal detection error; division by may be regarded as a normalization. Likewise, after normalization (retaining the ), would be the (total) optimal detection error. Anyway, the following definition appears appropriate:
Definition 1
For any feasible , the NODE in the vector Gaussian channel (3) is given by the function
[TABLE]
where is called the primitive NODE. If is infeasible, we put formally .
Note that for any feasible , is identical to the primitive NODE ; so, no further normalization is needed when working with (or ) in the feasible case.
Theorem 1
For all but for at most exceptions, the capacity of the vector Gaussian channel (3) is differentiable and satisfies
\[
\frac{d}{d\,\mathrm{snr}}\, C(\mathrm{snr}) = \frac{1}{2}\,\mathrm{node}(\mathrm{snr}). \tag{11}
\]
Proof:
For growing , differentiability breaks down when a new subchannel is added. This occurs as soon as ( being the actual number of subchannels) exceeds , which happens at most times. The rest of the theorem has already been proved. ∎
We observe a striking similarity between the I–MMSE relationship (1) and the C–NODE relationship (11). Note that the part of the estimation error in (1) is taken by a detection error in (11).
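Theorem 1 lends itself to a quick numerical check. The sketch below rests on two assumptions that are this sketch's reading of the text rather than quotations from it: as a function of snr, the waterfilling capacity is half the sum of ln(snr·λ_k²) over the active subchannels (those with snr·λ_k² > 1), and the NODE equals the number of active subchannels divided by snr. The singular values `lambdas` are made up.

```python
import numpy as np

def capacity_of_snr(snr, lambdas):
    """Assumed capacity in nats: 0.5 * sum of ln(snr * lambda_k^2) over active subchannels."""
    gains = snr * np.asarray(lambdas, dtype=float) ** 2
    return 0.5 * np.sum(np.log(gains[gains > 1.0]))

def active_subchannels(snr, lambdas):
    """Number of subchannels with snr * lambda_k^2 > 1."""
    return int(np.sum(snr * np.asarray(lambdas, dtype=float) ** 2 > 1.0))

def node(snr, lambdas):
    """Assumed NODE: number of active subchannels divided by snr."""
    return active_subchannels(snr, lambdas) / snr

lambdas = [2.0, 1.0, 0.5]     # hypothetical singular values; jumps at snr = 0.25, 1, 4
h = 1e-6
for snr in (0.5, 2.0, 10.0):  # chosen away from the jump points
    derivative = (capacity_of_snr(snr + h, lambdas)
                  - capacity_of_snr(snr - h, lambdas)) / (2 * h)
    print(snr, derivative, 0.5 * node(snr, lambdas))
```

The test points are deliberately chosen between the jump points, in line with the exceptions stated in the theorem; at a jump point the one-sided derivatives differ.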
II-C Comparison of the NODE With the MMSE in Vector Gaussian Channels
To understand the difference between Eq. (1) and Eq. (11) in more detail, we calculate the MMSE. (Footnote 1: For simplicity, we assume that the diagonal matrix in the SVD of the channel matrix is invertible and that the number of active subchannels is equal to the total number of subchannels; both assumptions can be removed.) Following [1], given
[TABLE]
the MMSE in estimating is
[TABLE]
where is the minimum mean-square estimate of , and is the inverse of the covariance matrix of which is given by
[TABLE]
where is the covariance matrix of . If has independent Gaussian components and (as assumed), then . We now compute
[TABLE]
where is the identity matrix. So, we obtain
[TABLE]
or, equivalently,
[TABLE]
Remark 2
Similarly to [1, Sec. II-D.2], it can be shown that the expression in curly brackets in Eq. (14) is the trace of a Fisher information matrix.
Now, the strict inequality (15) prompts the following observation: The increase of capacity with growing as given by Eq. (11) is always larger than anticipated by the I–MMSE relationship (1). The resolution of this seeming contradiction is the implicit assumption in [1, Th. 2] that the probability distribution of the channel input vector does not depend on . Refer to [7] concerning possible extensions of the I–MMSE relationship to the -dependent case.
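The gap between the MMSE and the NODE can be made concrete with a small numerical experiment. The sketch below is an illustration under stated assumptions, not the paper's computation: all subchannels are active, the normalized capacity-achieving input is Gaussian with variances 1 − 1/(snr·λ_k²) along the right singular vectors, and the NODE equals the number of subchannels divided by snr. Under these assumptions the MMSE works out to the NODE minus tr(Λ⁻²)/snr², which the code compares against the matrix formula. The singular values, seed, and snr are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = np.array([2.0, 1.5, 1.0, 0.8])                 # hypothetical singular values
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))     # random orthogonal factors
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
H = U @ np.diag(lam) @ V.T                           # channel matrix with SVD U diag(lam) V^T
K = len(lam)

snr = 3.0                                            # exceeds 1/min(lam)^2, so all K subchannels are active
s = 1.0 - 1.0 / (snr * lam**2)                       # assumed variances of the normalized input
Sigma = V @ np.diag(s) @ V.T                         # input covariance in the original coordinates

# MMSE in estimating H X from sqrt(snr) H X + N for Gaussian X.
err_cov = np.linalg.inv(np.linalg.inv(Sigma) + snr * H.T @ H)
mmse = np.trace(H @ err_cov @ H.T)

node = K / snr                                       # assumed NODE
closed_form = node - np.sum(lam**-2.0) / snr**2
print(mmse, closed_form, node)
```

Because the correction term tr(Λ⁻²)/snr² is strictly positive, the MMSE computed this way is strictly smaller than the assumed NODE, which matches the direction of the strict inequality discussed above.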
III Continuous-Time Gaussian Channels
III-A Channel Model and Discretization
Consider as in [6] for any spreading factor held constant the LTV channel
[TABLE]
where is the LTV filter (operator) with the spread Weyl symbol ; the kernel of operator is assumed to be real-valued. The real-valued filter input signals are of finite energy and the noise signals at the filter output are realizations of white Gaussian noise with two-sided power spectral density (PSD) . This channel is depicted in Fig. 1. As in [6], it may be assumed that the operator has infinitely many eigenvalues (counting multiplicity) and that as . As shown in [6, Sec. III], optimal detection by means of matched filters then leads to the infinite-dimensional vector Gaussian channel
[TABLE]
where the noise is independent from subchannel to subchannel.
III-B C–NODE Relationship in Continuous-Time Gaussian Channels
From the waterfilling theorem [6, Th. 2] we know that under a quadratic growth condition imposed on the average input energy , the capacity of the LTV channel (16) is given with the use of the “cup” function
[TABLE]
by the equation
[TABLE]
where the “water level” is chosen so that
[TABLE]
Eq. (19) has been derived in [6] from the original waterfilling condition
[TABLE]
where are the noise variances and . In the present context, so that does not depend on (and the quadratic growth condition imposed on is automatically fulfilled); the number of active subchannels depends on and since
[TABLE]
again putting .
Theorem 2
For any fixed it holds that
[TABLE]
Proof:
With the use of the (modified) Heaviside function
[TABLE]
we can write
[TABLE]
For , replace with the continuous function
[TABLE]
and define . Putting
[TABLE]
we then obtain
[TABLE]
Since
[TABLE]
where and
[TABLE]
the Szegő theorem [6, Th. 1] applies and yields
[TABLE]
that is,
[TABLE]
where as . For it is readily seen that
[TABLE]
where as . Therefore, , where if becomes arbitrarily small and, then, ; a similar result is obtained for . In combination with Ineq. (21), this proves the theorem. ∎
The RHS of Eq. (18) reduces to the double integral
[TABLE]
We say that a function is non-flat, if for every constant the Lebesgue measure (or area) of the level curve is zero (no assumption is made about the area of the set ).
Lemma 1
Let be non-flat. Define for all the function
[TABLE]
Then for all it holds that
[TABLE]
Proof:
We consider only the region (see Fig. 2) in the first quadrant enclosed by the coordinate axes and the boundary line . Additionally, we assume that has the representation .
For the integral
[TABLE]
we get by differentiation that
[TABLE]
A generalization to the other quadrants and other boundary geometries should now be obvious. Addition of the separate double integrals yields Eq. (23) and completes the proof. ∎
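The kind of differentiation under the integral sign used in this proof can be illustrated on a toy integral. The sketch below does not reproduce the lemma's (elided) formulas. It assumes an integrand of the form ½[ln(s·f)]⁺ with a Gaussian profile f(x, y) = exp(−(x² + y²)) and a parameter s, all chosen here for illustration. Since the integrand vanishes on the moving boundary {s·f = 1}, the derivative with respect to s should reduce to the area of the region {s·f > 1} divided by 2s.

```python
import numpy as np

def f(x, y):
    """Hypothetical smooth, rapidly decreasing, non-negative profile."""
    return np.exp(-(x**2 + y**2))

x = np.linspace(-4.0, 4.0, 801)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, x)
F = f(X, Y)

def integral(s):
    """Riemann sum of 0.5 * max(ln(s * f), 0) over the grid."""
    return 0.5 * np.sum(np.maximum(np.log(s * F), 0.0)) * dx * dx

def active_area(s):
    """Riemann-sum area of the region where s * f > 1."""
    return np.sum(s * F > 1.0) * dx * dx

s, h = np.e**2, 1e-3
derivative = (integral(s + h) - integral(s - h)) / (2 * h)
print(derivative, active_area(s) / (2 * s))
```

For this profile the integral has the closed form π(ln s)²/4 for s > 1, so both printed values should be close to π/e² at s = e², up to discretization error.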
For ease of presentation, we assume from now on that the squared absolute value of the Weyl symbol of is non-flat.
Theorem 3
For any fixed , the approximate capacity (22) satisfies
[TABLE]
where the NODE is given by
[TABLE]
Proof:
By means of Lemma 1 and Theorem 2 we get in combination with Definition 1 [generalized to the -dependent, effectively finite-dimensional vector Gaussian channel (17)] that
[TABLE]
which proves both Eq. (24) and Eq. (25). ∎
III-C Comparison of the NODE With the MMSE in Continuous-Time Gaussian Channels
If and are held constant, then is finite so that Eq. (13) for the MMSE carries over to the infinite-dimensional setting (17) without changes and yields
[TABLE]
Recalling that , the Szegő theorem [6, Th. 1] may be applied to the last expression; so we continue
[TABLE]
In order to get rid of the error term involved in the dotted equations (25) and (26), we average with respect to and obtain
[TABLE]
Thus, for all it holds the strict inequality
[TABLE]
which is similar to Ineq. (15) for and in finite-dimensional vector Gaussian channels. In the case of it holds, of course, that
[TABLE]
Example 1
Consider the operator , , with the bivariate Gaussian function
[TABLE]
fixed, as the (spread) Weyl symbol. Here we have . Computation of the RHS of Eq. (25) yields for any held constant the equation
[TABLE]
In virtue of the C–NODE relationship (24) we obtain from the foregoing NODE by integration the capacity
[TABLE]
which indeed coincides with the capacity directly obtained from Eq. (18) (expressed as a function of and ).
Further, computation of the RHS of Eq. (26) gives
[TABLE]
Averaging with respect to as finally yields
[TABLE]
In Fig. 3, and are plotted against for . Observe the difference in size.
References
[1] D. Guo, S. Shamai (Shitz), and S. Verdú, “Mutual information and minimum mean-square error in Gaussian channels,” IEEE Trans. Inf. Theory, vol. 51, pp. 1261–1282, 2005.
[2] E. Hammerich, “On the heat channel and its capacity,” in Proc. IEEE Int. Symp. Inf. Theory, Seoul, South Korea, 2009, pp. 1809–1813.
[3] E. Hammerich, “On the capacity of the heat channel, waterfilling in the time-frequency plane, and a C-NODE relationship,” 2014 [Online]. Available: arXiv:1101.0287
[4] D. O. North, Analysis of the factors which determine signal/noise discrimination in pulsed-carrier systems. Rept. PTR-6C, RCA Labs., Princeton, NJ, USA, 1943.
[5] R. G. Gallager, Information Theory and Reliable Communication. New York, NY: Wiley, 1968.
[6] E. Hammerich, “Waterfilling theorems for linear time-varying channels and related nonstationary sources,” IEEE Trans. Inf. Theory, vol. 62, pp. 6904–6916, 2016.
[7] R. Bustin, M. Payaro, D. P. Palomar, and S. Shamai (Shitz), “On MMSE crossing properties and implications in parallel vector Gaussian channels,” IEEE Trans. Inf. Theory, vol. 59, pp. 818–844, 2013.
[8] E. Hammerich, “Capacity and normalized optimal detection error in Gaussian channels,” 2017 [Online]. Available: arXiv:1701.05523
