The multidimensional truncated Moment Problem: Shape and Gaussian Mixture Reconstruction from Derivatives of Moments
Philipp J. di Dio

TL;DR
This paper develops a theory for representing moment functionals using Gaussian mixtures and polytopes, establishing bounds on the number of Gaussians needed for such representations.
Contribution
It introduces derivatives of moments to analyze Gaussian mixture representations and determines minimal Gaussian counts for representing certain moment functionals.
Findings
Identifies exact Gaussian counts needed for specific moment functionals.
Establishes bounds on the number of Gaussians required for representation.
Provides theoretical limits for Gaussian mixture reconstruction from moments.
Abstract
In this paper we introduce the theory of derivatives of moments and (moment) functionals to represent moment functionals by Gaussian mixtures, characteristic functions of polytopes, and simple functions of polytopes. We study, among other measures, Gaussian mixtures, their reconstruction from moments and especially the number of Gaussians needed to represent moment functionals. We find that there are moment functionals which can be represented by a sum of Gaussians but not less. Hence, for any and we find an such that can be represented by a sum of Gaussians but not less. An upper bound is .
Technische Universität Berlin, Institut für Mathematik, Straße des 17. Juni 136, D-10623 Berlin, Germany
AMS Subject Classification (2010). 44A60, 14P99, 30E05, 65D32, 35R30.
Key words: truncated moment problem, Carathéodory number, measure reconstruction, Gaussian mixture, generalized eigenvalues, shape reconstruction, algebraic statistics, integral representation
1. Introduction
Reconstructing measures from moments is a key problem in statistics [Pea94, TSM85, MMR05, dD19], shape reconstruction [Bal61, MN68, MR80, LR82, MVKW95, GMV99, BGL07, GLPR12, GNPR14, GPSS18, KSS18], pattern recognition [Hu62, DBN92, Che93, SMD+07, APST19], financial mathematics [Ana06, Sto16], and many other fields, and it attracts increasing attention, especially with the growing use of computer programs and algorithms to handle such problems. Despite its growing importance and wide range of applications, theoretical knowledge on the problem of reconstructing measures from moments remains limited, especially when only finitely many moments are known. For instance, only recently [dD19] was the question of which truncated moment sequences are represented by Gaussian, log-normal, and more general mixtures fully answered, and the first non-trivial bounds on the required number of summands were given.
While derivatives in the context of moments have been used before, surprisingly no unified approach has been introduced so far. In the present paper we present the first unified and systematic approach to reconstruct and investigate measures from moments: derivatives of moments. In Section 4 we define and investigate derivatives of moments and show that the derivative of a (moment) functional is represented by the distributional derivative of a representing measure of . From this treatment it is clear that is an object that is interesting to investigate in its own right and not only because it solves problems and appears (implicitly or explicitly) in proofs and calculations.
In Section 5 we use the concept of derivatives of moments to reprove several known results on reconstructing polytopes and special measures in a unified and efficient way. Proofs formerly presented over several pages now reduce to a few lines and their key arguments become much more apparent. We use these simplified arguments and proofs to extend these results, e.g., we extend the results from polynomial moments
[TABLE]
with to non-polynomial moments:
[TABLE]
where is a measurable (differentiable) function. This allows us to formulate results in full generality and can still be easily calculated from .
In Section 6 we return to the reconstruction and investigation of (Gaussian) mixtures. Based on derivatives of moments we fully characterize moment sequences from one (-dimensional) Gaussian distribution and we determine and from the moments. While this was known before, our simplified arguments and proofs using derivatives of moments enable us to extend this to mixtures, i.e., linear combinations of e.g. Gaussian distributions:
[TABLE]
with (), and for all . In the one-dimensional case () we give an explicit way to determine the parameters in (1). Simple formulas are obtained under the restriction that are all equal: . But before we allow the possible relaxation to arbitrary we examine the number of mixture components required to represent a moment sequence , i.e., its minimal number, the (mixture) Carathéodory number . Based on very recent results on the Carathéodory number (number of Dirac delta measures, i.e., point evaluations) in [RS18, dDS18a, dDS18b] and especially [dDK19] we derive new lower bounds and asymptotic limits for the case of mixtures as well. We show that a non-zero (polynomial) function with finitely many zeros gives a moment sequence , resp. moment functional , which needs as many components in a mixture representation as there are linearly independent point evaluations located at , see 6.22. As a consequence (6.26) we find that there are moment functionals , which can be represented by a sum of
[TABLE]
Gaussian distributions but not less. This disproves the belief that allowing arbitrary with reduces the number of components. Finally, (2) shows that for each and there is an and a moment functional which can be represented by a sum of
[TABLE]
Gaussian distributions but not less.
2. Preliminaries
Let be a (finite dimensional) real vector space of measurable functions on a measurable space . Denote by a continuous linear functional. If there is a (positive) measure on such that
[TABLE]
then is called a moment functional. If is finite dimensional, it is a truncated moment functional. By we denote a basis of the -dimensional real vector space and by
[TABLE]
the -th (or simply -th) moment of (or for a as in (3)). Given a sequence we define the Riesz functional by setting for all and extending it linearly to , i.e., the Riesz functional induces a bijection between moment sequences and moment functionals . By we denote the set of all measures on such that all are integrable and by or we denote all representing measures of the moment sequence resp. moment functional . Since the polynomials are of special importance, we denote by
[TABLE]
the monomial basis, where we have with . On we work with the partial order if for all .
Definition 2.1.
Let be a basis of the finite dimensional vector space of measurable functions on the measurable space . We define by
[TABLE]
Of course, is the moment sequence of the Dirac measure and the corresponding moment functional is the point evaluation with . By a measure we always mean a positive measure unless it is explicitly denoted as a signed measure.
The fundamental theorem in the theory of truncated moments is the following.
Theorem 2.2 (Richter Theorem [Ric57]).
Let , , be finitely many measurable functions on a measurable space . Then every moment sequence resp. moment functional has a -atomic representing measure
[TABLE]
with , , and .
The theorem can also be called Richter–Rogosinski–Rosenbloom Theorem [Ric57, Rog58, Ros52], see the discussion after Example 20 in [dDS18a] for more details. That every truncated moment sequence has a -atomic representing measure ensures that the Carathéodory number is well-defined.
Definition 2.3.
Let be linearly independent measurable functions on a measurable space . For we define the Carathéodory number of by
[TABLE]
We define the Carathéodory number of by
[TABLE]
The same definition holds for moment functionals .
The following theorem turns out to be a convenient tool for proving lower bounds on the Carathéodory number .
Theorem 2.4 ([dDS18b, Thm. 18]).
Let be measurable functions on a measurable space , , and with on , and . Then
[TABLE]
Remark 2.5.
Note that in 2.4 it is crucial that the zero set of is finite: Take and for a simple example where the statement fails when the zero set is not finite.
It is well-known that in general not every sequence or linear functional has a positive representing measure. But of course it always has a signed -atomic representing measure with .
Lemma 2.6 ([dDS18a, Prop. 12]).
Let be a basis of the finite dimensional space of measurable functions on a measurable space . There exist points such that every vector has a signed -atomic representing measure with and all atoms are from , i.e., every functional is the linear combination , .
It is well-known that in dimension the atom positions of a moment sequence can be calculated from the generalized eigenvalue problem, see e.g. [GMV99]. To formulate this and other results we introduce the following shift.
Definition 2.7.
Let and . For with we define by , i.e., .
For a space of measurable functions with basis the Hankel matrix of a linear functional is given by . The atom positions of a truncated moment sequence (resp. moment functional ) are then determined from results in Section 3.
We use the following notation.
Definition 2.8.
Let be multi-indexed sequences and . We define the matrix
[TABLE]
For (Gaussian) mixtures we use the following general setting as in [dD19]:
Definition 2.9.
Let be some fixed set of parameters (in a larger metric space). For all and we let denote probability measures on the measurable space such that:
- i) All are -measurable for all , i.e.,
[TABLE]
- ii) There exists a (unique) (closure of ) such that
[TABLE]
for all and .
For and , and a (-)mixture is then
[TABLE]
where is its -th component. We have unless we explicitly speak of signed mixtures ().
Examples of this general setting are Gaussian and log-normal measures, see [dD19]. There we already treated the Carathéodory number of mixtures and answered which moment sequences can be represented by mixtures.
Definition 2.10.
If has a mixture representation, then we define its (mixture) Carathéodory number by
[TABLE]
We call the mixture cone, i.e., the set of all moment sequences which have a (finite) mixture representation. The (mixture) Carathéodory number is then defined by
[TABLE]
Of course, since we always have , is well-defined. In [dD19] we gave upper bounds on .
Theorem 2.11 ([dD19, Thm. 17(ii)]).
Let be a finite-dimensional space of continuous functions and probability measures as in 2.9. Then
[TABLE]
More on the (truncated) moment problem can be found e.g. in [Sti94, ST43, Akh65, KN77, Kem68, Kem87, Lan80, Mar08, Lau09, FN10, Las15, Sch17] and references therein.
3. Reconstruction of atomic Measures
For one-dimensional moment sequences the atom positions of an atomic representing measure can be determined by the following two results.
Lemma 3.1.
Let , , and with
[TABLE]
for some , , and . Then the are unique and are the eigenvalues of the generalized eigenvalue problem
[TABLE]
Proof.
That the are the eigenvalues of (4), and hence unique, follows from
[TABLE]
and
[TABLE]
Lemma 3.2.
Let be a sequence with and is singular (with kernel dimension one). Let with . Then has a -atomic representing measure with
[TABLE]
Proof.
, i.e., . Equality holds since we work in the one-dimensional framework: . ∎
Compare the preceding results with Vieta’s Formulas (6.6).
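The generalized eigenvalue approach of 3.1 can be illustrated numerically. The following is a minimal sketch, assuming that (4) is the usual Hankel pencil built from the moment sequence and its shift; the function name `atoms_from_moments` and the test data are hypothetical and chosen only for illustration.

```python
import numpy as np

def atoms_from_moments(s, k):
    """Recover k atom positions from the moments s_0, ..., s_{2k-1}
    of a (signed) k-atomic measure on the real line."""
    s = np.asarray(s, dtype=float)
    # Hankel matrix of the sequence and of the shifted sequence.
    H0 = np.array([[s[i + j] for j in range(k)] for i in range(k)])
    H1 = np.array([[s[i + j + 1] for j in range(k)] for i in range(k)])
    # Generalized eigenvalue problem H1 v = x H0 v, solved as H0^{-1} H1 v = x v.
    return np.sort(np.linalg.eigvals(np.linalg.solve(H0, H1)).real)

# Illustrative data: three atoms with non-zero weights.
x_true = np.array([-1.0, 0.5, 2.0])
c_true = np.array([1.0, 2.0, 0.5])
s = [np.sum(c_true * x_true**j) for j in range(6)]

x_rec = atoms_from_moments(s, 3)
print(x_rec)
```

The sketch inverts the Hankel matrix directly, which is only reasonable for small, well-conditioned examples; a dedicated generalized eigenvalue solver would be the more robust choice in practice.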
4. Derivatives of Moments and Measures
The following simple and well-known example from the theory of distributions is our motivation in this section. As in the theory of distributions we denote by the set of all test functions and by the set of all distributions (continuous linear functionals on ). Most of our applications and examples will work on , .
Example 4.1.
Let on be given by , where is the characteristic function of the set , , and is the Lebesgue measure on . For we have
[TABLE]
where we understand in the distributional sense [Gru09] and as defined above.
Derivatives of Moments
Distribution theory motivates the following definition.
Definition 4.2.
Let be a (finite dimensional) vector space of measurable functions, be a linear functional, and . If for some we define the -th derivative of by
[TABLE]
Let , , be a basis of . If , then we define the -th derivative of the sequence by
[TABLE]
or equivalently is defined by
[TABLE]
for all with finite or infinite dimensional.
Since we can calculate directly from .
Lemma 4.3.
If , , then .
Proof.
. ∎
This provides us with explicit ways to calculate as the next examples show.
Example 4.4.
- a) Let on , , and . We have
[TABLE]
see also (12) in 4.17 for .
- b) Let on with and . Then .
- c) Let on (or ) for a and . Then
[TABLE]
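For the monomial basis in one variable, the derivative of a moment sequence can be checked on the interval measure of 4.1. The sketch below assumes the sign convention of distribution theory, i.e., the derivative of a functional applied to a function equals minus the functional applied to the derivative of that function, so that the j-th entry of the derived sequence is −j·s_{j−1}. The helper name `derivative_of_moments` is hypothetical.

```python
import numpy as np

def derivative_of_moments(s):
    """First derivative of a one-variable monomial moment sequence,
    assuming (ds)_j = -j * s_{j-1} and (ds)_0 = 0."""
    s = np.asarray(s, dtype=float)
    ds = np.zeros_like(s)
    j = np.arange(1, len(s))
    ds[1:] = -j * s[:-1]
    return ds

a, b = -1.0, 2.0
j = np.arange(6)
s = (b**(j + 1) - a**(j + 1)) / (j + 1)   # moments of the Lebesgue measure on [a, b]
ds = derivative_of_moments(s)
expected = a**j - b**j                    # moments of the signed measure delta_a - delta_b
print(ds, expected)
```

Both arrays agree, which matches the statement that the derived sequence is represented by the distributional derivative of the characteristic function of the interval.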
Note that and in 2.7 “almost” commute.
Lemma 4.5.
For , we have
[TABLE]
Remark 4.6.
When resp. is a moment sequence/functional, then resp. (or resp. ) is in general not a moment sequence. Let be the moment sequence of with , then , i.e., but .
Lemma 4.7.
Let be a vector space of measurable functions on the measurable space , , , , , and a linear functional. The following are equivalent:
- i) is a moment functional.
- ii) .
Proof.
While (ii) ⇒ (i) is clear, for (i) ⇒ (ii) let be a representing measure of . Then
[TABLE]
i.e., and therefore . ∎
Remark 4.8.
Let be a vector space of measurable functions on the measurable space , , such that . The following are equivalent:
- i) For every linear functional there exists a with .
- ii) is injective.
Indeed, is injective, if and only if the induced endomorphism
[TABLE]
of the dual space is surjective. In 4.4 (a) () is not injective, in (c) is injective, and in (b) is injective if and only if for all .
Derivatives of Measures
In 4.1 we have seen that for the specific measure with the derivative is , of course in the distributional sense:
[TABLE]
Here we make use of the notation for from the theory of distributions that comes in very handy. Note that we can even choose since is compact and therefore compactness of can be omitted. For the rest of this section we want to define for measures , especially , if it exists.
Definition 4.9.
Let be a (finite dimensional) vector space of measurable functions, a (signed) measure and . Assume that and there exists a such that
[TABLE]
If is a (signed) measure such that all are -integrable, then we say the -th derivative of exists on and is defined by
[TABLE]
The following statement, which connects 4.2 with 4.9, is the crucial observation of this section. It enables us to apply results from the theory of distributions to derivatives of moment functionals.
Theorem 4.10.
Let be a (finite or infinite dimensional) vector space of measurable functions on the measurable space , be a moment functional with representing measure , and such that . If exists on , then is a (signed) representing measure of , i.e.,
[TABLE]
Proof.
Since exists for all we have
[TABLE]
Remark 4.11.
4.10 says that we can compute the derivative of a moment functional on by taking the derivative of a representing measure (if its derivative exists on ) and vice versa. In particular, the result does not depend on the choice of the representing measure.
Example 4.12.
Let , , , and , then is given by
[TABLE]
Hence, is an example of a measure whose derivative is no longer a measure.
Besides the Dirac measures also measures of the form are very important, where is the -dimensional Lebesgue measure and is a measurable function.
Definition 4.13 ([Gru09, Eq. (3.2)]).
Let and the -dimensional Lebesgue measure on . We define the distribution by
[TABLE]
Theorem 4.14 ([Gru09, Eqs. (3.15) and (3.21)]).
Let . Then
[TABLE]
If exists on , then by 4.10 we have
[TABLE]
The following example will be most important in the reconstruction of polytopes and simple functions from their moments, see Section 5.
Example 4.15.
Let be a continuous and piece-wise linear function with compact support. Let be the points where is not differentiable. Then and where for and are the slopes of . In particular, is a signed -atomic measure.
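The statement of 4.15 can be verified on a small example. The following sketch assumes the monomial basis and the distributional sign convention, so that the second derivative of the moment sequence has entries j(j−1)·s_{j−2}; the knots and values of the piecewise linear function are hypothetical test data.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

# Continuous piecewise linear function with compact support, given by its knots.
knots = np.array([0.0, 1.0, 3.0, 4.0])
values = np.array([0.0, 2.0, 1.0, 0.0])   # zero at both ends of the support

def moment(p):
    """Exact p-th moment of f(x) dx, integrated segment by segment."""
    total = 0.0
    for x0, x1, y0, y1 in zip(knots[:-1], knots[1:], values[:-1], values[1:]):
        m = (y1 - y0) / (x1 - x0)                 # slope on [x0, x1]
        F = (P([y0 - m * x0, m]) * P.basis(p)).integ()
        total += F(x1) - F(x0)
    return total

s = np.array([moment(p) for p in range(4)])

# Second derivative of the moment sequence: j (j - 1) s_{j-2}.
d2s = np.array([j * (j - 1) * s[j - 2] if j >= 2 else 0.0 for j in range(6)])

# Signed atomic measure: slope differences located at the knots.
slopes = np.diff(values) / np.diff(knots)
jumps = np.diff(np.concatenate(([0.0], slopes, [0.0])))
atomic = np.array([np.sum(jumps * knots**j) for j in range(6)])
print(d2s, atomic)
```

The two sequences coincide, so the second derivative of the moments is represented by the signed atomic measure whose weights are the slope differences at the knots.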
Example 4.16.
Let be points, and . We define the -dimensional hyperrectangle of by
[TABLE]
The vertices of are for all . Since is compact all moments
[TABLE]
for exist. Here we abbreviated the characteristic function of as . Set . From the Definitions 4.2 and 4.9 as well as 4.14 we find that
[TABLE]
has the signed representing measure
[TABLE]
supported only at the vertices of where .
Gaussian distributions will be considered in Section 6.
Example 4.17.
For with all -th moments
[TABLE]
exist where , , and . For we find from the Definitions 4.2 and 4.9 as well as 4.14 that
[TABLE]
has a signed representing measure given by
[TABLE]
for suitable polynomials . For we have where is the -th Hermite polynomial:
[TABLE]
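The connection between derivatives of Gaussian moments and Hermite polynomials can be checked numerically. This is a sketch under stated assumptions: it uses the standard normal density, the probabilists' Hermite polynomials He_k with φ^{(k)} = (−1)^k He_k φ, and the distributional sign convention for derivatives of moments; the normalization in the text may differ. All function names are hypothetical.

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermegauss, hermeval

# Gauss-Hermite nodes and weights for the weight exp(-x^2 / 2),
# rescaled so that sum(w * f(x)) approximates E[f(Z)] for Z ~ N(0, 1).
x, w = hermegauss(30)
w = w / np.sqrt(2.0 * np.pi)

def gaussian_moment(j):
    return np.sum(w * x**j)

def derivative_moment_direct(j, k):
    """Integral of x^j against the k-th derivative of the density,
    written as (-1)^k He_k(x) times the density."""
    he_k = hermeval(x, [0] * k + [1])
    return (-1) ** k * np.sum(w * x**j * he_k)

def derivative_moment_from_moments(j, k):
    """k-th derivative of the moment sequence: (-1)^k j!/(j-k)! s_{j-k}."""
    if j < k:
        return 0.0
    return (-1) ** k * factorial(j) / factorial(j - k) * gaussian_moment(j - k)

for k in range(4):
    for j in range(6):
        print(k, j, derivative_moment_direct(j, k), derivative_moment_from_moments(j, k))
```

The quadrature is exact for the polynomial degrees used here, so both columns agree up to rounding.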
5. Applications
Polytope Reconstruction
The problem of reconstructing a (convex and full-dimensional) polytope , i.e., finding all vertices, is an extensively studied question and several algorithms have been proposed, see e.g. [Bal61, MN68, MR80, LR82, MVKW95, GMV99, BGL07, GLPR12, GNPR14, GPSS18, KSS18], and references therein.
Based on derivatives of moments we will present a simple proof of one version of these algorithms which calculates the vertices from finitely many moments
[TABLE]
We use the Brion–Lawrence–Khovanskii–Pukhlikov–Barvinok (BBaKLP) formulas [Bri88, Law91, Bar91, PK92, Bar92] and the generalized eigenvalue problem (as in 3.1). The aim is to convince the reader that derivatives of moments are a convenient tool for proving and extending the statement in a concise and conceptual way.
Let us state the BBaKLP formulas. This presentation is taken from [GLPR12]. Let be a polytope in with vertices (), then
[TABLE]
see [GLPR12, Eq. (3)], and for we have
[TABLE]
see [GLPR12, Eq. (4)], where is a rational function on , i.e., can be chosen in general position such that has no zero or pole at . The is the -th directional moment with direction .
Definition 5.1.
Let , be a polytope with vertices , a vector (of length 1), , and be an affine hyperplane with normal vector . We define the area function to be the -dimensional volume of
[TABLE]
where is the -dimensional Lebesgue measure on .
Of course, the area function is integration by parts
[TABLE]
The area function is a continuous piecewise polynomial function of degree if is not a normal vector of any facet of . 4.15 motivates the following lemma which is the only step where we need the BBaKLP formulas.
Lemma 5.2.
Let be a vector of unit length such that is non-zero and well-defined, i.e., its numerator and denominator are non-zero. Then
[TABLE]
Proof.
Set . From (13) for we have
[TABLE]
and from (14) with we have
[TABLE]
Here and hold since is compact. Thus the claim follows since the set of polynomial functions on a compact set is dense in . ∎
In the previous proof the BBaKLP formulas were used for all monomials () and the Weierstraß Theorem gives the assertion. But the proof of the lemma can be weakened to the Müntz–Szász Theorem [Mün14, Szá16], i.e., only monomials with (and ) are necessary. Additionally, the BBaKLP formulas hold only for polynomials but the previous lemma applies to all -functions. So we have the following.
Theorem 5.3.
Let be a (finite-dimensional) vector space of measurable functions on with basis such that , i.e., for all . Let be a polytope with vertices , , be such that it is neither a pole nor a zero of any , and consider the directional moments
[TABLE]
Then has an at most -atomic signed representing measure
[TABLE]
supported only at the projections of the vertices .
Proof.
Since has the representing measure , the has the at most -atomic representing (signed) measure by 4.10 and 5.2. ∎
What remains is to extract the positions from . If consists of polynomials, the generalized eigenvalue problem in 3.1 can be applied. From this we easily get the following corollary, cf. e.g. [GLPR12, Main Theorem]. Note that we propose to replace Prony’s Method/Vandermonde factorization of finite Hankel matrices by the (numerically more stable) generalized eigenvalue problem (as in 3.1), see [GMV99, p. 1225]. For simplicity we assume uniform distribution on . Polynomial distributions on semi-algebraic sets are treated below.
Corollary 5.4.
Let be a polytope with vertices , and let be such that it is neither a pole nor a zero of any , and for let be the directional moments
[TABLE]
Then the projections are the eigenvalues of the generalized eigenvalue problem
[TABLE]
Proof.
As in 5.3 has the representing measure and has the at most -atomic representing (signed) measure by 4.10 and 5.2. By 3.1 the positions are the eigenvalues of the generalized eigenvalue problem (16). ∎
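A small planar example may illustrate 5.4. The sketch below makes several assumptions that are not taken from the text: the polytope is a single triangle, its directional moments are computed with the standard closed-form expression for powers of a linear form over a simplex, the derivative of order n = 2 of the moment sequence is taken as j(j−1)·s_{j−2}, and (16) is read as the Hankel pencil of that derived sequence. The vertices, the direction, and all names are hypothetical.

```python
import numpy as np
from math import factorial

def directional_moments_triangle(V, r, M):
    """Directional moments s_0..s_M of the uniform measure on the triangle
    with vertex rows V, via the closed form for integrals of powers of a
    linear form over a simplex (used here as an external identity)."""
    p = V @ r                                    # projections of the vertices
    area = 0.5 * abs(np.linalg.det(np.column_stack((V[1] - V[0], V[2] - V[0]))))
    s = []
    for m in range(M + 1):
        h = sum(p[0]**a * p[1]**b * p[2]**(m - a - b)
                for a in range(m + 1) for b in range(m + 1 - a))
        s.append(2.0 * area * factorial(m) / factorial(m + 2) * h)
    return np.array(s), p

V = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
r = np.array([0.6, 0.8])                         # unit direction with distinct projections
s, p = directional_moments_triangle(V, r, 3)

# Second derivative of the directional moment sequence (dimension n = 2).
u = np.array([j * (j - 1) * s[j - 2] if j >= 2 else 0.0 for j in range(6)])

# Hankel pencil of the derived sequence; its eigenvalues are the projections.
k = 3
H0 = np.array([[u[i + j] for j in range(k)] for i in range(k)])
H1 = np.array([[u[i + j + 1] for j in range(k)] for i in range(k)])
rec = np.linalg.eigvals(np.linalg.solve(H0, H1)).real
print(np.sort(rec), np.sort(p))
```

Only four directional moments are needed here because the derived sequence vanishes in its first two entries; a polytope with more vertices would require a triangulation or another way of computing its moments.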
Remark 5.5.
Besides the simple proof, the method of derivatives of moments has another advantage. Since 5.2 holds in the distributional sense, 5.3 holds for more general functions , especially non-polynomial directional moments like in 4.4(b) or (c). However, the generalized eigenvalue problem must then be replaced by a suitable method to determine the atoms from .
Remark 5.6.
In [GLPR12, Eq. (5)] a “scaled vector of moments” is defined in a similar way as in 4.4(a). However, the strength of 4.10, in particular in combination with 4.14, has not been used.
Remark 5.7.
With different directions the vertices can be reconstructed using the previous theorem and moments are required. If is unknown, the previous theorem also determines if sufficiently many directional moments are given.
Now we extend 5.1 to functions :
[TABLE]
i.e., integration by part over .
By linearity of integration and differentiation 5.4 also detects the vertices , , of full-dimensional polytopes , , from the moments
[TABLE]
of the simple function
[TABLE]
if the or are in general position. We say that a set of polytopes is in general position iff for all . Furthermore, we say that are in general position iff
[TABLE]
has non-zero mass for in general position, i.e., coefficients in (20) do not cancel out for vertices with the same projection .
Theorem 5.8.
Let , , be full-dimensional polytopes with vertices , . Let the vertices or be in general position. Let . Then for a direction in general position the projections are the eigenvalues of the generalized eigenvalue problem
[TABLE]
where are the directional moments (18) of (19).
Proof.
By linearity of and 5.2 we have that
[TABLE]
is a (signed) representing measure of (4.10). Then for all since the or are in general position. Hence the projections are the eigenvalues of (21) by 3.1. ∎
Reconstruction of Simple Functions from Moments
We want to adapt 5.8 to simple functions
[TABLE]
of hyperrectangles , see 4.16. Similar to polytopes we say that the hyperrectangles are in general position if no two facets of the ’s lie in a common hyperplane. The ’s are called in general position if is an at most -atomic signed measure supported exactly at (, ) and is an at most -atomic signed measure supported exactly at all (, , ). We have the following.
Theorem 5.9.
Let and
[TABLE]
the simple function of hyperrectangles with or in general position. Consider the moments
[TABLE]
Then for each we have
[TABLE]
i.e., the vertices of the hyperrectangles are contained in the grid
[TABLE]
where the are the eigenvalues of the generalized eigenvalue problem
[TABLE]
Proof.
are the moments of the area function which has by assumption jumps exactly at the ’s, , . Hence is represented by a signed atomic measure supported exactly at the ’s by 4.10 and the positions are obtained from the generalized eigenvalue problem (3.1). ∎
Remark 5.10.
For the grid (23) we can then choose an in general position such that between grid points and their projection is a bijection. Since the ’s or ’s are in general position we can extract these projections from 5.8 and uniquely recover the vertices of all ’s. The ’s can then easily (successively) be calculated from evaluation polynomials and .
Compared to 5.4 and 5.8 we no longer have the disadvantage that we need to choose random directions . We can choose the directions and only in 5.10 needs to be in general position but can be chosen based on the grid (23) from the ’s. We need to solve generalized eigenvalue problems (24) of size at most . The choice of is essential so that we cover vertices of the same by at once and hence get small generalized eigenvalue problems. Only when we cut the vertices of out of the grid (23) we need to go to much higher degrees and have to solve one much larger generalized eigenvalue problem based on 5.8. But better options for cutting the vertices out of (23) might be possible.
Reconstruction of Measures on Semi-Algebraic Sets
So far we have avoided dealing with non-constant densities on bounded sets. Inspired by the work of F. Bréhard, M. Joldes, and J.-B. Lasserre [BJL19] we want to demonstrate how our approach can be applied in this case. This and the previous works [LPHT08], [HK14], [MWHL18] from (optimal) control applications of the moment-SOS-hierarchy were pointed out to us by the authors of [BJL19].
Let be a semi-algebraic set and such that . For and we have the Leibniz formula [Gru09, Lem. 3.7] and if then acts on test functions as (weighted) -dimensional Lebesgue measure supported on , i.e., for all [Gru09, p. 33]. For we set , where the are the shifts from 2.7. Remember the matrix notation from 2.8.
Theorem 5.11 ([BJL19, Thm. 1]).
Let be a semi-algebraic set, let with and , with , and the moments of ,
[TABLE]
for all with for some . The following are equivalent:
- i) .
- ii) For each let with denote an enumeration of with and . The kernel of
[TABLE]
is spanned by for every .
 is determined by normalization. If on then is sufficient.
Proof.
Note that is represented by , is represented by and since for all we finally have that is represented by .
(ii) ⇒ (i): So by the previous note , i.e., , is in the kernel and all ’s with are determined.
(i) ⇒ (ii): Again , i.e., , is in the kernel of (25). It is sufficient to show that is full dimensional to show that the kernel is one-dimensional. Assume the columns of are linearly dependent, then by the linearity of the shift also the columns of
[TABLE]
are linearly dependent. But () is the Hankel matrix of , a moment sequence with representing measure , i.e., has full rank. This proves that the kernel of (25) is one-dimensional.
If on , squaring in “(ii) ⇒ (i)” is not necessary and linear independence already holds for . ∎
The bound , resp. , comes from the maximal , i.e., , needed to construct (25). If is unknown, then the previous theorem also recovers if is large enough. For the kernel of (25) is one-dimensional, i.e., determines as . For (resp. ) (25) is full rank.
In [BJL19] also the problem of finding from for an unknown is addressed, but then all moments are necessary.
6. Gaussian Mixtures
One component
For a Gaussian distribution on we have
[TABLE]
So integration over gives
[TABLE]
see also [AFS16, Eq. (5)]. This implies the following result.
Lemma 6.1 ([AFS16, Prop. 1]).
Let , , be a natural number and be a real sequence with . The following are equivalent:
- i) is the moment sequence of the Gaussian distribution with , , , i.e., .
- ii) There are with such that the matrix
[TABLE]
has rank two with kernel .
In this case, one has , and .
Proof.
While (i) ⇒ (ii) is clear, we show (ii) ⇒ (i) by induction on . Since for and , we have by (ii), (25), (26) and the induction hypothesis that
[TABLE]
i.e., is the -th moment of . ∎
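The kernel condition of 6.1 can be turned into a short numerical recipe for the mean and the variance. The sketch below assumes the standard Gaussian moment recurrence s_{k+1} = μ·s_k + k·σ²·s_{k−1} and arranges it as a matrix with rows (k·s_{k−1}, s_k, −s_{k+1}); this row layout is an assumption and may differ from the matrix in the lemma. The function names and test parameters are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_moments(mu, sigma, d):
    """Moments s_0..s_d of N(mu, sigma^2), computed by Gauss-Hermite quadrature."""
    x, w = hermegauss(20)
    w = w / np.sqrt(2.0 * np.pi)
    return np.array([np.sum(w * (mu + sigma * x)**k) for k in range(d + 1)])

def gaussian_parameters(s):
    """Recover (mu, sigma^2) from the kernel of the recurrence matrix."""
    d = len(s) - 1
    rows = [[k * s[k - 1] if k >= 1 else 0.0, s[k], -s[k + 1]] for k in range(d)]
    _, _, Vt = np.linalg.svd(np.array(rows))
    v = Vt[-1] / Vt[-1][2]        # kernel vector normalized to (sigma^2, mu, 1)
    return v[1], v[0]

s = gaussian_moments(1.5, 0.7, 4)
mu_rec, var_rec = gaussian_parameters(s)
print(mu_rec, var_rec)
```

Using the singular value decomposition instead of an exact kernel computation makes the sketch tolerant of small rounding errors in the moments.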
On we have the following.
Theorem 6.2.
Let , be a symmetric and positive definite matrix, , , , and with . Set
[TABLE]
For a multi-indexed real sequence the following are equivalent:
- i) is the moment sequence of , i.e., for all with .
- ii) For the matrix has the -dimensional kernel
[TABLE]
Proof.
For we have
[TABLE]
(i) ⇒ (ii): From () we find that (27) is contained in the kernel of the matrix . It suffices to show that the kernel of the matrix is at most one-dimensional. Consider
[TABLE]
the Hankel matrix of . Let . Then implies , i.e., has full rank . Therefore has rank at least since it has as submatrix. Its kernel can thus be at most one-dimensional.
(ii) ⇒ (i): Let be an orthogonal matrix such that , . The coordinate change on given by induces a linear transformation on the space of moment sequences. Let be the moment sequence obtained from via this transformation. A straightforward calculation shows that
[TABLE]
where . This means that we are in the 1-dimensional setting
[TABLE]
where the -dimensional assertion holds by 6.1. Hence, is represented by . The inverse transformation together with gives the -dimensional assertion. ∎
Hence, the previous theorem provides an easy way to determine and from the moments .
Algorithm 6.3.
- Input: , ; .
- Step 1: For :
  - a) Calculate and from
[TABLE]
    If the kernel is not one-dimensional, then is not represented by one Gaussian distribution.
  - b) Check: ?
    If FALSE: is not represented by one Gaussian distribution.
- Step 2: Check: is symmetric and positive definite?
  If FALSE: is not represented by one Gaussian distribution.
- Step 3: Calculate and .
- Out: “ is represented by a Gaussian distribution”: TRUE or FALSE. If TRUE: , , .
With
[TABLE]
we get a result similar to 5.11 but with integration over instead of a semi-algebraic set .
Theorem 6.4.
Let with and be a real sequence with . The following are equivalent:
- i) is the moment sequence of the distribution with , , .
- ii) There are with such that the matrix
[TABLE]
has a one-dimensional kernel spanned by
[TABLE]
and is the -th moment of for .
In this case .
Proof.
Similar to the proof of 6.1 using (27) instead of (26) in the induction. The formula for follows from , . ∎
Multiple components in dimension one with same variance.
While we fully characterized all moment sequences represented by one Gaussian distribution and showed how to determine the parameters, let us investigate mixtures with more than one component. In this study the elementary symmetric polynomials play a crucial role.
Definition 6.5.
For with we denote by
[TABLE]
the elementary symmetric polynomials.
The elementary symmetric polynomials have the following property.
Lemma 6.6 (Vieta’s Formulas).
Let and be pairwise different points. For the following are equivalent:
- i)
[TABLE]
- ii) for all .
- iii) with
[TABLE]
Proof.
Follows directly from
[TABLE]
Since we assume all Gaussian distributions to have the same variance, we introduce the following convenient operator.
Definition 6.7.
Let be a linear functional () with and a differentiable function . For we define
[TABLE]
Note that we use as an operator acting on functionals and on functions to emphasize the close connection between the operations performed on and measures provided by 4.10.
has the following properties (Lemmas 6.8–6.11).
Lemma 6.8.
Let be a linear functional with and . Then
[TABLE]
Proof.
: .
: With for all it follows that . ∎
Lemma 6.9.
Let , , and
[TABLE]
for some pairwise different and . Then
[TABLE]
for every .
Proof.
Follows by induction on . is clear. We have to show :
[TABLE]
Lemma 6.10.
Let , , , and . Define
[TABLE]
i.e., is the moment vector of the moments of . Then there is an invertible matrix such that
[TABLE]
and it follows that
[TABLE]
Proof.
Since
[TABLE]
we find that the -th entry in is a polynomial of degree in . The coordinate change to is . The second statement follows immediately from
[TABLE]
i.e., and . ∎
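To make the coordinate change concrete, the following sketch constructs one such matrix for a Gaussian with mean xi and variance sigma2. It relies on the standard closed form E[X^n] = sum_j n! / ((n-2j)! j! 2^j) sigma2^j xi^(n-2j), which is an assumption in the sense that the matrix of Lemma 6.10 may be normalized differently; the function name `gaussian_moment_matrix` is hypothetical.

```python
from math import factorial

import numpy as np


def gaussian_moment_matrix(degree, sigma2):
    """Lower-triangular M with M @ (1, xi, ..., xi^degree) = moments of N(xi, sigma2)."""
    M = np.zeros((degree + 1, degree + 1))
    for n in range(degree + 1):
        for j in range(n // 2 + 1):
            # coefficient of sigma2^j * xi^(n - 2j) in E[X^n] for X ~ N(xi, sigma2)
            M[n, n - 2 * j] = (factorial(n)
                               / (factorial(n - 2 * j) * factorial(j) * 2**j)
                               * sigma2**j)
    return M


xi, sigma2 = 1.5, 0.4
M = gaussian_moment_matrix(4, sigma2)
dirac_moments = xi ** np.arange(5)              # moments of the Dirac measure at xi
gauss_moments = M @ dirac_moments               # moments of N(xi, sigma2)
recovered = np.linalg.solve(M, gauss_moments)   # unit diagonal, so M is invertible
```

Since the matrix does not depend on xi, the same linear map sends the moments of any finitely atomic measure to the moments of the corresponding equal-variance Gaussian mixture.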
Lemma 6.11.
Let and be the Gaussian mixture
[TABLE]
for pairwise different and . Let
[TABLE]
be the moments of up to degree . The following matrix has full rank:
[TABLE]
Proof.
Take from 6.10 and set . Then
[TABLE]
is full rank as in the one-dimensional case. ∎
With these properties of we can characterize moment sequences which are represented by (30) and determine the parameters if is known.
Theorem 6.12.
Let with , . The following are equivalent:
- i)
For , , and pairwise different we have that has the representing measure with
[TABLE]
- ii)
For and pairwise different we have that
[TABLE]
If additionally , then both are equivalent to the following:
- iii)
For and pairwise different we have that
[TABLE]
- iv)
For and pairwise different we have that
[TABLE]
and for
[TABLE]
If one of the equivalent statements (i)–(iv) and hold, then .
Proof.
Using from 6.10 transforms each statement (i)–(iv) into the corresponding one-dimensional statements for Dirac measures (i’)–(iv’). Then the equivalence of all statements (i)–(iv) follows from the equivalence of (i’)–(iv’). ∎
Remark 6.13.
From the proof it is evident that by a coordinate change induced by from 6.10 the one-dimensional case of Gaussian mixtures with the same known variance is the same as the one-dimensional case of Dirac measures. This can also be seen from .
So the highly non-linear problem of finding and from the moments reduces to the linear problem of calculating the kernel of (31) and the well-studied problem of finding all roots of a univariate polynomial (32). The coefficients can then be determined by linear algebra.
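The three steps named here (kernel of a Hankel matrix, roots of a univariate polynomial, linear solve for the coefficients) can be sketched for the Dirac case to which the remark reduces the problem. This is a minimal Prony-type sketch and not necessarily the exact form of (31) and (32): the k x (k+1) Hankel layout, the use of moments of degree 0 to 2k-1, and the function name `prony` are assumptions.

```python
import numpy as np


def prony(dirac_moments, k):
    """Recover k atoms and weights from moments d_0, ..., d_{2k-1} of sum_i c_i * delta_{x_i}."""
    d = np.asarray(dirac_moments, dtype=float)
    # k x (k+1) Hankel matrix H[i, j] = d_{i+j}
    H = np.array([[d[i + j] for j in range(k + 1)] for i in range(k)])
    # for k distinct atoms the kernel is one-dimensional and spanned by
    # the last right singular vector
    v = np.linalg.svd(H)[2][-1]
    v = v / v[-1]                      # normalise to a monic polynomial sum_j v_j x^j
    roots = np.roots(v[::-1])          # np.roots expects highest degree first
    if np.max(np.abs(np.imag(roots))) > 1e-8:
        raise ValueError("complex roots: no k-atomic representation on the real line")
    atoms = np.sort(np.real(roots))
    # weights from the Vandermonde system sum_i c_i x_i^n = d_n for n < k
    V = np.vander(atoms, k, increasing=True).T
    weights = np.linalg.solve(V, d[:k])
    return atoms, weights


true_atoms = np.array([-1.0, 2.0])
true_weights = np.array([0.3, 0.7])
moments = [np.sum(true_weights * true_atoms**n) for n in range(4)]
atoms, weights = prony(moments, k=2)
```

For Gaussian mixtures with known common variance, the same routine would be applied after mapping the moments to Dirac moments with the coordinate change of 6.10. The sketch does not guard against degenerate input, for example fewer than k atoms.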
But 6.12 only applies if we know beforehand. We therefore have to determine from as well. Set , i.e., , and observe
[TABLE]
holds for some and all . Applying this to (30), i.e., , shows that the linear dependence
[TABLE]
from 6.6, resp. 6.12, implies the linear dependence of and ,
[TABLE]
for some , and therefore also the moments and ,
[TABLE]
Let us have a look at a small example.
Example 6.14.
For in (30) we have
[TABLE]
i.e., the matrix
[TABLE]
contains the following vector in its kernel:
[TABLE]
For sufficiently large the kernel is one-dimensional. Hence,
[TABLE]
and by Vieta’s formulas (6.6) we have that and are the zeros of
[TABLE]
The previous example provides one way to find . It determines uniquely (and the simultaneously), but at the cost that more moments are required than in 6.12. In 6.12 we need moments, while for the generalized method of the previous example the matrix must be of size with . Hence, moments of degree at least are required since the last line contains .
However, with the following approach we also get from 6.12.
Definition 6.15.
Let and . We define
[TABLE]
Example 6.16.
- a)
For , i.e., , we have
[TABLE]
- b)
For , i.e., , we have
[TABLE]
Lemma 6.17.
Let and . The following hold:
- i)
.
- ii)
If is represented by with
[TABLE]
for some , , and pairwise different, then
[TABLE]
Proof.
This follows immediately from 6.15 and 6.12. ∎
The previous lemma combined with 6.12 provides the following algorithm to determine a Gaussian mixture representation of with equal variance for each Gaussian component.
Algorithm 6.18.
- Input:
and with .
- Step 1:
- a)
Calculate .
- b)
Calculate .
If is empty, has no -Gaussian mixtures with equal variance.
- Step 2:
For :
- a)
Calculate from (31):
[TABLE]
If () does not hold: is not a variance for . Goto .
- b)
Calculate zeros of (32):
[TABLE]
If has complex solutions: is not a variance for . Goto .
- Step 3:
Calculate from the ’s in (29):
[TABLE]
- Out:
, , and .
This algorithm can of course be modified to determine as well. Add an outer loop testing 6.18 for .
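How a candidate variance can be tested numerically is sketched below. The sketch does not reproduce the polynomial of 6.15. As a stand-in it removes a trial variance t from the moments and measures how far the (k+1) x (k+1) Hankel matrix of the resulting moments is from being singular, since for the true variance these are moments of a k-atomic measure and the matrix has rank k. The use of moments up to degree 2k, the closed-form moment transform, and the names `remove_variance` and `hankel_defect` are all assumptions of this sketch.

```python
from math import factorial

import numpy as np


def _coef(n, j):
    """Coefficient n! / ((n - 2j)! j! 2^j) from the Gaussian moment formula."""
    return factorial(n) / (factorial(n - 2 * j) * factorial(j) * 2**j)


def remove_variance(s, t):
    """Moments after lowering the variance of every Gaussian component by t.

    A negative t adds variance, so the same formula also generates mixture moments.
    """
    return np.array([sum(_coef(n, j) * (-t) ** j * s[n - 2 * j]
                         for j in range(n // 2 + 1))
                     for n in range(len(s))])


def hankel_defect(s, k, t):
    """Smallest singular value of the (k+1) x (k+1) Hankel matrix after removing variance t."""
    d = remove_variance(s, t)
    H = np.array([[d[i + j] for j in range(k + 1)] for i in range(k + 1)])
    return np.linalg.svd(H, compute_uv=False)[-1]


# moments of degree 0..4 of 0.3 * delta_{-1} + 0.7 * delta_{2}
dirac = np.array([0.3 * (-1.0) ** n + 0.7 * 2.0**n for n in range(5)])
sigma2 = 0.5
s = remove_variance(dirac, -sigma2)           # moments of the equal-variance mixture
defect_true = hankel_defect(s, 2, sigma2)     # numerically zero
defect_wrong = hankel_defect(s, 2, 0.25)      # clearly positive
```

A candidate that passes this test would still have to go through Steps 2 and 3, since real roots and admissible coefficients are not implied by a vanishing defect alone.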
Multiple components in dimension one.
Now we want to investigate the one-dimensional case with arbitrary (e.g., pairwise different). For we have the problem already considered by Pearson [Pea94].
Example 6.19.
Let , with and . For
[TABLE]
we have that are linearly independent. But adding (without ) makes the system linearly dependent:
[TABLE]
We have a one-dimensional solution set spanned by
[TABLE]
So and are the zeros of
[TABLE]
by Vieta’s formulas (6.6).
One might be seduced by this example into the opinion that replacing the restriction by arbitrary means that fewer Gaussian distributions are required. But 6.22 shows that there are moment sequences with very large mixture Carathéodory numbers.
Multi-dimensional Gaussian mixtures.
So far we only dealt with the one-dimensional case of Gaussian mixture reconstruction from moments. And this was even done with the restriction . In [dDK19] we proved new lower bounds for the Carathéodory numbers for Dirac measures which grow asymptotically close to the Richter upper bound. Now we show that for Gaussian mixtures the same lower bounds hold even when arbitrary variances are allowed.
Before we can state our last main theorem, we need the following definition.
Definition 6.20.
Let be a finite-dimensional vector space of measurable functions on a measurable space and probability measures as in 2.9. A function is called non-negative of highest order (with respect to the measures ) if and for any sequence with
[TABLE]
there exists a subsequence with one of the following properties:
- i)
and , or
- ii)
for all .
Note that being of highest order depends in general on the measures . The following are examples of non-negative polynomials of highest order.
Example 6.21.
Let , , and . Let be non-negative, with finitely many zeros and without zeros at infinity (its homogenization has no zeros with ). Then is non-negative of highest order with respect to Gaussian or log-normal measures. In particular
[TABLE]
is non-negative of highest order.
Recall from [dDS18b] that, for open and a finite-dimensional space of differentiable functions on , is the smallest such that has full rank for some where
[TABLE]
with and .
Theorem 6.22.
Let be a measurable space, be a finite-dimensional space of measurable functions on with an such that on , and probability measures on as in 2.9. Let be non-negative of highest order with finitely many zeros . Then there exists a moment sequence with
[TABLE]
If additionally is open, , and is -differentiable with , then has an open neighborhood such that (33) holds for all .
Proof.
Let and . Then and by [dDS18b, Thm. 18] (2.4) we have .
Let be such that as . By [dD19, Thm. 17(ii)] (2.11) any , , has a mixture representation
[TABLE]
with (i.e., are minimal), , , and . Since we have and after choosing a subsequence of we can assume that for all .
Let us show that holds. Since is non-negative of highest order, we can assume that the fulfill (i) or (ii) in 6.20 by taking a subsequence . By reordering the ’s in () we can assume that (i) holds for all and (ii) for all . Since and we can assume that for all . But (ii) implies
[TABLE]
and therefore we have
[TABLE]
i.e., . Hence implies that all fulfill .
If are -differentiable functions, then the sequence can be chosen to contain only regular moment sequences by Sard’s Theorem [Sar42] (see [dDS18b]). Hence, for each there is an open neighborhood of such that all fulfill . ∎
So from the proof it is evident that the constructed with (33) is close to the boundary of the moment cone, more precisely close to the boundary face represented by . [dD19], [dDK19], 6.22, and 6.21 explicitly provide the following.
Corollary 6.23.
Let and . For the one-dimensional Gaussian (and log-normal) measures with we have
[TABLE]
Proof.
6.21 and 6.22 give the lower bound and [dD19, Cor. 36] the upper bound. ∎
6.23 (and 6.26) explains why we only discussed the reconstruction of one-dimensional Gaussian mixtures with equal variances at the beginning of this section. There are moment sequences for which it suffices to represent them by mixtures of with , and relaxing this restriction does not reduce the required number of components. Especially in higher dimensions we will see that even in the case of with Gaussian measures the number of components becomes very large, close to , see 6.27.
Example 6.24.
Let and , . By a rotation of we can assume that for homogeneous polynomials in with finitely many zeros no zero is at infinity ().
- a)
: The Motzkin polynomial [Mot67] has projective zeros, is non-negative of highest order, and the point evaluations at these zeros are linearly independent, see [dDS18b, Exm. 31]. So , i.e., there is a moment sequence/functional on which can be represented by a sum of Gaussians but not fewer. The upper bound for the Dirac measures in the projective case is also [Rez92].
- b)
: The Robinson polynomial [Rob69] has projective zeros, is non-negative of highest order, and all point evaluations at these zeros are also linearly independent, see [dDS18b, p. 1635]. So . Note that for Dirac measures we have the Carathéodory number in the projective case, see [Kun14].
- c)
: The Harris polynomial [Har99] has zeros, is non-negative of highest order, and the point evaluations at these zeros are linearly independent, see [dDS18b, Exm. 63]. Hence, . An upper bound for Dirac measures in the projective setting is , see [dDS18b, Exm. 63].
- d)
: In [RS18, Lem. 8.6] it was shown that the point evaluations on the grid
[TABLE]
are linearly independent on . Hence . Additionally, it was shown that holds. With , 6.22, and [dD19, Thm. 35] we have
[TABLE]
In [dDK19] the point evaluation on the grid was extended to higher dimensions and improved lower bounds were found. In fact, the following result was shown.
Proposition 6.25 ([dDK19, Prop. 5.3]).
Let , , , and . Then
[TABLE]
supported on the grid with the representing measure has the Carathéodory number
[TABLE]
Since the grid is the zero set of a non-negative polynomial of highest order (6.21), 6.22 implies the following.
Corollary 6.26.
Let , , and . Then there is a moment sequence and an open neighborhood of such that
[TABLE]
for all , i.e., every is a linear combination of (33) many Gaussian distributions but not fewer.
Hence, like in [dDK19, Thm. 5.6] we have
[TABLE]
We end with the following asymptotic result which, as in the case of atomic measures [dDK19, Cor. 5.8], demonstrates that the truncated moment problem with Gaussian mixtures is also cursed by high dimensions. Note that an upper bound for the number of components is , see [dD19, Thm. 32].
Corollary 6.27.
Let and . Then there is an such that there is a moment functional which can be written as a sum of
[TABLE]
Gaussian distributions but not fewer.
Acknowledgment
We thank Mario Kummer for the productive discussions and advice on the paper. We thank Mioara Joldes, Florent Bréhard, and Jean-Bernard Lasserre for providing the references [LPHT08, HK14, MWHL18, BJL19]. We thank Bernard Mourrain for the fruitful discussion at the Arctic Applied Algebra conference organized by Philippe Moustrou, Verena Reichle, Cordian Riener, and Hugues Verdure in Tromsø, April 2019.
References
- [AFS16] C. Améndola, J.-C. Faugère, and B. Sturmfels, Moment varieties of Gaussian mixtures, J. Alg. Stat. 7 (2016), 14–28.
- [Akh65] N. I. Akhiezer, The classical moment problem and some related questions in analysis, Oliver & Boyd, Edinburgh, 1965.
- [Ana06] G. A. Anastassiou, Applications of geometric moment theory related to optimal portfolio management, Comput. Math. Appl. 51 (2006), 1405–1430.
- [APST19] H. Ammari, M. Putinar, A. Steenkamp, and F. Triki, Identification of an algebraic domain in two dimensions from a finite number of its generalized polarization tensors, Math. Ann. (2018/19), in press, https://doi.org/10.1007/s00208-018-1780-y.
- [Bal61] M. L. Balinski, An algorithm for finding all vertices of convex polyhedral sets, J. Soc. Indust. Appl. Math. 9 (1961), 72–88.
- [Bar91] A. I. Barvinok, Calculation of exponential integrals, Zap. Nauč. Semin. POMI 192 (1991), 175–176.
- [Bar92] A. I. Barvinok, Exponential integrals and sums over convex polyhedra, Funkc. Anal. Prilozh. 26 (1992), 64–66.
- [BGL07] B. Beckermann, G. H. Golub, and G. Labahn, On the numerical condition of a generalized Hankel eigenvalue problem, Numer. Math. 106 (2007), 41–68.
