Conditional Expectation, Entropy, and Transport for Convex Gibbs Laws in Free Probability
David Jekel

TL;DR
This paper establishes a connection between free Gibbs laws and semicircular families using conditional expectations, entropy, and transport, providing new isomorphisms and inequalities in free probability.
Contribution
It introduces a novel approach to construct measure transport and isomorphisms between free Gibbs laws and semicircular families via matrix models.
Findings
Conditional expectations and entropy converge from matrix models to free Gibbs laws.
Constructed measure transport maps induce isomorphisms between free probability algebras.
Proved Talagrand inequality for free Gibbs laws relative to semicircular laws.
Abstract
Let $(X_1, \dots, X_m)$ be self-adjoint non-commutative random variables distributed according to the free Gibbs law given by a sufficiently regular convex and semi-concave potential $V$, and let $(S_1, \dots, S_m)$ be a free semicircular family. We show that conditional expectations and conditional non-microstates free entropy given $X_1$, \dots, $X_k$ arise as the large $N$ limit of the corresponding conditional expectations and entropy for the random matrix models associated to $V$. Then by studying conditional transport of measure for the matrix models, we construct an isomorphism $\mathrm{W}^*(X_1, \dots, X_m) \to \mathrm{W}^*(S_1, \dots, S_m)$ which maps $\mathrm{W}^*(X_1, \dots, X_k)$ to $\mathrm{W}^*(S_1, \dots, S_k)$ for each $k = 1, \dots, m$, and which also witnesses the Talagrand inequality for the law of $(X_1, \dots, X_m)$ relative to the law of $(S_1, \dots, S_m)$.
Department of Mathematics, UCLA, Los Angeles, CA 90095
www.math.ucla.edu/~davidjekel/
Key words and phrases:
free Gibbs state, free entropy, free transport, free group factor, invariant random matrix ensembles, asymptotic random matrix theory, Talagrand inequality
1991 Mathematics Subject Classification:
Primary: 46L54, Secondary: 35K10, 37A35, 46L52, 46L53, 60B20
1. Introduction
1.1. Motivation
Free probability initiated a fruitful exchange between random matrix theory and operator algebras. In many situations, tuples of random matrices can be described in the large $N$ limit by non-commutative random variables $X_1$, …, $X_m$ which are operators in a tracial $\mathrm{W}^*$-algebra. Conversely, many properties of non-commutative random variables (and the $\mathrm{W}^*$-algebras that they generate) are easier to understand when they can be simulated by finite-dimensional random matrix models. For instance, Voiculescu used free entropy, defined in terms of matricial microstates, to prove the absence of Cartan subalgebras for free group $\mathrm{W}^*$-algebras [50]; similar techniques were used to give sufficient conditions for a von Neumann algebra to be non-prime and non-Gamma (a convenient list of results and references can be found in [11]). Further applications of random matrices to the properties of $\mathrm{C}^*$- and $\mathrm{W}^*$-algebras can be found for instance in [25] and [22, §4].
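Free Gibbs laws are a prototypical example of the connection between random matrices and $\mathrm{W}^*$-algebras. Free Gibbs laws describe the large $N$ behavior of self-adjoint tuples of random matrices given by a probability measure of the form
$$d\mu^{(N)}(x) = \frac{1}{Z^{(N)}} e^{-N^2 V^{(N)}(x)}\,dx,$$
where $x = (x_1, \dots, x_m)$ is a self-adjoint tuple, $dx$ denotes Lebesgue measure, $V^{(N)}$ is a function (known as a potential) chosen so that $e^{-N^2 V^{(N)}}$ is integrable, and $Z^{(N)}$ is the normalizing constant that makes $\mu^{(N)}$ a probability measure. Here $V^{(N)}$ could be given by $V^{(N)}(x) = \tau_N(p(x))$, where $\tau_N = \frac{1}{N} \operatorname{Tr}$ and $p$ is a non-commutative polynomial; for instance, taking
$$V^{(N)}(x) = \frac{1}{2} \sum_{j=1}^m \tau_N(x_j^2)$$
produces the Gaussian unitary ensemble (GUE). Under certain assumptions on $V^{(N)}$ (e.g. convexity and good asymptotic behavior as $N \to \infty$), there will be non-commutative random variables $X_1$, …, $X_m$ in a tracial $\mathrm{W}^*$-algebra $(\mathcal{M}, \tau)$ such that
$$\tau_N(p(X^{(N)})) \to \tau(p(X)) \quad \text{in probability for every non-commutative polynomial } p;$$
see [21, Theorems 3.3 and 3.4], [14, Proposition 50 and Theorem 51], [29, Theorem 4.1]. The random matrix models satisfy the relation, derived from integration by parts, that
$$E\left[\tau_N\big(D_{x_j} V^{(N)}(X^{(N)})\, p(X^{(N)})\big)\right] = E\left[\tau_N \otimes \tau_N\big(\partial_{x_j} p(X^{(N)})\big)\right],$$
where $D_{x_j}$ is a normalized gradient with respect to the coordinates of $x_j$ and $\partial_{x_j}$ denotes the free difference quotient, and hence the non-commutative tuple $X$ satisfies
$$\tau\big(D_{x_j} V(X)\, p(X)\big) = \tau \otimes \tau\big(\partial_{x_j} p(X)\big);$$
see [21, §2.2 - 2.3]. The non-commutative law of a tuple satisfying such an equation is known as a free Gibbs law for the potential $V$.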
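As an illustration of the free Gibbs equation, the following worked computation treats the simplest case. It is a sketch under assumptions not spelled out in the text: a single variable ($m = 1$), the quadratic potential $V(x) = \frac{1}{2}\tau(x^2)$ so that $DV(X) = X$, the free Gibbs relation written as $\tau(DV(X)\,p(X)) = \tau \otimes \tau(\partial p(X))$, and the standard formula $\partial(x^k) = \sum_{i=0}^{k-1} x^i \otimes x^{k-1-i}$ for the free difference quotient.

```latex
% Assumed setup (illustrative): m = 1, V(x) = (1/2) tau(x^2), hence DV(X) = X,
% and the relation reads  tau(X p(X)) = (tau \otimes tau)(\partial p(X)).
\begin{align*}
p(x) = 1:   &\quad \tau(X) = \tau \otimes \tau(0) = 0, \\
p(x) = x:   &\quad \tau(X^2) = \tau \otimes \tau(1 \otimes 1) = 1, \\
p(x) = x^2: &\quad \tau(X^3) = \tau \otimes \tau(X \otimes 1 + 1 \otimes X) = 2\,\tau(X) = 0, \\
p(x) = x^3: &\quad \tau(X^4) = \tau \otimes \tau(X^2 \otimes 1 + X \otimes X + 1 \otimes X^2)
               = 2\,\tau(X^2) + \tau(X)^2 = 2.
\end{align*}
% General recursion obtained the same way (p(x) = x^k):
%   tau(X^{k+1}) = \sum_{i=0}^{k-1} tau(X^i) tau(X^{k-1-i}).
```

The moments $0, 1, 0, 2$ are those of the standard semicircular law, which is consistent with the statement that the quadratic potential corresponds to the GUE.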
Given sufficient assumptions on (for instance, Assumption 5.1), many of the classical quantities associated to will converge in the large limit to their free counterparts, besides obviously the convergence of the non-commutative moments . For instance, the normalized classical entropy will converge to the microstates free entropy (see [48, §2], [22, Theorem 5.1], [29, §5.2]), and the normalized classical Fisher information will converge to the free Fisher information (see [29, §5.3]). The monotone transport maps of Guionnet and Shlyakhtenko are well-approximated by classical transport maps for the random matrix models [23, Theorem 4.7]. The solutions of classical SDE associated to the random matrix models approximate the solutions of free SDE; see for instance [3], [22, §2], [12, §4].
1.2. Summary of Main Results
This paper will further develop the connection between classical and free probability for convex free Gibbs laws, by studying conditional expectation (§5), conditional entropy and Fisher information (§6), and conditional transport (§7). This is an extension of our previous work [29].
We consider a sequence of random matrix tuples given by a uniformly convex and semi-concave sequence of potentials such that the normalized gradient is asymptotically approximable by trace polynomials (a notion of good asymptotic behavior as defined in §3.3). Then the following results hold:
(1) The non-commutative moments converge in probability to for some tuple of non-commutative random variables in a tracial $\mathrm{W}^*$-algebra. See Theorem 5.2.
(2) The classical conditional expectation behaves asymptotically like the non-commutative conditional expectation where comes from an appropriate non-commutative function space and is a sequence of uniformly Lipschitz functions that “behaves like in the large $N$ limit” in the sense of §3. See Theorem 5.9.
(3) The classical conditional entropy converges to the conditional free entropy . This is similar to a conditional version of . See Theorem 6.6.
(4) There exists a function such that in non-commutative law, where is a free semicircular -tuple freely independent of , and this function also arises from functions such that , where is an independent GUE -tuple. This is the conditional version of transport to the Gaussian/semicircular law. See Theorem 8.10.
(5) This transport map also witnesses the conditional entropy-cost inequality for the law of relative to semicircular conditioned on . See Theorem 8.10.
(6) This transport map furnishes an isomorphism , which shows that is freely complemented in .
(7) Actually, a second application of transport shows that is isomorphic to the $\mathrm{W}^*$-algebra generated by a semicircular -tuple, or in other words . So altogether there is an isomorphism that maps to the canonical copy of inside .
Furthermore, the results about transport can be iterated to produce a “lower-triangular transport” as shown in Theorem 8.11 and discussed further in §1.6. This is analogous to the classical results on triangular transport of measure such as [6].
In the rest of the introduction, we will review notation and then motivate and explain the main results in more detail. In the course of the paper, it will become clear that not only are our main results all proved by similar techniques, but in fact their statements and proofs are tightly interrelated.
1.3. Notation and Background
We will continue to use the same notation and background as in [29]. The one major change is that we will write superscript $(N)$ rather than subscript $N$ for measures and functions defined on matrices. Moreover, we will use the original notation for Voiculescu’s free difference quotient, even though [29] used .
We assume familiarity with the basic properties of tracial -algebras (or tracial von Neumann algebras); see for instance [2]. In particular, a tracial -algebra is a finite -algebra with a specified trace . If is a -algebra, then there is a unique trace-preserving conditional expectation . If is a tuple of operators in , then we denote by the -subalgebra which they generate.
There is an inner product on defined by , and the completion of in this inner product is a Hilbert space known as . We denote the self-adjoint elements of by and recall that if and are self-adjoint, then is real. If and are tuples, we denote . We define , that is, the maximum of the operator norms of .
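We denote by $\mathbb{C}\langle X_1, \dots, X_m \rangle$ the $*$-algebra of non-commutative polynomials in $m$ self-adjoint variables. A non-commutative law is a linear map $\lambda: \mathbb{C}\langle X_1, \dots, X_m \rangle \to \mathbb{C}$ satisfying
(A) $\lambda(1) = 1$.
(B) $\lambda(p^* p) \geq 0$ for all $p$.
(C) $\lambda(pq) = \lambda(qp)$ for all $p, q$.
(D) $|\lambda(X_{i_1} \cdots X_{i_k})| \leq R^k$ for some constant $R > 0$.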
The set of non-commutative laws that satisfy (D) for a fixed value of is denoted , and it is equipped with the topology of pointwise convergence on . Likewise, the space of all laws, equipped with the topology of pointwise convergence, will be denoted by .
If is a tuple of self-adjoint elements of , then we may define a non-commutative law by . Conversely, every non-commutative law can be realized in this way through the GNS construction. In particular, a free Gibbs law can be realized by a tuple of self-adjoint operators, and thus the free Gibbs law has a corresponding -algebra , that is unique up to isomorphism.
We always consider as a tracial -algebra with the normalized trace , and in particular, we use the notation , , and as defined above when is an -tuple of matrices. The notation and will never be used for the or norms of functions on matrices, but if we write an norm it will be expressed .
For a smooth function , we denote by and the gradient and Hessian with respect to the normalized inner product . In other words, is the vector in and is the -linear transformation of satisfying
[TABLE]
For functions or , we denote the Lipschitz (semi)norm with respect to using on and .
Note that can also be equipped with the real inner product . Being a real inner-product space, may be identified with by choosing an orthonormal basis in . Lebesgue measure on should be understood with respect to this identification. Moreover, the gradient , Jacobian matrix , divergence , and Laplacian for functions on should also be understood with respect to this identification. Beware that this is not equivalent to using entrywise coordinates for since the off-diagonal entries are complex and conjugate-symmetric, while the diagonal entries are real, and that the normalized gradient above satisfies . For further discussion see [29, §2.1].
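The relationship between the two gradients can be made concrete with a short computation. The block below is a sketch under assumed conventions: $\langle x, y \rangle_2 = \sum_j \tau_N(x_j y_j)$ is the normalized inner product, $\langle x, y \rangle_{\mathrm{Tr}} = \sum_j \operatorname{Tr}(x_j y_j)$ is the real inner product used for Lebesgue measure, $Du$ and $Hu$ are the gradient and Hessian with respect to the former, and $\nabla u$ is the gradient with respect to the latter. The quadratic test function is our own choice.

```latex
% Assumed conventions (illustrative):
%   <x, y>_{Tr} = \sum_j Tr(x_j y_j) = N <x, y>_2,
%   Du = gradient w.r.t. <.,.>_2,   \nabla u = gradient w.r.t. <.,.>_{Tr}.
\begin{align*}
\frac{d}{dt}\Big|_{t=0} u(x + ty)
  &= \langle Du(x), y \rangle_2
   = \langle \nabla u(x), y \rangle_{\mathrm{Tr}}
   = N \langle \nabla u(x), y \rangle_2
   \quad\Longrightarrow\quad Du = N\, \nabla u. \\
\intertext{Example: $u(x) = \frac{1}{2} \sum_j \tau_N(x_j^2)$.}
u(x + ty) &= u(x) + t \sum_j \tau_N(x_j y_j) + \frac{t^2}{2} \sum_j \tau_N(y_j^2), \\
Du(x) &= x, \qquad \nabla u(x) = \tfrac{1}{N}\, x, \qquad Hu = \mathrm{Id}.
\end{align*}
```

The point of the normalization is visible in the example: $Du$ and $Hu$ have no dependence on $N$, whereas the Euclidean gradient carries a factor of $1/N$.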
1.4. Main Results on Conditional Expectation
Consider a tuple
[TABLE]
of random self-adjoint matrices given by a probability density . We assume that is uniformly convex and semi-concave and that the normalized gradient is asymptotically approximable by trace polynomials (a certain notion of good asymptotic behavior as , explained below). The precise hypotheses are listed in Assumption 5.1. We showed in [29, Theorem 4.1] that in this case, there exists an -tuple of non-commutative random variables such that in probability.
Our first main result (Theorem 5.9) says roughly that the classical conditional expectation given well approximates the -algebraic conditional expectation . This is motivated in general by the importance of conditional expectation in free probability, e.g. its relationship to free independence with amalgamation and to free score functions. See [3, §4] for a study of the large limits of conditional expectations related to matrix SDE. The relationship between classical and free conditional expectation also has implications for the study of relative matricial microstate spaces, such as the “external averaging property” introduced in the upcoming work with Hayes, Nelson, and Sinclair [27].
Applications of conditional expectation within this paper include our results on free Fisher information and entropy (see Theorem 6.6 and Remark 6.8), as well as our proof that Assumption 5.1 is preserved under marginals (see Proposition 8.2).
The statement and proof of Theorem 5.9 rely on a notion of asymptotic approximation for functions on explained in §3. We define a class of non-commutative functions as a certain Fréchet space completion of trace polynomials, such that if and , …, are self-adjoint elements in an -embeddable tracial -algebra , then is a well-defined element of . In particular, every can be evaluated on a tuple of self-adjoint matrices. Now if , we say that if for every ,
[TABLE]
Moreover, if such an exists, then we say that is asymptotically approximable by trace polynomials.
Consider the random matrices and non-commutative random variables as above, and suppose that is uniformly Lipschitz in and that . Then we show that is given by a function such that , and moreover .
A curious feature of this result is that the function is defined for all self-adjoint -tuples of non-commutative random variables, not only for the specific -tuple that we are concerned with. Similarly, the claim that describes the asymptotic behavior of for all , even though the distribution of the random matrix is highly concentrated as on much smaller sets, namely the “matricial microstate spaces” consisting of tuples with non-commutative moments close to those of . Thus, the statement we prove about the functions is stronger than an asymptotic result about approximation such as [23, Theorem 4.7].
1.5. Main Results on Entropy
Voiculescu defined two types of free entropy (see [49], [51], [54]). The first, called $\chi$, is based on measuring the size of matricial microstate spaces, which is closely related to the classical entropy of the random matrix models (see [29, §5.2]). The second, called $\chi^*$, is defined in terms of free Fisher information, which is based on classical Fisher information. Either one should heuristically be the large $N$ limit of the classical entropy of random matrix models, but there were many technical obstacles to proving this. The inequality $\chi \leq \chi^*$ is known in general thanks to [3]. However, even for non-commutative laws as well-behaved and explicit as free Gibbs states given by convex potentials, the equality of $\chi$ and $\chi^*$ when $m \geq 2$ was not proved until Dabrowski’s paper [12], and the problem is still open for non-convex Gibbs states.
Our previous work [29] gave a proof of this equality in the convex case based on the asymptotic analysis of functions and PDE related to the random matrix models. Here we will use similar techniques for the conditional setting. We will show (Theorem 6.6) that for a random tuple of matrices given by a convex potential as above, the classical conditional entropy converges to the conditional free entropy . Actually, the proof here is shorter than those of [12] and [29] (see Remark 6.8), even considering the results we used from [29].
We focus here only on the non-microstates entropy (defined using Fisher information). It is not yet resolved in the literature what the correct definition of conditional microstates free entropy should be. In light of [29, §5.2], the conditional classical entropy for the random matrix models seems to be a reasonable substitute for microstates entropy, and in the convex setting we expect this to agree with any plausible definition of conditional microstates entropy due to the exponential concentration of measure.
1.6. Main Results on Transport
A transport map from a probability measure $\mu$ to another probability measure $\nu$ is a function $F$ such that $F_* \mu = \nu$. In probabilistic language, if $X \sim \mu$ and $Y \sim \nu$ are random variables, then $F_* \mu = \nu$ means that $F(X) \sim Y$ in distribution. The theory of transport (and in particular optimal transport) has numerous and significant applications in the classical setting. For instance, if we have a function $F$ such that $F(X) \sim Y$ and we can numerically simulate the random variable $X$, then we can also simulate $Y$.
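For orientation, here is the simplest classical instance of a transport map, on the real line. This is a standard fact rather than part of the paper's construction, and the symbols $\mu$, $\nu$, $T$, $F_\mu$, $F_\nu$ are chosen here for illustration; we assume $\mu$ has no atoms.

```latex
% Illustrative classical example on R (notation ours). Assume mu has no atoms.
% F_mu, F_nu are the cumulative distribution functions, and
% F_nu^{-1}(t) = inf { y : F_nu(y) >= t } is the quantile function of nu.
\begin{align*}
T &= F_\nu^{-1} \circ F_\mu, \\
X \sim \mu \;&\Longrightarrow\; U = F_\mu(X) \text{ is uniform on } [0,1], \\
P(T(X) \le y) &= P\bigl(F_\nu^{-1}(U) \le y\bigr) = P\bigl(U \le F_\nu(y)\bigr) = F_\nu(y),
  \quad\text{so } T_* \mu = \nu. \\
\intertext{Example: $\mu$ standard Gaussian with distribution function $\Phi$, and $\nu$ exponential with rate $1$:}
T(x) &= -\log\bigl(1 - \Phi(x)\bigr).
\end{align*}
```

In this example, simulating a standard Gaussian $X$ and applying $T$ produces an exponential random variable, which is the simulation principle described above.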
In the non-commutative world, transport is even more significant. As remarked in [23, §1.1], there is no known analogue of a probability density in free probability. However, the existence of transport maps that would express our given random variables as functions of a free semicircular family (for instance) would serve a similar purpose to a density, namely to provide a fairly explicit and analytically tractable model for a large class of non-commutative laws.
Moreover, in contrast to the classical setting, the very existence of transport maps is a nontrivial condition. Being able to express a non-commutative tuple as a function of another non-commutative tuple implies that embeds into , and having a transport map in the other direction as well implies that . In the classical setting, any two diffuse (non-atomic) standard Borel probability spaces are isomorphic. On the other hand, there are many non-isomorphic diffuse tracial -algebras, even after restricting our attention to factors (those which cannot be decomposed as direct sums); see [33]. Moreover, Ozawa [40] showed that there is no separable tracial factor that contains an isomorphic copy of all the others. Thus, there are many instances where it is not even possible to transport one given non-commutative law to another.
The papers [22] and [14] showed the existence of monotone transport maps between certain free Gibbs laws given by convex potentials and the law of a free semicircular family, and thus concluded that each of the corresponding $\mathrm{W}^*$-algebras was isomorphic to a free group factor $L(\mathbb{F}_m)$. In particular, this result applies to the $q$-Gaussian variables for sufficiently small $q$. These transport techniques have been extended to type III von Neumann algebras [36], to planar algebras [37], and to interpolated free group factors [26]. We will focus on “conditional transport” in the tracial setting.
Our first main result about transport is contained in Theorems 7.11 and 7.13. Let be an -tuple of random matrices arising from a sequence of convex potentials satisfying Assumption 5.1. Let be an -tuple of non-commutative self-adjoint variables realizing the limiting free Gibbs law. Then we construct functions such that in distribution, where is a GUE -tuple independent of . We think of this as a conditional transport, which transports the law of to the law of conditioned on .
Moreover, we show that the transport maps satisfy . In the large limit, we obtain in non-commutative law, where is a free semicircular -tuple freely independent of . In particular, this means that (where denotes free product). In other words, is freely complemented in .
By iterating this result, we can show that if is a tuple of non-commutative random variables given by a convex free Gibbs state as above, then there is an isomorphism such that is mapped onto for each , …, . In other words, there is a “lower-triangular transport.” See Theorem 8.11. This is a (partial) free analogue of [6, Corollary 3.10].
This result implies in particular that is a maximal abelian subalgebra and in fact maximal amenable (since the subalgebra is known to be maximal amenable thanks to Popa [42]), and the same holds for each by symmetry. For context on maximal amenable subalgebras, see for instance [42], [7], [8]. More generally, any von Neumann algebraic properties of the sequence of inclusions are the same as for the case of free semicirculars, that is, for the standard inclusions .
Denote by the transport map from the law of to the law of in our construction, so that . We can also arrange that witnesses the Talagrand entropy-cost inequality relative to the semicircular law, that is,
[TABLE]
where the left hand side is twice the entropy relative to semicircular (see §8.3). This is not surprising because it was already known in the classical case that the Talagrand inequality can be witnessed by some triangular transport [6, Corollary 3.10]. Moreover, our construction of the transport maps is a direct application of the same method that Otto and Villani used to prove the Talagrand entropy-cost inequality under the assumption of the log-Sobolev inequality [39, §4]. Thus, our main contribution is to study the large limit of the transport maps using asymptotic approximation by trace polynomials. We also show that is -Lipschitz, and we estimate in terms of the constants and specifying the uniform convexity and semi-concavity of . These estimates will in fact go to zero as .
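For comparison, the classical statements that this paragraph alludes to can be written as follows. This is a sketch in our own notation, not the paper's: $\gamma$ is the standard Gaussian measure on $\mathbb{R}^d$, $H(\mu \mid \nu)$ is relative entropy, $I(\mu \mid \nu)$ is relative Fisher information, $W_2$ is the $L^2$-Wasserstein distance, and $\rho > 0$ is a log-Sobolev constant in the Otto and Villani normalization.

```latex
% Classical statements, quoted for orientation (notation ours).
%   H(mu | nu) = \int \log(d mu / d nu) d mu,
%   I(mu | nu) = \int |\nabla \log(d mu / d nu)|^2 d mu.
\begin{align*}
\text{Talagrand:}\quad
  & W_2(\mu, \gamma)^2 \le 2\, H(\mu \mid \gamma). \\
\text{Otto and Villani:}\quad
  & H(\mu \mid \nu) \le \frac{1}{2\rho}\, I(\mu \mid \nu) \text{ for all } \mu
    \;\Longrightarrow\;
    W_2(\mu, \nu)^2 \le \frac{2}{\rho}\, H(\mu \mid \nu) \text{ for all } \mu. \\
\text{Witnessing map:}\quad
  & F_* \mu = \gamma \;\Longrightarrow\; W_2(\mu, \gamma)^2 \le \int |F(x) - x|^2 \, d\mu(x), \\
  & \text{so } F \text{ witnesses the inequality when }
    \int |F(x) - x|^2 \, d\mu(x) \le 2\, H(\mu \mid \gamma).
\end{align*}
```

A witnessing map is a stronger piece of data than the inequality itself, since the coupling $(x, F(x))$ is only one candidate in the infimum defining $W_2$.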
Unfortunately, the maps constructed here are not optimal triangular transport maps with respect to the -Wasserstein distance, since Otto and Villani’s proof of [39, Theorem 1] uses a diffusion-semigroup interpolation between the two measures, not the displacement interpolation from optimal transport theory. In that sense, the results of this paper do not fully prove an analogue of [6, Corollary 3.10]. Even in the work of Guionnet and Shlyakhtenko [22], which constructed monotone transport maps in the free setting, the question of whether these maps furnish an optimal coupling between and inside a tracial von Neumann algebra was left unresolved. Future research should study optimal transport in the free setting, and determine whether the classical optimal transport (or more generally optimal triangular transport) maps for the random matrix models converge in the large limit in the sense of this paper.
1.7. Outline
The paper is organized as follows. We remark that §2 and §4 are mostly technical background, and the reader may treat them like appendices if desired. In other words, it is feasible to read through the other sections in order and only refer to §2 and §4 as needed to verify technical details of the main results.
§2 gives standard background on convex and semi-concave functions and on log-concave random matrix models.
§3 sets up the algebra of trace polynomials, and the spaces and of functions that can be approximated by trace polynomials. These spaces provide a framework for functional calculus in multiple self-adjoint variables , …, that can realize every element of . They are a convenient tool to describe the large behavior of functions of several matrices, and thus will be used in the statements of our main theorems.
§4 describes solving ODEs and the heat equation over . These are the technical lemmas used in the rest of the paper to show that the solutions of certain PDEs have well-defined large $N$ limits.
§5 explains the setup of our random matrix models given by convex potentials, and then proves our main result on conditional expectation (Theorem 5.9).
§6 shows that the conditional entropy for random matrix models converges to the conditional non-microstates entropy (Theorem 6.6).
§7 proves the existence of transport maps from a free Gibbs law to the law of a free semicircular -tuple which arise as the large $N$ limit of transport maps for the random matrix models (Theorems 7.11 and 7.13).
§8 discusses applications of our results. We show that our standard set of assumptions for log-concave random matrix models is preserved under marginals, independent joins, linear change of variables, and convolution (§8.1). We show that the transport maps constructed above witness (the conditional version of) Talagrand’s entropy-cost inequality relative to Gaussian measure (Theorem 8.10). Then by iterating our conditional transport results, we show the existence of triangular transport (Theorem 8.11).
2. Multi-matrix Models from Convex Potentials
This section is a review and reference for basic results we will use throughout the paper.
We will be concerned with probability measures on of the form
[TABLE]
where is a tuple of self-adjoint matrices, such that is integrable, and is the normalizing constant. Here denotes Lebesgue measure where we identify with using the inner product associated to the trace (the normalization of Lebesgue measure is irrelevant here because if we multiply it by a constant, the normalizing constant for will change to compensate). In this case, we will say that is the measure given by the potential . We will often assume is convex. Note that only determines up to an additive constant, but we will still say that “ is the potential corresponding to ” with a slight abuse of terminology.
A primary motivating example is , where is the normalized trace and is a non-commutative polynomial in , …, . Unlike the notation in many random matrix papers, we prefer to write rather than . This seems natural because is a function with dimension-independent normalization and it would make sense for self-adjoint elements of a tracial -algebra. Meanwhile, is the dimension of and also the scale (in the sense of large deviations) for the standard concentration estimates that hold when is uniformly convex (see for instance [3] or §2.3 below).
2.1. Semi-convex and Semi-concave Functions
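Definition 2.1.
Let $A: M_N(\mathbb{C})_{sa}^m \to M_N(\mathbb{C})_{sa}^m$ be a self-adjoint linear transformation and let $u: M_N(\mathbb{C})_{sa}^m \to \mathbb{R}$. We say that $Hu \leq A$ if $u(x) - \frac{1}{2} \langle Ax, x \rangle_2$ is concave. We say that $Hu \geq A$ if $u(x) - \frac{1}{2} \langle Ax, x \rangle_2$ is convex.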
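A quadratic function gives a quick check of the definition. The computation below is a sketch which assumes that Definition 2.1 reads: $Hu \leq A$ if $u(x) - \frac{1}{2}\langle Ax, x \rangle_2$ is concave, and $Hu \geq A$ if the same function is convex. The self-adjoint operator $B$ is a hypothetical choice made for illustration.

```latex
% Assumes Definition 2.1 in the form stated in the lead-in; B is a hypothetical
% self-adjoint linear transformation on the same space as A.
\begin{align*}
u(x) &= \tfrac{1}{2} \langle Bx, x \rangle_2, \qquad
u(x) - \tfrac{1}{2} \langle Ax, x \rangle_2 = \tfrac{1}{2} \langle (B - A)x, x \rangle_2, \\
Hu \le A &\iff B - A \le 0 \iff B \le A, \qquad\quad
Hu \ge A \iff B \ge A, \\
Du(x) &= Bx, \qquad
\langle Du(y) - Du(x),\, y - x \rangle_2 = \langle B(y - x),\, y - x \rangle_2, \\
\|Du(y) - Du(x)\|_2 &\le \|B\| \, \|y - x\|_2.
\end{align*}
```

So for quadratic $u$ the bounds $A \leq Hu \leq B'$ reduce to the operator inequalities $A \leq B \leq B'$, and the gradient is Lipschitz with constant $\|B\|$.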
We will also regularly use the following observation:
Lemma 2.2.
Suppose that , and let and be self-adjoint linear transformations. The following are equivalent:
(1) .
(2) For each , there exists such that
[TABLE]
for all .
(3) is continuously differentiable and we have
[TABLE]
for all .
Moreover, in this case, is -Lipschitz with respect to .
Sketch of proof.
(1) $\implies$ (3). Suppose (1) holds. If , then for each there exists such that
[TABLE]
Hence, it follows from [29, Proposition 2.13] that must be continuously differentiable and is -Lipschitz (which proves the last claim of our lemma as well). To prove the inequality asserted by (3), we can reduce to the case when is smooth using an argument similar to that in [29, Proposition 2.13]. But in the smooth case, the claim follows by estimating from above and below the formula
[TABLE]
where is the Hessian defined in the standard pointwise sense.
(3) $\implies$ (2). Recall the formula
[TABLE]
This implies that
[TABLE]
This proves the upper bound, and the lower bound is symmetrical.
(2) $\implies$ (1). This follows from the characterization of convex functions by supporting hyperplanes. Indeed, is convex if and only if for every , there exists satisfying
[TABLE]
which is equivalent to the right inequality of (2), and the concavity of follows similarly. ∎
Lemma 2.3.
Suppose that for some linear transformation . Then is differentiable and we have
[TABLE]
so that in particular, .
Proof.
As in [29, Proposition 2.13], we obtain differentiability; and moreover to prove the asserted estimate, it suffices to prove the claim for smooth functions . In this case,
[TABLE]
2.2. Some Basic Lemmas
Let satisfy . Then one can check that is integrable; indeed, must achieve a minimum at some and we have and clearly is integrable. Therefore, the probability measure given by is well-defined.
Lemma 2.4.
Let satisfy for some scalars . Let be the probability measure given by and let be a random variable whose distribution is . Then
[TABLE]
and
[TABLE]
Proof.
We remark that, by Lemma 2.2, $V$ is continuously differentiable and $DV$ is Lipschitz. It follows by some straightforward estimation that is integrable with respect to , so that is well-defined. Then follows from integration by parts (see §6.2 for further context on this integration by parts).
Next, let denote the normalized gradient with respect to the matrix variable . Using integration by parts again, we get , so that
[TABLE]
On the other hand, by Lemma 2.3,
[TABLE]
Since the middle term evaluates to , the proof is complete. ∎
Lemma 2.5.
Let be a random variable in and let be Lipschitz with respect to in both the domain and target space, and let denote the corresponding Lipschitz (semi)norm. Then
[TABLE]
Proof.
Note that
[TABLE]
Corollary 2.6.
Let satisfy , let be the corresponding measure, and let . Then
[TABLE]
Proof.
We apply Lemma 2.5 to . Also, is -Lipschitz by Lemma 2.2. By Lemma 2.4 and . ∎
Lemma 2.7.
Let and be positive definite linear transformations . Let be a sequence of functions such that . Let be the associated probability measure. Let be another measure with finite mean. Suppose converges weakly to and the mean of is bounded in as . Then there exists such that and .
Proof.
Since adding a constant to does not change , we can assume without loss of generality that . Now is -Lipschitz where , hence the sequence is equicontinuous. It is also pointwise bounded in light of the previous lemma, since we assumed the mean of is bounded as . Thus, by the Arzelà-Ascoli theorem, by passing to a subsequence, we can assume that converges locally uniformly to some as . Since , this also implies that converges locally uniformly to some , which must satisfy since the family of such functions is closed under pointwise limits (which follows from the family of convex functions being closed under pointwise limits; compare [29, Proposition 2.13(1)]). Moreover, .
Let be the probability measure given by . Since is positive definite, we have for some scalar . Because is bounded in as and , we can see using the dominated convergence theorem that as . It follows again from dominated convergence that for every continuous compactly supported . Hence, , so is given by the potential . ∎
2.3. Log-Sobolev Inequality and Concentration
Log-concave matrix models exhibit concentration of measure as $N \to \infty$ as a consequence of the following classical inequalities.
Definition 2.8.
We say that a measure on satisfies the log-Sobolev inequality with constant if for all sufficiently smooth ,
[TABLE]
Definition 2.9.
We say that a measure on satisfies Herbst’s concentration inequality with constant if for all Lipschitz functions and , we have and
[TABLE]
where is a random variable distributed according to . Note that by symmetry this implies
[TABLE]
The following theorem is now standard. See for instance [1, §2.3.3 and 4.4.2] and [5]. To summarize the history, the log-Sobolev inequality was introduced by Gross [20]. In the theorem below, (1) is due to Bakry and Émery and (2) is due to unpublished work of Herbst. The application to random matrices was introduced by Guionnet and Zeitouni [24].
Theorem 2.10.
(1) Suppose that is a probability measure on satisfying and suppose that is convex. Then satisfies the log-Sobolev inequality with constant .
(2) If satisfies the log-Sobolev inequality with constant , then it satisfies Herbst’s concentration inequality with constant .
In particular, we have the following consequences for random matrices. Here we use the gradient and Hessian with respect to the normalized inner product .
Corollary 2.11.
Suppose that satisfies and let . Then satisfies the normalized log-Sobolev inequality
[TABLE]
and hence also satisfies the normalized Herbst concentration inequality
[TABLE]
where is Lipschitz and denotes the Lipschitz norm with respect to .
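The passage from the unnormalized to the normalized inequalities is a change of units, which can be sketched as follows. The block assumes a particular normalization that may differ from the one used in Definitions 2.8 and 2.9: a measure $e^{-W}\,dx$ whose Euclidean Hessian satisfies $\nabla^2 W \geq \kappa$ obeys $P(f(X) - E f(X) \geq \delta) \leq \exp(-\kappa \delta^2 / 2 L_{\mathrm{Tr}}^2)$ for $f$ Lipschitz with constant $L_{\mathrm{Tr}}$ in the Euclidean norm $\|\cdot\|_{\mathrm{Tr}}$. The symbols $W$, $\kappa$, $L_{\mathrm{Tr}}$, $L_2$ are introduced here for illustration, and the potential is taken to be $W = N^2 V$ with $HV \geq c$.

```latex
% Assumed normalization of Herbst's inequality (see lead-in); notation ours.
%   ||x||_{Tr}^2 = \sum_j Tr(x_j^2),   ||x||_2^2 = \sum_j \tau_N(x_j^2).
\begin{align*}
\|x\|_{\mathrm{Tr}}^2 &= N\, \|x\|_2^2, \qquad
\nabla^2 W = \tfrac{1}{N}\, HW \quad \text{(as operators)}, \\
HV \ge c &\;\Longrightarrow\; \nabla^2 (N^2 V) = N \cdot HV \ge Nc =: \kappa, \\
|f(x) - f(y)| \le L_2\, \|x - y\|_2 = \frac{L_2}{\sqrt{N}}\, \|x - y\|_{\mathrm{Tr}}
  &\;\Longrightarrow\; L_{\mathrm{Tr}} = \frac{L_2}{\sqrt{N}}, \\
P\bigl(f(X) - E f(X) \ge \delta\bigr)
  &\le \exp\Bigl(-\frac{\kappa\, \delta^2}{2 L_{\mathrm{Tr}}^2}\Bigr)
   = \exp\Bigl(-\frac{c\, N^2 \delta^2}{2 L_2^2}\Bigr).
\end{align*}
```

Under these assumptions the exponent scales like $N^2$, which matches the remark in §2 that $N^2$ is the scale for the standard concentration estimates.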
Lemma 2.12.
Suppose that is a probability measure on satisfying (2.5) for some constant . Let be Lipschitz with respect to . Then we have
[TABLE]
where and where is a universal constant (independent of and ).
Proof.
First, observe that for . In particular, is -Lipschitz with respect to , and thus
[TABLE]
which implies after a change of variables for that
[TABLE]
Therefore, it suffices to show that for some constant , we have
[TABLE]
We may assume without loss of generality that is self-adjoint since in the general case, , and each of the terms on the right hand side is Lipschitz. Thus, the self-adjoint case would imply the non-self-adjoint case at the cost of doubling the constant . Now to prove the self-adjoint case, we use an “$\epsilon$-net argument” that is well known in random matrix theory (see [47, §2.3.1]). Fix . Let be a maximal collection of unit vectors in such that for all . Since this collection is maximal, for every unit vector , there exists some with . Now if , then there is a unit vector with . We may then choose with
[TABLE]
so that
[TABLE]
Note that the balls in are disjoint and contained in . Hence, we can estimate the number of vectors by
[TABLE]
Let . For a matrix , we have
[TABLE]
This implies that is -Lipschitz with respect to and hence
[TABLE]
Since , we have
[TABLE]
Thus, for any , we have
[TABLE]
Now substitute and obtain (2.7) with
[TABLE]
(In fact, for a fixed , we may use in the self-adjoint case.) ∎
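The conclusion of Lemma 2.12 concerns operator norms of Lipschitz functions of a random matrix. As a sanity check in the simplest case (the identity function applied to a GUE matrix), the sketch below estimates the operator norm by power iteration. This is not the net argument of the proof; the helper names and the GUE normalization are assumptions made for the example.

```python
import math
import random

def gue(n, rng):
    """Sample an n x n GUE matrix, normalized so that E[tr_n(X^2)] = 1 (assumed convention)."""
    x = [[0j] * n for _ in range(n)]
    s = math.sqrt(1.0 / (2.0 * n))
    for i in range(n):
        x[i][i] = complex(rng.gauss(0.0, math.sqrt(1.0 / n)), 0.0)
        for j in range(i + 1, n):
            z = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
            x[i][j] = z
            x[j][i] = z.conjugate()
    return x

def matvec(x, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in x]

def op_norm(x, iters=200, rng=None):
    """Estimate the operator norm of a Hermitian matrix by power iteration.

    Each estimate is the norm of X applied to a unit vector, hence a lower
    bound for the operator norm that improves as the iteration proceeds.
    """
    rng = rng or random.Random(1)
    v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(len(x))]
    est = 0.0
    for _ in range(iters):
        length = math.sqrt(sum(abs(c) ** 2 for c in v))
        v = matvec(x, [c / length for c in v])
        est = math.sqrt(sum(abs(c) ** 2 for c in v))
    return est

estimate = op_norm(gue(40, random.Random(0)))
print(round(estimate, 3))
```

With this normalization the estimate should be close to 2, the edge of the semicircle law, with small fluctuations.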
3. Functional Calculus and Asymptotic Approximation
In this section, we review the algebra of trace polynomials in self-adjoint variables , …, , as well as a certain completed quotient of this algebra. The elements of represent functions that can be applied to any tuple of self-adjoint non-commutative random variables in an -embeddable tracial -algebra, and application of these functions will produce every element of (see Proposition 3.14). These functions are closed under certain algebraic and composition operations. Moreover, they are a natural tool to describe the large limit of functions on , which we will apply in the rest of the paper.
3.1. The Algebra of Trace Polynomials
Trace polynomials have been used by several previous authors in the study of deterministic and random matrices; a brief list is [44], [45], [43], [46], [10], [15] (which coined the term “trace polynomial”), [30], [31], [14], but they are also used implicitly in many other works. We use the same notation as in our previous paper [29].
We denote by the -algebra of polynomials in self-adjoint non-commuting variables , …, .
We denote by the -algebra of scalar-valued trace polynomials. A formal definition is given in [29]; in short, it is the tensor algebra of the vector space of non-commutative polynomials modulo cyclic symmetry. Informally, this is the commutative -algebra generated by functions of the form , where is a non-commutative polynomial in and is a formal symbol (which stands in for a normalized trace on a von Neumann algebra), where , and where we identify with for all polynomials and . Thus, is spanned as a vector space by elements of the form where , …, .
We denote by the -algebra of operator-valued trace polynomials. This is the -algebra given formally as . As a vector space, it is spanned by elements of the form , where , …, and are in . More generally, we would denote , but these spaces will not be needed in this paper.
The degree of a trace polynomial is defined as one would expect; see [29, §3.1] for precise explanation.
Suppose that , …, are self-adjoint elements of a tracial von Neumann algebra . Then elements of , , and can be evaluated on and by substituting the operator and the trace in place of the formal symbols and . More precisely, the evaluation map is the unique -algebra homomorphism that sends to . Similarly, the evaluation map is the unique -algebra homomorphism that sends to . Finally, the evaluation map is , that is,
[TABLE]
For the most part, we will abuse notation and denote when , and similarly for or . Note in particular that we can consider and thus is defined for and or .
These evaluation maps thus allow us to view as a function (or rather a family of functions) for every tracial -algebra and in particular for every . Similarly, every defines a function for every tracial -algebra and in particular a function for every .
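To make the evaluation maps concrete, here is a minimal sketch that evaluates one operator-valued trace polynomial on a pair of 2 x 2 matrices, with the normalized trace substituted for the formal trace symbol. The particular polynomial, tr(X1^2) X2 + tr(X1) X1 X2, and the helper names are chosen only for illustration.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def tr(a):
    """Normalized trace, the value substituted for the formal trace symbol."""
    return sum(a[i][i] for i in range(len(a))) / len(a)

def scale(c, a):
    """Scalar multiple of a matrix."""
    return [[c * v for v in row] for row in a]

def add(a, b):
    """Entrywise sum of two matrices."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def f(x1, x2):
    """Evaluate the trace polynomial tr(X1^2) X2 + tr(X1) X1 X2 at (x1, x2)."""
    return add(scale(tr(matmul(x1, x1)), x2), scale(tr(x1), matmul(x1, x2)))

x1 = [[1.0, 0.0], [0.0, 3.0]]
x2 = [[0.0, 1.0], [1.0, 0.0]]
value = f(x1, x2)
print(value)  # [[0.0, 7.0], [11.0, 0.0]]

# Cyclic symmetry of the trace, which is built into the formal definition.
assert tr(matmul(x1, x2)) == tr(matmul(x2, x1))
```

The same function `f` can be applied to matrices of any size, which is the sense in which a trace polynomial defines a family of functions, one for each N.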
3.2. Functions Approximable by Trace Polynomials
From an analytic viewpoint, we prefer to work with certain separation-completions of and . In [29, §8.1], we sketched several equivalent ways of defining these separation-completions. Here we emphasize their description as functions that can be evaluated on any self-adjoint tuple in (or, as we will see, any -embeddable -algebra).
Let denote the hyperfinite factor (tracial -algebra with trivial center) and let be its (tracial -algebra) ultrapower with respect to some fixed free ultrafilter .
Consider the case of first. Let denote the space of functions that are bounded on operator norm balls, equipped with the family of semi-norms
[TABLE]
(Here “” stands for uniform.) This is clearly a Fréchet space since the topology is given by the countable family of semi-norms given by taking (for background on Fréchet spaces, see e.g. [19, §5.4]). Every defines a function that is bounded on operator norm balls. In other words, evaluation produces a map . We denote by the closure of the image of this map in . That is, is the space of functions that can be approximated uniformly on operator-norm balls by trace polynomials.
Remark 3.1.
This space was denoted as in our earlier paper [29]. The notation is slightly abusive since we have not shown that the map is injective (and perhaps it is not). However, we will still use the notation since it indicates the connection with trace polynomials.
Earlier, we saw that it makes sense to evaluate a trace polynomial on any self-adjoint tuple in a tracial von Neumann algebra. In fact, makes sense for every when , …, come from a tracial von Neumann algebra that embeds into . To see this, suppose admits a normal trace-preserving embedding . Then we define . This is independent of the choice of trace-preserving embedding if is a trace polynomial, and hence it must also be independent of the choice of embedding when is in .
A similar separation-completion can be defined for . Indeed, let be the set of functions such that
[TABLE]
is finite for each . Again, this is a Fréchet space. Through the evaluation map, every trace polynomial defines an element of and hence there is a linear map . We define to be the closure of the image of this map in .
Similar to the scalar-valued case, we can define evaluation of for tuples in an -embeddable tracial -algebra by using any trace preserving embedding . Indeed, let . Clearly, for , we have where the latter is defined by extending to a map . Since this holds for , then by taking limits, we have for all . Therefore, we may define by . Then one can check this is independent of the choice of embedding similarly as we did in the case of .
Remark 3.2.
Because the spaces used here are non-standard, let us briefly describe their relationship to other more familiar ideas. Recall that denotes the space of non-commutative laws of -tuples with operator norms bounded by . We denote by the subspace of laws that can be realized by -tuples in , and . Then we showed in [29, Lemma 8.2] that consists of functions such that the restriction to is continuous for each . One could think of this alternatively as an inverse limit of over the directed system of restriction maps for .
Remark 3.3.
The spaces and also arise naturally in the study of model theory of tracial von Neumann algebras introduced in [16, 17, 18]. To avoid some of the technical complexities of sorts, we follow the definitions in [17] where the language has multiple domains of quantification for each sort (and thus we can get away with fewer sorts), and in which formulas are obtained by applying continuous functions to atomic formulas (rather than functions defined on some compact set). For a tracial von Neumann algebra , the language includes (though this list is not exhaustive) a sort representing with domains of quantification for each operator norm ball of radius , a special relation-like symbol for the distance , a relation symbol for the trace , and function symbols for the adjoint, addition, and multiplication.
Now is an example of an atomic formula (or strictly speaking, its real and imaginary parts are basic formulas). Similarly, is an atomic formula, where . Since the elements of are obtained by multiplying formulas such as , we see that is a quantifier-free formula for every . Moreover, the supremum of over is the same as the supremum of over . The limiting objects (evaluated on the real parts of operators) are thus uniform limits of quantifier-free formulas on each domain of quantification for every -embeddable tracial von Neumann algebra, that is, they are “quantifier-free definable predicates” relative to the theory of -embeddable tracial von Neumann algebras. Conversely, since is closed under the operation for continuous, every quantifier-free definable predicate satisfying is an element of .
The elements of , evaluated on the real parts of operators, may be viewed similarly as certain “quantifier-free definable functions” relative to the theory of -embeddable tracial von Neumann algebras, meaning that is a quantifier-free definable predicate — actually, for technical reasons a definable function is required to map an operator norm ball into an operator norm ball, so the last statement only applies if we assume our function has this property (but it turns out that such functions exist in abundance in ; see Proposition 3.14 and Proposition 3.17). Alternatively, in order to deal with functions with codomain , we must first modify the language by adding another sort for , with domains of quantification corresponding to -balls, which will act as the target space of the functions in .
The quantifier-free nature of these formulas is a model-theoretic heuristic for why they behave well under limits in non-commutative law (hence describing the large limits of random matrix models). In fact, [29, Proposition 6.28] re-expresses a formula given by quantifiers in a quantifier-free way in order to get behavior under limits. There, we studied the inf-convolution for self-adjoint tuples and . If , then for each ,
[TABLE]
is a formula in the language of tracial von Neumann algebras whose definition involves the quantifier . But if is convex and semi-concave and , then the self-adjoint tuple where the infimum
[TABLE]
is achieved can be evaluated as the limit of a fixed-point iteration using functions from , and hence for some (see [29, Proposition 6.28]). Moreover, it follows from the results in [29] that is Lipschitz in , and thus in light of Proposition 3.17 below, is bounded in operator norm on operator norm balls. So is a quantifier-free definable function. We can also conclude that as uniformly on operator norm balls, so is a definable formula (allowing quantifiers). But then because
[TABLE]
we conclude that is in fact a quantifier-free definable predicate.
On the other hand, without the ability to eliminate the quantifier like this, we could not hope for to behave so well for the large limit of random matrix models. Indeed, for to depend continuously on the non-commutative law for in each operator norm ball, it must be in by the last remark, and hence it is a quantifier-free definable predicate.
Many of the properties shown in the next section about operations on and are natural from the model theoretic viewpoint, but we sketch self-contained justifications nonetheless.
3.3. Asymptotic Approximation for Functions of Matrices
Our earlier work introduced asymptotic approximability by trace polynomials for a sequence of functions on , which is a precise description of good asymptotic behavior as suitable for free probabilistic analysis in the limit.
Definition 3.4.
Let . We say that is asymptotically approximable by trace polynomials if for every and , there exists such that
[TABLE]
Similarly, for matrix-valued functions , we say that is asymptotically approximable by trace polynomials if for every and , there exists such that
[TABLE]
It will be convenient to denote
[TABLE]
in the scalar-valued case and similarly for the matrix-valued case with rather than . Thus, for instance, the preceding definition says that there exists a trace polynomial with
[TABLE]
Moreover, it is implicit from our discussion in [29, §8.1] that if is asymptotically approximable by trace polynomials, then it will be asymptotic to some or in the following sense.
Definition 3.5.
Let or respectively, and let or respectively. Then we say that is asymptotic to , or if for every ,
[TABLE]
Similarly, if and , we make the same definitions with replaced by .
Lemma 3.6.
Let (respectively, ). Then is asymptotically approximable by trace polynomials if and only if there exists (respectively, ) such that . Moreover, for each .
Proof.
We record the proof only for the case of scalar-valued functions, since the proof for operator-valued case is identical with minor changes of notation. Suppose that is asymptotically approximable by trace polynomials. Then there exists a sequence of trace polynomials such that for every ,
[TABLE]
As in [29, Lemma 8.1], if , then
[TABLE]
which implies that
[TABLE]
Applying this to , we obtain from the triangle inequality
[TABLE]
and hence is Cauchy with respect to for each . Hence, converges to some . By similar use of the triangle inequality,
[TABLE]
Hence, .
Conversely, suppose that . Choose such that for every . Then
[TABLE]
Hence, it follows that is asymptotically approximable by trace polynomials, namely the polynomials .
We leave the proof of the last claim that to the reader. ∎
Remark 3.7.
If and is asymptotically approximable by trace polynomials, then we can asymptotically approximate it using self-adjoint trace polynomials. Indeed, if
[TABLE]
then the same holds with replaced by . Similarly, if is self-adjoint and , then must be self-adjoint.
Remark 3.8.
Definitions 3.4 and 3.5 and Lemma 3.6 extend naturally to tuples and . We shall apply them to tuples without further comment in the rest of the paper.
3.4. Algebra, Composition, and Limits
Lemma 3.9.
 is an algebra and is a module over . Also, if , then . Moreover, suppose that and are asymptotically approximable, and , , , and . Then we have
[TABLE]
Proof.
Since the proofs of all the statements are straightforward and similar to each other, we will only explain how to show that if and , then and that if and , then .
First, note that is well-defined as a function on by multiplying the scalar times the vector for each , and also clearly . To show that , it suffices to show that for every and , the function can be approximated by an element of with respect to with error less than . We first choose such that
[TABLE]
Then we choose such that
[TABLE]
and we conclude with the routine observation that
[TABLE]
Next, to show , first observe that
[TABLE]
Then
[TABLE]
which implies that . ∎
In addition to their algebraic structure, functions given by trace polynomials are closed under composition. It turns out that self-adjoint tuples from are closed under composition under the assumption of -uniform continuity of the “outside” function (Lemma 3.12 below).
We say that is -uniformly continuous if for every , there exists such that
[TABLE]
Furthermore, we say is -Lipschitz if for some constant , which is an important special case of uniform continuity. We denote the minimum such constant by . We make the analogous definitions for .
Observation 3.10.
If is a function from to or that is -uniformly continuous, then it has a unique continuous extension to , which is also -uniformly continuous. Similarly, if is Lipschitz on , then the extension is also Lipschitz.
Lemma 3.11.
Suppose that or and that . If is -uniformly continuous with respect to some modulus of continuity independent of , then is -uniformly continuous on with the same modulus of continuity.
Proof.
Let us only explain the operator-valued case where is -valued and , since the scalar-valued case is easier. We define scalar-valued functions of variables by and . By Lemma 3.9, we have .
Let be a common modulus of continuity for . Let and . Then we may embed into , that is the tracial -ultraproduct of matrices. There exist tuples and of matrices such that and in the ultraproduct and also and . Observe that
[TABLE]
(This equality holds for trace polynomials and hence holds for all functions in by approximation.) On the other hand, we also have for that
[TABLE]
Therefore,
[TABLE]
since . ∎
Lemma 3.12.
Let or . Let be -uniformly continuous and let .
(1) Then is a well-defined function on , and it is in .
(2) If is also -uniformly continuous, then so is .
(3) Suppose is a function on and such that and . Also, suppose that is -uniformly continuous with the modulus of continuity also uniform in . Then .
Proof.
(1) Because extends to a function on , we can define . Now let us show . Choose and . By uniform continuity of , there exists a such that implies or (for or respectively). Now choose such that , and hence
[TABLE]
Because is a trace polynomial, there is some such that implies . Choose with , and hence
[TABLE]
Then altogether we have .
(2) This is immediate.
(3) This is similar to the proof of (1). Fix and . Choose such that implies or and such that the same holds for as well. Let such that . Note that for sufficiently large , we have and hence
[TABLE]
Then let and be as in (1). Then for sufficiently large , we have
[TABLE]
so overall
[TABLE]
so that for large enough . ∎
Moreover, asymptotically approximable sequences are closed under limits in an appropriate sense.
Lemma 3.13.
Let or to for and . Suppose that in for each , and that
[TABLE]
Then converges in to some , and we have .
Proof.
Note that
[TABLE]
Then because of our assumption (3.1), we see that is Cauchy with respect to for each . Thus, converges to some . Then to show that is a routine argument. ∎
3.5. Functional Calculus and Operator Norm Bounds
Now we will show that every element of can be expressed as for some . In fact, we can arrange that can be approximated uniformly by Lipschitz functions. It will be convenient to define the uniform norm
[TABLE]
and we make the same definition for where the supremum is instead taken over .
Proposition 3.14.
Let , …, be self-adjoint variables which generate a tracial -algebra that is embeddable into . Let .
(1) There exists a -uniformly continuous such that .
(2) The in (1) can be chosen so that there are -Lipschitz functions such that .
(3) If , then can be chosen to be -Lipschitz.
We use the following auxiliary observation. Here will denote the space of non-commutative laws for an -tuple of operators with operator norm . We equip with the topology of convergence in moments. Recall that is compact, separable, and metrizable. In [29, Lemma 8.2], we noted the relationship between and continuous functions on for each . This same idea motivates the proof of the next lemma.
Lemma 3.15.
Let and let be a neighborhood of , and let . Then there exists a trace polynomial such that
[TABLE]
Proof.
By Urysohn’s lemma, there exists a continuous function such that and for . The functions of the form for form a self-adjoint algebra in , and they separate points because by definition two laws are the same if they agree on every non-commutative polynomial. So by the Stone-Weierstrass theorem, this algebra is dense in . In particular, there exists a trace polynomial such that for all . Then let . ∎
We will also use the following smooth cut-off trick.
Lemma 3.16.
Let . Let such that for and . For , define where is applied through functional calculus. Then
(1) if .
(2) for all .
(3) .
(4) is globally -Lipschitz.
Proof.
(1) and (2) follow from the properties of functional calculus. To prove (3), note by the Weierstrass approximation theorem that for every , there is a polynomial such that for . This implies as with (1) that for all with . Claim (4) follows from the results of [41]; the argument is explained in [29, (8.9) and Proposition 8.8]. ∎
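The cut-off construction can be illustrated in the smallest nontrivial case. The sketch below applies a scalar function to a real symmetric 2 x 2 matrix through functional calculus, using the closed-form spectral decomposition. As a stand-in for the smooth cut-off of the lemma we use a plain clamp, which is Lipschitz but not smooth; this substitution, the radius 2, and the helper names are assumptions of the example.

```python
import math

def eig_sym2(x):
    """Eigenvalues (lo, hi) of a real symmetric 2 x 2 matrix [[a, b], [b, d]]."""
    a, b, d = x[0][0], x[0][1], x[1][1]
    mid = (a + d) / 2.0
    rad = math.hypot((a - d) / 2.0, b)
    return mid - rad, mid + rad

def apply_function(phi, x):
    """phi(X) by functional calculus: phi(lo) * P_lo + phi(hi) * P_hi."""
    lo, hi = eig_sym2(x)
    if hi == lo:  # X is a multiple of the identity
        return [[phi(lo), 0.0], [0.0, phi(lo)]]
    # Spectral projection onto the eigenspace of hi: (X - lo * I) / (hi - lo).
    p_hi = [[(x[i][j] - (lo if i == j else 0.0)) / (hi - lo) for j in range(2)]
            for i in range(2)]
    return [[phi(hi) * p_hi[i][j] + phi(lo) * ((1.0 if i == j else 0.0) - p_hi[i][j])
             for j in range(2)] for i in range(2)]

def clamp(r):
    """Non-smooth stand-in for the cut-off: the identity on [-r, r], constant outside."""
    return lambda t: max(-r, min(r, t))

small = [[1.0, 0.5], [0.5, -1.0]]  # operator norm about 1.118, inside the ball of radius 2
big = [[3.0, 4.0], [4.0, -3.0]]    # eigenvalues -5 and 5, outside the ball

same = apply_function(clamp(2.0), small)  # agrees with `small`, as in property (1)
cut = apply_function(clamp(2.0), big)     # eigenvalues clamped to -2 and 2, as in property (2)
print(same)
print(cut)
```

Here `cut` equals (2/5) times `big`, since both eigenvalues of `big` are clamped to radius 2 while the eigenvectors are unchanged.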
Proof of Proposition 3.14.
Let be the law of , and let . Since , there exist non-commutative polynomials such that and hence for ,
[TABLE]
By scaling, we may assume without loss of generality that and set , and then the above statement also holds for . Now let
[TABLE]
which is a neighborhood of in . By the previous lemma, there exists a scalar-valued trace polynomial such that and
[TABLE]
(We can assume without loss of generality that .) Now the function will evaluate at the point to . If with and if the law of is in , then we will have
[TABLE]
On the other hand, if the law of is not in , then . Overall, we have
[TABLE]
This implies that converges with respect to for our given choice of , and of course evaluating this function on it produces the desired operator since .
To extend the function to be globally defined on , we use the smooth cut-off trick. Let such that for and . For , let . Then because it is the composition of a trace polynomial with a function that is uniformly bounded in operator norm.
Also, since is globally -Lipschitz and since is -Lipschitz on the operator norm ball of radius , we see that is globally Lipschitz in . For all ,
[TABLE]
Therefore,
[TABLE]
converges, and clearly since each of the individual terms is. Furthermore, -uniform continuity of each term and the uniform convergence of the series imply uniform continuity of . Since , we have and , so that
[TABLE]
This concludes the proof of (1).
To verify (2), we take to be the th partial sum of the series defining ; we have shown that the individual terms are -Lipschitz, hence so are the partial sums. Finally, to prove (3), note that if , then also equals where , and by the same reasoning as above is globally -Lipschitz. ∎
We have shown that every element of has the form for some . On the other hand, we will prove that if is Lipschitz, then is actually bounded in operator norm. We state our estimate in terms of unitarily invariant random matrix models which satisfy concentration (2.5), but as explained in Remark 3.18 such models exist whenever is embeddable into .
Proposition 3.17.
Let be a tuple of self-adjoint variables in a -algebra whose non-commutative law is . Suppose there is a sequence of probability measures on , invariant under unitary conjugation, that satisfies the concentration estimate (2.5) for some constant , and such that the corresponding random variables satisfy in probability. Then is embeddable into . Moreover, if is -Lipschitz, then is a bounded operator and
[TABLE]
where is a universal constant.
Proof.
In light of Lemma 2.12,
[TABLE]
and
[TABLE]
Also, the non-commutative law of converges in probability to that of and finally in probability as a consequence of concentration. Therefore, we may choose a sequence of elements such that
[TABLE]
Because by unitary invariance and because of concentration, must converge to since converges to the in probability. So overall in operator norm. In particular,
[TABLE]
and hence is bounded as . Moreover, our choice of also satisfies
[TABLE]
since again by unitary invariance.
Fix a free ultrafilter and let be the tracial -ultraproduct of the sequence of matrix algebras. Since is bounded in operator norm, defines an element of . By definition of ultraproducts, for every non-commutative polynomial and therefore the non-commutative law of is (which is the same as that of ). In particular, embeds into and hence also into . (Compare [22, Theorem 4.4].)
Since is -embeddable, is well-defined, and clearly . Now we claim that is given by the sequence as an element of (that is, application of commutes with ultralimits). It is easy to check that when . But for any , there exists with . Thus, and also for sufficiently large . This implies that . Thus, as claimed. The same holds with replaced by . This implies
[TABLE]
Remark 3.18.
Suppose that is embeddable into . Then there exist tuples in such that and . Let be an random Haar unitary matrix and let . Clearly, the probability distribution of is unitarily invariant and also in probability.
To check concentration, observe that is a -Lipschitz function from the unitary group to with respect to . Therefore, if is Lipschitz, then is also Lipschitz, with the Lipschitz constant . It was proved in [35, Theorem 15], [34, Theorem 5.16] that the Haar measure on the unitary group satisfies the (non-normalized) log-Sobolev inequality with constant and the corresponding concentration of measure for Lipschitz functions with respect to the Hilbert-Schmidt metric . After renormalization this implies that the Haar measure on the unitary group satisfies (2.5) with . Hence, satisfies (2.5) with .
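The unitarily invariant model in Remark 3.18 can be sketched numerically. Below, a Haar-distributed unitary is sampled by applying the Gram-Schmidt process to the columns of a complex Gaussian matrix (a standard construction), and we check that conjugating a fixed self-adjoint matrix leaves its normalized trace moments unchanged. The matrix size, the test matrix, and the helper names are illustrative assumptions.

```python
import math
import random

def haar_unitary(n, rng):
    """Haar unitary via (modified) Gram-Schmidt on the columns of a complex Gaussian matrix."""
    cols = []
    for _ in range(n):
        v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
        for u in cols:
            inner = sum(a.conjugate() * b for a, b in zip(u, v))
            v = [b - inner * a for a, b in zip(u, v)]
        length = math.sqrt(sum(abs(c) ** 2 for c in v))
        cols.append([c / length for c in v])
    # cols[j][i] is the (i, j) entry of the unitary.
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def tr(a):
    """Normalized trace."""
    return sum(a[i][i] for i in range(len(a))) / len(a)

n = 4
rng = random.Random(0)
u = haar_unitary(n, rng)
x = [[complex(i + 1, 0.0) if i == j else 0j for j in range(n)] for i in range(n)]
y = matmul(matmul(u, x), dagger(u))  # the conjugated matrix U X U*

third_moment_x = tr(matmul(matmul(x, x), x))  # (1 + 8 + 27 + 64) / 4 = 25
third_moment_y = tr(matmul(matmul(y, y), y))
print(third_moment_x, third_moment_y)
```

Since all normalized trace moments are preserved, the conjugated matrix has the same non-commutative law as the original one, while its distribution is invariant under unitary conjugation by construction.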
4. Tools for Differential Equations in
This section describes two analytic operations (solution of ODEs and convolution with the Gaussian law) that can be performed on tuples in and on asymptotically approximable sequences of functions on matrices. These operations were applied in [29], and will be applied in the remainder of this paper, to analyze the large limit of certain PDEs associated to random matrix models, and hence to understand the behavior of convex matrix models in the large limit.
4.1. Flows Along Vector Fields
Several times in our study of partial differential equations, we will use flows along vector fields given by functions in and by asymptotically approximable sequences of functions on matrices. For instance, this idea was used in [29, Lemma 4.10], and in this paper, it will be used in the proof of Lemma 5.13 and Theorem 7.11.
The setup is roughly speaking as follows. Consider a time interval . Let be a function such that is a tuple of functions in for each (satisfying certain uniform continuity assumptions). Also, let . Then we would like to construct such that
[TABLE]
Moreover, we would like to show that if is a function on that is asymptotic to and , then the solutions are asymptotic to the solution .
Such a proof was essentially carried out in [29, Lemma 4.10], but now we introduce the added complexity that will depend on , , and an auxiliary parameter , and we must solve the initial value problem
[TABLE]
The added parameter arises naturally in our analysis of conditional expectation, entropy, and transport since it represents the variables we are conditioning upon (see for instance §5.3).
For the sake of future reference, let us state the set of assumptions we make about the vector field . These assumptions are framed for a convenient and applicable level of generality rather than maximum generality.
Assumption 4.1.
We are given and a function satisfying:
(1) For each , we have .
(2) is -Lipschitz in , that is, for some constant independent of , we have
[TABLE]
(3) The map is a continuous function with respect to the Fréchet topology on . This implies that for every and for every , there exists , such that
[TABLE]
(where we have upgraded from continuity to uniform continuity because of compactness of ).
Observation 4.2.
Under this assumption, as in Observation 3.10, we see that has a unique continuous extension to . Furthermore, for each , the function is continuous (though the modulus of continuity cannot be chosen independent of ). Continuity follows because there exists a sequence such that in . Now is continuous by assumption (3), but assumption (2) implies that uniformly on .
Under these assumptions, (4.1) can be solved by the standard method of Picard iteration. We first verify that Assumption 4.1 is preserved under the composition and integration operations used to define Picard iterates.
Lemma 4.3.
Suppose that satisfies Assumption 4.1 and suppose that is globally -Lipschitz. Then the function
[TABLE]
is well-defined by Riemann integration and it also satisfies Assumption 4.1.
Proof.
The Riemann integral is defined because is continuous with respect to for each (and in fact, each ). Now let us check that satisfies Assumption 4.1.
(1) Fix and . By assumption (2) for , there exists such that
[TABLE]
Fix , then choose a partition , …, of such that . Then let such that
[TABLE]
Then
[TABLE]
Therefore,
[TABLE]
This shows that is in . Because is in this space as well, this implies that is in as desired.
(2) If is -Lipschitz for all , then .
(3) Since is continuous with respect to , we must have for some constant . Then . ∎
Lemma 4.4.
Suppose that and satisfy Assumption 4.1. Then also satisfies Assumption 4.1.
Proof.
The composition makes sense because extends to be defined for . It follows from Lemma 3.12 that satisfies (1). The Lipschitz estimate (2) is straightforward and left to the reader. To prove (3), let be a Lipschitz constant for as a function of that works for all . Fix . Proceeding as in the proof of Lemma 4.3, we can choose a partition of and such that
[TABLE]
Then there exists some such that implies for all . Then by applying assumption (3) to , there exists such that
[TABLE]
We also choose such that
[TABLE]
Supposing that and , we have
[TABLE]
Meanwhile, after we pick such that , then
[TABLE]
The middle term can be estimated by because . Meanwhile, the first and third terms can each be estimated by using the Lipschitz property of and our choice of . Altogether, implies that whenever . ∎
Proposition 4.5.
Let satisfy Assumption 4.1 and let . Then there exists a unique continuous satisfying
[TABLE]
Moreover, also satisfies Assumption 4.1.
Proof.
We define the Picard iterates inductively by
[TABLE]
The previous two lemmas imply that is well-defined and satisfies Assumption 4.1. Convergence of the Picard iterates follows from the standard proof of Picard-Lindelöf. Briefly, given that is -Lipschitz in with respect to , we have
[TABLE]
Also, we have
[TABLE]
where , which is finite because of continuity of in . From here a straightforward induction on shows that for ,
[TABLE]
because . Now because converges, we know that
[TABLE]
and
[TABLE]
The fact that satisfies the integral equation is straightforward, and the proof of the uniqueness of this is also standard.
It remains to show that satisfies Assumption 4.1. First, recall that is Lipschitz in uniformly for all . If is a Lipschitz constant for this function, then
[TABLE]
In particular,
[TABLE]
This implies that the convergence of to occurs uniformly for with and all . Then because can be approximated in by trace polynomials, the same must be true for for each , which shows that satisfies (1). Similarly, because of the uniform convergence of to for and , the uniform continuity property (3) for follows from property (3) for .
Finally, we must show (2) that is Lipschitz in . More precisely, we claim that
[TABLE]
Now it suffices to check that each Picard iterate satisfies this estimate. This can be verified by induction on . The base case is immediate. For the induction step, we observe that
[TABLE]
using the fact that is -Lipschitz. Then we plug in our induction hypothesis that is bounded by , and then directly evaluate the integral to close the induction. ∎
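The Picard iteration used in this proof can be sketched on a toy example. We take the time-independent vector field H(X) = tr(X) 1 - X, which is given by a trace polynomial and is 1-Lipschitz in the normalized Hilbert-Schmidt norm, and we iterate the integral equation on a time grid with the trapezoid rule. The vector field, the grid size, the number of iterations, and the helper names are assumptions chosen for illustration; the closed-form solution is tr(x0) 1 + exp(-t) (x0 - tr(x0) 1).

```python
import math

def tr(x):
    """Normalized trace."""
    n = len(x)
    return sum(x[i][i] for i in range(n)) / n

def field(x):
    """Toy vector field H(X) = tr(X) * 1 - X."""
    n = len(x)
    t = tr(x)
    return [[(t if i == j else 0.0) - x[i][j] for j in range(n)] for i in range(n)]

def picard(x0, t_end=1.0, steps=200, iterations=25):
    """Picard iterates F_{k+1}(t) = x0 + integral_0^t H(F_k(s)) ds on a uniform grid."""
    n = len(x0)
    dt = t_end / steps
    path = [x0] * (steps + 1)  # zeroth iterate: the constant path at x0
    for _ in range(iterations):
        h = [field(p) for p in path]
        acc = [[0.0] * n for _ in range(n)]
        new_path = [x0]
        for k in range(1, steps + 1):
            # Trapezoid rule for the integral over [t_{k-1}, t_k].
            acc = [[acc[i][j] + 0.5 * dt * (h[k - 1][i][j] + h[k][i][j])
                    for j in range(n)] for i in range(n)]
            new_path.append([[x0[i][j] + acc[i][j] for j in range(n)] for i in range(n)])
        path = new_path
    return path

x0 = [[1.0, 2.0], [2.0, -3.0]]
numeric = picard(x0)[-1]  # approximate solution at time 1
c = tr(x0)
exact = [[(c if i == j else 0.0) + math.exp(-1.0) * (x0[i][j] - (c if i == j else 0.0))
          for j in range(2)] for i in range(2)]
error = max(abs(numeric[i][j] - exact[i][j]) for i in range(2) for j in range(2))
print(error)
```

The remaining error comes from the time discretization rather than from the iteration, in line with the factorial rate of convergence of the Picard iterates in the proof.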
We have now shown that it makes sense to solve ODEs for tuples in . There is a parallel list of results which instead deal with functions on matrices that are asymptotically approximable as . We use the following assumptions.
Assumption 4.6.
We are given and for each a function such that
(1) For each , there exists such that .
(2) is -Lipschitz in with some Lipschitz constant independent of and .
(3) For every and for every , there exists , such that
[TABLE]
Proposition 4.7.
Let satisfy Assumption 4.6, and let be asymptotically approximable such that and is Lipschitz uniformly in . Then for each there is a unique satisfying
[TABLE]
Moreover, also satisfies Assumption 4.1. Furthermore, the vector field such that satisfies Assumption 4.1, and we have where is the solution given by Proposition 4.5.
Proof.
The proof of existence and uniqueness of the solution is almost identical to that of Proposition 4.5. First, one shows that Assumption 4.6 is preserved under integration and composition (analogous to Lemmas 4.3 and 4.4). Then exactly as in the proof of Proposition 4.5, one defines Picard iterates, proves they converge, establishes Lipschitz bounds, and checks they satisfy Assumption 4.6. The one additional feature in these proofs is to make all the estimates uniform in . For instance, the quantity in the proof of Proposition 4.5 is replaced by
[TABLE]
Then has some Lipschitz constant independent of , and
[TABLE]
But then we can show that is finite. This is because if , then is finite because of Assumption 4.1 (3) and the fact that is asymptotically approximable and hence bounded in as .
Now the fact that satisfies Assumption 4.1 is a straightforward limiting argument. The key ingredient is that if , then .
Finally, to show that , it suffices to show that for each of the Picard iterates because of the uniform convergence of as for , where the rate of convergence is also independent of . Furthermore, since the Picard iterates are defined inductively by composition and integration, it suffices to show that the asymptotic approximation relation is preserved by these operations. Preservation under integration follows because the integrals can be approximated by Riemann sums and this approximation is uniformly good for and for all because of the uniform continuity Assumption 4.6 (3). Preservation under composition follows from Lemma 3.12. ∎
4.2. The Heat Semigroup
Recall that the solution to the classical heat equation is given by convolution with the heat kernel (which is given by a Gaussian probability density). In particular, let be the probability distribution of an -tuple of independent GUE matrices such that , which is given by density . If , then solves the normalized heat equation
[TABLE]
Here is meant in the sense of convolving a function with a measure, and this is the same as convolving of with the density function for . The meaning of is to be interpreted using coordinates with respect to some orthonormal basis of in the inner product ; this is not the same as differentiating entrywise since some of the entries are real and some are complex.
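As a sanity check on Gaussian convolution with a normalized GUE matrix, here is a minimal Python sketch (an illustration, not part of the source). It assumes a single GUE matrix $S$ normalized so that $E|S_{ij}|^2 = 1/N$, hence $E[\mathrm{tr}_N(S^2)] = 1$ with $\mathrm{tr}_N = \frac{1}{N}\mathrm{Tr}$, and it takes the test function $u(x) = \mathrm{tr}_N(x^2)$. Under these assumptions the cross term has mean zero, so $E[u(x + \sqrt{t}\,S)] = \mathrm{tr}_N(x^2) + t$. The helper names `sample_gue`, `normalized_trace_square`, and `heat_semigroup_mc` are hypothetical.

```python
import math
import random


def sample_gue(n, rng):
    """Sample an n x n GUE matrix with E|S_ij|^2 = 1/n (assumed normalization)."""
    s = [[0j] * n for _ in range(n)]
    for i in range(n):
        # Diagonal entries are real Gaussians with variance 1/n.
        s[i][i] = complex(rng.gauss(0.0, math.sqrt(1.0 / n)), 0.0)
        for j in range(i + 1, n):
            # Real and imaginary parts of off-diagonal entries have variance 1/(2n).
            sd = math.sqrt(0.5 / n)
            z = complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))
            s[i][j] = z
            s[j][i] = z.conjugate()
    return s


def normalized_trace_square(a):
    """tr_N(a^2) = (1/N) sum_ij |a_ij|^2 for a Hermitian matrix a."""
    n = len(a)
    return sum(abs(a[i][j]) ** 2 for i in range(n) for j in range(n)) / n


def heat_semigroup_mc(x, t, samples, rng):
    """Monte Carlo estimate of E[tr_N((x + sqrt(t) S)^2)]."""
    n = len(x)
    root_t = math.sqrt(t)
    total = 0.0
    for _ in range(samples):
        s = sample_gue(n, rng)
        shifted = [[x[i][j] + root_t * s[i][j] for j in range(n)] for i in range(n)]
        total += normalized_trace_square(shifted)
    return total / samples


rng = random.Random(0)
x0 = sample_gue(6, rng)  # any fixed Hermitian matrix works as the base point
t = 0.5
estimate = heat_semigroup_mc(x0, t, 4000, rng)
exact = normalized_trace_square(x0) + t
print(estimate, exact)
```

The closed form does not depend on the matrix size, which is the kind of dimension-independent behavior that the normalization is meant to produce.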
Our goal is to describe the large behavior of when is asymptotically approximable by trace polynomials, and to define “” when .
In [29, §3.2 and 3.3], using similar methods to [10], we explained the computation of as a function on when or . More precisely, let denote the Laplacian with respect to the coordinates of the matrix . We found that for there are linear maps defined purely algebraically, such that when is viewed as a function on , and do not increase the degree of a trace polynomial, and coefficient-wise.
A similar analysis holds for the Laplacian of viewed as a function . Here we follow the standard convention of using the same symbol for the Laplacians of vector-valued functions as for the Laplacians of scalar-valued functions; thus, the reader must be careful to distinguish scalar-valued and vector-valued functions based on context. We saw that there were linear transformations such that as a function on matrices, and do not increase degree, and coefficient-wise.
We deduced as a consequence that has a well-defined large limit if is a trace polynomial [29, Lemma 3.21], and that if is asymptotically approximable by trace polynomials, then so is [29, Lemma 3.28].
In order to establish “conditional versions” of our earlier results, we must consider trace polynomials in variables and take the Laplacian with respect to while treating as an auxiliary parameter. We denote by , , and the various Laplacian operators with respect to .
Because and map the finite-dimensional vector space of trace polynomials of degree into itself, there are well-defined linear operators and on the space of trace polynomials in of degree for each , and each real . Since trace polynomials are the union of the subspaces of trace polynomials with degree , there are linear operators . Moreover, these operators form a semigroup, and they satisfy the following property, which is an extension of [10, Theorem 2.4] to the spaces .
Lemma 4.8**.**
Let be a random variable in with finite moments, and let be an independent GUE random variable in . Then we have
[TABLE]
Similarly, suppose that is a tuple of self-adjoint non-commutative random variables, and let be a freely independent tuple with non-commutative law . Then
[TABLE]
and
[TABLE]
where is the unique trace-preserving conditional expectation.
Proof.
Since is independent and distributed according to , we have
[TABLE]
On the other hand, for ,
[TABLE]
because both sides are the solution to the heat equation on the space of coordinate-wise polynomials on of degree . This shows (4.2).
To prove the free versions, we assume familiarity with the results of free probability (see e.g. [55], [38], [1, Chapter 5]). Suppose that are non-commutative random variables and is a freely independent free semicircular -tuple with law . We may assume that is a free Brownian motion, so that for and . Note that is a well-defined operator on trace polynomials. To prove (4.3), it suffices to show that for . This will follow if we check that
[TABLE]
From a free probabilistic computation sketched in [29, Lemma 3.23], we have
[TABLE]
and hence
[TABLE]
Next, to prove (4.4), it suffices to show that for , we have
[TABLE]
since functions of the form for are dense in . Consider the function given by . Notice that
[TABLE]
Here the first equality is checked directly from the definition of the Laplacian [29, see Def. 3.13 and 3.16, proof of Lemma 3.18]. The equality again is checked from the definition of the Laplacian; this equality is intuitive since is independent of . Since the same reasoning may be applied to compute the Laplacian of , we have
[TABLE]
We can view as a function of the -tuple and the -tuple , that is, an element of . We apply (4.3) to and the pair and obtain
[TABLE]
which means precisely that
[TABLE]
which completes the proof of (4.4). ∎
Remark 4.9*.*
The free conditional expectation formulas (4.3) and (4.4) could also be proved using random matrices provided that is -embeddable. Indeed, let be (deterministic) tuples of matrices with non-commutative laws converging to the law of and let . Then to prove (4.3) for instance, we could use the fact that and take the limit as using Voiculescu’s theorem on asymptotic freeness [52, Theorem 2.2]. A similar proof could be done for (4.4).
Lemma 4.10**.**
If for , then we have for . In particular, extends to a unique continuous linear operator .
Proof.
Let with . Let be a freely independent semicircular tuple. If , then
[TABLE]
Since , we have . Therefore, as desired. Similarly, if , then we check using the conditional expectation formula (4.4). Now the continuous extension of to is immediate. ∎
The semigroup acting on describes the large limit of the Gaussian convolution semigroup on defined as follows.
Definition 4.11**.**
For or , we denote
[TABLE]
Moreover, we denote by the continuous extension of .
Lemma 4.12**.**
Suppose that is asymptotically approximable by trace polynomials and . Furthermore, assume that for some and , we have
[TABLE]
Then . The same holds for and with replaced by .
The proof of this lemma is the same as in [29, Lemma 3.28].
Remark 4.13*.*
In both the scalar-valued and matrix-valued cases, the assumption (4.5) holds automatically with provided that and are -uniformly continuous (with modulus of continuity independent of ). Let us focus on the matrix-valued case of , there exists such that
[TABLE]
In particular, given , we can choose an integer such that . Then we have
[TABLE]
Thus,
[TABLE]
which implies the first estimate of (4.5). The case for is handled similarly, and we note that is bounded as because of our assumption that . The same argument works in the case of scalar-valued functions and .
5. Conditional Expectation for Free Gibbs States
5.1. Free Gibbs States from Convex Potentials
In [29] and in the present work, we focus on the following situation:
Assumption 5.1**.**
We are given and such that
- (1) , that is, is convex.
- (2) , that is, is concave.
- (3) is asymptotically approximable by trace polynomials.
We denote by the probability measure on given by
[TABLE]
Furthermore, we assume that the mean is a scalar multiple of the identity matrix.
The following was proved in [29, Theorem 4.1].
Theorem 5.2**.**
Let and be as in Assumption 5.1. Then there exists a non-commutative law such that for every non-commutative polynomial , we have
[TABLE]
Moreover, we have for every and that
[TABLE]
Corollary 5.3**.**
Let and be as in Theorem 5.2. Let be a random -tuple of matrices distributed according to and let be a non-commutative random -tuple distributed according to . Let . Suppose there are constants and and such that
[TABLE]
Suppose that and where . Then
[TABLE]
Proof.
Let which we assumed to be a scalar multiple of the identity, and which we know has a limit as . By Lemma 2.12, we have
[TABLE]
In particular, letting , we have
[TABLE]
and
[TABLE]
Therefore, in order to prove convergence of the expectation, it suffices to check that converges in probability to .
We already know that converges to in probability for every non-commutative polynomial . It follows that if is a scalar-valued trace polynomial, then in probability. This also holds for ; indeed, we know that with probability tending to and , whereas can be approximated in by trace polynomials. Finally, if is a sequence of scalar-valued functions such that , then converges to $0$ in probability, and hence converges in probability to . By Lemma 3.9, we can apply this statement to and , which completes the argument. ∎
Definition 5.4**.**
Let and suppose extends to a function such that is convex and is concave. In this case, is differentiable as a function on the real Hilbert space , as a consequence of the existence of supporting hyperplanes for convex functions on a Hilbert space. If we assume also that , then we say that .
Remark 5.5*.*
We did not prove or assume that the trace polynomials which approximate are the gradients of the same trace polynomials that approximate . Thus, this definition is technically different from that of [29, §8.2].
Definition 5.6**.**
If , then we may define , and in this case . Clearly, is asymptotically approximable by trace polynomials, and so by Theorem 5.2, there exists a non-commutative law that arises as the large limit of the associated random matrix models. Furthermore, the limiting free Gibbs law only depends on , that is, every approximating sequence of functions will produce the same free Gibbs law (see [29, §8.2]). We call the free Gibbs state given by potential .
Remark 5.7*.*
One can check that if is as in Assumption 5.1, then there exists a such that and . Thus, the non-commutative laws that arise from these random matrix models are precisely for .
Remark 5.8*.*
Since is independent of the choice of approximating sequence , we can in particular take , which produces a canonical unitarily invariant sequence of random matrices models.
5.2. Main Result on Conditional Expectation
Our main result in this section is in some sense a generalization of [29, Theorem 4.1], which deals with conditional expectations rather than expectations. The proof of the earlier theorem was reduced to the following statement: Suppose satisfies Assumption 5.1 and that is -Lipschitz (uniformly in ) and asymptotically approximable by trace polynomials. Then
[TABLE]
Now, our goal is to prove the following.
Theorem 5.9**.**
Consider functions , denoted as , which satisfy Assumption 5.1 as functions of . Let be the associated probability measure on . Let be an -tuple of random matrices distributed according to , and let be a -tuple of non-commutative random variables distributed according to the limiting free Gibbs law given by Theorem 5.2.
Let be -Lipschitz (uniformly in ) and suppose . Let be the function given by
[TABLE]
which is a well-defined function because has positive density everywhere. Then is Lipschitz with
[TABLE]
Moreover, there exists such that and hence
[TABLE]
The gist of the theorem is that the conditional expectation behaves in the large limit like the -algebraic expectation . For instance, if is globally Lipschitz in , then the -algebraic conditional expectation of can be approximated by the classical conditional expectation .
In fact, we can approximate for every using classical conditional expectations in the same sense. Indeed, we showed in Proposition 3.14 that every can be expressed as where is -uniformly continuous, and there exist -Lipschitz functions such that with respect to the uniform norm . Let and be given by
[TABLE]
and the analogous relation for and . Because conditional expectation is a contraction in (for functions taking values in with ), we have
[TABLE]
By the theorem, there exists such that . Given that , a routine argument (“exchange of limits and uniform limits”) shows that there exists such that . In other words, the conclusion of Theorem 5.9 holds also for and thus can be viewed as the large limit of .
5.3. Strategy
Our proof will follow the same strategy as the special case in [29, §4]. In that paper, we showed that if and on are as in Assumption 5.1 and if is uniformly Lipschitz and asymptotically approximable by trace polynomials, then exists.
We considered the diffusion semigroup that solves the equation
[TABLE]
As mentioned in [29, §4], this diffusion semigroup has an equivalent SDE formulation, and is a standard tool in proving the log-Sobolev inequality and concentration estimates (see for instance, [32], [1, §4.4.2], [14]).
Now and . As , the function converges to the constant function at a rate independent of . On the other hand, we showed in [29, Lemma 4.10] that if and are asymptotically approximable by trace polynomials, then so is . Hence, we concluded that the sequence of constant functions is asymptotically approximable by trace polynomials, which means that the limit as exists.
Now we apply the same method in the conditional setting to prove Theorem 5.9. Let be a function satisfying Assumption 5.1. If we fix , then is a uniformly convex and semi-concave function of , so it defines a log-concave probability measure on . This produces a well-behaved conditional distribution of given , where . Explicitly, for , we have
[TABLE]
We will evaluate this conditional expectation as the limit as of , where is the semigroup, acting on Lipschitz functions of , that solves
[TABLE]
where denotes the differential (Jacobian) of as a function from to and denotes the adjoint. In §5.4, we will analyze how affects the Lipschitz norms with respect to and separately and hence show that the conditional expectation is given by a Lipschitz function of . In §5.5, we will show that preserves asymptotic approximability by trace polynomials of and conclude our argument. The new aspect compared to [29] is that the functions are matrix-valued and depend on an extra parameter .
5.4. Conditional Diffusion Semigroup
To simplify notation, let us fix and fix for the remainder of §5.4. We will denote
[TABLE]
which is a measure on depending on the parameter . The associated semigroup will be approximated by alternating two other operators and on short time intervals. Let denote the semigroup of convolution with Gaussian with respect to , that is,
[TABLE]
The semigroup is given by
[TABLE]
where is the solution to the initial value problem
[TABLE]
This solution is defined for all by the Picard-Lindelöf theorem because is globally Lipschitz in (compare §4.1).
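To see why uniform convexity makes such a flow contractive, consider the following one-dimensional Python sketch. It is an illustration only. We assume the flow has the gradient form $\partial_t W = -\tfrac{1}{2} V'(W)$ (the precise equation of the source is not reproduced here), and we pick the hypothetical potential $V(x) = \tfrac{c}{2}x^2 + \log\cosh x$, which satisfies $c \le V'' \le c + 1$. Under these assumptions two trajectories approach each other at rate at least $e^{-ct/2}$ and at most $e^{-(c+1)t/2}$.

```python
import math


def grad_v(x, c):
    """Gradient of V(x) = (c/2) x^2 + log cosh(x), which has c <= V'' <= c + 1."""
    return c * x + math.tanh(x)


def flow(x, t, c, steps=1000):
    """Integrate dW/dt = -(1/2) V'(W), W(0) = x, with the classical RK4 scheme."""
    dt = t / steps
    w = x

    def f(u):
        return -0.5 * grad_v(u, c)

    for _ in range(steps):
        k1 = f(w)
        k2 = f(w + 0.5 * dt * k1)
        k3 = f(w + 0.5 * dt * k2)
        k4 = f(w + dt * k3)
        w += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return w


c, t = 0.5, 2.0
x, y = 3.0, -1.5
gap = abs(flow(x, t, c) - flow(y, t, c))
upper = math.exp(-c * t / 2) * abs(x - y)        # from V'' >= c
lower = math.exp(-(c + 1) * t / 2) * abs(x - y)  # from V'' <= c + 1
print(lower, gap, upper)
```

Composing a Lipschitz function with such a flow therefore shrinks its Lipschitz constant by the same exponential factor, which is the mechanism behind the Lipschitz estimates for the flow semigroup.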
Proposition 5.10**.**
There exists a semigroup acting on Lipschitz functions such that the following hold:
- (1) If is a dyadic rational, let . Then as and more precisely
[TABLE]
- (2) If , we have
[TABLE]
- (3) .
- (4) .
- (5) We have as and specifically
[TABLE]
Proof.
These results follow by freezing the variable and applying the results from our previous paper, specifically,
- (1) see [29, Lemma 4.5],
- (2) see [29, Lemma 4.6],
- (3) see [29, Lemma 4.6],
- (4) see [29, Lemma 4.8],
- (5) see [29, Lemma 4.9].
The results of [29, §4] were stated only for scalar-valued functions. However, the arguments hold for functions from to any finite-dimensional normed vector space. The result (4) that is expectation-preserving follows immediately by applying the scalar-valued result to each coordinate of the vector-valued function in some basis. To verify the estimates, one simply replaces the “” in the arguments by the appropriate norm, which in our case would be on . ∎
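The approximation of the diffusion semigroup by alternating a heat step and a flow step on short dyadic intervals can be checked by hand in the simplest case. The following Python sketch is a toy illustration under our own assumptions: one real variable, $V(x) = x^2/2$, heat step given by convolution with $N(0, \delta)$, flow step given by $\partial_t W = -\tfrac{1}{2}V'(W)$, and composition order "flow, then heat" in each short interval (the order used in the source may differ). On functions $u(x) = a x^2 + b$, encoded as pairs `(a, b)`, both steps are explicit, and the limit is the Ornstein-Uhlenbeck semigroup with generator $\tfrac{1}{2}u'' - \tfrac{1}{2}x u'$.

```python
import math


def heat_step(coeffs, dt):
    """Heat step on u(x) = a x^2 + b: convolving with N(0, dt) gives a x^2 + (b + a dt)."""
    a, b = coeffs
    return (a, b + a * dt)


def flow_step(coeffs, dt):
    """Flow step for V(x) = x^2 / 2: u(exp(-dt/2) x) = a exp(-dt) x^2 + b."""
    a, b = coeffs
    return (a * math.exp(-dt), b)


def split_semigroup(coeffs, t, level):
    """Alternate flow and heat steps 2^level times with step size t / 2^level."""
    n = 2 ** level
    dt = t / n
    for _ in range(n):
        coeffs = heat_step(flow_step(coeffs, dt), dt)
    return coeffs


def exact_semigroup(coeffs, t):
    """Ornstein-Uhlenbeck semigroup with generator (1/2) u'' - (1/2) x u'."""
    a, b = coeffs
    return (a * math.exp(-t), b + a * (1.0 - math.exp(-t)))


u0 = (1.0, 0.0)  # u(x) = x^2
for level in (4, 8, 12):
    a, b = split_semigroup(u0, 1.0, level)
    print(level, abs(b - exact_semigroup(u0, 1.0)[1]))
```

In this example the error in the constant term is a Riemann-sum error of order $2^{-\ell}$, and as $t \to \infty$ the exact semigroup sends $x^2$ to the constant $1$, the second moment of the stationary Gaussian measure.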
We will next show that and depend in a Lipschitz manner upon . Let us denote
[TABLE]
Lemma 5.11**.**
With the setup above, we have for Lipschitz
- (1) and .
- (2) .
- (3) .
- (4) and .
- (5) .
- (6) .
Proof.
(1) Fix and . Define
[TABLE]
Note that is locally Lipschitz in and hence absolutely continuous. Moreover, is with
[TABLE]
Here we have employed the inequality coming from the uniform convexity of as well as the Cauchy-Schwarz inequality. This implies that
[TABLE]
Thus, , so that . This implies that
[TABLE]
But and . Hence,
[TABLE]
This proves both estimates of (1).
(2) This is immediate since , as in [29, Lemma 4.4 (5)].
(3) Note that
[TABLE]
(4) This follows from basic properties of convolution of a function with a probability measure.
(5) By iterating the estimates (2) and (4), we obtain . Then by Proposition 5.10 (2) and (3) we may take and then extend to all real values of .
(6) First, consider for a dyadic rational . Denote . For , …, , we have
[TABLE]
where the last inequality follows from (3). Therefore, by induction
[TABLE]
In light of Proposition 5.10 (1), we can take and conclude that for dyadic rational . This inequality can then be extended to all real by Proposition 5.10 (2). ∎
Corollary 5.12**.**
Let be Lipschitz with respect to . Let . Then is Lipschitz with
[TABLE]
Proof.
By the previous lemma,
[TABLE]
As , we have by Proposition 5.10 (5). Hence, . ∎
5.5. Asymptotic Approximation and Convergence
Let and be as in Theorem 5.9, let be a random variable with distribution . Let denote the conditional distribution of given .
Let , , and be the semigroups acting on Lipschitz functions defined as in §5.4 with respect to the potential .
Lemma 5.13**.**
With the notation above, suppose that , that is -Lipschitz for every , and that is asymptotically approximable by trace polynomials. Then
- (1) is asymptotically approximable by trace polynomials.
- (2) is asymptotically approximable by trace polynomials.
- (3) is asymptotically approximable by trace polynomials.
Proof.
(1) We proved in Lemma 4.12 that preserves asymptotic approximability by trace polynomials.
(2) Recall that , where
[TABLE]
Now is -Lipschitz in , asymptotically approximable by trace polynomials, and independent of , and thus it satisfies Assumption 4.6, so by Proposition 4.7, is asymptotically approximable by trace polynomials (here we rely on Lemma 3.6 that asymptotic approximability is equivalent to being asymptotic to some element of ). Then because is -Lipschitz in , Lemma 3.12 implies asymptotic approximability of .
(3) Let whenever . From (1) and (2), it follows that is asymptotically approximable by trace polynomials. Now for each dyadic , Proposition 5.10 (1) shows that uniformly on -balls (and hence on ). Therefore, by Lemma 3.13, is asymptotically approximable by trace polynomials. Then we extend this property from dyadic to all real using Proposition 5.10 (2) and Lemma 3.13. ∎
Proof of Theorem 5.9.
Let be -Lipschitz and asymptotically approximable by trace polynomials. Let
[TABLE]
We showed in Corollary 5.12 that is Lipschitz with . We know that is asymptotically approximable by trace polynomials in . By Proposition 5.10 (5), we have as , with the error bounded by
[TABLE]
Given that is asymptotically approximable by trace polynomials, is bounded as . This implies that the rate of convergence of as is uniform on and independent of . So by Lemma 3.13, is asymptotically approximable by trace polynomials of . Yet is independent of , and so we may approximate by evaluating these trace polynomials at , which reduces them to trace polynomials of .
Since is asymptotically approximable by trace polynomials, let such that . Then it remains to show that , where are non-commutative random variables for the free Gibbs law as in the theorem statement. It suffices to check that
[TABLE]
whenever is a non-commutative polynomial. But using Corollary 5.3,
[TABLE]
Remark 5.14*.*
We showed in §4.2 that has a large limit acting on . Similarly, the results of §4.1 imply that has a large limit acting on . This implies that the semigroup also has a large limit in light of Proposition 5.10 (1) and (2) and Lemma 3.13. Future research should investigate in what sense would solve the differential equation
[TABLE]
where is the large limit of and is the Jacobian matrix of with respect to the variable .
6. Conditional Entropy and Fisher’s Information
In this section, we show that for random matrix models satisfying Assumption 5.1, the conditional (classical) entropy converges to the conditional non-microstates free entropy (also known as ).
6.1. Conditional Entropy and Fisher’s Information in the Classical Setting
We refer to [54, §3] and [29, §5] for background on classical entropy and Fisher’s information and motivation for the free case. The conditional setting is more technical, and we will state several standard results without proof, since the proofs in the non-conditional case were repeated in some detail in [29].
Recall that the classical entropy of a random variable in with probability density is . Similarly, if is a random variable in with density , then the conditional entropy is defined by
[TABLE]
where is the marginal density
[TABLE]
and is the conditional density
[TABLE]
It is a standard fact that if has finite variance, then is well-defined. The proof for the non-conditional entropy was reviewed in [29, Lemma 5.1], and the conditional case can be handled similarly.
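For a concrete instance of the conditional entropy, the following Python sketch (an illustration, not from the source) treats a centered bivariate Gaussian pair $(X, Y)$ with standard deviations `sx`, `sy` and correlation `rho`, all hypothetical parameters. It assumes the standard definition $h(X|Y) = -\iint \rho_{X,Y}(x,y) \log \rho_{X|Y}(x|y)\,dx\,dy$ and compares a midpoint-rule approximation with the closed form $\tfrac{1}{2}\log\bigl(2\pi e\,\sigma_X^2(1-\rho^2)\bigr)$, which also equals $h(X,Y) - h(Y)$.

```python
import math


def joint_density(x, y, sx, sy, rho):
    """Density of a centered bivariate Gaussian with std devs sx, sy and correlation rho."""
    one_minus = 1.0 - rho ** 2
    q = (x / sx) ** 2 - 2.0 * rho * x * y / (sx * sy) + (y / sy) ** 2
    norm = 2.0 * math.pi * sx * sy * math.sqrt(one_minus)
    return math.exp(-q / (2.0 * one_minus)) / norm


def marginal_density_y(y, sy):
    """Density of the marginal Y ~ N(0, sy^2)."""
    return math.exp(-0.5 * (y / sy) ** 2) / (sy * math.sqrt(2.0 * math.pi))


def conditional_entropy_numeric(sx, sy, rho, half_width=8.0, n=400):
    """Midpoint-rule approximation of -int int rho(x,y) log rho(x|y) dx dy."""
    hx = 2.0 * half_width * sx / n
    hy = 2.0 * half_width * sy / n
    total = 0.0
    for i in range(n):
        x = -half_width * sx + (i + 0.5) * hx
        for j in range(n):
            y = -half_width * sy + (j + 0.5) * hy
            p = joint_density(x, y, sx, sy, rho)
            if p > 0.0:  # guard against underflow far in the tails
                total -= p * math.log(p / marginal_density_y(y, sy)) * hx * hy
    return total


def conditional_entropy_exact(sx, rho):
    """Closed form (1/2) log(2 pi e sx^2 (1 - rho^2))."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sx ** 2 * (1.0 - rho ** 2))


numeric = conditional_entropy_numeric(1.2, 0.7, 0.6)
exact = conditional_entropy_exact(1.2, 0.6)
print(numeric, exact)
```

Note that the answer does not depend on `sy`: conditioning on $Y$ only removes the part of the variance of $X$ explained by $Y$.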
The conditional Fisher information is given by
[TABLE]
whenever the right hand side makes sense and otherwise. It describes the rate of change of , where is a Gaussian random variable in with covariance matrix independent of . Knowing that the density satisfies the heat equation
[TABLE]
one can show that is well-defined and finite for and that
[TABLE]
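The relation between entropy and Fisher information along Gaussian perturbations can be checked numerically in one dimension. The sketch below is an illustration in the unconditional, unnormalized classical setting, and it assumes the classical de Bruijn identity $\frac{d}{dt} h(X + t^{1/2} Z) = \tfrac{1}{2}\,\mathcal{I}(X + t^{1/2} Z)$ for a standard Gaussian $Z$ independent of $X$; the normalization in the source may differ. Here $X$ is a hypothetical two-component Gaussian mixture, so that the density of $X + t^{1/2}Z$ is explicit.

```python
import math

MEANS = (-1.0, 1.5)
WEIGHTS = (0.4, 0.6)
VAR0 = 0.3


def density(x, t):
    """Density of X + sqrt(t) Z: each mixture component has variance VAR0 + t."""
    var = VAR0 + t
    return sum(
        w * math.exp(-0.5 * (x - m) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
        for w, m in zip(WEIGHTS, MEANS)
    )


def density_derivative(x, t):
    """Derivative in x of the density above."""
    var = VAR0 + t
    return sum(
        -w * (x - m) / var * math.exp(-0.5 * (x - m) ** 2 / var)
        / math.sqrt(2.0 * math.pi * var)
        for w, m in zip(WEIGHTS, MEANS)
    )


def integrate(f, lo=-15.0, hi=15.0, n=6000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h


def entropy(t):
    def integrand(x):
        p = density(x, t)
        return -p * math.log(p) if p > 0.0 else 0.0
    return integrate(integrand)


def fisher_information(t):
    def integrand(x):
        p = density(x, t)
        return density_derivative(x, t) ** 2 / p if p > 0.0 else 0.0
    return integrate(integrand)


t, eps = 0.5, 1e-3
entropy_rate = (entropy(t + eps) - entropy(t - eps)) / (2.0 * eps)
half_fisher = 0.5 * fisher_information(t)
print(entropy_rate, half_fisher)
```

The two printed numbers should agree up to the finite-difference error of the centered quotient.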
The Fisher information is the norm of the (-valued) random variable given by evaluating the score function on the random variable , provided that this random variable is in . In this case, the random variable is known as the score function for given , and it is the unique element of satisfying the integration-by-parts relation
[TABLE]
More generally, if there exists a random variable in satisfying this integration-by-parts formula, then we define the conditional Fisher information to be (and this extends our previous definition of ). Otherwise, is defined to be .
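The integration-by-parts characterization is easy to test for a one-dimensional Gaussian. In the following Python sketch (an illustration, unconditional and unnormalized) we assume the sign convention in which the score of a density $\rho$ is $-\rho'/\rho$, so that for $X \sim N(0, \sigma^2)$ the score is $\xi = X/\sigma^2$ and the relation reads $E[\xi f(X)] = E[f'(X)]$. We take $f = \sin$ as a test function. The helper `expectation` is a hypothetical quadrature routine.

```python
import math


def gaussian_density(x, sigma):
    """Density of N(0, sigma^2)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def expectation(g, sigma, half_width=10.0, n=20000):
    """Midpoint-rule approximation of E[g(X)] for X ~ N(0, sigma^2)."""
    lo = -half_width * sigma
    h = 2.0 * half_width * sigma / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += g(x) * gaussian_density(x, sigma)
    return total * h


sigma = 1.3


def score(x):
    """Score -rho'/rho of the Gaussian density, evaluated at x."""
    return x / sigma ** 2


lhs = expectation(lambda x: score(x) * math.sin(x), sigma)  # E[xi f(X)] with f = sin
rhs = expectation(math.cos, sigma)                           # E[f'(X)]
fisher = expectation(lambda x: score(x) ** 2, sigma)         # E[xi^2]
print(lhs, rhs, fisher, 1.0 / sigma ** 2)
```

In this case the Fisher information $E[\xi^2]$ equals $1/\sigma^2$, and both sides of the integration-by-parts relation equal $e^{-\sigma^2/2}$.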
In light of the integration-by-parts characterization, score functions behave well under conditionally independent sums. The following lemma is proved in the same way as the non-conditional case (see [29, Lemma 5.6]) and the free case (see [51, Proposition 3.7]).
Lemma 6.1**.**
Let be a random variable in and let and be random variables in that are conditionally independent given . Suppose that is a score function for given . Then is a score function for given . Hence,
[TABLE]
In particular, this holds if is independent from or is independent of .
Score functions also scale in the following way. The proof is straightforward from the integration-by-parts relation.
Lemma 6.2**.**
If is a score function for given and , then is a score function for given , and hence .
6.2. Random Matrix Renormalization
Suppose that is a random variable in with density . The trace on produces a real inner product. But to study the large limit, we use the normalized trace . The corresponding normalized Gaussian is the GUE ensemble where has variance with respect to . We use the following renormalized entropy, which is motivated by computation of the Gaussian case and by (6.5) below,
[TABLE]
Due to the normalization of Gaussian, the evolution of the density for is given by the renormalized heat equation
[TABLE]
This results in
[TABLE]
where , assuming that has finite variance and .
Another heuristic for the normalization comes from analyzing the case where have density where is uniformly convex and semi-concave. Indeed, in this case, the classical score function for given is . Recall that is the gradient of with respect to the normalized inner product . Thus,
[TABLE]
is a dimension-independent normalization. Furthermore, the normalized score function (which would be in the case where the law is given by a potential ) satisfies the integration-by-parts relation
[TABLE]
where and where is the divergence with respect to the classical coordinates (not normalized). But if is a non-commutative polynomial, then
[TABLE]
where denotes the non-commutative derivative or free difference quotient with respect to . Thus, applying the integration-by-parts relation to non-commutative polynomials results in the dimension-independent relation
[TABLE]
that characterizes the normalized score function.
As a consequence of (6.5), can be recovered by integrating and modifying the integral to converge at . This results in
[TABLE]
provided that has a density and that has finite variance. The proof is similar to [29, Lemma 5.7]. Convergence of the integral at can be deduced from the following estimate, and it also shows convergence of the integral at $0$ if is finite. Compare [51, Corollary 6.14 and Remark 6.15] and [29, Lemma 5.7].
Lemma 6.3**.**
Let be a random variable in such that , and let be an independent GUE -tuple. Then
[TABLE]
Proof.
We observe that is a normalized score function for given by Lemma 6.1. This yields . On the other hand, if is a normalized score function for given , we also have , which yields the upper bound . The lower bound follows from observing and evaluating the right hand side using integration by parts. ∎
6.3. Convergence to Conditional Free Entropy
Motivated by the normalized entropy and Fisher’s information in the previous section, Voiculescu defined the free versions as follows. Let be an -tuple of self-adjoint non-commutative random variables in a tracial -algebra . We say that is a free score function for given (also known as a conjugate variable) if for every non-commutative polynomial , we have
[TABLE]
The free Fisher information is defined to be if such a exists, and otherwise. The non-microstates free entropy is defined to be
[TABLE]
Convergence of the integral at follows from the free analogue of Lemma 6.3, so that is well-defined in whenever has finite variance.
Remark 6.4*.*
Voiculescu’s original notation in [51, §7] was rather than , since the definition of the free score function can be rephrased so as to depend only on rather than . However, we prefer to write instead by analogy with the classical case, using the vertical bar to denote “conditioning.” This avoids potential confusion with the notation for microstates entropy of in the presence of used in [50, §1].
The following lemma gives sufficient conditions for classical Fisher information for random matrix models to converge to free Fisher information. The main hypotheses are that the non-commutative laws converge, the score functions for the matrix models are asymptotically approximable by trace polynomials, and some mild growth conditions on score functions and probability measures as . We omit the proof since it is a direct adaptation of the proof of [29, Proposition 5.10].
Lemma 6.5**.**
Let be a potential with , let be the associated probability density, and let be a random variable distributed according to . Let be an -tuple of self-adjoint non-commutative random variables in the tracial -algebra . Assume that:
- (A) The non-commutative law of with respect to converges in probability to the non-commutative law of .
- (B) is defined and continuous, and the sequence is asymptotically approximable by trace polynomials, and hence .
- (C) For some and , we have
[TABLE]
- (D) There exists such that
[TABLE]
Then is finite. Moreover, is in and it is the free score function for given , and we have
[TABLE]
Theorem 6.6**.**
Let satisfy Assumption 5.1 for some . Let be the corresponding measure, let be random variables chosen according to , and let be an independent -tuple of GUE matrices.
Let and be non-commutative random variables with non-commutative law , and let be a freely independent free semicircular -tuple. Then for every , we have
[TABLE]
and
[TABLE]
Proof.
We want to show that the law of satisfies the assumptions of Lemma 6.5 for each . The joint law of is given by the convex potential . Now satisfies and is asymptotically approximable by trace polynomials. Thus, the law of has a large limit given by Theorem 5.2. In fact, the large limit must be the non-commutative law of because of Voiculescu’s asymptotic freeness theorem [52] and because the non-commutative law of converges to the non-commutative law of . (Alternatively, this could be proved the same way as [29, Lemma 7.4].)
Since the non-commutative law of converges in probability to that of , the non-commutative law of converges in probability to that of , and thus (A) of Lemma 6.5 holds. Moreover, Lemma 2.12 shows that
[TABLE]
From this it is not hard to show that satisfies (D).
It remains to check (B) and (C). The potential for is given by
[TABLE]
which follows by applying the change of variables formula for the density. Here we write to emphasize that this variable corresponds to rather than . Note that is uniformly convex and semi-concave since it is the composition of with an invertible linear transformation. Also,
[TABLE]
is asymptotically approximable by trace polynomials. The potential corresponding to is
[TABLE]
Since is uniformly convex, the integrand vanishes rapidly at , and thus it is straightforward to differentiate under the integral by dominated convergence, and deduce that is continuously differentiable. Furthermore,
[TABLE]
so that
[TABLE]
or in other words is given by the conditional expectation
[TABLE]
Now we apply Theorem 5.9 using the potential and conditioning on to conclude that is asymptotically approximable by trace polynomials, which establishes (B).
Furthermore, Theorem 5.9 implies that
[TABLE]
This implies that (C) of Lemma 6.5 holds with , using Remark 4.13.
Therefore, we may apply Lemma 6.5 to to obtain that (6.8) holds for every , that is,
[TABLE]
For the second claim (6.9) regarding and , it remains to show that
[TABLE]
We just showed the integrand converges pointwise. But we can take the limit inside the integral by the dominated convergence theorem, because by Lemma 6.3, we have
[TABLE]
and we also know that is bounded as because it converges to . ∎
Remark 6.7*.*
Of course, (6.10) leads to the same conclusion as Lemma 6.1. Indeed, is the score function for , and Lemma 6.1 says that is the conditional expectation of given and .
Remark 6.8*.*
In [29, §7], we did not use the conditional expectation method to prove is asymptotically approximable by trace polynomials, but rather we analyzed the evolution of directly using PDE semigroups. The proof given here for convergence of entropy is thus considerably shorter. However, our results on the evolution of will come in handy for our construction of transport in the next section.
7. Conditional Transport to Gaussian
In this section, we prove our main results about transport (Theorems 7.11 and 7.13). Suppose that is a potential as in Assumption 5.1, is the corresponding probability distribution and that is a random variable with this law. Let be an independent -tuple of GUE matrices. Let be the law of .
The evolution of the potential corresponding to was studied in [29], and in particular, we established a dimension-independent way to obtain from using operations that preserve asymptotic approximability by trace polynomials. By solving an ODE in terms of , we will obtain transport maps such that
[TABLE]
Upon renormalizing and taking the limit as or goes to , we obtain transport to the law of .
To make each part of the proof more computationally tractable, we proceed in stages. Up until §7.5, we fix (and thus suppress it in the notation). First, in §7.1, we describe the basic construction of transport for functions of alone (imagining that we have frozen the variable ). In §7.2, we describe the properties of . Next, §7.3 proves Lipschitz estimates for the transport maps .
In §7.4, we introduce renormalized transport maps that transport to , where is the law of . The renormalized transport map is the same one used by Otto and Villani in their proof of the Talagrand transportation-entropy inequality [39, §4, proof of Lemma 2], in the special case where the target measure is Gaussian (and generalized to the conditional setting). We will explain this inequality further in §8.3.
The new element in our paper is the analysis of the large and large limits of the transport maps. In §7.4, we show that the limit as or tends to exists. Then in §7.5, we use the machinery of asymptotic approximation by trace polynomials to study the large limit of . In order to get dimension-independent estimates for convergence as or tends to , we conduct a finer analysis of convexity properties of and Lipschitz properties of . It is convenient to carry out the earlier stages of this analysis in §7.2 and §7.3 for rather than .
7.1. Basic Construction of Transport
In this section, we will fix and fix a function in for some . Later, we will allow to depend on and to depend on another self-adjoint tuple , but we prefer to simplify notation for the sake of carrying out the basic computation.
Let be the probability measure with density where . We showed in [29] that the density of is , where solves the equation
[TABLE]
Because solves the heat equation, we know that is a smooth function of for and a continuous function of for . Moreover, for each as proved in Theorem 6.1 (1) of [29].
Now we can describe explicit transport functions such that for all .
Proposition 7.1.
Let , , , and be as above.
(1) There exists a unique family of functions for such that
[TABLE]
(2) and in particular .
(3) .
Proof.
(1) Because , we know that is -Lipschitz with respect to . Hence, given , by the Picard-Lindelöf theorem, the initial value problem (7.2) has a solution defined for all .
(2) Fix , , and and fix . Let be the function defined by and . By definition of the functions , we have and . So also satisfies the initial value problem and . Therefore, , so that .
(3) We first prove the claim for . Because is smooth, it follows that is smooth for by standard theory of smooth dependence for ODE. Let denote the Jacobian linear transformation (differential) of . Let be the density of . As a consequence of the change-of-variables formula for multivariable integrals, we see that the density of is
[TABLE]
Fix . If , then clearly this reduces to . Therefore, it suffices to show that is a constant function of , or equivalently that
[TABLE]
Recalling smoothness and for and using the differential equations (7.1) for and (7.2) for , we obtain
[TABLE]
Meanwhile, to compute , note that for small ,
[TABLE]
so that
[TABLE]
Using smoothness,
[TABLE]
Since becomes the identity when , we know that for small enough , the linear transformation has positive determinant and is well-defined by power series, so that
[TABLE]
Hence, . This implies that
[TABLE]
completing the proof of the claim for . The equality extends to the case where or is zero because both sides depend continuously on and with respect to the weak topology on measures. ∎
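The construction in Proposition 7.1 can be illustrated in the simplest possible setting. The following is a minimal sketch, assuming a one-dimensional Gaussian initial law with variance `sigma2` and the heat-equation normalization in which the variance grows by `t` at time `t`. These normalizations, the direction of the flow, and the names `velocity`, `flow`, and `exact_flow` are assumptions of the sketch and need not match the conventions of (7.1) and (7.2). In this toy case the potential at time `t` is quadratic, the velocity field is half its derivative, and the flow has a closed form against which a numerical ODE solution can be compared.

```python
import math

def velocity(x, t, sigma2):
    # Half the derivative of the quadratic potential x^2 / (2 (sigma2 + t)).
    # Assumed normalization: the density solves d/dt rho = (1/2) rho''.
    return 0.5 * x / (sigma2 + t)

def flow(x0, s, t, sigma2, steps=2000):
    """Integrate dx/du = velocity(x, u) from time s to time t with RK4.

    Works for t < s as well (the step size is then negative), which
    gives the inverse map.
    """
    h = (t - s) / steps
    x, u = x0, s
    for _ in range(steps):
        k1 = velocity(x, u, sigma2)
        k2 = velocity(x + 0.5 * h * k1, u + 0.5 * h, sigma2)
        k3 = velocity(x + 0.5 * h * k2, u + 0.5 * h, sigma2)
        k4 = velocity(x + h * k3, u + h, sigma2)
        x += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        u += h
    return x

def exact_flow(x0, s, t, sigma2):
    # Closed-form solution: a linear rescaling that maps N(0, sigma2 + s)
    # to N(0, sigma2 + t).
    return x0 * math.sqrt((sigma2 + t) / (sigma2 + s))
```

In this sketch the flow maps compose along the time axis and the map from `t` back to `s` inverts the map from `s` to `t`, which mirrors the semigroup and inverse relations of part (2) of the proposition.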
In particular, the map transports to the original law . In other words, if and , then and . This implies that . This suggests that we can find a transport map from the law of to the law of as the large limit of . In the interest of efficiency, we postpone the details of this argument until after we introduce the dependence on the other set of parameters .
7.2. Conditional Hamilton-Jacobi-Bellman Equation and Semigroups
Let us now fix and fix a potential in for some . Let be the corresponding law and let be a random variable in distributed according to . Let be the law of , where is an independent tuple of independent GUE.
Our goal is to transport the law to the law . Upon freezing the variable , the methods of the previous section will produce a transport map such that pushes forward the conditional distribution of given to the conditional distribution of given . Specifically, is the solution to the initial value problem
[TABLE]
Then .
We seek to understand the large and large behavior of as a function of rather than simply as a function of for a fixed . To achieve this, we must understand the behavior of and as functions of . We will first import the results of [29, §6] regarding as a function of , then we will extend them to handle the dependence on .
The potential satisfies
[TABLE]
We express , where is a semigroup acting on convex and semi-concave functions defined as follows. Let
[TABLE]
Then as suggested by Trotter’s formula, we want to express , but for technical convenience we only apply this to dyadic rationals and values of that are powers of . The following is a direct application of [29, Theorems 6.1 and 6.17] to .
Theorem 7.2.
There exists a semigroup of nonlinear operators with the following properties:
(1) Change in Convexity: If , then .
(2) Approximation by Iteration: For and , denote . Suppose and .
(a) If , then
[TABLE]
(b) .
(3) Continuity in Time: Suppose and . Then
(a) .
(b) .
(c) If , then .
(4) Differential Equation: is continuous as a function of on and smooth on , and it satisfies (7.3), and we have .
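The idea of approximating the semigroup by iterating two simpler operations, as suggested by Trotter's formula, can be sketched on a toy example. The code below is a hedged illustration and not the paper's construction: it assumes a one-dimensional quadratic potential `a x^2 / 2 + c`, a heat step that averages the potential against a Gaussian of variance `dt`, and a Hopf-Lax (inf-convolution) step with kernel `|x - y|^2 / (2 dt)`. The names `heat_step`, `hopf_lax_step`, `trotter`, and `exact`, as well as the normalizations, are assumptions of the sketch. For quadratics both steps are explicit, so the iterated composition can be compared with the exact solution of the one-dimensional equation `dV/dt = V''/2 - (V')^2/2`.

```python
import math

def heat_step(a, c, dt):
    # Averaging V(x + sqrt(dt) Z) over a standard Gaussian Z adds a * dt / 2
    # to the constant term and leaves the quadratic coefficient unchanged.
    return a, c + 0.5 * a * dt

def hopf_lax_step(a, c, dt):
    # inf_y [ a y^2 / 2 + c + (x - y)^2 / (2 dt) ] = (a / (1 + a dt)) x^2 / 2 + c
    return a / (1.0 + a * dt), c

def trotter(a, c, t, n):
    """Alternate the two steps n times with step size t / n."""
    dt = t / n
    for _ in range(n):
        a, c = heat_step(a, c, dt)
        a, c = hopf_lax_step(a, c, dt)
    return a, c

def exact(a, t):
    # Exact solution started from a x^2 / 2 (constant term zero at time 0).
    return a / (1.0 + a * t), 0.5 * math.log(1.0 + a * t)
```

In this toy case the quadratic coefficient produced by the iteration agrees with the exact one up to rounding, because each Hopf-Lax step adds `dt` to `1 / a`. The constant term is a Riemann sum for the exact logarithm, so it converges as `n` grows. The decrease of the coefficient from `a` to `a / (1 + a t)` is the one-dimensional shadow of the change in convexity described in part (1).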
Result (1) regarding convexity and semi-concavity only applies to as a function of for a fixed . We now extend this result to control the dependence on , using the same techniques as in [29, Lemma 6.6]. As remarked in that paper, this type of analysis of is standard in the PDE literature on viscosity solutions.
We use the following notation, as in Definition 2.1: Consider a function on . Let us write to mean that
[TABLE]
and similarly let us write to mean that
[TABLE]
Lemma 7.3.
Suppose that and that
[TABLE]
Then
(1) .
(2) .
(3) .
(4) .
Proof.
(1) This is left as an exercise.
(2) The proof is a modification of that of [29, Lemma 6.6], which proves an analogous result in the simpler case of functions of without the extra variable . Fix and . Because the function is uniformly convex with respect to , it has a unique minimizer . This minimizer must be a critical point with respect to the first variable, and hence
[TABLE]
that is,
[TABLE]
Let and , so that . Our assumption implies that
[TABLE]
where
[TABLE]
Note that implies since monotonicity of is immediate from the definition. One can compute and directly as in Lemma 6.4 (2) and the proof of Lemma 6.6 in [29] and obtain
[TABLE]
where the last two lines follow from substituting and from the fact that the infimum defining is achieved at . The analogous formula for holds as well. The functions and thus provide second-order Taylor expansions from above and below for the function with respect to at the point . Looking at the first-order terms in the expansions shows that is differentiable at with
[TABLE]
which proves (2).
(3) We examine the second-order terms of the upper and lower Taylor expansions and and apply the claim that (2) implies (1) from Lemma 2.2. This is the same argument as in the proof of [29, Proposition 2.13 (2)].
(4) Recall that if , then . Using this fact together with (3) iteratively, we see that if is a dyadic rational and , then
[TABLE]
In light of Theorem 7.2 (2), this will also hold in the limit as , since for any two self-adjoint matrices and , the family of functions with is closed under pointwise limits. Similarly, using Theorem 7.2 (3), we extend this to all real . ∎
Remark 7.4.
The convexity conditions of Lemma 7.3 (4) can alternatively be deduced from [9, Theorem 4.3]. However, it is convenient for us to use Theorem 7.2 here because we want the dimension-independent time-continuity estimates Theorem 7.2 (3) in the proof of Theorem 7.11 below.
7.3. Lipschitz Estimates for Conditional Transport
This subsection proves the technical estimate Lemma 7.6 on the Lipschitz seminorm of . This depends crucially on the convexity properties of .
Lemma 7.5.
[TABLE]
Proof.
First, note that
[TABLE]
By Lemma 2.2, the first term on the right hand side of (7.4) can be estimated by
[TABLE]
To handle the second term on the right hand side of (7.4), define
[TABLE]
and recall that is convex and is concave and in particular
[TABLE]
Note that
[TABLE]
Therefore,
[TABLE]
Now we apply Lemma 2.3 to with the matrix and conclude that
[TABLE]
Combining this estimate for the second term of (7.4) with our earlier estimate for the first term completes the proof. ∎
Lemma 7.6.
We have
[TABLE]
and
[TABLE]
Proof.
Fix and and in and define
[TABLE]
Note that is locally Lipschitz, hence absolutely continuous. Also,
[TABLE]
where we have applied Lemma 7.5. It follows that whenever ,
[TABLE]
On the other hand, since , any point where is zero and is differentiable must be a critical point, so when the estimate is vacuously true. This inequality implies
[TABLE]
where in the last line we have observed that . Hence for
[TABLE]
Now we substitute and and rearrange to obtain
[TABLE]
This proves the asserted estimates in the case where . The argument for the case is similar. Here we use the lower bound rather than the upper bound in Lemma 7.5 and get
[TABLE]
so that
[TABLE]
Now we take and obtain
[TABLE]
which yields the desired estimates. ∎
7.4. Transport in the Large Time Limit
We remind the reader here that we are still working in the finite-dimensional setting for a fixed value of which is suppressed in the notation. To understand the large limit of our transport maps, consider the renormalized law
[TABLE]
A brief computation shows that the corresponding potential is
[TABLE]
(here the potential is only well-defined up to an additive constant because the probability measure includes a normalizing constant anyway, so we made a convenient choice of the additive constant). This potential satisfies the equation
[TABLE]
We remark that if is the density at time and is the Gaussian density, then
[TABLE]
In other words, evolves according to the diffusion semigroup with respect to Gaussian measure (compare equation (33) of [39]), while the heat equation represents diffusion with respect to Lebesgue measure.
The transport functions are renormalized as follows. Because pushes forward to , we may compute that pushes forward , where
[TABLE]
Moreover, from the differential equation (7.2), we deduce that
[TABLE]
As , the law converges to the law of , which we denote . Thus, if we show that has a limit as or , we will be able to transport our given law of to the law of . As the first step, we deduce from Lemma 7.6 the following Lipschitz estimates on which are uniform in and . Note also that the coefficient of goes to zero as .
Lemma 7.7.
We have
[TABLE]
and
[TABLE]
In particular,
[TABLE]
Proof.
For the first estimate, for the case where , direct substitution of (7.9) into (7.5) of Lemma 7.6 shows that
[TABLE]
The function is clearly monotone on and achieves the values and at [math] and respectively, and hence is between and . Hence,
[TABLE]
The case where follows by the same argument, where the bound this time is .
For the second estimate, we apply (7.6). Note in (7.6), in the case , we may use and thus in both cases or ,
[TABLE]
This implies that
[TABLE]
where we have again applied .
For the last estimate (7.13), observe that
[TABLE]
Lemma 7.8.
Let denote the function . Then
[TABLE]
and
[TABLE]
Proof.
Let . Then (7.10) says that
[TABLE]
Moreover, we have
[TABLE]
We can bound above and below by subtracting from both sides, which after some computation reduces to
[TABLE]
Therefore, we have where
[TABLE]
We claim that . If the first term is negative, then it is automatically, but if it is positive, then and hence
[TABLE]
Similarly, if is negative, there is nothing to prove, but otherwise , and hence
[TABLE]
But implies that is -Lipschitz in . Therefore,
[TABLE]
Applying (7.11) in the case where , we get
[TABLE]
Hence,
[TABLE]
which proves the desired estimate (7.14).
To check the second estimate (7.15), first observe
[TABLE]
Moreover, . Therefore, using (7.12) and (7.14),
[TABLE]
Proposition 7.9.
The limits and exist for . More precisely, let and be a pair of random variables with the laws and as above. Then
[TABLE]
and
[TABLE]
The estimates of Lemmas 7.7 and 7.8 extend to the cases where or is infinite, where we define . Moreover, if , then we have the relation when .
Remark 7.10.
We have written the explicit form of the estimates here in order to emphasize that the bounds are dimension-independent; they only depend on the parameters , , , , , , , and . The estimates also become sharper when and are close to , which would include the situation where is a perturbation of the quadratic potential . This perturbative setting was studied first in the literature, for instance in [21] and [23]; see [29, §8.3] for further discussion.
Proof.
We first consider the case where is fixed and . Note that by (7.11),
[TABLE]
By Lemma 7.8,
[TABLE]
where . Then we apply Lemma 2.5 to with the random variable . Note that has mean and variance . Moreover,
[TABLE]
Thus, by Lemma 2.5,
[TABLE]
Plugging this into (7.18), we see that is Cauchy in as . Moreover, we obtain the estimate (7.16) by taking in (7.19) and multiplying by .
Now let us fix and consider when and approach . The argument for this case is similar but antisymmetrical. We estimate
[TABLE]
where the last line follows from (7.14). Then by applying Lemma 2.5 to the function and the random variable , together with (7.13), we obtain
[TABLE]
This produces an estimate on which shows that is Cauchy as , so that is well-defined. The explicit bound on the rate of convergence follows by fixing and , combining the above estimates, and taking .
Finally, since we have established convergence of as or approaches , a routine argument with limits will extend the estimates of Lemmas 7.7 and 7.8, and the transport relations, to the cases where or is . ∎
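The existence of the limiting transport map can be seen by hand in a toy case. The sketch below assumes a one-dimensional Gaussian starting law with variance `sigma2`, a heat flow that adds variance `t`, and a renormalization that divides by `sqrt(1 + t)` so that the renormalized law tends to the standard Gaussian. This particular renormalization and the names `renormalized_map` and `limit_map` are assumptions made for illustration and may differ from the definition used in §7.4.

```python
import math

def renormalized_map(x, t, sigma2):
    # Heat-flow transport N(0, sigma2) -> N(0, sigma2 + t), followed by
    # rescaling by 1 / sqrt(1 + t).  The target has variance
    # (sigma2 + t) / (1 + t), which tends to 1 as t grows.
    return x * math.sqrt((sigma2 + t) / sigma2) / math.sqrt(1.0 + t)

def limit_map(x, sigma2):
    # Pointwise limit of renormalized_map as t -> infinity:
    # the linear map sending N(0, sigma2) to N(0, 1).
    return x / math.sqrt(sigma2)

def lipschitz_constant(t, sigma2):
    # The renormalized map is linear, so its Lipschitz constant is its slope.
    return math.sqrt((sigma2 + t) / (sigma2 * (1.0 + t)))
```

In this example the Lipschitz constant stays between `1` and `1 / sqrt(sigma2)` for every `t`, so the bound is uniform in time, and it equals `1` when `sigma2 = 1`. This is a toy analogue of the uniform estimates of Lemma 7.7 and of the sharpening noted in Remark 7.10 near the quadratic potential.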
7.5. Transport in the Large Dimension Limit
If and is asymptotically approximable by trace polynomials, then we must show that the associated sequence of transport maps is asymptotically approximable by trace polynomials, and hence conclude that they define transport for the non-commutative random variables in the large limit.
Theorem 7.11.
For , let be a potential on satisfying Assumption 5.1 for some , and let be the corresponding probability measures on . Let be a random variable given by and let be an independent GUE -tuple. Let
[TABLE]
and let be the corresponding potential. Similarly, let be the law of . For , let be the solution of the initial value problem
[TABLE]
Then
(1) The family extends continuously to .
(2) .
(3) .
(4) For , the sequence is -Lipschitz for all , , and , and it is asymptotically approximable by trace polynomials as .
Proof.
Recall in §7.4 we defined by renormalizing . However, that definition is equivalent to the definition of given in this theorem because both definitions produce a solution to the ODE (7.10). Of course, global uniqueness of the solution holds because the vector field is uniformly Lipschitz in on any compact time interval (as we discuss in more detail below).
So claims (1), (2), and (3) follow immediately from Proposition 7.9. The estimate for the Lipschitz norm of was shown in (7.13).
We finish by showing asymptotic approximability using the results of §4.1. Let . By Theorem 7.2 (3c), is uniformly continuous in on . Since , it follows that is uniformly continuous in on for every , with modulus of continuity independent of , and recall it is also uniformly Lipschitz in , since .
Consequently, is uniformly continuous in on and uniformly Lipschitz in . Also, we showed that is asymptotically approximable by trace polynomials in the proof of Theorem 6.6, and hence so is . Thus, satisfies Assumption 4.6, so we may apply Proposition 4.7 to deduce that is asymptotically approximable by trace polynomials for . This property extends to the case where or is infinite using Lemma 3.13 and Proposition 7.9. ∎
Remark 7.12.
Rather than citing the proof of Theorem 6.6, one could also argue that is asymptotically approximable directly from the construction of the semigroup using the same reasoning as [29, Proposition 6.8]. Moreover, this method would also show that is asymptotically approximable by trace polynomials provided we can prove analogues of Theorem 7.2 (2) and (3) for rather than only . However, all this is unnecessary work for our present purpose.
Theorem 7.13.
With all the notation of the previous theorem, let be a non-commutative random variable distributed according to the limiting free Gibbs law , let be a freely independent free semicircular -tuple, and let . Define by . For , we have
(1) is -Lipschitz with respect to .
(2) .
(3) in non-commutative law.
(4) We have
[TABLE]
where is the universal constant from Proposition 3.17.
In particular, is isomorphic to , which is the free product .
Proof.
We know that there exists such that because of Lemma 3.6. Then (1) and (2) follow from the corresponding properties of by straightforward limit arguments.
As remarked in the last proof, is asymptotically approximable by trace polynomials. We also know is uniformly convex and semi-concave, and thus by Theorem 5.2, the non-commutative law of converges in probability to some non-commutative law. Of course, the limiting non-commutative law must be the non-commutative law of because the joint non-commutative law of converges in probability to that (as in the proof of Theorem 6.6).
With this relation between the laws of and in hand, we can prove (3) by taking the large limit using Corollary 5.3. Indeed, if is -uniformly continuous, then is also -uniformly continuous and asymptotically approximable by trace polynomials by Lemma 3.12. Thus, applying Corollary 5.3 to this function and the function , we get
[TABLE]
Hence, for all that are uniformly continuous in . But by Proposition 3.14 such functions can realize every element in the -algebra generated by , and in particular all the non-commutative polynomials in . Hence, in non-commutative law as desired.
(4) Note that
[TABLE]
but in non-commutative law. Hence, it suffices to prove the desired estimate for rather than . Now arises as the large limit of the matrix models given by potential . By Lemma 7.3 (4), we have , so that
[TABLE]
By Remark 5.8, there exists a sequence of random matrix models for given by uniformly convex potentials which are also unitarily invariant (even if this is not true of our original model), with the same lower bound for the Hessian of the potential. Therefore, by Proposition 3.17,
[TABLE]
We finish by substituting the estimate
[TABLE]
which follows from (7.15) and Lemma 3.11 (the latter lemma is needed since the original statement of (7.15) is for the finite-dimensional setting for a fixed ).
The last claim regarding -algebras follows from (3) by examining the case with and or vice versa. ∎
8. Applications
We show that Assumption 5.1 is preserved under independent joins, marginals, convolution, and linear changes of variables. We conclude that for the convex free Gibbs laws considered here, satisfies additivity under conditioning. Moreover, by iterating our conditional transport results, we obtain “lower-triangular” transport maps from a convex free Gibbs law to the law of a free semicircular family, which also satisfy the entropy-cost inequality relative to the semicircular law, analogous to the triangular transport achieved in the classical case by [6, Corollary 3.10].
8.1. Operations on Convex Gibbs Laws
Recall that Assumption 5.1 for a sequence of potentials states that for some constants and , the sequence is asymptotically approximable by trace polynomials, and is a scalar matrix for each , where is the measure associated to .
Proposition 8.1.
Suppose that and satisfy Assumption 5.1 for some . Then also satisfies Assumption 5.1 for the same and .
Moreover, let , , and be the measures associated to , , and respectively, and let , , and be the respective limiting free Gibbs laws given by Theorem 5.2. Then is the independent join of and and is the freely independent join of and .
Proof.
The claim follows because . The claim about asymptotic approximation by trace polynomials follows because and each component is asymptotically approximable by trace polynomials.
The probability density for is the tensor product of the probability densities for and and hence is the independent join of these two marginal laws. It follows that and are scalar matrices, hence Assumption 5.1 holds for .
Let be random variables and let be non-commutative random variables. Then by Theorem 6.6,
[TABLE]
It was shown in [53, Proposition 5.18(c)] that implies that and are freely independent. ∎
Proposition 8.2.
Suppose that satisfies Assumption 5.1. Let be the corresponding law, let and let and be the laws of and . Then and are given by potentials and that also satisfy Assumption 5.1 for the same values of and .
Proof.
By symmetry, it suffices to prove the claims for . First, it is immediate that the mean of under is a scalar, since it is . Moreover, if we define
[TABLE]
then (as in the proof of Theorem 6.6) we may compute by differentiating under the integral and obtain
[TABLE]
It follows by Theorem 5.9 that is asymptotically approximable by trace polynomials.
Finally, the fact that follows from [9, Theorem 4.3], or alternatively by the following reasoning. Let be the law of , where is an independent GUE tuple. The corresponding potential is given by (7.8) and it satisfies
[TABLE]
by direct substitution of (7.8) into Lemma 7.3 (4) and hence
[TABLE]
Now as , the law converges to the law of . By applying Lemma 2.7, is given by some potential satisfying
[TABLE]
However, we know that because the potential corresponding to a law is unique up to an additive constant. This implies that as desired. ∎
Proposition 8.3.
Let satisfy Assumption 5.1 for some , and let be the corresponding random variable. Let be an invertible matrix with real entries and let denote the linear transformation given by
[TABLE]
Then is the potential corresponding to , and satisfies Assumption 5.1 with constants and .
Proof.
The fact that is the potential corresponding to follows from change of variables. Now it is immediate that the expectation of is a scalar multiple of identity for each . Next, by the chain rule
[TABLE]
and from this it follows that is asymptotically approximable by trace polynomials. Similarly, by the chain rule,
[TABLE]
The maximum and minimum singular values of are the same as those of , which are and respectively. By a basic linear algebra argument, it follows that . ∎
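The "basic linear algebra argument" can be spelled out in the scalar-variable setting. The following worked computation is a sketch under stated assumptions: $V$ is a $C^2$ function on $\mathbb{R}^d$ with $c I \le \operatorname{Hess} V \le C I$, $A$ is an invertible real $d \times d$ matrix, and $W$ denotes the potential of the transformed variable. The symbols $W$, $\xi$, $\eta$, $\sigma_{\max}$, and $\sigma_{\min}$ are introduced here for illustration, and the matrix-tuple version in the proposition may use a different normalization.

```latex
% Assumptions: c I <= Hess V <= C I on R^d, and A is an invertible real d x d matrix.
\[
W(y) = V(A^{-1} y), \qquad
\operatorname{Hess} W(y) = (A^{-1})^{T} \, \operatorname{Hess} V(A^{-1} y) \, A^{-1}.
\]
For every vector $\xi$, writing $\eta = A^{-1} \xi$,
\[
\langle \xi, \operatorname{Hess} W(y) \, \xi \rangle
= \langle \eta, \operatorname{Hess} V(A^{-1} y) \, \eta \rangle
\in \bigl[ c \, |\eta|^2, \; C \, |\eta|^2 \bigr],
\qquad
\frac{|\xi|}{\sigma_{\max}(A)} \le |\eta| \le \frac{|\xi|}{\sigma_{\min}(A)},
\]
so that
\[
\frac{c}{\sigma_{\max}(A)^2} \, I \;\le\; \operatorname{Hess} W \;\le\; \frac{C}{\sigma_{\min}(A)^2} \, I .
\]
```

When $A$ is an isometry, both singular values equal $1$ and the convexity constants are unchanged.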
Proposition 8.4.
Let and be two potentials satisfying Assumption 5.1 with constants and . Let and be the corresponding random tuples of matrices. Then the law of is given by another potential satisfying Assumption 5.1 with constants and . Moreover, the free Gibbs state corresponding to is the free convolution of those corresponding to and .
Proof.
Let , which satisfies Assumption 5.1 (with the same constants) by Proposition 8.1. Now let be the matrix
[TABLE]
Since is an isometry, we have and . Therefore, by Proposition 8.3, the law of is given by a potential satisfying Assumption 5.1 with constants and . Then by Proposition 8.2, the law of is given by such a potential with the same constants and .
We showed in Proposition 8.1 that the large limit of the law of is given by a freely independent join of the corresponding marginals. Hence, the large limit of the law of is given by the free convolution. ∎
As a consequence, we have additivity of entropy under conditioning.
Corollary 8.5.
Let be a potential satisfying Assumption 5.1 as in the setup of Theorem 6.6. Let be a tuple of non-commutative random variables distributed according to the limiting free Gibbs law associated to . Then
[TABLE]
Proof.
From standard classical results, we have
[TABLE]
Dividing by and adding to both sides, we obtain the normalized version
[TABLE]
By the previous theorem, we obtain the desired relation for in the limit as . More precisely, we apply the theorem as stated to . Meanwhile, for and we apply the special case of the theorem where we condition on [math] variables. ∎
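The classical additivity used at the start of this proof can be checked directly on a bivariate Gaussian, where all three entropies have closed forms. The sketch below uses the unnormalized differential entropy of a pair of real variables with covariance entries `s11`, `s12`, `s22`. The function names are hypothetical, and the example does not include the renormalization applied in the proof.

```python
import math

def gaussian_entropy_1d(var):
    # Differential entropy of N(m, var).
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

def gaussian_entropy_2d(s11, s12, s22):
    # Differential entropy of a bivariate Gaussian with covariance
    # [[s11, s12], [s12, s22]].
    det = s11 * s22 - s12 * s12
    return 0.5 * math.log((2.0 * math.pi * math.e) ** 2 * det)

def conditional_entropy_x_given_y(s11, s12, s22):
    # X given Y = y is Gaussian with variance s11 - s12^2 / s22 for every y,
    # so averaging over y does not change the entropy.
    return gaussian_entropy_1d(s11 - s12 * s12 / s22)

def chain_rule_defect(s11, s12, s22):
    # h(X, Y) - [h(X | Y) + h(Y)], which should vanish.
    return gaussian_entropy_2d(s11, s12, s22) - (
        conditional_entropy_x_given_y(s11, s12, s22) + gaussian_entropy_1d(s22)
    )
```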
8.2. Entropy and Fisher Information Relative to Gaussian
As background for our discussion of the entropy-cost inequality in §8.3, we review the entropy of one probability measure relative to another. If is a measure on , then the entropy of relative to is
[TABLE]
whenever the integral is well-defined. The standard entropy corresponds to the choice of Lebesgue measure for .
Remark 8.6.
The reader should be careful to distinguish between the relative entropy and the conditional entropy . The first changes the ambient measure while the second describes conditioning on .
Remark 8.7.
If and are both probability measures, then . For this reason, many authors choose to change the sign. We will keep the sign convention given above to be consistent with our convention for relative to Lebesgue measure, but we will write absolute value signs around relative entropy when it is natural to use the positive version.
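As a sanity check on this sign convention, one can compute the relative entropy of a centered one-dimensional Gaussian with variance `sigma2` with respect to the standard Gaussian. The sketch below is illustrative only: the reference measure, the midpoint-rule quadrature, and the names `relative_entropy_to_gaussian` and `closed_form` are assumptions of the example. With the convention above, the value is `(1 + log(sigma2) - sigma2) / 2`, which is nonpositive and vanishes only at `sigma2 = 1`.

```python
import math

def relative_entropy_to_gaussian(sigma2, half_width=12.0, n=20000):
    """Midpoint-rule value of -integral rho * log(rho / g) dx.

    rho is the density of N(0, sigma2) and g is the density of N(0, 1).
    """
    a = -half_width * max(math.sqrt(sigma2), 1.0)
    h = -2.0 * a / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        log_rho = -x * x / (2.0 * sigma2) - 0.5 * math.log(2.0 * math.pi * sigma2)
        log_g = -x * x / 2.0 - 0.5 * math.log(2.0 * math.pi)
        total -= math.exp(log_rho) * (log_rho - log_g) * h
    return total

def closed_form(sigma2):
    # Entropy of N(0, sigma2) minus E[x^2] / 2 minus log(2 pi) / 2.
    return 0.5 * (1.0 + math.log(sigma2) - sigma2)
```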
For probability measures on , we may study entropy relative to the Gaussian measure on . A direct computation shows that if is a random variable in , then we have
[TABLE]
We denote the normalized version by
[TABLE]
Similarly, if is a measure on which is absolutely continuous with respect to Lebesgue measure and is the corresponding random variable, we define
[TABLE]
which is equivalent to
[TABLE]
where is the conditional distribution of given , and is the marginal law of . Similarly, if is an -tuple of non-commutative random variables, we define the free entropy relative to Gaussian by
[TABLE]
We define the normalized conditional Fisher information relative to Gaussian by
[TABLE]
Note that if this Fisher information is finite and if is the normalized score function for given as in §6.2, then
[TABLE]
because
[TABLE]
where we have evaluated the middle term on the right hand side using integration by parts. Similarly, for an -tuple of non-commutative random variables, we define
[TABLE]
where the second equality holds provided that is finite and is the free score function. We have the following version of (6.6) and Lemma 6.3 for entropy and Fisher information relative to Gaussian.
Lemma 8.8.
Let be a random variable in with a density and with finite variance and let be an independent GUE -tuple. Then
[TABLE]
Similarly, suppose that is an -tuple of non-commutative random variables and let be a freely independent free semicircular -tuple. Then
[TABLE]
Proof.
The first formula follows from [39, §4, Lemma 1] after renormalization. However, we will give an argument by a change of variables in (6.6) that will apply to both and . Note that by (6.6)
[TABLE]
and in particular, we know that the integral is well-defined in . Now we do a change of variables in the integral , and obtain
[TABLE]
where we have applied the scaling relation Lemma 6.2 for Fisher information. On the other hand,
[TABLE]
Therefore, altogether
[TABLE]
which is the desired formula. The statement for can be proved by exactly the same computation, since the definition of in (6.7) is completely analogous to (6.6). ∎
Furthermore, the log-Sobolev inequality for the Gaussian measure has the following interpretation for entropy and Fisher’s information. This in fact generalizes to entropy and Fisher’s information relative to any measure satisfying LSI, see [39, Definition 1], but we only use the case where is Gaussian and is sufficiently regular.
Lemma 8.9.
Let be a random variable in that has a density with respect to Lebesgue measure. Then
[TABLE]
Proof.
First, it suffices to check the non-conditional version . Indeed, in the conditional case, the left hand side is and the right hand side is , and solving the non-conditional case would allow us to compare the integrands pointwise.
Now suppose that has density with respect to Lebesgue measure and let be the density with respect to Gaussian, so that
[TABLE]
By Corollary 2.11, the measure satisfies the normalized log-Sobolev inequality (2.4) with , so that
[TABLE]
Let . Then reduces to , so the right hand side is . On the other hand, letting , we get
[TABLE]
and hence on the support of , we have
[TABLE]
Thus,
[TABLE]
Hence, the log-Sobolev inequality implies the desired inequality. ∎
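For orientation, the inequality can be verified by hand in the simplest classical case. The following is a sketch in the unnormalized one-dimensional setting, which is an assumption of the example and not the normalization of the lemma: $\mu = N(0,u)$ with $u > 0$, $\gamma = N(0,1)$, $D(\mu \,\|\, \gamma)$ denotes the positive version of the relative entropy, and $I(\mu \,\|\, \gamma)$ denotes the Fisher information relative to $\gamma$.

```latex
% Assumptions: mu = N(0,u), gamma = N(0,1), classical unnormalized quantities.
\[
D(\mu \,\|\, \gamma) = \tfrac{1}{2} \bigl( u - 1 - \log u \bigr),
\qquad
I(\mu \,\|\, \gamma)
= \int \Bigl| \frac{d}{dx} \log \frac{d\mu}{d\gamma} \Bigr|^2 \, d\mu
= \Bigl( 1 - \frac{1}{u} \Bigr)^{2} u
= \frac{(u-1)^2}{u}.
\]
The log-Sobolev inequality $D(\mu \,\|\, \gamma) \le \tfrac{1}{2} I(\mu \,\|\, \gamma)$ then reads
\[
u - 1 - \log u \;\le\; u - 2 + \frac{1}{u}
\quad \Longleftrightarrow \quad
\log \frac{1}{u} \;\le\; \frac{1}{u} - 1,
\]
which holds for all $u > 0$ with equality exactly when $u = 1$.
```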
8.3. Conditional Transport and the Entropy-Cost Inequality
Now we will show that the transport maps constructed in §7.5 satisfy the Talagrand entropy-cost inequality. It was shown in [39, Theorem 1] that if a measure satisfies the log-Sobolev inequality (2.1) with some constant (and some regularity conditions), then it satisfies the Talagrand inequality
[TABLE]
where is the -Wasserstein distance, which is equivalent to the infimum of over all coupled random variables and with and .
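A one-dimensional Gaussian example makes the inequality concrete. The sketch below assumes the classical unnormalized form for the standard Gaussian reference measure, namely that the squared 2-Wasserstein distance is at most twice the positive relative entropy. The constant, the choice of test laws `N(m, sigma^2)`, and the function names are assumptions of the example. In this case the optimal coupling is realized by the affine transport map `x -> m + sigma * x`, so both sides have closed forms.

```python
import math

def w2_squared(m, sigma):
    # Squared 2-Wasserstein distance between N(m, sigma^2) and N(0, 1),
    # attained by the monotone map x -> m + sigma * x.
    return m * m + (sigma - 1.0) ** 2

def kl_to_standard_gaussian(m, sigma):
    # Positive version of the entropy of N(m, sigma^2) relative to N(0, 1).
    return 0.5 * (sigma * sigma + m * m - 1.0 - 2.0 * math.log(sigma))

def talagrand_gap(m, sigma):
    # 2 * KL - W2^2, which equals 2 * (sigma - 1 - log(sigma)) >= 0.
    return 2.0 * kl_to_standard_gaussian(m, sigma) - w2_squared(m, sigma)
```

The gap does not depend on `m`, so translations of the standard Gaussian saturate the inequality in this example.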
Adapting Otto and Villani’s argument, we will show that the transport maps constructed in §7.5 witness the (conditional) entropy-cost inequality relative to the GUE law for the matrix models and the corresponding free entropy-cost inequality for the non-commutative random variables. This is claim (5) below, while the other claims in Theorem 8.10 summarize the results of our earlier construction.
We remark that the free Talagrand inequality for self-adjoint tuples was studied in greater generality in [28] and [13, §3.3]. Although we restricted ourselves to the case where the target measure is Gaussian/semicircular, our goal in this paper was not merely to estimate the Wasserstein distance using some coupling, but rather to exhibit a coupling that arises from a transport map, and to show Lipschitzness of this transport map.
Theorem 8.10.
As in Theorem 7.11, let be a potential on satisfying Assumption 5.1 for some , and let and be the corresponding probability measures and random variables. Let be an independent GUE -tuple. Let be a tuple of non-commutative random variables given by the limiting free Gibbs law and let be a freely independent free semicircular -tuple. Let and . Then there exist functions , and such that
(1) We have and in law, and and in non-commutative law.
(2) and the same holds for and .
(3) and .
(4) We have and , and the same holds for and .
(5) We have
[TABLE]
and
[TABLE]
Proof.
Let and be as in Theorems 7.11 and 7.13. Then let
[TABLE]
The only property that was not shown in the earlier theorems is (5). First, note that as a consequence of (1),
[TABLE]
The rest of the proof of (5) proceeds as in [39, §4]. As in §7.5, let denote the potential corresponding to and recall that
[TABLE]
and hence
[TABLE]
Then we apply Minkowski’s inequality with respect to integration to obtain
[TABLE]
which can be rewritten as
[TABLE]
where we have applied the fact that . It follows from Lemma 8.8 and a change of variables that
[TABLE]
It is easy to see that is bounded on compact sets because of Lemma 6.3 and (8.1). Therefore, we have for almost every ,
[TABLE]
Hence, for almost every ,
[TABLE]
where the last line follows from Lemma 8.9. Therefore,
[TABLE]
where we have employed the fact that by (8.3). This establishes the first claim of (5).
The second claim of (5) follows by taking the large limit using Corollary 5.3 and Theorem 6.6. More precisely, for the left hand side, we take the limit using Corollary 5.3. Meanwhile, for the right hand side, note that because and by Corollary 5.3. ∎
8.4. Construction of Triangular Transport
Finally, by iterating Theorem 8.10, we obtain the following result concerning “lower-triangular transport.” This is analogous to the classical result [6, Corollary 3.10]. Of course, the challenge in our situation was to understand the large behavior of the transport maps in a dimension-independent way. Unfortunately, the transport constructed here is not optimal among triangular mappings, since indeed Otto and Villani’s construction does not produce the optimal transport map.
Theorem 8.11.
Let be a potential satisfying Assumption 5.1. Let and be the corresponding law and random variable. Let be the limiting free Gibbs law, and let be an -tuple of non-commutative random variables. Let be an independent GUE -tuple and let be a freely independent free semicircular family. Then there exist functions , and such that
(1) and in law, and similarly, and in non-commutative law.
(2) and are inverse functions of each other, and the same holds for and .
(3) and .
(4) is upper triangular in the sense that
[TABLE]
and the same holds for , , and . In particular, the isomorphism induced by maps onto for each , …, .
(5) We have and is bounded by some constant which goes to zero as .
(6) We have
[TABLE]
and
[TABLE]
(7) We have
[TABLE]
where is the universal constant from Proposition 3.17.
Proof.
First, by Proposition 8.2, the marginal law of is given by a convex potential satisfying the same assumptions.
For each , we apply Theorem 8.10 with as the first variable and as the second variable. We thus obtain maps such that
[TABLE]
Let
[TABLE]
Let . Then we can check by backwards induction on that
[TABLE]
Indeed, the base case is trivial. For the induction step, suppose the claim holds for . Since is a function of , …, , then the induction hypothesis implies that
[TABLE]
where the last line follows because and because , …, are independent of the other variables. By Theorem 8.10, is asymptotic to some , and the objects , , and satisfy the analogous transport relations in the non-commutative setting. Now because each is -Lipschitz, we see that is -Lipschitz.
By Theorem 8.10, there is a map such that is the inverse of . Define by induction by
[TABLE]
Then is the inverse of . Since is -Lipschitz, we can show by induction that is bounded by a constant depending only on , , and , and which goes to zero as . Moreover, by Lemma 3.12, is asymptotic to some .
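The inductive definition of the inverse can be pictured with a toy classical example. The sketch below assumes a hypothetical triangular map on $\mathbb{R}^3$ whose k-th output depends only on the first k inputs and is strictly increasing in the k-th input; the specific formulas in `forward` are invented for illustration and are unrelated to the maps of Theorem 8.10. The inverse is then built one coordinate at a time, reusing the coordinates already recovered.

```python
import math

def forward(x):
    """Hypothetical triangular map: output k depends only on x[0..k]."""
    y0 = 2.0 * x[0] + 1.0
    y1 = x[1] * (1.0 + x[0] ** 2) + math.sin(x[0])
    y2 = x[2] * math.exp(x[1]) + x[0] * x[1]
    return [y0, y1, y2]

def inverse(y):
    """Invert `forward` coordinate by coordinate.

    Once x[0..k-1] are known, the k-th equation is a one-variable
    equation in x[k], which is solved explicitly here.
    """
    x0 = (y[0] - 1.0) / 2.0
    x1 = (y[1] - math.sin(x0)) / (1.0 + x0 ** 2)
    x2 = (y[2] - x0 * x1) * math.exp(-x1)
    return [x0, x1, x2]

# Round trip on an arbitrary point.
point = [0.3, -1.2, 0.7]
recovered = inverse(forward(point))
```

The inverse has the same triangular structure as the forward map, which is the feature used in the proof when the inverse is defined by induction. The quantitative bounds in the proof (Lipschitz constants and their dimension-independence) are not modeled by this toy example.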
This concludes the verification of (1)–(5). Now to prove (6), we apply Theorem 8.10 (5) and get
[TABLE]
where we have applied the definition of and the classical fact that is additive under conditioning. As before, because , we see that . Finally, the second claim of (6) regarding the free case follows by taking the limit as .
Finally, to prove (7), recall that the map is a special case of the map in Theorem 7.13. Thus, by applying Theorem 7.13 (4) in the case where and , we obtain . Moreover, the middle quantity in claim (7) equals the left-hand side because . ∎
Funding
This work was supported by the National Science Foundation [grant DMS-1762360] and the UCLA graduate division.
Acknowledgements
I thank Dima Shlyakhtenko, Ben Hayes, Brent Nelson, Yoann Dabrowski, Yoshimichi Ueda, and Todd Kemp for various useful conversations and comments on drafts of this paper. The results of this paper were motivated in part by discussions with Ben Hayes regarding free entropy and maximal amenable subalgebras. Dima Shlyakhtenko suggested the name “triangular transport.” The anonymous referees suggested several references and improvements to the exposition, including the connection with model theory.
- [1] G. W. Anderson, A. Guionnet, and O. Zeitouni, An Introduction to Random Matrices, Cambridge Studies in Advanced Mathematics, Cambridge University Press, 2009.
- [2] C. Anantharaman and S. Popa, An Introduction to II$_1$ Factors, 2017. Preprint available at http://www.math.ucla.edu/~popa/Books/IIun-v10.pdf.
- [3] P. Biane, M. Capitaine, and A. Guionnet, Large deviation bounds for matrix Brownian motion, Invent. Math., 152 (2003), pp. 433–459.
- [4] S. G. Bobkov, Large deviations via transference plans, Advances in Mathematics Research, 2 (2003), pp. 151–175.
- [5] S. G. Bobkov and M. Ledoux, From Brunn-Minkowski to Brascamp-Lieb and to logarithmic Sobolev inequalities, Geom. Funct. Anal., 10 (2000), pp. 1028–1052.
- [6] V. I. Bogachev, A. V. Kolesnikov, and K. V. Medvedev, Triangular transformations of measures, Sb. Math., 196.3 (2005), pp. 309–335.
- [7] R. Boutonnet and A. Carderi, Maximal amenable von Neumann subalgebras arising from maximal amenable subgroups, Geometric and Functional Analysis, 25 (2015), pp. 1688–1705.
- [8] R. Boutonnet and C. Houdayer, Amenable absorption in amalgamated free product von Neumann algebras, Kyoto J. Math., 58 (2018), pp. 583–593.
