Conditional Expectation, Entropy, and Transport for Convex Gibbs Laws in Free Probability

David Jekel

arXiv:1906.10051·math.OA·January 8, 2020

TL;DR

This paper establishes a connection between free Gibbs laws and semicircular families using conditional expectations, entropy, and transport, providing new isomorphisms and inequalities in free probability.

Contribution

It introduces a novel approach to construct measure transport and isomorphisms between free Gibbs laws and semicircular families via matrix models.

Findings

(1) Conditional expectations and entropy for the matrix models converge to their counterparts for free Gibbs laws.

(2) The constructed transport maps induce isomorphisms between the associated $\mathrm{W}^{*}$-algebras.

(3) The Talagrand inequality is proved for free Gibbs laws relative to semicircular laws.

Abstract

Let $(X_{1},\dots,X_{m})$ be self-adjoint non-commutative random variables distributed according to the free Gibbs law given by a sufficiently regular convex and semi-concave potential $V$, and let $(S_{1},\dots,S_{m})$ be a free semicircular family. We show that conditional expectations and conditional non-microstates free entropy given $X_{1},\dots,X_{k}$ arise as the large $N$ limit of the corresponding conditional expectations and entropy for the random matrix models associated to $V$. Then by studying conditional transport of measure for the matrix models, we construct an isomorphism $\mathrm{W}^{*}(X_{1},\dots,X_{m})\to\mathrm{W}^{*}(S_{1},\dots,S_{m})$ which maps $\mathrm{W}^{*}(X_{1},\dots,X_{k})$ to $\mathrm{W}^{*}(S_{1},\dots,S_{k})$ for each $k=1,\dots,m$, and which also witnesses the Talagrand inequality for the law of $(X_{1},\dots,X_{m})$ relative to the law of $(S_{1},\dots,S_{m})$.

Equations

$$d\mu^{(N)}(x)=\frac{1}{Z^{(N)}}e^{-N^{2}V^{(N)}(x)}\,dx,$$

$$V^{(N)}(x)=\frac{1}{2}\sum_{j=1}^{m}\tau_{N}(x_{j}^{2})$$

$$\tau_{N}(p(X_{1}^{(N)},\dots,X_{m}^{(N)}))\to\tau(p(X_{1},\dots,X_{m}))\text{ in probability for every non-commutative polynomial }p;$$

$$E[\tau_{N}(D_{x_{j}}V^{(N)}(X^{(N)})p(X^{(N)}))]=E[\tau_{N}\otimes\tau_{N}(\partial_{x_{j}}p(X^{(N)}))],$$

$$\tau(D_{x_{j}}V(X)p(X))=\tau\otimes\tau(\partial_{x_{j}}p(X));$$

$$u(x)=u(x_{0})+\langle Du(x_{0}),x-x_{0}\rangle_{2}+\frac{1}{2}\langle Hu(x_{0})(x-x_{0}),x-x_{0}\rangle_{2}+o(\lVert x-x_{0}\rVert_{2}^{2}).$$

$$(X^{(N)},Y^{(N)})=(X_{1}^{(N)},\dots,X_{m}^{(N)},Y_{1}^{(N)},\dots,Y_{n}^{(N)})$$

$$\lim_{N\to\infty}\sup_{\substack{x\in M_{N}(\mathbb{C})_{sa}^{m}\\ \lVert x\rVert_{\infty}\leq R}}\lVert f^{(N)}(x)-f(x)\rVert_{2}=0,$$

$$\lVert F(X)-X\rVert_{2}^{2}\leq\lVert X\rVert_{2}^{2}+m\log 2\pi-2\chi^{*}(X),$$

$$d\mu(x)=\frac{1}{Z}e^{-N^{2}V(x)}\,dx,$$

$$\frac{1}{2}\langle A(x'-x),x'-x\rangle_{2}\leq u(x')-u(x)-\langle y,x'-x\rangle_{2}\leq\frac{1}{2}\langle B(x'-x),x'-x\rangle_{2}$$

$$\langle A(x'-x),x'-x\rangle_{2}\leq\langle Du(x')-Du(x),x'-x\rangle_{2}\leq\langle B(x'-x),x'-x\rangle_{2}$$

$$-\frac{C}{2}\lVert x'-x\rVert_{2}^{2}\leq u(x')-u(x)-\langle y,x'-x\rangle_{2}\leq\frac{C}{2}\lVert x'-x\rVert_{2}^{2}.$$

$$\langle Du(x')-Du(x),x'-x\rangle_{2}=\int_{0}^{1}\langle Hu(x+t(x'-x))(x'-x),x'-x\rangle_{2}\,dt,$$

$$u(x')-u(x)=\int_{0}^{1}\langle Du(x+t(x'-x)),x'-x\rangle_{2}\,dt.$$

$$\begin{aligned}
u(x')-u(x)-\langle Du(x),x'-x\rangle_{2}
&=\int_{0}^{1}\frac{1}{t}\langle Du(x+t(x'-x))-Du(x),[x+t(x'-x)]-x\rangle_{2}\,dt\\
&\leq\int_{0}^{1}\frac{1}{t}\langle B[t(x'-x)],t(x'-x)\rangle_{2}\,dt\\
&=\frac{1}{2}\langle B(x'-x),x'-x\rangle_{2}.
\end{aligned}$$

$$u(x')-u(x)+\frac{1}{2}\langle Ax',x'\rangle_{2}-\frac{1}{2}\langle Ax,x\rangle_{2}\geq\langle y,x-x'\rangle_{2}.$$

$$|\langle Du(x)-Du(x'),y\rangle_{2}|\leq\langle A(x-x'),x-x'\rangle_{2}^{1/2}\langle Ay,y\rangle_{2}^{1/2},$$

$$\begin{aligned}
\langle Du(x)-Du(x'),y\rangle_{2}
&\leq\int_{0}^{1}\langle Hu(tx+(1-t)x')(x-x'),x-x'\rangle_{2}^{1/2}\langle Hu(tx+(1-t)x')y,y\rangle_{2}^{1/2}\,dt\\
&\leq\int_{0}^{1}\langle A(x-x'),x-x'\rangle_{2}^{1/2}\langle Ay,y\rangle_{2}^{1/2}\,dt\\
&=\langle A(x-x'),x-x'\rangle_{2}^{1/2}\langle Ay,y\rangle_{2}^{1/2}.\qquad\square
\end{aligned}$$

$$E[DV(X)]=0$$

$$\frac{m}{C}\leq E\lVert X-E(X)\rVert_{2}^{2}\leq\frac{m}{c}.$$

$$E\langle DV(X)-DV(E(X)),X-E(X)\rangle_{2}=E\langle DV(X),X-E(X)\rangle_{2}=m.$$

$$cE\lVert X-E(X)\rVert_{2}^{2}\leq E\langle DV(X)-DV(E(X)),X-E(X)\rangle_{2}\leq CE\lVert X-E(X)\rVert_{2}^{2}.$$

$$\lVert G(x)\rVert_{2}\leq\lVert E(G(X))\rVert_{2}+\lVert G\rVert_{\operatorname{Lip}}\left(\lVert x-E(X)\rVert_{2}+(E\lVert X-E(X)\rVert_{2}^{2})^{1/2}\right).$$

$$\begin{aligned}
\lVert G(x)-E(G(X))\rVert_{2}
&\leq\lVert G\rVert_{\operatorname{Lip}}\left(\lVert x-E(X)\rVert_{2}+E\lVert X-E(X)\rVert_{2}\right)\\
&\leq\lVert G\rVert_{\operatorname{Lip}}\left(\lVert x-E(X)\rVert_{2}+(E\lVert X-E(X)\rVert_{2}^{2})^{1/2}\right).\qquad\square
\end{aligned}$$

$$\lVert DV(x)\rVert_{2}\leq C\left(\lVert x-E(X)\rVert_{2}+\frac{m^{1/2}}{c^{1/2}}\right).$$


Department of Mathematics, UCLA, Los Angeles, CA 90095

[email protected] www.math.ucla.edu/$\sim$davidjekel/


Key words and phrases:

free Gibbs state, free entropy, free transport, free group factor, invariant random matrix ensembles, asymptotic random matrix theory, Talagrand inequality

1991 Mathematics Subject Classification:

Primary: 46L54, Secondary: 35K10, 37A35, 46L52, 46L53, 60B20

1. Introduction

1.1. Motivation

Free probability initiated a fruitful exchange between random matrix theory and operator algebras. In many situations, tuples of $N\times N$ random matrices $(X_{1}^{(N)},\dots,X_{m}^{(N)})$ can be described in the large $N$ limit by non-commutative random variables $X_{1}$ , …, $X_{m}$ which are operators in a tracial $\mathrm{W}^{*}$ -algebra. Conversely, many properties of non-commutative random variables (and the $\mathrm{W}^{*}$ -algebras that they generate) are easier to understand when they can be simulated by finite-dimensional random matrix models. For instance, Voiculescu used free entropy, defined in terms of matricial microstates, to prove the absence of Cartan subalgebras for free group $\mathrm{W}^{*}$ -algebras $L(\mathbb{F}_{n})$ [50]; similar techniques were used to give sufficient conditions for a von Neumann algebra to be non-prime and non-Gamma (a convenient list of results and references can be found in [11]). Further applications of random matrices to the properties of $\mathrm{C}^{*}$ - and $\mathrm{W}^{*}$ -algebras can be found for instance in [25] and [22, §4].

Free Gibbs laws are a prototypical example of the connection between random matrices and $\mathrm{W}^{*}$ -algebras. Free Gibbs laws describe the large $N$ behavior of self-adjoint tuples of random matrices $X^{(N)}=(X_{1}^{(N)},\dots,X_{m}^{(N)})$ given by a probability measure $\mu^{(N)}$ of the form

$$d\mu^{(N)}(x)=\frac{1}{Z^{(N)}}e^{-N^{2}V^{(N)}(x)}\,dx,$$

where $x\in M_{N}(\mathbb{C})_{sa}^{m}$ is a self-adjoint tuple, $dx$ denotes Lebesgue measure, $V^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ is a function (known as a potential) chosen so that $e^{-N^{2}V^{(N)}(x)}$ is integrable, and $Z^{(N)}$ is a normalizing constant that makes $\mu^{(N)}$ a probability measure. Here $V^{(N)}(x)$ could be given by $V^{(N)}(x)=\tau_{N}(p(x_{1},\dots,x_{m}))$, where $\tau_{N}=(1/N)\operatorname{Tr}$ and $p$ is a non-commutative polynomial; for instance, taking

$$V^{(N)}(x)=\frac{1}{2}\sum_{j=1}^{m}\tau_{N}(x_{j}^{2})$$

produces the Gaussian unitary ensemble (GUE). Under certain assumptions on $V$ (e.g. convexity and good asymptotic behavior as $N\to\infty$ ), there will be non-commutative random variables $X_{1}$ , …, $X_{m}$ in a tracial $\mathrm{W}^{*}$ -algebra $(\mathcal{M},\tau)$ such that

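$$\tau_{N}(p(X_{1}^{(N)},\dots,X_{m}^{(N)}))\to\tau(p(X_{1},\dots,X_{m}))\text{ in probability for every non-commutative polynomial }p;$$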

see [21, Theorems 3.3 and 3.4], [14, Proposition 50 and Theorem 51], [29, Theorem 4.1]. The random matrix models satisfy the relation, derived from integration by parts, that

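$$E[\tau_{N}(D_{x_{j}}V^{(N)}(X^{(N)})p(X^{(N)}))]=E[\tau_{N}\otimes\tau_{N}(\partial_{x_{j}}p(X^{(N)}))],$$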

where $D_{x_{j}}V$ is a normalized gradient with respect to the coordinates of $x_{j}$ and $\partial_{x_{j}}$ denotes the free difference quotient, and hence the non-commutative tuple $X=(X_{1},\dots,X_{m})$ satisfies

$$\tau(D_{x_{j}}V(X)p(X))=\tau\otimes\tau(\partial_{x_{j}}p(X));$$

see [21, §2.2 - 2.3]. The non-commutative law of a tuple $X$ satisfying such an equation is known as a free Gibbs law for the potential $V$ .
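As a numerical sanity check of the integration-by-parts relation, the following sketch samples a single GUE matrix (the case $m=1$, $V^{(N)}(x)=\frac{1}{2}\tau_{N}(x^{2})$, for which $DV^{(N)}(x)=x$) and compares $\tau_{N}(X\cdot X^{k})$ with $\tau_{N}\otimes\tau_{N}(\partial X^{k})=\sum_{i=0}^{k-1}\tau_{N}(X^{i})\tau_{N}(X^{k-1-i})$. It is an illustration rather than part of the paper: the helper names `sample_gue`, `tau`, and `schwinger_dyson_gap`, the matrix size, and the seed are assumptions, and the identity is tested on one sample instead of in expectation, relying on concentration for large $N$.

```python
import numpy as np

def sample_gue(n, rng):
    """Sample an n x n GUE matrix with density proportional to exp(-(n/2) Tr x^2)."""
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    g = (a + 1j * b) / np.sqrt(2.0)              # complex Gaussian entries, E|g_ij|^2 = 1
    return (g + g.conj().T) / np.sqrt(2.0 * n)   # Hermitian, E tau_N(x^2) = 1

def tau(x):
    """Normalized trace tau_N = (1/N) Tr (real for the Hermitian inputs used here)."""
    return np.trace(x).real / x.shape[0]

def schwinger_dyson_gap(x, k):
    """tau(x * x^k) minus (tau tensor tau) applied to the free difference quotient of x^k."""
    powers = [np.linalg.matrix_power(x, i) for i in range(k + 2)]
    lhs = tau(powers[k + 1])
    rhs = sum(tau(powers[i]) * tau(powers[k - 1 - i]) for i in range(k))
    return lhs - rhs

rng = np.random.default_rng(0)
x = sample_gue(300, rng)
second_moment = tau(x @ x)                          # semicircular value: 1
fourth_moment = tau(np.linalg.matrix_power(x, 4))   # semicircular value: 2
gaps = [schwinger_dyson_gap(x, k) for k in range(1, 4)]
print(second_moment, fourth_moment, gaps)
```

For $k=3$ the relation reads $\tau(X^{4})=2\tau(X^{2})+\tau(X)^{2}$, which is the familiar recursion producing the Catalan numbers as moments of the semicircular law.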

Given sufficient assumptions on $V^{(N)}$ (for instance, Assumption 5.1), many of the classical quantities associated to $X^{(N)}$ converge in the large $N$ limit to their free counterparts, beyond the convergence of the non-commutative moments $\tau_{N}(p(X^{(N)}))$. For instance, the normalized classical entropy will converge to the microstates free entropy (see [48, §2], [22, Theorem 5.1], [29, §5.2]), and the normalized classical Fisher information will converge to the free Fisher information (see [29, §5.3]). The monotone transport maps of Guionnet and Shlyakhtenko are well-approximated by classical transport maps for the random matrix models [23, Theorem 4.7]. The solutions of classical SDE associated to the random matrix models approximate the solutions of free SDE; see for instance [3], [22, §2], [12, §4].

1.2. Summary of Main Results

This paper will further develop the connection between classical and free probability for convex free Gibbs laws, by studying conditional expectation (§5), conditional entropy and Fisher information (§6), and conditional transport (§7). This is an extension of our previous work [29].

We consider a sequence of random matrix tuples $(X^{(N)},Y^{(N)})=(X_{1}^{(N)},\dots,X_{m}^{(N)},Y_{1}^{(N)},\dots,Y_{n}^{(N)})$ given by a uniformly convex and semi-concave sequence of potentials $V^{(N)}$ such that the normalized gradient $DV^{(N)}$ is asymptotically approximable by trace polynomials (a notion of good asymptotic behavior as $N\to\infty$ defined in §3.3). Then the following results hold:

(1) The non-commutative moments $\tau_{N}(p(X^{(N)},Y^{(N)}))$ converge in probability to $\tau(p(X,Y))$ for some tuple $(X,Y)$ of non-commutative random variables in a tracial $\mathrm{W}^{*}$-algebra. See Theorem 5.2.

(2) The classical conditional expectation $E[f^{(N)}(X^{(N)},Y^{(N)})|Y^{(N)}]$ behaves asymptotically like the non-commutative conditional expectation $E_{\mathrm{W}^{*}(Y)}[f(X,Y)]$, where $f$ comes from an appropriate non-commutative function space and $f^{(N)}:M_{N}(\mathbb{C})_{sa}^{m+n}\to M_{N}(\mathbb{C})$ is a sequence of uniformly Lipschitz functions that “behaves like $f$ in the large $N$ limit” in the sense of §3. See Theorem 5.9.

(3) The classical conditional entropy $N^{-2}h(X^{(N)}|Y^{(N)})+(m/2)\log N$ converges to the conditional free entropy $\chi^{*}(X:\mathrm{W}^{*}(Y))$. This is similar to a conditional version of $\chi=\chi^{*}$. See Theorem 6.6.

(4) There exists a function $f(X,Y)$ such that $(f(X,Y),Y)\sim(S,Y)$ in non-commutative law, where $S$ is a free semicircular $m$-tuple freely independent of $Y$, and this function also arises from functions $f^{(N)}$ such that $(f^{(N)}(X^{(N)},Y^{(N)}),Y^{(N)})\sim(S^{(N)},Y^{(N)})$, where $S^{(N)}$ is an independent GUE $m$-tuple. This is the conditional version of transport to the Gaussian/semicircular law. See Theorems 8.10.

(5) This transport map also witnesses the conditional entropy-cost inequality for the law of $X$ relative to semicircular conditioned on $Y$. See Theorem 8.10.

(6) This transport map furnishes an isomorphism $\mathrm{W}^{*}(X,Y)\cong\mathrm{W}^{*}(S,Y)\cong\mathrm{W}^{*}(S)*\mathrm{W}^{*}(Y)$, which shows that $\mathrm{W}^{*}(Y)$ is freely complemented in $\mathrm{W}^{*}(X,Y)$.

(7) Actually, a second application of transport shows that $\mathrm{W}^{*}(Y)$ is isomorphic to the $\mathrm{W}^{*}$-algebra generated by a semicircular $n$-tuple, or in other words $L(\mathbb{F}_{n})$. So altogether there is an isomorphism $\mathrm{W}^{*}(X,Y)\to L(\mathbb{F}_{m+n})$ that maps $\mathrm{W}^{*}(Y)$ to the canonical copy of $L(\mathbb{F}_{n})$ inside $L(\mathbb{F}_{m+n})$.

Furthermore, the results about transport can be iterated to produce a “lower-triangular transport” as shown in Theorem 8.11 and discussed further in §1.6. This is analogous to the classical results on triangular transport of measure such as [6].

In the rest of the introduction, we will review notation and then motivate and explain the main results in more detail. In the course of the paper, it will become clear that not only are our main results all proved by similar techniques, but their statements and proofs are also tightly interrelated.

1.3. Notation and Background

We will continue to use the same notation and background as in [29]. The one major change is that we will write superscript $(N)$ rather than subscript $N$ for measures and functions defined on $N\times N$ matrices. Moreover, we will use the original notation $\partial$ for Voiculescu’s free difference quotient, even though [29] used $\mathcal{D}$ .

We assume familiarity with the basic properties of tracial $\mathrm{W}^{*}$ -algebras (or tracial von Neumann algebras); see for instance [2]. In particular, a tracial $\mathrm{W}^{*}$ -algebra is a finite $\mathrm{W}^{*}$ -algebra $\mathcal{M}$ with a specified trace $\tau:\mathcal{M}\to\mathbb{C}$ . If $\mathcal{N}\subseteq\mathcal{M}$ is a $\mathrm{W}^{*}$ -algebra, then there is a unique trace-preserving conditional expectation $E_{\mathcal{N}}:\mathcal{M}\to\mathcal{N}$ . If $x=(x_{1},\dots,x_{m})$ is a tuple of operators in $\mathcal{M}$ , then we denote by $\mathrm{W}^{*}(x)$ the $\mathrm{W}^{*}$ -subalgebra which they generate.

There is an inner product on $\mathcal{M}$ defined by $\langle x,y\rangle_{2}=\tau(x^{*}y)$ , and the completion of $\mathcal{M}$ in this inner product is a Hilbert space known as $L^{2}(\mathcal{M},\tau)$ . We denote the self-adjoint elements of $\mathcal{M}$ by $\mathcal{M}_{sa}$ and recall that if $x$ and $y$ are self-adjoint, then $\langle x,y\rangle_{2}$ is real. If $x=(x_{1},\dots,x_{m})$ and $y=(y_{1},\dots,y_{m})$ are tuples, we denote $\langle x,y\rangle_{2}=\sum_{j=1}^{m}\langle x_{j},y_{j}\rangle_{2}$ . We define $\lVert x\rVert_{\infty}=\max_{j}\lVert x_{j}\rVert_{\infty}$ , that is, the maximum of the operator norms of $x_{j}$ .

We denote by $\operatorname{NCP}_{m}=\mathbb{C}\langle X_{1},\dots,X_{m}\rangle$ the $*$ -algebra of non-commutative polynomials in $m$ self-adjoint variables. A non-commutative law is a linear map $\lambda:\mathbb{C}\langle X_{1},\dots,X_{m}\rangle\to\mathbb{C}$ satisfying

(A) $\lambda(1)=1$.

(B) $\lambda(p^{*}p)\geq 0$ for all $p\in\operatorname{NCP}_{m}$.

(C) $\lambda(pq)=\lambda(qp)$ for all $p,q\in\operatorname{NCP}_{m}$.

(D) $|\lambda(X_{i_{1}}\dots X_{i_{k}})|\leq R^{k}$ for some constant $R$.

The set of non-commutative laws that satisfy (D) for a fixed value of $R$ is denoted $\Sigma_{m,R}$ , and it is equipped with the topology of pointwise convergence on $\operatorname{NCP}_{m}$ . Likewise, the space of all laws, equipped with the topology of pointwise convergence, will be denoted by $\Sigma_{m}$ .

If $x=(x_{1},\dots,x_{m})$ is a tuple of self-adjoint elements of $(\mathcal{M},\tau)$, then we may define a non-commutative law $\lambda_{x}$ by $\lambda_{x}(p)=\tau(p(x))$. Conversely, every non-commutative law can be realized in this way through the GNS construction. In particular, a free Gibbs law can be realized by a tuple $(x_{1},\dots,x_{m})$ of self-adjoint operators, and thus the free Gibbs law has a corresponding $\mathrm{W}^{*}$-algebra $\mathrm{W}^{*}(x)$, which is unique up to isomorphism.
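To make the definition of $\lambda_{x}$ concrete, here is a minimal sketch that evaluates the law of a pair of self-adjoint matrices on monomials and checks properties (A), (C), and (D) numerically, with $R=\lVert x\rVert_{\infty}$. The encoding of a monomial as a tuple of 0-based indices and the names `random_hermitian` and `law_of_monomial` are illustrative assumptions, not notation from the paper.

```python
import numpy as np
from functools import reduce

def random_hermitian(n, rng):
    """A random n x n self-adjoint matrix (any self-adjoint matrices would do)."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (a + a.conj().T) / 2.0

def law_of_monomial(xs, word):
    """lambda_x(X_{i_1} ... X_{i_k}) = tau_N(x_{i_1} ... x_{i_k}); `word` holds 0-based indices."""
    n = xs[0].shape[0]
    product = reduce(lambda acc, i: acc @ xs[i], word, np.eye(n, dtype=complex))
    return np.trace(product) / n

rng = np.random.default_rng(3)
xs = [random_hermitian(5, rng) for _ in range(2)]
radius = max(np.linalg.norm(x, ord=2) for x in xs)   # ||x||_inf: largest operator norm

word_p, word_q = (0, 1, 1), (1, 0)
unit_value = law_of_monomial(xs, ())                  # property (A): lambda(1) = 1
trace_pq = law_of_monomial(xs, word_p + word_q)       # property (C): lambda(pq) ...
trace_qp = law_of_monomial(xs, word_q + word_p)       # ... equals lambda(qp)
moment_bound_ok = abs(trace_pq) <= radius ** len(word_p + word_q)  # property (D)
print(unit_value, trace_pq, trace_qp, moment_bound_ok)
```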

We always consider $M_{N}(\mathbb{C})$ as a tracial $\mathrm{W}^{*}$ -algebra with the normalized trace $\tau_{N}=(1/N)\operatorname{Tr}$ , and in particular, we use the notation $\lVert x\rVert_{2}$ , $\lVert x\rVert_{\infty}$ , and $\lambda_{x}$ as defined above when $x$ is an $m$ -tuple of matrices. The notation $\lVert\cdot\rVert_{2}$ and $\lVert\cdot\rVert_{\infty}$ will never be used for the $L^{2}$ or $L^{\infty}$ norms of functions on matrices, but if we write an $L^{p}$ norm it will be expressed $\lVert\cdot\rVert_{L^{p}}$ .

For a smooth function $u:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ , we denote by $Du$ and $Hu$ the gradient and Hessian with respect to the normalized inner product $\langle\cdot,\cdot\rangle_{2}$ . In other words, $Du(x_{0})$ is the vector in $M_{N}(\mathbb{C})_{sa}^{m}$ and $Hu(x_{0})$ is the $\mathbb{R}$ -linear transformation of $M_{N}(\mathbb{C})_{sa}^{m}$ satisfying

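$$u(x)=u(x_{0})+\langle Du(x_{0}),x-x_{0}\rangle_{2}+\frac{1}{2}\langle Hu(x_{0})(x-x_{0}),x-x_{0}\rangle_{2}+o(\lVert x-x_{0}\rVert_{2}^{2}).$$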

For functions $f:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ or $M_{N}(\mathbb{C})_{sa}^{m}\to M_{N}(\mathbb{C})$, we denote by $\lVert f\rVert_{\operatorname{Lip}}$ the Lipschitz (semi)norm with respect to $\lVert\cdot\rVert_{2}$ on $M_{N}(\mathbb{C})_{sa}^{m}$ and $M_{N}(\mathbb{C})$.

Note that $M_{N}(\mathbb{C})_{sa}^{m}$ can also be equipped with the real inner product $\langle x,y\rangle_{\operatorname{Tr}}=\sum_{j=1}^{m}\operatorname{Tr}(x_{j}y_{j})=N\langle x,y\rangle_{2}$ . Being a real inner-product space, $M_{N}(\mathbb{C})_{sa}^{m}$ may be identified with $\mathbb{R}^{mN^{2}}$ by choosing an orthonormal basis in $\langle\cdot,\cdot\rangle_{\operatorname{Tr}}$ . Lebesgue measure on $M_{N}(\mathbb{C})_{sa}^{m}$ should be understood with respect to this identification. Moreover, the gradient $\nabla$ , Jacobian matrix $J$ , divergence $\operatorname{Div}$ , and Laplacian $\Delta$ for functions on $M_{N}(\mathbb{C})_{sa}^{m}$ should also be understood with respect to this identification. Beware that this is not equivalent to using entrywise coordinates for $M_{N}(\mathbb{C})_{sa}^{m}$ since the off-diagonal entries are complex and conjugate-symmetric, while the diagonal entries are real, and that the normalized gradient above satisfies $Df=N\nabla f$ . For further discussion see [29, §2.1].
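The relation $Df=N\nabla f$ can be checked by finite differences. The sketch below is an illustration under assumed names (`inner_2`, `inner_tr`, `u`) and uses the test function $u(x)=\frac{1}{2}\tau_{N}(x^{2})$ with $m=1$: the directional derivative of $u$ at $x$ in the direction $h$ equals $\langle x,h\rangle_{2}$, so $Du(x)=x$, while the same derivative written in the unnormalized inner product is $\langle x/N,h\rangle_{\operatorname{Tr}}$, so $\nabla u(x)=x/N$.

```python
import numpy as np

def random_hermitian(n, rng):
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (a + a.conj().T) / 2.0

def inner_2(x, y):
    """Normalized inner product <x, y>_2 = tau_N(x y) for self-adjoint x, y."""
    return np.trace(x @ y).real / x.shape[0]

def inner_tr(x, y):
    """Unnormalized inner product <x, y>_Tr = Tr(x y) = N <x, y>_2."""
    return np.trace(x @ y).real

def u(x):
    """Test function u(x) = (1/2) tau_N(x^2)."""
    return 0.5 * inner_2(x, x)

n = 6
rng = np.random.default_rng(1)
x, h = random_hermitian(n, rng), random_hermitian(n, rng)

eps = 1e-6
directional = (u(x + eps * h) - u(x - eps * h)) / (2.0 * eps)  # central difference in direction h
normalized_gradient = x      # Du(x), gradient with respect to <.,.>_2
trace_gradient = x / n       # gradient with respect to <.,.>_Tr

via_normalized = inner_2(normalized_gradient, h)
via_trace = inner_tr(trace_gradient, h)
print(directional, via_normalized, via_trace)
```

Both pairings reproduce the same directional derivative, and `normalized_gradient` is exactly `n * trace_gradient`.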

1.4. Main Results on Conditional Expectation

Consider a tuple

$$(X^{(N)},Y^{(N)})=(X_{1}^{(N)},\dots,X_{m}^{(N)},Y_{1}^{(N)},\dots,Y_{n}^{(N)})$$

of random self-adjoint matrices given by a probability density $(1/Z^{(N)})e^{-N^{2}V^{(N)}(x,y)}\,dx\,dy$. We assume that $V^{(N)}$ is uniformly convex and semi-concave and that the normalized gradient $DV^{(N)}$ is asymptotically approximable by trace polynomials (a certain notion of good asymptotic behavior as $N\to\infty$, explained below). The precise hypotheses are listed in Assumption 5.1. We showed in [29, Theorem 4.1] that in this case, there exists an $(m+n)$-tuple $(X,Y)$ of non-commutative random variables such that $\tau_{N}(p(X^{(N)},Y^{(N)}))\to\tau(p(X,Y))$ in probability.

Our first main result (Theorem 5.9) says roughly that the classical conditional expectation given $Y^{(N)}$ well approximates the $\mathrm{W}^{*}$ -algebraic conditional expectation $E_{\mathrm{W}^{*}(Y)}:\mathrm{W}^{*}(X,Y)\to\mathrm{W}^{*}(Y)$ . This is motivated in general by the importance of conditional expectation in free probability, e.g. its relationship to free independence with amalgamation and to free score functions. See [3, §4] for a study of the large $N$ limits of conditional expectations related to matrix SDE. The relationship between classical and free conditional expectation also has implications for the study of relative matricial microstate spaces, such as the “external averaging property” introduced in the upcoming work with Hayes, Nelson, and Sinclair [27].

Applications of conditional expectation within this paper include our results on free Fisher information and entropy (see Theorem 6.6 and Remark 6.8), as well as our proof that Assumption 5.1 is preserved under marginals (see Proposition 8.2).

The statement and proof of Theorem 5.9 rely on a notion of asymptotic approximation for functions on $M_{N}(\mathbb{C})_{sa}^{m}$ explained in §3. We define a class of non-commutative functions $\overline{\operatorname{TrP}}_{m}^{1}$ as a certain Fréchet space completion of trace polynomials, such that if $f\in\overline{\operatorname{TrP}}_{m}^{1}$ and $x_{1}$ , …, $x_{m}$ are self-adjoint elements in an $\mathcal{R}^{\omega}$ -embeddable tracial $\mathrm{W}^{*}$ -algebra $(\mathcal{M},\tau)$ , then $f(x_{1},\dots,x_{m})$ is a well-defined element of $L^{2}(\mathcal{M})$ . In particular, every $f\in\overline{\operatorname{TrP}}_{m}^{1}$ can be evaluated on a tuple of self-adjoint matrices. Now if $f^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to M_{N}(\mathbb{C})$ , we say that $f^{(N)}\rightsquigarrow f$ if for every $R>0$ ,

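$$\lim_{N\to\infty}\sup_{\substack{x\in M_{N}(\mathbb{C})_{sa}^{m}\\ \lVert x\rVert_{\infty}\leq R}}\lVert f^{(N)}(x)-f(x)\rVert_{2}=0,$$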

Moreover, if such an $f$ exists, then we say that $f^{(N)}$ is asymptotically approximable by trace polynomials.

Consider the random matrices $(X^{(N)},Y^{(N)})$ and non-commutative random variables $(X,Y)$ as above, and suppose that $f^{(N)}:M_{N}(\mathbb{C})_{sa}^{m+n}\to M_{N}(\mathbb{C})$ is uniformly Lipschitz in $\lVert\cdot\rVert_{2}$ and that $f^{(N)}\rightsquigarrow f\in\overline{\operatorname{TrP}}_{m+n}^{1}$ . Then we show that $E[f^{(N)}(X^{(N)},Y^{(N)})|Y^{(N)}]$ is given by a function $g^{(N)}(Y^{(N)})$ such that $g^{(N)}\rightsquigarrow g\in\overline{\operatorname{TrP}}_{n}^{1}$ , and moreover $E_{\mathrm{W}^{*}(Y)}[f(X,Y)]=g(Y)$ .

A curious feature of this result is that the function $g$ is defined for all self-adjoint $n$ -tuples of non-commutative random variables, not only for the specific $n$ -tuple $Y$ that we are concerned with. Similarly, the claim that $g^{(N)}\rightsquigarrow g$ describes the asymptotic behavior of $g^{(N)}(y)$ for all $y\in M_{N}(\mathbb{C})_{sa}^{n}$ , even though the distribution of the random matrix $Y^{(N)}$ is highly concentrated as $N\to\infty$ on much smaller sets, namely the “matricial microstate spaces” consisting of tuples $y\in M_{N}(\mathbb{C})_{sa}^{n}$ with non-commutative moments close to those of $Y$ . Thus, the statement we prove about the functions $g^{(N)}$ is stronger than an asymptotic result about $L^{2}$ approximation such as [23, Theorem 4.7].
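A Gaussian toy model makes this concrete. Assume, purely for illustration, $m=n=1$ and the quadratic potential $V^{(N)}(x,y)=\frac{1}{2}\tau_{N}(x^{2})+\frac{1}{2}\tau_{N}(y^{2})+a\,\tau_{N}(xy)$ with $|a|<1$ (this example is not taken from the paper). Completing the square shows that, conditioned on $Y^{(N)}=y$, the matrix $X^{(N)}$ is distributed as $S^{(N)}-ay$ with $S^{(N)}$ a GUE matrix, so for $f(x,y)=x^{2}$ the conditional expectation is $g^{(N)}(y)=1+a^{2}y^{2}$ for every $N$ and every self-adjoint $y$, not only for typical samples of $Y^{(N)}$. The sketch below estimates $E[(X^{(N)})^{2}\mid Y^{(N)}=y]$ by Monte Carlo and compares it with this prediction in $\lVert\cdot\rVert_{2}$; the helper names, sample sizes, and seed are assumptions.

```python
import numpy as np

def sample_gue(n, rng):
    """Sample an n x n GUE matrix with density proportional to exp(-(n/2) Tr x^2)."""
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    g = (a + 1j * b) / np.sqrt(2.0)
    return (g + g.conj().T) / np.sqrt(2.0 * n)

def norm_2(x):
    """Normalized Hilbert-Schmidt norm ||x||_2 = tau_N(x^* x)^(1/2)."""
    return np.sqrt(np.trace(x.conj().T @ x).real / x.shape[0])

def conditional_mean_of_square(y, a, num_samples, rng):
    """Monte Carlo estimate of E[X^2 | Y = y], using that X given Y = y is S - a*y with S GUE."""
    n = y.shape[0]
    total = np.zeros((n, n), dtype=complex)
    for _ in range(num_samples):
        x = sample_gue(n, rng) - a * y
        total += x @ x
    return total / num_samples

a, n = 0.5, 8
rng = np.random.default_rng(2)
y = sample_gue(n, rng) / np.sqrt(1.0 - a**2)   # one draw from the marginal law of Y
estimate = conditional_mean_of_square(y, a, 4000, rng)
predicted = np.eye(n) + a**2 * (y @ y)         # g(y) = 1 + a^2 y^2
error = norm_2(estimate - predicted)
print(error)
```

In this toy case the limit function $g$ is an honest polynomial; in general the paper only asserts $g\in\overline{\operatorname{TrP}}_{n}^{1}$.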

1.5. Main Results on Entropy

Voiculescu defined two types of free entropy (see [49], [51], [54]). The first, called $\chi(X)$ , is based on measuring the size of matricial microstate spaces, which is closely related to the classical entropy of the random matrix models (see [29, §5.2]). The second, called $\chi^{*}(X)$ , is defined in terms of free Fisher information, which is based on classical Fisher information. Either one should heuristically be the large $N$ limit of the classical entropy of random matrix models, but there were many technical obstacles to proving this. The inequality $\chi\leq\chi^{*}$ is known in general thanks to [3]. However, even for non-commutative laws as well-behaved and explicit as free Gibbs states given by convex potentials, the equality of $\chi$ and $\chi^{*}$ when $m>1$ was not proved until Dabrowski’s paper [12], and the problem is still open for non-convex Gibbs states.

Our previous work [29] gave a proof of this equality in the convex case based on the asymptotic analysis of functions and PDE related to the random matrix models. Here we will use similar techniques for the conditional setting. We will show (Theorem 6.6) that for a random tuple of matrices $(X^{(N)},Y^{(N)})$ given by a convex potential as above, the classical conditional entropy $N^{-2}h(X^{(N)}|Y^{(N)})+(m/2)\log N$ converges to the conditional free entropy $\chi^{*}(X:\mathrm{W}^{*}(Y))$ . Actually, the proof here is shorter than those of [12] and [29] (see Remark 6.8), even considering the results we used from [29].
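The normalization $N^{-2}h+(m/2)\log N$ can be checked by hand in the simplest unconditional case. For a single GUE matrix, the identification of $M_{N}(\mathbb{C})_{sa}$ with $\mathbb{R}^{N^{2}}$ via a $\langle\cdot,\cdot\rangle_{\operatorname{Tr}}$-orthonormal basis turns the density $e^{-(N/2)\operatorname{Tr}(x^{2})}$ into a standard Gaussian with variance $1/N$ in each of the $N^{2}$ coordinates, so $h=\frac{N^{2}}{2}\log(2\pi e/N)$. The sketch below (function names are assumptions) confirms that the normalized quantity equals $\frac{1}{2}\log(2\pi e)$, the free entropy of a standard semicircular variable, for every $N$.

```python
import math

def gue_entropy(n, m=1):
    """Differential entropy of m independent n x n GUE matrices in Tr-orthonormal coordinates."""
    dim = m * n * n        # real dimension of M_n(C)_sa^m
    variance = 1.0 / n     # variance of each coordinate under exp(-(n/2) Tr x^2)
    return 0.5 * dim * math.log(2.0 * math.pi * math.e * variance)

def normalized_entropy(n, m=1):
    """The normalization N^{-2} h + (m/2) log N used for the large-N limit."""
    return gue_entropy(n, m) / n**2 + 0.5 * m * math.log(n)

semicircular_entropy = 0.5 * math.log(2.0 * math.pi * math.e)
values = [normalized_entropy(n) for n in (2, 10, 100, 1000)]
print(values, semicircular_entropy)
```

For non-Gaussian convex potentials the normalized entropy is no longer constant in $N$, and its convergence is the content of the theorems cited above.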

We focus here only on the non-microstates entropy (defined using Fisher information). It is not yet resolved in the literature what the correct definition of conditional microstates free entropy should be. In light of [29, §5.2], the conditional classical entropy for the random matrix models seems to be a reasonable substitute for microstates entropy, and in the convex setting we expect this to agree with any plausible definition of conditional microstates entropy due to the exponential concentration of measure.

1.6. Main Results on Transport

A transport map from a probability measure $\mu$ to another probability measure $\nu$ is a function $f$ such that $f_{*}\mu=\nu$. In probabilistic language, if $X\sim\mu$ and $Y\sim\nu$ are random variables, then $f_{*}\mu=\nu$ means that $f(X)\sim Y$ in distribution. The theory of transport (and in particular optimal transport) has numerous and significant applications in the classical setting. For instance, if we have a function $f$ such that $f(X)\sim Y$ and we can numerically simulate the random variable $X$, then we can also simulate $Y$.
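The simulation remark can be illustrated with the most familiar classical transport map, the inverse CDF. In the sketch below (a standard textbook example, not taken from the paper), $\mu$ is the uniform law on $[0,1)$, $\nu$ is the exponential law with rate 1, and $f(u)=-\log(1-u)$ satisfies $f_{*}\mu=\nu$.

```python
import numpy as np

def transport_uniform_to_exponential(u):
    """Inverse-CDF map f with f_* Uniform[0,1) = Exponential(1)."""
    return -np.log1p(-u)

rng = np.random.default_rng(5)
u = rng.random(200_000)                   # samples of X ~ mu
y = transport_uniform_to_exponential(u)   # samples of f(X), distributed as nu
sample_mean, sample_var = y.mean(), y.var()
print(sample_mean, sample_var)            # both should be close to 1
```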

In the non-commutative world, transport is even more significant. As remarked in [23, §1.1], there is no known analogue of a probability density in free probability. However, the existence of transport maps that would express our given random variables as functions of a free semicircular family (for instance) would serve a similar purpose to a density, namely to provide a fairly explicit and analytically tractable model for a large class of non-commutative laws.

Moreover, in contrast to the classical setting, the very existence of transport maps is a nontrivial condition. Being able to express a non-commutative tuple $Y$ as a function of another non-commutative tuple $X$ implies that $\mathrm{W}^{*}(Y)$ embeds into $\mathrm{W}^{*}(X)$ , and having a transport map in the other direction as well implies that $\mathrm{W}^{*}(Y)\cong\mathrm{W}^{*}(X)$ . In the classical setting, any two diffuse (non-atomic) standard Borel probability spaces are isomorphic. On the other hand, there are many non-isomorphic diffuse tracial $\mathrm{W}^{*}$ -algebras, even after restricting our attention to factors (those which cannot be decomposed as direct sums); see [33]. Moreover, Ozawa [40] showed that there is no separable tracial factor that contains an isomorphic copy of all the others. Thus, there are many instances where it is not even possible to transport one given non-commutative law to another.

The papers [22] and [14] showed the existence of monotone transport maps between certain free Gibbs laws given by convex potentials and the law of a free semicircular family, and thus concluded that each of the corresponding $\mathrm{W}^{*}$ -algebras was isomorphic to a free group factor $L(\mathbb{F}_{n})$ . In particular, this result applies to the $q$ -Gaussian variables for sufficiently small $q$ . These transport techniques have been extended to type III von Neumann algebras [36], to planar algebras [37], and to interpolated free group factors [26]. We will focus on “conditional transport” in the tracial setting.

Our first main result about transport is contained in Theorems 7.11 and 7.13. Let $(X^{(N)},Y^{(N)})$ be an $(m+n)$ -tuple of random matrices arising from a sequence of convex potentials satisfying Assumption 5.1. Let $(X,Y)$ be an $(m+n)$ -tuple of non-commutative self-adjoint variables realizing the limiting free Gibbs law. Then we construct functions $F^{(N)}:M_{N}(\mathbb{C})_{sa}^{m+n}\to M_{N}(\mathbb{C})_{sa}^{m}$ such that $(F^{(N)}(X^{(N)},Y^{(N)}),Y^{(N)})\sim(S^{(N)},Y^{(N)})$ in distribution, where $S^{(N)}$ is a GUE $m$ -tuple independent of $Y^{(N)}$ . We think of this as a conditional transport, which transports the law of $X^{(N)}$ to the law of $S^{(N)}$ conditioned on $Y^{(N)}$ .

Moreover, we show that the transport maps satisfy $F^{(N)}\rightsquigarrow F\in(\overline{\operatorname{TrP}}_{m+n}^{1})_{sa}^{m}$ . In the large $N$ limit, we obtain $(F(X,Y),Y)\sim(S,Y)$ in non-commutative law, where $S$ is a free semicircular $m$ -tuple freely independent of $Y$ . In particular, this means that $\mathrm{W}^{*}(X,Y)\cong\mathrm{W}^{*}(S,Y)=\mathrm{W}^{*}(S)*\mathrm{W}^{*}(Y)$ (where $*$ denotes free product). In other words, $\mathrm{W}^{*}(Y)$ is freely complemented in $\mathrm{W}^{*}(X,Y)$ .
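In a Gaussian toy model the conditional transport map can be written down explicitly. Assume, for illustration only, $m=n=1$ and $V^{(N)}(x,y)=\frac{1}{2}\tau_{N}(x^{2})+\frac{1}{2}\tau_{N}(y^{2})+a\,\tau_{N}(xy)$ with $|a|<1$ (an example not taken from the paper). Then $X^{(N)}$ given $Y^{(N)}=y$ is a GUE matrix shifted by $-ay$, so $F(x,y)=x+ay$ satisfies $(F(X^{(N)},Y^{(N)}),Y^{(N)})\sim(S^{(N)},Y^{(N)})$. The sketch samples $(X^{(N)},Y^{(N)})$ from the joint law without using $F$, and then checks a few mixed moments of $(F(X,Y),Y)$ against the values expected for a semicircular variable free from $Y$; the names and tolerances are assumptions.

```python
import numpy as np

def sample_gue(n, rng):
    """Sample an n x n GUE matrix with density proportional to exp(-(n/2) Tr x^2)."""
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    g = (a + 1j * b) / np.sqrt(2.0)
    return (g + g.conj().T) / np.sqrt(2.0 * n)

def tau(x):
    """Normalized trace tau_N = (1/N) Tr."""
    return np.trace(x).real / x.shape[0]

a, n = 0.5, 300
rng = np.random.default_rng(4)
s1, s2 = sample_gue(n, rng), sample_gue(n, rng)

# The joint covariance of (X, Y) is the inverse of the Hessian [[1, a], [a, 1]] of the potential.
covariance = np.linalg.inv(np.array([[1.0, a], [a, 1.0]]))
chol = np.linalg.cholesky(covariance)      # lower triangular, chol @ chol.T == covariance
x = chol[0, 0] * s1 + chol[0, 1] * s2
y = chol[1, 0] * s1 + chol[1, 1] * s2

f = x + a * y                              # conditional transport map F(x, y) = x + a*y
variance_f = tau(f @ f)                    # semicircular: tau(S^2) = 1
covariance_fy = tau(f @ y)                 # freeness: tau(S Y) = 0
alternating = tau(f @ y @ f @ y)           # freeness: tau(S Y S Y) = 0 for centered S, Y
factorized = tau(f @ f @ y @ y)            # freeness: tau(S^2 Y^2) = tau(S^2) tau(Y^2)
print(variance_f, covariance_fy, alternating, factorized)
```

The vanishing of $\tau(SYSY)$ alongside $\tau(S^{2}Y^{2})=1/(1-a^{2})$ is what distinguishes free independence from classical independence of commuting variables.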

By iterating this result, we can show that if $X=(X_{1},\dots,X_{m})$ is a tuple of non-commutative random variables given by a convex free Gibbs state as above, then there is an isomorphism $\mathrm{W}^{*}(X)\to\mathrm{W}^{*}(S)$ such that $\mathrm{W}^{*}(X_{1},\dots,X_{k})$ is mapped onto $\mathrm{W}^{*}(S_{1},\dots,S_{k})$ for each $k=1$ , …, $m$ . In other words, there is a “lower-triangular transport.” See Theorem 8.11. This is a (partial) free analogue of [6, Corollary 3.10].

This result implies in particular that $\mathrm{W}^{*}(X_{1})$ is a maximal abelian subalgebra and in fact maximal amenable (since the subalgebra $\mathrm{W}^{*}(S_{1})$ is known to be maximal amenable thanks to Popa [42]), and the same holds for each $\mathrm{W}^{*}(X_{j})$ by symmetry. For context on maximal amenable subalgebras, see for instance [42] [7] [8]. More generally, any von Neumann algebraic properties of the sequence of inclusions $\mathrm{W}^{*}(X_{1})\subseteq\mathrm{W}^{*}(X_{1},X_{2})\subseteq\dots\subseteq\mathrm{W}^{*}(X_{1},\dots,X_{m})$ are the same as for the case of free semicirculars, that is, for the standard inclusions $L(\mathbb{Z})\subseteq L(\mathbb{F}_{2})\subseteq\dots\subseteq L(\mathbb{F}_{m})$ .

Denote by $F$ the transport map from the law of $X$ to the law of $S$ in our construction, so that $F(X)\sim S$ . We can also arrange that $F$ witnesses the Talagrand entropy-cost inequality relative to the semicircular law, that is,

[TABLE]

where the left hand side is twice the entropy relative to semicircular (see §8.3). This is not surprising because it was already known in the classical case that the Talagrand inequality can be witnessed by some triangular transport [6, Corollary 3.10]. Moreover, our construction of the transport maps is a direct application of the same method that Otto and Villani used to prove the Talagrand entropy-cost inequality under the assumption of the log-Sobolev inequality [39, §4]. Thus, our main contribution is to study the large $N$ limit of the transport maps using asymptotic approximation by trace polynomials. We also show that $F$ is $\lVert\cdot\rVert_{2}$ -Lipschitz, and we estimate $\lVert F(X)-X\rVert_{\infty}$ in terms of the constants $c$ and $C$ specifying the uniform convexity and semi-concavity of $V^{(N)}$ . These estimates will in fact go to zero as $c,C\to 1$ .

Unfortunately, the maps constructed here are not optimal triangular transport maps with respect to the $L^{2}$ -Wasserstein distance, since Otto and Villani’s proof of [39, Theorem 1] uses a diffusion-semigroup interpolation between the two measures, not the displacement interpolation from optimal transport theory. In that sense, the results of this paper do not fully prove an analogue of [6, Corollary 3.10]. Even in the work of Guionnet and Shlyakhtenko [22], which constructed monotone transport maps in the free setting, the question of whether these maps furnish an optimal coupling between $X$ and $S$ inside a tracial von Neumann algebra was left unresolved. Future research should study optimal transport in the free setting, and determine whether the classical optimal transport (or more generally optimal triangular transport) maps for the random matrix models converge in the large $N$ limit in the sense of this paper.

1.7. Outline

The paper is organized as follows. We remark that §2 and §4 are mostly technical background, and the reader may treat them like appendices if desired. In other words, it is feasible to read through the other sections in order and only refer to §2 and §4 as needed to verify technical details of the main results.

§2 gives standard background on convex and semi-concave functions and on log-concave random matrix models.

§3 sets up the algebra of trace polynomials, and the spaces $\overline{\operatorname{TrP}}_{m}^{0}$ and $\overline{\operatorname{TrP}}_{m}^{1}$ of functions that can be approximated by trace polynomials. These spaces provide a framework for functional calculus in multiple self-adjoint variables $X_{1}$ , …, $X_{m}$ that can realize every element of $L^{2}(\mathrm{W}^{*}(X_{1},\dots,X_{m}))$ . They are a convenient tool to describe the large $N$ behavior of functions of several matrices, and thus will be used in the statements of our main theorems.

§4 describes solving ODEs and the heat equation over $\overline{\operatorname{TrP}}_{m}^{1}$ . These are the technical lemmas used in the rest of the paper to show that the solutions of certain PDEs have well-defined large $N$ limits.

§5 explains the setup of our random matrix models given by convex potentials, and then proves our main result on conditional expectation (Theorem 5.9).

§6 shows that the conditional entropy for random matrix models converges to the conditional non-microstates entropy (Theorem 6.6).

§7 proves the existence of transport maps from a free Gibbs law to the law of a free semicircular $m$ -tuple which arise as the large $N$ limit of transport maps for the random matrix models (Theorems 7.11 and 7.13).

§8 discusses applications of our results. We show that our standard set of assumptions for log-concave random matrix models is preserved under marginals, independent joins, linear change of variables, and convolution (§8.1). We show that the transport maps constructed above witness (the conditional version of) Talagrand’s entropy-cost inequality relative to Gaussian measure (Theorem 8.10). Then by iterating our conditional transport results, we show the existence of triangular transport (Theorem 8.11).

2. Multi-matrix Models from Convex Potentials

This section is a review and reference for basic results we will use throughout the paper.

We will be concerned with probability measures on $M_{N}(\mathbb{C})_{sa}^{m}$ of the form

$$d\mu(x)=\frac{1}{Z}e^{-N^{2}V(x)}\,dx,$$

where $x=(x_{1},\dots,x_{m})$ is a tuple of self-adjoint matrices, $V:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ is a function such that $e^{-N^{2}V}$ is integrable, and $Z=\int e^{-N^{2}V(x)}\,dx$ is the normalizing constant. Here $dx$ denotes Lebesgue measure, where we identify $M_{N}(\mathbb{C})_{sa}^{m}$ with $\mathbb{R}^{mN^{2}}$ using the inner product associated to the trace (the normalization of Lebesgue measure is irrelevant here because if we multiply it by a constant, the normalizing constant $Z$ for $\mu$ will change to compensate). In this case, we will say that $\mu$ is the measure given by the potential $V$ . We will often assume $V$ is convex. Note that $\mu$ only determines $V$ up to an additive constant, but we will still say that “ $V$ is the potential corresponding to $\mu$ ” with a slight abuse of terminology.

A primary motivating example is $V(x)=\tau_{N}(f(x))$ , where $\tau_{N}=(1/N)\operatorname{Tr}$ is the normalized trace and $f$ is a non-commutative polynomial in $x_{1}$ , …, $x_{m}$ . Unlike the notation in many random matrix papers, we prefer to write $N^{2}\tau_{N}(f)$ rather than $N\operatorname{Tr}(f)$ . This seems natural because $\tau_{N}(f)$ is a function with dimension-independent normalization and it would make sense for self-adjoint elements of a tracial $\mathrm{W}^{*}$ -algebra. Meanwhile, $N^{2}$ is the dimension of $M_{N}(\mathbb{C})_{sa}$ and also the scale (in the sense of large deviations) for the standard concentration estimates that hold when $V$ is uniformly convex (see for instance [3] or §2.3 below).
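The effect of the $N^{2}\tau_{N}$ normalization can be checked numerically in the simplest case $V(x)=(1/2)\tau_{N}(x^{2})$ with $m=1$ , which is the GUE. The sketch below is an illustration only; the helper names `tau` and `sample_gue` are hypothetical and not from the paper. It samples from the density proportional to $e^{-N^{2}\tau_{N}(x^{2})/2}$ and evaluates $\tau_{N}(X^{2})$ , which should stay close to $1$ regardless of $N$ .

```python
import numpy as np

def tau(a):
    """Normalized trace tau_N = (1/N) Tr (real part, for Hermitian input)."""
    return np.trace(a).real / a.shape[0]

def sample_gue(n, rng):
    """Sample a Hermitian matrix with density proportional to exp(-N^2 tau_N(x^2) / 2).

    Diagonal entries have variance 1/N; real and imaginary parts of the
    off-diagonal entries have variance 1/(2N).
    """
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (g + g.conj().T) / (2.0 * np.sqrt(n))

rng = np.random.default_rng(0)
for n in (50, 200):
    x = sample_gue(n, rng)
    print(n, tau(x @ x))   # stays close to 1 as N grows
```

With the unnormalized convention one would write the same density as $e^{-(N/2)\operatorname{Tr}(x^{2})}$ ; the two expressions agree since $N^{2}\tau_{N}=N\operatorname{Tr}$ .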

2.1. Semi-convex and Semi-concave Functions

Definition 2.1.

Let $A:M_{N}(\mathbb{C})_{sa}^{m}\to M_{N}(\mathbb{C})_{sa}^{m}$ be a self-adjoint linear transformation and let $u:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ . We say that $Hu\leq A$ if $u(x)-(1/2)\langle Ax,x\rangle_{2}$ is concave. We say that $Hu\geq A$ if $u(x)-(1/2)\langle Ax,x\rangle_{2}$ is convex.
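As a simple illustration of this definition (a worked example added for orientation, with $T$ a hypothetical self-adjoint linear transformation), consider the quadratic function $u(x)=(1/2)\langle Tx,x\rangle_{2}$ . Then

$$u(x)-\tfrac{1}{2}\langle Ax,x\rangle_{2}=\tfrac{1}{2}\langle(T-A)x,x\rangle_{2},$$

which is convex if and only if $T-A\geq 0$ and concave if and only if $T-A\leq 0$ . Thus $Hu\geq A$ means $T\geq A$ and $Hu\leq A$ means $T\leq A$ , in agreement with the fact that the pointwise Hessian of $u$ is $T$ .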

We will also regularly use the following observation:

Lemma 2.2.

Suppose that $u:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ , and let $A$ and $B$ be self-adjoint linear transformations. The following are equivalent:

(1) $A\leq Hu\leq B$ .

(2) For each $x\in M_{N}(\mathbb{C})_{sa}^{m}$ , there exists $y\in M_{N}(\mathbb{C})_{sa}^{m}$ such that

[TABLE]

for all $x^{\prime}\in M_{N}(\mathbb{C})_{sa}^{m}$ .

(3) $u$ is continuously differentiable and we have

[TABLE]

for all $x,x^{\prime}\in M_{N}(\mathbb{C})_{sa}^{m}$ .

Moreover, in this case, $Du$ is $\max(\lVert A\rVert,\lVert B\rVert)$ -Lipschitz with respect to $\lVert\cdot\rVert_{2}$ .

Sketch of proof.

(1) $\implies$ (3). Suppose (1) holds. If $C=\max(\lVert A\rVert,\lVert B\rVert)$ , then for each $x$ there exists $y$ such that

[TABLE]

Hence, it follows from [29, Proposition 2.13] that $u$ must be continuously differentiable and $Du$ is $C$ -Lipschitz (which proves the last claim of our lemma as well). To prove the inequality asserted by (3), we can reduce to the case when $u$ is smooth using an argument similar to that of [29, Proposition 2.13]. But in the smooth case, the claim follows by estimating from above and below the formula

[TABLE]

where $Hu$ is the Hessian defined in the standard pointwise sense.

(3) $\implies$ (2). Recall the formula

[TABLE]

This implies that

[TABLE]

This proves the upper bound, and the lower bound is symmetrical.

(2) $\implies$ (1). This follows from the characterization of convex functions by supporting hyperplanes. Indeed, $u(x)-(1/2)\langle Ax,x\rangle$ is convex if and only if for every $x$ , there exists $y$ satisfying

[TABLE]

which is equivalent to the right inequality of (2), and the concavity of $u(x)-(1/2)\langle Bx,x\rangle$ follows similarly. ∎

Lemma 2.3.

Suppose that $0\leq Hu\leq A$ for some linear transformation $A$ . Then $u$ is differentiable and we have

[TABLE]

so that in particular, $\lVert Du(x)-Du(x^{\prime})\rVert_{2}\leq\lVert A\rVert\lVert x-x^{\prime}\rVert_{2}$ .

Proof.

As in [29, Proposition 2.13], we obtain differentiability; and moreover to prove the asserted estimate, it suffices to prove the claim for smooth functions $u$ . In this case,

[TABLE]

2.2. Some Basic Lemmas

Let $V:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ satisfy $HV\geq c$ for some $c>0$ . Then one can check that $e^{-N^{2}V(x)}$ is integrable; indeed, $V$ must achieve a minimum at some $x_{0}$ , so that $V(x)\geq V(x_{0})+(c/2)\lVert x-x_{0}\rVert_{2}^{2}$ , and clearly $e^{-N^{2}(c/2)\lVert x-x_{0}\rVert_{2}^{2}}$ is integrable. Therefore, the probability measure $\mu$ given by $(1/Z^{(N)})e^{-N^{2}V(x)}\,dx$ is well-defined.
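Written out, the integrability claim is a comparison with a Gaussian integral (a short worked version of the estimate just described, using only the lower bound on $V$ at its minimizer $x_{0}$ ):

$$\int e^{-N^{2}V(x)}\,dx\leq e^{-N^{2}V(x_{0})}\int e^{-N^{2}(c/2)\lVert x-x_{0}\rVert_{2}^{2}}\,dx<\infty.$$

The integral on the right is finite because $c>0$ , so its integrand is, up to normalization, a non-degenerate Gaussian density on $M_{N}(\mathbb{C})_{sa}^{m}\cong\mathbb{R}^{mN^{2}}$ .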

Lemma 2.4.

Let $V:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ satisfy $c\leq HV\leq C$ for some scalars $0<c\leq C$ . Let $\mu$ be the probability measure given by $d\mu(x)=(1/Z^{(N)})e^{-N^{2}V(x)}\,dx$ and let $X$ be a random variable whose distribution is $\mu$ . Then

[TABLE]

and

[TABLE]

Proof.

By Lemma 2.2, $V$ is continuously differentiable and $DV$ is Lipschitz. It follows by some straightforward estimation that $\lVert DV\rVert_{2}$ is integrable with respect to $\mu$ , so that $E[DV(X)]$ is well-defined. Then $E[DV(X)]=0$ follows from integration by parts (see §6.2 for further context on this integration by parts).

Next, let $D_{j}V$ denote the normalized gradient with respect to the matrix variable $x_{j}$ . Using integration by parts again, we get $E\langle D_{j}V(X),X_{j}-E(X_{j})\rangle_{2}=1$ , so that

[TABLE]

On the other hand, by Lemma 2.3,

[TABLE]

Since the middle term evaluates to $m$ , the proof is complete. ∎
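The bound $E\lVert X-E(X)\rVert_{2}^{2}\leq m/c$ can be sanity-checked on a quadratic potential, where everything is explicit. The sketch below is only an illustration under stated assumptions: it takes the hypothetical potential $V(x)=\sum_{j}(a_{j}/2)\tau_{N}(x_{j}^{2})$ with weights $a_{j}>0$ , so that $c=\min_{j}a_{j}$ , $C=\max_{j}a_{j}$ , $E(X)=0$ , and each $X_{j}$ is a GUE matrix divided by $\sqrt{a_{j}}$ ; the function names are not from the paper.

```python
import numpy as np

def sample_gue(n, rng):
    """Hermitian sample with density proportional to exp(-N^2 tau_N(x^2) / 2)."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (g + g.conj().T) / (2.0 * np.sqrt(n))

def norm2_sq(a):
    """Squared normalized Hilbert-Schmidt norm tau_N(a^* a)."""
    return np.trace(a.conj().T @ a).real / a.shape[0]

def mean_square_deviation(weights, n, samples, rng):
    """Monte Carlo estimate of E||X - E X||_2^2 for V(x) = sum_j (a_j / 2) tau_N(x_j^2).

    Since X_j = GUE / sqrt(a_j) and E X = 0, each term equals tau_N(G_j^2) / a_j.
    """
    total = 0.0
    for _ in range(samples):
        total += sum(norm2_sq(sample_gue(n, rng)) / a for a in weights)
    return total / samples

weights = (1.0, 4.0)   # hypothetical choice: c = 1, C = 4, m = 2
rng = np.random.default_rng(0)
estimate = mean_square_deviation(weights, n=30, samples=200, rng=rng)
print(estimate, "upper bound m/c =", len(weights) / min(weights))
```

In this example the exact value is $\sum_{j}1/a_{j}=1.25$ , which lies below $m/c=2$ as the lemma requires.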

Lemma 2.5.

Let $X$ be a random variable in $M_{N}(\mathbb{C})_{sa}^{m}$ and let $G:M_{N}(\mathbb{C})_{sa}^{m}\to M_{N}(\mathbb{C})_{sa}^{n}$ be Lipschitz with respect to $\lVert\cdot\rVert_{2}$ in both the domain and target space, and let $\lVert G\rVert_{\operatorname{Lip}}$ denote the corresponding Lipschitz (semi)norm. Then

[TABLE]

Proof.

Note that

[TABLE]

Corollary 2.6.

Let $V:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ satisfy $c\leq HV\leq C$ , let $\mu$ be the corresponding measure, and let $X\sim\mu$ . Then

[TABLE]

Proof.

We apply Lemma 2.5 to $DV(X)$ . Also, $DV$ is $C$ -Lipschitz by Lemma 2.2. By Lemma 2.4 $E(DV(X))=0$ and $E\left\lVert X-E(X)\right\rVert_{2}^{2}\leq m/c$ . ∎

Lemma 2.7.

Let $A$ and $B$ be positive definite linear transformations $M_{N}(\mathbb{C})_{sa}^{m}\to M_{N}(\mathbb{C})_{sa}^{m}$ . Let $\{V_{k}\}_{k\in\mathbb{N}}$ be a sequence of functions such that $A\leq HV_{k}\leq B$ . Let $d\mu_{k}(x)=(1/Z_{k})e^{-N^{2}V_{k}(x)}\,dx$ be the associated probability measure. Let $\mu$ be another measure with finite mean. Suppose $\mu_{k}$ converges weakly to $\mu$ and the mean of $\mu_{k}$ is bounded in $\left\lVert\cdot\right\rVert_{2}$ as $k\to\infty$ . Then there exists $V$ such that $d\mu(x)=(1/Z)e^{-N^{2}V(x)}\,dx$ and $A\leq HV\leq B$ .

Proof.

Since adding a constant to $V_{k}$ does not change $\mu_{k}$ , we can assume without loss of generality that $V_{k}(0)=0$ . Now $DV_{k}$ is $C$ -Lipschitz where $C=\max(\lVert A\rVert,\lVert B\rVert)$ , hence the sequence is equicontinuous. It is also pointwise bounded in light of the previous lemma, since we assumed the mean of $\mu_{k}$ is bounded as $k\to\infty$ . Thus, by the Arzelà-Ascoli theorem, by passing to a subsequence, we can assume that $DV_{k}$ converges locally uniformly to some $F$ as $k\to\infty$ . Since $V_{k}(0)=0$ , this also implies that $V_{k}$ converges locally uniformly to some $V$ , which must satisfy $A\leq HV\leq B$ since the family of such functions is closed under pointwise limits (which follows from the family of convex functions being closed under pointwise limits; compare [29, Proposition 2.13(1)]). Moreover, $DV=F$ .

Let $\nu$ be the probability measure given by $d\nu(x)=(1/Z)e^{-N^{2}V(x)}\,dx$ . Since $A$ is positive definite, we have $A\geq c$ for some scalar $c>0$ . Because $DV_{k}(0)$ is bounded in $\left\lVert\cdot\right\rVert_{2}$ as $k\to\infty$ and $V_{k}(x)\geq\langle x,DV_{k}(0)\rangle_{2}+(c/2)\left\lVert x\right\rVert_{2}^{2}$ , we can see using the dominated convergence theorem that $Z_{k}\to Z$ as $k\to\infty$ . It follows again from dominated convergence that $\int\phi\,d\mu_{k}\to\int\phi\,d\nu$ for every continuous compactly supported $\phi$ . Hence, $\nu=\mu$ , so $\mu$ is given by the potential $V$ . ∎

2.3. Log-Sobolev Inequality and Concentration

Log-concave matrix models exhibit concentration of measure as $N\to\infty$ as a consequence of the following classical inequalities.

Definition 2.8.

We say that a measure $\mu$ on $\mathbb{R}^{m}$ satisfies the log-Sobolev inequality with constant $c$ if for all sufficiently smooth $f$ ,

[TABLE]

Definition 2.9.

We say that a measure $\mu$ on $\mathbb{R}^{m}$ satisfies Herbst’s concentration inequality with constant $c$ if for all Lipschitz functions $f:\mathbb{R}^{m}\to\mathbb{R}$ and $\delta>0$ , we have $E|f(X)|<+\infty$ and

[TABLE]

where $X$ is a random variable distributed according to $\mu$ . Note that by symmetry this implies

[TABLE]

The following theorem is now standard. See for instance [1, §2.3.3 and 4.4.2] and [5]. To summarize the history, the log-Sobolev inequality was introduced by Gross [20]. In the theorem below, (1) is due to Bakry and Émery and (2) is due to unpublished work of Herbst. The application to random matrices was introduced by Guionnet and Zeitouni [24].

Theorem 2.10.

(1) Suppose that $\mu$ is a probability measure on $\mathbb{R}^{m}$ satisfying $d\mu(x)=(1/Z)e^{-V(x)}\,dx$ and suppose that $V(x)-(c/2)|x|^{2}$ is convex. Then $\mu$ satisfies the log-Sobolev inequality with constant $1/c$ .

(2) If $\mu$ satisfies the log-Sobolev inequality with constant $1/c$ , then it satisfies Herbst’s concentration inequality with constant $c$ .

In particular, we have the following consequences for random matrices. Here we use the gradient $Df$ and Hessian $Hf$ with respect to the normalized inner product $\langle\cdot,\cdot\rangle_{2}$ .

Corollary 2.11.

Suppose that $V:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ satisfies $HV\geq c>0$ and let $d\mu(x)=(1/Z)e^{-N^{2}V(x)}\,dx$ . Then $\mu$ satisfies the normalized log-Sobolev inequality

[TABLE]

and hence also satisfies the normalized Herbst concentration inequality

[TABLE]

where $f:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ is Lipschitz and $\lVert f\rVert_{\operatorname{Lip}}$ denotes the Lipschitz norm with respect to $\lVert\cdot\rVert_{2}$ .
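The practical content of the normalized concentration inequality is that a function which is Lipschitz with respect to $\lVert\cdot\rVert_{2}$ has fluctuations of order $1/N$ . The following sketch illustrates this empirically for the GUE, where $V(x)=(1/2)\tau_{N}(x^{2})$ and $c=1$ , using the $1$ -Lipschitz function $f(x)=\lVert x\rVert_{2}$ . It is an illustration under these assumptions rather than a verification of the precise constants in the inequality, and the helper names are hypothetical.

```python
import numpy as np

def sample_gue(n, rng):
    """Hermitian sample with density proportional to exp(-N^2 tau_N(x^2) / 2)."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (g + g.conj().T) / (2.0 * np.sqrt(n))

def norm2(a):
    """Normalized Hilbert-Schmidt norm ||a||_2 = tau_N(a^* a)^{1/2}."""
    return np.sqrt(np.trace(a.conj().T @ a).real / a.shape[0])

def fluctuation(n, samples, rng):
    """Empirical standard deviation of f(X) = ||X||_2 for X drawn from the GUE."""
    values = [norm2(sample_gue(n, rng)) for _ in range(samples)]
    return float(np.std(values))

rng = np.random.default_rng(0)
for n in (10, 40):
    print(n, n * fluctuation(n, 300, rng))   # N * std remains bounded as N grows
```

Multiplying the matrix size by four should reduce the standard deviation by roughly a factor of four, consistent with a Gaussian tail at scale $1/N$ .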

Lemma 2.12.

Suppose that $\mu$ is a probability measure on $M_{N}(\mathbb{C})_{sa}^{m}$ satisfying (2.5) for some constant $c$ . Let $f:M_{N}(\mathbb{C})_{sa}^{m}\to M_{N}(\mathbb{C})$ be Lipschitz with respect to $\left\lVert\cdot\right\rVert_{2}$ . Then we have

[TABLE]

where $X\sim\mu$ and where $\Theta$ is a universal constant (independent of $N$ and $c$ ).

Proof.

First, observe that $\lVert x\rVert_{\infty}\leq N^{1/2}\lVert x\rVert_{2}$ for $x\in M_{N}(\mathbb{C})_{sa}^{m}$ . In particular, $g(x)=\lVert f(x)-E[f(X)]\rVert_{\infty}$ is $N^{1/2}\lVert f\rVert_{\operatorname{Lip}}$ -Lipschitz with respect to $\lVert\cdot\rVert_{2}$ , and thus

[TABLE]

which implies after a change of variables for $\delta$ that

[TABLE]

Therefore, it suffices to show that for some constant $\Theta$ , we have

[TABLE]

We may assume without loss of generality that $f$ is self-adjoint since in the general case, $f=(1/2)(f+f^{*})+i(1/2i)(f-f^{*})$ , and each of the terms on the right hand side is Lipschitz. Thus, the self-adjoint case would imply the non-self-adjoint case at the cost of doubling the constant $\Theta$ . Now to prove the self-adjoint case, we use an “ $\epsilon$ -net argument” that is well-known in random matrix theory (see [47, §2.3.1]). Fix $N$ . Let $\{\eta_{j}\}_{j=1}^{J}$ be a maximal collection of unit vectors in $\mathbb{C}^{N}$ such that $|\eta_{i}-\eta_{j}|\geq 1/3$ for all $i\neq j$ . Since this collection is maximal, for every unit vector $\eta$ , there exists some $\eta_{j}$ with $|\eta-\eta_{j}|<1/3$ . Now if $a\in M_{N}(\mathbb{C})_{sa}$ , then there is a unit vector $\eta$ with $\left\lVert a\right\rVert_{\infty}=|\langle\eta,a\eta\rangle|$ . We may then choose $\eta_{j}$ with $|\eta-\eta_{j}|<1/3$ , which gives

[TABLE]

so that

[TABLE]

Note that the balls $\{B(\eta_{j},1/6)\}_{j=1}^{J}$ in $\mathbb{C}^{N}$ are disjoint and contained in $B(0,7/6)$ . Hence, we can estimate the number of vectors by

[TABLE]

Let $K=\lVert f\rVert_{\operatorname{Lip}}$ . For a matrix $a\in M_{N}(\mathbb{C})_{sa}$ , we have

[TABLE]

This implies that $x\mapsto\langle\eta_{j},f(x)\eta_{j}\rangle$ is $KN^{1/2}$ -Lipschitz with respect to $\left\lVert\cdot\right\rVert_{2}$ and hence

[TABLE]

Since $\left\lVert a\right\rVert_{\infty}\leq 3\max_{j}|\langle\eta_{j},a\eta_{j}\rangle|$ , we have

[TABLE]

Thus, for any $t_{0}>0$ , we have

[TABLE]

Now substitute $t_{0}=6c^{-1/2}K(\log 7)^{1/2}$ and obtain (2.7) with

[TABLE]

(In fact, for a fixed $N$ , we may use $\Theta_{N}=6(\log 7)^{1/2}+9/6N(\log 7)^{1/2}$ in the self-adjoint case.) ∎
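Two elementary inequalities drive the Lipschitz estimates in the proof above: for a unit vector $\eta$ and a matrix $a$ , one has $|\langle\eta,a\eta\rangle|\leq\lVert a\rVert_{\infty}\leq N^{1/2}\lVert a\rVert_{2}$ . The following sketch checks both numerically on a random self-adjoint matrix; it is an illustration only, and the helper names are hypothetical.

```python
import numpy as np

def op_norm(a):
    """Operator norm ||a||_infty, computed as the largest singular value."""
    return np.linalg.norm(a, 2)

def norm2(a):
    """Normalized Hilbert-Schmidt norm ||a||_2 = tau_N(a^* a)^{1/2}."""
    return np.linalg.norm(a, 'fro') / np.sqrt(a.shape[0])

rng = np.random.default_rng(0)
n = 20
g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
a = (g + g.conj().T) / 2                     # a self-adjoint test matrix
eta = rng.standard_normal(n) + 1j * rng.standard_normal(n)
eta /= np.linalg.norm(eta)                   # a unit vector in C^N

quad = abs(np.vdot(eta, a @ eta))            # |<eta, a eta>|, vdot conjugates its first argument
print(quad, op_norm(a), np.sqrt(n) * norm2(a))
```

Applied to $a=f(x)-f(x^{\prime})$ , these inequalities give the $KN^{1/2}$ -Lipschitz bound for $x\mapsto\langle\eta_{j},f(x)\eta_{j}\rangle$ used in the proof.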

3. Functional Calculus and Asymptotic Approximation

In this section, we review the algebra $\operatorname{TrP}_{m}^{1}$ of trace polynomials in self-adjoint variables $X_{1}$ , …, $X_{m}$ , as well as a certain completed quotient $\overline{\operatorname{TrP}}_{m}^{1}$ of this algebra. The elements of $\overline{\operatorname{TrP}}_{m}^{1}$ represent functions that can be applied to any tuple of self-adjoint non-commutative random variables $(X_{1},\dots,X_{m})$ in an $\mathcal{R}^{\omega}$ -embeddable tracial $\mathrm{W}^{*}$ -algebra, and application of these functions will produce every element of $L^{2}(\mathrm{W}^{*}(X_{1},\dots,X_{m}))$ (see Proposition 3.14). These functions are closed under certain algebraic and composition operations. Moreover, they are a natural tool to describe the large $N$ limit of functions on $M_{N}(\mathbb{C})_{sa}^{m}$ , which we will apply in the rest of the paper.

3.1. The Algebra of Trace Polynomials

Trace polynomials have been used by several previous authors in the study of deterministic and random matrices; a brief list is [44], [45], [43], [46], [10], [15] (which coined the term “trace polynomial”), [30], [31], [14] but they are also used implicitly in many other works. We use the same notation as in our previous paper [29].

We denote by $\operatorname{NCP}_{m}=\mathbb{C}\langle X_{1},\dots,X_{m}\rangle$ the $*$ -algebra of polynomials in $m$ self-adjoint non-commuting variables $X_{1}$ , …, $X_{m}$ .

We denote by $\operatorname{TrP}_{m}^{0}$ the $*$ -algebra of scalar-valued trace polynomials. A formal definition is given in [29]; in short, it is the tensor algebra of the vector space of non-commutative polynomials modulo cyclic symmetry. Informally, this is the commutative $*$ -algebra generated by functions of the form $\tau(p(X_{1},\dots,X_{m}))$ , where $p$ is a non-commutative polynomial in $X=(X_{1},\dots,X_{m})$ and $\tau$ is a formal symbol (which stands in for a normalized trace on a von Neumann algebra), where $\tau(p(X))^{*}=\tau(p(X)^{*})$ , and where we identify $\tau(p(X)q(X))$ with $\tau(q(X)p(X))$ for all polynomials $p$ and $q$ . Thus, $\operatorname{TrP}_{m}^{0}$ is spanned as a vector space by elements of the form $\tau(p_{1}(X))\dots\tau(p_{n}(X))$ where $p_{1}$ , …, $p_{n}\in\operatorname{NCP}_{m}$ .

We denote by $\operatorname{TrP}_{m}^{1}$ the $*$ -algebra of operator-valued trace polynomials. This is the $*$ -algebra given formally as $\operatorname{TrP}_{m}^{0}\otimes\operatorname{NCP}_{m}$ . As a vector space, it is spanned by elements of the form $\tau(p_{1}(X))\dots\tau(p_{n}(X))q(X)$ , where $p_{1}$ , …, $p_{n}$ and $q$ are in $\operatorname{NCP}_{m}$ . More generally, we would denote $\operatorname{TrP}_{m}^{k}=\operatorname{TrP}_{m}^{0}\otimes(\operatorname{NCP}_{m})^{\otimes k}$ , but these spaces will not be needed in this paper.

The degree of a trace polynomial is defined as one would expect; see [29, §3.1] for precise explanation.

Suppose that $x_{1}$ , …, $x_{m}$ are self-adjoint elements of a tracial von Neumann algebra $(\mathcal{M},\tau_{0})$ . Then elements of $\operatorname{NCP}_{m}$ , $\operatorname{TrP}_{m}^{0}$ , and $\operatorname{TrP}_{m}^{1}$ can be evaluated on $(x_{1},\dots,x_{m})$ and $\tau_{0}$ by substituting the operator $x_{j}$ and the trace $\tau_{0}$ in place of the formal symbols $X_{j}$ and $\tau$ . More precisely, the evaluation map $\varepsilon_{(x_{1},\dots,x_{m})}:\operatorname{NCP}_{m}\to\mathcal{M}$ is the unique $*$ -algebra homomorphism that sends $X_{j}$ to $x_{j}$ . Similarly, the evaluation map $\varepsilon_{(x_{1},\dots,x_{m})}^{0}:\operatorname{TrP}_{m}^{0}\to\mathbb{C}$ is the unique $*$ -algebra homomorphism that sends $\tau(p(X))$ to $\tau_{0}(\varepsilon_{(x_{1},\dots,x_{m})}(p))$ . Finally, the evaluation map $\varepsilon_{(x_{1},\dots,x_{m})}^{1}:\operatorname{TrP}_{m}^{1}\to\mathcal{M}$ is $\varepsilon_{(x_{1},\dots,x_{m})}^{0}\otimes\varepsilon_{(x_{1},\dots,x_{m})}$ , that is,

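$$\varepsilon_{(x_{1},\dots,x_{m})}^{1}[\tau(p_{1}(X))\dots\tau(p_{n}(X))q(X)]=\tau_{0}(p_{1}(x))\dots\tau_{0}(p_{n}(x))q(x).$$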

For the most part, we will abuse notation and denote $f(x)=\varepsilon_{(x_{1},\dots,x_{m})}(f)$ when $f\in\operatorname{NCP}_{m}$ , and similarly for $f\in\operatorname{TrP}_{m}^{0}$ or $f\in\operatorname{TrP}_{m}^{1}$ . Note in particular that we can consider $(\mathcal{M},\tau_{0})=(M_{N}(\mathbb{C}),\tau_{N})$ and thus $f(x)$ is defined for $x\in M_{N}(\mathbb{C})_{sa}^{m}$ and $f\in\operatorname{TrP}_{m}^{0}$ or $\operatorname{TrP}_{m}^{1}$ .

These evaluation maps thus allow us to view $f\in\operatorname{TrP}_{m}^{0}$ as a function (or rather a family of functions) $\mathcal{M}_{sa}^{m}\to\mathbb{C}$ for every tracial $\mathrm{W}^{*}$ -algebra $(\mathcal{M},\tau)$ and in particular $M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{C}$ for every $N$ . Similarly, every $f\in\operatorname{TrP}_{m}^{1}$ defines a function $\mathcal{M}_{sa}^{m}\to\mathcal{M}$ for every tracial $\mathrm{W}^{*}$ -algebra and in particular a function $M_{N}(\mathbb{C})_{sa}^{m}\to M_{N}(\mathbb{C})$ for every $N$ .
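To make the evaluation maps concrete, the sketch below evaluates one operator-valued trace polynomial on a pair of self-adjoint matrices, with $\tau$ replaced by $\tau_{N}$ . The particular polynomial $\tau(X_{1}X_{2})X_{1}+\tau(X_{1})X_{2}^{2}$ and the helper names are hypothetical choices made for illustration. The code also records two standard consequences of the definitions: traciality, $\tau_{N}(x_{1}x_{2})=\tau_{N}(x_{2}x_{1})$ , and equivariance under simultaneous unitary conjugation.

```python
import numpy as np

def tau(a):
    """Normalized trace tau_N = (1/N) Tr."""
    return np.trace(a) / a.shape[0]

def f(x1, x2):
    """Evaluate the trace polynomial tau(X1 X2) X1 + tau(X1) X2^2 on matrices."""
    return tau(x1 @ x2) * x1 + tau(x1) * (x2 @ x2)

def random_hermitian(n, rng):
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (g + g.conj().T) / 2

rng = np.random.default_rng(0)
n = 6
x1, x2 = random_hermitian(n, rng), random_hermitian(n, rng)

# A unitary matrix obtained from the QR factorization of a random complex matrix.
u, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

lhs = f(u @ x1 @ u.conj().T, u @ x2 @ u.conj().T)   # f evaluated on the conjugated tuple
rhs = u @ f(x1, x2) @ u.conj().T                    # conjugate of f evaluated on the original tuple
```

The same function `f` can be applied for every matrix size, which is the sense in which a single trace polynomial defines a family of functions indexed by $N$ .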

3.2. Functions Approximable by Trace Polynomials

From an analytic viewpoint, we prefer to work with certain separation-completions of $\operatorname{TrP}_{m}^{0}$ and $\operatorname{TrP}_{m}^{1}$ . In [29, §8.1], we sketched several equivalent ways of defining these separation-completions. Here we emphasize their description as functions that can be evaluated on any self-adjoint tuple in $\mathcal{R}^{\omega}$ (or, as we will see, any $\mathcal{R}^{\omega}$ -embeddable $\mathrm{W}^{*}$ -algebra).

Let $\mathcal{R}$ denote the hyperfinite $\operatorname{II}_{1}$ factor (tracial $\mathrm{W}^{*}$ -algebra with trivial center) and let $\mathcal{R}^{\omega}$ be its (tracial $\mathrm{W}^{*}$ -algebra) ultrapower with respect to some fixed free ultrafilter $\omega\in\beta\mathbb{N}\setminus\mathbb{N}$ .

Consider the case of $\operatorname{TrP}_{m}^{0}$ first. Let $\mathcal{F}_{m}^{0}$ denote the space of functions $(\mathcal{R}^{\omega})_{sa}^{m}\to\mathbb{C}$ that are bounded on operator norm balls, equipped with the family of semi-norms

[TABLE]

(Here “ $u$ ” stands for uniform.) This is clearly a Fréchet space since the topology is given by the countable family of semi-norms given by taking $R\in\mathbb{N}$ (for background on Fréchet spaces, see e.g. [19, §5.4]). Every $f\in\operatorname{TrP}_{m}^{0}$ defines a function $(\mathcal{R}^{\omega})_{sa}^{m}\to\mathbb{C}$ that is bounded on operator norm balls. In other words, evaluation produces a map $\operatorname{TrP}_{m}^{0}\to\mathcal{F}_{m}^{0}$ . We denote by $\overline{\operatorname{TrP}}_{m}^{0}$ the closure of the image of this map in $\mathcal{F}_{m}^{0}$ . Thus, $\overline{\operatorname{TrP}}_{m}^{0}$ is the space of functions $(\mathcal{R}^{\omega})_{sa}^{m}\to\mathbb{C}$ that can be approximated uniformly on operator-norm balls by trace polynomials.

Remark 3.1.

This space was denoted as $\mathcal{T}_{m}^{0}$ in our earlier paper [29]. The notation $\overline{\operatorname{TrP}}_{m}^{0}$ is slightly abusive since we have not shown that the map $\operatorname{TrP}_{m}^{0}\to\mathcal{F}_{m}^{0}$ is injective (and perhaps it is not). However, we will still use the notation $\overline{\operatorname{TrP}}_{m}^{0}$ since it indicates the connection with trace polynomials.

Earlier, we saw that it makes sense to evaluate a trace polynomial $f$ on any self-adjoint tuple $(x_{1},\dots,x_{m})$ in a tracial von Neumann algebra. In fact, $f(x_{1},\dots,x_{m})$ makes sense for every $f\in\overline{\operatorname{TrP}}_{m}^{0}$ when $x_{1}$ , …, $x_{m}$ come from a tracial von Neumann algebra that embeds into $\mathcal{R}^{\omega}$ . To see this, suppose $(\mathcal{M},\tau)$ admits a normal trace-preserving embedding $\iota:\mathcal{M}\to\mathcal{R}^{\omega}$ . Then we define $f(x_{1},\dots,x_{m})=f(\iota(x_{1}),\dots,\iota(x_{m}))$ . This is independent of the choice of trace-preserving embedding if $f$ is a trace polynomial, and hence it must also be independent of the choice of embedding when $f$ is in $\overline{\operatorname{TrP}}_{m}^{0}$ .

A similar separation-completion can be defined for $\operatorname{TrP}_{m}^{1}$ . Indeed, let $\mathcal{F}_{m}^{1}$ be the set of functions $\phi:(\mathcal{R}^{\omega})_{sa}^{m}\to L^{2}(\mathcal{R}^{\omega})$ such that

[TABLE]

is finite for each $R$ . Again, this is a Fréchet space. Through the evaluation map, every trace polynomial defines an element of $\mathcal{F}_{m}^{1}$ and hence there is a linear map $\operatorname{TrP}_{m}^{1}\to\mathcal{F}_{m}^{1}$ . We define $\overline{\operatorname{TrP}}_{m}^{1}$ to be the closure of the image of this map in $\mathcal{F}_{m}^{1}$ .

Similar to the scalar-valued case, we can define evaluation of $f\in\overline{\operatorname{TrP}}_{m}^{1}$ for tuples in an $\mathcal{R}^{\omega}$ -embeddable tracial $\mathrm{W}^{*}$ -algebra $(\mathcal{M},\tau)$ by using any trace preserving embedding $\iota:\mathcal{M}\to\mathcal{R}^{\omega}$ . Indeed, let $x_{1},\dots,x_{m}\in\mathcal{M}_{sa}$ . Clearly, for $f\in\operatorname{TrP}_{m}^{1}$ , we have $f(\iota(x_{1}),\dots,\iota(x_{m}))\in\iota(\mathcal{M})\subseteq\iota(L^{2}(\mathcal{M}))$ where the latter is defined by extending $\iota$ to a map $L^{2}(\mathcal{M})\to L^{2}(\mathcal{R}^{\omega})$ . Since this holds for $f\in\operatorname{TrP}_{m}^{1}$ , then by taking limits, we have $f(\iota(x_{1}),\dots,\iota(x_{m}))\in\iota(L^{2}(\mathcal{M}))$ for all $f\in\overline{\operatorname{TrP}}_{m}^{1}$ . Therefore, we may define $f(x_{1},\dots,x_{m})$ by $\iota(f(x_{1},\dots,x_{m}))=f(\iota(x_{1}),\dots,\iota(x_{m}))$ . Then one can check this is independent of the choice of embedding similarly as we did in the case of $\overline{\operatorname{TrP}}_{m}^{0}$ .

Remark 3.2.

Because the spaces $\overline{\operatorname{TrP}}_{m}^{j}$ used here are non-standard, let us briefly describe their relationship to other more familiar ideas. Recall that $\Sigma_{m,R}$ denotes the space of non-commutative laws of $m$ -tuples with operator norms bounded by $R$ . We denote by $\Sigma_{m,R}^{\operatorname{app}}$ the subspace of laws that can be realized by $m$ -tuples in $\mathcal{R}^{\omega}$ , and $\Sigma_{m}^{\operatorname{app}}=\bigcup_{R>0}\Sigma_{m,R}^{\operatorname{app}}$ . Then we showed in [29, Lemma 8.2] that $\overline{\operatorname{TrP}}_{m}^{0}$ consists of functions $\Sigma_{m}^{\operatorname{app}}\to\mathbb{C}$ such that the restriction to $\Sigma_{m,R}^{\operatorname{app}}$ is continuous for each $R$ . One could think of this alternatively as an inverse limit of $C(\Sigma_{m,R}^{\operatorname{app}})$ over the directed system of restriction maps $C(\Sigma_{m,R^{\prime}}^{\operatorname{app}})\to C(\Sigma_{m,R}^{\operatorname{app}})$ for $R^{\prime}>R$ .

Remark 3.3.

The spaces $\operatorname{TrP}_{m}^{0}$ and $\overline{\operatorname{TrP}}_{m}^{0}$ also arise naturally in the study of model theory of tracial von Neumann algebras introduced in [16, 17, 18]. To avoid some of the technical complexities of sorts, we follow the definitions in [17] where the language has multiple domains of quantification for each sort (and thus we can get away with fewer sorts), and in which formulas are obtained by applying continuous functions $\mathbb{R}^{n}\to\mathbb{R}$ to atomic formulas (rather than functions defined on some compact set). For a tracial von Neumann algebra $(M,\tau)$ , the language includes (though this list is not exhaustive) a sort representing $M$ with domains of quantification for each operator norm ball of radius $n\in\mathbb{N}$ , a special relation-like symbol $d(x,y)$ for the distance $\lVert x-y\rVert_{2}$ , a relation symbol for the trace $\tau(x)$ , and function symbols for the adjoint, addition, and multiplication.

Now $\tau(p(x_{1},\dots,x_{m}))$ is an example of an atomic formula (or strictly speaking, its real and imaginary parts are basic formulas). Similarly, $\tau(p(\operatorname{Re}(x_{1}),\dots,\operatorname{Re}(x_{m})))$ is an atomic formula, where $\operatorname{Re}(x_{j})=(x_{j}+x_{j}^{*})/2$ . Since the elements of $\operatorname{TrP}_{m}^{0}$ are obtained by adding and multiplying formulas such as $\tau(p)$ , we see that $f(\operatorname{Re}(x_{1}),\dots,\operatorname{Re}(x_{m}))$ is a quantifier-free formula for every $f\in\operatorname{TrP}_{m}^{0}$ . Moreover, the supremum of $|f(\operatorname{Re}(x_{1}),\dots,\operatorname{Re}(x_{m}))|$ over $\{x:\lVert x_{j}\rVert\leq R_{j}\}$ is the same as the supremum of $|f|$ over $\{x:x_{j}=x_{j}^{*},\lVert x_{j}\rVert\leq R_{j}\}$ . The limiting objects $\overline{\operatorname{TrP}}_{m}^{0}$ (evaluated on the real parts of operators) are thus uniform limits of quantifier-free formulas on each domain of quantification for every $\mathcal{R}^{\omega}$ -embeddable tracial von Neumann algebra, that is, they are “quantifier-free definable predicates” relative to the theory of $\mathcal{R}^{\omega}$ -embeddable tracial von Neumann algebras. Conversely, since $\overline{\operatorname{TrP}}_{m}^{0}$ is closed under the operation $(f_{1},\dots,f_{n})\mapsto\phi(f_{1},\dots,f_{n})$ for $\phi:\mathbb{C}^{n}\to\mathbb{C}$ continuous, every quantifier-free definable predicate $f$ satisfying $f(x_{1},\dots,x_{m})=f(\operatorname{Re}(x_{1}),\dots,\operatorname{Re}(x_{m}))$ is an element of $\overline{\operatorname{TrP}}_{m}^{0}$ .

The elements of $\overline{\operatorname{TrP}}_{m}^{1}$ , evaluated on the real parts of operators, may be viewed similarly as certain “quantifier-free definable functions” relative to the theory of $\mathcal{R}^{\omega}$ -embeddable tracial von Neumann algebras, meaning that $\lVert f(x)-y\rVert_{2}^{2}$ is a quantifier-free definable predicate — actually, for technical reasons a definable function is required to map an operator norm ball into an operator norm ball, so the last statement only applies if we assume our function $f\in\overline{\operatorname{TrP}}_{m}^{1}$ has this property (but it turns out that such functions exist in abundance in $\overline{\operatorname{TrP}}_{m}^{1}$ ; see Proposition 3.14 and Proposition 3.17). Alternatively, in order to deal with functions with codomain $L^{2}$ , we must first modify the language by adding another sort for $L^{2}(M)$ , with domains of quantification corresponding to $L^{2}$ -balls, which will act as the target space of the functions in $\overline{\operatorname{TrP}}_{m}^{1}$ .

The quantifier-free nature of these formulas is a model-theoretic heuristic for why they behave well under limits in non-commutative law (hence describing the large $N$ limits of random matrix models). In fact, [29, Proposition 6.28] re-expresses a formula given by quantifiers in a quantifier-free way in order to get good behavior under limits. There, we studied the inf-convolution $(Q_{t}V)(x)=\inf_{y}[V(y)+(1/2t)\lVert x-y\rVert_{2}^{2}]$ for self-adjoint tuples $x$ and $y$ . If $V\in\operatorname{TrP}_{m}^{0}$ , then for each $R>0$ ,

[TABLE]

is a formula in the language of tracial von Neumann algebras whose definition involves the quantifier $\inf$ . But if $V$ is convex and semi-concave and $DV\in(\overline{\operatorname{TrP}}_{m}^{1})_{sa}^{m}$ , then the self-adjoint tuple $y$ where the infimum

[TABLE]

is achieved can be evaluated as the limit of a fixed-point iteration using functions from $(\overline{\operatorname{TrP}}_{m}^{1})_{sa}^{m}$ , and hence $y=\phi(\operatorname{Re}(x))$ for some $\phi\in(\overline{\operatorname{TrP}}_{m}^{1})_{sa}^{m}$ (see [29, Proposition 6.28]). Moreover, it follows from the results in [29] that $\phi$ is Lipschitz in $\lVert\cdot\rVert_{2}$ , and thus in light of Proposition 3.17 below, $\phi$ is bounded in operator norm on operator norm balls. So $\phi(\operatorname{Re}(x))$ is a quantifier-free definable function. We can also conclude that $W_{t,R}\to W_{t}$ as $R\to\infty$ uniformly on operator norm balls, so $W_{t}$ is a definable formula (allowing quantifiers). But then because

[TABLE]

we conclude that $W_{t}$ is in fact a quantifier-free definable predicate.
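To make the fixed-point idea concrete, here is a minimal scalar sketch in Python. It assumes the inf-convolution has the form $\inf_{y}[V(y)+(1/2t)|x-y|^{2}]$, whose minimizer satisfies $y=x-tV^{\prime}(y)$, and it iterates this relation. The potential `V`, the step `t = 0.5`, and all function names are hypothetical choices for illustration; the iteration used in [29, Proposition 6.28] acts on operator tuples and need not coincide with this one.

```python
import math

def V(y):
    # Hypothetical potential: V'' lies in [0.9, 1.1], so V is convex and semi-concave.
    return 0.5 * y * y + 0.1 * math.cos(y)

def grad_V(y):
    return y - 0.1 * math.sin(y)

def inf_conv_minimizer(x, t, tol=1e-13, max_iter=1000):
    """Iterate y <- x - t * V'(y); this is a contraction when t * Lip(V') < 1."""
    y = x
    for _ in range(max_iter):
        y_next = x - t * grad_V(y)
        if abs(y_next - y) < tol:
            return y_next
        y = y_next
    raise RuntimeError("fixed-point iteration did not converge")

def Q(x, t):
    """Value of the inf-convolution at x, evaluated at the computed minimizer."""
    y = inf_conv_minimizer(x, t)
    return V(y) + (x - y) ** 2 / (2 * t)
```

The point of the sketch is that the minimizer is produced by composing and iterating explicit functions of $x$, with no infimum left in the final expression.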

On the other hand, without the ability to eliminate the quantifier like this, we could not hope for $Q_{t}V$ to behave so well for the large $N$ limit of random matrix models. Indeed, for $Q_{t}V(x)$ to depend continuously on the non-commutative law $\lambda_{x}$ for $x$ in each operator norm ball, it must be in $\overline{\operatorname{TrP}}_{m}^{0}$ by the last remark, and hence it is a quantifier-free definable predicate.

Many of the properties shown in the next section about operations on $\operatorname{TrP}_{m}^{0}$ and $\operatorname{TrP}_{m}^{1}$ are natural from the model theoretic viewpoint, but we sketch self-contained justifications nonetheless.

3.3. Asymptotic Approximation for Functions of Matrices

Our earlier work introduced asymptotic approximability by trace polynomials for a sequence of functions on $M_{N}(\mathbb{C})_{sa}^{m}$ , which is a precise description of good asymptotic behavior as $N\to\infty$ suitable for free probabilistic analysis in the limit.

Definition 3.4.

Let $\phi^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{C}$ . We say that $\{\phi^{(N)}\}$ is asymptotically approximable by trace polynomials if for every $R>0$ and $\epsilon>0$ , there exists $f\in\operatorname{TrP}_{m}^{0}$ such that

[TABLE]

Similarly, for matrix-valued functions $\phi^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to M_{N}(\mathbb{C})$ , we say that $\{\phi^{(N)}\}$ is asymptotically approximable by trace polynomials if for every $R>0$ and $\epsilon>0$ , there exists $f\in\operatorname{TrP}_{m}^{1}$ such that

[TABLE]

It will be convenient to denote

[TABLE]

in the scalar-valued case and similarly for the matrix-valued case with $\left\lVert\phi(x)\right\rVert_{2}$ rather than $|\phi(x)|$ . Thus, for instance, the preceding definition says that there exists a trace polynomial $f$ with

[TABLE]

Moreover, it is implicit from our discussion in [29, §8.1] that if $\phi^{(N)}$ is asymptotically approximable by trace polynomials, then it will be asymptotic to some $f\in\overline{\operatorname{TrP}}_{m}^{0}$ or $\overline{\operatorname{TrP}}_{m}^{1}$ in the following sense.

Definition 3.5.

Let $\phi^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{C}$ or $M_{N}(\mathbb{C})$ respectively, and let $f\in\overline{\operatorname{TrP}}_{m}^{0}$ or $\overline{\operatorname{TrP}}_{m}^{1}$ respectively. Then we say that $\{\phi^{(N)}\}$ is asymptotic to $f$ , or $\phi^{(N)}\rightsquigarrow f$ if for every $R>0$ ,

[TABLE]

Similarly, if $\phi^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to M_{N}(\mathbb{C})$ and $f\in\overline{\operatorname{TrP}}_{m}^{1}$ , we make the same definitions with $|\phi^{(N)}(x)-f(x)|$ replaced by $\left\lVert\phi^{(N)}(x)-f(x)\right\rVert_{2}$ .
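For orientation, the two notions can be put side by side in the norm notation. This is a sketch under the assumption that $\lVert\phi\rVert_{u,R}^{(N)}$ denotes the supremum of $|\phi(x)|$ over $x\in M_{N}(\mathbb{C})_{sa}^{m}$ with $\lVert x\rVert_{\infty}\leq R$ (our reading of the surrounding text), and up to the choice between strict and non-strict inequality:

$$
\text{(Definition 3.4)}\quad \forall R>0,\ \forall\epsilon>0,\ \exists f\in\operatorname{TrP}_{m}^{0}:\quad \limsup_{N\to\infty}\lVert\phi^{(N)}-f\rVert_{u,R}^{(N)}\leq\epsilon,
$$

$$
\text{(Definition 3.5)}\quad \forall R>0:\quad \lim_{N\to\infty}\lVert\phi^{(N)}-f\rVert_{u,R}^{(N)}=0.
$$

In the first notion the trace polynomial $f$ may depend on $R$ and $\epsilon$ , whereas in the second a single $f\in\overline{\operatorname{TrP}}_{m}^{0}$ works for every $R$ .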

Lemma 3.6.

Let $\phi^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{C}$ (respectively, $\to M_{N}(\mathbb{C})$ ). Then $\phi^{(N)}$ is asymptotically approximable by trace polynomials if and only if there exists $f\in\overline{\operatorname{TrP}}_{m}^{0}$ (respectively, $f\in\overline{\operatorname{TrP}}_{m}^{1}$ ) such that $\phi^{(N)}\rightsquigarrow f$ . Moreover, $\left\lVert f\right\rVert_{u,R}=\lim_{N\to\infty}\left\lVert\phi^{(N)}\right\rVert_{u,R}^{(N)}$ for each $R$ .

Proof.

We record the proof only for the case of scalar-valued functions, since the proof for the operator-valued case is identical with minor changes of notation. Suppose that $\{\phi^{(N)}\}$ is asymptotically approximable by trace polynomials. Then there exists a sequence $\{f_{k}\}$ of trace polynomials such that for every $R>0$ ,

[TABLE]

As in [29, Lemma 8.1], if $g\in\operatorname{TrP}_{m}^{0}$ , then

[TABLE]

which implies that

[TABLE]

Applying this to $g=f_{j}-f_{k}$ , we obtain from the triangle inequality

[TABLE]

and hence $f_{k}$ is Cauchy with respect to $\left\lVert\cdot\right\rVert_{u,R}$ for each $R>0$ . Hence, $f_{k}$ converges to some $f\in\overline{\operatorname{TrP}}_{m}^{0}$ . By similar use of the triangle inequality,

[TABLE]

Hence, $\phi^{(N)}\rightsquigarrow f$ .

Conversely, suppose that $\phi^{(N)}\rightsquigarrow f\in\overline{\operatorname{TrP}}_{m}^{0}$ . Choose $f_{k}\in\operatorname{TrP}_{m}^{0}$ such that $\left\lVert f_{k}-f\right\rVert_{u,R}\to 0$ for every $R$ . Then

[TABLE]

Hence, it follows that $\{\phi^{(N)}\}$ is asymptotically approximable by trace polynomials, namely the polynomials $\{f_{k}\}$ .

We leave the proof of the last claim that $\left\lVert f\right\rVert_{u,R}=\lim_{N\to\infty}\left\lVert\phi^{(N)}\right\rVert_{u,R}^{(N)}$ to the reader. ∎
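One possible route to the last claim is sketched here. It assumes that the fact quoted from [29, Lemma 8.1] is the convergence $\lVert g\rVert_{u,R}^{(N)}\to\lVert g\rVert_{u,R}$ for $g\in\operatorname{TrP}_{m}^{0}$ , and that it passes to $f\in\overline{\operatorname{TrP}}_{m}^{0}$ by approximation, using $\lVert f-f_{k}\rVert_{u,R}^{(N)}\leq\lVert f-f_{k}\rVert_{u,R}$ . Under that assumption the triangle inequality gives

$$
\Bigl|\lVert\phi^{(N)}\rVert_{u,R}^{(N)}-\lVert f\rVert_{u,R}\Bigr|\leq\lVert\phi^{(N)}-f\rVert_{u,R}^{(N)}+\Bigl|\lVert f\rVert_{u,R}^{(N)}-\lVert f\rVert_{u,R}\Bigr|,
$$

where the first term tends to zero because $\phi^{(N)}\rightsquigarrow f$ and the second tends to zero by the assumed convergence.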

Remark 3.7.

If $\phi^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to M_{N}(\mathbb{C})_{sa}$ and $\{\phi^{(N)}\}$ is asymptotically approximable by trace polynomials, then we can asymptotically approximate it using self-adjoint trace polynomials. Indeed, if

[TABLE]

then the same holds with $f$ replaced by $(1/2)(f+f^{*})$ . Similarly, if $\phi^{(N)}(x)$ is self-adjoint and $\phi^{(N)}\rightsquigarrow f\in\overline{\operatorname{TrP}}_{m}^{1}$ , then $f$ must be self-adjoint.

Remark 3.8.

Definitions 3.4 and 3.5 and Lemma 3.6 extend naturally to tuples $f=(f_{1},\dots,f_{n})\in(\overline{\operatorname{TrP}}_{m}^{1})^{n}$ and $\phi^{(N)}=(\phi_{1}^{(N)},\dots,\phi_{n}^{(N)}):M_{N}(\mathbb{C})_{sa}^{m}\to M_{N}(\mathbb{C})^{n}$ . We shall apply them to tuples without further comment in the rest of the paper.

3.4. Algebra, Composition, and Limits

Lemma 3.9.

$\overline{\operatorname{TrP}}_{m}^{0}$ is an algebra and $\overline{\operatorname{TrP}}_{m}^{1}$ is a module over $\overline{\operatorname{TrP}}_{m}^{0}$ . Also, if $f,g\in\overline{\operatorname{TrP}}_{m}^{1}$ , then $\tau(fg)\in\overline{\operatorname{TrP}}_{m}^{0}$ . Moreover, suppose that $\phi^{(N)},\psi^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{C}$ and $f^{(N)},g^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to M_{N}(\mathbb{C})$ are asymptotically approximable, and $\phi^{(N)}\rightsquigarrow\phi$ , $\psi^{(N)}\rightsquigarrow\psi$ , $f^{(N)}\rightsquigarrow f$ , and $g^{(N)}\rightsquigarrow g$ . Then we have

[TABLE]

Proof.

Since the proofs of all the statements are straightforward and similar to each other, we will only explain how to show that if $\phi\in\overline{\operatorname{TrP}}_{m}^{0}$ and $f\in\overline{\operatorname{TrP}}_{m}^{1}$ , then $\phi f\in\overline{\operatorname{TrP}}_{m}^{1}$ and that if $\phi^{(N)}\rightsquigarrow\phi$ and $f^{(N)}\rightsquigarrow f$ , then $\phi^{(N)}f^{(N)}\rightsquigarrow\phi f$ .

First, note that $\phi f$ is well-defined as a function on $(\mathcal{R}^{\omega})_{sa}^{m}$ by multiplying the scalar $\phi(x)$ times the vector $f(x)$ for each $x\in(\mathcal{R}^{\omega})_{sa}^{m}$ , and also clearly $\left\lVert\phi f\right\rVert_{u,R}\leq\left\lVert\phi\right\rVert_{u,R}\left\lVert f\right\rVert_{u,R}$ . To show that $\phi f\in\overline{\operatorname{TrP}}_{m}^{1}$ , it suffices to show that for every $\epsilon>0$ and $R>0$ , the function $\phi f$ can be approximated by an element of $\operatorname{TrP}_{m}^{1}$ with respect to $\left\lVert\cdot\right\rVert_{u,R}$ with error less than $\epsilon$ . We first choose $h\in\operatorname{TrP}_{m}^{1}$ such that

[TABLE]

Then we choose $\theta\in\operatorname{TrP}_{m}^{0}$ such that

[TABLE]

and we conclude with the routine observation that

[TABLE]

Next, to show $\phi^{(N)}f^{(N)}\rightsquigarrow\phi f$ , first observe that

[TABLE]

Then

[TABLE]

which implies that $\left\lVert\phi^{(N)}f^{(N)}-\phi f\right\rVert_{u,R}^{(N)}\to 0$ . ∎
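The estimate behind this last step can be sketched as follows. The splitting is the standard product estimate and is offered as an illustration, not as a quotation of the displays in the proof:

$$
\lVert\phi^{(N)}f^{(N)}-\phi f\rVert_{u,R}^{(N)}\leq\lVert\phi^{(N)}-\phi\rVert_{u,R}^{(N)}\,\lVert f^{(N)}\rVert_{u,R}^{(N)}+\lVert\phi\rVert_{u,R}\,\lVert f^{(N)}-f\rVert_{u,R}^{(N)}.
$$

By Lemma 3.6, $\lVert f^{(N)}\rVert_{u,R}^{(N)}$ converges to $\lVert f\rVert_{u,R}$ and is therefore bounded in $N$ , so both terms on the right tend to zero.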

In addition to their algebraic structure, functions $(\mathcal{R}^{\omega})_{sa}^{m}\to(\mathcal{R}^{\omega})_{sa}^{n}$ given by trace polynomials are closed under composition. It turns out that self-adjoint tuples from $\overline{\operatorname{TrP}}_{m}^{1}$ are closed under composition under the assumption of $\left\lVert\cdot\right\rVert_{2}$ -uniform continuity of the “outside” function (Lemma 3.12 below).

We say that $f\in\overline{\operatorname{TrP}}_{m}^{1}$ is $\left\lVert\cdot\right\rVert_{2}$ -uniformly continuous if for every $\epsilon>0$ , there exists $\delta>0$ such that

[TABLE]

Furthermore, we say $f\in\overline{\operatorname{TrP}}_{m}^{1}$ is $\left\lVert\cdot\right\rVert_{2}$ -Lipschitz if $\left\lVert f(x)-f(y)\right\rVert_{2}\leq K\left\lVert x-y\right\rVert_{2}$ for some constant $K$ , which is an important special case of uniform continuity. We denote the minimum such constant by $\left\lVert f\right\rVert_{\operatorname{Lip}}$ . We make the analogous definitions for $f\in\overline{\operatorname{TrP}}_{m}^{0}$ .

Observation 3.10.

If $f$ is a function from $(\mathcal{R}^{\omega})_{sa}^{m}$ to $\mathcal{R}^{\omega}$ or $\mathbb{C}$ that is $\left\lVert\cdot\right\rVert_{2}$ -uniformly continuous, then it has a unique continuous extension to $L^{2}(\mathcal{R}^{\omega})_{sa}^{m}$ , which is also $\left\lVert\cdot\right\rVert_{2}$ -uniformly continuous. Similarly, if $f$ is Lipschitz on $(\mathcal{R}^{\omega})_{sa}^{m}$ , then the extension is also Lipschitz.

Lemma 3.11.

Suppose that $f^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{C}$ or $M_{N}(\mathbb{C})$ and that $f^{(N)}\rightsquigarrow f$ . If $f^{(N)}$ is $\lVert\cdot\rVert_{2}$ -uniformly continuous with respect to some modulus of continuity independent of $N$ , then $f$ is $\lVert\cdot\rVert_{2}$ -uniformly continuous on $(\mathcal{R}^{\omega})_{sa}^{m}$ with the same modulus of continuity.

Proof.

Let us only explain the operator-valued case where $f^{(N)}$ is $M_{N}(\mathbb{C})$ -valued and $f\in\overline{\operatorname{TrP}}_{m}^{1}$ , since the scalar-valued case is easier. We define scalar-valued functions of $2m$ variables by $F^{(N)}(x,y)=\lVert f^{(N)}(x)-f^{(N)}(y)\rVert_{2}^{2}$ and $F(x,y)=\lVert f(x)-f(y)\rVert_{2}^{2}$ . By Lemma 3.9, we have $F^{(N)}\rightsquigarrow F\in\overline{\operatorname{TrP}}_{2m}^{0}$ .

Let $\epsilon(\delta)$ be a common modulus of continuity for $f^{(N)}$ . Let $x$ and $y\in(\mathcal{R}^{\omega})_{sa}^{m}$ . Then we may embed $\mathrm{W}^{*}(x,y)$ into $(\mathcal{M},\tau):=\prod_{N\to\omega}(M_{N}(\mathbb{C}),\tau_{N})$ , that is the tracial $\mathrm{W}^{*}$ -ultraproduct of matrices. There exist tuples $x^{(N)}$ and $y^{(N)}$ of $N\times N$ matrices such that $x=\{x^{(N)}\}_{N\in\mathbb{N}}$ and $y=\{y^{(N)}\}_{N\in\mathbb{N}}$ in the ultraproduct and also $\lVert x^{(N)}\rVert_{\infty}\leq\lVert x\rVert_{\infty}$ and $\lVert y^{(N)}\rVert_{\infty}\leq\lVert y\rVert_{\infty}$ . Observe that

[TABLE]

(This equality holds for trace polynomials and hence holds for all functions in $\overline{\operatorname{TrP}}_{2m}^{0}$ by approximation.) On the other hand, we also have for $R>\max(\lVert x\rVert_{\infty},\lVert y\rVert_{\infty})$ that

[TABLE]

Therefore,

[TABLE]

since $\lVert x^{(N)}-y^{(N)}\rVert_{2}\to\lVert x-y\rVert_{2}$ . ∎

Lemma 3.12.

Let $j=0$ or $1$ . Let $f\in\overline{\operatorname{TrP}}_{m}^{j}$ be $\left\lVert\cdot\right\rVert_{2}$ -uniformly continuous and let $g=(g_{1},\dots,g_{m})\in(\overline{\operatorname{TrP}}_{n}^{1})_{sa}^{m}$ .

(1) Then $f\circ g$ is a well-defined function on $(\mathcal{R}^{\omega})_{sa}^{n}$ , and it is in $\overline{\operatorname{TrP}}_{n}^{j}$ .

(2) If $g$ is also $\left\lVert\cdot\right\rVert_{2}$ -uniformly continuous, then so is $f\circ g$ .

(3) Suppose $f^{(N)}$ is a function on $M_{N}(\mathbb{C})_{sa}^{m}$ and $g^{(N)}:M_{N}(\mathbb{C})_{sa}^{n}\to M_{N}(\mathbb{C})_{sa}^{m}$ such that $f^{(N)}\rightsquigarrow f$ and $g^{(N)}\rightsquigarrow g$ . Also, suppose that $f^{(N)}$ is $\left\lVert\cdot\right\rVert_{2}$ -uniformly continuous with the modulus of continuity also uniform in $N$ . Then $f^{(N)}\circ g^{(N)}\rightsquigarrow f\circ g$ .

Proof.

(1) Because $f$ extends to a function on $L^{2}(\mathcal{R}^{\omega})_{sa}^{m}$ , we can define $f\circ g$ . Now let us show $f\circ g\in\overline{\operatorname{TrP}}_{n}^{j}$ . Choose $\epsilon>0$ and $R>0$ . By uniform continuity of $f$ , there exists a $\delta>0$ such that $\left\lVert x-y\right\rVert_{2}<\delta$ implies $|f(x)-f(y)|$ or $\left\lVert f(x)-f(y)\right\rVert_{2}<\epsilon/2$ (for $j=0$ or $1$ respectively). Now choose $\tilde{g}\in(\operatorname{TrP}_{n}^{1})_{sa}^{m}$ such that $\left\lVert\tilde{g}-g\right\rVert_{u,R}<\delta$ , and hence

[TABLE]

Because $\tilde{g}$ is a trace polynomial, there is some $R^{\prime}$ such that $\left\lVert x\right\rVert_{\infty}\leq R$ implies $\left\lVert\tilde{g}(x)\right\rVert_{\infty}\leq R^{\prime}$ . Choose $\tilde{f}\in\operatorname{TrP}_{m}^{j}$ with $\left\lVert\tilde{f}-f\right\rVert_{u,R^{\prime}}<\epsilon/2$ , and hence

[TABLE]

Then altogether we have $\left\lVert f\circ g-\tilde{f}\circ\tilde{g}\right\rVert_{u,R}<\epsilon$ .

(2) This is immediate.

(3) This is similar to the proof of (1). Fix $R>0$ and $\epsilon>0$ . Choose $\delta>0$ such that $\left\lVert x-y\right\rVert_{2}<\delta$ implies $|f(x)-f(y)|$ or $\left\lVert f(x)-f(y)\right\rVert_{2}<\epsilon/2$ and such that the same holds for $f^{(N)}$ as well. Let $\tilde{g}\in(\operatorname{TrP}_{n}^{1})_{sa}^{m}$ such that $\left\lVert\tilde{g}-g\right\rVert_{u,R}<\delta$ . Note that for sufficiently large $N$ , we have $\left\lVert g^{(N)}-\tilde{g}\right\rVert_{u,R}^{(N)}<\delta$ and hence

[TABLE]

Then let $R^{\prime}$ and $\tilde{f}$ be as in (1). Then for sufficiently large $N$ , we have

[TABLE]

so overall

[TABLE]

so that $\left\lVert f^{(N)}\circ g^{(N)}-f\circ g\right\rVert_{u,R}^{(N)}<2\epsilon$ for large enough $N$ . ∎

Moreover, asymptotically approximable sequences are closed under limits in an appropriate sense.

Lemma 3.13.

Let $f_{k}^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{C}$ or to $M_{N}(\mathbb{C})$ for $k$ and $N\in\mathbb{N}$ . Suppose that $f_{k}^{(N)}\rightsquigarrow f_{k}$ in $\overline{\operatorname{TrP}}_{m}^{j}$ for each $k$ , and that

[TABLE]

Then $f_{k}$ converges in $\overline{\operatorname{TrP}}_{m}^{j}$ to some $f$ , and we have $f^{(N)}\rightsquigarrow f$ .

Proof.

Note that

[TABLE]

Then because of our assumption (3.1), we see that $\{f_{k}\}_{k\in\mathbb{N}}$ is Cauchy with respect to $\left\lVert\cdot\right\rVert_{u,R}$ for each $R$ . Thus, $f_{k}$ converges to some $f$ . Then to show that $f^{(N)}\rightsquigarrow f$ is a routine argument. ∎
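The routine argument can be sketched as follows, assuming that hypothesis (3.1) states $\lim_{k\to\infty}\limsup_{N\to\infty}\lVert f_{k}^{(N)}-f^{(N)}\rVert_{u,R}^{(N)}=0$ for each $R$ (our reading, since the display is not reproduced here). For every $k$ and $N$ ,

$$
\lVert f^{(N)}-f\rVert_{u,R}^{(N)}\leq\lVert f^{(N)}-f_{k}^{(N)}\rVert_{u,R}^{(N)}+\lVert f_{k}^{(N)}-f_{k}\rVert_{u,R}^{(N)}+\lVert f_{k}-f\rVert_{u,R}.
$$

Taking $\limsup_{N\to\infty}$ removes the middle term because $f_{k}^{(N)}\rightsquigarrow f_{k}$ , and letting $k\to\infty$ afterwards removes the other two.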

3.5. Functional Calculus and Operator Norm Bounds

Now we will show that every element of $L^{2}(\mathrm{W}^{*}(x_{1},\dots,x_{m}))$ can be expressed as $f(x_{1},\dots,x_{m})$ for some $f\in\overline{\operatorname{TrP}}_{m}^{1}$ . In fact, we can arrange that $f$ can be approximated uniformly by Lipschitz functions. It will be convenient to define the uniform norm

[TABLE]

and we make the same definition for $\left\lVert f\right\rVert_{u}^{(N)}$ where the supremum is instead taken over $x\in M_{N}(\mathbb{C})_{sa}^{m}$ .

Proposition 3.14.

Let $x_{1}$ , …, $x_{m}$ be self-adjoint variables which generate a tracial $\mathrm{W}^{*}$ -algebra $(\mathcal{M},\tau)$ that is embeddable into $\mathcal{R}^{\omega}$ . Let $z\in L^{2}(\mathcal{M},\tau)$ .

(1) There exists a $\left\lVert\cdot\right\rVert_{2}$ -uniformly continuous $f\in\overline{\operatorname{TrP}}_{m}^{1}$ such that $z=f(x_{1},\dots,x_{m})$ .

(2) The $f$ in (1) can be chosen so that there are $\left\lVert\cdot\right\rVert_{2}$ -Lipschitz functions $f_{k}\in\overline{\operatorname{TrP}}_{m}^{1}$ such that $\left\lVert f_{k}-f\right\rVert_{u}\to 0$ .

(3) If $z\in\mathbb{C}\langle x_{1},\dots,x_{m}\rangle$ , then $f$ can be chosen to be $\left\lVert\cdot\right\rVert_{2}$ -Lipschitz.

We use the following auxiliary observation. Here $\Sigma_{m,R}$ will denote the space of non-commutative laws for an $m$ -tuple of operators with operator norm $\leq R$ . We equip $\Sigma_{m,R}$ with the topology of convergence in moments. Recall that $\Sigma_{m,R}$ is compact, separable, and metrizable. In [29, Lemma 8.2], we noted the relationship between $\overline{\operatorname{TrP}}_{m}^{0}$ and continuous functions on $\Sigma_{m,R}$ for each $R$ . This same idea motivates the proof of the next lemma.

Lemma 3.15.

Let $\mu\in\Sigma_{m,R}$ and let $\mathcal{U}$ be a neighborhood of $\mu$ , and let $\epsilon>0$ . Then there exists a trace polynomial $f$ such that

[TABLE]

Proof.

By Urysohn’s lemma, there exists a continuous function $F:\Sigma_{m,R}\to[0,1]$ such that $F(\mu)=1$ and $F(\nu)=0$ for $\nu\not\in\mathcal{U}$ . The functions $\Sigma_{m,R}\to\mathbb{C}$ of the form $\mu\mapsto\mu(f)$ for $f\in\operatorname{TrP}_{m}^{0}$ form a self-adjoint algebra in $C(\Sigma_{m,R})$ , and they separate points because by definition two laws are the same if they agree on every non-commutative polynomial. So by the Stone-Weierstrass theorem, this algebra is dense in $C(\Sigma_{m,R})$ . In particular, there exists a trace polynomial $g$ such that $|\nu(g)-F(\nu)|<\epsilon/2$ for all $\nu\in\Sigma_{m,R}$ . Then let $f=(g+\epsilon/2)/(\mu(g)+\epsilon/2)$ . ∎
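As a quick check of the final formula, assume $g$ is real-valued on laws (replace $g$ by its real part, which is harmless because $F$ is real) and write $\nu(f)$ for the evaluation of $f$ on the law $\nu$ . Then $\mu(f)=1$ by construction, and the bounds $|\nu(g)-F(\nu)|<\epsilon/2$ give

$$
\mu(g)+\tfrac{\epsilon}{2}>1,\qquad 0<\nu(g)+\tfrac{\epsilon}{2}<1+\epsilon\ \text{ for all }\nu\in\Sigma_{m,R},\qquad 0<\nu(g)+\tfrac{\epsilon}{2}<\epsilon\ \text{ for }\nu\notin\mathcal{U}.
$$

Dividing by $\mu(g)+\epsilon/2>1$ yields $0<\nu(f)<1+\epsilon$ on all of $\Sigma_{m,R}$ and $0<\nu(f)<\epsilon$ outside $\mathcal{U}$ .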

We will also use the following smooth cut-off trick.

Lemma 3.16.

Let $0<R^{\prime}\leq R$ . Let $\phi\in C_{c}^{\infty}(\mathbb{R};\mathbb{R})$ such that $\phi(t)=t$ for $|t|\leq R^{\prime}$ and $|\phi(t)|\leq R$ . For $y\in(\mathcal{R}^{\omega})_{sa}$ , define $\Phi(y)=\phi(y)$ where $\phi$ is applied through functional calculus. Then

(1) $\Phi(y)=y$ if $\left\lVert y\right\rVert_{\infty}\leq R^{\prime}$ .

(2) $\left\lVert\Phi(y)\right\rVert_{\infty}\leq R$ for all $y$ .

(3) $\Phi\in\overline{\operatorname{TrP}}_{1}^{1}$ .

(4) $\Phi$ is globally $\left\lVert\cdot\right\rVert_{2}$ -Lipschitz.

Proof.

(1) and (2) follow from the properties of functional calculus. To prove (3), note by the Weierstrass approximation theorem that for every $r>0$ and $\epsilon>0$ , there is a polynomial $p$ such that $|p(t)-\phi(t)|<\epsilon$ for $|t|\leq r$ . This implies as with (1) that $\left\lVert p(y)-\phi(y)\right\rVert_{\infty}<\epsilon$ for all $y$ with $\left\lVert y\right\rVert_{\infty}\leq r$ , and hence $\left\lVert p-\Phi\right\rVert_{u,r}\leq\epsilon$ . Claim (4) follows from the results of [41]; the argument is explained in [29, (8.9) and Proposition 8.8]. ∎
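Properties (1) and (2) of the cut-off can be illustrated on $2\times 2$ real symmetric matrices, where functional calculus has a closed form. The following Python sketch is only an illustration under stated assumptions: it takes $R^{\prime}<R$ strictly, builds one particular cut-off $\phi(t)=t\,\chi(t)$ from a standard smooth step, and all function names are hypothetical.

```python
import math

def smooth_step(u):
    """C-infinity step: equals 0 for u <= 0 and 1 for u >= 1."""
    def h(v):
        return math.exp(-1.0 / v) if v > 0 else 0.0
    return h(u) / (h(u) + h(1.0 - u))

def cutoff(t, r_inner, r_outer):
    """phi(t) = t for |t| <= r_inner, phi(t) = 0 for |t| >= r_outer, |phi| <= r_outer."""
    chi = 1.0 - smooth_step((abs(t) - r_inner) / (r_outer - r_inner))
    return t * chi

def eigenvalues_2x2(a, b, d):
    """Eigenvalues (low, high) of the real symmetric matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius

def op_norm(a, b, d):
    lo, hi = eigenvalues_2x2(a, b, d)
    return max(abs(lo), abs(hi))

def apply_cutoff(a, b, d, r_inner, r_outer):
    """Functional calculus Phi(y) = phi(y), returned as the entries (a', b', d')."""
    lo, hi = eigenvalues_2x2(a, b, d)
    f_lo = cutoff(lo, r_inner, r_outer)
    f_hi = cutoff(hi, r_inner, r_outer)
    if hi - lo < 1e-15:  # y is a scalar multiple of the identity
        return f_lo, 0.0, f_lo
    # On a two-point spectrum, phi agrees with the affine map t -> alpha * t + beta.
    alpha = (f_hi - f_lo) / (hi - lo)
    beta = f_lo - alpha * lo
    return alpha * a + beta, alpha * b, alpha * d + beta
```

For a matrix with operator norm at most `r_inner` the output equals the input, and for any input the output has operator norm at most `r_outer`, mirroring claims (1) and (2).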

Proof of Proposition 3.14.

Let $\mu$ be the law of $x=(x_{1},\dots,x_{m})$ , and let $R>\left\lVert x\right\rVert_{\infty}$ . Since $z\in L^{2}(\mathcal{M},\tau)$ , there exist non-commutative polynomials $\{p_{k}\}_{k=1}^{\infty}$ such that $\left\lVert p_{k}(x)-z\right\rVert_{2}<1/2^{k+1}$ and hence for $k\geq 1$ ,

[TABLE]

By scaling, we may assume without loss of generality that $\left\lVert z\right\rVert_{2}<1$ and set $p_{0}=0$ , and then the above statement also holds for $k=0$ . Now let

[TABLE]

which is a neighborhood of $\mu$ in $\Sigma_{m,R}$ . By the previous lemma, there exists a scalar-valued trace polynomial $u_{k}$ such that $\mu(u_{k})=1$ and

[TABLE]

(We can assume without loss of generality that $\left\lVert p_{k+1}-p_{k}\right\rVert_{u,R}\neq 0$ .) Now the function $u_{k}(p_{k+1}-p_{k})$ will evaluate at the point $x$ to $p_{k+1}(x)-p_{k}(x)$ . If $y\in(\mathcal{R}^{\omega})_{sa}^{m}$ with $\left\lVert y\right\rVert_{\infty}\leq R$ and if the law of $y$ is in $\mathcal{U}_{k}$ , then we will have

[TABLE]

On the other hand, if the law of $y$ is not in $\mathcal{U}_{k}$ , then $\left\lVert u_{k}(y)(p_{k+1}(y)-p_{k}(y))\right\rVert_{2}\leq 1/2^{k}$ . Overall, we have

[TABLE]

This implies that $\sum_{k=0}^{\infty}u_{k}\cdot(p_{k+1}-p_{k})$ converges with respect to $\left\lVert\cdot\right\rVert_{u,R}$ for our given choice of $R$ , and of course evaluating this function at $x$ produces the desired operator $z$ since $u_{k}(x)=1$ .

To extend the function to be globally defined on $(\mathcal{R}^{\omega})_{sa}^{m}$ , we use the smooth cut-off trick. Let $\phi\in C_{c}^{\infty}(\mathbb{R};\mathbb{R})$ such that $\phi(t)=t$ for $|t|\leq\left\lVert x\right\rVert_{\infty}$ and $|\phi|\leq R$ . For $y=(y_{1},\dots,y_{m})\in(\mathcal{R}^{\omega})_{sa}^{m}$ , let $\Phi(y)=(\phi(y_{1}),\dots,\phi(y_{m}))$ . Then $[u_{k}\cdot(p_{k+1}-p_{k})]\circ\Phi\in\overline{\operatorname{TrP}}_{m}^{1}$ because it is the composition of a trace polynomial with a function $\Phi\in(\overline{\operatorname{TrP}}_{m}^{1})_{sa}^{m}$ that is uniformly bounded in operator norm.

Also, since $\Phi$ is globally $\left\lVert\cdot\right\rVert_{2}$ -Lipschitz and since $u_{k}\cdot(p_{k+1}-p_{k})$ is $\left\lVert\cdot\right\rVert_{2}$ -Lipschitz on the operator norm ball of radius $R$ , we see that $[u_{k}\cdot(p_{k+1}-p_{k})]\circ\Phi$ is globally Lipschitz in $\left\lVert\cdot\right\rVert_{2}$ . For all $y\in(\mathcal{R}^{\omega})_{sa}^{m}$ ,

[TABLE]

Therefore,

[TABLE]

converges, and clearly $f\in\overline{\operatorname{TrP}}_{m}^{1}$ since each of the individual terms is. Furthermore, $\left\lVert\cdot\right\rVert_{2}$ -uniform continuity of each term and the uniform convergence of the series implies uniform continuity of $f$ . Since $\left\lVert x\right\rVert_{\infty}\leq R$ , we have $\Phi(x)=x$ and $u_{k}(x)=1$ , so that

[TABLE]

This concludes the proof of (1).

To verify (2), we take $f_{n}$ to be the $n$ th partial sum of the series defining $f$ ; we have shown that the individual terms are $\left\lVert\cdot\right\rVert_{2}$ -Lipschitz, hence so are the partial sums. Finally, to prove (3), note that if $z=p(x_{1},\dots,x_{m})$ , then $z$ also equals $f(x_{1},\dots,x_{m})$ where $f=p\circ\Phi$ , and by the same reasoning as above $p\circ\Phi$ is globally $\left\lVert\cdot\right\rVert_{2}$ -Lipschitz. ∎

We have shown that every element of $L^{2}(\mathrm{W}^{*}(x_{1},\dots,x_{m}))$ has the form $f(x_{1},\dots,x_{m})$ for some $f\in\overline{\operatorname{TrP}}_{m}^{1}$ . On the other hand, we will prove that if $f$ is Lipschitz, then $f(x)$ is actually bounded in operator norm. We state our estimate in terms of unitarily invariant random matrix models which satisfy concentration (2.5), but as explained in Remark 3.18 such models exist whenever $\mathrm{W}^{*}(x_{1},\dots,x_{m})$ is embeddable into $\mathcal{R}^{\omega}$ .

Proposition 3.17.

Let $x=(x_{1},\dots,x_{m})$ be a tuple of self-adjoint variables in a $\mathrm{W}^{*}$ -algebra $(\mathcal{M},\tau)$ whose non-commutative law is $\lambda$ . Suppose there is a sequence $\{\mu^{(N)}\}$ of probability measures on $M_{N}(\mathbb{C})_{sa}^{m}$ , invariant under unitary conjugation, that satisfies the concentration estimate (2.5) for some constant $c$ , and such that the corresponding random variables $X^{(N)}=(X_{1}^{(N)},\dots,X_{m}^{(N)})$ satisfy $\lambda_{X^{(N)}}\to\lambda$ in probability. Then $\mathrm{W}^{*}(x)$ is embeddable into $\mathcal{R}^{\omega}$ . Moreover, if $f\in\overline{\operatorname{TrP}}_{m}^{1}$ is $\lVert\cdot\rVert_{2}$ -Lipschitz, then $f(x)$ is a bounded operator and

[TABLE]

where $\Theta$ is a universal constant.

Proof.

In light of Lemma 2.12,

[TABLE]

and

[TABLE]

Also, the non-commutative law of $X^{(N)}$ converges in probability to that of $x$ and finally $\tau_{N}(f(X^{(N)}))-E[\tau_{N}(f(X^{(N)}))]\to 0$ in probability as a consequence of concentration. Therefore, we may choose a sequence of elements $y^{(N)}\in M_{N}(\mathbb{C})_{sa}^{m}$ such that

[TABLE]

Because $E(X_{j}^{(N)})=E(\tau_{N}(X_{j}^{(N)}))$ by unitary invariance and because of concentration, $E(\tau_{N}(X_{j}^{(N)}))$ must converge to $\tau(x_{j})$ since $\tau_{N}(X_{j}^{(N)})$ converges to $\tau(x_{j})$ in probability. So overall $E(X_{j}^{(N)})-\tau(x_{j})\to 0$ in operator norm. In particular,

[TABLE]

and hence $\lVert y^{(N)}\rVert_{\infty}$ is bounded as $N\to\infty$ . Moreover, our choice of $y^{(N)}$ also satisfies

[TABLE]

since $E[f(X^{(N)})]=E[\tau_{N}(f(X^{(N)}))]$ again by unitary invariance.

Fix a free ultrafilter $\omega$ and let $(\mathcal{M},\tau)=\prod_{N\to\omega}(M_{N}(\mathbb{C}),\tau_{N})$ be the tracial $\mathrm{W}^{*}$ -ultraproduct of the sequence of matrix algebras. Since $\{y^{(N)}\}$ is bounded in operator norm, $y=\{y^{(N)}\}_{N\in\mathbb{N}}$ defines an element of $(\mathcal{M},\tau)$ . By definition of ultraproducts, $\tau(p(y))=\lim_{N\to\omega}\tau_{N}(p(y^{(N)}))$ for every non-commutative polynomial $p$ and therefore the non-commutative law of $y$ is $\lambda$ (which is the same as that of $x$ ). In particular, $\mathrm{W}^{*}(x)\cong\mathrm{W}^{*}(y)$ embeds into $(\mathcal{M},\tau)$ and hence also into $\mathcal{R}^{\omega}$ . (Compare [22, Theorem 4.4].)

Since $\mathrm{W}^{*}(x)$ is $\mathcal{R}^{\omega}$ -embeddable, $f(x)$ is well-defined, and clearly $\lVert f(x)-\tau(f(x))\rVert_{\infty}=\lVert f(y)-\tau(f(y))\rVert_{\infty}$ . Now we claim that $f(y)$ is given by the sequence $\{f(y^{(N)})\}_{N\in\mathbb{N}}$ as an element of $(\mathcal{M},\tau)$ (that is, application of $f$ commutes with ultralimits). It is easy to check that $g(y)=\{g(y^{(N)})\}_{N\in\mathbb{N}}$ when $g\in\operatorname{TrP}_{m}^{1}$ . But for any $\epsilon>0$ , there exists $g\in\operatorname{TrP}_{m}^{1}$ with $\lVert f-g\rVert_{u,c^{-1/2}\Theta+1}<\epsilon$ . Thus, $\lVert f(y)-g(y)\rVert_{2}<\epsilon$ and also $\lVert f(y^{(N)})-g(y^{(N)})\rVert_{2}<\epsilon$ for sufficiently large $N$ . This implies that $\lVert f(y)-\{f(y^{(N)})\}_{N\in\mathbb{N}}\rVert_{2}<2\epsilon$ . Thus, $f(y)=\{f(y^{(N)})\}_{N\in\mathbb{N}}$ as claimed. The same holds with $f$ replaced by $f-\tau(f)$ . This implies

[TABLE]

Remark 3.18.

Suppose that $\mathrm{W}^{*}(x_{1},\dots,x_{m})$ is embeddable into $\mathcal{R}^{\omega}$ . Then there exist tuples $x^{(N)}=(x_{1}^{(N)},\dots,x_{m}^{(N)})$ in $M_{N}(\mathbb{C})_{sa}^{m}$ such that $\lVert x^{(N)}\rVert_{\infty}\leq\lVert x\rVert_{\infty}$ and $\lambda_{x^{(N)}}\to\lambda_{x}$ . Let $U^{(N)}$ be an $N\times N$ random Haar unitary matrix and let $X^{(N)}=U^{(N)}x^{(N)}(U^{(N)})^{*}$ . Clearly, the probability distribution of $X^{(N)}$ is unitarily invariant and also $\lambda_{X^{(N)}}\to\lambda_{x}$ in probability.

To check concentration, observe that $u\mapsto ux^{(N)}u^{*}$ is a $2m^{1/2}\lVert x\rVert_{\infty}$ -Lipschitz function from the unitary group to $M_{N}(\mathbb{C})_{sa}^{m}$ with respect to $\lVert\cdot\rVert_{2}$ . Therefore, if $f:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ is Lipschitz, then $u\mapsto f(ux^{(N)}u^{*})$ is also Lipschitz, with the Lipschitz constant $2m^{1/2}\lVert x\rVert_{\infty}\lVert f\rVert_{\operatorname{Lip}}$ . It was proved in [35, Theorem 15], [34, Theorem 5.16] that the Haar measure on the unitary group satisfies the (non-normalized) log-Sobolev inequality with constant $6/N$ and the corresponding concentration of measure for Lipschitz functions with respect to the Hilbert-Schmidt metric $N^{1/2}\lVert\cdot\rVert_{2}$ . After renormalization this implies that the Haar measure on the unitary group satisfies (2.5) with $c=1/6$ . Hence, $X^{(N)}$ satisfies (2.5) with $c=1/12m\lVert x\rVert_{\infty}^{2}$ .
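The Lipschitz constant of $u\mapsto ux^{(N)}u^{*}$ can be checked coordinatewise. The sketch below assumes the conventions $\lVert x\rVert_{\infty}=\max_{j}\lVert x_{j}\rVert_{\infty}$ and $\lVert(a_{1},\dots,a_{m})\rVert_{2}=(\sum_{j}\lVert a_{j}\rVert_{2}^{2})^{1/2}$ , which we take to be those in force here, and it uses $\lVert ab\rVert_{2}\leq\lVert a\rVert_{2}\lVert b\rVert_{\infty}$ together with $\lVert x^{(N)}\rVert_{\infty}\leq\lVert x\rVert_{\infty}$ . For unitaries $u$ and $v$ ,

$$
\lVert ux_{j}^{(N)}u^{*}-vx_{j}^{(N)}v^{*}\rVert_{2}\leq\lVert(u-v)x_{j}^{(N)}u^{*}\rVert_{2}+\lVert vx_{j}^{(N)}(u-v)^{*}\rVert_{2}\leq 2\lVert x\rVert_{\infty}\lVert u-v\rVert_{2}.
$$

Summing the squares over $j=1,\dots,m$ and taking the square root gives the constant $2m^{1/2}\lVert x\rVert_{\infty}$ .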

4. Tools for Differential Equations in $\overline{\operatorname{TrP}}_{m}^{j}$

This section describes two analytic operations — solution of ODE and convolution with the Gaussian law — that can be performed on tuples in $\overline{\operatorname{TrP}}_{m}^{1}$ and on asymptotically approximable sequences of functions on $N\times N$ matrices. These operations were applied in [29], and will be applied in the remainder of this paper, to analyze the large $N$ limit of certain PDE associated to random matrix models, and hence to understand the behavior of convex matrix models in the large $N$ limit.

4.1. Flows Along Vector Fields

Several times in our study of partial differential equations, we will use flows along vector fields given by functions in $\overline{\operatorname{TrP}}_{m}^{1}$ and by asymptotically approximable sequences of functions on matrices. For instance, this idea was used in [29, Lemma 4.10], and in this paper, it will be used in the proof of Lemma 5.13 and Theorem 7.11.

The setup is roughly speaking as follows. Consider a time interval $[0,T]\subseteq\mathbb{R}$ . Let $H:(\mathcal{R}^{\omega})_{sa}^{m}\times[0,T]\to L^{2}(\mathcal{R}^{\omega})_{sa}^{m}$ be a function such that $H(\cdot,t)$ is a tuple of functions in $\overline{\operatorname{TrP}}_{m}^{1}$ for each $t$ (satisfying certain uniform continuity assumptions). Also, let $F_{0}:(\mathcal{R}^{\omega})_{sa}^{m}\to L^{2}(\mathcal{R}^{\omega})_{sa}^{m}$ . Then we would like to construct $F:(\mathcal{R}^{\omega})_{sa}^{m}\times[0,T]\to(\mathcal{R}^{\omega})_{sa}^{m}$ such that

[TABLE]

Moreover, we would like to show that if $H^{(N)}$ is a function on $M_{N}(\mathbb{C})_{sa}^{m}\times[0,T]$ that is asymptotic to $H$ and $F_{0}^{(N)}\rightsquigarrow F_{0}$ , then the solutions $F^{(N)}$ are asymptotic to the solution $F$ .

Such a proof was essentially carried out in [29, Lemma 4.10], but now we introduce the added complexity that $H$ will depend on $x$ , $t$ , and an auxiliary parameter $y\in(\mathcal{R}^{\omega})_{sa}^{m}$ , and we must solve the initial value problem

[TABLE]

The added parameter $y$ arises naturally in our analysis of conditional expectation, entropy, and transport since it represents the variables we are conditioning upon (see for instance §5.3).

For the sake of future reference, let us state the set of assumptions we make about the vector field $H(x,y,t)$ . These assumptions are framed for a convenient and applicable level of generality rather than maximum generality.

Assumption 4.1.

We are given $T>0$ and a function $H:(\mathcal{R}^{\omega})_{sa}^{m}\times(\mathcal{R}^{\omega})_{sa}^{n}\times[0,T]\to L^{2}(\mathcal{R}^{\omega})_{sa}^{m}$ satisfying:

(1) For each $t$ , we have $H(\cdot,\cdot,t)\in(\overline{\operatorname{TrP}}_{m+n}^{1})_{sa}^{m}$ .

(2) $H$ is $\left\lVert\cdot\right\rVert_{2}$ -Lipschitz in $(x,y)$ , that is, for some constant $K$ independent of $t$ , we have

[TABLE]

(3) The map $t\mapsto H(\cdot,\cdot,t)$ is a continuous function $[0,T]\to(\overline{\operatorname{TrP}}_{m+n}^{1})_{sa}^{m}$ with respect to the Fréchet topology on $\overline{\operatorname{TrP}}_{m+n}^{1}$ . This implies that for every $R>0$ and for every $\epsilon>0$ , there exists $\delta>0$ , such that

[TABLE]

(where we have upgraded from continuity to uniform continuity because of compactness of $[0,T]$ ).

Observation 4.2.

Under this assumption, as in Observation 3.10, we see that $H(\cdot,\cdot,t)$ has a unique continuous extension to $L^{2}(\mathcal{R}^{\omega})_{sa}^{m+n}$ . Furthermore, for each $(x,y)\in L^{2}(\mathcal{R}^{\omega})_{sa}^{m+n}$ , the function $t\mapsto H(x,y,t)$ is continuous (though the modulus of continuity cannot be chosen independent of $(x,y)$ ). Continuity follows because there exists a sequence $(x_{n},y_{n})\in(\mathcal{R}^{\omega})_{sa}^{m+n}$ such that $(x_{n},y_{n})\to(x,y)$ in $\left\lVert\cdot\right\rVert_{2}$ . Now $H(x_{n},y_{n},\cdot)$ is continuous by assumption (3), but assumption (2) implies that $H(x_{n},y_{n},\cdot)\to H(x,y,\cdot)$ uniformly on $[0,T]$ .

Under these assumptions, (4.1) can be solved by the standard method of Picard iteration. We first verify that Assumption 4.1 is preserved under the composition and integration operations used to define Picard iterates.
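A scalar toy version of the Picard scheme is sketched below in Python. It assumes the initial value problem has the form $\partial_{t}F(x,y,t)=H(F(x,y,t),y,t)$ with $F(x,y,0)=F_{0}(x,y)$ , which is our reading of (4.1) and matches the composition $H(G(x,y,t),y,t)$ in Lemma 4.4. The vector field `H`, the initial map `f0`, the trapezoid quadrature, and the function name `picard_solve` are hypothetical choices for illustration only.

```python
import math

def picard_solve(H, f0, x, y, T, steps=2000, iterations=25):
    """Picard iterates F_{k+1}(t) = f0(x, y) + int_0^t H(F_k(s), y, s) ds on a uniform grid."""
    dt = T / steps
    times = [i * dt for i in range(steps + 1)]
    start = f0(x, y)
    path = [start] * (steps + 1)  # zeroth iterate: constant in time
    for _ in range(iterations):
        integrand = [H(path[i], y, times[i]) for i in range(steps + 1)]
        new_path = [start]
        acc = 0.0
        for i in range(1, steps + 1):  # trapezoid rule for the running integral
            acc += 0.5 * dt * (integrand[i - 1] + integrand[i])
            new_path.append(start + acc)
        path = new_path
    return times, path

def H(x, y, t):
    # Hypothetical vector field, Lipschitz in (x, y) and continuous in t.
    return y - x

def f0(x, y):
    return x
```

For this choice the exact solution is $F(x,y,t)=y+(x-y)e^{-t}$ , so the output of `picard_solve` can be compared against a closed form; the auxiliary parameter $y$ is simply carried along unchanged through every iterate.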

Lemma 4.3.

Suppose that $H(x,y,t)$ satisfies Assumption 4.1 and suppose that $G_{0}\in(\overline{\operatorname{TrP}}_{m+n}^{1})_{sa}^{m}$ is globally $\left\lVert\cdot\right\rVert_{2}$ -Lipschitz. Then the function

[TABLE]

is well-defined by Riemann integration and it also satisfies Assumption 4.1.

Proof.

The Riemann integral is defined because $t\mapsto H(x,y,t)$ is continuous with respect to $\left\lVert\cdot\right\rVert_{2}$ for each $(x,y)\in(\mathcal{R}^{\omega})_{sa}^{m+n}$ (and in fact, each $(x,y)\in L^{2}(\mathcal{R}^{\omega})_{sa}^{m+n}$ ). Now let us check that $G$ satisfies Assumption 4.1.

(1) Fix $R>0$ and $\epsilon>0$ . By assumption (3) for $H$ , there exists $\delta>0$ such that

[TABLE]

Fix $t$ , then choose a partition $0=t_{0}$ , …, $t_{n}=t$ of $[0,t]$ such that $|t_{j}-t_{j-1}|<\delta$ . Then let $h_{j}\in(\operatorname{TrP}_{m+n}^{1})_{sa}^{m}$ such that

[TABLE]

Then

[TABLE]

Therefore,

[TABLE]

This shows that $\int_{0}^{t}H(\cdot,\cdot,s)\,ds$ is in $(\overline{\operatorname{TrP}}_{m+n}^{1})_{sa}^{m}$ . Because $G_{0}$ is in this space as well, this implies that $G(\cdot,\cdot,t)$ is in $(\overline{\operatorname{TrP}}_{m+n}^{1})_{sa}^{m}$ as desired.

(2) If $H(\cdot,\cdot,t)$ is $K$ -Lipschitz for all $t$ , then $\left\lVert G(\cdot,\cdot,t)\right\rVert_{\operatorname{Lip}}\leq\left\lVert G_{0}\right\rVert_{\operatorname{Lip}}+tK$ .

(3) Since $t\mapsto H(\cdot,\cdot,t)$ is continuous with respect to $\left\lVert\cdot\right\rVert_{u,R}$ , we must have $\left\lVert H(\cdot,\cdot,t)\right\rVert_{u,R}\leq M$ for some constant $M$ . Then $\left\lVert G(\cdot,\cdot,t)-G(\cdot,\cdot,t^{\prime})\right\rVert_{u,R}\leq M|t-t^{\prime}|$ . ∎

Lemma 4.4.

Suppose that $H(x,y,t)$ and $G(x,y,t)$ satisfy Assumption 4.1. Then $H(G(x,y,t),y,t)$ also satisfies Assumption 4.1.

Proof.

The composition makes sense because $H(x,y,t)$ extends to be defined for $(x,y)\in L^{2}(\mathcal{R}^{\omega})_{sa}^{m+n}$ . It follows from Lemma 3.12 that $H(G(x,y,t),y,t)$ satisfies (1). The Lipschitz estimate (2) is straightforward and left to the reader. To prove (3), let $K$ be a Lipschitz constant for $H$ as a function of $(x,y)$ that works for all $t$ . Fix $\epsilon>0$ . Proceeding as in the proof of Lemma 4.3, we can choose a partition $\{t_{0},\dots,t_{n}\}$ of $[0,T]$ and $g_{j}\in(\overline{\operatorname{TrP}}_{m+n}^{1})_{sa}^{m}$ such that

[TABLE]

Then there exists some $R^{\prime}$ such that $\left\lVert(x,y)\right\rVert_{\infty}\leq R$ implies $\left\lVert(g_{j}(x,y),y)\right\rVert_{\infty}\leq R^{\prime}$ for all $j$ . Then by applying assumption (3) to $H$ , there exists $\delta$ such that

[TABLE]

We also choose $\delta^{\prime}$ such that

[TABLE]

Supposing that $\left\lVert(x,y)\right\rVert_{\infty}\leq R$ and $|t-t^{\prime}|<\min(\delta,\delta^{\prime})$ , we have

[TABLE]

Meanwhile, after we pick $j$ such that $t^{\prime}\in[t_{j-1},t_{j}]$ , then

[TABLE]

The middle term can be estimated by $\epsilon/4$ because $\left\lVert(g_{j}(x,y),y)\right\rVert_{\infty}\leq R^{\prime}$ . Meanwhile, the first and third terms can each be estimated by $K(\epsilon/4K)=\epsilon/4$ using the Lipschitz property of $H$ and our choice of $g_{j}$ . Altogether, $|t-t^{\prime}|<\min(\delta,\delta^{\prime})$ implies that $\left\lVert H(G(x,y,t),y,t)-H(G(x,y,t^{\prime}),y,t^{\prime})\right\rVert_{2}<\epsilon$ whenever $\left\lVert(x,y)\right\rVert_{\infty}\leq R$ . ∎

Proposition 4.5.

Let $H(x,y,t)$ satisfy Assumption 4.1 and let $G_{0}\in(\overline{\operatorname{TrP}}_{m+n}^{1})_{sa}^{m}$ . Then there exists a unique continuous $F:L^{2}(\mathcal{R}^{\omega})_{sa}^{m+n}\times[0,T]\to L^{2}(\mathcal{R}^{\omega})_{sa}^{m}$ satisfying

[TABLE]

Moreover, $F(x,y,t)$ also satisfies Assumption 4.1.

Proof.

We define the Picard iterates $F_{\ell}$ inductively by

[TABLE]

The previous two lemmas imply that $F_{\ell}$ is well-defined and satisfies Assumption 4.1. Convergence of the Picard iterates follows from the standard proof of the Picard-Lindelöf theorem. Briefly, given that $H$ is $K$ -Lipschitz in $(x,y)$ with respect to $\left\lVert\cdot\right\rVert_{2}$ , we have

[TABLE]

Also, we have

[TABLE]

where $M(x,y)=\sup_{s\in[0,T]}\left\lVert H(G_{0}(x,y),y,s)\right\rVert_{2}$ , which is finite because of continuity of $H(G_{0}(x,y),y,t)$ in $t$ . From here a straightforward induction on $\ell$ shows that for $\ell\geq 1$ ,

[TABLE]

because $K\int_{0}^{t}K^{\ell-1}s^{\ell}/\ell!\,ds=K^{\ell}t^{\ell+1}/(\ell+1)!$ . Now because $\sum_{\ell=1}^{\infty}K^{\ell-1}t^{\ell}/\ell!$ converges, we know that

[TABLE]

and

[TABLE]

The fact that $F(x,y,t)$ satisfies the integral equation is straightforward, and the proof of the uniqueness of this $F$ is also standard.
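The Picard scheme can be tried out numerically in a finite-dimensional stand-in. The sketch below is only an illustration under stated assumptions: the operator-algebraic variables are replaced by vectors in $\mathbb{R}^{d}$, the parameter $y$ is dropped, the time integral is approximated by the trapezoid rule on a grid, and the names `picard_iterates` and `picard_gap_bound` are hypothetical helpers, not objects from the paper.

```python
import math
import numpy as np

def picard_iterates(h, g0, T, num_iters, num_steps=2000):
    """Return the time grid and the Picard iterates F_0, ..., F_num_iters.

    Each iterate is an array of shape (num_steps + 1, dim) sampled on the grid,
    with F_0(t) = g0 and F_{l+1}(t) = g0 + int_0^t h(F_l(s), s) ds.
    """
    ts = np.linspace(0.0, T, num_steps + 1)
    dt = ts[1] - ts[0]
    g0 = np.asarray(g0, dtype=float)
    current = np.tile(g0, (len(ts), 1))          # F_0 is constant in time
    iterates = [current]
    for _ in range(num_iters):
        vals = np.array([h(x, t) for x, t in zip(current, ts)])
        # cumulative trapezoid rule approximates t -> int_0^t h(F_l(s), s) ds
        increments = 0.5 * (vals[1:] + vals[:-1]) * dt
        integral = np.vstack([np.zeros_like(g0), np.cumsum(increments, axis=0)])
        current = g0 + integral
        iterates.append(current)
    return ts, iterates

def picard_gap_bound(M, K, t, l):
    """Textbook bound M K^(l-1) t^l / l! on sup_s |F_l(s) - F_{l-1}(s)| for l >= 1."""
    return M * K ** (l - 1) * t ** l / math.factorial(l)
```

For the linear field $h(x,t)=Ax$ with $A$ a rotation generator (so $K=1$ and $M=1$ for a unit initial vector), the successive gaps decay factorially and the iterates approach the exact solution, up to the discretization error of the grid.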

It remains to show that $F$ satisfies Assumption 4.1. First, recall that $H(G_{0}(x,y),y,t)$ is Lipschitz in $(x,y)$ uniformly for all $t$ . If $K^{\prime}$ is a Lipschitz constant for this function, then

[TABLE]

In particular,

[TABLE]

This implies that the convergence of $F_{\ell}$ to $F$ occurs uniformly for $(x,y)$ with $\left\lVert(x,y)\right\rVert_{\infty}\leq R$ and all $t\in[0,T]$ . Then because $F_{\ell}(\cdot,\cdot,t)$ can be approximated in $\left\lVert\cdot\right\rVert_{u,R}$ by trace polynomials, the same must be true for $F(\cdot,\cdot,t)$ for each $t$ , which shows that $F$ satisfies (1). Similarly, because of the uniform convergence of $F_{\ell}$ to $F$ for $\left\lVert(x,y)\right\rVert_{\infty}\leq R$ and $t\in[0,T]$ , the uniform continuity property (3) for $F$ follows from property (3) for $F_{\ell}$ .

Finally, we must show (2) that $F$ is Lipschitz in $(x,y)$ . More precisely, we claim that

[TABLE]

Now it suffices to check that each Picard iterate $F_{\ell}$ satisfies this estimate. This can be verified by induction on $\ell$ . The base case $F_{0}(x,y,t)=G_{0}(x,y)$ is immediate. For the induction step, we observe that

[TABLE]

using the fact that $H$ is $K$ -Lipschitz. Then we plug in our induction hypothesis that $\left\lVert F_{\ell}(x,y,s)-F_{\ell}(x^{\prime},y^{\prime},s)\right\rVert_{2}$ is bounded by $e^{Ks}\left\lVert G_{0}(x,y)-G_{0}(x^{\prime},y^{\prime})\right\rVert_{2}+(e^{Ks}-1)\left\lVert y-y^{\prime}\right\rVert_{2}$ , and then directly evaluate the integral to close the induction. ∎
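For the reader's convenience, here is a sketch of how the integral closes the induction. It assumes that the claimed estimate has the form stated in the induction hypothesis, taken at time $s$ inside the integral; the shorthand $a$ and $b$ is introduced here only for this computation.

```latex
\begin{align*}
a &:= \lVert G_0(x,y) - G_0(x',y') \rVert_2, \qquad b := \lVert y - y' \rVert_2, \\
\lVert F_{\ell+1}(x,y,t) - F_{\ell+1}(x',y',t) \rVert_2
  &\le a + K \int_0^t \bigl( \lVert F_\ell(x,y,s) - F_\ell(x',y',s) \rVert_2 + b \bigr)\,ds \\
  &\le a + K \int_0^t \bigl( e^{Ks} a + (e^{Ks} - 1)\, b + b \bigr)\,ds \\
  &= a + (e^{Kt} - 1)\, a + (e^{Kt} - 1)\, b \\
  &= e^{Kt} a + (e^{Kt} - 1)\, b .
\end{align*}
```

The second line uses $\left\lVert(u,v)\right\rVert_{2}\leq\left\lVert u\right\rVert_{2}+\left\lVert v\right\rVert_{2}$ together with the $K$ -Lipschitz property of $H$ in $(x,y)$ .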

We have now shown that it makes sense to solve ODEs for tuples in $(\overline{\operatorname{TrP}}_{m}^{1})_{sa}$ . There is a parallel list of results which instead deal with functions on $N\times N$ matrices that are asymptotically approximable as $N\to\infty$ . We use the following assumptions.

Assumption 4.6.

We are given $T>0$ and for each $N\in\mathbb{N}$ a function $H^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\times M_{N}(\mathbb{C})_{sa}^{n}\times[0,T]\to M_{N}(\mathbb{C})_{sa}^{m}$ such that

(1)

For each $t$ , there exists $H(\cdot,\cdot,t)\in(\overline{\operatorname{TrP}}_{m+n}^{1})_{sa}^{m}$ such that $H^{(N)}(\cdot,\cdot,t)\rightsquigarrow H(\cdot,\cdot,t)$ .

(2)

$H^{(N)}$ is $\left\lVert\cdot\right\rVert_{2}$ -Lipschitz in $(x,y)$ with some Lipschitz constant $K$ independent of $t$ and $N$ .

(3)

For every $R>0$ and for every $\epsilon>0$ , there exists $\delta>0$ , such that

[TABLE]

Proposition 4.7.

Let $\{H^{(N)}\}$ satisfy Assumption 4.6, and let $G_{0}^{(N)}:M_{N}(\mathbb{C})_{sa}^{m+n}\to M_{N}(\mathbb{C})_{sa}^{m}$ be asymptotically approximable such that $G_{0}^{(N)}\rightsquigarrow G_{0}$ and $G_{0}^{(N)}$ is $\left\lVert\cdot\right\rVert_{2}$ -Lipschitz uniformly in $N$ . Then for each $N$ there is a unique $F^{(N)}:M_{N}(\mathbb{C})_{sa}^{m+n}\times[0,T]\to M_{N}(\mathbb{C})_{sa}^{m}$ satisfying

[TABLE]

Moreover, $\{F^{(N)}\}$ also satisfies Assumption 4.6. Furthermore, the vector field $H$ such that $H^{(N)}(\cdot,\cdot,t)\rightsquigarrow H(\cdot,\cdot,t)$ satisfies Assumption 4.1, and we have $F^{(N)}\rightsquigarrow F$ where $F$ is the solution given by Proposition 4.5.

Proof.

The proof of existence and uniqueness of the solution is almost identical to that of Proposition 4.5. First, one shows that Assumption 4.6 is preserved under integration and composition (analogous to Lemmas 4.3 and 4.4). Then exactly as in the proof of Proposition 4.5, one defines Picard iterates, proves they converge, establishes Lipschitz bounds, and checks they satisfy Assumption 4.6. The one additional feature in these proofs is to make all the estimates uniform in $N$ . For instance, the quantity $M(x,y)$ in the proof of Proposition 4.5 is replaced by

[TABLE]

Then $H^{(N)}(G_{0}^{(N)}(x,y),y,t)$ has some Lipschitz constant $K^{\prime}$ independent of $N$ , and

[TABLE]

But then we can show that $\sup_{N}M^{(N)}(0,0)$ is finite. This is because if $\Phi^{(N)}(x,y,t)=H^{(N)}(G_{0}^{(N)}(x,y),y,t)$ , then $\sup_{N}\sup_{t}\left\lVert\Phi^{(N)}(\cdot,\cdot,t)\right\rVert_{u,R}^{(N)}$ is finite because of Assumption 4.6 (3) and the fact that $\Phi^{(N)}(x,y,0)$ is asymptotically approximable and hence bounded in $\left\lVert\cdot\right\rVert_{u,R}^{(N)}$ as $N\to\infty$ .

Now the fact that $H$ satisfies Assumption 4.1 is a straightforward limiting argument. The key ingredient is that if $f^{(N)}\rightsquigarrow f$ , then $\left\lVert f\right\rVert_{u,R}=\lim_{N\to\infty}\left\lVert f^{(N)}\right\rVert_{u,R}^{(N)}$ .

Finally, to show that $F^{(N)}\rightsquigarrow F$ , it suffices to show that for each of the Picard iterates $F_{\ell}^{(N)}\rightsquigarrow F_{\ell}$ because of the uniform convergence of $F_{\ell}^{(N)}\to F^{(N)}$ as $\ell\to\infty$ for $\left\lVert(x,y)\right\rVert_{\infty}\leq R$ , where the rate of convergence is also independent of $N$ . Furthermore, since the Picard iterates are defined inductively by composition and integration, it suffices to show that the asymptotic approximation relation $\rightsquigarrow$ is preserved by these operations. Preservation under integration follows because the integrals can be approximated by Riemann sums and this approximation is uniformly good for $\left\lVert(x,y)\right\rVert_{\infty}\leq R$ and for all $N$ because of the uniform continuity Assumption 4.6 (3). Preservation under composition follows from Lemma 3.12. ∎

4.2. The Heat Semigroup

Recall that the solution to the classical heat equation is given by convolution with the heat kernel (which is given by a Gaussian probability density). In particular, let $\sigma_{m,t}^{(N)}$ be the probability distribution of an $m$ -tuple of independent GUE matrices $(S_{1}^{(N)},\dots,S_{m}^{(N)})$ such that $E[\tau_{N}[(S_{j}^{(N)})^{2}]]=t$ , which is given by density $(1/Z^{(N)})e^{-N^{2}\left\lVert x\right\rVert_{2}^{2}/2t}\,dx$ . If $u_{0}:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{C}$ , then $u_{t}:=u_{0}*\sigma_{m,t}^{(N)}$ solves the normalized heat equation

[TABLE]

Here $u_{0}*\sigma_{m,t}^{(N)}$ is meant in the sense of convolving a function with a measure, and this is the same as convolving $u_{0}$ with the density function for $\sigma_{m,t}^{(N)}$ . The meaning of $\Delta$ is to be interpreted using coordinates with respect to some orthonormal basis of $M_{N}(\mathbb{C})_{sa}$ in the inner product $\langle x,y\rangle=\operatorname{Tr}(xy)$ ; this is not the same as differentiating entrywise since some of the entries are real and some are complex.
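As a sanity check on the normalization, the following sketch samples a single GUE matrix ( $m=1$ ) with $E[\tau_{N}(S^{2})]=t$ and estimates $(u_{0}*\sigma_{1,t}^{(N)})(x)=E[u_{0}(x+S)]$ by Monte Carlo. It is a minimal illustration only: the function names are hypothetical, and the sampler assumes that the coordinates of $S$ in a $\operatorname{Tr}$ -orthonormal basis are independent centered Gaussians of variance $t/N$ . For $u_{0}(x)=\tau_{N}(x^{2})$ the expected value is $\tau_{N}(x^{2})+t$ .

```python
import numpy as np

def sample_gue(n, t, rng):
    """Sample an n x n GUE matrix S normalised so that E[tau_n(S^2)] = t."""
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    h = (z + z.conj().T) / 2.0           # Hermitian, with E|h_ij|^2 = 1 for all i, j
    return np.sqrt(t / n) * h

def tau(a):
    """Normalised trace tau_n = (1/n) Tr."""
    return np.trace(a).real / a.shape[0]

def heat_average(u0, x, t, rng, num_samples=4000):
    """Monte Carlo estimate of (u0 * sigma_t)(x) = E[u0(x + S)]."""
    n = x.shape[0]
    samples = [u0(x + sample_gue(n, t, rng)) for _ in range(num_samples)]
    return float(np.mean(samples))
```

A typical use is `heat_average(lambda a: tau(a @ a), x, t, rng)` with `rng = np.random.default_rng(0)` and `x` a fixed self-adjoint matrix; the estimate fluctuates around $\tau_{N}(x^{2})+t$ with an error that shrinks as the number of samples grows.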

Our goal is to describe the large $N$ behavior of $u^{(N)}*\sigma_{m,t}^{(N)}$ when $\{u^{(N)}\}$ is asymptotically approximable by trace polynomials, and to define “ $u\boxplus\sigma_{m,t}$ ” when $u\in\overline{\operatorname{TrP}}_{m}^{j}$ .

In [29, §3.2 and 3.3], using similar methods to [10], we explained the computation of $(1/N)\Delta f$ as a function on $M_{N}(\mathbb{C})_{sa}^{m}$ when $f\in\operatorname{TrP}_{m}^{0}$ or $\operatorname{TrP}_{m}^{1}$ . More precisely, let $\Delta_{j}f(x_{1},\dots,x_{m})$ denote the Laplacian with respect to the coordinates of the matrix $x_{j}$ . We found that for $j=1,\dots,m$ there are linear maps $L_{j}^{(N)},L_{j}:\operatorname{TrP}_{m}^{0}\to\operatorname{TrP}_{m}^{0}$ defined purely algebraically, such that $(1/N)\Delta_{j}f=L_{j}^{(N)}f$ when $f$ is viewed as a function on $M_{N}(\mathbb{C})_{sa}^{m}$ , $L_{j}^{(N)}$ and $L_{j}$ do not increase the degree of a trace polynomial, and $\lim_{N\to\infty}L_{j}^{(N)}f=L_{j}f$ coefficient-wise.

A similar analysis holds for the Laplacian of $f\in\operatorname{TrP}_{m}^{1}$ viewed as a function $M_{N}(\mathbb{C})_{sa}^{m}\to M_{N}(\mathbb{C})$ . Here we follow the standard convention of using the same symbol $\Delta$ for the Laplacians of vector-valued functions as for the Laplacians of scalar-valued functions; thus, the reader must be careful to distinguish scalar-valued and vector-valued functions based on context. We saw that there were linear transformations $L_{j}^{(N)},L_{j}:\operatorname{TrP}_{m}^{1}\to\operatorname{TrP}_{m}^{1}$ such that $(1/N)\Delta_{j}f=L_{j}^{(N)}f$ as a function on matrices, $L_{j}^{(N)}$ and $L_{j}$ do not increase degree, and $L_{j}^{(N)}f\to L_{j}f$ coefficient-wise.

We deduced as a consequence that $e^{L^{(N)}t/2}f=f*\sigma_{m,t}^{(N)}$ has a well-defined large $N$ limit if $f$ is a trace polynomial [29, Lemma 3.21], and that if $\{u^{(N)}\}$ is asymptotically approximable by trace polynomials, then so is $\{u^{(N)}*\sigma_{m,t}^{(N)}\}$ [29, Lemma 3.28].

In order to establish “conditional versions” of our earlier results, we must consider trace polynomials $f(x_{1},\dots,x_{m},y_{1},\dots,y_{n})$ in $m+n$ variables and take the Laplacian with respect to $x=(x_{1},\dots,x_{m})$ while treating $y=(y_{1},\dots,y_{n})$ as an auxiliary parameter. We denote by $\Delta_{x}=\sum_{j=1}^{m}\Delta_{x_{j}}$ , $L_{x}^{(N)}=\sum_{j=1}^{m}L_{x_{j}}^{(N)}$ , and $L_{x}=\sum_{j=1}^{m}L_{x_{j}}$ the various Laplacian operators with respect to $x$ .

Because $L_{x}^{(N)}$ and $L_{x}$ map the finite-dimensional vector space of trace polynomials of degree $\leq d$ into itself, there are well-defined linear operators $e^{tL_{x}^{(N)}/2}$ and $e^{tL_{x}/2}$ on the space of trace polynomials in $\operatorname{TrP}_{m+n}^{j}$ of degree $\leq d$ for each $j=0,1$ , each $d\in\mathbb{N}$ , and each real $t\geq 0$ . Since trace polynomials are the union of the subspaces of trace polynomials with degree $\leq d$ , there are linear operators $e^{tL_{x}^{(N)}/2},e^{tL_{x}/2}:\operatorname{TrP}_{m+n}^{j}\to\operatorname{TrP}_{m+n}^{j}$ . Moreover, these operators form a semigroup, and they satisfy the following property, which is an extension of [10, Theorem 2.4] to the spaces $\overline{\operatorname{TrP}}_{m}^{j}$ .

Lemma 4.8.

Let $(X,Y)$ be a random variable in $M_{N}(\mathbb{C})_{sa}^{m+n}$ with finite moments, and let $S\sim\sigma_{m,t}^{(N)}$ be an independent GUE random variable in $M_{N}(\mathbb{C})_{sa}^{m}$ . Then we have

[TABLE]

Similarly, suppose that $(X,Y)$ is a tuple of self-adjoint non-commutative random variables, and let $S$ be a freely independent tuple with non-commutative law $\sigma_{m,t}$ . Then

[TABLE]

and

[TABLE]

where $E_{\mathrm{W}^{*}(X,Y)}:\mathrm{W}^{*}(X,Y,S)\to\mathrm{W}^{*}(X,Y)$ is the unique trace-preserving conditional expectation.

Proof.

Since $S$ is independent and distributed according to $\sigma_{m,t}^{(N)}$ , we have

[TABLE]

On the other hand, for $(x,y)\in M_{N}(\mathbb{C})_{sa}^{m+n}$ ,

[TABLE]

because both sides are the solution to the heat equation on the space of coordinate-wise polynomials on $M_{N}(\mathbb{C})_{sa}^{m+n}$ of degree $\leq d$ . This shows (4.2).

To prove the free versions, we assume familiarity with the results of free probability (see e.g. [55], [38], [1, Chapter 5]). Suppose that $(X,Y)$ are non-commutative random variables and $S_{t}$ is a freely independent free semicircular $m$ -tuple with law $\sigma_{m,t}$ . We may assume that $(S_{t})_{t\geq 0}$ is a free Brownian motion, so that $S_{t}-S_{s}\sim S_{t-s}$ for $0\leq s\leq t$ and $S_{t}\sim t^{1/2}S_{1}$ . Note that $e^{-tL_{x}/2}$ is a well-defined operator on trace polynomials. To prove (4.3), it suffices to show that $[e^{-tL_{x}/2}f](X+S_{t},Y)=f(X,Y)$ for $f\in\operatorname{TrP}_{m+n}^{0}$ , since we may then replace $f$ by $e^{tL_{x}/2}f$ . This will follow if we check that

[TABLE]

From a free probabilistic computation sketched in [29, Lemma 3.23], we have

[TABLE]

and hence

[TABLE]

Next, to prove (4.4), it suffices to show that for $g\in\operatorname{TrP}_{m+n}^{1}$ , we have

[TABLE]

since functions of the form $g(X,Y)$ for $g\in\operatorname{TrP}_{m+n}^{1}$ are dense in $L^{2}(\mathrm{W}^{*}(X,Y))$ . Consider the function $F\in\operatorname{TrP}_{m+n+m}^{0}$ given by $F(x,y,x^{\prime})=\tau(f(x,y)g(x^{\prime},y))$ . Notice that

[TABLE]

Here the first equality is checked directly from the definition of the Laplacian [29, see Def. 3.13 and 3.16, proof of Lemma 3.18]. The equality $L_{x}[f(x,y)g(x^{\prime},y)]=L_{x}[f(x,y)]g(x^{\prime},y)$ again is checked from the definition of the Laplacian; this equality is intuitive since $g(x^{\prime},y)$ is independent of $x$ . Since the same reasoning may be applied to compute the Laplacian $L_{x}$ of $\tau([e^{tL_{x}/2}f](x,y)g(x^{\prime},y))$ , we have

[TABLE]

We can view $F(x,y,x^{\prime})$ as a function of the $m$ -tuple $x$ and the $(n+m)$ -tuple $(y,x^{\prime})$ , that is, an element of $\operatorname{TrP}_{m+(n+m)}^{0}$ . We apply (4.3) to $F$ and the pair $(X,(Y,X))$ and obtain

[TABLE]

which means precisely that

[TABLE]

which completes the proof of (4.4). ∎

Remark 4.9.

The free conditional expectation formulas (4.3) and (4.4) could also be proved using random matrices provided that $\mathrm{W}^{*}(X,Y)$ is $\mathcal{R}^{\omega}$ -embeddable. Indeed, let $(X^{(N)},Y^{(N)})$ be (deterministic) tuples of matrices with non-commutative laws converging to the law of $(X,Y)$ and let $S^{(N)}\sim\sigma_{m,t}^{(N)}$ . Then to prove (4.3) for instance, we could use the fact that $E[f(X^{(N)}+S^{(N)},Y^{(N)})]=[e^{tL_{x}^{(N)}/2}f](X^{(N)},Y^{(N)})$ and take the limit as $N\to\infty$ using Voiculescu’s theorem on asymptotic freeness [52, Theorem 2.2]. A similar proof could be done for (4.4).

Lemma 4.10.

If $f\in\operatorname{TrP}_{m+n}^{j}$ for $j=0,1$ , then we have $\left\lVert e^{tL_{x}/2}f\right\rVert_{u,R}\leq\left\lVert f\right\rVert_{u,R+2t^{1/2}}$ for $t\geq 0$ . In particular, $f\mapsto e^{tL_{x}/2}f$ extends to a unique continuous linear operator $\overline{\operatorname{TrP}}_{m+n}^{j}\to\overline{\operatorname{TrP}}_{m+n}^{j}$ .

Proof.

Let $(X,Y)\in(\mathcal{R}^{\omega})_{sa}^{m+n}$ with $\left\lVert(X,Y)\right\rVert_{\infty}\leq R$ . Let $S\sim\sigma_{m,t}$ be a freely independent semicircular tuple. If $f\in\operatorname{TrP}_{m+n}^{0}$ , then

[TABLE]

Since $\left\lVert S\right\rVert_{\infty}=2t^{1/2}$ , we have $\left\lVert(X+S,Y)\right\rVert_{\infty}\leq R+2t^{1/2}$ . Therefore, $\left\lVert e^{tL_{x}/2}f\right\rVert_{u,R}\leq\left\lVert f\right\rVert_{u,R+2t^{1/2}}$ as desired. Similarly, if $f\in\operatorname{TrP}_{m+n}^{1}$ , then we check $\left\lVert e^{tL_{x}/2}f\right\rVert_{u,R}\leq\left\lVert f\right\rVert_{u,R+2t^{1/2}}$ using the conditional expectation formula (4.4). Now the continuous extension of $e^{tL_{x}/2}$ to $\overline{\operatorname{TrP}}_{m+n}^{j}$ is immediate. ∎
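A small worked instance may help fix the normalizations. Take $m=1$ , $n=0$ and $f(x)=\tau(x^{2})$ . The computation below is a sketch under two assumptions: that $c_{1},\dots,c_{N^{2}}$ denote coordinates of $x$ in a $\operatorname{Tr}$ -orthonormal basis of $M_{N}(\mathbb{C})_{sa}$ (notation introduced here), and that $\left\lVert f\right\rVert_{u,R}$ is the supremum of $|f|$ over self-adjoint tuples of operator norm at most $R$ .

```latex
\begin{align*}
\tau_N(x^2) &= \frac{1}{N} \sum_{k=1}^{N^2} c_k^2
  \quad\Longrightarrow\quad
  \frac{1}{N} \Delta\, \tau_N(x^2) = \frac{1}{N} \cdot \frac{2 N^2}{N} = 2, \\
L f &= 2, \qquad L(2) = 0
  \quad\Longrightarrow\quad
  e^{tL/2} f = f + t, \\
\tau\bigl((X + S)^2\bigr) &= \tau(X^2) + 2\,\tau(XS) + \tau(S^2) = \tau(X^2) + t, \\
\lVert e^{tL/2} f \rVert_{u,R} &= R^2 + t \;\le\; (R + 2 t^{1/2})^2 = \lVert f \rVert_{u,R + 2 t^{1/2}} .
\end{align*}
```

The third line uses that $S$ is centered with $\tau(S^{2})=t$ and freely independent of $X$ , so the value agrees with $[e^{tL/2}f](X)$ as in (4.3), and the last line is the bound of Lemma 4.10 in this example.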

The semigroup $e^{tL_{x}/2}$ acting on $\overline{\operatorname{TrP}}_{m+n}^{1}$ describes the large $N$ limit of the Gaussian convolution semigroup on $M_{N}(\mathbb{C})_{sa}^{m+n}$ defined as follows.

Definition 4.11.

For $f:M_{N}(\mathbb{C})_{sa}^{m+n}\to\mathbb{C}$ or $M_{N}(\mathbb{C})$ , we denote

[TABLE]

Moreover, we denote by $P_{t}^{\operatorname{TrP}}:\overline{\operatorname{TrP}}_{m+n}^{j}\to\overline{\operatorname{TrP}}_{m+n}^{j}$ the continuous extension of $e^{tL_{x}/2}$ .

Lemma 4.12.

Suppose that $f^{(N)}:M_{N}(\mathbb{C})_{sa}^{m+n}\to\mathbb{C}$ is asymptotically approximable by trace polynomials and $f^{(N)}\rightsquigarrow f\in\overline{\operatorname{TrP}}_{m+n}^{0}$ . Furthermore, assume that for some $A,B>0$ and $k\in\mathbb{N}$ , we have

[TABLE]

Then $P_{t}^{(N)}f^{(N)}\rightsquigarrow P_{t}^{\operatorname{TrP}}f$ . The same holds for $f^{(N)}:M_{N}(\mathbb{C})_{sa}^{m+n}\to M_{N}(\mathbb{C})$ and $f\in\overline{\operatorname{TrP}}_{m+n}^{1}$ with $|f^{(N)}(x)|$ replaced by $\left\lVert f^{(N)}(x)\right\rVert_{2}$ .

The proof of this lemma is the same as in [29, Lemma 3.28].

Remark 4.13.

In both the scalar-valued and matrix-valued cases, the assumption (4.5) holds automatically with $k=1$ provided that $f^{(N)}$ and $f$ are $\left\lVert\cdot\right\rVert_{2}$ -uniformly continuous (with modulus of continuity independent of $N$ ). Let us focus on the matrix-valued case of $\overline{\operatorname{TrP}}_{m}^{1}$ . By uniform continuity, there exists $\delta>0$ such that

[TABLE]

In particular, given $x\in(\mathcal{R}^{\omega})_{sa}^{m}$ , we can choose an integer $j$ such that $j\delta<\left\lVert x\right\rVert_{2}\leq 2j\delta$ . Then we have

[TABLE]

Thus,

[TABLE]

which implies the first estimate of (4.5). The case for $f^{(N)}$ is handled similarly, and we note that $\left\lVert f^{(N)}(0)\right\rVert_{2}$ is bounded as $N\to\infty$ because of our assumption that $f^{(N)}\rightsquigarrow f$ . The same argument works in the case of scalar-valued functions and $f\in\overline{\operatorname{TrP}}_{m}^{0}$ .

5. Conditional Expectation for Free Gibbs States

5.1. Free Gibbs States from Convex Potentials

In [29] and in the present work, we focus on the following situation:

Assumption 5.1.

We are given $0<c\leq C$ and $V^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ such that

(1)

$HV^{(N)}\geq c$ , that is, $V^{(N)}(x)-\frac{1}{2}c\left\lVert x\right\rVert_{2}^{2}$ is convex.

(2)

$HV^{(N)}\leq C$ , that is, $V^{(N)}(x)-\frac{1}{2}C\left\lVert x\right\rVert_{2}^{2}$ is concave.

(3)

$\{DV^{(N)}\}_{N\in\mathbb{N}}$ is asymptotically approximable by trace polynomials.

We denote by $\mu^{(N)}$ the probability measure on $M_{N}(\mathbb{C})_{sa}^{m}$ given by

[TABLE]

Furthermore, we assume that the mean $\int x_{j}\,d\mu^{(N)}(x)$ is a scalar multiple of the identity matrix.

The following was proved in [29, Theorem 4.1].

Theorem 5.2.

Let $V^{(N)}$ and $\mu^{(N)}$ be as in Assumption 5.1. Then there exists a non-commutative law $\lambda$ such that for every non-commutative polynomial $p$ , we have

[TABLE]

Moreover, we have for every $R>0$ and $\epsilon>0$ that

[TABLE]

Corollary 5.3.

Let $\mu^{(N)}$ and $\lambda$ be as in Theorem 5.2. Let $X^{(N)}$ be a random $m$ -tuple of matrices distributed according to $\mu^{(N)}$ and let $X$ be a non-commutative random $m$ -tuple distributed according to $\lambda$ . Let $f^{(N)},g^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to M_{N}(\mathbb{C})$ . Suppose there are constants $A$ and $B>0$ and $k\in\mathbb{N}$ such that

[TABLE]

Suppose that $f^{(N)}\rightsquigarrow f$ and $g^{(N)}\rightsquigarrow g$ where $f,g\in\overline{\operatorname{TrP}}_{m}^{1}$ . Then

[TABLE]

Proof.

Let $a_{j}^{(N)}=E[X_{j}^{(N)}]$ which we assumed to be a scalar multiple of the identity, and which we know has a limit as $N\to\infty$ . By Lemma 2.12, we have

[TABLE]

In particular, letting $R>\sup_{N,j}|a_{j}^{(N)}|+c^{-1/2}\Theta$ , we have

[TABLE]

and

[TABLE]

Therefore, in order to prove convergence of the expectation, it suffices to check that $\tau_{N}(f^{(N)}(X^{(N)})g^{(N)}(X^{(N)}))$ converges in probability to $\tau(f(X)g(X))$ .

We already know that $\tau_{N}(p(X^{(N)}))$ converges to $\tau(p(X))$ in probability for every non-commutative polynomial $p$ . It follows that if $u$ is a scalar-valued trace polynomial, then $u(X^{(N)})\to u(X)$ in probability. This also holds for $u\in\overline{\operatorname{TrP}}_{m}^{0}$ ; indeed, we know that $\left\lVert X^{(N)}\right\rVert_{\infty}\leq R$ with probability tending to $1$ and $\left\lVert X\right\rVert_{\infty}\leq R$ , whereas $u$ can be approximated in $\left\lVert\cdot\right\rVert_{u,R}$ by trace polynomials. Finally, if $u^{(N)}$ is a sequence of scalar-valued functions such that $u^{(N)}\rightsquigarrow u\in\overline{\operatorname{TrP}}_{m}^{0}$ , then $u^{(N)}(X^{(N)})-u(X^{(N)})$ converges to $0$ in probability, and hence $u^{(N)}(X^{(N)})$ converges in probability to $u(X)$ . By Lemma 3.9, we can apply this statement to $u^{(N)}=\tau_{N}(f^{(N)}g^{(N)})$ and $u=\tau(fg)$ , which completes the argument. ∎

Definition 5.4.

Let $V\in\overline{\operatorname{TrP}}_{m}^{0}$ and suppose $V$ extends to a function $L^{2}(\mathcal{R}^{\omega})_{sa}^{m}\to\mathbb{R}$ such that $V(x)-(c/2)\left\lVert x\right\rVert_{2}^{2}$ is convex and $V(x)-(C/2)\left\lVert x\right\rVert_{2}^{2}$ is concave. In this case, $V$ is differentiable as a function on the real Hilbert space $L^{2}(\mathcal{R}^{\omega})_{sa}^{m}$ , as a consequence of the existence of supporting hyperplanes for convex functions on a Hilbert space. If we assume also that $DV\in\overline{\operatorname{TrP}}_{m}^{1}$ , then we say that $V\in\mathcal{E}_{m}^{\operatorname{TrP}}(c,C)$ .

Remark 5.5.

We did not prove or assume that the trace polynomials which approximate $DV$ are the gradients of the same trace polynomials that approximate $V$ . Thus, this definition is technically different from that of [29, §8.2].

Definition 5.6.

If $V\in\mathcal{E}_{m}^{\operatorname{TrP}}(c,C)$ , then we may define $V^{(N)}=V|_{M_{N}(\mathbb{C})_{sa}^{m}}$ , and in this case $DV^{(N)}=DV|_{M_{N}(\mathbb{C})_{sa}^{m}}$ . Clearly, $DV^{(N)}$ is asymptotically approximable by trace polynomials, and so by Theorem 5.2, there exists a non-commutative law $\lambda_{V}$ that arises as the large $N$ limit of the associated random matrix models. Furthermore, the limiting free Gibbs law $\lambda_{V}$ only depends on $V$ , that is, every approximating sequence of functions $V^{(N)}\in\mathcal{E}_{m}^{(N)}(c,C)$ will produce the same free Gibbs law (see [29, §8.2]). We call $\lambda_{V}$ the free Gibbs state given by potential $V$ .

Remark 5.7.

One can check that if $V^{(N)}$ is as in Assumption 5.1, then there exists a $V\in\mathcal{E}_{m}^{\operatorname{TrP}}(c,C)$ such that $V^{(N)}\rightsquigarrow V$ and $DV^{(N)}\rightsquigarrow DV$ . Thus, the non-commutative laws that arise from these random matrix models are precisely $\lambda_{V}$ for $V\in\mathcal{E}_{m}^{\operatorname{TrP}}(c,C)$ .

Remark 5.8.

Since $\lambda_{V}$ is independent of the choice of approximating sequence $V^{(N)}$ , we can in particular take $V^{(N)}=V|_{M_{N}(\mathbb{C})_{sa}^{m}}$ , which produces a canonical unitarily invariant sequence of random matrix models.

5.2. Main Result on Conditional Expectation

Our main result in this section is in some sense a generalization of [29, Theorem 4.1] that deals with conditional expectations rather than expectations. The proof of the earlier theorem was reduced to the following statement: Suppose $V^{(N)}$ satisfies Assumption 5.1 and that $u^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{C}$ is $\left\lVert\cdot\right\rVert_{2}$ -Lipschitz (uniformly in $N$ ) and asymptotically approximable by trace polynomials. Then

[TABLE]

Now, our goal is to prove the following.

Theorem 5.9.

Consider functions $V^{(N)}:M_{N}(\mathbb{C})_{sa}^{m+n}\to\mathbb{R}$ , denoted as $V^{(N)}(x,y)$ , which satisfy Assumption 5.1 as functions of $(x,y)$ . Let $\mu^{(N)}$ be the associated probability measure on $M_{N}(\mathbb{C})_{sa}^{m+n}$ . Let $(X^{(N)},Y^{(N)})$ be an $(m+n)$ -tuple of random matrices distributed according to $\mu^{(N)}$ , and let $(X,Y)$ be an $(m+n)$ -tuple of non-commutative random variables distributed according to the limiting free Gibbs law $\lambda$ given by Theorem 5.2.

Let $f^{(N)}:M_{N}(\mathbb{C})_{sa}^{m+n}\to M_{N}(\mathbb{C})$ be $\left\lVert\cdot\right\rVert_{2}$ -Lipschitz (uniformly in $N$ ) and suppose $f^{(N)}\rightsquigarrow f\in\overline{\operatorname{TrP}}_{m+n}^{1}$ . Let $g^{(N)}$ be the function given by

[TABLE]

which is a well-defined function on $M_{N}(\mathbb{C})_{sa}^{n}$ because $\mu^{(N)}$ has positive density everywhere. Then $g^{(N)}$ is Lipschitz with

[TABLE]

Moreover, there exists $g\in\overline{\operatorname{TrP}}_{n}^{1}$ such that $g^{(N)}\rightsquigarrow g$ and hence

[TABLE]

The gist of the theorem is that the conditional expectation $E[\cdot|Y^{(N)}]$ behaves in the large $N$ limit like the $\mathrm{W}^{*}$ -algebraic conditional expectation $\mathrm{W}^{*}(X,Y)\to\mathrm{W}^{*}(Y)$ . For instance, if $f\in\overline{\operatorname{TrP}}_{m+n}^{1}$ is globally Lipschitz in $\left\lVert\cdot\right\rVert_{2}$ , then the $\mathrm{W}^{*}$ -algebraic conditional expectation of $f(X,Y)$ can be approximated by the classical conditional expectation $E[f(X^{(N)},Y^{(N)})|Y^{(N)}]$ .

In fact, we can approximate $E_{\mathrm{W}^{*}(Y)}[Z]$ for every $Z\in L^{2}(\mathrm{W}^{*}(X,Y))$ using classical conditional expectations in the same sense. Indeed, we showed in Proposition 3.14 that every $Z$ can be expressed as $f(X,Y)$ where $f\in\overline{\operatorname{TrP}}_{m}^{1}$ is $\left\lVert\cdot\right\rVert_{2}$ -uniformly continuous, and there exist $\left\lVert\cdot\right\rVert_{2}$ -Lipschitz functions $f_{k}\in\overline{\operatorname{TrP}}_{m}^{1}$ such that $f_{k}\to f$ with respect to the uniform norm $\left\lVert\cdot\right\rVert_{u}$ . Let $g_{k}^{(N)}$ and $g^{(N)}$ be given by

[TABLE]

and the analogous relation for $g^{(N)}$ and $f$ . Because conditional expectation is a contraction in $L^{\infty}(\mu^{(N)})$ (for functions taking values in $M_{N}(\mathbb{C})$ with $\left\lVert\cdot\right\rVert_{2}$ ), we have

[TABLE]

By the theorem, there exists $g_{k}\in\overline{\operatorname{TrP}}_{m}^{1}$ such that $g_{k}^{(N)}\rightsquigarrow g_{k}$ . Given that $\left\lVert g_{k}^{(N)}-g^{(N)}\right\rVert_{u}^{(N)}\leq\left\lVert f_{k}-f\right\rVert_{u}\to 0$ , a routine argument (“exchange of limits and uniform limits”) shows that there exists $g\in\overline{\operatorname{TrP}}_{m}^{1}$ such that $g^{(N)}\rightsquigarrow g$ . In other words, the conclusion of Theorem 5.9 holds also for $f$ and thus $E_{\mathrm{W}^{*}(Y)}[Z]=E_{\mathrm{W}^{*}(Y)}[f(X,Y)]$ can be viewed as the large $N$ limit of $E[f(X^{(N)},Y^{(N)})|Y^{(N)}]$ .

5.3. Strategy

Our proof will follow the same strategy as the special case in [29, §4]. In that paper, we showed that if $V^{(N)}$ and $\mu^{(N)}$ on $M_{N}(\mathbb{C})_{sa}^{m}$ are as in Assumption 5.1 and if $u^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{C}$ is uniformly Lipschitz and asymptotically approximable by trace polynomials, then $\lim_{N\to\infty}\int u^{(N)}\,d\mu^{(N)}$ exists.

We considered the diffusion semigroup $T_{t}^{(N)}=T_{t}^{V^{(N)}}$ that solves the equation

[TABLE]

As mentioned in [29, §4], this diffusion semigroup has an equivalent SDE formulation, and is a standard tool in proving the log-Sobolev inequality and concentration estimates (see for instance, [32], [1, §4.4.2], [14]).

Now $\int T_{t}^{(N)}u^{(N)}\,d\mu^{(N)}=\int u^{(N)}\,d\mu^{(N)}$ and $\left\lVert T_{t}^{(N)}u^{(N)}\right\rVert_{\operatorname{Lip}}\leq e^{-ct/2}\left\lVert u^{(N)}\right\rVert_{\operatorname{Lip}}$ . As $t\to\infty$ , the function $T_{t}^{(N)}u^{(N)}$ converges to the constant function $\int u^{(N)}\,d\mu^{(N)}$ at a rate independent of $N$ . On the other hand, we showed in [29, Lemma 4.10] that if $\{u^{(N)}\}_{N\in\mathbb{N}}$ and $\{DV^{(N)}\}_{N\in\mathbb{N}}$ are asymptotically approximable by trace polynomials, then so is $\{T_{t}^{(N)}u^{(N)}\}_{N\in\mathbb{N}}$ . Hence, we concluded that the sequence of constant functions $\{\int u^{(N)}\,d\mu^{(N)}\}$ is asymptotically approximable by trace polynomials, which means that the limit as $N\to\infty$ exists.
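The two properties used here, invariance of the mean and exponential decay of the Lipschitz norm, can be seen explicitly in a one-dimensional toy model. The sketch below is not the matrix semigroup of the text: it assumes $N=1$ , a single real variable, the quadratic potential $V(x)=cx^{2}/2$ , and a generator of the form $\tfrac{1}{2}u^{\prime\prime}-\tfrac{1}{2}V^{\prime}u^{\prime}$ , for which the semigroup is given by the classical Mehler formula. The helper names are hypothetical, and Gauss-Hermite quadrature stands in for the Gaussian integrals.

```python
import numpy as np

# Gauss-Hermite nodes and weights for the weight exp(-z^2 / 2),
# normalised so that the weights define a probability measure.
_nodes, _weights = np.polynomial.hermite_e.hermegauss(60)
_weights = _weights / _weights.sum()

def gaussian_mean(func, variance):
    """Integrate func against the centred Gaussian with the given variance."""
    return float(np.sum(_weights * func(np.sqrt(variance) * _nodes)))

def ou_semigroup(u, c, t):
    """Return T_t u for the toy potential V(x) = c x^2 / 2 via the Mehler formula."""
    decay = np.exp(-c * t / 2.0)
    noise_var = (1.0 - np.exp(-c * t)) / c

    def Ttu(x):
        x = np.asarray(x, dtype=float)
        # average u over the Gaussian fluctuation around the contracted point
        points = decay * x[..., None] + np.sqrt(noise_var) * _nodes
        return np.sum(_weights * u(points), axis=-1)

    return Ttu
```

In this toy case the invariant measure is the centered Gaussian of variance $1/c$ , the mean of $T_{t}u$ against it equals the mean of $u$ , and the slope of $T_{t}u$ is at most $e^{-ct/2}$ times the Lipschitz constant of $u$ , mirroring the estimate quoted above.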

Now we apply the same method in the conditional setting to prove Theorem 5.9. Let $V^{(N)}(x,y)$ be a function satisfying Assumption 5.1. If we fix $y$ , then $V^{(N)}(\cdot,y)$ is a uniformly convex and semi-concave function of $x$ , so it defines a log-concave probability measure on $M_{N}(\mathbb{C})_{sa}^{m}$ . This produces a well-behaved conditional distribution of $X^{(N)}$ given $Y^{(N)}$ , where $(X^{(N)},Y^{(N)})\sim\mu^{(N)}$ . Explicitly, for $f\in L^{1}(\mu^{(N)},M_{N}(\mathbb{C}))$ , we have

[TABLE]

We will evaluate this conditional expectation as the limit as $t\to\infty$ of $T_{t}^{(N)}f$ , where $T_{t}^{(N)}=T_{t}^{V^{(N)}}$ is the semigroup, acting on Lipschitz functions of $(x,y)$ , that solves

[TABLE]

where $J_{x}(T_{t}^{(N)}f)$ denotes the differential (Jacobian) of $T_{t}^{(N)}f$ as a function of $x$ from $M_{N}(\mathbb{C})_{sa}^{m}$ to $M_{N}(\mathbb{C})$ and $*$ denotes the adjoint. In §5.4, we will analyze how $T_{t}^{(N)}$ affects the Lipschitz norms with respect to $x$ and $y$ separately and hence show that the conditional expectation is given by a Lipschitz function of $y$. In §5.5, we will show that $T_{t}^{(N)}$ preserves asymptotic approximability by trace polynomials of $(x,y)$ and conclude our argument. The new aspect compared to [29] is that the functions are matrix-valued and depend on an extra parameter $y$.

5.4. Conditional Diffusion Semigroup

To simplify notation, let us fix $N$ and fix $V:M_{N}(\mathbb{C})_{sa}^{m}\times M_{N}(\mathbb{C})_{sa}^{n}\to\mathbb{R}$ for the remainder of §5.4. We will denote

[TABLE]

which is a measure on $M_{N}(\mathbb{C})_{sa}^{m}$ depending on the parameter $y$ . The associated semigroup $T_{t}$ will be approximated by alternating two other operators $P_{t}$ and $S_{t}$ on short time intervals. Let $P_{t}$ denote the semigroup of convolution with Gaussian with respect to $x$ , that is,

[TABLE]

The semigroup $S_{t}$ is given by

[TABLE]

where $W_{t}:M_{N}(\mathbb{C})_{sa}^{m}\times M_{N}(\mathbb{C})_{sa}^{n}\to M_{N}(\mathbb{C})_{sa}^{m}$ is the solution to the initial value problem

[TABLE]

This solution is defined for all $t\geq 0$ by the Picard-Lindelöf theorem because $D_{x}V(x,y)$ is globally Lipschitz in $x$ (compare §4.1).

Proposition 5.10.

There exists a semigroup $T_{t}$ acting on Lipschitz functions on $M_{N}(\mathbb{C})_{sa}^{m}\times M_{N}(\mathbb{C})_{sa}^{n}$ such that the following hold:

(1) If $t=n/2^{\ell}$ is a dyadic rational, let $T_{t,\ell}f=(P_{2^{-\ell}}S_{2^{-\ell}})^{n}f$. Then $T_{t,\ell}f\to T_{t}f$ as $\ell\to\infty$ and more precisely

[TABLE]

(2) If $0\leq s\leq t$, we have

[TABLE]

(3) $\left\lVert T_{t}f(\cdot,y)\right\rVert_{\operatorname{Lip}}\leq e^{-ct/2}\left\lVert f(\cdot,y)\right\rVert_{\operatorname{Lip}}$.

(4) $\int T_{t}f(x,y)\,d\mu(x|y)=\int f(x,y)\,d\mu(x|y)$.

(5) We have $T_{t}f(x,y)\to\int f(x^{\prime},y)\,d\mu(x^{\prime}|y)$ as $t\to\infty$ and specifically

[TABLE]

Proof.

These results follow by freezing the variable $y$ and applying the results from our previous paper, specifically,

(1) see [29, Lemma 4.5],

(2) see [29, Lemma 4.6],

(3) see [29, Lemma 4.6],

(4) see [29, Lemma 4.8],

(5) see [29, Lemma 4.9].

The results of [29, §4] were stated only for scalar-valued functions. However, the arguments hold for functions from $M_{N}(\mathbb{C})_{sa}^{m}$ to any finite-dimensional normed vector space. The result (4) that $T_{t}$ is expectation-preserving follows immediately by applying the scalar-valued result to each coordinate of the vector-valued function in some basis. To verify the estimates, one simply replaces the “ $|\cdot|$ ” in the arguments by the appropriate norm, which in our case would be $\left\lVert\cdot\right\rVert_{2}$ on $M_{N}(\mathbb{C})$ . ∎
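The splitting scheme $T_{t,\ell}=(P_{2^{-\ell}}S_{2^{-\ell}})^{n}$ of Proposition 5.10 (1) can be made concrete in a scalar toy case. The sketch below assumes $N=1$, no dependence on $y$, and $V(x)=cx^{2}/2$, so that $S_{\delta}f(x)=f(e^{-c\delta/2}x)$ and $P_{\delta}$ is convolution with a centered Gaussian of variance $\delta$; these normalizations and the function names are assumptions made for illustration. Under them, $(P_{\delta}S_{\delta})^{n}f(x)$ is the expectation of $f$ against a Gaussian with mean $e^{-ct/2}x$ (exactly as for the limiting Ornstein-Uhlenbeck semigroup), so only the variance needs to be compared.

```python
import numpy as np

def trotter_variance(t, level, c=1.0):
    """Variance of the Gaussian law behind (P_delta S_delta)^n, delta = 2^-level."""
    delta = 2.0 ** (-level)
    n = int(round(t / delta))      # t is assumed to be a multiple of delta
    a = np.exp(-c * delta / 2.0)   # S_delta contracts x by this factor
    var = 0.0
    for _ in range(n):
        # One application of P_delta S_delta sends x to a * (x + sqrt(delta) * Z).
        var = a * a * (var + delta)
    return var

def exact_variance(t, c=1.0):
    """Variance of the Ornstein-Uhlenbeck law at time t started from a point."""
    return (1.0 - np.exp(-c * t)) / c

t = 1.5
for level in (2, 4, 6, 8, 10):
    err = abs(trotter_variance(t, level) - exact_variance(t))
    print(f"level={level:2d}  variance error={err:.3e}")
```

In this toy case the error decays linearly in the step size $2^{-\ell}$, which illustrates (but does not prove) the kind of quantitative convergence asserted in (1).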

We will next show that $W_{t}(x,y)$ and $T_{t}f(x,y)$ depend in a Lipschitz manner upon $y$ . Let us denote

[TABLE]

Lemma 5.11.

With the setup above, we have for Lipschitz $f:M_{N}(\mathbb{C})_{sa}^{m}\times M_{N}(\mathbb{C})_{sa}^{n}\to M_{N}(\mathbb{C})$

(1) $\left\lVert W_{t}\right\rVert_{\operatorname{Lip},dx}\leq e^{-ct/2}$ and $\left\lVert W_{t}\right\rVert_{\operatorname{Lip},dy}\leq(C/c)(1-e^{-ct/2})$.

(2) $\left\lVert S_{t}f\right\rVert_{\operatorname{Lip},dx}\leq e^{-ct/2}\left\lVert f\right\rVert_{\operatorname{Lip},dx}$.

(3) $\left\lVert S_{t}f\right\rVert_{\operatorname{Lip},dy}\leq\left\lVert f\right\rVert_{\operatorname{Lip},dy}+(C/c)(1-e^{-ct/2})\left\lVert f\right\rVert_{\operatorname{Lip},dx}$.

(4) $\left\lVert P_{t}f\right\rVert_{\operatorname{Lip},dy}\leq\left\lVert f\right\rVert_{\operatorname{Lip},dy}$ and $\left\lVert P_{t}f\right\rVert_{\operatorname{Lip},dx}\leq\left\lVert f\right\rVert_{\operatorname{Lip},dx}$.

(5) $\left\lVert T_{t}f\right\rVert_{\operatorname{Lip},dx}\leq e^{-ct/2}\left\lVert f\right\rVert_{\operatorname{Lip},dx}$.

(6) $\left\lVert T_{t}f\right\rVert_{\operatorname{Lip},dy}\leq\left\lVert f\right\rVert_{\operatorname{Lip},dy}+(C/c)(1-e^{-ct/2})\left\lVert f\right\rVert_{\operatorname{Lip},dx}$.

Proof.

(1) Fix $x,x^{\prime}\in M_{N}(\mathbb{C})_{sa}^{m}$ and $y,y^{\prime}\in M_{N}(\mathbb{C})_{sa}^{n}$ . Define

[TABLE]

Note that $\phi$ is locally Lipschitz in $t$ and hence absolutely continuous. Moreover, $\phi(t)^{2}$ is $C^{1}$ with

[TABLE]

Here we have employed the inequality $\langle D_{x}V(z,w)-D_{x}V(z^{\prime},w),z-z^{\prime}\rangle_{2}\geq c\lVert z-z^{\prime}\rVert_{2}^{2}$ coming from the uniform convexity of $V$ as well as the Cauchy-Schwarz inequality. This implies that

[TABLE]

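Thus, $\phi^{\prime}(t)\leq-(c/2)\phi(t)+(C/2)\left\lVert y-y^{\prime}\right\rVert_{2}$, so that $\partial_{t}[e^{ct/2}\phi(t)]\leq(C/2)e^{ct/2}\left\lVert y-y^{\prime}\right\rVert_{2}$. This implies that

$$e^{ct/2}\phi(t)-\phi(0)\leq\frac{C}{c}\left(e^{ct/2}-1\right)\left\lVert y-y^{\prime}\right\rVert_{2}.$$

But $\phi(t)=\left\lVert W_{t}(x,y)-W_{t}(x^{\prime},y^{\prime})\right\rVert_{2}$ and $\phi(0)=\left\lVert x-x^{\prime}\right\rVert_{2}$. Hence,

$$\left\lVert W_{t}(x,y)-W_{t}(x^{\prime},y^{\prime})\right\rVert_{2}\leq e^{-ct/2}\left\lVert x-x^{\prime}\right\rVert_{2}+\frac{C}{c}\left(1-e^{-ct/2}\right)\left\lVert y-y^{\prime}\right\rVert_{2}.$$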

This proves both estimates of (1).

(2) This is immediate since $S_{t}f(x,y)=f(W_{t}(x,y),y)$ , as in [29, Lemma 4.4 (5)].

(3) Note that

[TABLE]

(4) This follows from basic properties of convolution of a function with a probability measure.

(5) By iterating the estimates (2) and (4), we obtain $\left\lVert T_{t,\ell}f\right\rVert_{\operatorname{Lip},dx}\leq e^{-ct/2}\left\lVert f\right\rVert_{\operatorname{Lip},dx}$. Then by Proposition 5.10 (1) and (2) we may take $\ell\to\infty$ and then extend to all real values of $t\geq 0$.

(6) First, consider $T_{t,\ell}$ for a dyadic rational $t=n/2^{\ell}$ . Denote $\delta=2^{-\ell}$ . For $j=0$ , …, $n-1$ , we have

[TABLE]

where the last inequality follows from (3). Therefore, by induction

[TABLE]

In light of Proposition 5.10 (1), we can take $\ell\to\infty$ and conclude that $\left\lVert T_{t}f\right\rVert_{\operatorname{Lip},dy}\leq\left\lVert f\right\rVert_{\operatorname{Lip},dy}+(C/c)(1-e^{-ct/2})\left\lVert f\right\rVert_{\operatorname{Lip},dx}$ for dyadic rational $t$ . This inequality can then be extended to all real $t\geq 0$ by Proposition 5.10 (2). ∎
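The two estimates of Lemma 5.11 (1) can be tested numerically on a scalar example. The sketch below assumes that the flow solves $\partial_{t}W_{t}=-\tfrac{1}{2}D_{x}V(W_{t},y)$ with $W_{0}=x$ (the factor $\tfrac{1}{2}$ is inferred from the rate $e^{-ct/2}$) and uses a hypothetical potential $V(x,y)=x^{2}+0.5\cos x+0.8\,x\sin y$ with $N=1$. For this choice $\partial_{x}^{2}V\geq 1.5$ plays the role of $c$, and $D_{x}V$ is $0.8$-Lipschitz in $y$, which is the only way $C$ enters the proof of (1).

```python
import math

C_X = 1.5  # lower bound for the second x-derivative of the toy potential
C_Y = 0.8  # Lipschitz constant of D_x V in the variable y

def grad_x_V(x, y):
    """D_x V for the hypothetical V(x, y) = x^2 + 0.5 cos(x) + 0.8 x sin(y)."""
    return 2.0 * x - 0.5 * math.sin(x) + 0.8 * math.sin(y)

def flow(x, y, t, steps=2000):
    """Approximate W_t(x, y) for dW/dt = -(1/2) D_x V(W, y), W_0 = x, by RK4."""
    h = t / steps
    w = x
    for _ in range(steps):
        k1 = -0.5 * grad_x_V(w, y)
        k2 = -0.5 * grad_x_V(w + 0.5 * h * k1, y)
        k3 = -0.5 * grad_x_V(w + 0.5 * h * k2, y)
        k4 = -0.5 * grad_x_V(w + h * k3, y)
        w += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return w

def lipschitz_bounds(t):
    """Right-hand sides of the two estimates in Lemma 5.11 (1) for the toy data."""
    decay = math.exp(-C_X * t / 2.0)
    return decay, (C_Y / C_X) * (1.0 - decay)

t = 1.2
bound_x, bound_y = lipschitz_bounds(t)
gap_x = abs(flow(2.0, 0.3, t) - flow(-1.0, 0.3, t))
gap_y = abs(flow(2.0, 0.3, t) - flow(2.0, 1.9, t))
print(gap_x, "<=", bound_x * 3.0)
print(gap_y, "<=", bound_y * 1.6)
```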

Corollary 5.12.

Let $f:M_{N}(\mathbb{C})_{sa}^{m}\times M_{N}(\mathbb{C})_{sa}^{n}\to M_{N}(\mathbb{C})$ be Lipschitz with respect to $\left\lVert\cdot\right\rVert_{2}$ . Let $g(y)=\int f(x,y)\,d\mu(x|y)$ . Then $g$ is Lipschitz with

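$$\left\lVert g\right\rVert_{\operatorname{Lip}}\leq(1+C/c)\left\lVert f\right\rVert_{\operatorname{Lip}}.$$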

Proof.

By the previous lemma,

[TABLE]

As $t\to\infty$ , we have $T_{t}f(x,y)\to g(y)$ by Proposition 5.10 (5). Hence, $\left\lVert g\right\rVert_{\operatorname{Lip}}\leq(1+C/c)\left\lVert f\right\rVert_{\operatorname{Lip}}$ . ∎

5.5. Asymptotic Approximation and Convergence

Let $V^{(N)}$ and $\mu^{(N)}$ be as in Theorem 5.9, and let $(X^{(N)},Y^{(N)})$ be a random variable with distribution $\mu^{(N)}$. Let $\mu^{(N)}(x|y)$ denote the conditional distribution of $X^{(N)}$ given $Y^{(N)}$.

Let $P_{t}^{(N)}$ , $S_{t}^{(N)}$ , and $T_{t}^{(N)}$ be the semigroups acting on Lipschitz functions defined as in §5.4 with respect to the potential $V^{(N)}$ .

Lemma 5.13.

With the notation above, suppose that $f^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\times M_{N}(\mathbb{C})_{sa}^{n}\to M_{N}(\mathbb{C})$, that $f^{(N)}$ is $K$-Lipschitz for every $N$, and that $f^{(N)}$ is asymptotically approximable by trace polynomials. Then

(1) $\{P_{t}^{(N)}f^{(N)}\}$ is asymptotically approximable by trace polynomials.

(2) $\{S_{t}^{(N)}f^{(N)}\}$ is asymptotically approximable by trace polynomials.

(3) $\{T_{t}^{(N)}f^{(N)}\}$ is asymptotically approximable by trace polynomials.

Proof.

(1) We proved in Lemma 4.12 that $P_{t}^{(N)}$ preserves asymptotic approximability by trace polynomials.

(2) Recall that $S_{t}^{(N)}f^{(N)}(x,y)=f^{(N)}(W_{t}^{(N)}(x,y),y)$ , where

[TABLE]

Now $D_{x}V^{(N)}(x,y)$ is $C$ -Lipschitz in $(x,y)$ , asymptotically approximable by trace polynomials, and independent of $t$ , and thus it satisfies Assumption 4.6, so by Proposition 4.7, $W_{t}^{(N)}(x,y)$ is asymptotically approximable by trace polynomials (here we rely on Lemma 3.6 that asymptotic approximability is equivalent to being asymptotic to some element of $\overline{\operatorname{TrP}}_{m+n}^{1}$ ). Then because $f^{(N)}$ is $K$ -Lipschitz in $(x,y)$ , Lemma 3.12 implies asymptotic approximability of $f^{(N)}(W_{t}^{(N)}(x,y),y)$ .

(3) Let $T_{t,\ell}^{(N)}=(P_{2^{-\ell}}^{(N)}S_{2^{-\ell}}^{(N)})^{n}$ whenever $t=n2^{-\ell}$. From (1) and (2), it follows that $T_{t,\ell}^{(N)}f^{(N)}$ is asymptotically approximable by trace polynomials. Now for each dyadic $t$, Proposition 5.10 (1) shows that $T_{t,\ell}^{(N)}f^{(N)}\to T_{t}^{(N)}f^{(N)}$ as $\ell\to\infty$ uniformly on $\left\lVert\cdot\right\rVert_{2}$-balls (and hence on $\left\lVert\cdot\right\rVert_{\infty}$-balls). Therefore, by Lemma 3.13, $T_{t}^{(N)}f^{(N)}$ is asymptotically approximable by trace polynomials. Then we extend this property from dyadic $t$ to all real $t$ using Proposition 5.10 (2) and Lemma 3.13. ∎

Proof of Theorem 5.9.

Let $f^{(N)}:M_{N}(\mathbb{C})_{sa}^{m+n}\to M_{N}(\mathbb{C})$ be $K$ -Lipschitz and asymptotically approximable by trace polynomials. Let

[TABLE]

We showed in Corollary 5.12 that $g^{(N)}$ is Lipschitz with $\left\lVert g^{(N)}\right\rVert_{\operatorname{Lip}}\leq(1+C/c)\left\lVert f^{(N)}\right\rVert_{\operatorname{Lip}}$ . We know that $T_{t}^{(N)}f^{(N)}$ is asymptotically approximable by trace polynomials in $(x,y)$ . By Proposition 5.10 (5), we have $T_{t}^{(N)}f^{(N)}(x,y)\to g^{(N)}(x,y)$ as $t\to\infty$ , with the error bounded by

[TABLE]

Given that $\{DV^{(N)}\}$ is asymptotically approximable by trace polynomials, $\left\lVert D_{x}V\right\rVert_{u,R}^{(N)}$ is bounded as $N\to\infty$ . This implies that the rate of convergence of $T_{t}^{(N)}f^{(N)}(x,y)\to g^{(N)}(x,y)$ as $t\to\infty$ is uniform on $\left\lVert(x,y)\right\rVert_{\infty}\leq R$ and independent of $N$ . So by Lemma 3.13, $g^{(N)}$ is asymptotically approximable by trace polynomials of $(x,y)$ . Yet $g^{(N)}$ is independent of $x$ , and so we may approximate $g^{(N)}(y)$ by evaluating these trace polynomials at $(0,y)$ , which reduces them to trace polynomials of $y$ .

Since $g^{(N)}$ is asymptotically approximable by trace polynomials of $y$, let $g\in\overline{\operatorname{TrP}}_{n}^{1}$ be such that $g^{(N)}\rightsquigarrow g$. Then it remains to show that $g(Y)=E_{\mathrm{W}^{*}(Y)}[f(X,Y)]$, where $(X,Y)$ are non-commutative random variables for the free Gibbs law $\lambda$ as in the theorem statement. It suffices to check that

[TABLE]

whenever $\phi$ is a non-commutative polynomial. But using Corollary 5.3,

[TABLE]

Remark 5.14.

We showed in §4.2 that $P_{t}^{(N)}$ has a large $N$ limit $P_{t}^{\operatorname{TrP}}$ acting on $\overline{\operatorname{TrP}}_{m+n}^{1}$ . Similarly, the results of §4.1 imply that $S_{t}^{(N)}$ has a large $N$ limit $S_{t}^{\operatorname{TrP}}$ acting on $\overline{\operatorname{TrP}}_{m+n}^{1}$ . This implies that the semigroup $T_{t}^{(N)}$ also has a large $N$ limit $T_{t}^{\operatorname{TrP}}$ in light of Proposition 5.10 (1) and (2) and Lemma 3.13. Future research should investigate in what sense $F(x,t)=T_{t}^{\operatorname{TrP}}f(x)$ would solve the differential equation

[TABLE]

where $V$ is the large $N$ limit of $\{V^{(N)}\}$ and $J_{x}F$ is the Jacobian matrix of $F$ with respect to the variable $x$ .

6. Conditional Entropy and Fisher’s Information

In this section, we show that for random matrix models satisfying Assumption 5.1, the conditional (classical) entropy $h(X^{(N)}|Y^{(N)})$ converges to the conditional non-microstates free entropy $\chi^{*}(X|Y)$ (also known as $\chi^{*}(X:\mathrm{W}^{*}(Y))$ ).

6.1. Conditional Entropy and Fisher’s Information in the Classical Setting

We refer to [54, §3] and [29, §5] for background on classical entropy and Fisher’s information and motivation for the free case. The conditional setting is more technical, and we will state several standard results without proof, since the proofs in the non-conditional case were repeated in some detail in [29].

Recall that the classical entropy of a random variable $X$ in $\mathbb{R}^{m}$ with probability density $\rho$ is $h(X)=-\int\rho\log\rho$ . Similarly, if $(X,Y)$ is a random variable in $\mathbb{R}^{m}\times\mathbb{R}^{n}$ with density $\rho_{X,Y}(x,y)$ , then the conditional entropy $h(X|Y)$ is defined by

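$$h(X|Y)=-\int_{\mathbb{R}^{m}\times\mathbb{R}^{n}}\rho_{X,Y}(x,y)\log\rho_{X|Y}(x|y)\,dx\,dy,$$

where $\rho_{Y}$ is the marginal density

$$\rho_{Y}(y)=\int_{\mathbb{R}^{m}}\rho_{X,Y}(x,y)\,dx$$

and $\rho_{X|Y}$ is the conditional density

$$\rho_{X|Y}(x|y)=\frac{\rho_{X,Y}(x,y)}{\rho_{Y}(y)}.$$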

It is a standard fact that if $X$ has finite variance, then $h(X|Y)$ is well-defined. The proof for the non-conditional entropy $h(X)$ was reviewed in [29, Lemma 5.1], and the conditional case can be handled similarly.
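As an illustration of the definition, the following sketch evaluates $h(X|Y)=-\int\rho_{X,Y}\log\rho_{X|Y}$ numerically for a bivariate Gaussian with unit variances and correlation $r$ (here $m=n=1$), where the standard closed form is $\tfrac{1}{2}\log(2\pi e(1-r^{2}))$. The grid size, the truncation of the domain, and the function name are assumptions chosen only for the example.

```python
import numpy as np

def conditional_entropy_gaussian(r, half_width=8.0, n=401):
    """Riemann-sum approximation of h(X|Y) for a standard bivariate Gaussian
    with correlation r, using rho_{X|Y} = rho_{X,Y} / rho_Y."""
    x = np.linspace(-half_width, half_width, n)
    y = np.linspace(-half_width, half_width, n)
    dx, dy = x[1] - x[0], y[1] - y[0]
    X, Y = np.meshgrid(x, y, indexing="ij")
    det = 1.0 - r * r
    log_joint = (-(X**2 - 2.0 * r * X * Y + Y**2) / (2.0 * det)
                 - np.log(2.0 * np.pi * np.sqrt(det)))
    joint = np.exp(log_joint)
    marginal_y = joint.sum(axis=0) * dx             # rho_Y(y), integrating out x
    log_cond = log_joint - np.log(marginal_y)[None, :]
    return -np.sum(joint * log_cond) * dx * dy

r = 0.6
numeric = conditional_entropy_gaussian(r)
exact = 0.5 * np.log(2.0 * np.pi * np.e * (1.0 - r * r))
print(numeric, exact)
```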

The conditional Fisher information is given by

[TABLE]

whenever the right hand side makes sense and $\infty$ otherwise. It describes the rate of change of $h(X+t^{1/2}S|Y)$ , where $S$ is a Gaussian random variable in $\mathbb{R}^{m}$ with covariance matrix $I$ independent of $(X,Y)$ . Knowing that the density $\rho_{X+t^{1/2}S,Y}$ satisfies the heat equation

[TABLE]

one can show that $\mathcal{I}(X+t^{1/2}S|Y)$ is well-defined and finite for $t>0$ and that

[TABLE]

The Fisher information is the squared $L^{2}$ norm of the ($\mathbb{R}^{m}$-valued) random variable $\Xi$ given by evaluating the score function $-\nabla_{x}\rho_{X|Y}/\rho_{X|Y}$ on the random variable $(X,Y)$, provided that this random variable is in $L^{2}$. In this case, the random variable $\Xi$ is known as the score function for $X$ given $Y$, and it is the unique element of $L^{2}$ satisfying the integration-by-parts relation

[TABLE]

More generally, if there exists a random variable $\Xi$ in $L^{2}$ satisfying this integration-by-parts formula, then we define the conditional Fisher information to be $\mathcal{I}(X|Y)=E|\Xi|^{2}$ (and this extends our previous definition of $\mathcal{I}(X|Y)$ ). Otherwise, $\mathcal{I}(X|Y)$ is defined to be $\infty$ .

In light of the integration-by-parts characterization, score functions behave well under conditionally independent sums. The following lemma is proved in the same way as the non-conditional case (see [29, Lemma 5.6]) and the free case (see [51, Proposition 3.7]).

Lemma 6.1.

Let $Y$ be a random variable in $\mathbb{R}^{n}$ and let $X_{1}$ and $X_{2}$ be random variables in $\mathbb{R}^{m}$ that are conditionally independent given $Y$ . Suppose that $\Xi$ is a score function for $X_{1}$ given $Y$ . Then $E[\Xi|X_{1}+X_{2},Y]$ is a score function for $X_{1}+X_{2}$ given $Y$ . Hence,

[TABLE]

In particular, this holds if $X_{2}$ is independent from $(X_{1},Y)$ or $X_{1}$ is independent of $(X_{2},Y)$ .

Score functions also scale in the following way. The proof is straightforward from the integration-by-parts relation.

Lemma 6.2.

If $\Xi$ is a score function for $X$ given $Y$ and $t>0$ , then $(1/t)\Xi$ is a score function for $tX$ given $Y$ , and hence $\mathcal{I}(tX|Y)=t^{-2}\mathcal{I}(X|Y)$ .

6.2. Random Matrix Renormalization

Suppose that $(X,Y)$ is a random variable in $M_{N}(\mathbb{C})_{sa}^{m}\times M_{N}(\mathbb{C})_{sa}^{n}$ with density $\rho_{X,Y}$ . The trace on $M_{N}(\mathbb{C})_{sa}$ produces a real inner product. But to study the large $N$ limit, we use the normalized trace $\tau_{N}=(1/N)\operatorname{Tr}$ . The corresponding normalized Gaussian is the GUE ensemble $S=(S_{1},\dots,S_{m})$ where $S_{j}$ has variance $1$ with respect to $\tau_{N}$ . We use the following renormalized entropy, which is motivated by computation of the Gaussian case and by (6.5) below,

[TABLE]

Due to the normalization of Gaussian, the evolution of the density for $(X+t^{1/2}S,Y)$ is given by the renormalized heat equation

[TABLE]

This results in

[TABLE]

where $\mathcal{I}^{(N)}(X|Y):=N^{-3}\mathcal{I}(X|Y)$ , assuming that $X$ has finite variance and $t>0$ .
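To make the normalization of the GUE concrete, here is a small sampling sketch. It uses one standard construction of a GUE matrix with $E[\tau_{N}(S^{2})]=1$ (Hermitian part of a complex Ginibre matrix, rescaled by $N^{-1/2}$); the function names, the seed, and the matrix size are illustrative choices. For large $N$ the second and fourth moments under $\tau_{N}$ should be close to the semicircular values $1$ and $2$.

```python
import numpy as np

def sample_gue(n, rng):
    """Sample an n x n GUE matrix normalized so that E[tau_n(S^2)] = 1."""
    g = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)
    h = (g + g.conj().T) / np.sqrt(2.0)  # Hermitian, entries of unit variance
    return h / np.sqrt(n)

def tau(a):
    """Normalized trace tau_N = (1/N) Tr."""
    return np.trace(a).real / a.shape[0]

rng = np.random.default_rng(0)
s = sample_gue(300, rng)
print("tau(S^2) =", tau(s @ s))
print("tau(S^4) =", tau(s @ s @ s @ s))
```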

Another heuristic for the normalization $\mathcal{I}^{(N)}=N^{-3}\mathcal{I}$ comes from analyzing the case where $(X,Y)$ have density $(1/Z)e^{-N^{2}V(x,y)}\,dx\,dy$ where $V$ is uniformly convex and semi-concave. Indeed, in this case, the classical score function for $X$ given $Y$ is $N^{2}\nabla_{x}V(X,Y)$. Recall that $D_{x}V=N\nabla_{x}V$ is the gradient of $V$ with respect to the normalized inner product $\langle\cdot,\cdot\rangle_{2}$. Thus,

[TABLE]

is a dimension-independent normalization. Furthermore, the normalized score function $\xi=(1/N)\Xi$ (which would be $D_{x}V(X,Y)$ in the case where the law is given by a potential $V$ ) satisfies the integration-by-parts relation

[TABLE]

where $\xi=(\xi_{1},\dots,\xi_{m})$ and where $\operatorname{Div}$ is the divergence with respect to the classical coordinates (not normalized). But if $f$ is a non-commutative polynomial, then

[TABLE]

where $\partial_{x_{j}}$ denotes the non-commutative derivative or free difference quotient with respect to $x_{j}$ . Thus, applying the integration-by-parts relation to non-commutative polynomials results in the dimension-independent relation

[TABLE]

that characterizes the normalized score function.

As a consequence of (6.5), $h^{(N)}(X|Y)$ can be recovered by integrating $\mathcal{I}^{(N)}(X+t^{1/2}S|Y)$ and modifying the integral to converge at $\infty$ . This results in

[TABLE]

provided that $(X,Y)$ has a density $\rho_{X,Y}$ and that $X$ has finite variance. The proof is similar to [29, Lemma 5.7]. Convergence of the integral at $\infty$ can be deduced from the following estimate, and it also shows convergence of the integral at $0$ if $\mathcal{I}(X|Y)$ is finite. Compare [51, Corollary 6.14 and Remark 6.15] and [29, Lemma 5.7].

Lemma 6.3.

Let $(X,Y)$ be a random variable in $M_{N}(\mathbb{C})_{sa}^{m}\times M_{N}(\mathbb{C})_{sa}^{n}$ such that $a=(1/m)\sum_{j=1}^{m}E[\tau_{N}(X_{j}^{2})]<\infty$ , and let $S$ be an independent GUE $m$ -tuple. Then

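$$\frac{m}{a+t}\leq\mathcal{I}^{(N)}(X+t^{1/2}S|Y)\leq\min\left(\frac{m}{t},\mathcal{I}^{(N)}(X|Y)\right).$$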

Proof.

We observe that $\xi_{t}=E[t^{-1/2}S|X+t^{1/2}S,Y]$ is a normalized score function for $X+t^{1/2}S$ given $Y$ by Lemma 6.1. This yields $\mathcal{I}^{(N)}(X+t^{1/2}S|Y)\leq m/t$ . On the other hand, if $\xi$ is a normalized score function for $X$ given $Y$ , we also have $\xi_{t}=E[\xi|X+t^{1/2}S,Y]$ , which yields the upper bound $\mathcal{I}^{(N)}(X|Y)$ . The lower bound follows from observing $(E\lVert\xi_{t}\rVert_{2}^{2})^{1/2}(E\lVert X+t^{1/2}S\rVert_{2}^{2})^{1/2}\geq E\langle\xi_{t},X+t^{1/2}S\rangle_{2}$ and evaluating the right hand side using integration by parts. ∎
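The bounds obtained in this proof, together with the heat-flow identity $\partial_{t}h=\tfrac{1}{2}\mathcal{I}$ (de Bruijn's identity), can be checked in a scalar toy case. The sketch below assumes $N=1$, $m=1$, no conditioning variable, and takes $X$ to be a hypothetical two-component Gaussian mixture, so that $X+t^{1/2}S$ is again a mixture with explicit density; with $N=1$ the normalized and classical quantities coincide. Here $a=E[X^{2}]=1.25$, and the bounds read $1/(a+t)\leq\mathcal{I}(X+t^{1/2}S)\leq 1/t$.

```python
import numpy as np

X_GRID = np.linspace(-12.0, 12.0, 4801)
DX = X_GRID[1] - X_GRID[0]
MEANS = np.array([-1.0, 1.0])  # hypothetical mixture centers
VAR0 = 0.25                    # variance of each component at t = 0

def density_and_derivative(t):
    """Density of X + sqrt(t) S and its x-derivative on the grid."""
    var = VAR0 + t
    diffs = X_GRID[:, None] - MEANS[None, :]
    comps = np.exp(-diffs**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    rho = comps.mean(axis=1)
    drho = (-diffs / var * comps).mean(axis=1)
    return rho, drho

def entropy(t):
    rho, _ = density_and_derivative(t)
    return -np.sum(rho * np.log(rho)) * DX

def fisher_information(t):
    rho, drho = density_and_derivative(t)
    return np.sum(drho**2 / rho) * DX

t, eps = 1.0, 1e-4
dh_dt = (entropy(t + eps) - entropy(t - eps)) / (2.0 * eps)
print("dh/dt        =", dh_dt)
print("I / 2        =", 0.5 * fisher_information(t))
print("bounds on I  =", 1.0 / (1.25 + t), fisher_information(t), 1.0 / t)
```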

6.3. Convergence to Conditional Free Entropy

Motivated by the normalized entropy and Fisher’s information in the previous section, Voiculescu defined the free versions as follows. Let $(X,Y)$ be an $(m+n)$ -tuple of self-adjoint non-commutative random variables in a tracial $\mathrm{W}^{*}$ -algebra $(\mathcal{M},\tau)$ . We say that $\xi=(\xi_{1},\dots,\xi_{m})\in L^{2}(\mathcal{M},\tau)_{sa}^{m}$ is a free score function for $X$ given $Y$ (also known as a conjugate variable) if for every non-commutative polynomial $f(X,Y)$ , we have

[TABLE]

The free Fisher information $\Phi^{*}(X|Y)$ is defined to be $\lVert\xi\rVert_{2}^{2}$ if such a $\xi$ exists, and $\infty$ otherwise. The non-microstates free entropy $\chi^{*}(X|Y)$ is defined to be

[TABLE]

Convergence of the integral at $\infty$ follows from the free analogue of Lemma 6.3, so that $\chi^{*}(X|Y)$ is well-defined in $[-\infty,\infty)$ whenever $X$ has finite variance.

Remark 6.4.

Voiculescu’s original notation in [51, §7] was $\chi^{*}(X:\mathrm{W}^{*}(Y))$ rather than $\chi^{*}(X|Y)$ , since the definition of the free score function can be rephrased so as to depend only on $\mathrm{W}^{*}(Y)$ rather than $Y$ . However, we prefer to write $\chi^{*}(X|Y)$ instead by analogy with the classical case, using the vertical bar to denote “conditioning.” This avoids potential confusion with the notation $\chi(X:Y)$ for microstates entropy of $X$ in the presence of $Y$ used in [50, §1].

The following lemma gives sufficient conditions for classical Fisher information for random matrix models to converge to free Fisher information. The main hypotheses are that the non-commutative laws converge, the score functions $D_{x}V^{(N)}$ for the $N\times N$ matrix models are asymptotically approximable by trace polynomials, and some mild growth conditions on score functions and probability measures as $\lVert(x,y)\rVert_{\infty}\to\infty$ . We omit the proof since it is a direct adaptation of the proof of [29, Proposition 5.10].

Lemma 6.5.

Let $V^{(N)}:M_{N}(\mathbb{C})_{sa}^{m+n}\to\mathbb{R}$ be a potential with $\int_{M_{N}(\mathbb{C})_{sa}^{m+n}}e^{-N^{2}V^{(N)}(x,y)}\,dx\,dy<+\infty$, let $\mu^{(N)}$ be the associated probability measure, and let $(X^{(N)},Y^{(N)})$ be a random variable distributed according to $\mu^{(N)}$. Let $(X,Y)$ be an $(m+n)$-tuple of self-adjoint non-commutative random variables in the tracial $\mathrm{W}^{*}$-algebra $(\mathcal{M},\tau)$. Assume that:

(A) The non-commutative law of $(X^{(N)},Y^{(N)})$ with respect to $\tau_{N}$ converges in probability to the non-commutative law of $(X,Y)$.

(B) $D_{x}V^{(N)}$ is defined and continuous, and the sequence $\{D_{x}V^{(N)}\}$ is asymptotically approximable by trace polynomials, and hence $D_{x}V^{(N)}\rightsquigarrow g\in(\overline{\operatorname{TrP}}_{m+n}^{1})_{sa}^{m}$.

(C) For some $k\geq 0$ and $a,b>0$, we have

[TABLE]

(D) There exists $R_{0}>0$ such that

[TABLE]

Then $\mathcal{I}^{(N)}(X^{(N)}|Y^{(N)})$ is finite. Moreover, $g(X,Y)$ is in $L^{2}(\mathcal{M},\tau)$ and it is the free score function for $X$ given $Y$, and we have

[TABLE]

Theorem 6.6.

Let $V^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\times M_{N}(\mathbb{C})_{sa}^{n}\to\mathbb{R}$ satisfy Assumption 5.1 for some $0<c\leq C$ . Let $\mu^{(N)}$ be the corresponding measure, let $X^{(N)},Y^{(N)}$ be random variables chosen according to $\mu^{(N)}$ , and let $S^{(N)}$ be an independent $m$ -tuple of GUE matrices.

Let $X=(X_{1},\dots,X_{m})$ and $Y=(Y_{1},\dots,Y_{n})$ be non-commutative random variables with non-commutative law $\mu=\mu_{V}$ , and let $S$ be a freely independent free semicircular $m$ -tuple. Then for every $t\geq 0$ , we have

[TABLE]

and

[TABLE]

Proof.

We want to show that the law of $(X^{(N)}+t^{1/2}S^{(N)},Y^{(N)})$ satisfies the assumptions of Lemma 6.5 for each $t\geq 0$. The joint law of $(X^{(N)},Y^{(N)},S^{(N)})$ is given by the convex potential $U^{(N)}(x,y,s)=V^{(N)}(x,y)+(1/2)\lVert s\rVert_{2}^{2}$. Now $U^{(N)}$ satisfies $\min(c,1)\leq HU^{(N)}\leq\max(C,1)$ and $DU^{(N)}$ is asymptotically approximable by trace polynomials. Thus, the law of $(X^{(N)},Y^{(N)},S^{(N)})$ has a large $N$ limit given by Theorem 5.2. In fact, the large $N$ limit must be the non-commutative law of $(X,Y,S)$ because of Voiculescu’s asymptotic freeness theorem [52] and because the non-commutative law of $S^{(N)}$ converges to the non-commutative law of $S$. (Alternatively, this could be proved the same way as [29, Lemma 7.4].)

Since the non-commutative law of $(X^{(N)},Y^{(N)},S^{(N)})$ converges in probability to that of $(X,Y,S)$ , the non-commutative law of $(X^{(N)}+t^{1/2}S^{(N)},Y^{(N)})$ converges in probability to that of $(X+t^{1/2}S,Y)$ , and thus (A) of Lemma 6.5 holds. Moreover, Lemma 2.12 shows that

[TABLE]

From this it is not hard to show that $(X^{(N)}+t^{1/2}S^{(N)},Y^{(N)})$ satisfies (D).

It remains to check (B) and (C). The potential for the triple $(X^{(N)}+t^{1/2}S^{(N)},Y^{(N)},S^{(N)})$ is given by

[TABLE]

which follows by applying the change of variables formula for the density. Here we write $\tilde{x}$ to emphasize that this variable corresponds to $X^{(N)}+t^{1/2}S^{(N)}$ rather than $X^{(N)}$ . Note that $W_{t}^{(N)}$ is uniformly convex and semi-concave since it is the composition of $U^{(N)}$ with an invertible linear transformation. Also,

[TABLE]

is asymptotically approximable by trace polynomials. The potential corresponding to $(X^{(N)}+t^{1/2}S^{(N)},Y^{(N)})$ is

[TABLE]

Since $W_{t}^{(N)}$ is uniformly convex, the integrand vanishes rapidly at $\infty$ , and thus it is straightforward to differentiate under the integral by dominated convergence, and deduce that $V_{t}^{(N)}$ is continuously differentiable. Furthermore,

[TABLE]

so that

[TABLE]

or in other words $DV_{t}^{(N)}$ is given by the conditional expectation

[TABLE]

Now we apply Theorem 5.9 using the potential $W_{t}^{(N)}$ and conditioning on $(X^{(N)}+t^{1/2}S^{(N)},Y^{(N)})$ to conclude that $DV_{t}^{(N)}(\tilde{x},y)$ is asymptotically approximable by trace polynomials, which establishes (B).

Furthermore, Theorem 5.9 implies that

[TABLE]

This implies that (C) of Lemma 6.5 holds with $k=1$ , using Remark 4.13.

Therefore, we may apply Lemma 6.5 to $(X^{(N)}+t^{1/2}S^{(N)},Y^{(N)})$ to obtain that (6.8) holds for every $t\geq 0$ , that is,

[TABLE]

For the second claim (6.9) regarding $h^{(N)}$ and $\chi^{*}$ , it remains to show that

[TABLE]

We just showed the integrand converges pointwise. But we can take the limit inside the integral by the dominated convergence theorem, because by Lemma 6.3, we have

[TABLE]

and we also know that $\mathcal{I}^{(N)}(X^{(N)}|Y^{(N)})$ is bounded as $N\to\infty$ because it converges to $\Phi^{*}(X|Y)$ . ∎

Remark 6.7.

Of course, (6.10) leads to the same conclusion as Lemma 6.1. Indeed, $\xi_{t}=D_{x}V_{t}^{(N)}(X^{(N)}+t^{1/2}S^{(N)},Y^{(N)})$ is the score function for $X^{(N)}+t^{1/2}S^{(N)}$ , and Lemma 6.1 says that $\xi_{t}$ is the conditional expectation of $\xi_{0}=D_{x}V^{(N)}(X^{(N)},Y^{(N)})$ given $X^{(N)}+t^{1/2}S^{(N)}$ and $Y^{(N)}$ .

Remark 6.8.

In [29, §7], we did not use the conditional expectation method to prove $DV_{t}^{(N)}$ is asymptotically approximable by trace polynomials, but rather we analyzed the evolution of $DV_{t}^{(N)}$ directly using PDE semigroups. The proof given here for convergence of entropy is thus considerably shorter. However, our results on the evolution of $DV_{t}^{(N)}$ will come in handy for our construction of transport in the next section.

7. Conditional Transport to Gaussian

In this section, we prove our main results about transport (Theorems 7.11 and 7.13). Suppose that $V^{(N)}(x,y)$ is a potential as in Assumption 5.1, $\mu^{(N)}$ is the corresponding probability distribution and that $(X^{(N)},Y^{(N)})$ is a random variable with this law. Let $S^{(N)}$ be an independent $m$ -tuple of GUE matrices. Let $\mu_{t}^{(N)}$ be the law of $(X^{(N)}+t^{1/2}S^{(N)},Y^{(N)})$ .

The evolution of the potential $V_{t}^{(N)}$ corresponding to $\mu_{t}^{(N)}$ was studied in [29], and in particular, we established a dimension-independent way to obtain $DV_{t}^{(N)}$ from $DV^{(N)}$ using operations that preserve asymptotic approximability by trace polynomials. By solving an ODE in terms of $DV_{t}^{(N)}$ , we will obtain transport maps $F_{s,t}^{(N)}:M_{N}(\mathbb{C})_{sa}^{m+n}\to M_{N}(\mathbb{C})_{sa}^{m}$ such that

[TABLE]

Upon renormalizing and taking the limit as $s$ or $t$ goes to $\infty$ , we obtain transport to the law of $(S^{(N)},Y^{(N)})$ .

To make each part of the proof more computationally tractable, we proceed in stages. Up until §7.5, we fix $N$ (and thus suppress it in the notation). First, in §7.1, we describe the basic construction of transport for functions of $x$ alone (imagining that we have frozen the variable $y$ ). In §7.2, we describe the properties of $V_{t}^{(N)}(x,y)$ . Next, §7.3 proves Lipschitz estimates for the transport maps $F_{s,t}^{(N)}(x,y)$ .

In §7.4, we introduce renormalized transport maps $\tilde{F}_{s,t}^{(N)}$ that transport $\tilde{\mu}_{t}$ to $\tilde{\mu}_{s}$ , where $\tilde{\mu}_{t}$ is the law of $(e^{-t/2}X^{(N)}+e^{-t/2}(e^{t}-1)^{1/2}S^{(N)},Y^{(N)})$ . The renormalized transport map $\tilde{F}_{s,t}$ is the same one used by Otto and Villani in their proof of the Talagrand transportation-entropy inequality [39, §4, proof of Lemma 2], in the special case where the target measure is Gaussian (and generalized to the conditional setting). We will explain this inequality further in §8.3.

The new element in our paper is the analysis of the large $t$ and large $N$ limits of the transport maps. In §7.4, we show that the limit as $s$ or $t$ tends to $\infty$ exists. Then in §7.5, we use the machinery of asymptotic approximation by trace polynomials to study the large $N$ limit of $\tilde{F}_{s,t}^{(N)}$ . In order to get dimension-independent estimates for convergence as $s$ or $t$ tends to $\infty$ , we conduct a finer analysis of convexity properties of $V_{t}$ and Lipschitz properties of $\tilde{F}_{s,t}$ . It is convenient to carry out the earlier stages of this analysis in §7.2 and §7.3 for $F_{s,t}$ rather than $\tilde{F}_{s,t}$ .

7.1. Basic Construction of Transport

In this section, we will fix $N$ and fix a function $V:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ in $\mathcal{E}_{c,C}$ for some $0<c<C$ . Later, we will allow $V$ to depend on $N$ and to depend on another self-adjoint tuple $y$ , but we prefer to simplify notation for the sake of carrying out the basic computation.

Let $\mu$ be the probability measure with density $(1/Z)e^{-N^{2}V}$ where $Z=\int_{M_{N}(\mathbb{C})_{sa}^{m}}e^{-N^{2}V}$ . We showed in [29] that the density of $\mu_{t}:=\mu*\sigma_{t,N}$ is $(1/Z)e^{-N^{2}V_{t}}$ , where $V_{t}$ solves the equation

[TABLE]

Because $(1/Z)e^{-N^{2}V_{t}}$ solves the heat equation, we know that $V_{t}$ is a smooth function of $(x,t)$ for $t>0$ and a continuous function of $(x,t)$ for $t\geq 0$ . Moreover, $V_{t}\in\mathcal{E}(c(1+ct)^{-1},C(1+Ct)^{-1})$ for each $t$ as proved in Theorem 6.1 (1) of [29].

Now we can describe explicit transport functions $F_{s,t}$ such that $(F_{s,t})_{*}\mu_{t}=\mu_{s}$ for all $s,t\in[0,+\infty)$.

Proposition 7.1.

Let $V$ , $\mu$ , $V_{t}$ , and $\mu_{t}$ be as above.

(1) There exists a unique family of functions $F_{s,t}:M_{N}(\mathbb{C})_{sa}^{m}\to M_{N}(\mathbb{C})_{sa}^{m}$ for $s,t\in[0,+\infty)$ such that

[TABLE]

(2) $F_{t_{1},t_{2}}\circ F_{t_{2},t_{3}}=F_{t_{1},t_{3}}$ and in particular $F_{t,s}=F_{s,t}^{-1}$.

(3) $(F_{s,t})_{*}\mu_{t}=\mu_{s}$.

Proof.

(1) Because $V_{s}\in\mathcal{E}(c(1+cs)^{-1},C(1+Cs)^{-1})$, we know that $DV_{s}(x)$ is $C$-Lipschitz with respect to $\left\lVert\cdot\right\rVert_{2}$. Hence, given $t\in[0,+\infty)$, by the Picard-Lindelöf theorem, the initial value problem (7.2) has a unique solution defined for all $s\in[0,+\infty)$.

(2) Fix $t_{1}$, $t_{2}$, and $t_{3}$ and fix $x\in M_{N}(\mathbb{C})_{sa}^{m}$. Let $G(t)$ be the function defined by $G(t_{3})=x$ and $\partial_{t}G(t)=\frac{1}{2}DV_{t}(G(t))$. By definition of the functions $F_{s,t}$, we have $G(t_{1})=F_{t_{1},t_{3}}(x)$ and $G(t_{2})=F_{t_{2},t_{3}}(x)$. So $G$ also satisfies the initial value problem $\partial_{t}G(t)=\frac{1}{2}DV_{t}(G(t))$ and $G(t_{2})=F_{t_{2},t_{3}}(x)$. Therefore, by uniqueness, $G(t_{1})=F_{t_{1},t_{2}}(F_{t_{2},t_{3}}(x))$, so that $F_{t_{1},t_{2}}(F_{t_{2},t_{3}}(x))=F_{t_{1},t_{3}}(x)$.

(3) We first prove the claim for $s,t>0$. Because $V_{s}$ is smooth, it follows that $F_{s,t}$ is smooth for $s,t>0$ by the standard theory of smooth dependence for ODE. Let $JF_{s,t}$ denote the Jacobian linear transformation (differential) of $F_{s,t}$. Let $\rho_{t}=(1/Z)e^{-N^{2}V_{t}}$ be the density of $\mu_{t}$. As a consequence of the change-of-variables formula for multivariable integrals, we see that the density of $(F_{s,t})_{*}\mu_{t}=(F_{t,s}^{-1})_{*}\mu_{t}$ is

[TABLE]

Fix $s$ . If $t=s$ , then clearly this reduces to $\rho_{s}$ . Therefore, it suffices to show that $(\rho_{t}\circ F_{t,s})|\det JF_{t,s}|$ is a constant function of $t$ , or equivalently that

[TABLE]

Recalling the smoothness of $V_{t}$ and $F_{t,s}$ for $s,t>0$ and using the differential equations (7.1) for $V_{t}$ and (7.2) for $F_{t,s}$, we obtain

[TABLE]

Meanwhile, to compute $\partial_{t}\log|\det JF_{t,s}|$ , note that for small $\epsilon\in\mathbb{R}$ ,

[TABLE]

so that

[TABLE]

Using smoothness,

[TABLE]

Since $JF_{t+\epsilon,t}$ becomes the identity when $\epsilon=0$ , we know that for small enough $\epsilon$ , the linear transformation $JF_{t+\epsilon,t}$ has positive determinant and $\log JF_{t+\epsilon,t}$ is well-defined by power series, so that

[TABLE]

Hence, $\partial_{t}\log|\det JF_{t,s}|=\frac{N}{2}\Delta V_{t}\circ F_{t,s}$ . This implies that

[TABLE]

completing the proof of the claim for $s,t>0$ . The equality $(F_{s,t})_{*}\mu_{t}=\mu_{s}$ extends to the case where $s$ or $t$ is zero because both sides depend continuously on $s$ and $t$ with respect to the weak topology on measures. ∎

In particular, the map $F_{0,t}$ transports $\mu_{t}=\mu*\sigma_{t,N}$ to the original law $\mu$ . In other words, if $X\sim\mu$ and $S\sim\sigma_{1,N}$ , then $F_{0,t}(X+t^{1/2}S)\sim X$ and $F_{t,0}(X)\sim X+t^{1/2}S$ . This implies that $(1+t)^{-1/2}F_{t,0}(X)\sim(1+t)^{-1/2}(X+t^{1/2}S)$ . This suggests that we can find a transport map from the law of $X$ to the law of $S$ as the large $t$ limit of $(1+t)^{-1/2}F_{t,0}$ . In the interest of efficiency, we postpone the details of this argument until after we introduce the dependence on the other set of parameters $y$ .
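To make this heuristic concrete, here is a worked Gaussian case. It is only a sketch under assumptions that are not spelled out in this subsection: we assume the normalization $d\mu=(1/Z)e^{-N^{2}V}\,dx$ with $\lVert x\rVert_{2}^{2}=\sum_{j}\tau_{N}(x_{j}^{2})$, as for the matrix models in the introduction, so that $\sigma_{t,N}$ has potential $\lVert x\rVert_{2}^{2}/(2t)$. The quadratic potential $V(x)=(c/2)\lVert x\rVert_{2}^{2}$ with $c>0$ is an illustrative choice, and $S\sim\sigma_{1,N}$.

```latex
\begin{align*}
\mu &= \text{law of } c^{-1/2} S,
\qquad
\mu_t = \mu * \sigma_{t,N} = \text{law of } (c^{-1} + t)^{1/2} S, \\
V_t(x) &= \frac{c}{2(1+ct)} \lVert x \rVert_2^2 + \text{const}, \\
F_{s,t}(x) &= \Bigl( \frac{1+cs}{1+ct} \Bigr)^{1/2} x,
\qquad (F_{s,t})_* \mu_t = \mu_s, \\
(1+t)^{-1/2} F_{t,0}(x) &= \Bigl( \frac{1+ct}{1+t} \Bigr)^{1/2} x
\;\longrightarrow\; c^{1/2} x \quad (t \to \infty).
\end{align*}
```

In this special case the limiting map $x\mapsto c^{1/2}x$ sends the law of $X=c^{-1/2}S$ to the law of $S$, and the scaling maps visibly satisfy $F_{t_{1},t_{2}}\circ F_{t_{2},t_{3}}=F_{t_{1},t_{3}}$. The Hessian of $V_{t}$ is $c(1+ct)^{-1}$, which matches the convexity bounds used in the proof of Proposition 7.1 when $C=c$.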

7.2. Conditional Hamilton-Jacobi-Bellman Equation and Semigroups

Let us now fix $N$ and fix a potential $V:M_{N}(\mathbb{C})_{sa}^{m}\times M_{N}(\mathbb{C})_{sa}^{n}\to\mathbb{R}$ in $\mathcal{E}_{m+n}^{(N)}(c,C)$ for some $0\leq c\leq C$ . Let $\mu$ be the corresponding law and let $(X,Y)$ be a random variable in $M_{N}(\mathbb{C})_{sa}^{m}\times M_{N}(\mathbb{C})_{sa}^{n}$ distributed according to $\mu$ . Let $\mu_{t}$ be the law of $(X+t^{1/2}S,Y)$ , where $S$ is an independent tuple of independent GUE.

Our goal is to transport the law $\mu_{s}$ to the law $\mu_{t}$ . Upon freezing the variable $y$ , the methods of the previous section will produce a transport map $F_{s,t}(x,y)$ such that $F_{s,t}(\cdot,y)$ pushes forward the conditional distribution of $X+s^{1/2}S$ given $Y$ to the conditional distribution of $X+t^{1/2}S$ given $Y$ . Specifically, $F_{s,t}:M_{N}(\mathbb{C})_{sa}^{m}\times M_{N}(\mathbb{C})_{sa}^{n}\to M_{N}(\mathbb{C})_{sa}^{m}$ is the solution to the initial value problem

[TABLE]

Then $(F_{s,t}(X+t^{1/2}S,Y),Y)\sim(X+s^{1/2}S,Y)$ .

We seek to understand the large $t$ and large $N$ behavior of $F_{s,t}(x,y)$ as a function of $(x,y)$ rather than simply as a function of $x$ for a fixed $y$. To achieve this, we must understand the behavior of $V(x,y)$ and $D_{x}V(x,y)$ as functions of $(x,y)$. We will first import the results of [29, §6] regarding $D_{x}V(x,y)$ as a function of $x$, then we will extend them to handle the dependence on $y$.

The potential $V_{t}$ satisfies

[TABLE]

We express $V_{t}=R_{t}V$ , where $R_{t}$ is a semigroup acting on convex and semi-concave functions defined as follows. Let

[TABLE]

Then as suggested by Trotter’s formula, we want to express $R_{t}u=\lim_{n\to\infty}(P_{t/n}Q_{t/n})^{n}u$ , but for technical convenience we only apply this to dyadic rationals $t$ and values of $n$ that are powers of $2$ . The following is a direct application of [29, Theorems 6.1 and 6.17] to $V(\cdot,y)$ .

Theorem 7.2.

There exists a semigroup of nonlinear operators $R_{t}:\bigcup_{C>0}\mathcal{E}_{m+n}^{(N)}(0,C)\to\bigcup_{C>0}\mathcal{E}_{m+n}^{(N)}(0,C)$ with the following properties:

(1)

Change in Convexity: If $u(\cdot,y)\in\mathcal{E}_{m}^{(N)}(c,C)$, then $R_{t}u(\cdot,y)\in\mathcal{E}_{m}^{(N)}(c(1+ct)^{-1},C(1+Ct)^{-1})$.

(2)

Approximation by Iteration: For $\ell\in\mathbb{Z}$ and $t\in 2^{-\ell}\mathbb{N}_{0}$, denote $R_{t,\ell}u=(P_{2^{-\ell}}Q_{2^{-\ell}})^{2^{\ell}t}u$. Suppose $t\in\mathbb{Q}_{2}^{+}$ and $u\in\mathcal{E}_{m+n}^{(N)}(0,C)$.

(a)

If $2^{-\ell-1}C\leq 1$, then

[TABLE]

(b)

$\displaystyle\left\lVert D_{x}(R_{t,\ell}u)-D_{x}(R_{t}u)\right\rVert_{L^{\infty}}\leq[t/2+C(t/2)^{2}]C^{2}m^{1/2}(2\cdot 2^{-\ell/2}+2^{-3\ell/2}C)$.

(3)

Continuity in Time: Suppose $s\leq t\in\mathbb{R}_{+}$ and $u\in\mathcal{E}_{m+n}^{(N)}(0,C)$. Then

(a)

$R_{t}u\leq R_{s}u+\frac{m}{2}[\log(1+Ct)-\log(1+Cs)]$.

(b)

$R_{t}u\geq R_{s}u-\frac{1}{2}(t-s)(Cm+\left\lVert D_{x}u\right\rVert_{2}^{2})$.

(c)

If $C(t-s)\leq 1$, then $\left\lVert D_{x}(R_{t}u)-D_{x}(R_{s}u)\right\rVert_{2}\leq 5Cm^{1/2}2^{1/2}(t-s)^{1/2}+C(t-s)\left\lVert D_{x}u\right\rVert_{2}$.

(4)

Differential Equation: $R_{t}u(x)$ is continuous as a function of $(x,t)$ on $M_{N}(\mathbb{C})_{sa}^{m}\times[0,+\infty)$ and smooth on $M_{N}(\mathbb{C})_{sa}^{m}\times(0,+\infty)$, and it satisfies (7.3), and we have $P_{t}[\exp(-N^{2}u)]=\exp(-N^{2}R_{t}u)$.

Result (1) regarding convexity and semi-concavity only applies to $R_{t}u$ as a function of $x$ for a fixed $y$ . We now extend this result to control the dependence on $y$ , using the same techniques as in [29, Lemma 6.6]. As remarked in that paper, this type of analysis of $Q_{t}$ is standard in the PDE literature on viscosity solutions.

We use the following notation, as in Definition 2.1: Consider a function $u(x,y)$ on $M_{N}(\mathbb{C})_{sa}^{m}\times M_{N}(\mathbb{C})_{sa}^{n}$ . Let us write $Hu\geq cI_{m}\oplus c^{\prime}I_{n}$ to mean that

[TABLE]

and similarly let us write $Hu\leq CI_{m}\oplus C^{\prime}I_{n}$ to mean that

[TABLE]

Lemma 7.3.

Suppose that $u:M_{N}(\mathbb{C})_{sa}^{m}\times M_{N}(\mathbb{C})_{sa}^{n}\to\mathbb{R}$ and that

[TABLE]

Then

(1)

$cI_{m}\oplus c^{\prime}I_{n}\leq H(P_{t}u)\leq CI_{m}\oplus C^{\prime}I_{n}$.

(2)

$D(Q_{t}u)(x,y)=Du(x-tD_{x}(Q_{t}u)(x,y),y)$.

(3)

$c(1+ct)^{-1}I_{m}\oplus c^{\prime}I_{n}\leq H(Q_{t}u)\leq C(1+Ct)^{-1}I_{m}\oplus C^{\prime}I_{n}$.

(4)

$c(1+ct)^{-1}I_{m}\oplus c^{\prime}I_{n}\leq H(R_{t}u)\leq C(1+Ct)^{-1}I_{m}\oplus C^{\prime}I_{n}$.

Proof.

(1) This is left as an exercise.

(2) The proof is a modification of that of [29, Lemma 6.6], which proves an analogous result in the simpler case of functions of $x$ without the extra variable $y$ . Fix $x_{0}$ and $y_{0}$ . Because the function $u(z,y_{0})+\frac{1}{2t}\left\lVert z-x_{0}\right\rVert_{2}^{2}$ is uniformly convex with respect to $z$ , it has a unique minimizer $z_{0}$ . This minimizer must be a critical point with respect to the first variable, and hence

[TABLE]

that is,

[TABLE]

Let $p=D_{x}u(z_{0},y_{0})$ and $q=D_{y}u(z_{0},y_{0})$ , so that $Du(z_{0},y_{0})=(p,q)$ . Our assumption $cI_{m}\oplus c^{\prime}I_{n}\leq Hu\leq CI_{m}\oplus C^{\prime}I_{n}$ implies that

[TABLE]

where

[TABLE]

Note that $\underline{v}\leq u\leq\overline{v}$ implies $Q_{t}\underline{v}\leq Q_{t}u\leq Q_{t}\overline{v}$, since monotonicity of $Q_{t}$ is immediate from the definition. One can compute $Q_{t}\underline{v}$ and $Q_{t}\overline{v}$ directly as in Lemma 6.4 (2) and the proof of Lemma 6.6 in [29] and obtain

[TABLE]

where the last two lines follow from substituting $z_{0}=x_{0}-tp$ and from the fact that the infimum defining $Q_{t}u$ is achieved at $z_{0}$. The analogous formula for $Q_{t}\overline{v}(x,y)$ holds as well. The functions $Q_{t}\underline{v}$ and $Q_{t}\overline{v}$ thus provide second-order Taylor expansions from above and below for the function $Q_{t}u$ with respect to $(x,y)$ at the point $(x_{0},y_{0})$. Looking at the first-order terms in the expansions shows that $Q_{t}u$ is differentiable at $(x_{0},y_{0})$ with

[TABLE]

which proves (2).

(3) We examine the second-order terms of upper and lower Taylor expansions $Q_{t}\underline{v}$ and $Q_{t}\overline{v}$ and apply the claim (2) $\implies$ (1) from Lemma 2.2. This is the same argument as in the proof of [29, Proposition 2.13 (2)].

(4) Recall that if $A\leq Hu\leq B$ , then $A\leq H(P_{t}u)\leq B$ . Using this fact together with (3) iteratively, we see that if $t$ is a dyadic rational and $t=2^{-\ell}k$ , then

[TABLE]

In light of Theorem 7.2 (2), this will also hold in the limit as $\ell\to\infty$ , since for any two self-adjoint matrices $A$ and $B$ , the family of functions with $A\leq Hu\leq B$ is closed under pointwise limits. Similarly, using Theorem 7.2 (3), we extend this to all real $t\geq 0$ . ∎
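As a sanity check on claims (2) and (3), the following worked example evaluates $Q_{t}$ on a quadratic function, for which both claims hold with equality. It assumes, as in the proof of (2) above, that $Q_{t}u(x,y)$ is the infimum over $z$ of $u(z,y)+\frac{1}{2t}\lVert z-x\rVert_{2}^{2}$, and that $D$ denotes the gradient with respect to the normalized inner product; the particular $u$ with constants $c,c^{\prime}>0$ is an illustrative choice and does not appear in the original text.

```latex
\begin{align*}
u(x,y) &= \frac{c}{2} \lVert x \rVert_2^2 + \frac{c'}{2} \lVert y \rVert_2^2, \\
Q_t u(x,y) &= \inf_z \Bigl[ u(z,y) + \frac{1}{2t} \lVert z - x \rVert_2^2 \Bigr]
 = \frac{c}{2(1+ct)} \lVert x \rVert_2^2 + \frac{c'}{2} \lVert y \rVert_2^2,
 \qquad z_0 = \frac{x}{1+ct}, \\
D(Q_t u)(x,y) &= \Bigl( \frac{c\,x}{1+ct},\; c' y \Bigr)
 = Du\bigl( x - t D_x(Q_t u)(x,y),\, y \bigr), \\
H(Q_t u) &= c(1+ct)^{-1} I_m \oplus c' I_n.
\end{align*}
```

Here $x-tD_{x}(Q_{t}u)(x,y)=x/(1+ct)$ is exactly the minimizer $z_{0}$, so the gradient identity in (2) reduces to $Du(z_{0},y)=(cz_{0},c^{\prime}y)$. The Hessian in the $x$-direction improves from $c$ to $c(1+ct)^{-1}$ while the $y$-direction is unchanged, as (3) predicts.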

Remark 7.4.

The convexity conditions of Lemma 7.3 (4) can alternatively be deduced from [9, Theorem 4.3]. However, it is convenient for us to use Theorem 7.2 here because we want the dimension-independent time-continuity estimates of Theorem 7.2 (3) in the proof of Theorem 7.11 below.

7.3. Lipschitz Estimates for Conditional Transport

This subsection proves the technical estimate Lemma 7.6 on the Lipschitz seminorm of $F_{s,t}$ . This depends crucially on the convexity properties of $V_{t}(x,y)$ .

Lemma 7.5.

[TABLE]

Proof.

First, note that

[TABLE]

By Lemma 2.2, the first term on the right hand side of (7.4) can be estimated by

[TABLE]

To handle the second term on the right hand side of (7.4), define

[TABLE]

and recall that $\overline{V}_{t}$ is convex and $\underline{V}_{t}$ is concave and in particular

[TABLE]

Note that

[TABLE]

Therefore,

[TABLE]

Now we apply Lemma 2.3 to $\overline{V}_{t}$ with the matrix $A=\frac{C-c}{(1+Ct)(1+ct)}I_{m}\oplus(C-c)I_{n}$ and conclude that

[TABLE]

Combining this estimate for the second term of (7.4) with our earlier estimate for the first term completes the proof. ∎

Lemma 7.6.

We have

[TABLE]

and

[TABLE]

Proof.

Fix $t\geq 0$ and $(x,y)$ and $(x^{\prime},y^{\prime})$ in $M_{N}(\mathbb{C})_{sa}^{m}\times M_{N}(\mathbb{C})_{sa}^{n}$ and define

[TABLE]

Note that $\phi$ is locally Lipschitz, hence absolutely continuous. Also,

[TABLE]

where we have applied Lemma 7.5. It follows that whenever $\phi(s)>0$ ,

[TABLE]

On the other hand, since $\phi(s)\geq 0$ , any point where $\phi$ is zero and $\phi$ is differentiable must be a critical point, so when $\phi(s)=0$ the estimate is vacuously true. This inequality implies

[TABLE]

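where in the last line we have observed that $(1+cs)^{1/2}\geq(c/C)^{1/2}(1+Cs)^{1/2}\geq(c/C)(1+Cs)^{1/2}$; the first inequality holds because $(1+cs)-(c/C)(1+Cs)=1-c/C\geq 0$, and the second because $c/C\leq 1$. Hence for $s\geq t$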

[TABLE]

Now we substitute $\phi(s)=\left\lVert F_{s,t}(x,y)-F_{s,t}(x^{\prime},y^{\prime})\right\rVert_{2}$ and $\phi(t)=\left\lVert x-x^{\prime}\right\rVert_{2}$ and rearrange to obtain

[TABLE]

This proves the asserted estimates in the case where $s\geq t$ . The argument for the case $s\leq t$ is similar. Here we use the lower bound rather than the upper bound in Lemma 7.5 and get

[TABLE]

so that

[TABLE]

Now we take $s\leq t$ and obtain

[TABLE]

which yields the desired estimates. ∎

7.4. Transport in the Large $t$ Limit

We remind the reader here that we are still working in the finite-dimensional setting for a fixed value of $N$ which is suppressed in the notation. To understand the large $t$ limit of our transport maps, consider the renormalized law

[TABLE]

A brief computation shows that the corresponding potential is

[TABLE]

(here the potential is only well-defined up to an additive constant because the probability measure $\tilde{\mu}_{t}$ includes a normalizing constant $1/\tilde{Z}_{t}$ anyway, so we made a convenient choice of the additive constant). This potential satisfies the equation

[TABLE]

We remark that if $\tilde{\rho}_{t}=(1/Z_{t})e^{-N^{2}\tilde{V}_{t}}$ is the density at time $t$ and $r(x,y)=\text{const}e^{-\lVert(x,y)\rVert_{2}^{2}/2}$ is the Gaussian density, then

[TABLE]

In other words, $\tilde{\rho}_{t}$ evolves according to the diffusion semigroup with respect to Gaussian measure (compare equation (33) of [39]), while the heat equation represents diffusion with respect to Lebesgue measure.

The transport functions are renormalized as follows. Because $(F_{s,t}(x,y),y)$ pushes forward $\mu_{t}$ to $\mu_{s}$, we may compute that $(\tilde{F}_{s,t}(x,y),y)$ pushes forward $\tilde{\mu}_{t}$ to $\tilde{\mu}_{s}$, where

[TABLE]

Moreover, from the differential equation (7.2), we deduce that

[TABLE]

As $t\to\infty$ , the law $\tilde{\mu}_{t}$ converges to the law of $(S,Y)$ , which we denote $\tilde{\mu}_{\infty}$ . Thus, if we show that $\tilde{F}_{s,t}$ has a limit as $s\to+\infty$ or $t\to+\infty$ , we will be able to transport our given law $\mu=\tilde{\mu}_{0}$ of $(X,Y)$ to the law of $(S,Y)$ . As the first step, we deduce from Lemma 7.6 the following Lipschitz estimates on $\tilde{F}_{s,t}$ which are uniform in $s$ and $t$ . Note also that the coefficient of $\left\lVert y-y^{\prime}\right\rVert_{2}$ goes to zero as $s,t\to\infty$ .

Lemma 7.7.

We have

[TABLE]

and

[TABLE]

In particular,

[TABLE]

Proof.

For the first estimate, for the case where $s\geq t$ , direct substitution of (7.9) into (7.5) of Lemma 7.6 shows that

[TABLE]

The function $C+(1-C)e^{-s}$ is clearly monotone on $[0,+\infty)$ and achieves the values $1$ and $C$ at $0$ and $+\infty$ respectively, and hence is between $\min(1,C)$ and $\max(1,C)$. Hence,

[TABLE]

The case where $s\leq t$ follows by the same argument, where the bound this time is $\max(c,1/c)^{1/2}\leq\max(C,1/c)^{1/2}$ .

For the second estimate, we apply (7.6). Note in (7.6), in the case $s\leq t$ , we may use $(1+cs)^{1/2}\leq(1+Cs)^{1/2}$ and thus in both cases $s\geq t$ or $s\leq t$ ,

[TABLE]

This implies that

[TABLE]

where we have again applied $\min(1,C)e^{s}\leq 1+C(e^{s}-1)\leq\max(1,C)e^{s}$ .
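For the reader's convenience, the elementary inequality just used can be verified in one line: after factoring out $e^{s}$, the remaining factor is a convex combination of $1$ and $C$ with weights $e^{-s}$ and $1-e^{-s}$. This is the same function $C+(1-C)e^{-s}$ considered in the first part of the proof.

```latex
\begin{align*}
1 + C(e^{s} - 1) &= e^{s} \bigl[ e^{-s} \cdot 1 + (1 - e^{-s}) \cdot C \bigr], \\
\min(1, C) &\leq e^{-s} \cdot 1 + (1 - e^{-s}) \cdot C \leq \max(1, C)
\qquad (s \geq 0).
\end{align*}
```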

For the last estimate (7.13), observe that

[TABLE]

Lemma 7.8.

Let $\pi_{1}$ denote the function $\pi_{1}(x,y)=x$ . Then

[TABLE]

and

[TABLE]

Proof.

Let $U_{s}(x,y)=\tilde{V}_{s}(x,y)-(1/2)\left\lVert x\right\rVert_{2}^{2}$ . Then (7.10) says that

[TABLE]

Moreover, we have

[TABLE]

We can bound $H_{x}U_{s}$ above and below by subtracting $1$ from both sides, which after some computation reduces to

[TABLE]

Therefore, we have $-L\leq H_{x}U_{s}\leq L$ where

[TABLE]

We claim that $L\leq L^{\prime}:=(\max(C,1/c)-1)e^{-s}$ . If the first term $(1-c)/(1+c(e^{s}-1))$ is negative, then it is $\leq L^{\prime}$ automatically, but if it is positive, then $c\leq 1$ and hence

[TABLE]

Similarly, if $(C-1)/(1+C(e^{s}-1))$ is negative, there is nothing to prove, but otherwise $C\geq 1$ , and hence

[TABLE]

But $-L^{\prime}\leq H_{x}U_{s}\leq L^{\prime}$ implies that $D_{x}U_{s}$ is $L^{\prime}$ -Lipschitz in $x$ . Therefore,

[TABLE]

Applying (7.11) in the case where $s\geq t$ , we get

[TABLE]

Hence,

[TABLE]

which proves the desired estimate (7.14).

To check the second estimate (7.15), first observe

[TABLE]

Moreover, $\lVert\tilde{F}_{s,t}-\pi_{1}\rVert_{\operatorname{Lip},dy}=\lVert\tilde{F}_{s,t}\rVert_{\operatorname{Lip},dy}$ . Therefore, using (7.12) and (7.14),

[TABLE]

Proposition 7.9.

The limits $\tilde{F}_{s,\infty}:=\lim_{t\to\infty}\tilde{F}_{s,t}$ and $\tilde{F}_{\infty,t}=\lim_{s\to\infty}\tilde{F}_{s,t}$ exist for $s,t\geq 0$ . More precisely, let $(X,Y)$ and $(\tilde{X}_{t},Y)$ be a pair of random variables with the laws $\mu$ and $\tilde{\mu}_{t}$ as above. Then

[TABLE]

and

[TABLE]

The estimates of Lemmas 7.7 and 7.8 extend to the cases where $s$ or $t$ is infinite, where we define $\tilde{F}_{\infty,\infty}(x,y)=x$ . Moreover, if $(\tilde{X}_{t},Y)\sim\tilde{\mu}_{t}$ , then we have the relation $(\tilde{F}_{s,t}(\tilde{X}_{t},Y),Y)\sim(\tilde{X}_{s},Y)$ when $s,t\in[0,\infty]$ .

Remark 7.10.

We have written the explicit form of the estimates here in order to emphasize that the bounds are dimension-independent; they only depend on the parameters $m$ , $n$ , $c$ , $C$ , $\lVert E(X)\rVert_{2}$ , $\lVert E(Y)\rVert_{2}$ , $\operatorname{Var}(X)$ , and $\operatorname{Var}(Y)$ . The estimates also become sharper when $c$ and $C$ are close to $1$ , which would include the situation where $V(x,y)$ is a perturbation of the quadratic potential $(1/2)[\left\lVert x\right\rVert_{2}^{2}+\left\lVert y\right\rVert_{2}^{2}]$ . This perturbative setting was studied first in the literature, for instance in [21] and [23]; see [29, §8.3] for further discussion.

Proof.

We first consider the case where $s$ is fixed and $t\to+\infty$ . Note that by (7.11),

[TABLE]

By Lemma 7.8,

[TABLE]

where $L=(\max(C,1/c)^{3}-1)\max(C,1/c)^{1/2}$. Then we apply Lemma 2.5 to $G(x,y)=\tilde{F}_{t,t^{\prime}}(x,y)-x$ with the random variable $(\tilde{X}_{t^{\prime}},Y)$. Note that $(\tilde{X}_{t^{\prime}},Y)$ has mean $(e^{-t^{\prime}/2}E(X),E(Y))$ and variance $e^{-t^{\prime}}\operatorname{Var}(X)+(1-e^{-t^{\prime}})m+\operatorname{Var}(Y)$. Moreover,

[TABLE]

Thus, by Lemma 2.5,

[TABLE]

Plugging this into (7.18), we see that $\tilde{F}_{s,t}$ is Cauchy in $t$ as $t\to+\infty$ . Moreover, we obtain the estimate (7.16) by taking $t^{\prime}\to\infty$ in (7.19) and multiplying by $\left\lVert\tilde{F}_{s,t}\right\rVert_{\operatorname{Lip},dx}\leq\max(c,1/c)^{1/2}$ .

Now let us fix $t$ and consider when $s^{\prime}$ and $s$ approach $\infty$ . The argument for this case is similar but antisymmetrical. We estimate

[TABLE]

where the last line follows from (7.14). Then by applying Lemma 2.5 to the function $\tilde{F}_{s,t}(x,y)$ and the random variable $(\tilde{X}_{t},Y)$ , together with (7.13), we obtain

[TABLE]

This produces an estimate on $\left\lVert\tilde{F}_{s^{\prime},t}-\tilde{F}_{s,t}\right\rVert_{2}$ which shows that $\tilde{F}_{s,t}$ is Cauchy as $s\to\infty$, so that $\tilde{F}_{\infty,t}$ is well-defined. The explicit bound on the rate of convergence follows by fixing $s$ and $t$, combining the above estimates, and taking $s^{\prime}\to\infty$.

Finally, since we have established convergence of $\tilde{F}_{s,t}$ as $s$ or $t$ approaches $\infty$ , a routine argument with limits will extend the estimates of Lemmas 7.7 and 7.8, and the transport relations, to the cases where $s$ or $t$ is $+\infty$ . ∎
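The following worked Gaussian case illustrates Proposition 7.9. It is a sketch under assumptions not stated in the proposition: we suppress the variable $y$, take the illustrative potential $V(x)=(c/2)\lVert x\rVert_{2}^{2}$ with $c>0$, assume the normalization $d\mu=(1/Z)e^{-N^{2}V}\,dx$, and write $v(t)$ (a name introduced here) for the variance parameter of $\tilde{X}_{t}=e^{-t/2}X+(1-e^{-t})^{1/2}S$.

```latex
\begin{align*}
v(t) &:= e^{-t} c^{-1} + (1 - e^{-t}) = \frac{1 + c(e^{t} - 1)}{c\,e^{t}},
\qquad
\tilde{V}_t(x) = \frac{\lVert x \rVert_2^2}{2 v(t)} + \text{const}, \\
\tilde{F}_{s,t}(x) &= \Bigl( \frac{v(s)}{v(t)} \Bigr)^{1/2} x,
\qquad
\partial_s \tilde{F}_{s,t}(x) = \frac{1}{2} \Bigl( \frac{1}{v(s)} - 1 \Bigr) \tilde{F}_{s,t}(x)
 = \frac{1}{2} \bigl( D_x \tilde{V}_s - \mathrm{id} \bigr)(\tilde{F}_{s,t}(x)), \\
\tilde{F}_{s,\infty}(x) &= v(s)^{1/2} x,
\qquad
\tilde{F}_{\infty,t}(x) = v(t)^{-1/2} x,
\qquad
\tilde{F}_{\infty,0}(x) = c^{1/2} x.
\end{align*}
```

The middle identity uses $v^{\prime}(s)=1-v(s)$. Since $v$ is monotone with values between $1$ and $1/c$, the Lipschitz constant $(v(s)/v(t))^{1/2}$ is at most $\max(c,1/c)^{1/2}$ uniformly in $s$ and $t$, and $v(s)-v(t)=(c^{-1}-1)(e^{-s}-e^{-t})$ shows that $\tilde{F}_{s,t}-\pi_{1}$ decays exponentially as $s,t\to\infty$. The map $\tilde{F}_{\infty,0}$ sends the law of $X=c^{-1/2}S$ to the law of $S$.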

7.5. Transport in the Large $N$ Limit

If $V^{(N)}\in\mathcal{E}_{m+n}^{(N)}(c,C)$ and $\{DV^{(N)}\}$ is asymptotically approximable by trace polynomials, then we must show that the associated sequence of transport maps is asymptotically approximable by trace polynomials, and hence conclude that they define transport for the non-commutative random variables in the large $N$ limit.

Theorem 7.11.

For $N\in\mathbb{N}$ , let $V^{(N)}(x,y)$ be a potential on $M_{N}(\mathbb{C})_{sa}^{m+n}$ satisfying Assumption 5.1 for some $0<c\leq C$ , and let $\mu^{(N)}$ be the corresponding probability measures on $M_{N}(\mathbb{C})_{sa}^{m+n}$ . Let $(X^{(N)},Y^{(N)})$ be a random variable given by $\mu^{(N)}$ and let $S^{(N)}$ be an independent GUE $m$ -tuple. Let

[TABLE]

and let $\tilde{V}_{t}^{(N)}(x,y)=R_{e^{t}-1}^{(N)}V^{(N)}(e^{t/2}x,y)$ be the corresponding potential. Similarly, let $\mu_{\infty}^{(N)}$ be the law of $(S^{(N)},Y^{(N)})$ . For $s,t\in[0,\infty)$ , let $\tilde{F}_{s,t}^{(N)}:M_{N}(\mathbb{C})_{sa}^{m+n}\to M_{N}(\mathbb{C})_{sa}^{m}$ be the solution of the initial value problem

[TABLE]

Then

(1)

The family $\tilde{F}_{s,t}^{(N)}$ extends continuously to $(s,t)\in[0,\infty]^{2}$.

(2)

$\tilde{F}_{s,t}^{(N)}\circ\tilde{F}_{t,u}^{(N)}=\tilde{F}_{s,u}^{(N)}$.

(3)

$(\tilde{F}_{s,t}^{(N)}(\tilde{X}_{t}^{(N)},Y^{(N)}),Y^{(N)})\sim(\tilde{X}_{s}^{(N)},Y^{(N)})$.

(4)

For $s,t\in[0,\infty]$, the sequence $\{\tilde{F}_{s,t}^{(N)}\}_{N\in\mathbb{N}}$ is $(C/c)\max(C,1/c)^{1/2}$-Lipschitz for all $s$, $t$, and $N$, and it is asymptotically approximable by trace polynomials as $N\to\infty$.

Proof.

Recall in §7.4 we defined $\tilde{F}_{s,t}^{(N)}$ by renormalizing $F_{s,t}^{(N)}$ . However, that definition is equivalent to the definition of $\tilde{F}_{s,t}$ given in this theorem because both definitions produce a solution to the ODE (7.10). Of course, global uniqueness of the solution holds because the vector field $D_{x}\tilde{V}_{t}^{(N)}(x,y)-x$ is uniformly Lipschitz in $(x,y)$ on any compact time interval (as we discuss in more detail below).

So claims (1), (2), and (3) follow immediately from Proposition 7.9. The estimate for the Lipschitz norm of $\tilde{F}_{s,t}^{(N)}$ was shown in (7.13).

We finish by showing asymptotic approximability using the results of §4.1. Let $V_{t}^{(N)}=R_{t}^{(N)}V^{(N)}$. By Theorem 7.2 (3c), $D_{x}V_{t}^{(N)}$ is uniformly continuous in $t$ on $[0,\infty)$. Since $D_{x}\tilde{V}_{t}^{(N)}(x,y)=e^{t/2}D_{x}V_{e^{t}-1}^{(N)}(e^{t/2}x,y)$, it follows that $D_{x}\tilde{V}_{t}^{(N)}$ is uniformly continuous in $t$ on $[0,T]$ for every $T>0$, with modulus of continuity independent of $N$, and recall it is also uniformly Lipschitz in $(x,y)$, since $0\leq H\tilde{V}_{t}^{(N)}\leq\max(C,Ce^{t}/(1+C(e^{t}-1)))$.

Consequently, $(1/2)(D_{x}\tilde{V}_{t}^{(N)}(x,y)-x)$ is uniformly continuous in $t$ on $[0,T]$ and uniformly Lipschitz in $(x,y)$ . Also, we showed that $D_{x}V_{t}^{(N)}$ is asymptotically approximable by trace polynomials in the proof of Theorem 6.6, and hence so is $D_{x}\tilde{V}_{t}^{(N)}$ . Thus, $(1/2)(D_{x}\tilde{V}_{t}^{(N)}(x,y)-x)$ satisfies Assumption 4.6, so we may apply Proposition 4.7 to deduce that $\tilde{F}_{s,t}^{(N)}$ is asymptotically approximable by trace polynomials for $s,t\in[0,\infty)$ . This property extends to the case where $s$ or $t$ is infinite using Lemma 3.13 and Proposition 7.9. ∎

Remark 7.12.

Rather than citing the proof of Theorem 6.6, one could also argue that $D_{x}V_{t}^{(N)}$ is asymptotically approximable directly from the construction of the semigroup $R_{t}^{(N)}$ using the same reasoning as [29, Proposition 6.8]. Moreover, this method would also show that $D(R_{t}^{(N)}V^{(N)})$ is asymptotically approximable by trace polynomials provided we can prove analogues of Theorem 7.2 (2) and (3) for $D(R_{t}^{(N)}V^{(N)})$ rather than only $D_{x}(R_{t}^{(N)}V^{(N)})$. However, all this is unnecessary work for our present purpose.

Theorem 7.13.

With all the notation of the previous theorem, let $(X,Y)$ be a non-commutative random variable distributed according to the limiting free Gibbs law $\lambda$ , let $S$ be a freely independent free semicircular $m$ -tuple, and let $\tilde{X}_{t}=e^{-t/2}X+(1-e^{-t})^{1/2}S$ . Define $\tilde{F}_{s,t}$ by $\tilde{F}_{s,t}^{(N)}\rightsquigarrow\tilde{F}_{s,t}$ . For $s,t,u\in[0,+\infty]$ , we have

(1)

$\tilde{F}_{s,t}$ is $(C/c)\max(C,1/c)^{1/2}$-Lipschitz with respect to $\lVert\cdot\rVert_{2}$.

(2)

$\tilde{F}_{s,t}\circ\tilde{F}_{t,u}=\tilde{F}_{s,u}$.

(3)

$(\tilde{F}_{s,t}(\tilde{X}_{t},Y),Y)\sim(\tilde{X}_{s},Y)$ in non-commutative law.

(4)

We have

[TABLE]

where $\Theta$ is the universal constant from Proposition 3.17.

In particular, $\mathrm{W}^{*}(X,Y)$ is isomorphic to $\mathrm{W}^{*}(S,Y)$ , which is the free product $\mathrm{W}^{*}(S)*\mathrm{W}^{*}(Y)$ .

Proof.

We know that there exists $\tilde{F}_{s,t}$ such that $\tilde{F}_{s,t}^{(N)}\rightsquigarrow\tilde{F}_{s,t}$ because of Lemma 3.6. Then (1) and (2) follow from the corresponding properties of $\tilde{F}_{s,t}^{(N)}$ by straightforward limit arguments.

As remarked in the last proof, $D\tilde{V}_{t}^{(N)}$ is asymptotically approximable by trace polynomials. We also know $\tilde{V}_{t}^{(N)}$ is uniformly convex and semi-concave, and thus by Theorem 5.2, the non-commutative law of $(\tilde{X}_{t}^{(N)},Y^{(N)})$ converges in probability to some non-commutative law. Of course, the limiting non-commutative law must be the non-commutative law of $(\tilde{X}_{t},Y)$ because the joint non-commutative law of $(X^{(N)},Y^{(N)},S^{(N)})$ converges in probability to that of $(X,Y,S)$ (as in the proof of Theorem 6.6).

With this relation between the laws of $(\tilde{X}_{t}^{(N)},Y^{(N)})$ and $(\tilde{X}_{t},Y)$ in hand, we can prove (3) by taking the large $N$ limit using Corollary 5.3. Indeed, if $f\in\overline{\operatorname{TrP}}_{m+n}^{1}$ is $\lVert\cdot\rVert_{2}$ -uniformly continuous, then $f(\tilde{F}_{s,t}^{(N)}(x,y),y)$ is also $\lVert\cdot\rVert_{2}$ -uniformly continuous and asymptotically approximable by trace polynomials by Lemma 3.12. Thus, applying Corollary 5.3 to this function and the function $1$ , we get

[TABLE]

Hence, $\tau\left(f(\tilde{F}_{s,t}(\tilde{X}_{t},Y),Y)\right)=\tau(f(\tilde{X}_{s},Y))$ for all $f\in\overline{\operatorname{TrP}}_{m+n}^{1}$ that are uniformly continuous in $\lVert\cdot\rVert_{2}$. But by Proposition 3.14 such functions $f$ can realize every element in the $\mathrm{W}^{*}$-algebra generated by $(\tilde{X}_{s},Y)$, and in particular all the non-commutative polynomials in $(\tilde{X}_{s},Y)$. Hence, $(\tilde{F}_{s,t}(\tilde{X}_{t},Y),Y)\sim(\tilde{X}_{s},Y)$ in non-commutative law as desired.

(4) Note that

[TABLE]

but $(\tilde{F}_{s^{\prime},t}(\tilde{X}_{t},Y),Y)\sim(\tilde{X}_{s^{\prime}},Y)$ in non-commutative law. Hence, it suffices to prove the desired estimate for $\tilde{F}_{s,s^{\prime}}(\tilde{X}_{s^{\prime}},Y)-\tilde{X}_{s^{\prime}}$ rather than $\tilde{F}_{s,t}(\tilde{X}_{t},Y)-\tilde{F}_{s^{\prime},t}(\tilde{X}_{t},Y)$ . Now $(\tilde{X}_{s^{\prime}},Y)$ arises as the large $N$ limit of the matrix models given by potential $\tilde{V}_{s^{\prime}}^{(N)}$ . By Lemma 7.3 (4), we have $HV_{t}^{(N)}\geq c(1+ct)^{-1}I_{m}\oplus cI_{n}$ , so that

[TABLE]

By Remark 5.8, there exists a sequence of random matrix models for $(\tilde{X}_{s^{\prime}},Y)$ given by uniformly convex potentials which are also unitarily invariant (even if this is not true of our original model), with the same lower bound $1/\max(C,1/c)$ for the Hessian of the potential. Therefore, by Proposition 3.17,

[TABLE]

We finish by substituting the estimate

[TABLE]

which follows from (7.15) and Lemma 3.11 (the latter lemma is needed since the original statement of (7.15) is for the finite-dimensional setting for a fixed $N$ ).

The last claim regarding $\mathrm{W}^{*}$ -algebras follows from (3) by examining the case with $s=0$ and $t=\infty$ or vice versa. ∎

8. Applications

We show that Assumption 5.1 is preserved under independent joins, marginals, convolution, and linear changes of variables. We conclude that for the convex free Gibbs laws considered here, $\chi^{*}$ satisfies additivity under conditioning. Moreover, by iterating our conditional transport results, we obtain “lower-triangular” transport maps from a convex free Gibbs law to the law of a free semicircular family, which also satisfy the entropy-cost inequality relative to the semicircular law, analogous to the triangular transport achieved in the classical case by [6, Corollary 3.10].

8.1. Operations on Convex Gibbs Laws

Recall that Assumption 5.1 for a sequence $\{V^{(N)}\}$ of potentials $M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ states that $c\leq HV^{(N)}\leq C$ for some constants $c$ and $C$, the sequence $\{DV^{(N)}\}$ is asymptotically approximable by trace polynomials, and $\int x_{j}\,d\mu^{(N)}(x)$ is a scalar matrix for each $j$, where $\mu^{(N)}$ is the measure associated to $V^{(N)}$.

Proposition 8.1.

Suppose that $V_{1}^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ and $V_{2}^{(N)}:M_{N}(\mathbb{C})_{sa}^{n}\to\mathbb{R}$ satisfy Assumption 5.1 for some $0<c\leq C$ . Then $V^{(N)}(x,y):=V_{1}^{(N)}(x)+V_{2}^{(N)}(y)$ also satisfies Assumption 5.1 for the same $c$ and $C$ .

Moreover, let $\mu_{1}^{(N)}$ , $\mu_{2}^{(N)}$ , and $\mu^{(N)}$ be the measures associated to $V_{1}^{(N)}$ , $V_{2}^{(N)}$ , and $V^{(N)}$ respectively, and let $\lambda_{1}$ , $\lambda_{2}$ , and $\lambda$ be the respective limiting free Gibbs laws given by Theorem 5.2. Then $\mu^{(N)}$ is the independent join of $\mu_{1}^{(N)}$ and $\mu_{2}^{(N)}$ and $\lambda$ is the freely independent join of $\lambda_{1}$ and $\lambda_{2}$ .

Proof.

The claim $c\leq HV^{(N)}\leq C$ follows because $HV^{(N)}(x,y)=HV_{1}^{(N)}(x)\oplus HV_{2}^{(N)}(y)$ . The claim about asymptotic approximation by trace polynomials follows because $DV^{(N)}(x,y)=(DV_{1}^{(N)}(x),DV_{2}^{(N)}(y))$ and each component is asymptotically approximable by trace polynomials.

The probability density for $\mu^{(N)}$ is the tensor product of the probability densities for $\mu_{1}^{(N)}$ and $\mu_{2}^{(N)}$ and hence $\mu^{(N)}$ is the independent join of these two marginal laws. It follows that $\int x_{j}\,d\mu^{(N)}(x,y)$ and $\int y_{j}\,d\mu^{(N)}(x,y)$ are scalar matrices, hence Assumption 5.1 holds for $V^{(N)}$.

Let $(X^{(N)},Y^{(N)})\sim\mu^{(N)}$ be random variables and let $(X,Y)\sim\lambda$ be non-commutative random variables. Then by Theorem 6.6,

[TABLE]

It was shown in [53, Proposition 5.18(c)] that $\Phi^{*}(X,Y)=\Phi^{*}(X)+\Phi^{*}(Y)$ implies that $X$ and $Y$ are freely independent. ∎

Proposition 8.2.

Suppose that $V^{(N)}:M_{N}(\mathbb{C})_{sa}^{m+n}\to\mathbb{R}$ satisfies Assumption 5.1. Let $\mu^{(N)}$ be the corresponding law, let $(X^{(N)},Y^{(N)})\sim\mu^{(N)}$ and let $\mu_{1}^{(N)}$ and $\mu_{2}^{(N)}$ be the laws of $X^{(N)}$ and $Y^{(N)}$. Then $\mu_{1}^{(N)}$ and $\mu_{2}^{(N)}$ are given by potentials $W_{1}^{(N)}$ and $W_{2}^{(N)}$ that also satisfy Assumption 5.1 for the same values of $c$ and $C$.

Proof.

By symmetry, it suffices to prove the claims for $\mu_{2}^{(N)}$ . First, it is immediate that the mean of $y_{j}$ under $\mu_{2}^{(N)}$ is a scalar, since it is $E[Y_{j}^{(N)}]$ . Moreover, if we define

[TABLE]

then (as in the proof of Theorem 6.6) we may compute $DV_{2}^{(N)}$ by differentiating under the integral and obtain

[TABLE]

It follows by Theorem 5.9 that $\{DV_{2}^{(N)}\}$ is asymptotically approximable by trace polynomials.

Finally, the fact that $c\leq HV_{2}^{(N)}\leq C$ follows from [9, Theorem 4.3], or alternatively by the following reasoning. Let $\mu_{t}^{(N)}$ be the law of $(e^{-t/2}X^{(N)}+(1-e^{-t})^{1/2}S^{(N)},Y^{(N)})$ , where $S^{(N)}$ is an independent GUE tuple. The corresponding potential $\tilde{V}_{t}^{(N)}$ is given by (7.8) and it satisfies

[TABLE]

by direct substitution of (7.8) into Lemma 7.3 (4) and hence

[TABLE]

Now as $t\to\infty$ , the law $\tilde{\mu}_{t}$ converges to the law $\tilde{\mu}_{\infty}$ of $(S^{(N)},Y^{(N)})$ . By applying Lemma 2.7, $\tilde{\mu}_{\infty}$ is given by some potential $W_{2}^{(N)}(x,y)$ satisfying

[TABLE]

However, we know that $W_{2}^{(N)}(x,y)=(1/2)\lVert x\rVert_{2}^{2}+V_{2}^{(N)}(y)+\text{constant}$ because the potential corresponding to a law is unique up to an additive constant. This implies that $c\leq HV_{2}^{(N)}\leq C$ as desired. ∎

Proposition 8.3.

Let $V^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ satisfy Assumption 5.1 for some $0<c\leq C$ , and let $X^{(N)}$ be the corresponding random variable. Let $A$ be an invertible $m\times m$ matrix with real entries and let $A^{(N)}$ denote the linear transformation $M_{N}(\mathbb{C})_{sa}^{m}\to M_{N}(\mathbb{C})_{sa}^{m}$ given by

[TABLE]

Then $\widehat{V}^{(N)}=V^{(N)}\circ(A^{-1})^{(N)}$ is the potential corresponding to $A^{(N)}X^{(N)}$, and $\widehat{V}^{(N)}$ satisfies Assumption 5.1 with constants $c/\lVert A\rVert^{2}$ and $C\lVert A^{-1}\rVert^{2}$.

Proof.

The fact that $\widehat{V}^{(N)}$ is the potential corresponding to $A^{(N)}X^{(N)}$ follows from change of variables. Now it is immediate that the expectation of $(A^{(N)}X^{(N)})_{i}$ is a scalar multiple of identity for each $i$. Next, by the chain rule

[TABLE]

and from this it follows that $\{D\widehat{V}^{(N)}\}$ is asymptotically approximable by trace polynomials. Similarly, by the chain rule,

[TABLE]

The maximum and minimum singular values of $(A^{-1})^{(N)}$ are the same as those of $A^{-1}$, which are $\lVert A^{-1}\rVert$ and $1/\lVert A\rVert$ respectively. Writing $B=(A^{-1})^{(N)}$, the Hessian of $V^{(N)}\circ B$ is $B^{*}(HV^{(N)}\circ B)B$, and $c\,B^{*}B\leq B^{*}(HV^{(N)}\circ B)B\leq C\,B^{*}B$. Since the eigenvalues of $B^{*}B$ are the squares of the singular values of $B$, it follows that $c/\lVert A\rVert^{2}\leq H\widehat{V}^{(N)}\leq C\lVert A^{-1}\rVert^{2}$. ∎

Proposition 8.4.

Let $V_{1}^{(N)}$ and $V_{2}^{(N)}$ be two potentials $M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ satisfying Assumption 5.1 with constants $c$ and $C$. Let $X^{(N)}$ and $Y^{(N)}$ be the corresponding random tuples of matrices. Then the law of $X^{(N)}+Y^{(N)}$ is given by another potential $\widehat{V}^{(N)}$ satisfying Assumption 5.1 with constants $c/2$ and $C/2$. Moreover, the free Gibbs state corresponding to $\{\widehat{V}^{(N)}\}$ is the free convolution of those corresponding to $\{V_{1}^{(N)}\}$ and $\{V_{2}^{(N)}\}$.

Proof.

Let $V^{(N)}(x,y)=V_{1}^{(N)}(x)+V_{2}^{(N)}(y)$, which satisfies Assumption 5.1 (with the same constants) by Proposition 8.1. Now let $A$ be the $2m\times 2m$ matrix

[TABLE]

Since $A/\sqrt{2}$ is an isometry, we have $\lVert A\rVert=\sqrt{2}$ and $\lVert A^{-1}\rVert=1/\sqrt{2}$, so precomposing a potential with $(A^{-1})^{(N)}$ multiplies its Hessian bounds by $\lVert A^{-1}\rVert^{2}=1/\lVert A\rVert^{2}=1/2$. Therefore, arguing as in Proposition 8.3, the law of $(X^{(N)}+Y^{(N)},-X^{(N)}+Y^{(N)})$ is given by a potential satisfying Assumption 5.1 with constants $c/2$ and $C/2$. Then by Proposition 8.2, the law of $X^{(N)}+Y^{(N)}$ is given by such a potential with the same constants $c/2$ and $C/2$. (As a check, in the Gaussian case $V_{1}^{(N)}=V_{2}^{(N)}=(c/2)\lVert x\rVert_{2}^{2}$, the variance of the sum is doubled, which corresponds to the Hessian $c/2$.)

We showed in Proposition 8.1 that the large $N$ limit of the law of $(X^{(N)},Y^{(N)})$ is given by the freely independent join of the corresponding marginals. Hence, the large $N$ limit of the law of $X^{(N)}+Y^{(N)}$ is given by the free convolution. ∎
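As a sanity check on how the Hessian bounds transform under the change of variables $(x,y)\mapsto(x+y,-x+y)$, the following sketch treats the simplest classical analogue: $N=1$, $m=1$, and a quadratic potential whose Hessian is the constant diagonal matrix $\operatorname{diag}(h_{1},h_{2})$. This setup and the function names are illustrative assumptions, not objects from the paper. The code conjugates the Hessian by $A^{-1}$ and computes the eigenvalues in closed form; they come out as exactly half of $h_{1}$ and $h_{2}$.

```python
import math


def hessian_after_change_of_variables(h1, h2):
    """Return B^T H B for H = diag(h1, h2) and B = A^{-1}, where A = [[1, 1], [-1, 1]]."""
    # A A^T = 2 I, hence A^{-1} = A^T / 2.
    b = [[0.5, -0.5], [0.5, 0.5]]
    h = [[h1, 0.0], [0.0, h2]]
    # First form H B, then multiply by B^T on the left.
    hb = [[sum(h[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return [[sum(b[k][i] * hb[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def symmetric_eigenvalues(mat):
    """Eigenvalues (ascending) of a symmetric 2x2 matrix, via the closed-form formula."""
    mean = 0.5 * (mat[0][0] + mat[1][1])
    radius = math.hypot(0.5 * (mat[0][0] - mat[1][1]), mat[0][1])
    return mean - radius, mean + radius
```

For instance, `symmetric_eigenvalues(hessian_after_change_of_variables(0.5, 3.0))` gives the pair `0.25` and `1.5`, matching the elementary fact that the sum of two independent standard Gaussians has variance $2$ and hence potential $x^{2}/4$.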

As a consequence, we have additivity of entropy under conditioning.

Corollary 8.5.

Let $V^{(N)}(x,y)$ be a potential satisfying Assumption 5.1 as in the setup of Theorem 6.6. Let $(X,Y)$ be a tuple of non-commutative random variables distributed according to the limiting free Gibbs law associated to $V^{(N)}$ . Then

[TABLE]

Proof.

From standard classical results, we have

[TABLE]

Dividing by $N^{2}$ and adding $\frac{1}{2}(m+n)\log N$ to both sides, we obtain the normalized version

[TABLE]

By Theorem 6.6, we obtain the desired relation for $\chi^{*}$ in the limit as $N\to\infty$ . More precisely, we apply the theorem as stated to $h^{(N)}(X^{(N)}|Y^{(N)})$ . Meanwhile, for $h^{(N)}(X^{(N)},Y^{(N)})$ and $h^{(N)}(Y^{(N)})$ we apply the special case of the theorem where we condition on zero variables. ∎
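The classical additivity used at the start of this proof can be checked by hand in the simplest case. The sketch below assumes a centered bivariate Gaussian $(X,Y)$ on $\mathbb{R}^{2}$ (not the matrix models of the paper) and uses the standard closed-form Gaussian entropies; the function name and parameters are hypothetical.

```python
import math


def gaussian_entropies(var_x, var_y, cov_xy):
    """Return (h(X,Y), h(X|Y), h(Y)) for a centered bivariate Gaussian."""
    det = var_x * var_y - cov_xy ** 2
    if var_y <= 0.0 or det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    two_pi_e = 2.0 * math.pi * math.e
    joint = math.log(two_pi_e) + 0.5 * math.log(det)
    marginal_y = 0.5 * math.log(two_pi_e * var_y)
    # X given Y = y is Gaussian with variance var_x - cov_xy^2 / var_y for every y.
    conditional = 0.5 * math.log(two_pi_e * (var_x - cov_xy ** 2 / var_y))
    return joint, conditional, marginal_y
```

Here the identity $h(X,Y)=h(X|Y)+h(Y)$ reduces to the factorization of the determinant of the covariance matrix as the variance of $Y$ times the conditional variance of $X$.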

8.2. Entropy and Fisher Information Relative to Gaussian

As background for our discussion of the entropy-cost inequality in §8.3, we review the entropy of one probability measure relative to another. If $\mu$ and $\nu$ are measures on $\mathbb{R}^{m}$ with $\mu$ absolutely continuous with respect to $\nu$ , then the entropy of $\mu$ relative to $\nu$ is

[TABLE]

whenever the integral is well-defined. The standard entropy $h(\mu)=-\int\rho\log\rho$ , where $\rho$ is the density of $\mu$ , corresponds to the choice of Lebesgue measure for $\nu$ .

Remark 8.6.

The reader should be careful to distinguish between the relative entropy $h(\mu|\nu)$ and the conditional entropy $h(X|Y)$ . The first changes the ambient measure while the second describes conditioning on $Y$ .

Remark 8.7.

If $\mu$ and $\nu$ are both probability measures, then $h(\mu|\nu)\leq 0$ . For this reason, many authors choose to change the sign. We will keep the sign convention given above to be consistent with our convention for $h(\mu)$ relative to Lebesgue measure, but we will write absolute value signs around relative entropy when it is natural to use the positive version.
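To illustrate the sign convention, the following sketch evaluates the relative entropy of one centered Gaussian on $\mathbb{R}$ relative to another. It assumes the standard formula $h(\mu|\nu)=-\int\log(d\mu/d\nu)\,d\mu$, which agrees with the convention $h(\mu)=-\int\rho\log\rho$ for Lebesgue measure; the one-dimensional setting, the function names, and the integration parameters are illustrative choices. The numerical value is compared with the closed form $-\frac{1}{2}(r-1-\log r)$, where $r$ is the ratio of the variances.

```python
import math


def relative_entropy_gaussians(var_mu, var_nu, half_width=20.0, steps=20000):
    """Trapezoid-rule value of h(mu|nu) = -int log(dmu/dnu) dmu for centered 1-D Gaussians."""
    dx = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * dx
        density_mu = math.exp(-x * x / (2.0 * var_mu)) / math.sqrt(2.0 * math.pi * var_mu)
        # log(dmu/dnu) is evaluated analytically so that tiny densities do not underflow.
        log_ratio = 0.5 * math.log(var_nu / var_mu) + x * x * (
            1.0 / (2.0 * var_nu) - 1.0 / (2.0 * var_mu)
        )
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * density_mu * log_ratio * dx
    return -total


def relative_entropy_closed_form(var_mu, var_nu):
    """Closed form -(r - 1 - log r)/2 with r = var_mu / var_nu; never positive."""
    ratio = var_mu / var_nu
    return -0.5 * (ratio - 1.0 - math.log(ratio))
```

Both functions return a non-positive number, and they vanish exactly when the two variances coincide, in line with the remark above.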

For probability measures on $M_{N}(\mathbb{C})_{sa}^{m}$ , we may study entropy relative to the Gaussian measure $\sigma_{m,t}^{(N)}$ on $M_{N}(\mathbb{C})_{sa}^{m}$ . A direct computation shows that if $X\sim\mu$ is a random variable in $M_{N}(\mathbb{C})_{sa}^{m}$ , then we have

[TABLE]

We denote the normalized version by

[TABLE]

Similarly, if $\mu$ is a measure on $M_{N}(\mathbb{C})_{sa}^{m+n}$ which is absolutely continuous with respect to Lebesgue measure and $(X,Y)$ is the corresponding random variable, we define

[TABLE]

which is equivalent to

[TABLE]

where $\mu_{X|Y=y}$ is the conditional distribution of $X$ given $Y=y$ , and $\mu_{Y}$ is the marginal law of $Y$ . Similarly, if $(X,Y)$ is an $(m+n)$ -tuple of non-commutative random variables, we define the free entropy $\chi^{*}$ relative to Gaussian by

[TABLE]

We define the normalized conditional Fisher information relative to Gaussian by

[TABLE]

Note that if this Fisher information is finite and if $\xi$ is the normalized score function for $X$ given $Y$ as in §6.2, then

[TABLE]

because

[TABLE]

where we have evaluated the middle term on the right hand side using integration by parts. Similarly, for an $(m+n)$ -tuple $(X,Y)$ of non-commutative random variables, we define

[TABLE]

where the second equality holds provided that $\Phi^{*}$ is finite and $\xi$ is the free score function. We have the following version of (6.6) and Lemma 6.3 for entropy and Fisher information relative to Gaussian.

Lemma 8.8.

Let $(X,Y)$ be a random variable in $M_{N}(\mathbb{C})_{sa}^{m+n}$ with a density and with finite variance and let $S$ be an independent GUE $m$ -tuple. Then

[TABLE]

Similarly, suppose that $(X,Y)$ is an $(m+n)$ -tuple of non-commutative random variables and let $S$ be a freely independent free semicircular $m$ -tuple. Then

[TABLE]

Proof.

The first formula follows from [39, §4, Lemma 1] after renormalization. However, we will give an argument by a change of variables in (6.6) that will apply to both $h_{g}^{(N)}$ and $\chi_{g}^{*}$ . Note that by (6.6)

[TABLE]

and in particular, we know that the integral is well-defined in $[-\infty,+\infty)$ . Now we do a change of variables in the integral $t=e^{u}-1$ , $dt=e^{u}\,du$ and obtain

[TABLE]

where we have applied the scaling relation Lemma 6.2 for Fisher information. On the other hand,

[TABLE]

Therefore, altogether

[TABLE]

which is the desired formula. The statement for $\chi^{*}$ can be proved by exactly the same computation, since the definition of $\chi^{*}$ in (6.7) is completely analogous to (6.6). ∎

Furthermore, the log-Sobolev inequality for the Gaussian measure has the following interpretation for entropy and Fisher information. This in fact generalizes to entropy and Fisher information relative to any measure $\nu$ satisfying LSI, see [39, Definition 1], but we only use the case where $\nu$ is Gaussian and $\mu$ is sufficiently regular.

Lemma 8.9.

Let $X$ be a random variable in $M_{N}(\mathbb{C})_{sa}^{m}$ that has a $C^{1}$ density with respect to Lebesgue measure. Then

[TABLE]

Proof.

First, it suffices to check the non-conditional version $|h_{g}^{(N)}(X)|\leq\frac{1}{2}\mathcal{I}_{g}^{(N)}(X)$ . Indeed, in the conditional case, the left hand side is $\int|h_{g}^{(N)}(\mu_{X|Y=y})|\,d\mu_{Y}(y)$ and the right hand side is $\frac{1}{2}\int\mathcal{I}_{g}^{(N)}(\mu_{X|Y=y})\,d\mu_{Y}(y)$ , and proving the non-conditional case would allow us to compare the integrands pointwise.

Now suppose that $X$ has density $\rho$ with respect to Lebesgue measure and let $\tilde{\rho}$ be the density with respect to Gaussian, so that

[TABLE]

By Corollary 2.11, the measure $\sigma_{m,t}^{(N)}$ satisfies the normalized log-Sobolev inequality (2.4) with $c=1$ , so that

[TABLE]

Let $f=\tilde{\rho}^{1/2}$ . Then $\int f^{2}\,d\sigma_{m,t}^{(N)}$ reduces to $1$ , so the right hand side is $|h_{g}^{(N)}(X)|$ . On the other hand, letting $V(x)=-(1/N^{2})\log\rho$ , we get

[TABLE]

and hence on the support of $f$ , we have

[TABLE]

Thus,

[TABLE]

Hence, the log-Sobolev inequality implies the desired inequality. ∎

8.3. Conditional Transport and the Entropy-Cost Inequality

Now we will show that the transport maps constructed in §7.5 satisfy the Talagrand entropy-cost inequality. It was shown in [39, Theorem 1] that if a measure $\nu$ satisfies the log-Sobolev inequality (2.1) with some constant $c$ (and some regularity conditions), then it satisfies the Talagrand inequality

[TABLE]

where $W_{2}$ is the $L^{2}$ -Wasserstein distance, which is equal to the infimum of $\lVert X-Y\rVert_{L^{2}}$ over all coupled random variables $X$ and $Y$ with $X\sim\mu$ and $Y\sim\nu$ .
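For intuition, the inequality can be verified by hand for centered Gaussians on $\mathbb{R}$. The sketch below assumes the classical normalization for the standard Gaussian $\gamma$, namely $W_{2}(\mu,\gamma)^{2}\leq 2|h(\mu|\gamma)|$; the constant in the display above depends on the paper's normalization of (2.1) and is not reproduced here. For $\mu=N(0,s^{2})$ the monotone map $T(x)=x/s$ pushes $\mu$ forward to $\gamma$, which gives $W_{2}^{2}=(s-1)^{2}$, and the inequality reduces to $\log s\leq s-1$. All function names are hypothetical.

```python
import math


def transport_cost_squared(s):
    """E|X - T(X)|^2 for X ~ N(0, s^2) and the transport map T(x) = x / s."""
    return (s - 1.0) ** 2


def abs_relative_entropy(s):
    """|h(mu|gamma)| for mu = N(0, s^2) and gamma = N(0, 1)."""
    return 0.5 * (s * s - 1.0 - math.log(s * s))


def talagrand_gap(s):
    """Entropy side minus cost side; equals 2 (s - 1 - log s) and is never negative."""
    return 2.0 * abs_relative_entropy(s) - transport_cost_squared(s)
```

In this toy case the coupling $(X,T(X))$ comes from a transport map, which is the feature the construction in this section is designed to retain in the matrix and free settings.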

Adapting Otto and Villani’s argument, we will show that the transport maps constructed in §7.5 witness the (conditional) entropy-cost inequality relative to the GUE law for the $N\times N$ matrix models and the corresponding free entropy-cost inequality for the non-commutative random variables. This is claim (5) below, while the other claims in Theorem 8.10 summarize the results of our earlier construction.

We remark that the free Talagrand inequality for self-adjoint tuples was studied in greater generality in [28] and [13, §3.3]. Although we restricted ourselves to the case where the target measure is Gaussian/semicircular, our goal in this paper was not merely to estimate the Wasserstein distance using some coupling, but rather to exhibit a coupling that arises from a transport map, and to show Lipschitzness of this transport map.

Theorem 8.10.

As in Theorem 7.11, let $V^{(N)}(x,y)$ be a potential on $M_{N}(\mathbb{C})_{sa}^{m+n}$ satisfying Assumption 5.1 for some $0<c\leq C$ , and let $\mu^{(N)}$ and $(X^{(N)},Y^{(N)})$ be the corresponding probability measures and random variables. Let $S^{(N)}$ be an independent GUE $m$ -tuple. Let $(X,Y)$ be a tuple of non-commutative random variables given by the limiting free Gibbs law $\lambda$ and let $S$ be a freely independent free semicircular $m$ -tuple. Let $\pi_{1}(x,y)=x$ and $\pi_{2}(x,y)=y$ . Then there exist functions $F^{(N)}$ , $G^{(N)}:M_{N}(\mathbb{C})_{sa}^{m+n}\to M_{N}(\mathbb{C})_{sa}^{m}$ and $F,G\in(\overline{\operatorname{TrP}}_{m+n}^{1})^{m}$ such that

(1) We have $(F^{(N)}(X^{(N)},Y^{(N)}),Y^{(N)})\sim(S^{(N)},Y^{(N)})$ and $(G^{(N)}(S^{(N)},Y^{(N)}),Y^{(N)})\sim(X^{(N)},Y^{(N)})$ in law, and $(F(X,Y),Y)\sim(S,Y)$ and $(G(S,Y),Y)\sim(X,Y)$ in non-commutative law.

(2) $(F^{(N)},\pi_{2})\circ(G^{(N)},\pi_{2})=\operatorname{id}=(G^{(N)},\pi_{2})\circ(F^{(N)},\pi_{2})$ , and the same holds for $F$ and $G$ .

(3) $F^{(N)}\rightsquigarrow F$ and $G^{(N)}\rightsquigarrow G$ .

(4) We have $\lVert F^{(N)}-\pi_{1}\rVert_{\operatorname{Lip}}$ and $\lVert G^{(N)}-\pi_{1}\rVert_{\operatorname{Lip}}\leq(\max(C,1/c)^{3}-1)\max(C,1/c)^{1/2}$ , and the same holds for $F$ and $G$ .

(5) We have

[TABLE]

and

[TABLE]

Proof.

Let $\tilde{F}_{s,t}^{(N)}$ and $\tilde{F}_{s,t}$ be as in Theorems 7.11 and 7.13. Then let

[TABLE]

The only property that was not shown in the earlier theorems is (5). First, note that as a consequence of (1),

[TABLE]

The rest of the proof of (5) proceeds as in [39, §4]. As in §7.5, let $\tilde{V}_{t}^{(N)}$ denote the potential corresponding to $(\tilde{X}_{t}^{(N)},Y^{(N)})=(e^{-t/2}X^{(N)}+(1-e^{-t})^{1/2}S^{(N)},Y^{(N)})$ and recall that

[TABLE]

and hence

[TABLE]

Then we apply Minkowski’s inequality with respect to integration $d\mu^{(N)}(x,y)$ to obtain

[TABLE]

which can be rewritten as

[TABLE]

where we have applied the fact that $(\tilde{F}_{s,0}^{(N)}(X^{(N)},Y^{(N)}),Y^{(N)})\sim(\tilde{X}_{s}^{(N)},Y^{(N)})$ . It follows from Lemma 8.8 and a change of variables that

[TABLE]

It is easy to see that $s\mapsto\mathcal{I}_{g}^{(N)}(\tilde{X}_{s}^{(N)}|Y^{(N)})$ is bounded on compact sets because of Lemma 6.3 and (8.1). Therefore, we have for almost every $t$ ,

[TABLE]

Hence, for almost every $t$ ,

[TABLE]

where the last line follows from Lemma 8.9. Therefore,

[TABLE]

where we have employed the fact that $\lim_{t\to\infty}|h_{g}^{(N)}(\tilde{X}_{t}^{(N)}|Y^{(N)})|=0$ by (8.3). This establishes the first claim of (5).

The second claim of (5) follows by taking the large $N$ limit using Corollary 5.3 and Theorem 6.6. More precisely, for the left hand side, we take the limit using Corollary 5.3. Meanwhile, for the right hand side, note that $h_{g}^{(N)}(X^{(N)}|Y^{(N)})\to\chi_{g}^{*}(X|Y)$ because $h^{(N)}(X^{(N)}|Y^{(N)})\to\chi^{*}(X|Y)$ and $E\lVert X^{(N)}\rVert_{2}^{2}\to\lVert X\rVert_{2}^{2}$ by Corollary 5.3. ∎
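The role of the interpolation $\tilde{X}_{t}=e^{-t/2}X+(1-e^{-t})^{1/2}S$ in this proof can be seen in a scalar Gaussian toy model. The sketch below assumes $X\sim N(0,v)$ and an independent standard Gaussian $S$ on $\mathbb{R}$, so that $\tilde{X}_{t}$ is Gaussian with variance $e^{-t}v+(1-e^{-t})$; it then evaluates the absolute relative entropy with respect to the standard Gaussian along the flow. The setting and names are illustrative assumptions rather than the paper's matrix models.

```python
import math


def interpolated_variance(v, t):
    """Variance of exp(-t/2) X + (1 - exp(-t))^{1/2} S for independent X ~ N(0, v), S ~ N(0, 1)."""
    return math.exp(-t) * v + (1.0 - math.exp(-t))


def abs_relative_entropy_to_standard_gaussian(var):
    """|h(N(0, var) | N(0, 1))| = (var - 1 - log var) / 2."""
    return 0.5 * (var - 1.0 - math.log(var))


def entropy_along_flow(v, times):
    """Absolute relative entropy of the interpolated variable at each time in `times`."""
    return [abs_relative_entropy_to_standard_gaussian(interpolated_variance(v, t)) for t in times]
```

In this model the variance moves monotonically toward $1$, so the relative entropy decreases along the flow and tends to zero as $t\to\infty$, mirroring the limit used at the end of the argument for the first claim of (5).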

8.4. Construction of Triangular Transport

Finally, by iterating Theorem 8.10, we obtain the following result concerning “lower-triangular transport.” This is analogous to the classical result [6, Corollary 3.10]. Of course, the challenge in our situation was to understand the large $N$ behavior of the transport maps in a dimension-independent way. Unfortunately, the transport constructed here is not optimal among triangular mappings, since indeed Otto and Villani’s construction does not produce the optimal transport map.

Theorem 8.11.

Let $V^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to\mathbb{R}$ be a potential satisfying Assumption 5.1. Let $\mu^{(N)}$ and $X^{(N)}$ be the corresponding law and random variable. Let $\lambda$ be the limiting free Gibbs law, and let $X\sim\lambda$ be an $m$ -tuple of non-commutative random variables. Let $S^{(N)}$ be an independent GUE $m$ -tuple and let $S$ be a freely independent free semicircular family. Then there exist functions $\Phi^{(N)}$ , $\Psi^{(N)}:M_{N}(\mathbb{C})_{sa}^{m}\to M_{N}(\mathbb{C})_{sa}^{m}$ and $\Phi,\Psi\in(\overline{\operatorname{TrP}}_{m}^{1})^{m}$ such that

(1) $\Phi^{(N)}(X^{(N)})\sim S^{(N)}$ and $\Psi^{(N)}(S^{(N)})\sim X^{(N)}$ in law, and similarly, $\Phi(X)\sim S$ and $\Psi(S)\sim X$ in non-commutative law.

(2) $\Phi^{(N)}$ and $\Psi^{(N)}$ are inverse functions of each other, and the same holds for $\Phi$ and $\Psi$ .

(3) $\Phi^{(N)}\rightsquigarrow\Phi$ and $\Psi^{(N)}\rightsquigarrow\Psi$ .

(4) $\Phi^{(N)}$ is lower triangular in the sense that

[TABLE]

and the same holds for $\Psi^{(N)}$ , $\Phi$ , and $\Psi$ . In particular, the isomorphism $\mathrm{W}^{*}(X)\to\mathrm{W}^{*}(S)$ induced by $\Phi$ maps $\mathrm{W}^{*}(X_{1},\dots,X_{k})$ onto $\mathrm{W}^{*}(S_{1},\dots,S_{k})$ for each $k=1$ , …, $m$ .

(5) We have $\lVert\Phi^{(N)}-\operatorname{id}\rVert_{\operatorname{Lip}}\leq m^{1/2}(\max(C,1/c)^{3}-1)\max(C,1/c)^{1/2}$ and $\lVert\Psi^{(N)}-\operatorname{id}\rVert_{\operatorname{Lip}}$ is bounded by some constant $L(c,C,m)$ which goes to zero as $c,C\to 1$ .

(6) We have

[TABLE]

and

[TABLE]

(7) We have

[TABLE]

where $\Theta$ is the universal constant from Proposition 3.17.

Proof.

First, by Proposition 8.2, the marginal law of $(X_{1}^{(N)},\dots,X_{j}^{(N)})$ is given by a convex potential satisfying the same assumptions.

For each $j$ , we apply Theorem 8.10 with $X_{j}^{(N)}$ as the first variable and $(X_{1}^{(N)},\dots,X_{j-1}^{(N)})$ as the second variable. We thus obtain maps $\Phi_{j}^{(N)}:M_{N}(\mathbb{C})_{sa}^{j}\to M_{N}(\mathbb{C})_{sa}$ such that

[TABLE]

Let

[TABLE]

Let $Y^{(N)}=\Phi^{(N)}(X^{(N)})$ . Then we can check by backwards induction on $j$ that

[TABLE]

Indeed, the base case $j=m$ is trivial. For the induction step, suppose the claim holds for $j$ . Since $Y_{j}^{(N)}$ is a function of $X_{1}^{(N)}$ , …, $X_{j}^{(N)}$ , the induction hypothesis implies that

[TABLE]

where the last line follows because $(X_{1}^{(N)},\dots,X_{j-1}^{(N)},Y_{j}^{(N)})\sim(X_{1}^{(N)},\dots,X_{j-1}^{(N)},S_{j}^{(N)})$ and because $S_{j+1}^{(N)}$ , …, $S_{m}^{(N)}$ are independent of the other variables. By Theorem 8.10, $\Phi_{j}^{(N)}$ is asymptotic to some $\Phi_{j}\in(\overline{\operatorname{TrP}}_{j}^{1})_{sa}$ , and the objects $\Phi$ , $X$ , and $S$ satisfy the analogous transport relations in the non-commutative setting. Now because each $\Phi_{j}^{(N)}-\pi_{x_{j}}$ is $(\max(C,1/c)^{3}-1)\max(C,1/c)^{1/2}$ -Lipschitz, we see that $\Phi^{(N)}-\operatorname{id}$ is $m^{1/2}(\max(C,1/c)^{3}-1)\max(C,1/c)^{1/2}$ -Lipschitz.

By Theorem 8.10, there is a map $G_{j}^{(N)}:M_{N}(\mathbb{C})_{sa}^{j}\to M_{N}(\mathbb{C})_{sa}$ such that $(x_{1},\dots,x_{j-1},G_{j}^{(N)}(x_{1},\dots,x_{j}))$ is the inverse of $(x_{1},\dots,x_{j-1},\Phi_{j}^{(N)}(x_{1},\dots,x_{j}))$ . Define $\Psi_{j}^{(N)}$ by induction by

[TABLE]

Then $\Psi^{(N)}=(\Psi_{1}^{(N)},\dots,\Psi_{m}^{(N)})$ is the inverse of $\Phi^{(N)}$ . Since $G_{j}^{(N)}-\pi_{x_{j}}$ is $(\max(C,1/c)^{3}-1)\max(C,1/c)^{1/2}$ -Lipschitz, we can show by induction that $\lVert\Psi_{j}^{(N)}-\pi_{x_{j}}\rVert_{\operatorname{Lip}}$ is bounded by a constant depending only on $c$ , $C$ , and $j$ , and which goes to zero as $c,C\to 1$ . Moreover, by Lemma 3.12, $\Psi^{(N)}$ is asymptotic to some $\Psi\in(\overline{\operatorname{TrP}}_{m}^{1})_{sa}^{m}$ .

This concludes the verification of (1) through (5). Now to prove (6), we apply Theorem 8.10 (5) and get

[TABLE]

where we have applied the definition of $h_{g}^{(N)}$ and the classical fact that $h^{(N)}$ is additive under conditioning. As before, because $\Phi^{(N)}(X^{(N)})\sim S^{(N)}$ , we see that $\lVert S^{(N)}-\Psi^{(N)}(S^{(N)})\rVert_{L^{2}}=\lVert\Phi^{(N)}(X^{(N)})-X^{(N)}\rVert_{L^{2}}$ . Finally, the second claim of (6) regarding the free case follows by taking the limit as $N\to\infty$ .

Finally, to prove (7), recall that the map $\Phi_{j}$ is a special case of the map $\tilde{F}_{0,\infty}$ in Theorem 7.13. Thus, by applying Theorem 7.13 (4) in the case where $s=\infty$ and $s^{\prime}=t=0$ , we obtain $\lVert\Phi_{j}(X_{1},\dots,X_{j})-(X_{j}-\tau(X_{j}))\rVert_{\infty}\leq(\max(C,1/c)^{3}-1)\max(C,1/c)\Theta$ . Moreover, the middle quantity in claim (7) equals the left hand side because $\Phi(X)\sim S$ . ∎
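A classical toy analogue of lower-triangular transport may help fix ideas. For a centered Gaussian on $\mathbb{R}^{2}$ with covariance $\Sigma=LL^{T}$, where $L$ is the lower-triangular Cholesky factor, the linear map $x\mapsto L^{-1}x$ pushes the law forward to the standard Gaussian, and its $j$-th component depends only on $x_{1},\dots,x_{j}$. The sketch below works under exactly this assumption; it is not the construction of Theorem 8.11, and all names are hypothetical.

```python
import math


def cholesky_2d(a, b, d):
    """Entries (l11, l21, l22) of the lower-triangular Cholesky factor of [[a, b], [b, d]]."""
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    return l11, l21, l22


def phi(x1, x2, chol):
    """Triangular map x -> L^{-1} x, computed by forward substitution."""
    l11, l21, l22 = chol
    y1 = x1 / l11
    y2 = (x2 - l21 * y1) / l22
    return y1, y2


def psi(y1, y2, chol):
    """Inverse triangular map y -> L y."""
    l11, l21, l22 = chol
    return l11 * y1, l21 * y1 + l22 * y2


def pushed_forward_covariance(a, b, d):
    """Covariance M Sigma M^T of phi(X), where M = L^{-1} is read off from phi."""
    chol = cholesky_2d(a, b, d)
    col1, col2 = phi(1.0, 0.0, chol), phi(0.0, 1.0, chol)
    m = [[col1[0], col2[0]], [col1[1], col2[1]]]
    sigma = [[a, b], [b, d]]
    ms = [[sum(m[i][k] * sigma[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return [[sum(ms[i][k] * m[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
```

For $\Sigma$ with entries $4,2,2,3$ the pushed-forward covariance is the identity up to rounding, and the first component of `phi` does not depend on `x2`, which is the scalar counterpart of the triangular structure in claim (4).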

Funding

This work was supported by the National Science Foundation [grant DMS-1762360] and the UCLA graduate division.

Acknowledgements

I thank Dima Shlyakhtenko, Ben Hayes, Brent Nelson, Yoann Dabrowski, Yoshimichi Ueda, and Todd Kemp for various useful conversations and comments on drafts of this paper. The results of this paper were motivated in part by discussions with Ben Hayes regarding free entropy and maximal amenable subalgebras. Dima Shlyakhtenko suggested the name “triangular transport.” The anonymous referees suggested several references and improvements to the exposition, including the connection with model theory.

Bibliography

[1] G. W. Anderson, A. Guionnet, and O. Zeitouni, An Introduction to Random Matrices, Cambridge Studies in Advanced Mathematics, Cambridge University Press, 2009.
[2] C. Anantharaman and S. Popa, An Introduction to II$_{1}$ Factors, 2017. Preprint available at http://www.math.ucla.edu/~popa/Books/IIun-v10.pdf.
[3] P. Biane, M. Capitaine, and A. Guionnet, Large deviation bounds for matrix Brownian motion, Invent. Math., 152 (2003), pp. 433–459.
[4] S. G. Bobkov, Large deviations via transference plans, Advances in Mathematics Research, 2 (2003), pp. 151–175.
[5] S. G. Bobkov and M. Ledoux, From Brunn-Minkowski to Braskamp-Lieb and to logarithmic Sobolev inequalities, Geom. Funct. Anal., 10 (2000), pp. 1028–1052.
[6] V. I. Bogachev, A. V. Kolesnikov, and K. V. Medvedev, Triangular transformations of measures, Sb. Math., 196.3 (2005), pp. 309–335.
[7] R. Boutonnet and A. Carderi, Maximal amenable von Neumann subalgebras arising from maximal amenable subgroups, Geometric and Functional Analysis, 25 (2015), pp. 1688–1705.
[8] R. Boutonnet and C. Houdayer, Amenable absorption in amalgamated free product von Neumann algebras, Kyoto J. Math., 58 (2018), pp. 583–593.
