On Matrix Rearrangement Inequalities

Rima Alaifari; Xiuyuan Cheng; Lillian B. Pierce; Stefan Steinerberger

arXiv:1904.05239·math.FA·July 3, 2020


TL;DR

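This paper proves two things. First, the matrix rearrangement inequality holds for all disordered words when A and B are 2×2 symmetric positive semidefinite matrices. Second, in higher dimensions it holds for most sufficiently small perturbations of the identity. This complements Drury's characterization, which as stated covers N×N matrices with N ≥ 3.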

Contribution

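It establishes the matrix rearrangement inequality for every disordered word for 2×2 matrices. It also establishes it for sufficiently small perturbations of the identity in any dimension, provided AB − BA has trivial kernel. Drury's characterization, which this work complements, applies as stated only to N×N matrices with N ≥ 3.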

Findings

1. The rearrangement inequality holds for all disordered words for 2×2 matrices.

2. For larger matrices, the inequality holds for most sufficiently small perturbations of the identity.

3. Drury's counterexamples are built from 3×3 matrices, so they do not rule out the inequality for 2×2 matrices.





Rima Alaifari: Department of Mathematics, ETH Zürich, Rämistrasse 101, 8092 Zürich

Xiuyuan Cheng: Department of Mathematics, Duke University, 120 Science Drive, Durham NC 27708

Lillian B. Pierce: Department of Mathematics, Duke University, 120 Science Drive, Durham NC 27708

Stefan Steinerberger: Department of Mathematics, Yale University, 10 Hillhouse Avenue, New Haven, 06511 CT

Abstract.

Given two symmetric and positive semidefinite square matrices $A,B$ , is it true that any matrix given as the product of $m$ copies of $A$ and $n$ copies of $B$ in a particular sequence must be dominated in the spectral norm by the ordered matrix product $A^{m}B^{n}$ ? For example, is

$$\|AABAABABB\|\leq\|AAAAABBBB\|\,?$$

Drury [10] has characterized precisely which disordered words have the property that an inequality of this type holds for all matrices $A,B$ . However, the $1$ -parameter family of counterexamples Drury constructs for these characterizations is comprised of $3\times 3$ matrices, and thus as stated the characterization applies only for $N\times N$ matrices with $N\geq 3$ . In contrast, we prove that for $2\times 2$ matrices, the general rearrangement inequality holds for all disordered words. We also show that for larger $N\times N$ matrices, the general rearrangement inequality holds for all disordered words, for most $A,B$ (in a sense of full measure) that are sufficiently small perturbations of the identity.

Key words and phrases:

Rearrangement Inequality, Linear Operators, Matrix inequalities.

2010 Mathematics Subject Classification:

15A45, 47A30, 47A63 (primary) and 39B42 (secondary).

R.A. thanks David Gontier for fruitful discussions. X.C. is partially supported by the NSF (DMS-1818945, DMS-1820827). L.P. is partially supported by CAREER grant NSF DMS-1652173 and the Alfred P. Sloan Foundation. S.S. is partially supported by the NSF (DMS-1763179) and the Alfred P. Sloan Foundation.

1. Introduction

1.1. Introduction.

Rearrangement inequalities for functions have a long history; we refer to Lieb and Loss [20] for an introduction and an example of their ubiquity in Analysis, Mathematical Physics, and Partial Differential Equations. A natural question that one could ask is whether there is an operator-theoretic variant of such rearrangement inequalities. For example, given two operators $A:X\rightarrow X$ and $B:X\rightarrow X$ , is there an inequality

$$\|ABABA\|\leq\|AAABB\|,$$

where $\|\cdot\|$ is a norm on operators? In this paper, we will study the question for $A,B$ being symmetric and positive semidefinite square matrices and $\|\cdot\|$ denoting the classical operator norm

$$\|M\|=\sup_{\|x\|_{2}=1}\|Mx\|_{2}.$$
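For readers who want to experiment, here is a minimal numerical sketch of this operator norm. It is our own illustration, not from the paper. Power iteration on $M^{T}M$ converges to a maximizing unit vector $x$, and $\|Mx\|_{2}$ then approximates the supremum. The function `operator_norm` and its `iters`/`seed` parameters are hypothetical names chosen for illustration. NumPy's `np.linalg.norm(M, 2)`, which returns the largest singular value, serves as the reference value.

```python
import numpy as np

def operator_norm(M, iters=500, seed=0):
    """Approximate sup_{||x||_2 = 1} ||M x||_2 by power iteration on M^T M."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(M.shape[1])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = M.T @ (M @ x)            # apply M^T M
        norm_y = np.linalg.norm(y)
        if norm_y == 0.0:            # M x = 0 for this x (e.g. M is the zero matrix)
            return 0.0
        x = y / norm_y               # renormalize to stay on the unit sphere
    return float(np.linalg.norm(M @ x))  # ||M x|| at the approximate maximizer

M = np.array([[2.0, 1.0], [0.0, 1.0]])
print(operator_norm(M), np.linalg.norm(M, 2))
```

Power iteration assumes that the largest singular value is separated from the next one. In practice one would simply call `np.linalg.norm(M, 2)`.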

We are interested in whether one could hope for a statement of the general type

$$\|A^{m_{1}}B^{n_{1}}A^{m_{2}}B^{n_{2}}\cdots A^{m_{s}}B^{n_{s}}\|\leq\|A^{m}B^{n}\|,$$

where

$$m=\sum_{j=1}^{s}m_{j},\qquad n=\sum_{j=1}^{s}n_{j},$$

with $m_{j},n_{j}$ positive integers (except that we allow $m_{1}=0$ or $n_{s}=0$ ). Of course, if the operators commute then any such inequality is trivially an equality. A reason why one might hope in general for such a statement to be true is that one could expect the repeated application of only one operator to lead to growth (or at least preservation) of the norms of suitable eigenvectors, while alternating applications of two operators could have the effect of projecting alternately onto two possibly different eigenbases, thus losing size of the eigenvectors.
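The general word in this inequality can be evaluated numerically. The sketch below assumes NumPy and uses hypothetical helper names (`word_product`, `norm_ratio`, `random_psd`). It builds $A^{m_{1}}B^{n_{1}}\cdots A^{m_{s}}B^{n_{s}}$ from the exponent tuple $(m_{1},n_{1},\ldots,m_{s},n_{s})$ and reports the ratio $\|A^{m_{1}}B^{n_{1}}\cdots A^{m_{s}}B^{n_{s}}\|/\|A^{m}B^{n}\|$. A ratio of at most $1$ means the inequality holds for that particular pair.

```python
import numpy as np

def word_product(A, B, exponents):
    """Return A^{m_1} B^{n_1} ... A^{m_s} B^{n_s} for exponents (m_1, n_1, ..., m_s, n_s)."""
    mp = np.linalg.matrix_power
    P = np.eye(A.shape[0])
    for j in range(0, len(exponents), 2):
        P = P @ mp(A, exponents[j]) @ mp(B, exponents[j + 1])
    return P

def norm_ratio(A, B, exponents):
    """||word|| / ||A^m B^n||; a value <= 1 means the rearrangement inequality holds here."""
    m = sum(exponents[0::2])
    n = sum(exponents[1::2])
    mp = np.linalg.matrix_power
    ordered = mp(A, m) @ mp(B, n)
    return np.linalg.norm(word_product(A, B, exponents), 2) / np.linalg.norm(ordered, 2)

def random_psd(N, rng):
    """A random symmetric positive semidefinite N x N matrix."""
    G = rng.standard_normal((N, N))
    return G @ G.T

rng = np.random.default_rng(1)
A, B = random_psd(3, rng), random_psd(3, rng)
# The word AABAABABB from the abstract has exponents (2, 1, 2, 1, 1, 2), so m = 5, n = 4
print(norm_ratio(A, B, (2, 1, 2, 1, 1, 2)))
```

For commuting $A,B$ (for example, two diagonal matrices) the ratio is $1$, matching the remark that the inequality is then an equality. A single random pair is no evidence either way for a given word.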

1.2. Known results.

There are several encouraging results in this direction, some of which are by now classical in Operator Theory, and have been extended in a variety of different ways. We note:

•

Heinz-Löwner inequality (Heinz [16], 1951), (Löwner [21], 1934) stating that

$$\|ABA\|\leq\|AAB\|.$$

•

Heinz-Kato inequality (Heinz [16], 1951), (Kato [19], 1952). If $A,B$ are positive operators and $T$ is a linear operator such that $\|Tx\|\leq\|Ax\|$ and $\|T^{*}y\|\leq\|By\|$ for all $x,y$ in a Hilbert space, then

$$|\langle Tx,y\rangle|\leq\|A^{\alpha}x\|\,\|B^{1-\alpha}y\|\quad\text{for all }0\leq\alpha\leq1.$$

•

Cordes inequality (Cordes [8], 1987). For all symmetric and positive definite $A,B$ and all $0\leq s\leq 1$

$$\|A^{s}B^{s}\|\leq\|AB\|^{s}.$$

•

McIntosh’s inequality (McIntosh [23], 1979) generalizes several of the earlier results and shows that for $A,B$ as above and $X$ an arbitrary square matrix of the same size,

$$\|AXB\|\leq\|A^{2}X\|^{1/2}\,\|XB^{2}\|^{1/2}.$$

The last author characterized equality for several of these inequalities in [27].

•

Furuta’s inequality [12] (see also [8]) shows that for any $n\geq 1$

$$\|AB\|^{n}\leq\|A^{n}B^{n}\|.$$
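The classical inequalities listed above can be spot-checked numerically. This sketch assumes NumPy; `psd_power`, `opnorm` and `check_known_inequalities` are our own hypothetical names. It tests three inequalities on random symmetric positive semidefinite matrices:

- the Heinz-Löwner form $\|ABA\|\leq\|AAB\|$;
- the Cordes inequality;
- Furuta's inequality.

A small relative tolerance absorbs floating-point round-off. Passing random trials illustrates these inequalities but does not prove them.

```python
import numpy as np

def psd_power(A, s):
    """A^s for a symmetric PSD matrix A, via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)          # drop tiny negative round-off eigenvalues
    return (V * w**s) @ V.T            # V diag(w^s) V^T

def opnorm(M):
    return np.linalg.norm(M, 2)

def check_known_inequalities(trials=200, N=4, seed=0, rtol=1e-9):
    rng = np.random.default_rng(seed)
    mp = np.linalg.matrix_power
    for _ in range(trials):
        G, H = rng.standard_normal((2, N, N))
        A, B = G @ G.T, H @ H.T
        # Heinz-Loewner form: ||ABA|| <= ||AAB||
        assert opnorm(A @ B @ A) <= opnorm(A @ A @ B) * (1 + rtol)
        # Cordes: ||A^s B^s|| <= ||AB||^s for 0 <= s <= 1
        s = rng.uniform()
        assert opnorm(psd_power(A, s) @ psd_power(B, s)) <= opnorm(A @ B) ** s * (1 + rtol)
        # Furuta form: ||AB||^n <= ||A^n B^n|| for integers n >= 1
        n = int(rng.integers(1, 5))
        assert opnorm(A @ B) ** n <= opnorm(mp(A, n) @ mp(B, n)) * (1 + rtol)
    return True

print(check_known_inequalities())
```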

There is a large literature connected to these inequalities; we refer to [3, 6, 9, 11, 14, 17] as well as the books by Bhatia [4, 5], Cordes [8], Furuta [13], Marshall, Olkin & Arnold [22], Simon [26] and Zhan [28]. Many open problems remain. The authors themselves were motivated by a conjecture of Recht and Ré [25] who asked whether, for $n$ positive definite matrices $A_{1},\dots,A_{n}$ and $1\leq m\leq n$ , there is an inequality

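$$\frac{1}{n^{m}}\sum_{j_{1},\dots,j_{m}=1}^{n}A_{j_{1}}\cdots A_{j_{m}}\geq\frac{(n-m)!}{n!}\sum_{\substack{j_{1},\dots,j_{m}=1\\ \text{all distinct}}}^{n}A_{j_{1}}A_{j_{2}}\cdots A_{j_{m}}.$$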

Recht and Ré [25] proved the inequality for $n=m=2$ ; Zhang [29] recently gave a proof for $m=3$ and $n\geq 6$ being a multiple of 3. Israel, Krahmer and Ward [18] prove the inequality for $n=3$ ; we also refer to recent work of Albar, Junge and Zhao [1]. One way of interpreting the conjectured inequality of Recht and Ré is that repetition of matrices has a beneficial effect on the operator norm; this leads to asking about matrix rearrangement inequalities, as studied in this paper.

1.3. Statement of results.

Consider a putative inequality

$$\|A^{m_{1}}B^{n_{1}}A^{m_{2}}B^{n_{2}}\cdots A^{m_{s}}B^{n_{s}}\|\leq\|A^{m}B^{n}\|,\tag{1}$$

where $m=\sum_{i}m_{i},n=\sum_{i}n_{i}$ and $m_{i},n_{i}\in\mathbb{N}$ (possibly allowing $m_{1}=0$ or $n_{s}=0$ ) and $A,B$ are symmetric and positive semidefinite square matrices. Is it true that given any “word,” that is, a tuple of exponents $(m_{1},n_{1},\ldots,m_{s},n_{s})$ , the inequality (1) holds for all such $A,B$ ? Drury [10] has shown that at this level of generality, the question has a negative answer. Moreover, he provides a complete characterization of conditions on the exponents $(m_{1},n_{1},\ldots,m_{s},n_{s})$ for which such an inequality holds for all such $A,B$ (of all dimensions). For example, Drury shows that we always have

$$\|AABBABBAABBAA\|\leq\|A^{7}B^{6}\|,$$

while the inequality

$$\|AABABB\|\leq\|A^{3}B^{3}\|\quad\text{can fail for certain }A,B.$$

The counterexamples given by Drury to the general rearrangement inequality stem from a 1-parameter family of $3\times 3$ matrices. In contrast, our first main result is that the general rearrangement inequality does indeed hold true for any word, for all $2\times 2$ symmetric positive semidefinite matrices.

Theorem 1 (General Rearrangement Inequality for $2\times 2$ Matrices).

Let $A,B$ be symmetric positive semidefinite matrices of size $2\times 2$ and let $m_{i},n_{i}\in\mathbb{N}$ (possibly allowing $m_{1}=0$ or $n_{s}=0$ ). Then

$$\|A^{m_{1}}B^{n_{1}}A^{m_{2}}B^{n_{2}}\cdots A^{m_{s}}B^{n_{s}}\|\leq\|A^{m}B^{n}\|,$$

where $m=\sum_{i}m_{i}$ and $n=\sum_{i}n_{i}$ .
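Theorem 1 can be probed by random sampling. The sketch below assumes NumPy; `worst_ratio_2x2` and its parameters are hypothetical names. It draws random $2\times 2$ symmetric positive semidefinite pairs and random words, occasionally setting $m_{1}=0$ or $n_{s}=0$ as the theorem allows. It records the largest observed value of $\|W\|/\|A^{m}B^{n}\|$, which by Theorem 1 should not exceed $1$ beyond round-off.

```python
import numpy as np

def word_product(A, B, exponents):
    """A^{m_1} B^{n_1} ... A^{m_s} B^{n_s} for exponents (m_1, n_1, ..., m_s, n_s)."""
    mp = np.linalg.matrix_power
    P = np.eye(A.shape[0])
    for j in range(0, len(exponents), 2):
        P = P @ mp(A, exponents[j]) @ mp(B, exponents[j + 1])
    return P

def worst_ratio_2x2(trials=2000, max_block=2, max_s=3, seed=0):
    """Largest observed ||W(A, B)|| / ||A^m B^n|| over random 2x2 PSD pairs and words."""
    rng = np.random.default_rng(seed)
    mp = np.linalg.matrix_power
    worst = 0.0
    for _ in range(trials):
        G, H = rng.standard_normal((2, 2, 2))
        A, B = G @ G.T, H @ H.T
        s = int(rng.integers(1, max_s + 1))
        ex = [int(e) for e in rng.integers(1, max_block + 1, size=2 * s)]
        if rng.uniform() < 0.3:        # allow m_1 = 0
            ex[0] = 0
        if rng.uniform() < 0.3:        # allow n_s = 0
            ex[-1] = 0
        m, n = sum(ex[0::2]), sum(ex[1::2])
        ordered = np.linalg.norm(mp(A, m) @ mp(B, n), 2)
        worst = max(worst, np.linalg.norm(word_product(A, B, ex), 2) / ordered)
    return worst

print(worst_ratio_2x2())  # Theorem 1 predicts a value <= 1 up to round-off
```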

In light of Drury’s results, there is no hope for such general inequalities in higher dimensions. Nonetheless, one could wonder whether there is hope that, given any word $(m_{1},n_{1},\ldots,m_{s},n_{s})$ , a rearrangement inequality should hold for some (or maybe even most) pairs of $N\times N$ matrices $(A,B)$ . This is the motivation for our second result, which states that given any word, the rearrangement inequality is generically true for $N\times N$ matrices in a sufficiently small neighborhood of the identity, for all $N\geq 2$ .

Theorem 2 (General Rearrangement close to the Identity, arbitrary dimension).

Let $A,B$ be symmetric positive semidefinite matrices and let $m_{i},n_{i}\in\mathbb{N}$ (possibly allowing $m_{1}=0$ or $n_{s}=0$ ). If $\ker(AB-BA)=\{0\}$ , then there exists $\varepsilon_{0}=\varepsilon_{0}(A,B,m,n)>0$ such that for all $0<\varepsilon<\varepsilon_{0}$

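$$\|(\mbox{Id}+\varepsilon A)^{m_{1}}(\mbox{Id}+\varepsilon B)^{n_{1}}\cdots(\mbox{Id}+\varepsilon A)^{m_{s}}(\mbox{Id}+\varepsilon B)^{n_{s}}\|\leq\|(\mbox{Id}+\varepsilon A)^{m}(\mbox{Id}+\varepsilon B)^{n}\|,$$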

where $m=\sum_{i}m_{i}$ and $n=\sum_{i}n_{i}$ .

Thus given any fixed word, this provides a codimension 1 family of $(A,B)$ among all relevant pairs of $N\times N$ matrices in the neighborhood of the identity, which satisfy the rearrangement inequality for that word. We do not know whether the condition $\ker(AB-BA)=\{0\}$ is necessary but are inclined to think that it may not be.

There are many other natural questions that come to mind. The rearrangement inequalities are invariant under multiplication with constants, which allows us to compactify the set of matrices: are such inequalities generically true (in, say, the sense that the measure of admissible matrices approaches full measure as the length of the inequality, or the number $s$ , increases)? Another question could be to determine other simple conditions on the matrices (other than assuming that they commute) that would imply the desired rearrangement inequalities hold.

2. Proof of Theorem 1

Our proof uses three different ingredients. The first ingredient is Corollary 4.4 in a paper of Ando, Hiai & Okubo [2] which states the following: let $C,D$ be symmetric positive semidefinite matrices of size $2\times 2$ and for $i=1,\dots,k$ , let $p_{i},q_{i}\geq 0$ satisfy

$$p_{1}+\dots+p_{k}=1=q_{1}+\dots+q_{k};$$

then

$$\operatorname{tr}(C^{p_{1}}D^{q_{1}}\cdots C^{p_{k}}D^{q_{k}})\leq\operatorname{tr}(CD).\tag{2}$$

We remark that Ando, Hiai & Okubo [2] were motivated by the question whether such a trace inequality might be true in general: they establish the result for general positive semidefinite matrices that have at most two distinct eigenvalues. Plevnik [24] recently constructed an example showing that (2) can fail for $3\times 3$ matrices.
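Inequality (2) can likewise be sampled for $2\times 2$ matrices. In this sketch (NumPy assumed; `psd_power` and `aho_gap` are our own hypothetical names) fractional powers are taken through the eigendecomposition. The weights $p_{i},q_{i}$ are drawn from a Dirichlet distribution, so they are nonnegative and sum to $1$. The gap is divided by $\|C\|\,\|D\|$ so that the round-off tolerance is scale-free.

```python
import numpy as np

def psd_power(C, p):
    """C^p for a symmetric PSD matrix C via its eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return (V * np.clip(w, 0.0, None) ** p) @ V.T

def aho_gap(C, D, p, q):
    """tr(CD) - tr(C^{p_1} D^{q_1} ... C^{p_k} D^{q_k}); inequality (2) says this is >= 0."""
    P = np.eye(C.shape[0])
    for pi, qi in zip(p, q):
        P = P @ psd_power(C, pi) @ psd_power(D, qi)
    return float(np.trace(C @ D) - np.trace(P))

rng = np.random.default_rng(0)
worst = np.inf
for _ in range(1000):
    G, H = rng.standard_normal((2, 2, 2))
    C, D = G @ G.T, H @ H.T
    k = int(rng.integers(1, 5))
    p = rng.dirichlet(np.ones(k))      # p_i >= 0 with sum 1
    q = rng.dirichlet(np.ones(k))      # q_i >= 0 with sum 1
    scale = np.linalg.norm(C, 2) * np.linalg.norm(D, 2)
    worst = min(worst, aho_gap(C, D, p, q) / scale)
print(worst)  # expected to be >= 0 up to round-off for 2x2 matrices
```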

The second ingredient is the invariance of trace with respect to cyclic permutations, i.e.

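$$\operatorname{tr}(A_{1}A_{2}\cdots A_{n})=\operatorname{tr}(A_{2}A_{3}\cdots A_{n-1}A_{n}A_{1}).\tag{3}$$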

The third ingredient is the basic equation

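$$\|A\|^{2}=\max_{\|x\|=1}\langle Ax,Ax\rangle=\max_{\|x\|=1}\langle A^{T}Ax,x\rangle=\lambda_{\max}(A^{T}A),\tag{4}$$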

where $\lambda_{\max}$ denotes the largest eigenvalue of a matrix.

Let $A,B$ be symmetric positive semidefinite matrices of size $2\times 2$ . Consider now a general word

$$W_{m,n}(A,B):=A^{m_{1}}B^{n_{1}}\cdots A^{m_{s}}B^{n_{s}}\tag{5}$$

where

$$m=m_{1}+\dots+m_{s}\quad\text{and}\quad n=n_{1}+\dots+n_{s}.$$

Let the symmetric matrices $B^{n}A^{2m}B^{n}$ and $W_{m,n}(A,B)^{T}W_{m,n}(A,B)$ have eigenvalues (not necessarily distinct) given by

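$$\sigma(B^{n}A^{2m}B^{n})=\{\lambda_{1},\lambda_{2}\}\quad\text{and}\quad\sigma\big(W_{m,n}(A,B)^{T}W_{m,n}(A,B)\big)=\{\mu_{1},\mu_{2}\}.$$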

We note that all these eigenvalues are nonnegative. Moreover, assuming the ordering $\lambda_{1}\geq\lambda_{2}$ and $\mu_{1}\geq\mu_{2}$ , we have by (4) that

$$\lambda_{1}=\|A^{m}B^{n}\|^{2}\quad\text{and}\quad\mu_{1}=\|W_{m,n}(A,B)\|^{2}.$$

Thus to prove Theorem 1, it suffices to show that

$$\mu_{1}\leq\lambda_{1}.\tag{6}$$

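Defining $C=A^{2m},D=B^{2n}$ , we employ the cyclic identity (3) followed by (2) with $k=2s-1$ and

$$(p_{1},q_{1},\dots,p_{k},q_{k})=\Big(\frac{m_{s}}{2m},\frac{n_{s-1}}{2n},\dots,\frac{m_{2}}{2m},\frac{n_{1}}{2n},\frac{2m_{1}}{2m},\frac{n_{1}}{2n},\frac{m_{2}}{2m},\dots,\frac{n_{s-1}}{2n},\frac{m_{s}}{2m},\frac{2n_{s}}{2n}\Big)$$

(these exponents are nonnegative, and the $p_{i}$ and the $q_{i}$ each sum to $1$), followed by a second application of the cyclic identity (3) to obtain

$$\operatorname{tr}\big(W_{m,n}(A,B)^{T}W_{m,n}(A,B)\big)\leq\operatorname{tr}(CD)=\operatorname{tr}(B^{n}A^{2m}B^{n}).$$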

Since the trace is merely the sum of the eigenvalues, this shows that

$$\mu_{1}+\mu_{2}\leq\lambda_{1}+\lambda_{2}.\tag{7}$$

On the other hand, the determinant is multiplicative, and so

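$$\lambda_{1}\cdot\lambda_{2}=\det(B^{n}A^{2m}B^{n})=\det\big(W_{m,n}(A,B)^{T}W_{m,n}(A,B)\big)=\mu_{1}\cdot\mu_{2}.\tag{8}$$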

It is simple to deduce from these two relations that (6) must hold.

Indeed, if $\mu_{1}=0$ , we have the desired result (6). If $\mu_{1}\neq 0$ but $\mu_{2}=0$ then either $\lambda_{1}$ or $\lambda_{2}$ must vanish, by (8). If $\lambda_{1}=0$ , then $\lambda_{1}\geq\lambda_{2}$ implies that $\lambda_{2}=0$ and we have a contradiction to (7). Thus in this case we must have $\lambda_{2}=0$ , and then the desired inequality (6) follows from (7). It remains to deal with the case when $\lambda_{1}$ and $\lambda_{2}$ are both nonzero, which implies that $\mu_{1}\geq\mu_{2}>0$ . Suppose contrary to (6) that $\mu_{1}=\lambda_{1}+\delta_{1}$ for some $\delta_{1}>0$ ; then (7) implies that $\mu_{2}=\lambda_{2}-\delta_{2}$ for some $\delta_{2}\geq\delta_{1}>0$ . Then by (8),

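$$\lambda_{1}\lambda_{2}=\mu_{1}\mu_{2}=(\lambda_{1}+\delta_{1})(\lambda_{2}-\delta_{2})=\lambda_{1}\lambda_{2}-(\lambda_{1}\delta_{2}-\lambda_{2}\delta_{1})-\delta_{1}\delta_{2}<\lambda_{1}\lambda_{2}\qquad\text{since }\lambda_{1}\delta_{2}\geq\lambda_{2}\delta_{1}\text{ and }\delta_{1}\delta_{2}>0,$$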

which is the desired contradiction. (Alternatively, one can use (7) and (8) to prove, by induction and repeated squaring of both sides of (7), that $\mu_{1}^{2^{k}}+\mu_{2}^{2^{k}}\leq\lambda_{1}^{2^{k}}+\lambda_{2}^{2^{k}}$ for every $k\in\mathbb{N}$ . Taking $2^{k}$-th roots and letting $k\to\infty$ , the largest term dominates on each side, which shows $\mu_{1}\leq\lambda_{1}$ .) This verifies (6) and hence completes the proof of Theorem 1.
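The bookkeeping in this proof can be checked in exact integer arithmetic. The sketch below assumes NumPy. The helper names `word_product`, `aho_numerators` and `det2` are hypothetical, and the integer matrices $A,B$ are our own choice, small enough that every product stays exact. The sketch does four things:

- it lists the numerators $2m\,p_{i}$ and $2n\,q_{i}$ of the $2s-1$ exponent pairs produced by the cyclic rearrangement (one may pad with a zero pair if a list of length $2s$ is preferred);
- it confirms that $\operatorname{tr}(W_{m,n}(A,B)^{T}W_{m,n}(A,B))$ equals $\operatorname{tr}(C^{p_{1}}D^{q_{1}}\cdots)$ with $C=A^{2m}$, $D=B^{2n}$;
- it compares the traces, as in (7);
- it compares the determinants, as in (8).

```python
import numpy as np

def word_product(A, B, ex):
    """A^{m_1} B^{n_1} ... A^{m_s} B^{n_s} for ex = (m_1, n_1, ..., m_s, n_s)."""
    mp = np.linalg.matrix_power
    P = np.eye(A.shape[0], dtype=A.dtype)
    for j in range(0, len(ex), 2):
        P = P @ mp(A, ex[j]) @ mp(B, ex[j + 1])
    return P

def aho_numerators(ex):
    """Numerators 2m*p_i and 2n*q_i of the 2s - 1 exponent pairs (C = A^{2m}, D = B^{2n})."""
    ms, ns = list(ex[0::2]), list(ex[1::2])
    p_num = ms[:0:-1] + [2 * ms[0]] + ms[1:]     # m_s, ..., m_2, 2m_1, m_2, ..., m_s
    q_num = ns[-2::-1] + ns[:-1] + [2 * ns[-1]]  # n_{s-1}, ..., n_1, n_1, ..., n_{s-1}, 2n_s
    return p_num, q_num

def det2(M):
    """Exact determinant of a 2x2 integer matrix."""
    return M[0, 0] * M[1, 1] - M[0, 1] * M[1, 0]

mp = np.linalg.matrix_power
A = np.array([[2, 1], [1, 1]], dtype=np.int64)   # symmetric positive definite
B = np.array([[1, 0], [0, 3]], dtype=np.int64)   # symmetric positive definite
ex = (2, 1, 1, 2)                                # the word AABABB
m, n = sum(ex[0::2]), sum(ex[1::2])
W = word_product(A, B, ex)
WtW = W.T @ W
p_num, q_num = aho_numerators(ex)
P = np.eye(2, dtype=np.int64)
for a, b in zip(p_num, q_num):
    P = P @ mp(A, a) @ mp(B, b)                  # C^{p_i} D^{q_i} = A^{2m p_i} B^{2n q_i}
Bn = mp(B, n)
ref = Bn @ mp(A, 2 * m) @ Bn                     # B^n A^{2m} B^n
print(np.trace(WtW), np.trace(P), np.trace(ref)) # first two agree (cyclic identity); third bounds them (7)
print(det2(WtW), det2(ref))                      # equal, as in (8)
```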

3. Proof of Theorem 2

Let $A,B$ be fixed symmetric positive semidefinite $N\times N$ matrices, and assume that the tuple of exponents $(m_{1},n_{1},\ldots,m_{s},n_{s})$ is fixed, with $m=\sum m_{i}$ and $n=\sum n_{i}$ . Let $W_{m,n}(\mbox{Id}+\varepsilon A,\mbox{Id}+\varepsilon B)$ denote the corresponding word in terms of $\mbox{Id}+\varepsilon A,\mbox{Id}+\varepsilon B$ , analogous to (5). The proof idea can be summarized as follows. Let $X_{\varepsilon}$ denote $W_{m,n}(\mbox{Id}+\varepsilon A,\mbox{Id}+\varepsilon B)$ , and let $Z_{\varepsilon}$ denote $(\mbox{Id}+\varepsilon A)^{m}(\mbox{Id}+\varepsilon B)^{n}$ . We will choose a vector $v_{\varepsilon}$ with $\|v_{\varepsilon}\|=1$ that maximizes

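$$\|X_{\varepsilon}v_{\varepsilon}\|^{2}=\langle X_{\varepsilon}v_{\varepsilon},X_{\varepsilon}v_{\varepsilon}\rangle.$$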

Then as long as we can show that for this $v_{\varepsilon}$ we have

$$\|X_{\varepsilon}v_{\varepsilon}\|^{2}\leq\|Z_{\varepsilon}v_{\varepsilon}\|^{2},\tag{9}$$

we can conclude that

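$$\|X_{\varepsilon}\|=\|X_{\varepsilon}v_{\varepsilon}\|\leq\|Z_{\varepsilon}v_{\varepsilon}\|\leq\|Z_{\varepsilon}\|,\tag{10}$$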

thus proving Theorem 2.

By simply multiplying out $\|X_{\varepsilon}v_{\varepsilon}\|^{2}$ and $\|Z_{\varepsilon}v_{\varepsilon}\|^{2}$ , we will see that the leading order terms (in $\varepsilon$ ) come in both cases from a matrix of the form

$$\mbox{Id}+\varepsilon(mA+nB).$$

This motivates us to show that a significant proportion of $v_{\varepsilon}$ must lie in the eigenspace of the largest eigenvalue of the matrix $mA+nB$ (Lemma 1 below). This observation will suffice to examine terms up to second order in $\varepsilon$ in the desired inequality (9). Next, to treat the terms of third order and higher in $\varepsilon$ , we will use a second lemma (Lemma 2 below). It shows that if $\ker(AB-BA)=\{0\}$ , then for an eigenvector corresponding to the largest eigenvalue of $mA+nB$ , the third order terms provide a strict inequality. This allows us to neglect all higher order terms in $\varepsilon$ (as long as $\varepsilon$ is sufficiently small), which leads to the desired inequality (9).

3.1. Two Lemmata

Our first lemma states that a one-parameter family of matrices that is approximately given by the identity plus a small linear term $\varepsilon Y$ has the property that the eigenvector corresponding to its largest eigenvalue is necessarily very close to the leading eigenspace of the linear perturbation $Y$ . This statement is certainly not novel, but we provide its simple proof.

Lemma 1.

Let $X_{\varepsilon}=\mbox{Id}+\varepsilon Y+\mathcal{O}(\varepsilon^{2})$ , where $Y$ is a symmetric positive semidefinite matrix and $\varepsilon$ varies, giving a one-parameter family. For each $\varepsilon>0$ let $v_{\varepsilon}$ be a vector satisfying $\|v_{\varepsilon}\|=1$ and

$$\|X_{\varepsilon}v_{\varepsilon}\|^{2}=\max_{\|u\|=1}\|X_{\varepsilon}u\|^{2}.$$

Let $\pi$ be the orthogonal projection onto the eigenspace of the largest eigenvalue of $Y$ . Then there exists a constant $C_{1}=C_{1}(Y)>0$ and also $\varepsilon_{0}=\varepsilon_{0}(Y)>0$ such that for every $0<\varepsilon<\varepsilon_{0}$ ,

$$\|\pi v_{\varepsilon}\|\geq1-C_{1}\varepsilon.$$

Proof.

Let us simplify notation and write $v=\pi v_{\varepsilon}$ and $w=v_{\varepsilon}-v$ . Observe that they are orthogonal and thus

$$\|v\|^{2}+\|w\|^{2}=\|v_{\varepsilon}\|^{2}=1.$$

We have, expanding up to first order,

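$$\|X_{\varepsilon}v_{\varepsilon}\|^{2}=\|v_{\varepsilon}\|^{2}+2\varepsilon\langle Yv_{\varepsilon},v_{\varepsilon}\rangle+\mathcal{O}(\varepsilon^{2})=1+2\varepsilon\big(\langle Yv,v\rangle+2\langle Yv,w\rangle+\langle Yw,w\rangle\big)+\mathcal{O}(\varepsilon^{2}),$$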

in which the implicit constant depends on $Y$ . We will now see that several terms simplify. If $Y$ has only one eigenvalue, then the projection $\pi$ is merely the identity and the result follows. From now on we may suppose that $Y$ has at least two distinct eigenvalues and we use $\lambda_{1}$ to denote the largest eigenvalue of $Y$ and $\lambda_{2}<\lambda_{1}$ to denote the next largest. Then

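$$\langle Yv,v\rangle=\lambda_{1}\|v\|^{2},\qquad\langle Yv,w\rangle=\lambda_{1}\langle v,w\rangle=0,\qquad\langle Yw,w\rangle\leq\lambda_{2}\|w\|^{2}.$$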

Altogether we have

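$$\|X_{\varepsilon}v_{\varepsilon}\|^{2}\leq1+2\varepsilon\big(\lambda_{1}\|v\|^{2}+\lambda_{2}\|w\|^{2}\big)+\mathcal{O}(\varepsilon^{2}).\tag{11}$$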

We recall that $v_{\varepsilon}$ was chosen to maximize $\|X_{\varepsilon}u\|^{2}$ over all $\|u\|=1$ . In particular, if $u$ is an eigenvector of $Y$ for $\lambda_{1}$ with $\|u\|=1$ , then

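$$\|X_{\varepsilon}v_{\varepsilon}\|^{2}\geq\|X_{\varepsilon}u\|^{2}=1+2\varepsilon\lambda_{1}+\mathcal{O}(\varepsilon^{2}).$$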

Applying this in (11) shows that there is a constant $c>0$ (depending on the implicit constants in the $\mathcal{O}(\varepsilon^{2})$ terms, and hence on $Y$ ) such that as long as $\varepsilon$ is sufficiently small (again relative to the implicit constants in the $\mathcal{O}(\varepsilon^{2})$ terms),

$$\lambda_{1}\|v\|^{2}+\lambda_{2}\|w\|^{2}\geq\lambda_{1}-c\varepsilon.$$

Using $\|w\|^{2}=1-\|v\|^{2}$ , we obtain

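$$\lambda_{1}\|v\|^{2}+\lambda_{2}\big(1-\|v\|^{2}\big)\geq\lambda_{1}-c\varepsilon,\quad\text{that is,}\quad\|v\|^{2}\geq1-\frac{c\varepsilon}{\lambda_{1}-\lambda_{2}}=1-c^{\prime}\varepsilon,$$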

where $c^{\prime}=c/(\lambda_{1}-\lambda_{2})$ and therefore

$$\|v\|\geq1-C_{1}\varepsilon,$$

where $C_{1}=c^{\prime}(1-\varepsilon/4)\geq c^{\prime}/2$ for all $\varepsilon<\varepsilon_{0}(Y)$ sufficiently small, for a parameter $\varepsilon_{0}(Y)$ depending only on $c,\lambda_{1},\lambda_{2}$ , and hence only on $Y$ . ∎
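Lemma 1 can be illustrated on a concrete family. In this sketch (NumPy assumed; the matrices $A,B$ are our own choice) we take $X_{\varepsilon}=(\mbox{Id}+\varepsilon A)(\mbox{Id}+\varepsilon B)=\mbox{Id}+\varepsilon(A+B)+\varepsilon^{2}AB$. Here $Y=A+B$ has a simple largest eigenvalue. The sketch computes the maximizing unit vector $v_{\varepsilon}$ as a top eigenvector of $X_{\varepsilon}^{T}X_{\varepsilon}$ and reports the deficit $1-\|\pi v_{\varepsilon}\|^{2}=\|v_{\varepsilon}-\pi v_{\varepsilon}\|^{2}$. The helper name `deficit` is hypothetical.

```python
import numpy as np

A = np.diag([3.0, 1.0, 0.0])
B = np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])  # PSD, eigenvalues 2, 1, 0
Y = A + B                                   # first-order term of X_eps
top = np.linalg.eigh(Y)[1][:, -1]           # simple top eigenvalue, so pi = top top^T

def deficit(eps):
    """1 - ||pi v_eps||^2 for the unit vector v_eps maximizing ||X_eps u||."""
    I = np.eye(3)
    X = (I + eps * A) @ (I + eps * B)
    v = np.linalg.eigh(X.T @ X)[1][:, -1]   # top eigenvector of X^T X maximizes ||X u||
    return 1.0 - float(top @ v) ** 2        # squaring removes the eigenvector sign ambiguity

for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    print(eps, deficit(eps))
```

For this example, first-order eigenvector perturbation theory suggests that the deficit decays roughly like $\varepsilon^{2}$. That is stronger than the bound the lemma needs. The sketch only illustrates the statement and plays no role in its proof.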

Our second lemma states rearrangement inequalities for an eigenvector of the largest eigenvalue of $mA+nB$ (motivated by Lemma 1). The argument is again elementary but the statement itself is so specific that it is presumably new.

Lemma 2.

Let $A,B$ be symmetric and positive semidefinite square matrices such that $\ker(AB-BA)=\{0\}$ . Fix $m,n\in\mathbb{N}$ and let $\lambda_{1}$ denote the largest eigenvalue of $mA+nB$ . Then there exists a constant $C_{2}=C_{2}(n,m,A,B)>0$ such that for all vectors $v$ with $\|v\|=1$ satisfying

$$(mA+nB)v=\lambda_{1}v,\tag{12}$$

we have the inequalities

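$$\langle AABv,v\rangle-\langle ABAv,v\rangle\geq C_{2}\qquad\text{and}\qquad\langle ABBv,v\rangle-\langle BABv,v\rangle\geq C_{2}.$$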

Proof.

We start by showing the first inequality. We claim

$$mA+nB\leq\lambda_{1}\,\mbox{Id}\tag{13}$$

in the sense that

$$\lambda_{1}\,\mbox{Id}-(mA+nB)$$

is positive semidefinite. Indeed, we have that

$$\langle(mA+nB)x,x\rangle\leq\|mA+nB\|\,\|x\|^{2}=\lambda_{1}\|x\|^{2}\quad\text{for all }x,$$

with equality if and only if $x$ is an eigenvector of $mA+nB$ corresponding to eigenvalue $\lambda_{1}$ . This holds since $mA+nB$ is symmetric and positive semidefinite and its operator norm thus coincides with its largest eigenvalue. We now suppose $v$ with $\|v\|=1$ satisfies (12). Solving for $Bv$ in $(mA+nB)v=\lambda_{1}v$ , we can rewrite

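$$\langle AABv,v\rangle=\langle Bv,AAv\rangle=\frac{1}{n}\langle\lambda_{1}v-mAv,AAv\rangle=\frac{\lambda_{1}}{n}\langle Av,Av\rangle-\frac{m}{n}\langle AAAv,v\rangle.\tag{14}$$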

We now need to compare this to $\left\langle ABAv,v\right\rangle$ which we can rewrite as

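$$\langle ABAv,v\rangle=\langle BAv,Av\rangle=\frac{1}{n}\langle(mA+nB)Av,Av\rangle-\frac{m}{n}\langle AAAv,v\rangle;$$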

subtracting this from (14) we see by (13) that

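$$\langle AABv,v\rangle-\langle ABAv,v\rangle=\frac{1}{n}\Big(\lambda_{1}\|Av\|^{2}-\langle(mA+nB)Av,Av\rangle\Big)\geq0.\tag{15}$$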

Now we aim to show that this inequality is strict if $v$ satisfies (12). From our previous observation about (13), we know that equality holds in this last inequality precisely when $Av$ is an eigenvector of $mA+nB$ corresponding to eigenvalue $\lambda_{1}$ . Suppose this is true. Then

$$(mA+nB)Av=\lambda_{1}Av,$$

while on the other hand, multiplying our assumption (12) by $A$ on the left shows that

$$A(mA+nB)v=\lambda_{1}Av.$$

Subtracting these two identities gives $n(BAv-ABv)=0$ , that is, $ABv=BAv$ . So the unit vector $v$ lies in $\ker(AB-BA)$ , violating our assumption $\ker(AB-BA)=\{0\}$ . We conclude that $Av$ cannot be an eigenvector for $mA+nB$ corresponding to $\lambda_{1}$ , and hence the inequality in (15) is strict, for any $\|v\|=1$ satisfying (12). The left-hand side of (15) depends continuously on $v$ , and the set of unit vectors satisfying (12) is compact (a closed subset of the unit sphere). Hence there exists a constant $C_{2}=C_{2}(n,m,A,B)>0$ such that

$$\langle AABv,v\rangle-\langle ABAv,v\rangle\geq C_{2},$$

concluding the proof of the first claim. As for the second inequality, we relabel $A$ and $B$ , obtain from the first case that

$$\langle BBAv,v\rangle-\langle BABv,v\rangle\geq C_{2},$$

and note that $\left\langle BBAv,v\right\rangle=\left\langle Av,BBv\right\rangle=\left\langle ABBv,v\right\rangle$ .

∎
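Identity (15) and the two inequalities of Lemma 2 can be verified numerically for a top eigenvector of $mA+nB$. The sketch below assumes NumPy and uses random matrices; the choices $N=4$, $m=3$, $n=2$ are ours. It computes both sides of (15), together with the analogous expression obtained by swapping $A\leftrightarrow B$ and $m\leftrightarrow n$. It checks only nonnegativity. The strict lower bound $C_{2}>0$ additionally relies on the assumption on $\ker(AB-BA)$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, n = 4, 3, 2
G, H = rng.standard_normal((2, N, N))
A, B = G @ G.T, H @ H.T                 # random symmetric PSD matrices
M = m * A + n * B
evals, evecs = np.linalg.eigh(M)        # eigenvalues in ascending order
lam1, v = evals[-1], evecs[:, -1]       # (12): M v = lambda_1 v with ||v|| = 1

def q(X):
    """Quadratic form <X v, v> for the fixed eigenvector v."""
    return float(v @ X @ v)

Av, Bv = A @ v, B @ v
gap_first = q(A @ A @ B) - q(A @ B @ A)
rhs_first = (lam1 * Av @ Av - Av @ M @ Av) / n     # right-hand side of (15)
gap_second = q(A @ B @ B) - q(B @ A @ B)
rhs_second = (lam1 * Bv @ Bv - Bv @ M @ Bv) / m    # (15) with A, B and m, n swapped
print(gap_first, rhs_first)
print(gap_second, rhs_second)
```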

3.2. Conclusion of the proof of Theorem 2

We are now ready to prove Theorem 2. We recall from the beginning of §3 that we consider a particular word $X_{\varepsilon}=W_{m,n}(\mbox{Id}+\varepsilon A,\mbox{Id}+\varepsilon B)$ , with sequences of exponents $(m_{1},n_{1},\ldots,m_{s},n_{s})$ and $m=\sum_{i}m_{i}$ , $n=\sum_{i}n_{i}$ . We let $Z_{\varepsilon}$ denote $(\mbox{Id}+\varepsilon A)^{m}(\mbox{Id}+\varepsilon B)^{n}$ . We choose a vector $v_{\varepsilon}$ with $\|v_{\varepsilon}\|=1$ that maximizes $\|X_{\varepsilon}v_{\varepsilon}\|^{2}=\langle X_{\varepsilon}v_{\varepsilon},X_{\varepsilon}v_{\varepsilon}\rangle,$ and it suffices to show that $\|X_{\varepsilon}v_{\varepsilon}\|^{2}\leq\|Z_{\varepsilon}v_{\varepsilon}\|^{2}$ , as explained in (10). We will expand the unordered product $\|X_{\varepsilon}v_{\varepsilon}\|^{2}$ and the ordered product $\|Z_{\varepsilon}v_{\varepsilon}\|^{2}$ up to third order in $\varepsilon$ . Lemma 1 will restrict the types of vectors we have to study, and Lemma 2 will give us a strict inequality in the third order terms. The desired inequality will follow from that.

Precisely, in the above setting we will prove that there exist positive constants $C_{1},C_{2},\varepsilon_{0}$ depending on $A,B,m,n$ such that for all $\varepsilon<\varepsilon_{0}$ ,

[TABLE]

Here the implicit constant depends on $A,B,m,n$ . Consequently, for all sufficiently small $\varepsilon$ , the left-hand side is in fact strictly positive, and Theorem 2 follows.

A simple expansion shows that

[TABLE]

where

[TABLE]

and, for combinatorial coefficients $a_{i}\in\mathbb{N}$ depending only on the sequences of exponents $m_{i}$ and $n_{i}$ ,

[TABLE]

and

[TABLE]

Thus an expansion up to third order shows that for any $u$ ,

[TABLE]

and

[TABLE]

We now use $v_{\varepsilon}$ to denote the vector maximizing $\|X_{\varepsilon}u\|$ among all $\|u\|=1$ , and we aim to show the inequality (16) for

$$\|Z_{\varepsilon}v_{\varepsilon}\|^{2}-\|X_{\varepsilon}v_{\varepsilon}\|^{2}\tag{19}$$

using the above expansions (17) and (18).

We will first see that terms in this difference that are at most second order in $\varepsilon$ cancel exactly, in fact for any $u$ . Indeed, the term of order 0 in $\varepsilon$ , that is $\|u\|^{2}$ , cancels and, since $X_{1}=Z_{1}$ , so does the term of order $\varepsilon$ . Next, for the second order terms, for any vector $u$ ,

[TABLE]

The other terms of second order, $\left\langle Z_{1}u,Z_{1}u\right\rangle$ and $\left\langle X_{1}u,X_{1}u\right\rangle$ again coincide trivially (and hence cancel in the difference) because $X_{1}=mA+nB=Z_{1}$ . This shows that for any $u$ , the terms of at most second order (with respect to $\varepsilon$ ) cancel in the difference (19).
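The cancellation of all terms of order at most two can be confirmed by expanding both products as polynomials in $\varepsilon$. The sketch below assumes NumPy; `poly_mul`, `word_poly` and `norm_sq_coeffs` are hypothetical helper names. It represents a matrix polynomial by its list of coefficient matrices, truncated at degree $3$. It then compares the coefficients of $\|X_{\varepsilon}u\|^{2}$ and $\|Z_{\varepsilon}u\|^{2}$ for a random vector $u$ and the word $AABABB$ (so $m=n=3$), with $Z_{\varepsilon}=(\mbox{Id}+\varepsilon A)^{3}(\mbox{Id}+\varepsilon B)^{3}$.

```python
import numpy as np

def poly_mul(P, Q, deg=3):
    """Product of matrix polynomials given as coefficient lists [C_0, C_1, ...], truncated at deg."""
    out = [np.zeros_like(P[0]) for _ in range(deg + 1)]
    for i, Pi in enumerate(P):
        for j, Qj in enumerate(Q):
            if i + j <= deg:
                out[i + j] = out[i + j] + Pi @ Qj
    return out

def word_poly(A, B, ex, deg=3):
    """Coefficients of W(Id + eps*A, Id + eps*B) in powers of eps, up to eps^deg."""
    I = np.eye(A.shape[0])
    P = [I]
    for j, e in enumerate(ex):
        F = [I, A] if j % 2 == 0 else [I, B]   # Id + eps*A or Id + eps*B
        for _ in range(e):
            P = poly_mul(P, F, deg)
    return P

def norm_sq_coeffs(P, u, deg=3):
    """Coefficients of ||X_eps u||^2 = <X_eps^T X_eps u, u> up to eps^deg."""
    G = poly_mul([C.T for C in P], P, deg)
    return np.array([u @ C @ u for C in G])

rng = np.random.default_rng(0)
N = 3
G1, G2 = rng.standard_normal((2, N, N))
A, B = G1 @ G1.T, G2 @ G2.T
u = rng.standard_normal(N)
cx = norm_sq_coeffs(word_poly(A, B, (2, 1, 1, 2)), u)   # unordered word AABABB
cz = norm_sq_coeffs(word_poly(A, B, (3, 3)), u)         # ordered product
print(cz - cx)   # orders 0, 1, 2 vanish up to round-off
```

The coefficients of orders $0$, $1$ and $2$ agree up to round-off for any $u$, while the order-$3$ coefficient generally differs. Its sign at the maximizing vector is what Lemmas 1 and 2 control.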

We now analyze the third order terms in the difference (19), which include terms of two types, namely

[TABLE]

and

[TABLE]

For the first type of term, we can use the fact that $\langle AABu,u\rangle=\langle ABu,Au\rangle=\langle Au,ABu\rangle=\langle BAAu,u\rangle$ to see the terms corresponding to $a_{8}$ vanish, and similarly for $a_{11}$ , so that

[TABLE]

Altogether, we obtain that for any $u$ , the third order contributions of the difference (19) are given by

[TABLE]

Now we specialize to considering $u=v_{\varepsilon}$ with $\|v_{\varepsilon}\|=1$ that maximizes $\|X_{\varepsilon}v_{\varepsilon}\|^{2}$ . We apply Lemma 1 to conclude that there is an $\varepsilon_{0}=\varepsilon_{0}(mA+nB)>0$ and a constant $C_{1}=C_{1}(mA+nB)>0$ such that for every $\varepsilon<\varepsilon_{0}$ , we can write

$$v_{\varepsilon}=v+w,$$

where $v$ is the projection of $v_{\varepsilon}$ onto the eigenspace corresponding to the largest eigenvalue of $mA+nB$ . The vectors $v,w$ have the following properties: $\|v\|\geq 1-C_{1}\varepsilon$ , and $v$ is orthogonal to $w$ , so that $\|w\|^{2}=1-\|v\|^{2}\leq 2C_{1}\varepsilon$ . (We note that both $v$ and $w$ also depend on $\varepsilon$ but suppress this for simplicity of notation.) In (20) we see that

[TABLE]

since the first term vanishes; thus this type of term contributes

[TABLE]

to (22). A similar expansion for the other terms (21) shows that

[TABLE]

with an implicit constant depending on $A,B,m,n$ . Now that we have restricted to an inner product involving only $v$ , we apply (21) for the vector $v$ and note that Lemma 2 implies that

[TABLE]

with the constant $C_{2}$ provided by the lemma. This is strictly positive for all $\varepsilon$ sufficiently small with respect to $C_{1}$ . To conclude, we have proved (16), and this completes the proof of Theorem 2.

Bibliography


[1] W. Albar, M. Junge and M. Zhao, On the symmetrized arithmetic-geometric mean inequality for operators. arXiv:1803.02435.
[2] T. Ando, F. Hiai and K. Okubo, Trace inequalities for multiple products of two matrices. Math. Inequal. Appl. 3 (2000), no. 3, 307–318.
[3] E. Andruchow, G. Corach and D. Stojanoff, Geometrical significance of Löwner-Heinz inequality. Proc. Amer. Math. Soc. 128 (2000), no. 4, 1031–1037.
[4] R. Bhatia, Matrix Analysis. Graduate Texts in Mathematics, 169. Springer-Verlag, New York, 1997.
[5] R. Bhatia, Positive Definite Matrices. Princeton Series in Applied Mathematics. Princeton University Press, Princeton, NJ, 2007.
[6] G. Corach, H. Porta and L. Recht, An operator inequality. Linear Algebra Appl. 142 (1990), 153–158.
[7] H. Cordes, A matrix inequality. Proc. Amer. Math. Soc. 11 (1960), 206–210.
[8] H. O. Cordes, Spectral Theory of Linear Differential Operators and Comparison Algebras. London Mathematical Society Lecture Note Series, vol. 76, Cambridge University Press, Cambridge, 1987.
