Matrix scaling and explicit doubly stochastic limits

Melvyn B. Nathanson

arXiv:1905.09426·math.RA·October 1, 2019


Open Access

TL;DR

This paper derives exact formulas for the Sinkhorn limits of specific symmetric positive 3x3 matrices, enhancing understanding of matrix scaling convergence to doubly stochastic matrices.

Contribution

It provides explicit formulas for Sinkhorn limits of certain symmetric 3x3 matrices, a novel contribution to matrix scaling theory.

Findings

01. Exact formulas for Sinkhorn limits of specific matrices

02. Enhanced understanding of convergence in matrix scaling

03. Explicit characterization of symmetric 3x3 cases

Abstract

The process of alternately row scaling and column scaling a positive $n \times n$ matrix $A$ converges to a doubly stochastic positive $n \times n$ matrix $S(A)$, often called the Sinkhorn limit of $A$. The main result in this paper is the computation of exact formulae for the Sinkhorn limits of certain symmetric positive $3 \times 3$ matrices.

Equations

$$\operatorname{rowsum}_{i}(A)=\sum_{j=1}^{n}a_{i,j}.$$

$$\operatorname{colsum}_{j}(A)=\sum_{i=1}^{n}a_{i,j}.$$

$$XAY=\begin{pmatrix}
x_{1}a_{1,1}y_{1} & x_{1}a_{1,2}y_{2} & x_{1}a_{1,3}y_{3} & \cdots & x_{1}a_{1,n}y_{n}\\
x_{2}a_{2,1}y_{1} & x_{2}a_{2,2}y_{2} & x_{2}a_{2,3}y_{3} & \cdots & x_{2}a_{2,n}y_{n}\\
\vdots & & & & \vdots\\
x_{n}a_{n,1}y_{1} & x_{n}a_{n,2}y_{2} & x_{n}a_{n,3}y_{3} & \cdots & x_{n}a_{n,n}y_{n}
\end{pmatrix}.$$

$$X(A)=\operatorname{diag}\left(\frac{1}{\operatorname{rowsum}_{1}(A)},\ldots,\frac{1}{\operatorname{rowsum}_{n}(A)}\right)$$

$$\mathcal{R}(A)=X(A)A.$$

$$\mathcal{R}(A)_{i,j}=\frac{a_{i,j}}{\operatorname{rowsum}_{i}(A)}$$

$$\operatorname{rowsum}_{i}(\mathcal{R}(A))=\sum_{j=1}^{n}\mathcal{R}(A)_{i,j}=\sum_{j=1}^{n}\frac{a_{i,j}}{\operatorname{rowsum}_{i}(A)}=\frac{\operatorname{rowsum}_{i}(A)}{\operatorname{rowsum}_{i}(A)}=1$$

$$Y(A)=\operatorname{diag}\left(\frac{1}{\operatorname{colsum}_{1}(A)},\ldots,\frac{1}{\operatorname{colsum}_{n}(A)}\right)$$

$$\mathcal{C}(A)=AY(A),$$

$$\mathcal{C}(A)_{i,j}=\frac{a_{i,j}}{\operatorname{colsum}_{j}(A)}$$

$$\operatorname{colsum}_{j}(\mathcal{C}(A))=\sum_{i=1}^{n}\mathcal{C}(A)_{i,j}=\sum_{i=1}^{n}\frac{a_{i,j}}{\operatorname{colsum}_{j}(A)}=\frac{\operatorname{colsum}_{j}(A)}{\operatorname{colsum}_{j}(A)}=1$$

$$A_{0}=A.$$

$$X_{\ell}=X(A_{\ell})$$

$$A^{\prime}_{\ell}=\mathcal{R}(A_{\ell})=X_{\ell}A_{\ell}.$$

$$Y_{\ell}=Y(A^{\prime}_{\ell})$$

$$A_{\ell+1}=\mathcal{C}(A^{\prime}_{\ell})=A^{\prime}_{\ell}Y_{\ell}.$$

$$S(A)=\lim_{\ell\to\infty}A_{\ell}=\lim_{\ell\to\infty}A^{\prime}_{\ell}.$$

$$B=\lambda PAQ.$$

$$S(B)=PS(A)Q.$$

$$A_{1}=\begin{pmatrix}K&1&1\\ 1&K&1\\ 1&1&K\end{pmatrix},\qquad S(A_{1})=\begin{pmatrix}a&b&b\\ b&a&b\\ b&b&a\end{pmatrix}$$

$$A_{2}=\begin{pmatrix}K&1&1\\ 1&1&1\\ 1&1&1\end{pmatrix},\qquad S(A_{2})=\begin{pmatrix}a&b&b\\ b&c&c\\ b&c&c\end{pmatrix}$$

$$A_{3}=\begin{pmatrix}1&1&1\\ 1&K&K\\ 1&K&K\end{pmatrix},\qquad S(A_{3})=\begin{pmatrix}a&b&b\\ b&c&c\\ b&c&c\end{pmatrix}$$

$$A_{4}=\begin{pmatrix}1&K&K\\ K&1&1\\ K&1&1\end{pmatrix},\qquad S(A_{4})=\begin{pmatrix}a&b&b\\ b&c&c\\ b&c&c\end{pmatrix}$$

$$A_{5}=\begin{pmatrix}K&1&1\\ 1&K&1\\ 1&1&1\end{pmatrix},\qquad S(A_{5})=\begin{pmatrix}a&b&c\\ b&a&c\\ c&c&d\end{pmatrix}$$

$$A_{6}=\begin{pmatrix}K&K&1\\ K&1&1\\ 1&1&1\end{pmatrix},\qquad S(A_{6})=\begin{pmatrix}a&b&c\\ b&c&a\\ c&a&b\end{pmatrix}$$

$$A_{7}=\begin{pmatrix}K&K&1\\ K&1&1\\ 1&1&K\end{pmatrix},\qquad S(A_{7})=\begin{pmatrix}a&b&c\\ b&d&e\\ c&e&f\end{pmatrix}$$

$$n=k+\ell.$$

$$A=\begin{pmatrix}
M & M & \cdots & M & B & B & \cdots & B\\
M & M & \cdots & M & B & B & \cdots & B\\
\vdots & & & \vdots & \vdots & & & \vdots\\
M & M & \cdots & M & B & B & \cdots & B\\
B & B & \cdots & B & N & N & \cdots & N\\
B & B & \cdots & B & N & N & \cdots & N\\
\vdots & & & \vdots & \vdots & & & \vdots\\
B & B & \cdots & B & N & N & \cdots & N
\end{pmatrix}$$

$$(\underbrace{M,M,\ldots,M}_{k},\underbrace{B,B,\ldots,B}_{\ell})$$

$$(\underbrace{B,B,\ldots,B}_{k},\underbrace{N,N,\ldots,N}_{\ell}).$$


Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Random Matrices and Applications · Point processes and geometric inequalities

Full text

Matrix scaling and explicit doubly stochastic limits

Melvyn B. Nathanson

Department of Mathematics

Lehman College (CUNY)

Bronx, NY 10468

[email protected]

Abstract.

The process of alternately row scaling and column scaling a positive $n\times n$ matrix $A$ converges to a doubly stochastic positive $n\times n$ matrix $S(A)$ , often called the Sinkhorn limit of $A$ . The main result in this paper is the computation of exact formulae for the Sinkhorn limits of certain symmetric positive $3\times 3$ matrices.

Key words and phrases:

Matrix scaling, iterative scaling, Sinkhorn limits, Gröbner bases.

2010 Mathematics Subject Classification:

11C20, 11B75, 11J68, 11J70.

Supported in part by a grant from the PSC-CUNY Research Award Program.

1. Doubly stochastic matrices and scaling

Let $A=(a_{i,j})$ be an $n\times n$ matrix. For $i\in\{1,\ldots,n\}$, the $i$th row sum of $A$ is

$$\operatorname{rowsum}_{i}(A)=\sum_{j=1}^{n}a_{i,j}.$$

For $j\in\{1,\ldots,n\}$, the $j$th column sum of $A$ is

$$\operatorname{colsum}_{j}(A)=\sum_{i=1}^{n}a_{i,j}.$$

The matrix $A=(a_{i,j})$ is positive if $a_{i,j}>0$ for all $i$ and $j$ , and nonnegative if $a_{i,j}\geq 0$ for all $i$ and $j$ . The matrix $A=(a_{i,j})$ is row stochastic if $A$ is nonnegative and $\operatorname{\text{rowsum}}_{i}(A)=1$ for all $i\in\{1,\ldots,n\}$ . The matrix $A$ is column stochastic if $A$ is nonnegative and $\operatorname{\text{colsum}}_{j}(A)=1$ for all $j\in\{1,\ldots,n\}$ . The matrix $A$ is doubly stochastic if it is both row and column stochastic.

Let $\operatorname{\text{diag}}(x_{1},\ldots,x_{n})$ denote the $n\times n$ diagonal matrix whose $(i,i)$ th coordinate is $x_{i}$ for all $i\in\{1,2,\ldots,n\}$ . The matrix $\operatorname{\text{diag}}(x_{1},x_{2},\ldots,x_{n})$ is positive diagonal if $x_{i}>0$ for all $i$ .

Let $A=(a_{i,j})$ be an $n\times n$ matrix. The process of multiplying the rows of $A$ by scalars, or, equivalently, multiplying $A$ on the left by a diagonal matrix $X$ , is called row-scaling, and $X$ is called a row-scaling matrix.

The process of multiplying the columns of $A$ by scalars, or, equivalently, multiplying $A$ on the right by a diagonal matrix $Y$ , is called column-scaling, and $Y$ is called a column-scaling matrix.

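If $X=\operatorname{\text{diag}}(x_{1},x_{2},\ldots,x_{n})$ and $Y=\operatorname{\text{diag}}(y_{1},y_{2},\ldots,y_{n})$, then

$$XAY=\begin{pmatrix}
x_{1}a_{1,1}y_{1} & x_{1}a_{1,2}y_{2} & x_{1}a_{1,3}y_{3} & \cdots & x_{1}a_{1,n}y_{n}\\
x_{2}a_{2,1}y_{1} & x_{2}a_{2,2}y_{2} & x_{2}a_{2,3}y_{3} & \cdots & x_{2}a_{2,n}y_{n}\\
\vdots & & & & \vdots\\
x_{n}a_{n,1}y_{1} & x_{n}a_{n,2}y_{2} & x_{n}a_{n,3}y_{3} & \cdots & x_{n}a_{n,n}y_{n}
\end{pmatrix}.$$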

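Let $A=(a_{i,j})$ be an $n\times n$ matrix with positive row sums, that is, $\operatorname{\text{rowsum}}_{i}(A)>0$ for all $i\in\{1,\ldots,n\}$. Let

$$X(A)=\operatorname{diag}\left(\frac{1}{\operatorname{rowsum}_{1}(A)},\ldots,\frac{1}{\operatorname{rowsum}_{n}(A)}\right)\tag{1}$$

and let

$$\mathcal{R}(A)=X(A)A.$$

We have

$$\mathcal{R}(A)_{i,j}=\frac{a_{i,j}}{\operatorname{rowsum}_{i}(A)}$$

and so

$$\operatorname{rowsum}_{i}(\mathcal{R}(A))=\sum_{j=1}^{n}\mathcal{R}(A)_{i,j}=\sum_{j=1}^{n}\frac{a_{i,j}}{\operatorname{rowsum}_{i}(A)}=\frac{\operatorname{rowsum}_{i}(A)}{\operatorname{rowsum}_{i}(A)}=1$$

for all $i\in\{1,\ldots,n\}$. Therefore, $\mathcal{R}(A)$ is a row stochastic matrix.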

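Similarly, if $A=(a_{i,j})$ is an $n\times n$ matrix with positive column sums and if

$$Y(A)=\operatorname{diag}\left(\frac{1}{\operatorname{colsum}_{1}(A)},\ldots,\frac{1}{\operatorname{colsum}_{n}(A)}\right)\tag{2}$$

and

$$\mathcal{C}(A)=AY(A),$$

then

$$\mathcal{C}(A)_{i,j}=\frac{a_{i,j}}{\operatorname{colsum}_{j}(A)}$$

and

$$\operatorname{colsum}_{j}(\mathcal{C}(A))=\sum_{i=1}^{n}\mathcal{C}(A)_{i,j}=\sum_{i=1}^{n}\frac{a_{i,j}}{\operatorname{colsum}_{j}(A)}=\frac{\operatorname{colsum}_{j}(A)}{\operatorname{colsum}_{j}(A)}=1$$

for all $j\in\{1,\ldots,n\}$. Therefore, $\mathcal{C}(A)$ is a column stochastic matrix.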
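The two normalizations above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names `row_scale` and `col_scale` and the example matrix are our own choices, matrices are plain lists of rows, and `fractions.Fraction` is used so that the row and column sums come out exactly equal to 1.

```python
from fractions import Fraction

def row_scale(A):
    """Return R(A): divide every entry by the sum of its row."""
    return [[a / sum(row) for a in row] for row in A]

def col_scale(A):
    """Return C(A): divide every entry by the sum of its column."""
    col_sums = [sum(col) for col in zip(*A)]
    return [[a / s for a, s in zip(row, col_sums)] for row in A]

# Hypothetical example matrix with exact rational entries.
A = [[Fraction(1), Fraction(2)],
     [Fraction(3), Fraction(4)]]

R = row_scale(A)   # row stochastic
C = col_scale(A)   # column stochastic

print([sum(row) for row in R])        # row sums of R(A)
print([sum(col) for col in zip(*C)])  # column sums of C(A)
```

Note that `R` is in general not column stochastic and `C` is not row stochastic, which is why the two operations have to be alternated.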

The following two theorems were stated by Sinkhorn [20], and subsequently proved by Brualdi, Parter, and Schneider [2], Djoković [3], Knopp-Sinkhorn [21], Menon [17], Letac [15], and Tverberg [22].

Theorem 1.

Let $A=(a_{i,j})$ be a positive $n\times n$ matrix.

(i) There exist positive diagonal $n\times n$ matrices $X$ and $Y$ such that $XAY$ is doubly stochastic.

(ii) If $X$, $X^{\prime}$, $Y$, and $Y^{\prime}$ are positive diagonal $n\times n$ matrices such that both $XAY$ and $X^{\prime}AY^{\prime}$ are doubly stochastic, then $XAY=X^{\prime}AY^{\prime}$ and there exists $\lambda>0$ such that $X^{\prime}=\lambda X$ and $Y^{\prime}=\lambda^{-1}Y$.

(iii) Let $A$ be a positive symmetric $n\times n$ matrix. There exists a unique positive diagonal matrix $X$ such that $XAX$ is doubly stochastic.

The unique doubly stochastic matrix $XAY$ in Theorem 1 is called the Sinkhorn limit of A, and denoted $S(A)$ .

Theorem 2.

Let $A$ be a positive $n\times n$ matrix, and let $S(A)$ be the Sinkhorn limit of $A$. Construct sequences of positive matrices $(A_{\ell})_{\ell=0}^{\infty}$ and $(A^{\prime}_{\ell})_{\ell=0}^{\infty}$ and sequences of positive diagonal matrices $(X_{\ell})_{\ell=0}^{\infty}$ and $(Y_{\ell})_{\ell=0}^{\infty}$ as follows: Let

$$A_{0}=A.$$

Given the matrix $A_{\ell}$, let

$$X_{\ell}=X(A_{\ell})$$

be the row-scaling matrix of $A_{\ell}$ defined by (1). The matrix

$$A^{\prime}_{\ell}=\mathcal{R}(A_{\ell})=X_{\ell}A_{\ell}$$

is row stochastic. Let

$$Y_{\ell}=Y(A^{\prime}_{\ell})$$

be the column-scaling matrix of $A^{\prime}_{\ell}$ defined by (2), and let

$$A_{\ell+1}=\mathcal{C}(A^{\prime}_{\ell})=A^{\prime}_{\ell}Y_{\ell}.$$

The matrix $A_{\ell+1}$ is column stochastic.

The Sinkhorn limit is obtained by alternately row-scaling and column-scaling:

$$S(A)=\lim_{\ell\to\infty}A_{\ell}=\lim_{\ell\to\infty}A^{\prime}_{\ell}.$$
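The alternate scaling of Theorem 2 can be sketched as a short loop. This is a floating-point illustration under our own naming: `sinkhorn`, the parameter `steps`, and the choice of a fixed iteration count instead of a convergence test are assumptions of the sketch, not part of the paper. The function returns the matrix $A_{\ell}$ with $\ell$ equal to `steps`.

```python
def row_scale(A):
    """Row scaling R(A): divide each entry by the sum of its row."""
    return [[a / sum(row) for a in row] for row in A]

def col_scale(A):
    """Column scaling C(A): divide each entry by the sum of its column."""
    col_sums = [sum(col) for col in zip(*A)]
    return [[a / s for a, s in zip(row, col_sums)] for row in A]

def sinkhorn(A, steps=100):
    """Return A_steps, obtained by `steps` rounds of row then column scaling."""
    for _ in range(steps):
        A = col_scale(row_scale(A))
    return A

# Example input: a positive 3x3 matrix (entries chosen for illustration).
A = [[2.0, 1.0, 1.0],
     [1.0, 1.0, 1.0],
     [1.0, 1.0, 1.0]]
S = sinkhorn(A)

# How far the result is from being doubly stochastic.
row_err = max(abs(sum(row) - 1.0) for row in S)
col_err = max(abs(sum(col) - 1.0) for col in zip(*S))
print(row_err, col_err)
```

The last operation is a column scaling, so the column sums are 1 up to rounding; the row sums are only approximately 1, and their deviation measures how close the iterate is to the limit.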

It is an open problem to compute explicitly the Sinkhorn limit of a positive $n\times n$ matrix. This is known for $2\times 2$ matrices (Nathanson [18]). The goal of this paper is the explicit computation of Sinkhorn limits for certain $3\times 3$ matrices.

2. Sinkhorn limits of $3\times 3$ symmetric matrices and their doubly stochastic shapes

Let $A$ and $B$ be positive $n\times n$ matrices. We write $A\sim B$ if there exist $n\times n$ permutation matrices $P$ and $Q$ and $\lambda>0$ such that

$$B=\lambda PAQ.$$

This is an equivalence relation. Moreover, $A\sim B$ implies

$$S(B)=PS(A)Q,$$

because the scalar $\lambda$ is cancelled by the first row scaling, and permuting rows and columns commutes with row and column scaling. Thus, it suffices to determine the Sinkhorn limit of only one matrix in an equivalence class.
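The behaviour of the limit under permutation and dilation can be checked numerically. The sketch below is ours, not the paper's: `permute_and_dilate` is a hypothetical helper that applies index lists `p` and `q` in place of the permutation matrices $P$ and $Q$, and the test matrix is arbitrary. It compares the limit of $\lambda PAQ$ with the permuted limit of $A$; the factor $\lambda$ does not appear in the comparison because a doubly stochastic matrix must have row sums equal to 1.

```python
def row_scale(A):
    return [[a / sum(row) for a in row] for row in A]

def col_scale(A):
    col_sums = [sum(col) for col in zip(*A)]
    return [[a / s for a, s in zip(row, col_sums)] for row in A]

def sinkhorn(A, steps=200):
    """`steps` rounds of row scaling followed by column scaling."""
    for _ in range(steps):
        A = col_scale(row_scale(A))
    return A

def permute_and_dilate(A, p, q, lam):
    """Return lam * P A Q, where row i of the result is row p[i] of A
    and column j of the result is column q[j] of A."""
    return [[lam * A[i][j] for j in q] for i in p]

# Arbitrary positive matrix, permutations and dilation factor.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 10.0]]
p, q, lam = [2, 0, 1], [1, 2, 0], 5.0

B = permute_and_dilate(A, p, q, lam)
SA = sinkhorn(A)
SB = sinkhorn(B)

# P S(A) Q, with no factor lam.
expected = permute_and_dilate(SA, p, q, 1.0)
max_diff = max(abs(SB[i][j] - expected[i][j])
               for i in range(3) for j in range(3))
print(max_diff)
```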

We shall compute the Sinkhorn limit of every symmetric positive $3\times 3$ matrix whose set of coordinates consists of two distinct real numbers.

Let $A$ be such a matrix with coordinates $M$ and $N$ with $M\neq N$ . There are 9 coordinate positions in the matrix, and so exactly one of the numbers $M$ and $N$ occurs at least five times. Suppose that the coordinate $M$ occurs five or more times. Let $\lambda=1/M$ and $K=N/M$ . The matrix $\lambda A$ has two distinct positive coordinates $1$ and $K$ , and $K$ occurs at most four times. There are seven equivalence classes of such matrices with respect to permutations and dilations. The main result of this paper is the calculation of the Sinkhorn limits of these matrices.

Theorem 3.

Let $K>0$ and $K\neq 1$ . The matrices $A_{1},\ldots,A_{7}$ below are a complete set of representatives of the seven equivalence classes of symmetric $3\times 3$ matrices with coordinates 1 and $K$ . The matrix $S(A_{i})$ gives the shape of the Sinkhorn limit of $A_{i}$ for $i=1,\ldots,7$ . The coordinates of the Sinkhorn limits as explicit functions of 1 and $K$ are computed in Sections 4–8.

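(1)

$$A_{1}=\begin{pmatrix}K&1&1\\ 1&K&1\\ 1&1&K\end{pmatrix},\qquad S(A_{1})=\begin{pmatrix}a&b&b\\ b&a&b\\ b&b&a\end{pmatrix}$$

(2)

$$A_{2}=\begin{pmatrix}K&1&1\\ 1&1&1\\ 1&1&1\end{pmatrix},\qquad S(A_{2})=\begin{pmatrix}a&b&b\\ b&c&c\\ b&c&c\end{pmatrix}$$

(3)

$$A_{3}=\begin{pmatrix}1&1&1\\ 1&K&K\\ 1&K&K\end{pmatrix},\qquad S(A_{3})=\begin{pmatrix}a&b&b\\ b&c&c\\ b&c&c\end{pmatrix}$$

(4)

$$A_{4}=\begin{pmatrix}1&K&K\\ K&1&1\\ K&1&1\end{pmatrix},\qquad S(A_{4})=\begin{pmatrix}a&b&b\\ b&c&c\\ b&c&c\end{pmatrix}$$

(5)

$$A_{5}=\begin{pmatrix}K&1&1\\ 1&K&1\\ 1&1&1\end{pmatrix},\qquad S(A_{5})=\begin{pmatrix}a&b&c\\ b&a&c\\ c&c&d\end{pmatrix}$$

(6)

$$A_{6}=\begin{pmatrix}K&K&1\\ K&1&1\\ 1&1&1\end{pmatrix},\qquad S(A_{6})=\begin{pmatrix}a&b&c\\ b&c&a\\ c&a&b\end{pmatrix}$$

(7)

$$A_{7}=\begin{pmatrix}K&K&1\\ K&1&1\\ 1&1&K\end{pmatrix},\qquad S(A_{7})=\begin{pmatrix}a&b&c\\ b&d&e\\ c&e&f\end{pmatrix}$$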
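The shapes listed in Theorem 3 can be observed numerically. The following sketch is an illustration under our own assumptions: it fixes $K=2$, runs a fixed number of floating-point scaling rounds, and uses a hypothetical helper `shape` that assigns the same letter to entries that agree up to a tolerance. It is a sanity check, not a proof.

```python
def row_scale(A):
    return [[a / sum(row) for a in row] for row in A]

def col_scale(A):
    col_sums = [sum(col) for col in zip(*A)]
    return [[a / s for a, s in zip(row, col_sums)] for row in A]

def sinkhorn(A, steps=200):
    for _ in range(steps):
        A = col_scale(row_scale(A))
    return A

def shape(S, tol=1e-9):
    """Label entries that agree up to tol with the same letter, in reading order."""
    seen = []   # one representative value per letter
    rows = []
    for row in S:
        labels = ""
        for v in row:
            for idx, u in enumerate(seen):
                if abs(u - v) < tol:
                    break
            else:
                seen.append(v)
                idx = len(seen) - 1
            labels += "abcdefghi"[idx]
        rows.append(labels)
    return "/".join(rows)

K = 2.0  # illustrative value; any K > 0 with K != 1 could be tried
matrices = {
    "A1": [[K, 1, 1], [1, K, 1], [1, 1, K]],
    "A2": [[K, 1, 1], [1, 1, 1], [1, 1, 1]],
    "A3": [[1, 1, 1], [1, K, K], [1, K, K]],
    "A4": [[1, K, K], [K, 1, 1], [K, 1, 1]],
    "A5": [[K, 1, 1], [1, K, 1], [1, 1, 1]],
    "A6": [[K, K, 1], [K, 1, 1], [1, 1, 1]],
    "A7": [[K, K, 1], [K, 1, 1], [1, 1, K]],
}
shapes = {name: shape(sinkhorn(A)) for name, A in matrices.items()}
for name, s in shapes.items():
    print(name, s)
```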

3. The $MBN$ matrix

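Let $k$, $\ell$, and $n$ be positive integers such that

$$n=k+\ell.$$

Let $M$, $B$, and $N$ be positive real numbers. Consider the $n\times n$ symmetric matrix

$$A=\begin{pmatrix}
M & M & \cdots & M & B & B & \cdots & B\\
M & M & \cdots & M & B & B & \cdots & B\\
\vdots & & & \vdots & \vdots & & & \vdots\\
M & M & \cdots & M & B & B & \cdots & B\\
B & B & \cdots & B & N & N & \cdots & N\\
B & B & \cdots & B & N & N & \cdots & N\\
\vdots & & & \vdots & \vdots & & & \vdots\\
B & B & \cdots & B & N & N & \cdots & N
\end{pmatrix}\tag{6}$$

in which the first $k$ rows are equal to

$$(\underbrace{M,M,\ldots,M}_{k},\underbrace{B,B,\ldots,B}_{\ell})$$

and the last $\ell$ rows are equal to

$$(\underbrace{B,B,\ldots,B}_{k},\underbrace{N,N,\ldots,N}_{\ell}).$$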

Let $X=\operatorname{\text{diag}}(x_{1},x_{2},x_{3},\ldots,x_{n})$ be the unique positive $n\times n$ diagonal matrix such that the alternate scaling limit $S(A)=XAX$ is doubly stochastic. Thus, the matrix

[TABLE]

satisfies

[TABLE]

and

[TABLE]

It follows that $x_{i}=x_{1}$ for $i=1,2,\ldots,k$ and $x_{i}=x_{n}$ for $i=k+1,k+2,\ldots,k+\ell$. Let $x_{1}=x$ and $x_{k+1}=y$. Define the diagonal matrix

[TABLE]

We obtain

[TABLE]

where

[TABLE]

Because $S(A)$ is row stochastic, we have

[TABLE]

and

[TABLE]

Equation (11) gives

[TABLE]

Inserting this into equation (12) and rearranging gives

[TABLE]

If $MN-B^{2}=0$ , then

[TABLE]

and $Mx^{2}=a=b=c=1/n$ . Thus, $S(A)$ is the $n\times n$ doubly stochastic matrix with every coordinate equal to $1/n$ .

If $MN-B^{2}\neq 0$ , then (13) is a quadratic equation in $x^{2}$ . Let

[TABLE]

We obtain

[TABLE]

and

[TABLE]

Recall that $ka+\ell b=1$ and so

[TABLE]

If $MN>B^{2}$ , then $L>1$ and

[TABLE]

The inequality $a<1/k$ implies that

[TABLE]

If $MN<B^{2}$ , then $0<L<1$ and

[TABLE]

Because

[TABLE]

the inequality $a>0$ implies (14).

We have proved the following.

Theorem 4.

The Sinkhorn limit of the $MBN$ matrix (6) is a doubly stochastic matrix $S(A)$ with shape (7). If $L=MN/B^{2}=1$ , then $a=b=c=1/n$ . If $L\neq 1$ , then equations (14), (9), and (10) define the coordinates $a$ , $b$ , and $c$ . The matrix $S(A)$ depends only on the ratio $MN/B^{2}$ .

For example, the matrices

[TABLE]

have the same Sinkhorn limit with

[TABLE]
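The last statement of Theorem 4 can be checked numerically. The sketch below is ours: `mbn` is a hypothetical constructor for the $MBN$ matrix, and the block sizes and the triples $(M,B,N)$ are arbitrary choices with $MN/B^{2}=2$, plus one triple with $MN=B^{2}$. It is a floating-point sanity check rather than a derivation.

```python
def row_scale(A):
    return [[a / sum(row) for a in row] for row in A]

def col_scale(A):
    col_sums = [sum(col) for col in zip(*A)]
    return [[a / s for a, s in zip(row, col_sums)] for row in A]

def sinkhorn(A, steps=200):
    for _ in range(steps):
        A = col_scale(row_scale(A))
    return A

def mbn(k, l, M, B, N):
    """The (k+l) x (k+l) symmetric MBN matrix: k rows (M..M, B..B), l rows (B..B, N..N)."""
    top = [float(M)] * k + [float(B)] * l
    bottom = [float(B)] * k + [float(N)] * l
    return [top[:] for _ in range(k)] + [bottom[:] for _ in range(l)]

def max_diff(P, Q):
    """Largest entrywise difference between two matrices of the same size."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

k, l = 2, 3
# Three MBN matrices with the same ratio MN / B^2 = 2.
S1 = sinkhorn(mbn(k, l, 2, 1, 1))
S2 = sinkhorn(mbn(k, l, 1, 1, 2))
S3 = sinkhorn(mbn(k, l, 8, 2, 1))
print(max_diff(S1, S2), max_diff(S1, S3))

# The degenerate case MN = B^2: every coordinate of the limit is 1/n.
U = sinkhorn(mbn(k, l, 4, 2, 1))
print(U[0])
```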

Let $\left(A^{(r)}\right)_{r=1}^{\infty}$ be a sequence of $MBN$ matrices such that $\lim_{r\rightarrow\infty}MN/B^{2}=\infty$ . Let

[TABLE]

We have

[TABLE]

and

[TABLE]

Similarly, let $\left(A^{(r)}\right)_{r=1}^{\infty}$ be a sequence of $MBN$ matrices such that $\lim_{r\rightarrow\infty}MN/B^{2}=0$ . It follows from (8) that

[TABLE]

If $k\leq\ell$ , then

[TABLE]

If $k>\ell$ , then

[TABLE]

4. The matrix $A_{1}$

The matrix

$$A_{1}=\begin{pmatrix}K&1&1\\ 1&K&1\\ 1&1&K\end{pmatrix}$$

is the simplest. Just one row scaling or one column scaling produces the doubly stochastic matrix

[TABLE]

We have $S(A_{1})=XA_{1}X$ , where

[TABLE]

We have the asymptotic limits

[TABLE]
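The one-step claim for $A_{1}$ is easy to confirm in exact arithmetic: every row and every column of $A_{1}$ has sum $K+2$, so a single row scaling already yields a doubly stochastic matrix. The sketch below is our own illustration; the value of `K` is arbitrary and the function names are not from the paper.

```python
from fractions import Fraction

def row_scale(A):
    """Divide every entry by the sum of its row."""
    return [[a / sum(row) for a in row] for row in A]

def col_scale(A):
    """Divide every entry by the sum of its column."""
    col_sums = [sum(col) for col in zip(*A)]
    return [[a / s for a, s in zip(row, col_sums)] for row in A]

K = Fraction(5, 2)   # any positive K != 1 works; this value is arbitrary
A1 = [[K, 1, 1],
      [1, K, 1],
      [1, 1, K]]

R = row_scale(A1)    # one row scaling

# Diagonal entries are K/(K+2), off-diagonal entries are 1/(K+2).
print(R[0])
# A further column scaling changes nothing, so R is already doubly stochastic.
print(R == col_scale(R))
```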

5. The matrices $A_{2}$ , $A_{3}$ , and $A_{4}$

These are $MBN$ matrices. The matrix

$$A_{2}=\begin{pmatrix}K&1&1\\ 1&1&1\\ 1&1&1\end{pmatrix}$$

is an $MBN$ matrix with $k=1$, $\ell=2$, $M=K$, $B=N=1$, and $L=K$.

The matrix

$$A_{3}=\begin{pmatrix}1&1&1\\ 1&K&K\\ 1&K&K\end{pmatrix}$$

is an $MBN$ matrix with $k=1$, $\ell=2$, $M=B=1$, $N=K$, and $L=K$. Both matrices satisfy $L=MN/B^{2}=K\neq 1$, and so they have the same Sinkhorn limit

[TABLE]

with

[TABLE]

We have the asymptotic limits

[TABLE]

The matrix

$$A_{4}=\begin{pmatrix}1&K&K\\ K&1&1\\ K&1&1\end{pmatrix}$$

is an $MBN$ matrix with $k=1$, $\ell=2$, $M=N=1$, and $B=K$. We have $L=MN/B^{2}=1/K^{2}\neq 1$, and the Sinkhorn limit

[TABLE]

with

[TABLE]

We have the asymptotic limits

[TABLE]

6. The matrix $A_{5}$

The construction of the Sinkhorn limit of the $3\times 3$ matrix

$$A_{5}=\begin{pmatrix}K&1&1\\ 1&K&1\\ 1&1&1\end{pmatrix}$$

requires only high school algebra. There exists a unique positive diagonal matrix $X=\operatorname{\text{diag}}(x,y,z)$ such that $XA_{5}X$ is doubly stochastic and positive. We have

[TABLE]

and so

[TABLE]

We have

[TABLE]

Rearranging, we obtain

[TABLE]

Note that $0<xy<1$ . If $K>1$ , then $(K-1)xy+1>1$ . If $0<K<1$ , then

[TABLE]

and $(K-1)xy+1>0$ . Therefore, $x=y$ , and so

[TABLE]

We obtain

[TABLE]

Applying (19) and eliminating $xz$ from (20) and (21) gives

[TABLE]

Therefore,

[TABLE]

and so

[TABLE]

The inequality $Kx^{2}<1$ implies

[TABLE]

and

[TABLE]

Thus, the Sinkhorn limit has the shape

[TABLE]

where

[TABLE]

We have the asymptotic limits

[TABLE]

7. The matrix $A_{6}$

The construction of the Sinkhorn limit of the $3\times 3$ matrix

$$A_{6}=\begin{pmatrix}K&K&1\\ K&1&1\\ 1&1&1\end{pmatrix}$$

also requires only high school algebra. There exists a unique positive diagonal matrix $X=\operatorname{\text{diag}}(x,y,z)$ such that

[TABLE]

is a doubly stochastic matrix, and so

[TABLE]

From (23) and (24) we obtain

[TABLE]

and so

[TABLE]

and

[TABLE]

Inserting (26) and (27) into (25) and simplifying, we obtain

[TABLE]

and so

[TABLE]

and

[TABLE]

Inserting this into (26) gives

[TABLE]

and then (27) gives

[TABLE]

This determines the scaling matrix X. The Sinkhorn limit is the circulant matrix

[TABLE]

with

[TABLE]

The asymptotic limits are

[TABLE]
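A numerical check of the $A_{6}$ limit for $K=2$ is sketched below. It relies on the remark in Section 11 that one coordinate of this limit is $\sqrt[3]{2}-1$; the location of that coordinate (the anti-diagonal positions, the entry called $c$ in the circulant shape) is our own reading of the shape and should be treated as an assumption of the sketch.

```python
def row_scale(A):
    return [[a / sum(row) for a in row] for row in A]

def col_scale(A):
    col_sums = [sum(col) for col in zip(*A)]
    return [[a / s for a, s in zip(row, col_sums)] for row in A]

def sinkhorn(A, steps=200):
    for _ in range(steps):
        A = col_scale(row_scale(A))
    return A

K = 2.0
A6 = [[K, K, 1.0],
      [K, 1.0, 1.0],
      [1.0, 1.0, 1.0]]
S = sinkhorn(A6)

target = 2.0 ** (1.0 / 3.0) - 1.0   # cube root of 2, minus 1

# Positions (0-based) whose entry agrees with the target up to rounding.
hits = [(i, j) for i in range(3) for j in range(3)
        if abs(S[i][j] - target) < 1e-9]
print(hits)
```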

8. The matrix $A_{7}$

Consider the symmetric $3\times 3$ matrix

$$A_{7}=\begin{pmatrix}K&K&1\\ K&1&1\\ 1&1&K\end{pmatrix}$$

There exists a unique positive diagonal matrix $X=\operatorname{\text{diag}}(x,y,z)$ such that

[TABLE]

is doubly stochastic. Therefore,

[TABLE]

Because equations (28) and (23) are identical, and equations (29) and (24) are identical, we obtain (26) and (27). Inserting these formulae for $x$ and $z$ into (30) gives the octic polynomial

[TABLE]

By Theorem 1, this polynomial has at least one solution $y\in(0,1)$ . If $K>1$ , then, by Descartes’s rule of signs, this polynomial has exactly two positive solutions. If $0<K<1$ , then this polynomial has one or three positive solutions. For matrices of the form $A_{7}$ , we do not have explicit formulae for the coordinates of the Sinkhorn limit as functions of $K$ . Computer calculations suggest that the asymptotic limits of $S(A_{7})$ as $K\rightarrow\infty$ and $K\rightarrow 0$ are

[TABLE]

9. Gröbner bases and algebraic numbers

I like solving problems using high school algebra. However, it is important to note that the previous calculations are also easily done using Gröbner bases.

For every $n\times n$ matrix $A=(a_{i,j})$ and diagonal matrix $X=\operatorname{\text{diag}}(x_{1},\ldots,x_{n})$ , we have the matrix

[TABLE]

If $A$ is positive and symmetric, then, by Theorems 1 and 2, the $n$ quadratic equations

[TABLE]

have a unique positive solution, and the diagonal matrix $X=\operatorname{\text{diag}}(x_{1},\ldots,x_{n})$ is the unique scaling matrix in the Sinkhorn limit $S(A)=XAX$ . Equivalently, $(x_{1},\ldots,x_{n})$ is the unique positive vector in the affine variety of the ideal in $\mathbf{R}[x_{1},\ldots,x_{n}]$ generated by the set of polynomials $\{q_{1},\ldots,q_{n}\}$ . For each lexicographical ordering of the variables $x_{1},\ldots,x_{n}$ , Maple (and other computer algebra programs) can compute a Gröbner basis for the ideal. The Gröbner basis for this ideal shows that if the coordinates of the matrix $A=(a_{i,j})$ are rational numbers, then $x_{1},\ldots,x_{n}$ are algebraic numbers of degrees bounded in terms of $n$ .
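The positive solution of the quadratic system can also be approximated numerically, without a computer algebra system. The sketch below is ours and makes two assumptions: the polynomials are taken to be $q_{i}=x_{i}\sum_{j}a_{i,j}x_{j}-1$, which is what row stochasticity of $XAX$ says (the paper's displayed form may be arranged differently), and the scaling vector is recovered from the diagonal of the limit through $S(A)_{i,i}=a_{i,i}x_{i}^{2}$. The example matrix is $A_{7}$ with $K=2$.

```python
import math

def row_scale(A):
    return [[a / sum(row) for a in row] for row in A]

def col_scale(A):
    col_sums = [sum(col) for col in zip(*A)]
    return [[a / s for a, s in zip(row, col_sums)] for row in A]

def sinkhorn(A, steps=500):
    for _ in range(steps):
        A = col_scale(row_scale(A))
    return A

K = 2.0
A7 = [[K, K, 1.0],
      [K, 1.0, 1.0],
      [1.0, 1.0, K]]
n = len(A7)

S = sinkhorn(A7)

# For symmetric A, S = X A X, so the diagonal gives x_i^2 = S[i][i] / A[i][i].
x = [math.sqrt(S[i][i] / A7[i][i]) for i in range(n)]

# Residuals of the assumed quadratic equations q_i(x) = 0.
q = [x[i] * sum(A7[i][j] * x[j] for j in range(n)) - 1.0 for i in range(n)]
print(x)
print(q)
```

A Gröbner basis computation gives exact algebraic descriptions of the same numbers; the iteration only gives decimal approximations.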

Here is an example. Let $n=3$ and $X=\operatorname{\text{diag}}(x,y,z)$ . Consider the matrices

[TABLE]

with $K>0$ and $K\neq 1$ . There exist unique positive real numbers $x,y,z$ that satisfy the quadratic equations

[TABLE]

Equivalently, $(x,y,z)$ is the unique positive vector in the affine variety $V(I)$ , where $I$ is the ideal in $\mathbf{R}[x,y,z]$ generated by the polynomials

[TABLE]

Let $K=2$ . Using the Groebner package in Maple with the lexicographical order $(x,y,z)$ , we obtain the Gröbner basis

[TABLE]

Applying Maple with the lexicographical order $(y,z,x)$ , we obtain the Gröbner basis

[TABLE]

Applying Maple with the lexicographical order $(z,x,y)$ , we obtain the Gröbner basis

[TABLE]

Thus, $x^{2}$ , $y^{2}$ , and $z^{2}$ are algebraic numbers of degree at most 4, and we have explicit polynomial representations of each variable $x$ , $y$ , $z$ in terms of the other two variables.

For arbitrary $K$ , applying Maple with the lexicographical order $(y,z,x)$ , we obtain the Gröbner basis

[TABLE]

For each of the 8 roots of $h_{1}(y)$ , the polynomials $h_{2}(z,y)$ and $h_{3}(x,y)$ determine unique numbers $x$ and $z$ . Exactly one of the triples $(x,y,z)$ will be positive.

10. Rationality and finite length

For what positive $n\times n$ matrices does the alternate scaling algorithm converge in finitely many steps? This problem has been solved for $2\times 2$ matrices (Nathanson [18]), but it is open for all dimensions $n\geq 3$. In dimension 3, matrices equivalent to $A_{1}$ become doubly stochastic in one step, that is, after one row or one column scaling. Ekhad and Zeilberger [5] computed a positive $3\times 3$ matrix that becomes doubly stochastic in exactly two steps, and Nathanson [19] generalized this construction. It is not known whether there exists a positive $3\times 3$ matrix that becomes doubly stochastic in exactly $s$ steps for some $s\geq 3$.

Consider the matrix $A_{2}=\left(\begin{matrix}K&1&1\\ 1&1&1\\ 1&1&1\end{matrix}\right)$ with parameter $K$ . If $K$ is a rational number, then every matrix generated by iterated row and column scalings has rational coordinates. If the Sinkhorn limit contains an irrational coordinate, then the alternate scaling algorithm cannot terminate in finitely many steps.

Let $K$ be an integer, $K\geq 2$ . In Section 5 we proved that the Sinkhorn limit $S(A_{2})$ has coordinates in the quadratic field $\mathbf{Q}(\sqrt{8K+1})$ . For example, from (15), the $(1,1)$ coordinate of $S(A_{2})$ is

[TABLE]

This number is rational if and only if the odd integer $8K+1$ is the square of an odd integer, that is, if and only if $8K+1=(2r+1)^{2}$ for some positive integer $r$ and so $K=r(r+1)/2$ is a triangular number. From (15), (16), and (17), we obtain

[TABLE]

Moreover, $S(A_{2})=XA_{2}X$ , where $X=\operatorname{\text{diag}}(x,y,y)$ with $Kx^{2}=a$ and $y^{2}=c$ . Thus,

[TABLE]

For example, if $K=3$ , then $r=2$ and

[TABLE]

where

[TABLE]

Note that $A_{2}$ also has a scaling by rational matrices

[TABLE]

where

[TABLE]

It is not known if there exists a triangular number $K$ for which the alternate scaling algorithm terminates in a finite number of steps.
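The case $K=3$ can be explored in exact rational arithmetic. The sketch below is our own illustration. The diagonal matrices `X` and `Y` are one rational scaling found by hand for this example (they are not copied from the paper's display), and by the uniqueness statement in Theorem 1 the doubly stochastic matrix they produce must be $S(A_{2})$. The loop then runs a few exact rounds of alternate scaling and measures the remaining distance to that limit.

```python
from fractions import Fraction as F

def row_scale(A):
    return [[a / sum(row) for a in row] for row in A]

def col_scale(A):
    col_sums = [sum(col) for col in zip(*A)]
    return [[a / s for a, s in zip(row, col_sums)] for row in A]

K = 3  # the triangular number r(r+1)/2 with r = 2
A2 = [[F(K), F(1), F(1)],
      [F(1), F(1), F(1)],
      [F(1), F(1), F(1)]]

# A rational, non-symmetric scaling (our choice, an assumption of this sketch).
X = [F(1), F(3, 2), F(3, 2)]
Y = [F(1, 6), F(1, 4), F(1, 4)]
S = [[X[i] * A2[i][j] * Y[j] for j in range(3)] for i in range(3)]

# Exact rational iterates of the alternate scaling algorithm.
A = A2
for _ in range(6):
    A = col_scale(row_scale(A))

# Every iterate has rational coordinates; the gap to S shrinks quickly.
gap = max(abs(A[i][j] - S[i][j]) for i in range(3) for j in range(3))

print(S[0], S[1])
print(float(gap))
```

The numerators and denominators of the iterates grow rapidly, so exact arithmetic is only practical for a small number of rounds.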

11. Open problems

(1) Compute explicit formulas for the Sinkhorn limits of matrices of the form $A_{7}$. More generally, compute explicit formulas for the Sinkhorn limits of all positive symmetric $3\times 3$ matrices. This is a central problem.

(2) Here is a special case. Let $K,L,M$ and 1 be pairwise distinct positive numbers. Compute the Sinkhorn limits of the matrices

[TABLE]

(3) For what positive $n\times n$ matrices does the alternate scaling algorithm converge in finitely many steps? This is the problem discussed in Section 10.

(4) It is not known what algebraic numbers appear as coordinates of Sinkhorn limits of matrices with positive integral coordinates. It would be interesting to have an example of an algebraic number in the unit interval that is not a coordinate of the Sinkhorn limit of a positive integral matrix.

(5) Does every possible shape of a doubly stochastic $3\times 3$ matrix $A$ appear as the nontrivial Sinkhorn limit of some $3\times 3$ matrix?

(6) Why does the shape of the Sinkhorn limit $S(A)$ seem to depend only on the shape of the matrix $A$ and not on the numerical values of the coordinates of $A$?

(7) Let $A$ be a nonnegative $m\times n$ matrix. Let $\mathbf{r}=(r_{1},r_{2},\ldots,r_{m})\in\mathbf{R}^{m}$ and let $\mathbf{c}=(c_{1},c_{2},\ldots,c_{n})\in\mathbf{R}^{n}$. The matrix $A$ is $\mathbf{r}$-row stochastic if $\operatorname{\text{rowsum}}_{i}(A)=r_{i}$ for all $i\in\{1,2,\ldots,m\}$. The matrix $A$ is $\mathbf{c}$-column stochastic if $\operatorname{\text{colsum}}_{j}(A)=c_{j}$ for all $j\in\{1,2,\ldots,n\}$. The matrix $A$ is $(\mathbf{r},\mathbf{c})$-stochastic if it is both $\mathbf{r}$-row stochastic and $\mathbf{c}$-column stochastic.

Let $A$ be a positive matrix. Let $X$ be the $m\times m$ diagonal matrix whose $i$th coordinate is $r_{i}/\operatorname{\text{rowsum}}_{i}(A)$, and let $Y$ be the $n\times n$ diagonal matrix whose $j$th coordinate is $c_{j}/\operatorname{\text{colsum}}_{j}(A)$. The matrix $XA$ is $\mathbf{r}$-row stochastic and the matrix $AY$ is $\mathbf{c}$-column stochastic. A simple modification of the alternate scaling algorithm produces an $(\mathbf{r},\mathbf{c})$-stochastic Sinkhorn limit. It is an open problem to compute explicit Sinkhorn limits in the $(\mathbf{r},\mathbf{c})$-stochastic setting.

(8) It is an old problem in number theory to understand the continued fractions of the cube roots of integers, and, in particular, to understand the approximation of $\sqrt[3]{2}$ by rationals. One coordinate of the Sinkhorn limit of the matrix $A_{6}$ with $K=2$ is $\sqrt[3]{2}-1$. The matrix $A_{6}$ with $K=2$ has rational coordinates, and so the matrices constructed by the alternate scaling algorithm also have rational coordinates, and generate explicit sequences of rational approximations to $\sqrt[3]{2}$. The nature of these approximations remains mysterious.

12. Notes

The computational complexity of Sinkhorn’s alternate scaling algorithm is investigated in Kalantari and Khachiyan [12, 13], Kalantari, Lari, Ricca, and Simeone [14], Linial, Samorodnitsky and Wigderson [16] and Allen-Zhu, Li, Oliveira, and Wigderson [1]. An extension of matrix scaling to operator scaling began with Gurvits [8], and is developed in Garg, Gurvits, Oliveira, and Wigderson [6, 7], Gurvits [9], and Gurvits and Samorodnitsky [10]. Motivating some of this recent work are the classical papers of Edmonds [4] and Valiant [23, 24].

The literature on matrix scaling is vast. See the recent survey paper of Idel [11]. For the early history of matrix scaling, see Allen-Zhu, Li, Oliveira, and Wigderson [1, Section 1.1].

Acknowledgements. The alternate scaling algorithm was discussed in several lectures in the New York Number Theory Seminar, and I thank the participants for their useful remarks. In particular, I thank David Newman for making the initial computations that suggested some of the problems considered in this paper. I also benefitted from a careful and thoughtful referee’s report.

Bibliography

[1] Z. Allen-Zhu, Y. Li, R. Oliveira, and A. Wigderson, Much faster algorithms for matrix scaling, 58th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2017), IEEE Computer Soc., Los Alamitos, CA, 2017, pp. 890–901.
[2] R. A. Brualdi, S. V. Parter, and H. Schneider, The diagonal equivalence of a nonnegative matrix to a stochastic matrix, J. Math. Anal. Appl. 16 (1966), 31–50.
[3] D. Ž. Djoković, Note on nonnegative matrices, Proc. Amer. Math. Soc. 25 (1970), 80–82.
[4] J. Edmonds, Systems of distinct representatives and linear algebra, J. Res. Nat. Bur. Standards Sect. B 71B (1967), 241–245.
[5] S. B. Ekhad and D. Zeilberger, Answers to some questions about explicit Sinkhorn limits posed by Mel Nathanson, arXiv:1902.10783, 2019.
[6] A. Garg, L. Gurvits, R. Oliveira, and A. Wigderson, A deterministic polynomial time algorithm for non-commutative rational identity testing, 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2016), IEEE Computer Soc., Los Alamitos, CA, 2016, pp. 109–117.
[7] A. Garg, L. Gurvits, R. Oliveira, and A. Wigderson, Algorithmic and optimization aspects of Brascamp-Lieb inequalities, via operator scaling, STOC’17: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, ACM, New York, 2017, pp. 397–409.
[8] L. Gurvits, Classical complexity and quantum entanglement, J. Comput. System Sci. 69 (2004), no. 3, 448–484.
