Matrix scaling, explicit Sinkhorn limits, and arithmetic

Melvyn B. Nathanson

arXiv:1902.04544·math.NT·February 13, 2019


Open Access

TL;DR

This paper explores the convergence of matrix scaling to doubly stochastic matrices, providing explicit formulas for certain symmetric 3x3 matrices and connecting the results to diophantine approximation.

Contribution

It offers explicit formulas for Sinkhorn limits of specific symmetric 3x3 matrices and links matrix scaling to diophantine approximation problems.

Findings

(1) Explicit formulas for Sinkhorn limits of symmetric 3x3 matrices.

(2) Connections established between matrix scaling and diophantine approximation.

(3) Analysis of convergence properties in matrix scaling processes.

Abstract

The process of alternately row scaling and column scaling a positive $n \times n$ matrix $A$ converges to a doubly stochastic positive $n \times n$ matrix $S(A)$, called the Sinkhorn limit of $A$. Exact formulae for the Sinkhorn limits of certain symmetric positive $3 \times 3$ matrices are computed, and related problems in diophantine approximation are considered.

Equations

$$\operatorname{row}_{i}(A)=\sum_{j=1}^{n}a_{i,j}.$$

$$\operatorname{col}_{j}(A)=\sum_{i=1}^{m}a_{i,j}.$$

$$\left(\begin{matrix}1&-1&1\\ -1&4&-2\\ 1&-2&2\end{matrix}\right)\quad\text{and}\quad\left(\begin{matrix}2&-5&4\\ -9&7&3\\ 8&-1&-6\end{matrix}\right)$$

$$XAY=\left(\begin{matrix}x_{1}a_{1,1}y_{1}&x_{1}a_{1,2}y_{2}&x_{1}a_{1,3}y_{3}&\cdots&x_{1}a_{1,n}y_{n}\\ x_{2}a_{2,1}y_{1}&x_{2}a_{2,2}y_{2}&x_{2}a_{2,3}y_{3}&\cdots&x_{2}a_{2,n}y_{n}\\ \vdots&&&&\vdots\\ x_{m}a_{m,1}y_{1}&x_{m}a_{m,2}y_{2}&x_{m}a_{m,3}y_{3}&\cdots&x_{m}a_{m,n}y_{n}\end{matrix}\right).$$

$$\left(\begin{matrix}1/3&1/3&1/3\\ 1/3&1/3&1/3\\ 1/3&1/3&1/3\end{matrix}\right)\quad\text{and}\quad\left(\begin{matrix}1/2&1/3&1/6\\ 1/6&1/2&1/3\\ 1/3&1/6&1/2\end{matrix}\right),$$

$$m=\sum_{i=1}^{m}\operatorname{row}_{i}(A)=\sum_{i=1}^{m}\sum_{j=1}^{n}a_{i,j}=\sum_{j=1}^{n}\sum_{i=1}^{m}a_{i,j}=\sum_{j=1}^{n}\operatorname{col}_{j}(A)=n$$

$$\mathcal{R}(A)=X(A)A.$$

$$\mathcal{R}(A)_{i,j}=\frac{a_{i,j}}{\operatorname{row}_{i}(A)}$$

$$\operatorname{row}_{i}(\mathcal{R}(A))=\sum_{j=1}^{n}\mathcal{R}(A)_{i,j}=\sum_{j=1}^{n}\frac{a_{i,j}}{\operatorname{row}_{i}(A)}=\frac{\operatorname{row}_{i}(A)}{\operatorname{row}_{i}(A)}=1$$

$$\mathcal{C}(A)=AY(A).$$

$$\mathcal{C}(A)_{i,j}=\frac{a_{i,j}}{\operatorname{col}_{j}(A)}$$

$$\operatorname{col}_{j}(\mathcal{C}(A))=\sum_{i=1}^{m}\mathcal{C}(A)_{i,j}=\sum_{i=1}^{m}\frac{a_{i,j}}{\operatorname{col}_{j}(A)}=\frac{\operatorname{col}_{j}(A)}{\operatorname{col}_{j}(A)}=1$$

$$A=\left(\begin{matrix}1&2&3\\ 4&5&6\end{matrix}\right)$$

$$\Omega=\mathbf{R}^{n}_{>0}\times\mathcal{S}_{n}\times\mathbf{R}^{n-1}_{>0}$$

$$\left(\left(\begin{matrix}x_{1}\\ \vdots\\ x_{n-1}\\ x_{n}\end{matrix}\right),S,\left(\begin{matrix}y_{1}\\ \vdots\\ y_{n-1}\\ 1\end{matrix}\right)\right)\mapsto\operatorname{diag}(x_{1},\ldots,x_{n-1},x_{n})\,S\,\operatorname{diag}(y_{1},\ldots,y_{n-1},1)$$

$$A_{0}=A.$$

$$X_{\ell}=X(A_{\ell})=\operatorname{diag}\left(\frac{1}{\operatorname{row}_{1}(A_{\ell})},\frac{1}{\operatorname{row}_{2}(A_{\ell})},\ldots,\frac{1}{\operatorname{row}_{n}(A_{\ell})}\right)$$

$$A^{\prime}_{\ell}=X_{\ell}A_{\ell}.$$

$$Y_{\ell}=Y(A^{\prime}_{\ell})=\operatorname{diag}\left(\frac{1}{\operatorname{col}_{1}(A^{\prime}_{\ell})},\frac{1}{\operatorname{col}_{2}(A^{\prime}_{\ell})},\ldots,\frac{1}{\operatorname{col}_{n}(A^{\prime}_{\ell})}\right)$$

$$A_{\ell+1}=A^{\prime}_{\ell}Y_{\ell}.$$

$$\lim_{\ell\to\infty}X_{\ell}=X,\qquad\lim_{\ell\to\infty}Y_{\ell}=Y$$

$$S(A)=XAY=\lim_{\ell\to\infty}A_{\ell}=\lim_{\ell\to\infty}A^{\prime}_{\ell}$$

$$\left(\begin{matrix}2&1&1\\ 1&1&1\\ 1&1&1\end{matrix}\right)\to\left(\begin{matrix}0.4384471874&0.2807764064&0.2807764064\\ 0.2807764064&0.3596117968&0.3596117968\\ 0.2807764064&0.3596117968&0.3596117968\end{matrix}\right)$$

$$\left(\begin{matrix}1&1&1\\ 1&2&2\\ 1&2&2\end{matrix}\right)\to\left(\begin{matrix}0.4384471873&0.2807764065&0.2807764065\\ 0.2807764064&0.3596117968&0.3596117968\\ 0.2807764064&0.3596117968&0.3596117968\end{matrix}\right)$$

$$\left(\begin{matrix}2&1&1\\ 1&2&1\\ 1&1&1\end{matrix}\right)\to\left(\begin{matrix}0.4648162417&0.2324081208&0.3027756377\\ 0.2324081208&0.4648162417&0.3027756377\\ 0.3027756380&0.3027756380&0.3944487245\end{matrix}\right)$$

$$\left(\begin{matrix}2&2&1\\ 2&1&1\\ 1&1&1\end{matrix}\right)\to\left(\begin{matrix}0.3274800021&0.4125989480&0.2599210499\\ 0.4125989480&0.2599210499&0.3274800021\\ 0.2599210499&0.3274800021&0.4125989480\end{matrix}\right)$$

$$\left(\begin{matrix}2&2&1\\ 2&1&1\\ 1&1&2\end{matrix}\right)\to\left(\begin{matrix}0.3451802671&0.4435474272&0.2112723057\\ 0.4435474272&0.2849733008&0.2714792720\\ 0.2112723057&0.2714792720&0.5172484223\end{matrix}\right).$$

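$$\left(\begin{matrix}a&b&b\\ b&c&c\\ b&c&c\end{matrix}\right),\quad\left(\begin{matrix}a&b&c\\ b&a&c\\ c&c&d\end{matrix}\right),\quad\left(\begin{matrix}a&b&c\\ b&c&a\\ c&a&b\end{matrix}\right),\quad\left(\begin{matrix}a&b&c\\ b&d&e\\ c&e&f\end{matrix}\right).$$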

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Topological and Geometric Data Analysis · Random Matrices and Applications

Full text

Matrix scaling, explicit Sinkhorn limits, and arithmetic

Melvyn B. Nathanson

Department of Mathematics

Lehman College (CUNY)

Bronx, NY 10468

[email protected]

Abstract.

The process of alternately row scaling and column scaling a positive $n\times n$ matrix $A$ converges to a doubly stochastic positive $n\times n$ matrix $S(A)$ , called the Sinkhorn limit of $A$ . Exact formulae for the Sinkhorn limits of certain symmetric positive $3\times 3$ matrices are computed, and related problems in diophantine approximation are considered.

Key words and phrases:

Matrix scaling, alternate minimization, Sinkhorn limits, diophantine approximation, Gröbner bases.

2010 Mathematics Subject Classification:

11C20, 11B75, 11J68, 11J70.

1. Doubly stochastic matrices and scaling

Let $A=(a_{i,j})$ be an $m\times n$ matrix. For $i\in\{1,\ldots,m\}$ , the $i$ th row sum of $A$ is

$$\operatorname{row}_{i}(A)=\sum_{j=1}^{n}a_{i,j}.$$

For $j\in\{1,\ldots,n\}$ , the $j$ th column sum of $A$ is

$$\operatorname{col}_{j}(A)=\sum_{i=1}^{m}a_{i,j}.$$

For example, the matrices

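$$\left(\begin{matrix}1&-1&1\\ -1&4&-2\\ 1&-2&2\end{matrix}\right)\quad\text{and}\quad\left(\begin{matrix}2&-5&4\\ -9&7&3\\ 8&-1&-6\end{matrix}\right)$$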

have row and column sums equal to 1.

An $n\times n$ matrix $(u_{i,j})$ is diagonal if $u_{i,j}=0$ for all $i\neq j$ . Let $\operatorname{\text{diag}}(x_{1},\ldots,x_{n})$ denote the diagonal matrix whose $(i,i)$ th coordinate is $x_{i}$ for all $i\in\{1,2,\ldots,n\}$ . The diagonal matrix $\operatorname{\text{diag}}(x_{1},x_{2},\ldots,x_{n})$ is positive diagonal if $x_{i}>0$ for all $i$ .

The process of multiplying the rows of a matrix $A$ by scalars, or, equivalently, multiplying $A$ on the left by a diagonal matrix $X$ , is called row-scaling, and $X$ is called a row-scaling matrix.

The process of multiplying the columns of a matrix $A$ by scalars, or, equivalently, multiplying $A$ on the right by a diagonal matrix $Y$ , is called column-scaling, and $Y$ is called a column-scaling matrix.

Let $A=(a_{i,j})$ be an $m\times n$ matrix. If $X=\operatorname{\text{diag}}(x_{1},x_{2},\ldots,x_{m})$ and $Y=\operatorname{\text{diag}}(y_{1},y_{2},\ldots,y_{n})$ , then

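$$XAY=\left(\begin{matrix}x_{1}a_{1,1}y_{1}&x_{1}a_{1,2}y_{2}&x_{1}a_{1,3}y_{3}&\cdots&x_{1}a_{1,n}y_{n}\\ x_{2}a_{2,1}y_{1}&x_{2}a_{2,2}y_{2}&x_{2}a_{2,3}y_{3}&\cdots&x_{2}a_{2,n}y_{n}\\ \vdots&&&&\vdots\\ x_{m}a_{m,1}y_{1}&x_{m}a_{m,2}y_{2}&x_{m}a_{m,3}y_{3}&\cdots&x_{m}a_{m,n}y_{n}\end{matrix}\right).$$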

The $m\times n$ matrix $A=(a_{i,j})$ is positive if $a_{i,j}>0$ for all $i$ and $j$ , and nonnegative if $a_{i,j}\geq 0$ for all $i$ and $j$ . The matrix $A=(a_{i,j})$ is row stochastic if $A$ is nonnegative and $\operatorname{\text{row}}_{i}(A)=1$ for all $i\in\{1,\ldots,m\}$ . The matrix $A$ is column stochastic if $A$ is nonnegative and $\operatorname{\text{col}}_{j}(A)=1$ for all $j\in\{1,\ldots,n\}$ . The matrix $A$ is doubly stochastic if it is both row and column stochastic. For example, the matrices

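$$\left(\begin{matrix}1/3&1/3&1/3\\ 1/3&1/3&1/3\\ 1/3&1/3&1/3\end{matrix}\right)\quad\text{and}\quad\left(\begin{matrix}1/2&1/3&1/6\\ 1/6&1/2&1/3\\ 1/3&1/6&1/2\end{matrix}\right)$$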

are doubly stochastic.

If the $m\times n$ matrix $A$ is doubly stochastic, then

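$$m=\sum_{i=1}^{m}\operatorname{row}_{i}(A)=\sum_{i=1}^{m}\sum_{j=1}^{n}a_{i,j}=\sum_{j=1}^{n}\sum_{i=1}^{m}a_{i,j}=\sum_{j=1}^{n}\operatorname{col}_{j}(A)=n$$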

and so $A$ is a square matrix.

Let $A=(a_{i,j})$ be an $m\times n$ matrix with positive row sums, that is, $\operatorname{\text{row}}_{i}(A)>0$ for all $i\in\{1,\ldots,m\}$ . Let $X(A)=\operatorname{\text{diag}}(1/\operatorname{\text{row}}_{1}(A),\ldots,1/\operatorname{\text{row}}_{m}(A))$ denote the $m\times m$ diagonal matrix whose $i$ th diagonal coordinate is $1/\operatorname{\text{row}}_{i}(A)$ , and let

$$\mathcal{R}(A)=X(A)A.$$

We have

$$\mathcal{R}(A)_{i,j}=\frac{a_{i,j}}{\operatorname{row}_{i}(A)}$$

and so

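$$\operatorname{row}_{i}(\mathcal{R}(A))=\sum_{j=1}^{n}\mathcal{R}(A)_{i,j}=\sum_{j=1}^{n}\frac{a_{i,j}}{\operatorname{row}_{i}(A)}=\frac{\operatorname{row}_{i}(A)}{\operatorname{row}_{i}(A)}=1$$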

for all $i\in\{1,\ldots,m\}$ . Therefore, $\mathcal{R}(A)$ is a row stochastic matrix.

Similarly, let $Y(A)=\operatorname{\text{diag}}(1/\operatorname{\text{col}}_{1}(A),\ldots,1/\operatorname{\text{col}}_{n}(A))$ denote the $n\times n$ diagonal matrix whose $j$ th diagonal coordinate is $1/\operatorname{\text{col}}_{j}(A)$ , and let

$$\mathcal{C}(A)=AY(A).$$

We have

$$\mathcal{C}(A)_{i,j}=\frac{a_{i,j}}{\operatorname{col}_{j}(A)}$$

and so

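$$\operatorname{col}_{j}(\mathcal{C}(A))=\sum_{i=1}^{m}\mathcal{C}(A)_{i,j}=\sum_{i=1}^{m}\frac{a_{i,j}}{\operatorname{col}_{j}(A)}=\frac{\operatorname{col}_{j}(A)}{\operatorname{col}_{j}(A)}=1$$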

for all $j\in\{1,\ldots,n\}$ . Therefore, $\mathcal{C}(A)$ is a column stochastic matrix.

For example, if

$$A=\left(\begin{matrix}1&2&3\\ 4&5&6\end{matrix}\right)$$

then the matrix

$$\mathcal{R}(A)=X(A)A$$

is row stochastic, and the matrix

$$\mathcal{C}(A)=AY(A)$$

is column stochastic.
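The two scaling operators can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `row_sums`, `col_sums`, `row_scale`, and `col_scale` are hypothetical, and the $2\times 3$ input is an illustrative choice. Exact rational arithmetic is used so that the row and column sums come out exactly equal to 1.

```python
from fractions import Fraction

def row_sums(A):
    """Row sums row_i(A)."""
    return [sum(row) for row in A]

def col_sums(A):
    """Column sums col_j(A)."""
    return [sum(col) for col in zip(*A)]

def row_scale(A):
    """R(A) = X(A) A: divide every entry by the sum of its row."""
    return [[Fraction(a) / s for a in row] for row, s in zip(A, row_sums(A))]

def col_scale(A):
    """C(A) = A Y(A): divide every entry by the sum of its column."""
    cs = col_sums(A)
    return [[Fraction(a) / cs[j] for j, a in enumerate(row)] for row in A]

# Illustrative 2 x 3 input (an assumption of this sketch).
A = [[1, 2, 3], [4, 5, 6]]
R = row_scale(A)   # every row of R sums to 1
C = col_scale(A)   # every column of C sums to 1
```

Note that `R` is row stochastic but not column stochastic, and `C` is column stochastic but not row stochastic, which is why the two operations have to be alternated.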

In this paper we study doubly stochastic matrices.

The following results (due to Sinkhorn [16], Knopp-Sinkhorn [17], Menon [14], Letac [12], Tverberg [18], and others) are classical.

Theorem 1.

Let $A=(a_{i,j})$ be an $n\times n$ matrix with $a_{i,j}>0$ for all $i,j\in\{1,\ldots,n\}$ .

(i) There exist positive diagonal $n\times n$ matrices $X$ and $Y$ such that $XAY$ is doubly stochastic.

(ii) If $X$, $X^{\prime}$, $Y$, and $Y^{\prime}$ are positive diagonal $n\times n$ matrices such that both $XAY$ and $X^{\prime}AY^{\prime}$ are doubly stochastic, then $XAY=X^{\prime}AY^{\prime}$ and there exists $\lambda>0$ such that $X^{\prime}=\lambda X$ and $Y^{\prime}=\lambda^{-1}Y$.

The unique doubly stochastic matrix $XAY$ is called the Sinkhorn limit of $A$, and denoted $S(A)$.

(iii) Let $A$ be a positive symmetric $n\times n$ matrix. There exists a unique positive diagonal matrix $X$ such that $XAX$ is doubly stochastic.

Theorem 2.

Let $\mathcal{S}_{n}$ be the set of positive doubly stochastic matrices. Let $\mathbf{R}^{n}_{>0}$ (resp. $\mathbf{R}^{n-1}_{>0}$ ) be the set of positive $n$ -dimensional (resp. $(n-1)$ -dimensional) vectors. Consider

$$\Omega=\mathbf{R}^{n}_{>0}\times\mathcal{S}_{n}\times\mathbf{R}^{n-1}_{>0}$$

as a subset of $\mathbf{R}^{n^{2}+2n-1}$ with the subspace topology. Consider the set $M_{n}^{+}$ of positive $n\times n$ matrices as a subset of $\mathbf{R}^{n^{2}}$ with the subspace topology. The function from $\Omega$ to $M_{n}^{+}$ defined by

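$$\left(\left(\begin{matrix}x_{1}\\ \vdots\\ x_{n-1}\\ x_{n}\end{matrix}\right),S,\left(\begin{matrix}y_{1}\\ \vdots\\ y_{n-1}\\ 1\end{matrix}\right)\right)\mapsto\operatorname{diag}(x_{1},\ldots,x_{n-1},x_{n})\,S\,\operatorname{diag}(y_{1},\ldots,y_{n-1},1)$$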

is a homeomorphism.

Theorem 3.

Let A be a positive $n\times n$ matrix. Construct sequences of positive matrices $(A_{\ell})_{\ell=0}^{\infty}$ and $(A^{\prime}_{\ell})_{\ell=0}^{\infty}$ and sequences of positive diagonal matrices $(X_{\ell})_{\ell=0}^{\infty}$ and $(Y_{\ell})_{\ell=0}^{\infty}$ as follows: Let

$$A_{0}=A.$$

Given the matrix $A_{\ell}$ , let

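$$X_{\ell}=X(A_{\ell})=\operatorname{diag}\left(\frac{1}{\operatorname{row}_{1}(A_{\ell})},\frac{1}{\operatorname{row}_{2}(A_{\ell})},\ldots,\frac{1}{\operatorname{row}_{n}(A_{\ell})}\right)$$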

be the row-scaling matrix of $A_{\ell}$ , and let

$$A^{\prime}_{\ell}=X_{\ell}A_{\ell}.$$

The matrix $A^{\prime}_{\ell}$ is row stochastic. Let

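$$Y_{\ell}=Y(A^{\prime}_{\ell})=\operatorname{diag}\left(\frac{1}{\operatorname{col}_{1}(A^{\prime}_{\ell})},\frac{1}{\operatorname{col}_{2}(A^{\prime}_{\ell})},\ldots,\frac{1}{\operatorname{col}_{n}(A^{\prime}_{\ell})}\right)$$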

be the column-scaling matrix of $A^{\prime}_{\ell}$ , and let

$$A_{\ell+1}=A^{\prime}_{\ell}Y_{\ell}.$$

The matrix $A_{\ell+1}$ is column stochastic. There exist positive diagonal matrices X and Y such that

$$\lim_{\ell\to\infty}X_{\ell}=X,\qquad\lim_{\ell\to\infty}Y_{\ell}=Y$$

and the $n\times n$ matrix

$$S(A)=XAY=\lim_{\ell\to\infty}A_{\ell}=\lim_{\ell\to\infty}A^{\prime}_{\ell}$$

is doubly stochastic.

This process of obtaining a doubly stochastic matrix $S(A)$ from a positive matrix $A$ by row and column scaling is called alternate minimization.
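The iteration in Theorem 3 can be sketched as follows. This is a minimal floating-point illustration, not the Maple code used in the paper; the function name `sinkhorn` and the default of 20 rounds are assumptions of the sketch. Each round performs one row scaling followed by one column scaling.

```python
def sinkhorn(A, iterations=20):
    """Alternately row scale and column scale a positive matrix.

    Each round computes A'_l = X_l A_l (row scaling) and then
    A_{l+1} = A'_l Y_l (column scaling). The returned matrix is column
    stochastic up to rounding and approximately row stochastic.
    """
    M = [[float(a) for a in row] for row in A]
    for _ in range(iterations):
        M = [[a / sum(row) for a in row] for row in M]              # row scaling
        cs = [sum(col) for col in zip(*M)]
        M = [[a / cs[j] for j, a in enumerate(row)] for row in M]   # column scaling
    return M

S = sinkhorn([[2, 1, 1], [1, 1, 1], [1, 1, 1]])
T = sinkhorn([[2, 2, 1], [2, 1, 1], [1, 1, 1]])
```

Because the input must be positive, every row and column sum is positive and no division by zero can occur.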

It is an open problem to compute explicitly the Sinkhorn limit of a positive $n\times n$ matrix. This is known for $2\times 2$ matrices (Nathanson [15]). In this paper we compute explicit Sinkhorn limits for certain symmetric $3\times 3$ matrices, and discuss connections with diophantine approximation.

2. Experimental data

Here are some computational results. Using Maple, we row scale and then column scale the matrix, iterate this process 20 times, and print the resulting matrix.

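$$\left(\begin{matrix}2&1&1\\ 1&1&1\\ 1&1&1\end{matrix}\right)\to\left(\begin{matrix}0.4384471874&0.2807764064&0.2807764064\\ 0.2807764064&0.3596117968&0.3596117968\\ 0.2807764064&0.3596117968&0.3596117968\end{matrix}\right)$$

$$\left(\begin{matrix}1&1&1\\ 1&2&2\\ 1&2&2\end{matrix}\right)\to\left(\begin{matrix}0.4384471873&0.2807764065&0.2807764065\\ 0.2807764064&0.3596117968&0.3596117968\\ 0.2807764064&0.3596117968&0.3596117968\end{matrix}\right)$$

$$\left(\begin{matrix}2&1&1\\ 1&2&1\\ 1&1&1\end{matrix}\right)\to\left(\begin{matrix}0.4648162417&0.2324081208&0.3027756377\\ 0.2324081208&0.4648162417&0.3027756377\\ 0.3027756380&0.3027756380&0.3944487245\end{matrix}\right)$$

$$\left(\begin{matrix}2&2&1\\ 2&1&1\\ 1&1&1\end{matrix}\right)\to\left(\begin{matrix}0.3274800021&0.4125989480&0.2599210499\\ 0.4125989480&0.2599210499&0.3274800021\\ 0.2599210499&0.3274800021&0.4125989480\end{matrix}\right)$$

$$\left(\begin{matrix}2&2&1\\ 2&1&1\\ 1&1&2\end{matrix}\right)\to\left(\begin{matrix}0.3451802671&0.4435474272&0.2112723057\\ 0.4435474272&0.2849733008&0.2714792720\\ 0.2112723057&0.2714792720&0.5172484223\end{matrix}\right).$$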

In these calculations, the alternate minimization algorithm generates approximately doubly stochastic matrices of four different shapes:

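$$\left(\begin{matrix}a&b&b\\ b&c&c\\ b&c&c\end{matrix}\right),\quad\left(\begin{matrix}a&b&c\\ b&a&c\\ c&c&d\end{matrix}\right),\quad\left(\begin{matrix}a&b&c\\ b&c&a\\ c&a&b\end{matrix}\right),\quad\left(\begin{matrix}a&b&c\\ b&d&e\\ c&e&f\end{matrix}\right).$$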

3. Permutation matrices

Let $S_{n}$ be the group of permutations of the set $\{1,2,\ldots,n\}$ . For every $\sigma\in S_{n}$ , define the $n\times n$ permutation matrix $P_{\sigma}$ as follows:

[TABLE]

Equivalently,

[TABLE]

Thus,

[TABLE]

where $\delta_{i,j}$ is the Kronecker delta. The $i$ th row of $P_{\sigma}$ is row $\sigma(i)$ of the $n\times n$ identity matrix $I_{n}$ , and the $j$ th column of $P_{\sigma^{-1}}$ is column $\sigma(j)$ of $I_{n}$ .

For every $n\times n$ matrix $A$ , the $i$ th row of the matrix $P_{\sigma}A$ is row $\sigma(i)$ of $A$ , and the $j$ th column of the matrix $AP_{\sigma^{-1}}$ is column $\sigma(j)$ of $A$ . Thus, $P_{\sigma}A$ is a matrix constructed from $A$ by the $\sigma$ -permutation of the rows of $A$ , and $AP_{\sigma^{-1}}$ is a matrix constructed from $A$ by the $\sigma$ -permutation of the columns of $A$ .

For example, if $\sigma=(1,2,3)$ , then

[TABLE]

and

[TABLE]

Lemma 1.

For all permutations $\sigma,\tau\in S_{n}$ ,

[TABLE]

and

[TABLE]

Proof.

Let $i,j\in\{1,2,\ldots,n\}$ . Applying (1) with $j=k$ , we obtain

[TABLE]

This proves (2).

For the transpose of $P_{\sigma}$ , we have

[TABLE]

This proves (3). ∎

For example, if $\sigma=(1,2,3)$ and $\tau=(1,2)$ , then $\tau\sigma=(2,3)$ . We have

[TABLE]

and

[TABLE]
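The description of $P_{\sigma}$ by its rows (row $i$ of $P_{\sigma}$ is row $\sigma(i)$ of $I_{n}$) is easy to check by machine. The sketch below is an illustration only: the helper names `perm_matrix`, `matmul`, and `compose` are hypothetical, permutations are stored as dictionaries $i\mapsto\sigma(i)$, and the product convention $(\tau\sigma)(i)=\tau(\sigma(i))$ is the one consistent with the example $\tau\sigma=(2,3)$ above. With the row description of $P_{\sigma}$, a direct computation gives $P_{\sigma}P_{\tau}=P_{\tau\sigma}$, which the code verifies for this example.

```python
def perm_matrix(sigma):
    """P_sigma for a permutation stored as a dict i -> sigma(i) on {1,...,n}.

    Row i of P_sigma is row sigma(i) of the identity matrix.
    """
    n = len(sigma)
    return [[1 if sigma[i] == j else 0 for j in range(1, n + 1)]
            for i in range(1, n + 1)]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def compose(tau, sigma):
    """The product tau sigma, defined by (tau sigma)(i) = tau(sigma(i))."""
    return {i: tau[sigma[i]] for i in sigma}

sigma = {1: 2, 2: 3, 3: 1}   # the 3-cycle (1,2,3)
tau = {1: 2, 2: 1, 3: 3}     # the transposition (1,2)

# Entries chosen so that each one records its own position.
A = [[11, 12, 13], [21, 22, 23], [31, 32, 33]]
PA = matmul(perm_matrix(sigma), A)   # row i of PA is row sigma(i) of A
```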

For $k,\ell\in\{1,2,\ldots,m\}$ with $k\neq\ell$ , let $\tau\in S_{m}$ be the transposition defined by

[TABLE]

and

[TABLE]

Let $A=(a_{i,j})$ be an $m\times n$ matrix. The $m\times m$ permutation matrix $P_{\tau}$ interchanges rows $k$ and $\ell$ of $A$ , as follows: For all $i\in\{1,\ldots,m\}$ and $j\in\{1,\ldots,n\}$ ,

[TABLE]

It follows that

[TABLE]

and so

[TABLE]

Let $\sigma$ be a permutation in $S_{m}$ , and let $P_{\sigma}$ be the corresponding $m\times m$ permutation matrix. Every permutation $\sigma\in S_{m}$ is a product of transpositions, and so there is a sequence of transpositions $\tau_{1},\ldots,\tau_{q-1},\tau_{q}$ such that

[TABLE]

and

[TABLE]

Applying identity (4) recursively, we obtain

[TABLE]

This proves that, for all permutations $\sigma\in S_{m}$ ,

[TABLE]

Similarly,

[TABLE]

For example, let

[TABLE]

Consider the permutation $\sigma=(3,2,1)\in S_{3}$ and its associated permutation matrix

[TABLE]

We have

[TABLE]

and

[TABLE]

Theorem 4.

Let $A$ be an $m\times n$ matrix. If $P$ and $Q$ are permutation matrices, then

[TABLE]

Proof.

It suffices to prove this for transpositions.

Interchanging two rows of a matrix and row scaling is the same as row scaling and then interchanging the rows.

Interchanging two rows of a matrix and column scaling is the same as column scaling and then interchanging the rows.

Interchanging two columns of a matrix and row scaling is the same as row scaling and then interchanging the columns.

Interchanging two columns of a matrix and column scaling is the same as column scaling and then interchanging the columns. ∎
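The four statements in this proof can be checked exactly on a small example. The sketch below is an illustration, with hypothetical helper names (`row_scale`, `col_scale`, `swap_rows`, `swap_cols`) and an arbitrary positive integer matrix; it uses rational arithmetic so that the comparisons are exact rather than approximate.

```python
from fractions import Fraction

def row_scale(A):
    """Divide every entry by the sum of its row."""
    return [[Fraction(a) / sum(row) for a in row] for row in A]

def col_scale(A):
    """Divide every entry by the sum of its column."""
    cs = [sum(col) for col in zip(*A)]
    return [[Fraction(a) / cs[j] for j, a in enumerate(row)] for row in A]

def swap_rows(A, k, l):
    """Copy of A with rows k and l interchanged (0-based indices)."""
    B = [list(row) for row in A]
    B[k], B[l] = B[l], B[k]
    return B

def swap_cols(A, k, l):
    """Copy of A with columns k and l interchanged (0-based indices)."""
    transposed = [list(col) for col in zip(*A)]
    return [list(row) for row in zip(*swap_rows(transposed, k, l))]

# Arbitrary positive test matrix (an assumption of this sketch).
A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
```

Each of the four commutation statements then becomes a single equality of matrices, for instance `row_scale(swap_rows(A, 0, 2)) == swap_rows(row_scale(A), 0, 2)`.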

Theorem 5.

Let $A$ be an $n\times n$ positive matrix. For all permutation matrices $P$ and $Q$ ,

[TABLE]

Proof.

Let $\left(A^{(\ell)}\right)_{\ell=0}^{\infty}$ be the alternate minimization sequence of matrices constructed from $A=A^{(0)}$ . For all $\ell\geq 0$ , we have

[TABLE]

and

[TABLE]

For every permutation matrix $P$ , we have

[TABLE]

Continuing inductively, we obtain

[TABLE]

for all $\ell\in\mathbf{N}_{0}$ , and so

[TABLE]

Similarly, for every permutation matrix $Q$ , we have

[TABLE]

Therefore,

[TABLE]

This completes the proof. ∎

Theorem 6.

For every positive $n\times n$ matrix $A$ ,

[TABLE]

Proof.

Let $X$ and $Y$ be diagonal matrices such that

[TABLE]

We have $X^{t}=X$ , $Y^{t}=Y$ , and

[TABLE]

If $S(A)$ is doubly stochastic, then $S(A)^{t}$ is doubly stochastic. The uniqueness theorem implies that

[TABLE]

This completes the proof. ∎

Theorem 7.

Let $\lambda>0$ . For every positive $n\times n$ matrix $A$ ,

[TABLE]

and

[TABLE]

Proof.

Clear. ∎
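The invariance properties of this section can be tested numerically. The displayed formulas of Theorems 5, 6, and 7 are not reproduced in this text, so the sketch below assumes they assert $S(PAQ)=PS(A)Q$, $S(A^{t})=S(A)^{t}$, and $S(\lambda A)=S(A)$, as the surrounding proofs suggest. The helper names and the test matrix are hypothetical, and the limit is only approximated by a fixed number of floating-point iterations.

```python
def sinkhorn(A, iterations=200):
    """Approximate the Sinkhorn limit by alternate row and column scaling."""
    M = [[float(a) for a in row] for row in A]
    for _ in range(iterations):
        M = [[a / sum(row) for a in row] for row in M]
        cs = [sum(col) for col in zip(*M)]
        M = [[a / cs[j] for j, a in enumerate(row)] for row in M]
    return M

def transpose(A):
    return [list(col) for col in zip(*A)]

def permute(A, rows, cols):
    """Reorder the rows and columns of A, i.e. form P A Q for permutation matrices."""
    return [[A[i][j] for j in cols] for i in rows]

def close(A, B, tol=1e-9):
    """Entrywise comparison of two matrices up to a tolerance."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Arbitrary positive, non-symmetric test matrix (an assumption of this sketch).
A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
S = sinkhorn(A)
rows, cols = [2, 0, 1], [1, 2, 0]   # 0-based row and column orderings
half_A = [[0.5 * a for a in row] for row in A]
```

Comparisons such as `close(sinkhorn(permute(A, rows, cols)), permute(S, rows, cols))` then serve as a numerical sanity check, not as a proof.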

Here is an example of permutation and dilation equivalence. Let

[TABLE]

Dilating $A$ by $\lambda=1/2$ , we obtain

[TABLE]

Multiplying by the permutation matrices

[TABLE]

we obtain

[TABLE]

with $K=3/2$ . Equivalently,

[TABLE]

and

[TABLE]

Thus, the Sinkhorn limit of $B$ determines the Sinkhorn limit of A.

4. The $MBN$ matrix

Let $k$ , $\ell$ , and $n$ be positive integers such that $k+\ell=n$ . Let $M$ , $B$ , and $N$ be positive real numbers. Consider the $n\times n$ symmetric matrix

[TABLE]

in which the first $k$ rows are equal to

[TABLE]

and the last $\ell$ rows are equal to

[TABLE]

Let $X=\operatorname{\text{diag}}(x_{1},x_{2},x_{3},\ldots,x_{n})$ be the unique positive $n\times n$ diagonal matrix such that the alternate minimization limit $S(A)=XAX$ is doubly stochastic. Thus, the matrix

[TABLE]

satisfies

[TABLE]

and

[TABLE]

It follows that $x_{i}=x_{1}$ for $i=1,2,\ldots k$ and $x_{i}=x_{n}$ for $i=k+1,k+2,\ldots n$ . Let $x_{1}=x$ and $x_{n}=y$ . Define the diagonal matrix

[TABLE]

We obtain

[TABLE]

where

[TABLE]

Because $S(A)$ is row stochastic, we have

[TABLE]

and

[TABLE]

Equation (14) gives

[TABLE]

Inserting this into equation (15) and rearranging gives

[TABLE]

If $MN-B^{2}=0$ , then

[TABLE]

and $Mx^{2}=a=b=c=1/n$ . Thus, $S(A)$ is the $n\times n$ doubly stochastic matrix with every coordinate equal to $1/n$ .

If $MN-B^{2}\neq 0$ , then (16) is a quadratic equation in $x^{2}$ . We obtain

[TABLE]

and

[TABLE]

Recall that $ka+\ell b=1$ and so $a<1/k$ . If $MN>B^{2}$ , then

[TABLE]

If $MN<B^{2}$ , then

[TABLE]

In both cases, we obtain

[TABLE]

We obtain $b$ from (12) and $c$ from (13).

Theorem 8.

The Sinkhorn limit of the $MBN$ matrix (9) is the doubly stochastic matrix $S(A)$ defined by (10). The matrix $S(A)$ depends only on the ratio $MN/B^{2}$ .

Proof.

This follows immediately from (11), (12), and (13). ∎

For example, the matrices

[TABLE]

have the same Sinkhorn limit with $a=-37/38+5\sqrt{73}/38$ .

Theorem 8 explains why, in Section 2, the matrices $\left(\begin{matrix}2&1&1\\ 1&1&1\\ 1&1&1\end{matrix}\right)$ and $\left(\begin{matrix}1&1&1\\ 1&2&2\\ 1&2&2\end{matrix}\right)$ have the same Sinkhorn limits.
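For $k=1$ and $\ell=2$ the dependence on $MN/B^{2}$ alone can be made concrete. The following is a sketch under assumptions, derived here and not quoted from the paper: writing $S(A)=XAX$ with $X=\operatorname{diag}(x,y,y)$ gives $a=Mx^{2}$, $b=Bxy$, $c=Ny^{2}$, so $ac/b^{2}=MN/B^{2}=\rho$, and the conditions $a+2b=1$, $b+2c=1$ lead to $(\rho-1)a^{2}-(2\rho+1)a+\rho=0$. The root in $(0,1)$ can be written as $a=2\rho/(2\rho+1+\sqrt{8\rho+1})$. The function names `sinkhorn` and `mbn_corner` are hypothetical.

```python
import math

def sinkhorn(A, iterations=200):
    """Approximate the Sinkhorn limit by alternate row and column scaling."""
    M = [[float(a) for a in row] for row in A]
    for _ in range(iterations):
        M = [[a / sum(row) for a in row] for row in M]
        cs = [sum(col) for col in zip(*M)]
        M = [[a / cs[j] for j, a in enumerate(row)] for row in M]
    return M

def mbn_corner(M, B, N):
    """(1,1) entry of the Sinkhorn limit of the 3x3 MBN matrix with k=1, l=2.

    Uses the root in (0, 1) of (rho-1) a^2 - (2 rho+1) a + rho = 0 with
    rho = M N / B^2, written in a form that also covers rho = 1.
    """
    rho = M * N / B ** 2
    return 2 * rho / (2 * rho + 1 + math.sqrt(8 * rho + 1))

def mbn_matrix(M, B, N):
    """The 3x3 MBN matrix with k = 1 and l = 2."""
    return [[M, B, B], [B, N, N], [B, N, N]]

a_first = sinkhorn(mbn_matrix(2, 1, 1))[0][0]    # the matrix (2 1 1; 1 1 1; 1 1 1)
a_second = sinkhorn(mbn_matrix(1, 1, 2))[0][0]   # the matrix (1 1 1; 1 2 2; 1 2 2)
```

Both matrices have $\rho=2$, so `a_first` and `a_second` agree with each other and with `mbn_corner(2, 1, 1)`. With $\rho=6/25$, for example $M=2$, $B=5$, $N=3$, the same formula gives $a=-37/38+5\sqrt{73}/38$.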

Let $\left(A^{(r)}\right)_{r=1}^{\infty}$ be a sequence of $MBN$ matrices such that $\lim_{r\rightarrow\infty}MN/B^{2}=\infty$ . Let

[TABLE]

We have

[TABLE]

and

[TABLE]

Similarly, let $\left(A^{(r)}\right)_{r=1}^{\infty}$ be a sequence of $MBN$ matrices such that $\lim_{r\rightarrow\infty}MN/B^{2}=0$ . It follows from (11) that

[TABLE]

If $k\leq\ell$ , then

[TABLE]

If $k>\ell$ , then

[TABLE]

5. $3\times 3$ symmetric matrices and their doubly stochastic shapes

Let $A$ and $B$ be $n\times n$ positive matrices. We write $A\sim B$ if there exist $n\times n$ permutation matrices $P$ and $Q$ and $\lambda>0$ such that

[TABLE]

It is straightforward to check that this is an equivalence relation. If $A\sim B$ , then

[TABLE]

Thus, it suffices to compute the Sinkhorn limit of only one matrix in an equivalence class.

The goal is to compute the Sinkhorn limit of every $3\times 3$ symmetric positive matrix whose set of coordinates consists of two distinct real numbers.

Let A be such a matrix with coordinates $a$ and $b$ . There are 9 coordinate positions in the matrix, and so exactly one of the numbers $a$ and $b$ occurs at least five times. Suppose that the coordinate $a$ occurs five or more times. Let $\lambda=1/a$ and $K=b/a$ . The matrix $\lambda A$ has two distinct positive coordinates $1$ and $K$ , and $K$ occurs at most four times. There are seven equivalence classes of such matrices with respect to permutations and dilations. Here is the list, and, for each matrix, the shape of its Sinkhorn limit. Note that $K$ is a positive real number and $K\neq 1$ .

(1) [TABLE]

(2) [TABLE]

(3) [TABLE]

(4) [TABLE]

(5) [TABLE]

(6) [TABLE]

(7) [TABLE]

6. The matrix $A_{1}$

The matrix

[TABLE]

is the simplest. Just one row scaling or one column scaling produces the doubly stochastic matrix

[TABLE]

We have $S(A_{1})=XA_{1}X$ , where

[TABLE]

Moreover,

[TABLE]

7. The matrices $A_{2}$ , $A_{3}$ , and $A_{4}$

These are $MBN$ matrices. The matrix

[TABLE]

is an $MBN$ matrix with $k=1$ , $\ell=2$ , $M=K$ , and $B=N=1$ .

The matrix

[TABLE]

is an $MBN$ matrix with $k=1$ , $\ell=2$ , $M=B=1$ , and $N=K$ . Both matrices satisfy $MN/B^{2}=K\neq 1$ , and so they have the same Sinkhorn limit

[TABLE]

with

[TABLE]

For example, if $K=2$ , then

[TABLE]

and

[TABLE]

both have limits with coordinates

[TABLE]

Moreover,

[TABLE]

The matrix

[TABLE]

is an $MBN$ matrix with $k=1$, $\ell=2$, $M=N=1$, and $B=K$. We have $MN/B^{2}=1/K^{2}\neq 1$, and

[TABLE]

with

[TABLE]

For example, with $K=2$ , we have

[TABLE]

Moreover,

[TABLE]

8. The matrix $A_{5}$

The construction of the Sinkhorn limit of the $3\times 3$ matrix

[TABLE]

requires only high school algebra. There exists a unique positive diagonal matrix $X=\operatorname{\text{diag}}(x,y,z)$ such that $XA_{5}X$ is doubly stochastic. We have

[TABLE]

and so

[TABLE]

We have

[TABLE]

Rearranging, we obtain

[TABLE]

Note that $0<xy<1$ . If $K>1$ , then $(K-1)xy+1>1$ . If $0<K<1$ , then

[TABLE]

and $(K-1)xy+1>0$ . Therefore, $x=y$ , and so

[TABLE]

We obtain

[TABLE]

Equivalently,

[TABLE]

and so

[TABLE]

Eliminating $xz$ from (21) and (22) gives

[TABLE]

The inequalities $Kx^{2}<1$ and $z^{2}<1$ imply

[TABLE]

and

[TABLE]

Thus,

[TABLE]

where

[TABLE]

For example, with $K=2$ , we obtain

[TABLE]

We have the asymptotic limit

[TABLE]

9. The matrix $A_{6}$

The construction of the Sinkhorn limit of the $3\times 3$ matrix

[TABLE]

also requires only high school algebra. There exists a unique positive diagonal matrix $X=\operatorname{\text{diag}}(x,y,z)$ such that

[TABLE]

is a doubly stochastic matrix, and so

[TABLE]

From (24), we obtain

[TABLE]

Inserting (27) into (25) gives

[TABLE]

Inserting (28) into (27) gives

[TABLE]

Inserting (28) and (29) into (26) and rearranging gives

[TABLE]

Equivalently,

[TABLE]

and so

[TABLE]

and

[TABLE]

Inserting this into (28) gives

[TABLE]

and then (27) gives

[TABLE]

Thus,

[TABLE]

and

[TABLE]

This determines the scaling matrix X. The Sinkhorn limit is the circulant matrix

[TABLE]

with

[TABLE]

The asymptotic limit is

[TABLE]

Let

[TABLE]

be the $\ell$ th matrix in the alternate minimization algorithm for the matrix (23). We have

[TABLE]

and so alternate minimization generates sequences of rational numbers that converge to $K^{1/3}$.

For example, with $K=2$ , we obtain

[TABLE]

10. The matrix $A_{7}$

Consider the symmetric $3\times 3$ matrix

[TABLE]

There exists a unique positive diagonal matrix $X=\operatorname{\text{diag}}(x,y,z)$ such that

[TABLE]

is doubly stochastic. Therefore,

[TABLE]

Observe that equations (30) and (24) are identical, and that equations (31) and (25) are identical. Therefore,

[TABLE]

and

[TABLE]

Substituting (33) and (34) into the third equation gives a polynomial in one variable:

[TABLE]

By Sinkhorn’s theorem, this polynomial has at least one positive solution. If $K>1$ , then, by Descartes’s rule of signs, this polynomial has exactly two positive solutions. If $0<K<1$ , then this polynomial has two, four, or six positive solutions.

For example, let $K=2$ . Let $X=\operatorname{\text{diag}}(x,y,z)$ be the unique positive diagonal matrix such that the matrix

[TABLE]

is doubly stochastic, and

[TABLE]

The number $y$ is a solution of the octic polynomial

[TABLE]

According to Maple, the unique solution of this polynomial in the interval $(0,1)$ is

[TABLE]

From equations (33) and (34), we obtain

[TABLE]

and

[TABLE]

We obtain

[TABLE]

This agrees with the calculation in Section 2.

Let $K=3$ . Let $X=\operatorname{\text{diag}}(x,y,z)$ be the unique positive diagonal matrix such that the matrix

[TABLE]

is doubly stochastic, and

[TABLE]

The number $y$ is a solution of the octic polynomial

[TABLE]

According to Maple, the solutions of this polynomial in the interval $(0,1)$ are

[TABLE]

Choosing $y=0.5083028225$ , we obtain from equations (33) and (34) the numbers

[TABLE]

and

[TABLE]

and so

[TABLE]

This agrees with the calculation in Section 2.

It is interesting to observe that if we choose the second root of the polynomial (33), we obtain

[TABLE]

and

[TABLE]

For matrices of the form $A_{7}$, we do not have explicit formulae for the coordinates of the Sinkhorn limit as functions of $K$. Computer calculations suggest that the asymptotic limit of $S(A_{7})$ as $K\rightarrow\infty$ is

[TABLE]

11. Gröbner bases and algebraic numbers

I like solving problems using high school algebra. However, it is important to note that the previous calculations are also easily done using Gröbner bases.

Here is an example. Consider the $A_{7}$ matrix

[TABLE]

with $K>0$ and $K\neq 1$ . There exist unique positive real numbers $x,y,z$ that satisfy the polynomial equations

[TABLE]

Equivalently, $(x,y,z)$ is the unique positive vector in $\mathbf{R}^{3}$ that is in the affine variety $V(I)$ , where $I$ is the ideal in $\mathbf{R}[x,y,z]$ generated by the polynomials

[TABLE]

Let $K=2$ . Using the Groebner package in Maple with the lexicographical order $(x,y,z)$ , we obtain the Gröbner basis

[TABLE]

Applying Maple with the lexicographical order $(y,z,x)$ , we obtain the Gröbner basis

[TABLE]

Applying Maple with the lexicographical order $(z,x,y)$ , we obtain the Gröbner basis

[TABLE]

Thus, $x^{2}$ , $y^{2}$ , and $z^{2}$ are algebraic numbers of degree at most 4, and we have explicit polynomial representations of each variable $x$ , $y$ , $z$ in terms of the others.

For arbitrary $K$ , applying Maple with the lexicographical order $(y,z,x)$ , we obtain the Gröbner basis

[TABLE]

For each of the 8 roots of $h_{1}(y)$, the polynomials $g_{2}(z,y)$ and $g_{3}(x,y)$ determine unique numbers $x$ and $z$. Exactly one of the triples $(x,y,z)$ will be positive.

For every positive symmetric $n\times n$ matrix $A=(a_{i,j})$ , the Sinkhorn limit $S(A)=XAX$ with scaling matrix $X=\operatorname{\text{diag}}(x_{1},\ldots,x_{n})$ is the unique positive solution of a set $Q=\{q_{i}:i=1,\ldots,n\}$ of $n$ quadratic equations of the form

[TABLE]

Equivalently, $(x_{1},\ldots,x_{n})$ is the unique positive vector in the affine variety of the ideal generated by $Q$ . A Gröbner basis for this ideal shows that if the coordinates of the matrix $A=(a_{i,j})$ are rational numbers, then $x_{1},\ldots,x_{n}$ are algebraic numbers of degrees bounded in terms of $n$ .
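As a numerical counterpart to the Gröbner basis computation, the positive solution can be approximated directly. The sketch below assumes the quadratic equations express that $XAX$ is row stochastic, that is, $x_{i}\sum_{j}a_{i,j}x_{j}=1$ for all $i$. It runs alternate scaling while tracking the accumulated row multipliers $x_{i}$ and column multipliers $y_{j}$, and then uses the uniqueness statement of Theorem 1: for symmetric $A$ the limiting multipliers satisfy $Y=\lambda X$, so $d_{i}=\sqrt{x_{i}y_{i}}$ gives the symmetric scaling. The function name `symmetric_scaling` is hypothetical.

```python
import math

def symmetric_scaling(A, iterations=200):
    """Approximate the positive vector d with diag(d) A diag(d) doubly stochastic.

    A must be a positive symmetric matrix. The vectors x and y hold the
    accumulated row and column multipliers of alternate scaling, so the
    current iterate is diag(x) A diag(y).
    """
    n = len(A)
    x = [1.0] * n
    y = [1.0] * n
    for _ in range(iterations):
        # Row scaling: make each row of diag(x) A diag(y) sum to 1.
        x = [1.0 / sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]
        # Column scaling: make each column of diag(x) A diag(y) sum to 1.
        y = [1.0 / sum(x[i] * A[i][j] for i in range(n)) for j in range(n)]
    return [math.sqrt(xi * yi) for xi, yi in zip(x, y)]

A = [[2, 2, 1], [2, 1, 1], [1, 1, 2]]   # the fifth matrix of Section 2
d = symmetric_scaling(A)
# Left-hand sides of the assumed equations x_i * sum_j a_ij x_j = 1.
residuals = [d[i] * sum(A[i][j] * d[j] for j in range(3)) for i in range(3)]
```

The entries of the limit are then $d_{i}a_{i,j}d_{j}$; for instance $2d_{1}^{2}$ reproduces the $(1,1)$ coordinate found in Section 2.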

12. Diophantine approximation

Let $A$ be an $n\times n$ matrix with positive rational coordinates, and let $d$ be the least common multiple of the denominators of the coordinates of $A$. The matrix $dA$ has positive integral coordinates, and the matrix obtained by row scaling (or column scaling) $A$ is equal to the matrix obtained by row scaling (or column scaling) $dA$. Thus, the Sinkhorn limit obtained from the rational matrix $A$ equals the Sinkhorn limit obtained from the integral matrix $dA$. The matrices generated by alternate row and column scalings are rational matrices. If $A^{(\ell)}=\left(a_{i,j}^{(\ell)}\right)$ is the $\ell$th matrix obtained in the alternate minimization algorithm, and if the Sinkhorn limit is $S(A)=\left(s_{i,j}\right)$, then

[TABLE]

for all $i,j=1,\ldots,n$. If the coordinate $s_{i,j}$ is irrational for some pair $(i,j)$, then the alternate minimization cannot terminate in a finite number of steps. It is an open problem to determine the matrices $A$ for which the alternate minimization does terminate in a finite number of steps.

The Sinkhorn limit coordinates $s_{i,j}$ are algebraic numbers for all rational matrices A. If the coordinate $s_{i,j}$ is irrational for some $i$ and $j$ , then the alternate minimization algorithm constructs a sequence of rational approximations to $s_{i,j}$ . For example, alternate minimization provides a sequence (in fact, several sequences) of rational numbers that converge to $K^{1/3}$ for every positive integer $K$ . The matrix

[TABLE]

has Sinkhorn limit

[TABLE]

with

[TABLE]

If $A_{6}^{(\ell)}=\left(a_{i,j}^{(\ell)}\right)$, then

[TABLE]

For example, for $K=2$ , we have

[TABLE]

Here are the rational numbers in the first six iterations of the Sinkhorn algorithm, and their decimal representations:

[TABLE]

where

[TABLE]

Note that

[TABLE]
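The rational approximations can be generated exactly. The sketch below is an illustration under assumptions: it uses the matrix with rows $(2,2,1)$, $(2,1,1)$, $(1,1,1)$ from Section 2, whose limit there has the coordinate $0.2599210499\approx 2^{1/3}-1$, and the function name `sinkhorn_exact` is hypothetical. Exact fractions grow quickly in size, so only a few rounds are practical.

```python
from fractions import Fraction

def sinkhorn_exact(A, iterations):
    """Alternate row and column scaling in exact rational arithmetic.

    Returns the list of matrices obtained after each round, where a round
    is one row scaling followed by one column scaling.
    """
    M = [[Fraction(a) for a in row] for row in A]
    history = []
    for _ in range(iterations):
        M = [[a / sum(row) for a in row] for row in M]
        cs = [sum(col) for col in zip(*M)]
        M = [[a / cs[j] for j, a in enumerate(row)] for row in M]
        history.append(M)
    return history

steps = sinkhorn_exact([[2, 2, 1], [2, 1, 1], [1, 1, 1]], 5)
# Rational approximations to 2^(1/3) - 1 taken from the (1,3) coordinate.
approximations = [M[0][2] for M in steps]
```

The first approximation is $12/47\approx 0.2553$, and the later ones approach $2^{1/3}-1$ rapidly while their numerators and denominators grow much faster than those of continued fraction convergents.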

The continued fraction for $2^{1/3}-1$ is $[0,3,1,5,1,1,4,1,1,8,1,14,1,10,\ldots].$ For comparison, here are the first ten convergents of the continued fraction for $2^{1/3}-1$ :

[TABLE]
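The continued fraction expansion quoted above can be reproduced with integer arithmetic only. In this sketch, which is an illustration and not the method used in the paper, $2^{1/3}$ is approximated by a rational number correct to 40 decimal places, which is an assumption that is ample for the first 14 partial quotients; the helper names are hypothetical.

```python
from fractions import Fraction

def integer_cube_root(n):
    """Largest integer r with r**3 <= n, found by binary search."""
    lo, hi = 0, 1
    while hi ** 3 <= n:
        hi *= 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid
    return lo

def continued_fraction(x, terms):
    """First `terms` partial quotients of a nonnegative rational number x."""
    quotients = []
    for _ in range(terms):
        a = x.numerator // x.denominator
        quotients.append(a)
        x -= a
        if x == 0:
            break
        x = 1 / x
    return quotients

def convergents(quotients):
    """Convergents p_k / q_k computed by the standard recurrence."""
    p0, q0, p1, q1 = 1, 0, quotients[0], 1
    result = [Fraction(p1, q1)]
    for a in quotients[1:]:
        p0, p1 = p1, a * p1 + p0
        q0, q1 = q1, a * q1 + q0
        result.append(Fraction(p1, q1))
    return result

scale = 10 ** 40
cube_root_of_2 = Fraction(integer_cube_root(2 * scale ** 3), scale)
cf = continued_fraction(cube_root_of_2 - 1, 14)
first_convergents = convergents(cf)[:10]
```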

13. Rationality and finite length

For what positive $n\times n$ matrices does the alternate minimization algorithm converge in finitely many steps? This problem has been solved for $2\times 2$ matrices (Nathanson [15]), but it is open for all dimensions $n\geq 3$. In dimension 3, matrices equivalent to $A_{1}$ become doubly stochastic in one step, that is, after one row or one column scaling. It is not known if there exists a positive $3\times 3$ matrix that becomes doubly stochastic in exactly two steps. More generally, it is not known if there exists a positive $3\times 3$ matrix that becomes doubly stochastic in exactly $s$ steps for some $s\geq 2$.

Consider the matrix $A_{2}=\left(\begin{matrix}K&1&1\\ 1&1&1\\ 1&1&1\end{matrix}\right)$ with parameter $K$ . If $K$ is a rational number, then every matrix generated by iterated row and column scalings has rational coordinates. If the Sinkhorn limit contains an irrational coordinate, then the alternate minimization algorithm cannot terminate in finitely many steps.

If $K$ is an integer and $K\geq 2$ , then the Sinkhorn limit $S(A_{2})$ has coordinates in the quadratic field $\mathbf{Q}(\sqrt{8K+1})$ . For example, from (17), the $(1,1)$ coordinate of $S(A_{2})$ is

[TABLE]

This number is rational if and only if the odd integer $8K+1$ is the square of an odd integer, that is, if and only if $8K+1=(2r+1)^{2}$ for some positive integer $r$ and so $K=r(r+1)/2$ is a triangular number. From (17), (18), and (19), we obtain

[TABLE]

Moreover, $S(A_{2})=XA_{2}X$ , where $X=\operatorname{\text{diag}}(x,y,y)$ with $Kx^{2}=a$ and $y^{2}=c$ . Thus,

[TABLE]

For example, if $K=3$ , then $r=2$ and

[TABLE]

where

[TABLE]

Note that $A_{2}$ also has a scaling by rational matrices

[TABLE]

where

[TABLE]

It is not known if there exists a triangular number $K$ for which the alternate minimization algorithm terminates in a finite number of steps.
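The rationality criterion can be checked with exact arithmetic. The sketch below is an illustration under assumptions: it uses a closed form derived here from the conditions $a+2b=1$, $b+2c=1$, and $Kb^{2}=ac$ for $S(A_{2})$, namely $a=\bigl(2K+1-\sqrt{8K+1}\bigr)/\bigl(2(K-1)\bigr)$, which may be written differently from formula (17). The function name `rational_limit` is hypothetical.

```python
from fractions import Fraction
from math import isqrt

def rational_limit(K):
    """Coordinates (a, b, c) of S(A_2) for an integer K >= 2, if rational.

    Returns None when 8K + 1 is not a perfect square, in which case the
    coordinates lie in a quadratic field and are irrational.
    """
    s = isqrt(8 * K + 1)
    if s * s != 8 * K + 1:
        return None
    a = Fraction(2 * K + 1 - s, 2 * (K - 1))
    b = (1 - a) / 2   # from a + 2b = 1
    c = (1 - b) / 2   # from b + 2c = 1
    return a, b, c

# Integers K in [2, 60] for which the limit is rational.
rational_K = [K for K in range(2, 61) if rational_limit(K) is not None]
```

For a triangular number $K=r(r+1)/2$ with $r\geq 2$ the formula simplifies to $a=r/(r+2)$; for $K=3$ it gives $(a,b,c)=(1/2,1/4,3/8)$.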

14. Open problems

(1) Compute explicit formulas for the Sinkhorn limits of all positive symmetric $3\times 3$ matrices. This is a central problem.

(2) Here is a special case. Let $K,L,M$ and 1 be pairwise distinct positive numbers. Compute the Sinkhorn limits of the matrices

[TABLE]

(3) For what positive $n\times n$ matrices does the alternate minimization algorithm converge in finitely many steps? This is the problem discussed in the previous section.

(4) It is not known what algebraic numbers appear as coordinates of the Sinkhorn limit of a positive integral matrix. It would be interesting to have an example of an algebraic number in the unit interval that is not a coordinate of the Sinkhorn limit of a rational matrix.

(5) Does there exist a $3\times 3$ matrix $A$ such that $A$ is row stochastic but not column stochastic, and $AY(A)$ is doubly stochastic?

(6) Does every possible shape of a doubly stochastic $3\times 3$ matrix $A$ appear as the nontrivial limit of some $3\times 3$ matrix?

(7) Why does the shape of the Sinkhorn limit $S(A)$ seem to depend only on the shape of the matrix $A$ and not on the numerical values of the coordinates of $A$?

(8) What does the Sinkhorn limit $S(A)$ tell us about the matrix $A$? What information does it convey?

(9) The matrix $A$ is positive if $a_{i,j}>0$ for all $i$ and $j$. The matrix $A$ is nonnegative if $a_{i,j}\geq 0$ for all $i$ and $j$.

Let $A$ be a nonnegative $m\times n$ matrix. Let $\mathbf{r}=(r_{1},r_{2},\ldots,r_{m})\in\mathbf{R}^{m}$ and let $\mathbf{c}=(c_{1},c_{2},\ldots,c_{n})\in\mathbf{R}^{n}$. The matrix $A$ is $\mathbf{r}$-row stochastic if $\operatorname{\text{row}}_{i}(A)=r_{i}$ for all $i\in\{1,2,\ldots,m\}$. The matrix $A$ is $\mathbf{c}$-column stochastic if $\operatorname{\text{col}}_{j}(A)=c_{j}$ for all $j\in\{1,2,\ldots,n\}$. The matrix $A$ is $(\mathbf{r},\mathbf{c})$-stochastic if it is both $\mathbf{r}$-row stochastic and $\mathbf{c}$-column stochastic. Note that if $A$ is $(\mathbf{r},\mathbf{c})$-stochastic, then

[TABLE]

Let $A$ be a positive matrix. Let $X$ be the $m\times m$ diagonal matrix whose $i$th diagonal coordinate is $r_{i}/\operatorname{\text{row}}_{i}(A)$, and let $Y$ be the $n\times n$ diagonal matrix whose $j$th diagonal coordinate is $c_{j}/\operatorname{\text{col}}_{j}(A)$. The matrix $XA$ is $\mathbf{r}$-row stochastic and the matrix $AY$ is $\mathbf{c}$-column stochastic.

A simple modification of the alternate minimization algorithm applied to a positive matrix satisfying (36) produces an $(\mathbf{r},\mathbf{c})$ -stochastic Sinkhorn limit. It is an open problem to compute explicit Sinkhorn limits in the $(\mathbf{r},\mathbf{c})$ -stochastic setting.
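The text does not spell out the modification. One natural reading, sketched below as an assumption, replaces each normalization to row or column sum 1 by the scalings $X$ and $Y$ defined above. The function names are illustrative. The check that the coordinates of $\mathbf{r}$ and $\mathbf{c}$ have the same sum is also an assumption of this sketch; it is necessary because both sums equal the sum of all coordinates of an $(\mathbf{r},\mathbf{c})$-stochastic matrix.

```python
def scale_rows(A, r):
    """Compute XA with X = diag(r_i / row_i(A)), so that row i sums to r_i."""
    return [[v * ri / sum(row) for v in row] for row, ri in zip(A, r)]

def scale_cols(A, c):
    """Compute AY with Y = diag(c_j / col_j(A)), so that column j sums to c_j."""
    col_sums = [sum(col) for col in zip(*A)]
    return [[v * cj / s for v, cj, s in zip(row, c, col_sums)] for row in A]

def rc_sinkhorn(A, r, c, iterations=500):
    """Alternate r-row scaling and c-column scaling on a positive m x n matrix.

    Assumes that r and c are positive vectors whose coordinates have the same sum.
    """
    if abs(sum(r) - sum(c)) > 1e-12:
        raise ValueError("the coordinates of r and c must have the same sum")
    for _ in range(iterations):
        A = scale_cols(scale_rows(A, r), c)
    return A

if __name__ == "__main__":
    S = rc_sinkhorn([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [1.0, 2.0], [1.0, 1.0, 1.0])
    print([sum(row) for row in S])        # close to r
    print([sum(col) for col in zip(*S)])  # close to c
```

This computes a numerical approximation only; it does not address the open problem of finding explicit $(\mathbf{r},\mathbf{c})$-stochastic Sinkhorn limits.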

15. Notes

In his 1964 paper, Richard Sinkhorn [16, p.877] wrote:

The iterative process of alternately normalizing the rows and columns of a strictly positive $N\times N$ matrix is convergent to a strictly positive doubly stochastic matrix.

Sinkhorn did not prove this result. The proof of convergence of the alternate minimization algorithm appears in Knopp and Sinkhorn [17], and in Letac [12]. Geometric existence proofs of exact scaling appear in Menon [14], and in Tverberg [18].

The computational complexity of Sinkhorn’s alternate scaling algorithm is investigated in Kalantari and Khachiyan [9, 10], Kalantari, Lari, Ricca, and Simeone [11], Linial, Samorodnitsky, and Wigderson [13], and Allen-Zhu, Li, Oliveira, and Wigderson [1]. An extension of matrix scaling to operator scaling began with Gurvits [5], and is developed in Garg, Gurvits, Oliveira, and Wigderson [3, 4], Gurvits [6], and Gurvits and Samorodnitsky [7]. Motivating some of this recent work are the classical papers of Edmonds [2] and Valiant [19, 20].

The literature on matrix scaling is vast. See the recent survey paper of Idel [8]. For the early history of matrix scaling, see Allen-Zhu, Li, Oliveira, and Wigderson [1, Section 1.1].

Acknowledgements. The alternate minimization algorithm was discussed in several lectures in the New York Number Theory Seminar, and I thank the participants for their useful remarks. In particular, I thank David Newman for making the initial computations that suggested some of the problems considered in this paper.

Bibliography

[1] Z. Allen-Zhu, Y. Li, R. Oliveira, and A. Wigderson, Much faster algorithms for matrix scaling, 58th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2017), IEEE Computer Soc., Los Alamitos, CA, 2017, pp. 890–901.
[2] J. Edmonds, Systems of distinct representatives and linear algebra, J. Res. Nat. Bur. Standards Sect. B 71B (1967), 241–245.
[3] A. Garg, L. Gurvits, R. Oliveira, and A. Wigderson, A deterministic polynomial time algorithm for non-commutative rational identity testing, 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2016), IEEE Computer Soc., Los Alamitos, CA, 2016, pp. 109–117.
[4] A. Garg, L. Gurvits, R. Oliveira, and A. Wigderson, Algorithmic and optimization aspects of Brascamp-Lieb inequalities, via operator scaling, STOC’17: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, ACM, New York, 2017, pp. 397–409.
[5] L. Gurvits, Classical complexity and quantum entanglement, J. Comput. System Sci. 69 (2004), no. 3, 448–484.
[6] L. Gurvits, Boolean matrices with prescribed row/column sums and stable homogeneous polynomials: combinatorial and algorithmic applications, Inform. and Comput. 240 (2015), 42–55.
[7] L. Gurvits and A. Samorodnitsky, Bounds on the permanent and some applications, 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2014), IEEE Computer Soc., Los Alamitos, CA, 2014, pp. 90–99.
[8] M. Idel, A review of matrix scaling and Sinkhorn’s normal form for matrices and positive maps, arXiv:1609.06349, 2016.
