Matrix scaling and explicit doubly stochastic limits
Melvyn B. Nathanson

Abstract
The process of alternately row scaling and column scaling a positive $n \times n$ matrix $A$ converges to a doubly stochastic positive matrix $S(A)$, often called the Sinkhorn limit of $A$. The main result in this paper is the computation of exact formulae for the Sinkhorn limits of certain symmetric positive $3 \times 3$ matrices.
Department of Mathematics
Lehman College (CUNY)
Bronx, NY 10468
Key words and phrases:
Matrix scaling, iterative scaling, Sinkhorn limits, Gröbner bases.
2010 Mathematics Subject Classification:
11C20, 11B75, 11J68, 11J70.
Supported in part by a grant from the PSC-CUNY Research Award Program.
1. Doubly stochastic matrices and scaling
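Let $A = (a_{i,j})$ be an $m \times n$ matrix. For $i \in \{1, \ldots, m\}$, the $i$th row sum of $A$ is
$$\operatorname{rowsum}_i(A) = \sum_{j=1}^{n} a_{i,j}.$$
For $j \in \{1, \ldots, n\}$, the $j$th column sum of $A$ is
$$\operatorname{colsum}_j(A) = \sum_{i=1}^{m} a_{i,j}.$$
The matrix $A$ is positive if $a_{i,j} > 0$ for all $i$ and $j$, and nonnegative if $a_{i,j} \geq 0$ for all $i$ and $j$. The matrix $A$ is row stochastic if $A$ is nonnegative and $\operatorname{rowsum}_i(A) = 1$ for all $i \in \{1, \ldots, m\}$. The matrix $A$ is column stochastic if $A$ is nonnegative and $\operatorname{colsum}_j(A) = 1$ for all $j \in \{1, \ldots, n\}$. The matrix $A$ is doubly stochastic if it is both row and column stochastic.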
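Let $\operatorname{diag}(x_1, \ldots, x_n)$ denote the $n \times n$ diagonal matrix whose $i$th diagonal coordinate is $x_i$ for all $i \in \{1, \ldots, n\}$. The matrix $\operatorname{diag}(x_1, \ldots, x_n)$ is positive diagonal if $x_i > 0$ for all $i \in \{1, \ldots, n\}$.
Let $A$ be an $m \times n$ matrix. The process of multiplying the rows of $A$ by scalars, or, equivalently, multiplying $A$ on the left by a diagonal matrix $X$, is called row-scaling, and $X$ is called a row-scaling matrix.
The process of multiplying the columns of $A$ by scalars, or, equivalently, multiplying $A$ on the right by a diagonal matrix $Y$, is called column-scaling, and $Y$ is called a column-scaling matrix.
If $X = \operatorname{diag}(x_1, \ldots, x_m)$ and $Y = \operatorname{diag}(y_1, \ldots, y_n)$, then
$$XAY = \left( x_i\, a_{i,j}\, y_j \right).$$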
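Let $A$ be an $m \times n$ matrix with positive row sums, that is, $\operatorname{rowsum}_i(A) > 0$ for all $i \in \{1, \ldots, m\}$. Let
$$X(A) = \operatorname{diag}\left( \operatorname{rowsum}_1(A)^{-1}, \ldots, \operatorname{rowsum}_m(A)^{-1} \right) \tag{1}$$
and let
$$A' = X(A)\,A.$$
We have
$$A' = \left( \frac{a_{i,j}}{\operatorname{rowsum}_i(A)} \right)$$
and so
$$\operatorname{rowsum}_i(A') = \sum_{j=1}^{n} \frac{a_{i,j}}{\operatorname{rowsum}_i(A)} = \frac{\operatorname{rowsum}_i(A)}{\operatorname{rowsum}_i(A)} = 1$$
for all $i \in \{1, \ldots, m\}$. Therefore, $X(A)A$ is a row stochastic matrix.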
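Similarly, if $A$ is an $m \times n$ matrix with positive column sums and if
$$Y(A) = \operatorname{diag}\left( \operatorname{colsum}_1(A)^{-1}, \ldots, \operatorname{colsum}_n(A)^{-1} \right) \tag{2}$$
and
$$A'' = A\,Y(A),$$
then
$$A'' = \left( \frac{a_{i,j}}{\operatorname{colsum}_j(A)} \right)$$
and
$$\operatorname{colsum}_j(A'') = \sum_{i=1}^{m} \frac{a_{i,j}}{\operatorname{colsum}_j(A)} = 1$$
for all $j \in \{1, \ldots, n\}$. Therefore, $AY(A)$ is a column stochastic matrix.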
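The two scaling operations (1) and (2) are short to express in code. The following is a minimal Python sketch. It assumes a matrix is stored as a list of rows, and the function names `row_scale` and `col_scale` are our own. Exact `Fraction` arithmetic is used so that the row and column sums come out exactly 1.

```python
from fractions import Fraction


def row_scale(a):
    """Return X(A)A: divide every row of A by its row sum, as in (1)."""
    return [[x / sum(row) for x in row] for row in a]


def col_scale(a):
    """Return AY(A): divide every column of A by its column sum, as in (2)."""
    col_sums = [sum(col) for col in zip(*a)]
    return [[x / s for x, s in zip(row, col_sums)] for row in a]


a = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
r = row_scale(a)  # rows (1/3, 2/3) and (3/7, 4/7): row stochastic
c = col_scale(a)  # columns (1/4, 3/4) and (1/3, 2/3): column stochastic
print(r)
print(c)
```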
The following two theorems were stated by Sinkhorn [20], and subsequently proved by Brualdi, Parter, and Schneider [2], Djoković [3], Sinkhorn and Knopp [21], Menon [17], Letac [15], and Tverberg [22].
Theorem 1.
Let $A$ be a positive $n \times n$ matrix.
(i) There exist positive diagonal matrices $X$ and $Y$ such that $XAY$ is doubly stochastic.
(ii) If $X_1$, $Y_1$, $X_2$, and $Y_2$ are positive diagonal matrices such that both $X_1AY_1$ and $X_2AY_2$ are doubly stochastic, then $X_1AY_1 = X_2AY_2$ and there exists $\lambda > 0$ such that $X_2 = \lambda X_1$ and $Y_2 = \lambda^{-1} Y_1$.
(iii) Let $A$ be a positive symmetric matrix. There exists a unique positive diagonal matrix $X$ such that $XAX$ is doubly stochastic.
The unique doubly stochastic matrix $XAY$ in Theorem 1 is called the Sinkhorn limit of $A$, and is denoted $S(A)$.
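Theorem 2.
Let $A$ be a positive $n \times n$ matrix, and let $S(A)$ be the Sinkhorn limit of $A$. Construct sequences of positive matrices $(A_k)_{k=0}^{\infty}$ and $(A'_k)_{k=1}^{\infty}$ and sequences of positive diagonal matrices $(X_k)_{k=1}^{\infty}$ and $(Y_k)_{k=1}^{\infty}$ as follows: Let
$$A_0 = A.$$
Given the matrix $A_{k-1}$, let
$$X_k = X(A_{k-1})$$
be the row-scaling matrix of $A_{k-1}$ defined by (1). The matrix
$$A'_k = X_k A_{k-1}$$
is row stochastic. Let
$$Y_k = Y(A'_k)$$
be the column-scaling matrix of $A'_k$ defined by (2), and let
$$A_k = A'_k Y_k = X_k A_{k-1} Y_k.$$
The matrix $A_k$ is column stochastic.
The Sinkhorn limit $S(A)$ is obtained by alternately row-scaling and column-scaling:
$$S(A) = \lim_{k \to \infty} A'_k = \lim_{k \to \infty} A_k.$$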
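Theorem 2 translates directly into an iteration. Below is a floating-point Python sketch built on our own assumptions. The stopping rule (stop once every row and column sum is within `tol` of 1) and the cap `max_iter` are illustrative choices. They are not part of the theorem, which only asserts convergence.

```python
def row_scale(a):
    """A'_k = X_k A_{k-1}: divide each row by its row sum."""
    return [[x / sum(row) for x in row] for row in a]


def col_scale(a):
    """A_k = A'_k Y_k: divide each column by its column sum."""
    col_sums = [sum(col) for col in zip(*a)]
    return [[x / s for x, s in zip(row, col_sums)] for row in a]


def max_deviation(a):
    """Largest distance of a row sum or a column sum from 1."""
    sums = [sum(row) for row in a] + [sum(col) for col in zip(*a)]
    return max(abs(s - 1) for s in sums)


def sinkhorn(a, tol=1e-12, max_iter=100_000):
    """Approximate S(A) by the column stochastic iterates A_k of Theorem 2."""
    current = [[float(x) for x in row] for row in a]
    for _ in range(max_iter):
        current = col_scale(row_scale(current))
        if max_deviation(current) < tol:
            return current
    raise RuntimeError("tolerance not reached within max_iter rounds")


s = sinkhorn([[1, 1, 1], [1, 1, 1], [1, 1, 2]])
for row in s:
    print([round(x, 6) for x in row])
```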
It is an open problem to compute explicitly the Sinkhorn limit of a positive $n \times n$ matrix. This is known for $2 \times 2$ matrices (Nathanson [18]). The goal of this paper is the explicit computation of Sinkhorn limits for certain $3 \times 3$ matrices.
2. Sinkhorn limits of symmetric $3 \times 3$ matrices and their doubly stochastic shapes
Let $A$ and $B$ be positive $n \times n$ matrices. We write $A \sim B$ if there exist permutation matrices $P$ and $Q$ and $\lambda > 0$ such that
$$B = \lambda PAQ.$$
This is an equivalence relation. Moreover, $A \sim B$ implies
$$S(B) = P\,S(A)\,Q.$$
Thus, it suffices to determine the Sinkhorn limit of only one matrix in an equivalence class.
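The invariance $S(B) = P\,S(A)\,Q$ can be checked numerically. The Python sketch below uses helper names of our own and repeats the floating-point alternate scaling loop. It represents the permutation matrices $P$ and $Q$ by index lists `p` and `q`, so that $PAQ$ has $(i,j)$ coordinate $a_{p(i),q(j)}$. It then compares $S(\lambda PAQ)$ with $P\,S(A)\,Q$ for one sample matrix of our choosing. This is an illustration only.

```python
def sinkhorn(a, tol=1e-12, max_iter=100_000):
    """Alternate row and column scaling until all line sums are within tol of 1."""
    current = [[float(x) for x in row] for row in a]
    for _ in range(max_iter):
        current = [[x / sum(row) for x in row] for row in current]
        col_sums = [sum(col) for col in zip(*current)]
        current = [[x / s for x, s in zip(row, col_sums)] for row in current]
        if max(abs(sum(row) - 1) for row in current) < tol:
            return current
    raise RuntimeError("tolerance not reached within max_iter rounds")


def permute(a, p, q):
    """Return PAQ, whose (i, j) coordinate is a[p[i]][q[j]]."""
    return [[a[p[i]][q[j]] for j in range(len(q))] for i in range(len(p))]


a = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
p, q, lam = [2, 0, 1], [1, 2, 0], 3.0
b = [[lam * x for x in row] for row in permute(a, p, q)]  # B = lambda * PAQ

lhs = sinkhorn(b)                    # S(B)
rhs = permute(sinkhorn(a), p, q)     # P S(A) Q
print(max(abs(x - y) for r1, r2 in zip(lhs, rhs) for x, y in zip(r1, r2)))
```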
We shall compute the Sinkhorn limit of every symmetric positive $3 \times 3$ matrix whose set of coordinates consists of two distinct real numbers.
Let $A$ be such a matrix with distinct coordinates $a$ and $b$. There are 9 coordinate positions in a $3 \times 3$ matrix, and so exactly one of the numbers $a$ and $b$ occurs at least five times. Suppose that the coordinate $a$ occurs five or more times. Let $K = b/a$ and $B = a^{-1}A$. The matrix $B$ has two distinct positive coordinates $1$ and $K$, and $K$ occurs at most four times. There are seven equivalence classes of such matrices with respect to permutations and dilations. The main result of this paper is the calculation of the Sinkhorn limits of these matrices.
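The count of seven classes can be confirmed by brute force. In this Python sketch, which is our own construction, a symmetric $3 \times 3$ matrix with coordinates 1 and $K$ is encoded by the set of positions holding $K$. A dilation cannot map such a matrix to another one of the same kind unless $\lambda = 1$, because 1 is the coordinate that occurs at least five times. So two such matrices are equivalent exactly when one pattern maps to the other under independent row and column permutations.

```python
from itertools import permutations, product

DIAGONAL = [(0, 0), (1, 1), (2, 2)]
OFF_DIAGONAL_PAIRS = [((0, 1), (1, 0)), ((0, 2), (2, 0)), ((1, 2), (2, 1))]


def symmetric_patterns():
    """Yield the sets of positions holding K in a symmetric 3x3 matrix
    with coordinates 1 and K, where K occurs between one and four times."""
    for diag_mask in product([False, True], repeat=3):
        for pair_mask in product([False, True], repeat=3):
            cells = {d for d, used in zip(DIAGONAL, diag_mask) if used}
            for pair, used in zip(OFF_DIAGONAL_PAIRS, pair_mask):
                if used:
                    cells.update(pair)  # symmetric: add (i, j) and (j, i)
            if 1 <= len(cells) <= 4:
                yield frozenset(cells)


def canonical_form(cells):
    """Smallest image of a pattern under independent row and column permutations."""
    return min(
        tuple(sorted((p[i], q[j]) for i, j in cells))
        for p in permutations(range(3))
        for q in permutations(range(3))
    )


classes = {canonical_form(cells) for cells in symmetric_patterns()}
print(len(classes))  # 7
```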
Theorem 3.
Let and . The matrices below are a complete set of representatives of the seven equivalence classes of symmetric matrices with coordinates 1 and . The matrix gives the shape of the Sinkhorn limit of for . The coordinates of the Sinkhorn limits as explicit functions of 1 and are computed in Sections 4–8.
(1) [TABLE]
(2) [TABLE]
(3) [TABLE]
(4) [TABLE]
(5) [TABLE]
(6) [TABLE]
(7) [TABLE]
3. The matrix
Let , , and be positive integers such that
[TABLE]
Let , , and be positive real numbers. Consider the symmetric matrix
[TABLE]
in which the first rows are equal to
[TABLE]
and the last rows are equal to
[TABLE]
Let be the unique positive diagonal matrix such that the alternate scaling limit is doubly stochastic. Thus, the matrix
[TABLE]
satisfies
[TABLE]
and
[TABLE]
It follows that for and for . Let and . Define the diagonal matrix
[TABLE]
We obtain
[TABLE]
where
[TABLE]
Because is row stochastic, we have
[TABLE]
and
[TABLE]
Equation (11) gives
[TABLE]
Inserting this into equation (12) and rearranging gives
[TABLE]
If , then
[TABLE]
and . Thus, is the doubly stochastic matrix with every coordinate equal to .
If , then (13) is a quadratic equation in . Let
[TABLE]
We obtain
[TABLE]
and
[TABLE]
Recall that and so
[TABLE]
If , then and
[TABLE]
The inequality implies that
[TABLE]
If , then and
[TABLE]
Because
[TABLE]
the inequality implies (14).
We have proved the following.
Theorem 4.
The Sinkhorn limit of the matrix (6) is a doubly stochastic matrix with shape (7). If , then . If , then equations (14), (9), and (10) define the coordinates , , and . The matrix depends only on the ratio .
For example, the matrices
[TABLE]
have the same Sinkhorn limit with
[TABLE]
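The two-block structure reduces the scaling problem to one quadratic equation. The Python sketch below uses our own labels, which may differ from the paper's: block sizes $n_1$ and $n_2$, the coordinate $a$ in the first $n_1$ rows and columns, $b$ in the off-diagonal blocks, and $c$ in the last $n_2$ rows and columns. By the uniqueness in Theorem 1(iii), $X$ is constant on each block, say $x_1$ and $x_2$. Then $S(A)$ has block coordinates $u = ax_1^2$, $v = bx_1x_2$ and $w = cx_2^2$, so $v^2 = ruw$ with $r = b^2/(ac)$. The row sums give $n_1u + n_2v = 1$ and $n_1v + n_2w = 1$. Eliminating $u$ and $w$ leaves $n_1n_2(1-r)v^2 + rnv - r = 0$ with $n = n_1 + n_2$. Assuming that $r$ is the ratio referred to in Theorem 4, the sketch shows that the limit depends on $a$, $b$, $c$ only through $r$.

```python
import math


def block_limit(n1, n2, a, b, c):
    """Return the block coordinates (u, v, w) of S(A) for the two-block matrix.

    u, v, w solve n1*u + n2*v = 1, n1*v + n2*w = 1 and v**2 = r*u*w,
    where r = b**2 / (a*c).
    """
    n = n1 + n2
    r = b * b / (a * c)
    if math.isclose(r, 1.0):
        return 1 / n, 1 / n, 1 / n  # rank-one case: every coordinate is 1/n
    # Quadratic n1*n2*(1 - r)*v**2 + r*n*v - r = 0, obtained by eliminating u and w.
    qa, qb, qc = n1 * n2 * (1 - r), r * n, -r
    disc = math.sqrt(qb * qb - 4 * qa * qc)
    for v in ((-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa)):
        u, w = (1 - n2 * v) / n1, (1 - n1 * v) / n2
        if u > 0 and v > 0 and w > 0:  # keep the root giving a positive matrix
            return u, v, w
    raise ValueError("no positive solution found")


print(block_limit(1, 1, 1.0, 2.0, 1.0))  # approximately (1/3, 2/3, 1/3)
print(block_limit(1, 2, 1.0, 2.0, 3.0))
print(block_limit(1, 2, 2.0, 2.0, 1.5))  # same r = 4/3, same limit
```

Filtering the roots by positivity is the sketch's way of selecting the correct root. By Theorem 1, exactly one root yields a positive doubly stochastic matrix.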
Let be a sequence of matrices such that . Let
[TABLE]
We have
[TABLE]
and
[TABLE]
Similarly, let be a sequence of matrices such that . It follows from (8) that
[TABLE]
If , then
[TABLE]
If , then
[TABLE]
4. The matrix
The matrix
[TABLE]
is the simplest. Just one row scaling or one column scaling produces the doubly stochastic matrix
[TABLE]
We have , where
[TABLE]
We have the asymptotic limits
[TABLE]
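We assume that the matrix of this section is, up to equivalence, the one with $K$ in the three diagonal positions and 1 elsewhere. Among the seven classes, it is the only one in which all row and column sums are equal, namely to $K + 2$. A single scaling then divides every coordinate by $K + 2$, giving $K/(K+2)$ on the diagonal and $1/(K+2)$ elsewhere. The exact-arithmetic Python sketch below checks this for $K = 5$. The helper names are our own.

```python
from fractions import Fraction


def row_scale(a):
    """Divide every row by its row sum."""
    return [[x / sum(row) for x in row] for row in a]


def is_doubly_stochastic(a):
    return all(sum(row) == 1 for row in a) and all(sum(col) == 1 for col in zip(*a))


def diagonal_k_matrix(k):
    """Hypothetical example: K on the diagonal and 1 elsewhere."""
    return [[k if i == j else Fraction(1) for j in range(3)] for i in range(3)]


k = Fraction(5)
s = row_scale(diagonal_k_matrix(k))
print(is_doubly_stochastic(s), s[0][0], s[0][1])  # True 5/7 1/7
```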
5. The matrices , , and
These are matrices. The matrix
[TABLE]
is an matrix with , , , , and .
The matrix
[TABLE]
is an matrix with , , , , and . Both matrices satisfy , and so they have the same Sinkhorn limit
[TABLE]
with
[TABLE]
We have the asymptotic limits
[TABLE]
The matrix
[TABLE]
is an matrix with , , , and . We have , and the Sinkhorn limit
[TABLE]
with
[TABLE]
We have the asymptotic limits
[TABLE]
6. The matrix
The construction of the Sinkhorn limit of the matrix
[TABLE]
requires only high school algebra. There exists a unique positive diagonal matrix such that is doubly stochastic and positive. We have
[TABLE]
and so
[TABLE]
We have
[TABLE]
Rearranging, we obtain
[TABLE]
Note that . If , then . If , then
[TABLE]
and . Therefore, , and so
[TABLE]
[TABLE]
[TABLE]
We obtain
[TABLE]
Applying (19) and eliminating from (20) and (21) gives
[TABLE]
Therefore,
[TABLE]
and so
[TABLE]
The inequality implies
[TABLE]
and
[TABLE]
Thus, the Sinkhorn limit has the shape
[TABLE]
where
[TABLE]
We have the asymptotic limits
[TABLE]
7. The matrix
The construction of the Sinkhorn limit of the matrix
[TABLE]
also requires only high school algebra. There exists a unique positive diagonal matrix such that
[TABLE]
is a doubly stochastic matrix, and so
[TABLE]
[TABLE]
and so
[TABLE]
and
[TABLE]
Inserting (26) and (27) into (25) and simplifying, we obtain
[TABLE]
and so
[TABLE]
and
[TABLE]
Inserting this into (26) gives
[TABLE]
and then (27) gives
[TABLE]
This determines the scaling matrix X. The Sinkhorn limit is the circulant matrix
[TABLE]
with
[TABLE]
The asymptotic limits are
[TABLE]
8. The matrix
Consider the symmetric matrix
[TABLE]
There exists a unique positive diagonal matrix such that
[TABLE]
is doubly stochastic. Therefore,
[TABLE]
Because equations (28) and (23) are identical, and equations (29) and (24) are identical, we obtain (26) and (27). Inserting these formulae for and into (30) gives the octic polynomial
[TABLE]
By Theorem 1, this polynomial has at least one solution . If , then, by Descartes’s rule of signs, this polynomial has exactly two positive solutions. If , then this polynomial has one or three positive solutions. For matrices of the form , we do not have explicit formulae for the coordinates of the Sinkhorn limit as functions of . Computer calculations suggest that the asymptotic limits of as and are
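Descartes's rule of signs says that the number of positive roots of a real polynomial, counted with multiplicity, equals the number of sign changes in its coefficient sequence or is smaller than it by an even number. With two sign changes and at least one positive root (from Theorem 1), there must be exactly two. With three sign changes, there are one or three. The octic's coefficients are not reproduced here, so the Python sketch below counts sign changes for a made-up example polynomial.

```python
def sign_changes(coefficients):
    """Number of sign changes in a coefficient sequence, ignoring zeros."""
    signs = [c > 0 for c in coefficients if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


# x^4 - 3x^3 + 2x - 1, coefficients listed from the highest degree down.
print(sign_changes([1, -3, 0, 2, -1]))  # 3
```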
[TABLE]
9. Gröbner bases and algebraic numbers
I like solving problems using high school algebra. However, it is important to note that the previous calculations are also easily done using Gröbner bases.
For every matrix and diagonal matrix , we have the matrix
[TABLE]
If is positive and symmetric, then, by Theorems 1 and 2, the quadratic equations
[TABLE]
have a unique positive solution, and the diagonal matrix is the unique scaling matrix in the Sinkhorn limit . Equivalently, is the unique positive vector in the affine variety of the ideal in generated by the set of polynomials . For each lexicographical ordering of the variables , Maple (and other computer algebra programs) can compute a Gröbner basis for the ideal. The Gröbner basis for this ideal shows that if the coordinates of the matrix are rational numbers, then are algebraic numbers of degrees bounded in terms of .
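The quadratic equations above say that every row sum of $XAX$ equals 1. The Python sketch below is a numerical companion to the Gröbner basis approach, not a substitute for it. It approximates $S(A)$ by alternate scaling and recovers $x_i = \sqrt{s_{i,i}/a_{i,i}}$ from the diagonal of $S(A) = XAX$. It then checks that $x_i \sum_j a_{i,j} x_j = 1$ and that $s_{i,j} = x_i a_{i,j} x_j$. The sample symmetric matrix and the helper names are our own.

```python
import math


def sinkhorn(a, tol=1e-12, max_iter=100_000):
    """Alternate row and column scaling until all line sums are within tol of 1."""
    current = [[float(x) for x in row] for row in a]
    for _ in range(max_iter):
        current = [[x / sum(row) for x in row] for row in current]
        col_sums = [sum(col) for col in zip(*current)]
        current = [[x / s for x, s in zip(row, col_sums)] for row in current]
        if max(abs(sum(row) - 1) for row in current) < tol:
            return current
    raise RuntimeError("tolerance not reached within max_iter rounds")


def symmetric_scaling(a):
    """Return (x, S) with S approximating S(A) and x the diagonal of X in S(A) = XAX."""
    s = sinkhorn(a)
    x = [math.sqrt(s[i][i] / a[i][i]) for i in range(len(a))]
    return x, s


a = [[1, 1, 1], [1, 1, 2], [1, 2, 1]]
x, s = symmetric_scaling(a)
residuals = [x[i] * sum(a[i][j] * x[j] for j in range(3)) - 1 for i in range(3)]
print(x)
print(residuals)  # all close to 0
```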
Here is an example. Let and . Consider the matrices
[TABLE]
with and . There exist unique positive real numbers that satisfy the quadratic equations
[TABLE]
Equivalently, is the unique positive vector in the affine variety , where is the ideal in generated by the polynomials
[TABLE]
Let . Using the Groebner package in Maple with the lexicographical order , we obtain the Gröbner basis
[TABLE]
Applying Maple with the lexicographical order , we obtain the Gröbner basis
[TABLE]
Applying Maple with the lexicographical order , we obtain the Gröbner basis
[TABLE]
Thus, , , and are algebraic numbers of degree at most 4, and we have explicit polynomial representations of each variable , , in terms of the other two variables.
For arbitrary , applying Maple with the lexicographical order , we obtain the Gröbner basis
[TABLE]
For each of the 8 roots of , the polynomials and determine unique numbers and . Exactly one of the triples will be positive.
10. Rationality and finite length
For what positive matrices does the alternate scaling algorithm converge in finitely many steps? This problem has been solved for matrices (Nathanson [18]), but it is open for all dimensions . In dimension 3, matrices equivalent to become doubly stochastic in one step, that is, after one row or one column scaling. Ekhad and Zeilberger [5] computed a positive matrix that becomes doubly stochastic in exactly two steps, and Nathanson [19] generalized this construction. It is not known if there exists a positive matrix that becomes doubly stochastic in exactly steps for some .
Consider the matrix with parameter . If is a rational number, then every matrix generated by iterated row and column scalings has rational coordinates. If the Sinkhorn limit contains an irrational coordinate, then the alternate scaling algorithm cannot terminate in finitely many steps.
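Each step divides only by sums of coordinates, so a matrix with rational coordinates produces rational iterates. The Python sketch below runs a few rounds of Theorem 2 in exact `Fraction` arithmetic and tests whether an iterate is exactly doubly stochastic. It can detect termination but cannot prove non-termination. The helper names and the sample matrix are our own.

```python
from fractions import Fraction


def row_scale(a):
    """Divide every row by its row sum."""
    return [[x / sum(row) for x in row] for row in a]


def col_scale(a):
    """Divide every column by its column sum."""
    col_sums = [sum(col) for col in zip(*a)]
    return [[x / s for x, s in zip(row, col_sums)] for row in a]


def is_doubly_stochastic(a):
    return all(sum(row) == 1 for row in a) and all(sum(col) == 1 for col in zip(*a))


def exact_iterates(a, rounds):
    """Return A_1, ..., A_rounds of Theorem 2, computed with exact rationals."""
    current = [[Fraction(x) for x in row] for row in a]
    history = []
    for _ in range(rounds):
        current = col_scale(row_scale(current))
        history.append(current)
    return history


history = exact_iterates([[1, 1, 1], [1, 1, 1], [1, 1, 2]], 3)
for k, m in enumerate(history, start=1):
    print(k, is_doubly_stochastic(m), m[2][2])
```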
Let be an integer, . In Section 5 we proved that the Sinkhorn limit has coordinates in the quadratic field . For example, from (15), the coordinate of is
[TABLE]
This number is rational if and only if the odd integer is the square of an odd integer, that is, if and only if for some positive integer and so is a triangular number. From (15), (16), and (17), we obtain
[TABLE]
Moreover, , where with and . Thus,
[TABLE]
For example, if , then and
[TABLE]
where
[TABLE]
Note that also has a scaling by rational matrices
[TABLE]
where
[TABLE]
It is not known if there exists a triangular number for which the alternate scaling algorithm terminates in a finite number of steps.
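The link between odd squares and triangular numbers is the identity $8 \cdot \frac{t(t+1)}{2} + 1 = (2t+1)^2$. An integer $K \geq 1$ is therefore triangular exactly when $8K + 1$ is a perfect square, which is necessarily odd. We assume that $8K + 1$ is the odd integer in the rationality criterion above. Under that assumption, the following Python sketch lists the values of $K$ that satisfy it.

```python
import math


def is_triangular(k):
    """True when k = t(t + 1)/2 for some integer t >= 1, i.e. 8k + 1 is a square."""
    if k < 1:
        return False
    m = 8 * k + 1
    root = math.isqrt(m)
    return root * root == m


print([k for k in range(1, 40) if is_triangular(k)])  # [1, 3, 6, 10, 15, 21, 28, 36]
```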
11. Open problems
(1) Compute explicit formulas for the Sinkhorn limits of matrices of the form . More generally, compute explicit formulas for the Sinkhorn limits of all positive symmetric matrices. This is a central problem.
(2) Here is a special case. Let and 1 be pairwise distinct positive numbers. Compute the Sinkhorn limits of the matrices
[TABLE]
(3) For what positive matrices does the alternate scaling algorithm converge in finitely many steps? This is the problem discussed in Section 10.
(4) It is not known what algebraic numbers appear as coordinates of Sinkhorn limits of matrices with positive integral coordinates. It would be interesting to have an example of an algebraic number in the unit interval that is not a coordinate of the Sinkhorn limit of a positive integral matrix.
(5) Does every possible shape of a doubly stochastic matrix appear as the nontrivial Sinkhorn limit of some matrix?
(6) Why does the shape of the Sinkhorn limit seem to depend only on the shape of the matrix and not on the numerical values of the coordinates of ?
(7) Let A be a nonnegative matrix. Let and let . The matrix A is -row stochastic if for all . The matrix A is -column stochastic if for all . The matrix is -stochastic if it is both -row stochastic and -column stochastic.
Let A be a positive matrix. Let be the diagonal matrix whose th coordinate is , and let be the diagonal matrix whose th coordinate is . The matrix is -row stochastic and the matrix is -column stochastic. A simple modification of the alternate scaling algorithm produces an -stochastic Sinkhorn limit. It is an open problem to compute explicit Sinkhorn limits in the -stochastic setting.
(8) It is an old problem in number theory to understand the continued fractions of the cube roots of integers, and, in particular, to understand the approximation of by rationals. One coordinate of the Sinkhorn limit of the matrix with is . The matrix with has rational coordinates, and so the matrices constructed by the alternate scaling algorithm also have rational coordinates, and generate explicit sequences of rational approximations to . The nature of these approximations remains mysterious.
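Problem (7) mentions a simple modification of the alternate scaling algorithm for prescribed row and column sums. One natural version, which is our assumption about what the modification looks like, rescales each row $i$ to sum $r_i$ and then each column $j$ to sum $c_j$. This requires $\sum_i r_i = \sum_j c_j$. Below is a floating-point Python sketch; the names `r`, `c` and `rc_scale` are our own.

```python
def rc_scale(a, r, c, tol=1e-12, max_iter=100_000):
    """Alternately scale a positive matrix to row sums r and column sums c."""
    if abs(sum(r) - sum(c)) > tol:
        raise ValueError("row and column targets must have the same total")
    current = [[float(x) for x in row] for row in a]
    for _ in range(max_iter):
        # Row step: make row i sum to r[i].
        current = [[x * ri / sum(row) for x in row] for row, ri in zip(current, r)]
        # Column step: make column j sum to c[j].
        col_sums = [sum(col) for col in zip(*current)]
        current = [[x * cj / s for x, cj, s in zip(row, c, col_sums)] for row in current]
        if max(abs(sum(row) - ri) for row, ri in zip(current, r)) < tol:
            return current
    raise RuntimeError("tolerance not reached within max_iter rounds")


m = rc_scale([[1, 2], [3, 4]], r=[1.0, 2.0], c=[1.5, 1.5])
print([sum(row) for row in m], [sum(col) for col in zip(*m)])
```

With every target equal to 1, this reduces to the alternate scaling of Theorem 2.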
12. Notes
The computational complexity of Sinkhorn’s alternate scaling algorithm is investigated in Kalantari and Khachiyan [12, 13], Kalantari, Lari, Ricca, and Simeone [14], Linial, Samorodnitsky, and Wigderson [16], and Allen-Zhu, Li, Oliveira, and Wigderson [1]. An extension of matrix scaling to operator scaling began with Gurvits [8], and is developed in Garg, Gurvits, Oliveira, and Wigderson [6, 7], Gurvits [9], and Gurvits and Samorodnitsky [10]. Motivating some of this recent work are the classical papers of Edmonds [4] and Valiant [23, 24].
The literature on matrix scaling is vast. See the recent survey paper of Idel [11]. For the early history of matrix scaling, see Allen-Zhu, Li, Oliveira, and Wigderson [1, Section 1.1].
Acknowledgements. The alternate scaling algorithm was discussed in several lectures in the New York Number Theory Seminar, and I thank the participants for their useful remarks. In particular, I thank David Newman for making the initial computations that suggested some of the problems considered in this paper. I also benefitted from a careful and thoughtful referee’s report.
[1] Z. Allen-Zhu, Y. Li, R. Oliveira, and A. Wigderson, Much faster algorithms for matrix scaling, 58th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2017), IEEE Computer Soc., Los Alamitos, CA, 2017, pp. 890–901.
[2] R. A. Brualdi, S. V. Parter, and H. Schneider, The diagonal equivalence of a nonnegative matrix to a stochastic matrix, J. Math. Anal. Appl. 16 (1966), 31–50.
[3] D. Ž. Djoković, Note on nonnegative matrices, Proc. Amer. Math. Soc. 25 (1970), 80–82.
[4] J. Edmonds, Systems of distinct representatives and linear algebra, J. Res. Nat. Bur. Standards Sect. B 71B (1967), 241–245.
[5] S. B. Ekhad and D. Zeilberger, Answers to some questions about explicit Sinkhorn limits posed by Mel Nathanson, arXiv:1902.10783, 2019.
[6] A. Garg, L. Gurvits, R. Oliveira, and A. Wigderson, A deterministic polynomial time algorithm for non-commutative rational identity testing, 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2016), IEEE Computer Soc., Los Alamitos, CA, 2016, pp. 109–117.
[7] A. Garg, L. Gurvits, R. Oliveira, and A. Wigderson, Algorithmic and optimization aspects of Brascamp-Lieb inequalities, via operator scaling, STOC’17: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, ACM, New York, 2017, pp. 397–409.
[8] L. Gurvits, Classical complexity and quantum entanglement, J. Comput. System Sci. 69 (2004), no. 3, 448–484.
