Matrix scaling, explicit Sinkhorn limits, and arithmetic
Melvyn B. Nathanson

TL;DR
This paper explores the convergence of matrix scaling to doubly stochastic matrices, providing explicit formulas for certain symmetric 3x3 matrices and connecting the results to diophantine approximation.
Contribution
It offers explicit formulas for Sinkhorn limits of specific symmetric 3x3 matrices and links matrix scaling to diophantine approximation problems.
Findings
Explicit formulas for Sinkhorn limits of symmetric 3x3 matrices.
Connections established between matrix scaling and diophantine approximation.
Analysis of convergence properties in matrix scaling processes.
Abstract
The process of alternately row scaling and column scaling a positive $n \times n$ matrix $A$ converges to a doubly stochastic positive $n \times n$ matrix $S(A)$, called the Sinkhorn limit of $A$. Exact formulae for the Sinkhorn limits of certain symmetric positive $3 \times 3$ matrices are computed, and related problems in diophantine approximation are considered.
Department of Mathematics
Lehman College (CUNY)
Bronx, NY 10468
Key words and phrases:
Matrix scaling, alternate minimization, Sinkhorn limits, diophantine approximation, Gröbner bases.
2010 Mathematics Subject Classification:
11C20, 11B75, 11J68, 11J70.
1. Doubly stochastic matrices and scaling
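Let $A = (a_{i,j})$ be an $m \times n$ matrix. For $i \in \{1,\ldots,m\}$, the $i$th row sum of $A$ is
$$\operatorname{row}_i(A) = \sum_{j=1}^n a_{i,j}.$$
For $j \in \{1,\ldots,n\}$, the $j$th column sum of $A$ is
$$\operatorname{col}_j(A) = \sum_{i=1}^m a_{i,j}.$$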
For example, the matrices
[TABLE]
have row and column sums equal to 1.
An $n \times n$ matrix $D = (d_{i,j})$ is diagonal if $d_{i,j} = 0$ for all $i \neq j$. Let $\operatorname{diag}(x_1,\ldots,x_n)$ denote the $n \times n$ diagonal matrix whose $(i,i)$th coordinate is $x_i$ for all $i \in \{1,\ldots,n\}$. The diagonal matrix $\operatorname{diag}(x_1,\ldots,x_n)$ is positive diagonal if $x_i > 0$ for all $i$.
The process of multiplying the rows of a matrix $A$ by scalars, or, equivalently, multiplying $A$ on the left by a diagonal matrix $X$, is called row-scaling, and $X$ is called a row-scaling matrix.
The process of multiplying the columns of a matrix $A$ by scalars, or, equivalently, multiplying $A$ on the right by a diagonal matrix $Y$, is called column-scaling, and $Y$ is called a column-scaling matrix.
Let be an matrix. If and , then
[TABLE]
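The matrix $A = (a_{i,j})$ is positive if $a_{i,j} > 0$ for all $i$ and $j$, and nonnegative if $a_{i,j} \geq 0$ for all $i$ and $j$. The $m \times n$ matrix $A$ is row stochastic if $A$ is nonnegative and $\operatorname{row}_i(A) = 1$ for all $i \in \{1,\ldots,m\}$. The matrix $A$ is column stochastic if $A$ is nonnegative and $\operatorname{col}_j(A) = 1$ for all $j \in \{1,\ldots,n\}$. The matrix $A$ is doubly stochastic if it is both row and column stochastic. For example, the matrices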
[TABLE]
are doubly stochastic.
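If the $m \times n$ matrix $A$ is doubly stochastic, then
$$m = \sum_{i=1}^m \operatorname{row}_i(A) = \sum_{i=1}^m \sum_{j=1}^n a_{i,j} = \sum_{j=1}^n \operatorname{col}_j(A) = n$$
and so $A$ is a square matrix.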
Let be an matrix with positive row sums, that is, for all . Let denote the diagonal matrix whose th diagonal coordinate is , and let
[TABLE]
We have
[TABLE]
and so
[TABLE]
for all . Therefore, is a row stochastic matrix.
Similarly, let denote the diagonal matrix whose th diagonal coordinate is , and let
[TABLE]
We have
[TABLE]
and so
[TABLE]
for all . Therefore, is a column stochastic matrix.
For example, if
[TABLE]
then the matrix
[TABLE]
is row stochastic, and the matrix
[TABLE]
is column stochastic.
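The two normalizations just described can be sketched in a few lines of Python. This is only an illustration: the function names `row_scale` and `col_scale` and the sample matrix are hypothetical and are not taken from the paper. Exact rational arithmetic is used so that the row and column sums come out exactly equal to 1.

```python
from fractions import Fraction

def row_scale(A):
    """Divide every row by its row sum, giving a row stochastic matrix."""
    return [[a / sum(row) for a in row] for row in A]

def col_scale(A):
    """Divide every column by its column sum, giving a column stochastic matrix."""
    col_sums = [sum(col) for col in zip(*A)]
    return [[a / s for a, s in zip(row, col_sums)] for row in A]

# Hypothetical 2 x 3 matrix with positive row and column sums.
A = [[Fraction(1), Fraction(2), Fraction(3)],
     [Fraction(4), Fraction(5), Fraction(6)]]

R = row_scale(A)  # every row of R sums to 1
C = col_scale(A)  # every column of C sums to 1
print(R)
print(C)
```

Row scaling and column scaling of a rectangular matrix generally give different matrices, which is why the two operations have to be alternated to reach a doubly stochastic matrix in the square case.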
In this paper we study doubly stochastic matrices.
The following results (due to Sinkhorn [16], Knopp-Sinkhorn [17], Menon [14], Letac [12], Tverberg [18], and others) are classical.
Theorem 1.
Let $A = (a_{i,j})$ be an $n \times n$ matrix with $a_{i,j} > 0$ for all $i, j \in \{1,\ldots,n\}$.
(i) There exist positive diagonal $n \times n$ matrices $X$ and $Y$ such that $XAY$ is doubly stochastic.
(ii) If $X$, $X'$, $Y$, and $Y'$ are positive diagonal $n \times n$ matrices such that both $XAY$ and $X'AY'$ are doubly stochastic, then $XAY = X'AY'$ and there exists $\lambda > 0$ such that $X' = \lambda X$ and $Y' = \lambda^{-1} Y$. The unique doubly stochastic matrix $XAY$ is called the Sinkhorn limit of $A$, and denoted $S(A)$.
(iii) Let $A$ be a positive symmetric $n \times n$ matrix. There exists a unique positive diagonal matrix $X$ such that $XAX$ is doubly stochastic.
Theorem 2.
Let be the set of positive doubly stochastic matrices. Let (resp. ) be the set of positive -dimensional (resp. -dimensional) vectors. Consider
[TABLE]
as a subset of with the subspace topology. Consider the set of positive matrices as a subset of with the subspace topology. The function from to defined by
[TABLE]
is a homeomorphism.
Theorem 3.
Let A be a positive matrix. Construct sequences of positive matrices and and sequences of positive diagonal matrices and as follows: Let
[TABLE]
Given the matrix , let
[TABLE]
be the row-scaling matrix of , and let
[TABLE]
The matrix is row stochastic. Let
[TABLE]
be the column-scaling matrix of , and let
[TABLE]
The matrix is column stochastic. There exist positive diagonal matrices X and Y such that
[TABLE]
and the matrix
[TABLE]
is doubly stochastic.
This process of obtaining a doubly stochastic matrix from a positive matrix by row and column scaling is called alternate minimization.
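A minimal floating-point sketch of the iteration in Theorem 3 follows. The function name `alternate_minimization`, the sample matrix, and the choice of 20 rounds are assumptions made for illustration only; the sketch applies a row scaling followed by a column scaling in each round and then inspects how close the result is to doubly stochastic.

```python
def row_scale(A):
    """Divide every row by its row sum."""
    return [[a / sum(row) for a in row] for row in A]

def col_scale(A):
    """Divide every column by its column sum."""
    col_sums = [sum(col) for col in zip(*A)]
    return [[a / s for a, s in zip(row, col_sums)] for row in A]

def alternate_minimization(A, steps):
    """Apply `steps` rounds of row scaling followed by column scaling."""
    for _ in range(steps):
        A = col_scale(row_scale(A))
    return A

# Hypothetical positive 3 x 3 matrix.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 10.0]]

S = alternate_minimization(A, 20)
row_sums = [sum(row) for row in S]
col_sums = [sum(col) for col in zip(*S)]
print(row_sums, col_sums)  # all sums should be very close to 1
```

The last operation in each round is a column scaling, so the column sums are 1 up to rounding, while the row sums approach 1 only in the limit.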
It is an open problem to compute explicitly the Sinkhorn limit of a positive $n \times n$ matrix. This is known for $2 \times 2$ matrices (Nathanson [15]). In this paper we compute explicit Sinkhorn limits for certain symmetric $3 \times 3$ matrices, and discuss connections with diophantine approximation.
2. Experimental data
Here are some computational results. Using Maple, we row scale and then column scale the matrix, iterate this process 20 times, and print the resulting matrix.
[TABLE]
[TABLE]
[TABLE]
[TABLE]
[TABLE]
In these calculations, the alternate minimization algorithm generates approximately doubly stochastic matrices of four different shapes:
[TABLE]
3. Permutation matrices
Let be the group of permutations of the set . For every , define the permutation matrix as follows:
[TABLE]
Equivalently,
[TABLE]
Thus,
[TABLE]
where is the Kronecker delta. The th row of is row of the identity matrix , and the th column of is column of .
For every matrix , the th row of the matrix is row of , and the th column of the matrix is column of . Thus, is a matrix constructed from by the -permutation of the rows of , and is a matrix constructed from by the -permutation of the columns of .
For example, if , then
[TABLE]
and
[TABLE]
Lemma 1.
For all permutations ,
[TABLE]
and
[TABLE]
Proof.
Let . Applying (1) with , we obtain
[TABLE]
This proves (2).
For the transpose of , we have
[TABLE]
This proves (3). ∎
For example, if and , then . We have
[TABLE]
and
[TABLE]
For with , let be the transposition defined by
[TABLE]
and
[TABLE]
Let be an matrix. The permutation matrix interchanges rows and of , as follows: For all and ,
[TABLE]
It follows that
[TABLE]
and so
[TABLE]
Let be a permutation in , and let be the corresponding permutation matrix. Every permutation is a product of transpositions, and so there is a sequence of transpositions such that
[TABLE]
and
[TABLE]
Applying identity (4) recursively, we obtain
[TABLE]
This proves that, for all permutations ,
[TABLE]
Similarly,
[TABLE]
[TABLE]
[TABLE]
For example, let
[TABLE]
Consider the permutation and its associated permutation matrix
[TABLE]
We have
[TABLE]
and
[TABLE]
Theorem 4.
Let be an matrix. If and are permutation matrices, then
[TABLE]
Proof.
It suffices to prove this for transpositions.
Interchanging two rows of a matrix and row scaling is the same as row scaling and then interchanging the rows.
Interchanging two rows of a matrix and column scaling is the same as column scaling and then interchanging the rows.
Interchanging two columns of a matrix and row scaling is the same as row scaling and then interchanging the columns.
Interchanging two columns of a matrix and column scaling is the same as column scaling and then interchanging the columns. ∎
Theorem 5.
Let be an positive matrix. For all permutation matrices and ,
[TABLE]
Proof.
Let be the alternate minimization sequence of matrices constructed from . For all , we have
[TABLE]
[TABLE]
and
[TABLE]
For every permutation matrix , we have
[TABLE]
[TABLE]
[TABLE]
Continuing inductively, we obtain
[TABLE]
for all , and so
[TABLE]
Similarly, for every permutation matrix , we have
[TABLE]
Therefore,
[TABLE]
This completes the proof. ∎
Theorem 6.
For every positive matrix ,
[TABLE]
Proof.
Let and be diagonal matrices such that
[TABLE]
We have , , and
[TABLE]
If is doubly stochastic, then is doubly stochastic. The uniqueness theorem implies that
[TABLE]
This completes the proof. ∎
Theorem 7.
Let . For every positive matrix ,
[TABLE]
and
[TABLE]
Proof.
Clear. ∎
Here is an example of permutation and dilation equivalence. Let
[TABLE]
Dilating by , we obtain
[TABLE]
Multiplying by the permutation matrices
[TABLE]
we obtain
[TABLE]
with . Equivalently,
[TABLE]
and
[TABLE]
Thus, the Sinkhorn limit of determines the Sinkhorn limit of A.
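The invariance properties of this section (permuting rows and columns, transposing, and dilating by a positive scalar) can be checked numerically. The sketch below is not part of the paper: the helper names, the sample matrix, the two permutations, and the dilation factor 5 are all hypothetical, and the Sinkhorn limit is only approximated by 200 rounds of alternate scaling in floating point.

```python
def sinkhorn(A, steps=200):
    """Approximate the Sinkhorn limit by alternate row and column scaling."""
    for _ in range(steps):
        A = [[a / sum(row) for a in row] for row in A]             # row scaling
        col_sums = [sum(col) for col in zip(*A)]
        A = [[a / s for a, s in zip(row, col_sums)] for row in A]  # column scaling
    return A

def permute_rows(A, perm):
    """Row i of the result is row perm[i] of A."""
    return [A[p] for p in perm]

def permute_cols(A, perm):
    """Column j of the result is column perm[j] of A."""
    return [[row[p] for p in perm] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def close(A, B, tol=1e-9):
    """Entrywise comparison of two matrices of the same size."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]
S = sinkhorn(A)
perm_r, perm_c = [2, 0, 1], [1, 0, 2]

# Permuting A and then scaling agrees with scaling A and then permuting.
perm_ok = close(sinkhorn(permute_cols(permute_rows(A, perm_r), perm_c)),
                permute_cols(permute_rows(S, perm_r), perm_c))
# The limit of the transpose is the transpose of the limit.
transpose_ok = close(sinkhorn(transpose(A)), transpose(S))
# Multiplying A by a positive scalar does not change the limit.
dilation_ok = close(sinkhorn([[5.0 * a for a in row] for row in A]), S)
print(perm_ok, transpose_ok, dilation_ok)
```

A numerical check of this kind is evidence, not a proof; the proofs are the ones given in Theorems 5, 6, and 7.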
4. The matrix
Let , , and be positive integers such that . Let , , and be positive real numbers. Consider the symmetric matrix
[TABLE]
in which the first rows are equal to
[TABLE]
and the last rows are equal to
[TABLE]
Let be the unique positive diagonal matrix such that the alternate minimization limit is doubly stochastic. Thus, the matrix
[TABLE]
satisfies
[TABLE]
and
[TABLE]
It follows that for and for . Let and . Define the diagonal matrix
[TABLE]
We obtain
[TABLE]
where
[TABLE]
Because is row stochastic, we have
[TABLE]
and
[TABLE]
Equation (14) gives
[TABLE]
Inserting this into equation (15) and rearranging gives
[TABLE]
If , then
[TABLE]
and . Thus, is the doubly stochastic matrix with every coordinate equal to .
If , then (16) is a quadratic equation in . We obtain
[TABLE]
and
[TABLE]
Recall that and so . If , then
[TABLE]
If , then
[TABLE]
In both cases, we obtain
[TABLE]
We obtain from (12) and from (13).
Theorem 8.
The Sinkhorn limit of the matrix (9) is the doubly stochastic matrix defined by (10). The matrix depends only on the ratio .
Proof.
This follows immediately from (11), (12), and (13). ∎
For example, the matrices
[TABLE]
have the same Sinkhorn limit with .
Theorem 8 explains why, in Section 2, the matrices and have the same Sinkhorn limits.
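Theorem 8 can be illustrated numerically. The ratio itself is not legible in this copy of the text, so the sketch below assumes it is $KM/L^2$, where $K$, $L$, and $M$ are the three block values; this is the quantity left unchanged when the matrix is scaled on both sides by a diagonal matrix that is constant on each block. The function names and the three parameter choices are hypothetical.

```python
def sinkhorn(A, steps=200):
    """Approximate the Sinkhorn limit by alternate row and column scaling."""
    for _ in range(steps):
        A = [[a / sum(row) for a in row] for row in A]
        col_sums = [sum(col) for col in zip(*A)]
        A = [[a / s for a, s in zip(row, col_sums)] for row in A]
    return A

def block_matrix(k, l, K, L, M):
    """Symmetric (k+l) x (k+l) matrix: the first k rows are (K,...,K,L,...,L)
    and the last l rows are (L,...,L,M,...,M)."""
    top = [float(K)] * k + [float(L)] * l
    bottom = [float(L)] * k + [float(M)] * l
    return [top[:] for _ in range(k)] + [bottom[:] for _ in range(l)]

# Three different matrices with k = 1, l = 2 and the same value K*M/L**2 = 4.
S1 = sinkhorn(block_matrix(1, 2, 4, 1, 1))
S2 = sinkhorn(block_matrix(1, 2, 2, 1, 2))
S3 = sinkhorn(block_matrix(1, 2, 1, 0.5, 1))

max_diff = max(
    max(abs(a - b), abs(a - c))
    for r1, r2, r3 in zip(S1, S2, S3)
    for a, b, c in zip(r1, r2, r3)
)
print(S1)
print(max_diff)  # close to zero if the limits agree
```

Changing one of the parameter triples so that $KM/L^2$ differs from 4 should produce a visibly different limit, which is a quick way to test the assumption about the ratio.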
Let be a sequence of matrices such that . Let
[TABLE]
We have
[TABLE]
and
[TABLE]
Similarly, let be a sequence of matrices such that . It follows from (11) that
[TABLE]
If , then
[TABLE]
If , then
[TABLE]
5. $3 \times 3$ symmetric matrices and their doubly stochastic shapes
Let and be positive matrices. We write if there exist permutation matrices and and such that
[TABLE]
It is straightforward to check that this is an equivalence relation. If , then
[TABLE]
Thus, it suffices to compute the Sinkhorn limit of only one matrix in an equivalence class.
The goal is to compute the Sinkhorn limit of every symmetric positive matrix whose set of coordinates consists of two distinct real numbers.
Let A be such a matrix with coordinates and . There are 9 coordinate positions in the matrix, and so exactly one of the numbers and occurs at least five times. Suppose that the coordinate occurs five or more times. Let and . The matrix has two distinct positive coordinates and , and occurs at most four times. There are seven equivalence classes of such matrices with respect to permutations and dilations. Here is the list, and, for each matrix, the shape of its Sinkhorn limit. Note that is a positive real number and .
(1) [TABLE]
(2) [TABLE]
(3) [TABLE]
(4) [TABLE]
(5) [TABLE]
(6) [TABLE]
(7) [TABLE]
6. The matrix
The matrix
[TABLE]
is the simplest. Just one row scaling or one column scaling produces the doubly stochastic matrix
[TABLE]
We have , where
[TABLE]
Moreover,
[TABLE]
7. The matrices , , and
These are matrices. The matrix
[TABLE]
is an matrix with , , , and .
The matrix
[TABLE]
is an matrix with , , , and . Both matrices satisfy , and so they have the same Sinkhorn limit
[TABLE]
with
[TABLE]
For example, if , then
[TABLE]
and
[TABLE]
both have limits with coordinates
[TABLE]
Moreover,
[TABLE]
The matrix
[TABLE]
is an matrix with , , , and . We have , and
[TABLE]
with
[TABLE]
For example, with , we have
[TABLE]
Moreover,
[TABLE]
8. The matrix
The construction of the Sinkhorn limit of the matrix
[TABLE]
requires only high school algebra. There exists a unique positive diagonal matrix such that is doubly stochastic. We have
[TABLE]
and so
[TABLE]
We have
[TABLE]
Rearranging, we obtain
[TABLE]
Note that . If , then . If , then
[TABLE]
and . Therefore, , and so
[TABLE]
[TABLE]
We obtain
[TABLE]
Equivalently,
[TABLE]
and so
[TABLE]
Eliminating from (21) and (22) gives
[TABLE]
The inequalities and imply
[TABLE]
and
[TABLE]
Thus,
[TABLE]
where
[TABLE]
For example, with , we obtain
[TABLE]
We have the asymptotic limit
[TABLE]
9. The matrix
The construction of the Sinkhorn limit of the matrix
[TABLE]
also requires only high school algebra. There exists a unique positive diagonal matrix such that
[TABLE]
is a doubly stochastic matrix, and so
[TABLE]
From (24), we obtain
[TABLE]
Inserting (27) into (25) gives
[TABLE]
Inserting (28) into (27) gives
[TABLE]
Inserting (28) and (29) into (26) and rearranging gives
[TABLE]
Equivalently,
[TABLE]
and so
[TABLE]
and
[TABLE]
Inserting this into (28) gives
[TABLE]
and then (27) gives
[TABLE]
Thus,
[TABLE]
and
[TABLE]
This determines the scaling matrix X. The Sinkhorn limit is the circulant matrix
[TABLE]
with
[TABLE]
The asymptotic limit is
[TABLE]
Let
[TABLE]
be the th matrix in the alternate minimization algorithm for the matrix (23). We have
[TABLE]
and so alternate minimization generates sequences of rational numbers that converge to .
For example, with , we obtain
[TABLE]
10. The matrix
Consider the symmetric matrix
[TABLE]
There exists a unique positive diagonal matrix such that
[TABLE]
is doubly stochastic. Therefore,
[TABLE]
Observe that equations (30) and (24) are identical, and that equations (31) and (25) are identical. Therefore,
[TABLE]
and
[TABLE]
Substituting (33) and (34) into the third equation gives a polynomial in one variable:
[TABLE]
By Sinkhorn’s theorem, this polynomial has at least one positive solution. If , then, by Descartes’s rule of signs, this polynomial has exactly two positive solutions. If , then this polynomial has two, four, or six positive solutions.
For example, let . Let be the unique positive diagonal matrix such that the matrix
[TABLE]
is doubly stochastic, and
[TABLE]
The number is a solution of the octic polynomial
[TABLE]
According to Maple, the unique solution of this polynomial in the interval is
[TABLE]
From equations (33) and (34), we obtain
[TABLE]
and
[TABLE]
We obtain
[TABLE]
This agrees with the calculation in Section 2.
Let . Let be the unique positive diagonal matrix such that the matrix
[TABLE]
is doubly stochastic, and
[TABLE]
The number is a solution of the octic polynomial
[TABLE]
According to Maple, the solutions of this polynomial in the interval are
[TABLE]
Choosing , we obtain from equations (33) and (34) the numbers
[TABLE]
and
[TABLE]
and so
[TABLE]
This agrees with the calculation in Section 2.
It is interesting to observe that if we choose the second root of the polynomial (33), we obtain
[TABLE]
and
[TABLE]
For matrices of the form , we do not have explicit formulae for the coordinates of the Sinkhorn limit as explicit functions of . Computer calculations suggest that the asymptotic limit of as is
[TABLE]
11. Gröbner bases and algebraic numbers
I like solving problems using high school algebra. However, it is important to note that the previous calculations are also easily done using Gröbner bases.
Here is an example. Consider the matrix
[TABLE]
with and . There exist unique positive real numbers that satisfy the polynomial equations
[TABLE]
Equivalently, is the unique positive vector in that is in the affine variety , where is the ideal in generated by the polynomials
[TABLE]
Let . Using the Groebner package in Maple with the lexicographical order , we obtain the Gröbner basis
[TABLE]
Applying Maple with the lexicographical order , we obtain the Gröbner basis
[TABLE]
Applying Maple with the lexicographical order , we obtain the Gröbner basis
[TABLE]
Thus, , , and are algebraic numbers of degree at most 4, and we have explicit polynomial representations of each variable , , in terms of the others.
For arbitrary , applying Maple with the lexicographical order , we obtain the Gröbner basis
[TABLE]
For each of the 8 roots of , the polynomials and determine unique numbers and . Exactly one of the triples will be positive.
For every positive symmetric matrix , the Sinkhorn limit with scaling matrix is the unique positive solution of a set of quadratic equations of the form
[TABLE]
Equivalently, is the unique positive vector in the affine variety of the ideal generated by . A Gröbner basis for this ideal shows that if the coordinates of the matrix are rational numbers, then are algebraic numbers of degrees bounded in terms of .
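For a symmetric positive matrix $A$, the condition that $XAX$ is doubly stochastic with $X = \operatorname{diag}(x_1,\ldots,x_n)$ amounts to the quadratic equations $x_i \sum_j a_{i,j} x_j = 1$. The sketch below is one hedged way to check such a system numerically; it does not use Gröbner bases. It tracks the accumulated row and column multipliers of the alternate scaling and takes their geometric mean, which works because for symmetric $A$ the two limiting multiplier vectors are proportional. The function name and the sample matrix are hypothetical.

```python
import math

def symmetric_scaling(A, steps=200):
    """Return d such that diag(d) * A * diag(d) is approximately doubly stochastic.
    Assumes A is symmetric with positive entries."""
    n = len(A)
    x = [1.0] * n  # accumulated row multipliers
    y = [1.0] * n  # accumulated column multipliers
    for _ in range(steps):
        # Row scaling: make every row of diag(x) A diag(y) sum to 1.
        x = [1.0 / sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]
        # Column scaling: make every column of diag(x) A diag(y) sum to 1.
        y = [1.0 / sum(x[i] * A[i][j] for i in range(n)) for j in range(n)]
    # For symmetric A the limits satisfy y = lambda * x, so sqrt(x_i * y_i)
    # gives the single scaling vector.
    return [math.sqrt(xi * yi) for xi, yi in zip(x, y)]

# Hypothetical symmetric positive matrix.
A = [[2.0, 1.0, 1.0],
     [1.0, 3.0, 1.0],
     [1.0, 1.0, 4.0]]

d = symmetric_scaling(A)
# Residuals of the quadratic equations d_i * sum_j a_ij d_j = 1.
residuals = [d[i] * sum(A[i][j] * d[j] for j in range(3)) - 1.0 for i in range(3)]
print(d)
print(residuals)
```

The numerical solution gives floating-point approximations only; the algebraic degree of the coordinates is information that a Gröbner basis computation provides and this sketch does not.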
12. Diophantine approximation
Let be an matrix with positive rational coordinates, and let be the least common multiple of the denominators of the coordinates of . The matrix has positive integral coordinates, and the matrix obtained by row scaling (or column scaling) is equal to the matrix obtained by row scaling (or column scaling) . Thus, the Sinkhorn limit obtained from the rational matrix equals the Sinkhorn limit obtained from the integral matrix . The matrices generated by alternate row and column scalings are rational matrices. If is the th matrix obtained in the alternate minimization algorithm, and if the Sinkhorn limit is , then
[TABLE]
for all . If the coordinate is irrational for some pair , then the alternate minimization cannot terminate in a finite number of steps. It is an open problem to determine the matrices for which the alternate minimization does terminate in a finite number of steps.
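The rationality of the iterates can be seen directly with exact arithmetic. The sketch below uses a hypothetical $2 \times 2$ example that is not taken from this section: for the matrix with rows $(2,1)$ and $(1,1)$, the $(1,1)$ coordinate of the Sinkhorn limit is $2 - \sqrt{2}$, which follows from the fact that the cross ratio $a_{1,1}a_{2,2}/(a_{1,2}a_{2,1})$ is unchanged by row and column scaling. Every iterate is a matrix of rational numbers, so the $(1,1)$ coordinates form a sequence of rational approximations to an irrational number, and the algorithm cannot stop after finitely many steps. The function names are assumptions.

```python
from fractions import Fraction
import math

def row_scale(A):
    """Divide every row by its row sum."""
    return [[a / sum(row) for a in row] for row in A]

def col_scale(A):
    """Divide every column by its column sum."""
    col_sums = [sum(col) for col in zip(*A)]
    return [[a / s for a, s in zip(row, col_sums)] for row in A]

# A rational matrix and the integral matrix obtained by clearing denominators.
A_rat = [[Fraction(1), Fraction(1, 2)],
         [Fraction(1, 2), Fraction(1, 2)]]
A = [[Fraction(2), Fraction(1)],
     [Fraction(1), Fraction(1)]]
same_first_step = row_scale(A_rat) == row_scale(A)

target = 2 - math.sqrt(2)  # (1,1) coordinate of the limit for this example

approximations = []
B = A
for step in range(6):
    B = col_scale(row_scale(B))
    approximations.append(B[0][0])
    print(step + 1, B[0][0], float(B[0][0]), abs(float(B[0][0]) - target))
```

The numerators and denominators grow quickly from one iteration to the next, so exact arithmetic is practical only for a small number of steps.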
The Sinkhorn limit coordinates are algebraic numbers for all rational matrices A. If the coordinate is irrational for some and , then the alternate minimization algorithm constructs a sequence of rational approximations to . For example, alternate minimization provides a sequence (in fact, several sequences) of rational numbers that converge to for every positive integer . The matrix
[TABLE]
has Sinkhorn limit
[TABLE]
with
[TABLE]
If , then
[TABLE]
For example, for , we have
[TABLE]
Here are the rational numbers in the first six iterations of the Sinkhorn algorithm, and their decimal representations:
[TABLE]
where
[TABLE]
Note that
[TABLE]
The continued fraction for is For comparison, here are the first ten convergents of the continued fraction for :
[TABLE]
13. Rationality and finite length
For what positive matrices does the alternate minimization algorithm converge in finitely many steps? This problem has been solved for $2 \times 2$ matrices (Nathanson [15]), but it is open for all dimensions $n \geq 3$. In dimension 3, matrices equivalent to become doubly stochastic in one step, that is, after one row or one column scaling. It is not known if there exists a positive matrix that becomes doubly stochastic in exactly two steps. More generally, it is not known if there exists a positive matrix that becomes doubly stochastic in exactly steps for some .
Consider the matrix with parameter . If is a rational number, then every matrix generated by iterated row and column scalings has rational coordinates. If the Sinkhorn limit contains an irrational coordinate, then the alternate minimization algorithm cannot terminate in finitely many steps.
If is an integer and , then the Sinkhorn limit has coordinates in the quadratic field . For example, from (17), the coordinate of is
[TABLE]
This number is rational if and only if the odd integer is the square of an odd integer, that is, if and only if for some positive integer and so is a triangular number. From (17), (18), and (19), we obtain
[TABLE]
Moreover, , where with and . Thus,
[TABLE]
For example, if , then and
[TABLE]
where
[TABLE]
Note that also has a scaling by rational matrices
[TABLE]
where
[TABLE]
It is not known if there exists a triangular number for which the alternate minimization algorithm terminates in a finite number of steps.
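The rationality criterion of this section can be tested with integer arithmetic. The sketch below rests on two assumptions that are not legible in this copy: that the matrix in question is the $3 \times 3$ matrix with $K$ in the upper left corner and 1 elsewhere, and that the odd integer under the square root is $8K+1$. Under those assumptions the limit has three distinct values $a$ (corner), $b$ (rest of the first row and column), and $c$ (remaining block), which satisfy $a + 2b = 1$, $b + 2c = 1$, and $ac = Kb^2$, so that $b = (\sqrt{8K+1} - 3)/(4(K-1))$. The function name `rational_limit` is hypothetical.

```python
from fractions import Fraction
from math import isqrt

def rational_limit(K):
    """For an integer K >= 2, return the limit values (a, b, c) as exact
    fractions when 8K + 1 is a perfect square, and None otherwise.
    Assumes the matrix has K in the upper left corner and 1 elsewhere."""
    if K < 2:
        raise ValueError("K must be an integer with K >= 2")
    root = isqrt(8 * K + 1)
    if root * root != 8 * K + 1:
        return None                      # sqrt(8K + 1) is irrational
    b = Fraction(root - 3, 4 * (K - 1))  # positive root of the quadratic in b
    a = 1 - 2 * b                        # first row sums to 1
    c = (1 - b) / 2                      # second row sums to 1
    return a, b, c

triangular = [K for K in range(2, 30) if rational_limit(K) is not None]
print(triangular)
print(rational_limit(3))
```

For $K = 3$ this yields the values $1/2$, $1/4$, and $3/8$. A rational limit does not by itself imply that alternate minimization terminates; that is the open question stated above.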
14. Open problems
(1) Compute explicit formulas for the Sinkhorn limits of all positive symmetric matrices. This is a central problem.
(2) Here is a special case. Let and 1 be pairwise distinct positive numbers. Compute the Sinkhorn limits of the matrices
[TABLE]
(3) For what positive matrices does the alternate minimization algorithm converge in finitely many steps? This is the problem discussed in the previous section.
(4) It is not known what algebraic numbers appear as coordinates of the Sinkhorn limit of a positive integral matrix. It would be interesting to have an example of an algebraic number in the unit interval that is not a coordinate of the Sinkhorn limit of a rational matrix.
(5) Does there exist a matrix such that is row stochastic but not column stochastic, and is doubly stochastic?
(6) Does every possible shape of a doubly stochastic matrix appear as the nontrivial limit of some matrix?
(7) Why does the shape of the Sinkhorn limit seem to depend only on the shape of the matrix and not on the numerical values of the coordinates of ?
(8) What does the Sinkhorn limit tell us about the matrix ? What information does it convey?
(9) The matrix is positive if for all and . The matrix is nonnegative if for all and .
Let A be a nonnegative matrix. Let and let . The matrix A is -row stochastic if for all . The matrix A is -column stochastic if for all . The matrix is -stochastic if it is both -row stochastic and -column stochastic. Note that if A is -stochastic, then
[TABLE]
Let A be a positive matrix. Let be the diagonal matrix whose th coordinate is , and let be the diagonal matrix whose th coordinate is . The matrix is -row stochastic and the matrix is -column stochastic.
A simple modification of the alternate minimization algorithm applied to a positive matrix satisfying (36) produces an -stochastic Sinkhorn limit. It is an open problem to compute explicit Sinkhorn limits in the -stochastic setting.
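One natural reading of the modification mentioned in problem (9) is to rescale each row to a prescribed row sum and each column to a prescribed column sum, instead of to 1. The sketch below implements that reading; it is an assumption, since the paper's own modification is not shown here. The function name `scale_to_marginals`, the target vectors `r` and `c`, and the sample matrix are hypothetical. The targets are chosen with equal totals, which is necessary because both totals equal the sum of all coordinates of the scaled matrix.

```python
def scale_to_marginals(A, r, c, steps=500):
    """Alternately rescale rows to have sums r[i] and columns to have sums c[j].
    Assumes A is positive and sum(r) == sum(c)."""
    for _ in range(steps):
        # Row step: row i is multiplied by r[i] / (current row sum).
        A = [[a * ri / sum(row) for a in row] for row, ri in zip(A, r)]
        # Column step: column j is multiplied by c[j] / (current column sum).
        col_sums = [sum(col) for col in zip(*A)]
        A = [[a * cj / s for a, cj, s in zip(row, c, col_sums)] for row in A]
    return A

# Hypothetical positive 2 x 3 matrix and targets with equal totals (6 and 6).
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
r = [2.0, 4.0]       # target row sums
c = [1.0, 2.0, 3.0]  # target column sums

B = scale_to_marginals(A, r, c)
row_sums = [sum(row) for row in B]
col_sums = [sum(col) for col in zip(*B)]
print(row_sums, col_sums)
```

With all targets equal to 1 and a square matrix this reduces to the alternate minimization algorithm studied in the paper; unlike the doubly stochastic case, the matrix here need not be square.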
15. Notes
In his 1964 paper, Richard Sinkhorn [16, p.877] wrote:
The iterative process of alternately normalizing the rows and columns of a strictly positive matrix is convergent to a strictly positive doubly stochastic matrix.
Sinkhorn did not prove this result. The proof of convergence of the alternate minimization algorithm appears in Knopp and Sinkhorn [17], and in Letac [12]. Geometric existence proofs of exact scaling appear in Menon [14], and in Tverberg [18].
The computational complexity of Sinkhorn’s alternate scaling algorithm is investigated in Kalantari and Khachiyan [9, 10], Kalantari, Lari, Ricca, and Simeone [11], Linial, Samorodnitsky and Wigderson [13] and Allen-Zhu, Li, Oliveira, and Wigderson [1]. An extension of matrix scaling to operator scaling began with Gurvits [5], and is developed in Garg, Gurvits, Oliveira, and Wigderson [3, 4], Gurvits [6], and Gurvits and Samorodnitsky [7]. Motivating some of this recent work are the classical papers of Edmonds [2] and Valiant [19, 20].
The literature on matrix scaling is vast. See the recent survey paper of Idel [8]. For the early history of matrix scaling, see Allen-Zhu, Li, Oliveira, and Wigderson [1, Section 1.1].
Acknowledgements. The alternate minimization algorithm was discussed in several lectures in the New York Number Theory Seminar, and I thank the participants for their useful remarks. In particular, I thank David Newman for making the initial computations that suggested some of the problems considered in this paper.
References
[1] Z. Allen-Zhu, Y. Li, R. Oliveira, and A. Wigderson, Much faster algorithms for matrix scaling, 58th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2017), IEEE Computer Soc., Los Alamitos, CA, 2017, pp. 890–901.
[2] J. Edmonds, Systems of distinct representatives and linear algebra, J. Res. Nat. Bur. Standards Sect. B 71B (1967), 241–245.
[3] A. Garg, L. Gurvits, R. Oliveira, and A. Wigderson, A deterministic polynomial time algorithm for non-commutative rational identity testing, 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2016), IEEE Computer Soc., Los Alamitos, CA, 2016, pp. 109–117.
[4] A. Garg, L. Gurvits, R. Oliveira, and A. Wigderson, Algorithmic and optimization aspects of Brascamp-Lieb inequalities, via operator scaling, STOC'17: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, ACM, New York, 2017, pp. 397–409.
[5] L. Gurvits, Classical complexity and quantum entanglement, J. Comput. System Sci. 69 (2004), no. 3, 448–484.
[6] L. Gurvits, Boolean matrices with prescribed row/column sums and stable homogeneous polynomials: combinatorial and algorithmic applications, Inform. and Comput. 240 (2015), 42–55.
[7] L. Gurvits and A. Samorodnitsky, Bounds on the permanent and some applications, 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2014), IEEE Computer Soc., Los Alamitos, CA, 2014, pp. 90–99.
[8] M. Idel, A review of matrix scaling and Sinkhorn’s normal form for matrices and positive maps, arXiv:1609.06349, 2016.
