Lower Bounds for Matrix Factorization
Mrinal Kumar, Ben Lee Volk

TL;DR
This paper constructs explicit families of matrices that cannot be factored into a small number of sparse matrices, establishing stronger lower bounds for matrix factorization and linear circuit complexity.
Contribution
It provides the first subexponential-time deterministic construction of matrices with high sparsity lower bounds for fixed depth circuits, improving previous super-linear bounds.
Findings
Constructed matrices with lower bounds of n^{1+1/(2d)} for depth-d circuits.
Improved lower bounds over previous super-linear results.
Outlined a derandomization approach for stronger bounds.
Mrinal Kumar. Department of Computer Science, University of Toronto, Canada. A part of this work was done during the semester on Lower Bounds in Computational Complexity at Simons Institute for the Theory of Computing, Berkeley, USA.
Ben Lee Volk. Center for the Mathematics of Information, California Institute of Technology, USA.
Abstract
We study the problem of constructing explicit families of matrices which cannot be expressed as a product of a few sparse matrices. In addition to being a natural mathematical question on its own, this problem appears in various incarnations in computer science; the most significant being in the context of lower bounds for algebraic circuits which compute linear transformations, matrix rigidity and data structure lower bounds.
We first show, for every constant $d$, a deterministic construction in subexponential time of a family $\{M_n\}$ of $n \times n$ matrices which cannot be expressed as a product $M_n = A_1 \cdots A_d$ where the total sparsity of $A_1, \ldots, A_d$ is less than $n^{1+1/(2d)}$. In other words, any depth-$d$ linear circuit computing the linear transformation $M_n \cdot x$ has size at least $n^{1+\Omega(1/d)}$. This improves upon the prior best lower bounds for this problem, which are barely super-linear, and were obtained by a long line of research based on the study of super-concentrators (albeit at the cost of a blow up in the time required to construct these matrices).
We then outline an approach for proving improved lower bounds through a certain derandomization problem, and use this approach to prove asymptotically optimal quadratic lower bounds for natural special cases, which generalize many of the common matrix decompositions.
1 Introduction
This work concerns the following (informally stated) very natural problem:
Open Problem 1.
Exhibit an explicit matrix $A \in \mathbb{F}^{n \times n}$, such that $A$ cannot be written as $A = BC$, where $B$ and $C$ are sparse matrices.
Before bothering ourselves with the precise meaning of the words "explicit" and "sparse" in the above problem, we discuss the various contexts in which this problem presents itself.
1.1 Linear circuits and matrix factorization
Algebraic complexity theory studies the complexity of computing polynomials using arithmetic operations: addition, subtraction, multiplication and division. An algebraic circuit over a field $\mathbb{F}$ is an acyclic directed graph whose vertices of in-degree 0, also called inputs, are labeled by indeterminates or field elements from $\mathbb{F}$, and every internal node is labeled with an arithmetic operation. The circuit computes rational functions in the natural way, and the polynomials (or rational functions) computed by the circuit are those computed by its vertices of out-degree 0, called the outputs. This framework is general enough to encompass virtually all the known algorithms for algebraic computational problems. The size of the circuit is defined to be the number of edges in it. For a more detailed background on algebraic circuits, see [SY10].
Perhaps the simplest non-trivial class of polynomials is the class of linear (or affine) functions. Accordingly, such polynomials can be computed by a very simple class of circuits called linear circuits: these are algebraic circuits which are only allowed to use addition and multiplication by a scalar. It is often convenient to consider graphs with labels on the edges as well: every internal node is an addition gate, and for $\alpha \in \mathbb{F}$, an edge labeled $\alpha$ from a vertex $v$ to a vertex $u$ denotes that the output of $v$ is multiplied by $\alpha$ when feeding into $u$. Thus, every node computes a linear combination of its inputs.
It is not hard to show that any arithmetic circuit for computing a set of linear functions can be converted into a linear circuit with only a constant blow-up in size (see [BCS97], Theorem 13.1; eliminating division gates requires that the field in question is large enough. In this paper we will always make this assumption when needed).
Clearly, every set of $n$ linear functions on $n$ variables (represented by a matrix $A \in \mathbb{F}^{n \times n}$) can be computed by a linear circuit of size $O(n^2)$. Using counting arguments (over finite fields) or dimension arguments (over infinite fields), it can be shown that for a random or generic matrix this upper bound is fairly tight. Thus, a central open problem in algebraic complexity theory is to prove any super-linear lower bound for an explicit family of matrices $\{A_n\}_{n \in \mathbb{N}}$ where $A_n \in \mathbb{F}^{n \times n}$. The standard notion of explicitness in complexity theory is that there is a deterministic algorithm that outputs the matrix $A_n$ in $\mathrm{poly}(n)$ time, although more or less stringent definitions can be considered as well.
Despite decades of research and partial results, such lower bounds are not known. (We remark that super-linear lower bounds for general arithmetic circuits are known, but for polynomials of high degree [Str73, BS83].) In order to gain insight into the general model of computation, research has focused on limited models of linear circuits, such as monotone circuits, circuits with bounded coefficients, or bounded depth circuits. We defer a more thorough discussion on previous work to Section 1.5, and proceed to describe bounded depth circuits, which are the focus of this work.
The depth of a circuit is the length (in edges) of a longest path from an input to an output. Constant depth circuits appear to be a particularly weak model of computation. However, even this model is surprisingly powerful (see also Section 1.2).
The "easiest" non-trivial model is the model of depth-2 linear circuits. A depth-2 linear circuit computing a linear transformation $A \in \mathbb{F}^{n \times n}$ consists of a bottom layer of $n$ input gates, a middle layer of $k$ gates, and a top layer of $n$ output gates. We assume, without loss of generality, that the circuit is layered, in the sense that every edge goes either from the bottom to the middle layer, or from the middle to the top layer. Indeed, every edge going directly from the bottom to the top layer can be replaced by a path of length 2; this transformation increases the size of the circuit by at most a factor of 2.
By letting $A_1 \in \mathbb{F}^{k \times n}$ be the adjacency matrix of the (labeled) subgraph between the bottom and the middle layer, and $A_2 \in \mathbb{F}^{n \times k}$ be the adjacency matrix of the subgraph between the middle and the top layer, it is clear that $A = A_2 A_1$. Thus, a decomposition of $A$ into the product of two sparse matrices is equivalent to saying that $A$ has a small depth-2 linear circuit. This argument can be generalized, in exactly the same way, to depth-$d$ circuits and decompositions of the form $A = A_d \cdots A_1$, for constant $d$.
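The correspondence between layered depth-2 circuits and two-factor products can be checked on a toy instance. The following is a minimal sketch, not taken from the paper: the names `bottom`, `top`, `evaluate_circuit` and the specific edge labels are hypothetical, and integer labels are used only for readability.

```python
def matmul(X, Y):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def nnz(X):
    """Number of non-zero entries, i.e. number of edges in that layer."""
    return sum(1 for row in X for entry in row if entry != 0)

def evaluate_circuit(bottom, top, x):
    """Evaluate the circuit gate by gate: each gate sums its weighted inputs."""
    middle = [sum(w * xi for w, xi in zip(row, x)) for row in bottom]
    return [sum(w * mi for w, mi in zip(row, middle)) for row in top]

# bottom[g][i] = label of the edge from input i to middle gate g (0 = no edge)
bottom = [[1, 2, 0],
          [0, 1, 3]]
# top[o][g] = label of the edge from middle gate g to output o
top = [[1, 0],
       [1, 1],
       [0, 2]]

A = matmul(top, bottom)                  # the matrix computed by the circuit
circuit_size = nnz(bottom) + nnz(top)    # size = total number of edges
x = [5, 7, 11]
direct = [sum(a * xi for a, xi in zip(row, x)) for row in A]
```

Evaluating the circuit on `x` and multiplying `A` by `x` give the same vector, and the circuit size is exactly the total sparsity of the two factors.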
Weak super-linear lower bounds are known for constant depth linear circuits. They are based on the following observation, due to Valiant [Val75]: for subsets $S, T \subseteq [n]$ of size $k$, let $A_{S,T}$ denote the submatrix of $A$ indexed by rows in $S$ and columns in $T$. If $A_{S,T}$ has rank $k$, the minimal vertex cut in the subcircuit restricted to inputs from $T$ and outputs from $S$ is of size at least $k$: indeed, a smaller cut of size $r < k$ corresponds to a factorization $A_{S,T} = PQ$ for $P \in \mathbb{F}^{k \times r}$ and $Q \in \mathbb{F}^{r \times k}$, contradicting the rank assumption. Using Menger's theorem, it is now possible to deduce that if $A$ is a matrix such that for every $S, T$ as above the matrix $A_{S,T}$ is non-singular, then the circuit computing $A$ contains, for every subcircuit which corresponds to such $S, T$, at least $k$ vertex disjoint paths from $T$ to $S$. Such graphs were named superconcentrators by Valiant, and their minimal size was extensively studied [Val75, Pip77, Pip82, DDPW83, Pud94, AP94, RT00].
Superconcentrators of logarithmic depth and linear size do exist, so while this approach cannot show lower bounds for circuits of logarithmic depth, it is possible to show that for constant $d$, any depth-$d$ superconcentrator has size at least , where is a function that unfortunately grows very slowly with . For example, , , , and so on. Such lower bounds apply for any matrix whose minors of all orders are non-zero, e.g., a Cauchy matrix given by $A_{i,j} = 1/(x_i - y_j)$ for any distinct $x_1, \ldots, x_n, y_1, \ldots, y_n$. Over finite fields it is possible to modify the proof and obtain similar lower bounds for matrices defining good error correcting codes [GHK+13].
These lower bounds on the size of superconcentrators are tight: for every , there exists a super-concentrator of depth and size . It is thus impossible to improve the lower bounds only using this technique.
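The Cauchy matrix example mentioned above can be verified by brute force for small sizes. The sketch below assumes the standard form $1/(x_i - y_j)$ with all $x_i, y_j$ distinct; the helper names and the chosen points are hypothetical, and the exhaustive minor check is exponential, so it is only meant for tiny matrices.

```python
from fractions import Fraction
from itertools import combinations

def det(M):
    """Determinant by Gaussian elimination over exact rationals."""
    M = [row[:] for row in M]
    n, sign, result = len(M), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return sign * result

def cauchy(xs, ys):
    """Cauchy matrix with entries 1 / (x_i - y_j)."""
    return [[Fraction(1, x - y) for y in ys] for x in xs]

def all_minors_nonzero(M):
    """Check every square submatrix (of every order) for a non-zero determinant."""
    n = len(M)
    for k in range(1, n + 1):
        for rows in combinations(range(n), k):
            for cols in combinations(range(n), k):
                sub = [[M[r][c] for c in cols] for r in rows]
                if det(sub) == 0:
                    return False
    return True

xs = [1, 2, 3, 4]
ys = [-1, -2, -3, -4]   # all 8 values are distinct, so x - y is never 0
C = cauchy(xs, ys)
```

By contrast, the all-ones matrix fails the check already at order 2, since any 2×2 submatrix of it is singular.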
1.2 Matrix rigidity
A demonstration of the surprising power of depth-2 circuits can be seen using the notion of matrix rigidity, a pseudorandom property of matrices which we now recall. A matrix $A \in \mathbb{F}^{n \times n}$ is $(r,s)$-rigid if $A$ cannot be written as a sum $A = R + S$ where $R$ is a matrix of rank $r$, and $S$ is a matrix with at most $s$ non-zero entries. Valiant [Val77] famously proved that if $A$ is computed by a linear circuit with bounded fan-in of depth $O(\log n)$ and size $O(n)$, then $A$ is not $(\varepsilon n, n^{1+\delta})$-rigid for every $\varepsilon, \delta > 0$ (in fact, one can obtain slightly better parameters; see, for example, [Val77] or [DGW18]). It follows that an explicit construction of an $(\varepsilon n, n^{1+\delta})$-rigid matrix, for some $\varepsilon, \delta > 0$, will imply a super-linear lower bound for linear circuits of depth $O(\log n)$. Pudlák [Pud94] observed that similar rigidity parameters will imply even stronger lower bounds for constant depth circuits. A random matrix (over infinite fields) is $(r, (n-r)^2)$-rigid, but the best explicit constructions have rigidity $(r, \frac{n^2}{r} \log \frac{n}{r})$ [Fri93, SSS97], which is insufficient for proving lower bounds.
Observe that a decomposition $A = R + S$ where $\mathrm{rank}(R) \le r$ and $S$ is $s$-sparse corresponds to a depth-2 circuit with a very special structure and with at most $2nr + s$ edges (this circuit is not layered, but as we explained above, this does not make a significant difference). In particular, one way of interpreting Valiant's result is as a non-trivial depth reduction from depth $O(\log n)$ to depth 2, so that proving any depth-2 lower bound for an explicit matrix, will imply a lower bound for depth $O(\log n)$. (We note that this statement makes sense only over large fields, as over fixed finite fields, it is always possible to prove an upper bound of on the depth-2 complexity of any matrix [JS13]. This does not contradict the fact that rigid matrices exist over finite fields: a decomposition $A = R + S$ is a very special type of depth-2 circuit.) This can be seen as the linear circuit analog of similar strong depth reduction theorems for general algebraic circuits [AV08, Koi12, Tav15, GKKS16].
However, we would like to argue that proving lower bounds for depth-2 circuits is in fact necessary for proving rigidity lower bounds, by observing that upper bounds on the depth-2 complexity of give upper bounds on its rigidity parameters. Indeed, suppose can be computed by a depth-2 circuit of size . Let be as before the number of columns of (which equals the number of rows of ), and note that we may assume , as zero columns of or zero rows of can be omitted. For , let denote the -th column of , and the -th row of , so that . Fix a constant , and say is dense if either or has more than non-zero entries; otherwise, is sparse. Since can have at most columns with sparsity of more than , and similarly for the rows of , the number of dense -s is at most . It follows that
[TABLE]
The first sum is a matrix of rank at most , and the second is a matrix whose sparsity is at most . Thus, proving rigidity lower bounds of the type required to carry out Valiant's approach necessarily means proving lower bounds of the form "" on the depth-2 complexity of (we remark that the argument above is very similar to the aforementioned result of Pudlák [Pud94]; Pudlák's argument is stated in a slightly different language and in greater generality). Since proving rigidity lower bounds is a long-standing open problem, we view the problem of proving an lower bound for depth-2 circuits as an important milestone towards this.
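The dense/sparse splitting used in this argument can be illustrated on a small product. The sketch below is only an illustration under assumptions: the threshold `tau`, the function names and the toy matrices are hypothetical, and no attempt is made to match the parameters chosen in the text.

```python
def product(B, C):
    """Plain matrix product B * C."""
    return [[sum(B[r][i] * C[i][c] for i in range(len(C)))
             for c in range(len(C[0]))] for r in range(len(B))]

def split_dense_sparse(B, C, tau):
    """Split B*C = sum_i (column i of B)(row i of C) into two sums.

    Index i is called dense if column i of B or row i of C has more than
    tau non-zero entries (tau is a hypothetical threshold parameter).
    The dense terms form a matrix of rank at most the number of dense
    indices; each sparse term contributes at most tau*tau non-zero entries.
    """
    n, k, m = len(B), len(C), len(C[0])
    low_rank = [[0] * m for _ in range(n)]
    sparse = [[0] * m for _ in range(n)]
    dense_count = 0
    for i in range(k):
        col = [B[r][i] for r in range(n)]
        row = C[i]
        is_dense = (sum(v != 0 for v in col) > tau
                    or sum(v != 0 for v in row) > tau)
        dense_count += int(is_dense)
        target = low_rank if is_dense else sparse
        for r in range(n):
            for c in range(m):
                target[r][c] += col[r] * row[c]
    return low_rank, sparse, dense_count

B = [[1, 1, 0],
     [1, 0, 0],
     [1, 0, 2],
     [1, 0, 0]]
C = [[1, 1, 1, 1],
     [0, 3, 0, 0],
     [0, 0, 0, 5]]
low_rank, sparse, dense_count = split_dense_sparse(B, C, tau=1)
```

In this toy instance only the first rank-one term is dense, so the product is a rank-one matrix plus a matrix with two non-zero entries.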
1.3 Data structure lower bounds
The problem of matrix factorization into sparse matrices also appears in the context of proving lower bounds for data structures. A dynamic data structure with inputs and queries is a pair of algorithms whose purpose is to update and retrieve certain data under a sequence of operations, while minimizing the memory access. In the group model, it is given by a pair of algorithms. The update algorithm is represented by a matrix . Given , thought of as assignment of weights to the inputs, computes a linear combination of those weights and stores them in memory. The query algorithm is given by a matrix . Given a query, it computes a linear function of the memory cells, and returns the answer. Hence, an "update" operation followed by a "retrieve" operation computes the linear transformation given by .
The worst case update time of the database is the maximal number of non-zero elements in a column of , and the worst case query time is the maximal number of non-zero elements in a row of . The value denotes the space required by the data structure. It now directly follows that a matrix which cannot be factored as for a row-sparse and column-sparse gives a data structure problem with a lower bound on its worst case query or update time. It is also possible to define an analogous average case notion. Lower bounds for this model were proved by [Fre82, FS89, PD06, Pǎt07, Lar12, Lar14, LWY18], but none of these results beats the lower bounds for depth-2 circuits obtained using superconcentrators.
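The cost measures above are easy to compute from a given factorization. The following sketch uses a hypothetical prefix-sum problem and hypothetical names (`worst_case_times`, `store_inputs`, `store_prefix`); it assumes the update matrix has one row per memory cell and one column per input, and the query matrix has one row per query and one column per memory cell.

```python
def worst_case_times(update, query):
    """Return (worst-case update time, worst-case query time, space).

    update: s x n matrix (memory cells x inputs)
    query:  q x s matrix (queries x memory cells)
    """
    s, n = len(update), len(update[0])
    # Changing input c touches every memory cell with a non-zero in column c.
    update_time = max(sum(update[r][c] != 0 for r in range(s)) for c in range(n))
    # Answering a query reads every memory cell with a non-zero in its row.
    query_time = max(sum(v != 0 for v in row) for row in query)
    return update_time, query_time, s

identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
prefix = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]

# Toy problem on 4 inputs: query j asks for x_0 + ... + x_j.
# Option 1: store the inputs themselves; a query may read all 4 cells.
store_inputs = (identity, prefix)
# Option 2: store all prefix sums; an update may touch all 4 cells.
store_prefix = (prefix, identity)
```

Both options factor the same lower-triangular all-ones matrix, but they trade worst-case query time against worst-case update time.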
A related model is that of static data structures, which is again given by a factorization , where now we are interested in trade-offs between the space of the data structure and its worst case query time, while not being charged for the total sparsity of . A recent work of Dvir, Golovnev and Weinstein [DGW18] showed that proving lower bounds for this model is related to the problem of matrix rigidity from Section 1.2.
Despite the overall similarity, there are several key technical differences between the linear circuit complexity and the data structure problems. The first and obvious issue is that worst-case lower bounds on the update or query time do not necessarily imply that or are dense matrices: the total sparsity of and is related to the average-case update and query time. The second, more severe issue, is that in many applications the number of queries is polynomially larger than , while the lower bounds on running time are still measured as functions of the number of inputs . This makes sense in the data structure settings, but from a circuit complexity point of view, a set of say linear functions trivially requires a circuit of size , and thus a lower bound of say is meaningless in that setting.
This issue also comes up when studying the so-called succinct space setting, where we require . The lower bounds we are aware of for this setting are worst case lower bounds, and require the number of outputs to be at least for some [GM07, DGW18], so that in the corresponding circuit the number of vertices in the middle layer is required to be much smaller than the number of outputs, which may be considered quite unnatural. In particular, we are unaware of any improved lower bounds on the sparsity of matrix factorization for when or even which come from the data structure lower bounds literature.
1.4 Machine learning
We briefly remark that the problem of factorizing a matrix into a product of two or more sparse matrices is also ubiquitous in machine learning and related areas. Naturally, research in those areas did not focus on lower bounds but rather on algorithms for finding such a representation, assuming it exists, sometimes heuristically, and it is usually enough to approximate the target matrix . In particular, algorithms have been proposed for the very related problems of non-negative matrix factorization [LS00] (it is interesting to observe that for the problem of factorizing matrices into non-negative matrices it is quite easy to prove almost-optimal lower bounds even for unbounded depth linear circuits, as mentioned in Section 1.5) or sparse dictionary learning [MBPS09], and there are also connections to the analysis of deep neural networks [NP13].
1.5 Previous work
As mentioned in Section 1.1, there are no non-trivial known lower bounds for general linear circuits, and for bounded depth circuits, the best lower bounds follow from the lower bounds on bounded depth super-concentrators, which are barely super-linear.
Shoup and Smolensky [SS96] give a lower bound of for depth- circuits computing a certain linear transformation given by a matrix . Unfortunately, the matrices for which their lower bound holds are not explicit from the complexity theoretic point of view, despite having a very succinct mathematical description (for example, one can take for distinct prime numbers ). For the same matrix, they in fact prove super-linear lower bounds for circuits of depth up to .
Quite informally, the intuition behind their lower bounds is that all small bounded depth linear circuits can be described as lying in the image of a low-degree polynomial map in a small number of variables, and thus, if the elements of are sufficiently "algebraically rich", for a certain specific measure, cannot be computed by such a circuit. This same philosophy lies behind Raz's elusive function approach for proving lower bounds for algebraic circuits [Raz10]. In particular, among other results, Raz uses an argument which can be seen as a modification of the technique of Shoup and Smolensky (as worked out in [SY10]) to prove lower bounds for bounded depth algebraic circuits computing bounded degree polynomials.
One class of linear circuits which has attracted significant attention is the class of circuits with bounded coefficients. Here, the circuit is only allowed to multiply by scalars with absolute value of at most some constant. For definiteness, we may assume this constant is 1 (this does not affect the complexity by more than a constant factor). The earliest result for this model is Morgenstern's ingenious proof [Mor73] of an $\Omega(n \log n)$ lower bound on bounded coefficient circuits computing the discrete Fourier transform matrix (this lower bound is matched by the upper bound given by the Cooley-Tukey FFT algorithm, which is a bounded coefficient linear circuit). For depth-$d$ circuits, Pudlák [Pud00] has proved lower bounds of the form for the same matrix.
Another natural subclass which was considered in earlier works is the class of monotone linear circuits. These are circuits which are defined over , and can only use non-negative scalars. Chazelle [Cha01] observed that it is possible to prove lower bounds in this model, even against unbounded-depth circuits, for any boolean matrix with no large monochromatic rectangle. Instantiated with the recent explicit constructions of bipartite Ramsey graphs [CZ16, BDT17, Coh17, Li18], this gives an almost optimal lower bound against such circuits. The main observation in the proof is that if does not have monochromatic rectangle, then since the model is monotone and no cancellations are allowed, every internal node which computes a linear function supported on at least variables cannot be connected to more than output gates.
For a more detailed survey on these results and some other related results, see the survey by Lokam [Lok09].
1.6 Our results
In this paper, we prove several results regarding bounded depth linear circuits which we now discuss.
Lower bounds for depth-$d$ linear circuits.
We start by considering general depth-$d$ circuits. We construct, in subexponential time, matrices which require depth-$d$ circuits of size $n^{1+1/(2d)}$.
Theorem 1.1.
Let $\mathbb{F}$ be a field. There exists a family of matrices $\{A_n\}$, which can be constructed in time , such that every depth-$d$ linear circuit computing $A_n$, even over the algebraic closure of $\mathbb{F}$, has size at least $n^{1+1/(2d)}$.
If , the entries of are integers of bit complexity . If is a finite field, the entries of are elements of an extension of of degree .
This theorem is proved in Section 2. We remark again that the best lower bounds against general depth- linear circuits for matrices that can be constructed in polynomial time are barely super-linear and much weaker than . In the recent work of Dvir, Golovnev and Weinstein [DGW18] it was pointed out that currently there are not even known constructions of rigid matrices (with parameters that would imply lower bounds) in classes such as . By arguing directly about circuit size, and not about rigidity, Theorem 1.1 gives constructions of matrices in a much smaller complexity class, which have the same bounded-depth complexity lower bounds as would follow from optimal constructions of rigid matrices using the results of Pudlák [Pud94].
While the statement in Theorem 1.1 holds for any , for there is a much simpler construction of a hard family of matrices in quasi-polynomial time.
Theorem 1.2.
Let be any field and be any positive constant. Then, there is a family of matrices which can be constructed in time such that any depth- linear circuit computing even over the algebraic closure of has size at least .
For every constant , this theorem already improves upon the current best lower bound of known for this problem (see [RT00]). This construction is based on an exponential time construction of a small hard matrix, and then amplifying its hardness using a direct sum construction (note, however, that over infinite fields even the fact that a hard matrix can be constructed in exponential time, while not very hard to prove, is not completely obvious). For completeness, we describe this simple construction in Section 2.7.
Lower bounds for restricted depth-2 linear circuits.
Given the importance of the model of depth-2 linear circuits, as explained above, and its resistance to strong lower bounds, we then move on to consider several natural subclasses of depth-2 circuits. These classes in particular correspond to almost all common matrix decompositions. We are able to prove asymptotically optimal lower bounds for these restricted models. As mentioned above, such lower bounds for general depth-2 circuits will imply super-linear lower bounds for logarithmic depth linear circuits, thus resolving a major open problem.
Symmetric circuits.
A symmetric depth-2 circuit (over $\mathbb{R}$) is a circuit of the form $B^T B$ for some $B \in \mathbb{R}^{k \times n}$ (considered as a graph, the subgraph between the middle and the top layer is the "mirror image" of the subgraph between the bottom and middle layer). Over $\mathbb{C}$, one should take the conjugate transpose $B^*$ instead of $B^T$.
Symmetric circuits are a natural computational model for computing positive semi-definite (PSD) matrices. Clearly, every symmetric circuit computes a PSD matrix, and every PSD matrix has a (non-unique) symmetric circuit. In particular, a Cholesky decomposition of PSD matrices corresponds to a computation by a symmetric circuit (of a very special form).
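The claim that every symmetric circuit computes a PSD matrix can be seen concretely: if the circuit has the form $B^T B$ (an assumption about the elided notation), then $x^T (B^T B) x = \|Bx\|^2 \ge 0$. The sketch below checks this identity on a hypothetical integer matrix `B`; the function names are illustrative only.

```python
def gram(B):
    """Return B^T B for a k x n matrix B (the matrix computed by the circuit)."""
    k, n = len(B), len(B[0])
    return [[sum(B[g][i] * B[g][j] for g in range(k)) for j in range(n)]
            for i in range(n)]

def quadratic_form(M, x):
    """Return x^T M x."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

def squared_norm_of_image(B, x):
    """Return ||B x||^2, the sum of squares of the middle-layer values."""
    return sum(sum(w * xi for w, xi in zip(row, x)) ** 2 for row in B)

B = [[1, 2, 0],
     [0, 1, -1]]
M = gram(B)
```

Here the middle layer has two gates, and the mirrored top layer reuses the same edge labels, so the circuit has twice as many edges as `B` has non-zero entries.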
We prove asymptotically optimal lower bounds for this model.
Theorem 1.3.
There exists an explicit family of real $n \times n$ PSD matrices $\{A_n\}$ such that every symmetric circuit computing $A_n$ (over $\mathbb{R}$ or $\mathbb{C}$) has size $\Omega(n^2)$.
We do not know whether every depth-2 linear circuit for a PSD matrix can be converted to a symmetric circuit with a small blow-up in size. One way to phrase this question is given below.
Question 1.4.
Is there a constant , such that every PSD matrix which can be computed by a linear circuit of size , can be computed by a symmetric circuit of size ?
A positive answer for Question 1.4 will imply, using Theorem 1.3, an lower bound for depth-2 linear circuits.
Invertible circuits.
Invertible circuits are circuits of the form $BC$, where either $B$ or $C$ is invertible (but not necessarily both). We stress that invertible circuits can (and do) compute non-invertible matrices. In particular, if $B \in \mathbb{F}^{n \times k}$ and $C \in \mathbb{F}^{k \times n}$, here we require $k = n$.
Invertible circuits generalize many of the common matrix decompositions, such as QR decomposition, eigendecomposition, singular value decomposition (a diagonal matrix can be multiplied with the matrix to its left or to its right, without increasing the sparsity, to obtain an invertible depth-2 circuit) and LUP decomposition (in the case where the matrix $L$ is required to be unit lower triangular; the sparsity of $UP$ equals the sparsity of $U$, as $P$ simply permutes the columns of $U$, so every decomposition $A = LUP$ corresponds to the invertible depth-2 circuit given by $L \cdot (UP)$).
We prove optimal lower bounds for invertible circuits.
Theorem 1.5.
Let $\mathbb{F}$ be a large enough field. There exists an explicit family of $n \times n$ matrices $\{A_n\}$ over $\mathbb{F}$ such that every invertible circuit computing $A_n$ has size $\Omega(n^2)$.
If $A$ is an invertible matrix, then clearly every depth-2 circuit $A = BC$ with $k = n$ must be an invertible circuit. However, our technique for proving Theorem 1.5 crucially requires the hard matrix to be non-invertible.
1.7 Proof Overview
Our proofs rely on a few different ideas coming from algebraic complexity theory, coding theory, arithmetic combinatorics and the theory of derandomization. We now discuss some of the key aspects.
Shoup-Smolensky dimension.
For the proof of Theorem 1.1, we rely on the notion of Shoup-Smolensky dimension as a measure of complexity of matrices. Shoup-Smolensky dimensions are a family of measures, parametrized by an order $t \in \mathbb{N}$, of "algebraic richness" of the entries of a matrix (see Definition 2.1 for details), which is supposed to capture the intuition that matrices with small circuits should depend on a few "parameters" and thus should not possess much richness.
Shoup and Smolensky [SS96] showed that for an appropriate choice of parameters, this measure is non-trivially small for linear transformations with small linear circuits of depth at most . Informally, as the order gets larger, this measure becomes useful against stronger models of computation; however, it also becomes harder to construct matrices which have a large complexity with respect to this measure (and hence cannot be computed by a small linear circuit). Shoup and Smolensky do this by constructing hard matrices which do not have small bit complexity (and hence this construction is not complexity theoretically explicit) but do have a short and succinct mathematical description.
For our proof, we first observe that for bounded depth circuits it suffices to use much smaller order than what Shoup and Smolensky used. This observation was also made by Raz [Raz10] in a similar context, but in a different language.
We then use this observation to "derandomize", in a certain sense, an exponential time construction of a hard matrix, by giving deterministic constructions of matrices with large Shoup-Smolensky dimension.
A key ingredient of our proof is a connection between the notion of Sidon Sets in arithmetic combinatorics and Shoup-Smolensky dimension (see Section 2.4 for details). Our construction is in two steps. In the first step we construct matrices with entries in which have a large Shoup-Smolensky dimension over , and degree of every entry is not too large. In the next step, we go from these univariate matrices to a matrix with entries in an appropriate low degree extension of while still maintaining the Shoup-Smolensky dimension over . Our construction of hard matrices over the field of complex numbers is based on similar ideas but differs in some minor details.
Lower bounds via Polynomial Identity Testing.
Our proofs for Theorem 1.3 and Theorem 1.5 are based on a derandomization argument. Connections between derandomization and lower bounds are prevalent in algebraic and Boolean complexity, but in our current setting they have not been widely studied before.
We say that a set of matrices $\mathcal{H} \subseteq \mathbb{F}^{n \times n}$ is a hitting set for a class $\mathcal{C} \subseteq \mathbb{F}^{n \times n}$ of matrices if for every non-zero $A \in \mathcal{C}$ there is $H \in \mathcal{H}$ such that $\langle A, H \rangle := \sum_{i,j} A_{i,j} H_{i,j} \neq 0$.
Every class has a hitting set of size $n^2$, namely the indicator matrices of each of the $n^2$ entries. A hitting set is non-trivial if its size is at most $n^2 - 1$. Observe that a non-trivial hitting set $\mathcal{H}$ for $\mathcal{C}$ gives an efficient algorithm for finding a matrix $M \notin \mathcal{C}$, by finding a non-zero $M$ such that $\langle M, H \rangle = 0$ for every $H \in \mathcal{H}$. Such an $M$ exists and can be found in polynomial time because the set $\mathcal{H}$ imposes at most $n^2 - 1$ homogeneous linear constraints on the $n^2$ entries of $M$. This argument is a special case of a more general theorem showing how efficient algorithms for black box polynomial identity testing give lower bounds for algebraic circuits [Agr05, HS80].
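The linear-algebraic step in the argument above (finding a non-zero matrix orthogonal to every element of a non-trivial hitting set) can be sketched with exact Gaussian elimination. This is a minimal illustration over the rationals with hypothetical names; it assumes the entrywise inner product and that fewer than $n^2$ independent constraints are given.

```python
from fractions import Fraction

def inner(A, H):
    """Entrywise inner product <A, H> = sum_{i,j} A[i][j] * H[i][j]."""
    return sum(a * h for ra, rh in zip(A, H) for a, h in zip(ra, rh))

def nonzero_orthogonal_matrix(hitting_set, n):
    """Find a non-zero n x n matrix M with <H, M> = 0 for every H given.

    Each H contributes one homogeneous linear constraint on the n*n entries
    of M, so a non-zero solution exists when len(hitting_set) < n*n.
    """
    num_vars = n * n
    pivots = {}      # pivot column -> index of its row in `reduced`
    reduced = []     # constraint rows kept in reduced row echelon form
    for H in hitting_set:
        row = [Fraction(H[i][j]) for i in range(n) for j in range(n)]
        for col, r in pivots.items():
            if row[col] != 0:
                f = row[col]
                row = [a - f * b for a, b in zip(row, reduced[r])]
        lead = next((c for c in range(num_vars) if row[c] != 0), None)
        if lead is None:
            continue                     # constraint was redundant
        row = [a / row[lead] for a in row]
        for r in range(len(reduced)):    # clear the new pivot column above
            if reduced[r][lead] != 0:
                f = reduced[r][lead]
                reduced[r] = [a - f * b for a, b in zip(reduced[r], row)]
        pivots[lead] = len(reduced)
        reduced.append(row)
    # Set one free variable to 1, the others to 0, and solve for the pivots.
    free = next(c for c in range(num_vars) if c not in pivots)
    solution = [Fraction(0)] * num_vars
    solution[free] = Fraction(1)
    for col, r in pivots.items():
        solution[col] = -reduced[r][free]
    return [solution[i * n:(i + 1) * n] for i in range(n)]
```

If the given set really is a hitting set for a class, the returned matrix cannot belong to that class, since every non-zero member of the class has a non-zero inner product with some element of the set.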
In practice, it is often convenient (although by no means necessary) to consider hitting sets that contain only rank 1 matrices $H = x y^T$, since $\langle A, x y^T \rangle = x^T A y$, and thus we find ourselves in the more familiar territory of polynomial identity testing, trying to construct a hitting set for the class of polynomials of the form $x^T A y$ for $A \in \mathcal{C}$. This approach was also taken by Forbes and Shpilka [FS12], who considered this exact problem where $\mathcal{C}$ is the class of low-rank matrices, and remarked that hitting sets for the class of low-rank matrices plus sparse matrices will give an explicit construction of a rigid matrix.
We carry out this idea for two different classes in the proofs of Theorem 1.3 and Theorem 1.5. However, the following problem remains open.
Open Problem 2.
For some , construct an explicit hitting set of size at most for the class of matrices which can be written as where have at most non-zero entries.
A solution to Open Problem 2 will imply lower bounds of the form for an explicit matrix. If , this will imply lower bounds for logarithmic depth linear circuits.
A useful ingredient in our constructions is the use of maximum distance separable (MDS) codes (for example, Reed-Solomon codes), as their dual subspace is a small dimensional subspace which does not contain sparse non-zero vectors. Over the reals, it is also easy to give such a construction based on the well known Descartes' rule of signs which says that a sparse univariate real polynomial cannot have too many real roots. We refer the reader to Section 3.1 for details.
2 Lower bounds for constant depth linear circuits
In this section, we prove Theorem 1.1. We start by describing the notion of Shoup-Smolensky dimension, but first we set up some notation.
2.1 Notation
We work with matrices whose entries lie in an appropriate extension of a base finite field . We follow the natural convention that the elements of this extension will be represented as univariate polynomials of appropriate degree over the base field, and the arithmetic is done modulo an explicitly given irreducible polynomial.
We use boldface letters () to denote vectors. The length of the vectors is understood from the context.
For a matrix , denotes the number of non-zero entries in .
2.2 Shoup-Smolensky Dimension
A useful concept will be the notion of Shoup-Smolensky dimension of subsets of elements of an extension of a field .
Definition 2.1** (Shoup-Smolensky dimension).**
Let be a field, and be an extension field of . Let be a matrix. For , denote by the set of -wise products of distinct entries of that is,
[TABLE]
The Shoup-Smolensky dimension of of order , denoted by is defined to be the dimension, over , of the vector space spanned by .
We also denote by the number of distinct elements of that can be obtained by summing distinct elements of .
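As a toy illustration of Definition 2.1 over the integers, the sketch below enumerates the products of entries taken from t distinct positions of a small matrix, and counts the distinct values obtainable by summing distinct elements of that set. The function names and the example matrix are hypothetical, the empty sum is counted as a matter of convention, and the brute-force enumeration is only feasible for tiny inputs.

```python
from itertools import combinations
from math import prod


def t_wise_products(matrix, t):
    """Set of products of t entries taken from t distinct positions of the matrix."""
    entries = [x for row in matrix for x in row]
    return {prod(chosen) for chosen in combinations(entries, t)}


def count_distinct_subset_sums(values):
    """Number of distinct sums over all subsets of `values` (the empty sum is counted)."""
    sums = {0}
    for v in values:
        sums |= {s + v for s in sums}
    return len(sums)


# Hypothetical 2 x 2 example with entries 2^1, 2^2, 2^4, 2^8.
M = [[2, 4], [16, 256]]
P2 = t_wise_products(M, 2)
gamma_2 = count_distinct_subset_sums(P2)
```

In this example the exponents 1, 2, 4, 8 have distinct pairwise sums, so the six pairwise products are distinct powers of 2 and every subset of them has a distinct sum.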
2.3 Upper bounding the Shoup-Smolensky dimension for Sparse Products
The following lemma shows that any matrix computable by a depth- linear circuit of size at most has a somewhat small Shoup-Smolensky dimension.
Lemma 2.2**.**
Let be a field, an extension of and be a matrix such that for , where . Then, for every such that it holds that
[TABLE]
Proof.
Since
[TABLE]
every element in is a sum of monomials of degree in the entries of , that is,
[TABLE]
with the right hand side being the number of monomials of degree in variables. Using the inequality ,
[TABLE]
Over , we do not wish to use field extensions (which would give rise to elements with infinite bit complexity). Thus, we use a similar argument that replaces the measure with (recall Definition 2.1) for a small tolerable penalty.
Lemma 2.3**.**
Let be a positive integer. Let be a matrix such that for , where . Assume that for each , and . Then, for every such that it holds that
[TABLE]
Proof.
We follow the same steps as in the proof of Lemma 2.2, replacing the measure by . As before,
[TABLE]
Every element in can be written as
[TABLE]
where is the set of monomials of degree in the entries of , and each is a non-negative integer of absolute value at most (since and is ). It now follows that each element in has the same form as in (2.4), with . We conclude that
[TABLE]
which implies the statement of the lemma using the same bounds on binomial coefficients as in Lemma 2.2. ∎
We now move on to describe constructions of matrices which have large Shoup-Smolensky dimension, and then deduce lower bounds for them.
2.4 Sidon sets and hard univariate matrices
In this section, we describe a construction of a matrix which has a large value of . Let us denote for some non-negative integer . For to have a large Shoup-Smolensky dimension of order , the set should have the property that has size comparable to . A set such that every subset of size of has a distinct sum is called a -wise Sidon set. These are very well studied objects in arithmetic combinatorics, and explicit constructions are known for them in time (e.g., Lemma 60 in [Bsh14]). However, another important parameter in the construction is the degree of , and such a set will inevitably contain integers of size roughly . Thus, the construction of would take time which is not polynomially bounded in . Below we give an elementary construction of such a set in time (cf. [AGKS15]).
Lemma 2.5**.**
Let be a positive integer. There is a set of size such that:
1. has size .
2. .
3. can be constructed in time .
Proof.
Let . Clearly, every subset of has a distinct sum. For a prime we denote , and we claim that there exists a prime such that . Since this condition can be checked in time , this would immediately imply the statement of the lemma, by checking this condition for every and letting for a which satisfies this condition.
For every subset of size , let denote the sum of its elements, and observe that . Clearly, if and only if , so it is enough to show that there exists which does not divide
[TABLE]
and therefore does not divide any of the terms on the right hand side. It further holds that , so the existence of now follows from the fact that can have at most distinct prime divisors, and from the prime number theorem. ∎
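The search described in this proof can be sketched as follows. This is a minimal brute-force version, assuming that the starting set consists of the powers of two 2^0, ..., 2^(m-1) (a natural set in which every subset has a distinct sum) and that m is larger than t. The function names and the parameter name m are hypothetical. The code tries primes in increasing order and keeps the first one modulo which all t-element subset sums remain distinct.

```python
from itertools import combinations


def is_prime(p):
    """Trial-division primality test, adequate for the small primes used here."""
    if p < 2:
        return False
    i = 2
    while i * i <= p:
        if p % i == 0:
            return False
        i += 1
    return True


def sums_distinct_mod_p(values, t, p):
    """True if the residues are pairwise distinct and all t-subset sums differ modulo p."""
    residues = [v % p for v in values]
    if len(set(residues)) != len(residues):
        return False
    sums = [sum(c) % p for c in combinations(residues, t)]
    return len(set(sums)) == len(sums)


def sidon_set_mod_prime(m, t):
    """Reduce {2^0, ..., 2^(m-1)} modulo the smallest prime keeping t-subset sums distinct."""
    base = [1 << i for i in range(m)]  # assumed starting set with distinct subset sums
    p = 2
    while True:
        if is_prime(p) and sums_distinct_mod_p(base, t, p):
            return p, [b % p for b in base]
        p += 1


p, S = sidon_set_mod_prime(5, 2)
```

The loop always terminates, since for any prime larger than t * 2^(m-1) the reduction leaves the set and all its t-subset sums unchanged. The point of the lemma is that a much smaller prime already works.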
Given the above construction of -wise Sidon sets, we now describe the construction of matrices with univariate polynomial entries which have large Shoup-Smolensky dimension.
Construction 2.6**.**
Let be a -wise Sidon set of positive integers, as in Lemma 2.5. Then, the matrix is defined as .
The useful properties of Construction 2.6 are given by the following lemma.
Lemma 2.7**.**
Let be a parameter, be a -wise Sidon set of size and let be the matrix defined in Construction 2.6. Then, the following are true.
1. Every entry of is a monomial of degree at most .
2. .
Proof.
The first item follows from the definition of and the properties of the set in Lemma 2.5. The second item also follows from the properties of and the definition of Shoup-Smolensky dimension, since every -wise product of elements of gives a distinct monomial in , and thus they are all linearly independent over the base field . ∎
2.5 Hard matrices over finite fields
From the univariate matrix in Construction 2.6, we now construct, for every and parameter , a matrix over an extension of which has large Shoup-Smolensky dimension over with the same parameters as .
Lemma 2.8**.**
Let be a prime, and be any positive integer. There is a matrix over an extension of of degree , which can be deterministically constructed in time , and satisfies
[TABLE]
Proof.
Let be as in Construction 2.6, and let be the maximum degree of any entry of . Set . We use Shoup's algorithm (see Theorem 3.2 in [Sho90]) to construct an irreducible polynomial of degree over in deterministic time. Let be a root of in an extension of , where . (We identify the elements of with coefficient vectors of polynomials of degree at most in , and in this representation is identified with the polynomial .) Then, it follows that are linearly independent over .
The matrix is obtained from by just replacing every occurrence of the variable by . We now need to argue that continues to satisfy . By the choice of , it immediately follows that , since every monomial in the set is mapped to a distinct power of in , which are all linearly independent over .
The upper bound on the running time needed for the construction now follows from the upper bound on the degree of the extension , and from Lemma 2.5. ∎
The following theorem now directly follows.
Theorem 2.9**.**
Let be any prime and be a positive integer. Then, there exists a family of matrices which can be constructed in time such that every depth- linear circuit computing has size at least . Moreover, the entries of lie in an extension of of degree at most .
Proof.
We invoke Lemma 2.8 with parameter set to to get matrices in time with the following lower bound on their Shoup-Smolensky dimension.
[TABLE]
If there is a depth linear circuit of size computing the linear transformation , the following inequality must hold (from Lemma 2.2),
[TABLE]
If , we have,
[TABLE]
We also have,
[TABLE]
For any constant , these estimates contradict Equation 2.10, thereby implying a lower bound of on s. ∎
2.6 Hard matrices over
We now prove an analog of Lemma 2.8. We construct a matrix whose entries are positive integers that can be represented by at most bits, and give a lower bound for its -measure (rather than as before).
Lemma 2.11**.**
Let be any positive integer. There is a matrix , which can be deterministically constructed in time , such that every entry of is an integer of bit complexity at most , and it holds that
[TABLE]
Proof.
Let be as in Construction 2.6. Define as
[TABLE]
that is, is simply the polynomial evaluated at .
As in the proof of Lemma 2.7, each element in is now a distinct power of 2, which implies that .
The statement on the running time follows directly from Lemma 2.7. ∎
The analog of Theorem 2.9 for is given below.
Theorem 2.12**.**
There exists a family of matrices over which can be constructed in time such that every depth- linear circuit computing has size at least . Moreover, the entries of are positive integers of bit complexity at most .
Proof.
Let and and let , where is as in Lemma 2.11. A depth- circuit for implies a factorization , with , such that . Observe that since zero columns of or zero rows of can be omitted without affecting the product, we may assume , as otherwise the lower bound trivially holds. By Lemma 2.3 and Lemma 2.11, this implies that
[TABLE]
If , we have,
[TABLE]
We also have
[TABLE]
For any constant , these estimates contradict the inequality above, thus implying a lower bound of on .
The statement on the running time for constructing follows again from Lemma 2.11. ∎
2.7 Lower bounds for depth- linear circuits
The lower bounds of Theorem 2.12 and Theorem 2.9 apply to any constant depth. However, here we briefly remark that in the special case of there is in fact a much simpler construction. As discussed in the introduction, for depth- linear circuits, the best lower bound currently known is a lower bound of based on the study of super-concentrator graphs in the work of Radhakrishnan and Ta-Shma [RT00]. We now discuss two constructions of matrices in quasi-polynomial time which improve upon this bound. More formally, we prove the following theorem.
Theorem 2.13**.**
Let be any positive constant. Then, there is a family of matrices with entries in of bit complexity at most such that can be constructed in time and any depth- linear circuit over computing has size at least .
The first construction directly follows from Lemma 2.11 when invoked with . Once we have the matrices guaranteed by Lemma 2.11, we just follow the proof of Theorem 2.12 as is by taking and . We skip the technical details and now discuss the second construction, which is based on the following observation.
Observation 2.14**.**
Let be a family of matrices where . Then, any depth linear circuit computing has size .
Proof.
The key to the proof is to observe that for , . This follows from the fact that each wise product of the entries of is a power of where the exponent is a sum of powers of and for any two distinct degree multilinear monomials in the entries of , the set of powers of that appear in the exponent are distinct. On the other hand, from Lemma 2.3, we know that if can be computed by a depth- linear circuit of size at most , then
[TABLE]
Now, for , this upper bound is much smaller than the lower bound of . Thus, any depth- linear circuit for over has size at least . ∎
If we directly use this observation to construct hard matrices, the bit complexity of the entries of (and hence the time complexity of constructing ) is as large as . However, it also gives a much stronger (quadratic) lower bound on the depth- linear circuit size for than what is promised in Theorem 2.13. For our second construction for hard matrices for Theorem 2.13, we invoke Observation 2.14 to construct small hard matrices (thus saving on the running time) and then construct a larger block diagonal matrix by taking a Kronecker product of this small hard matrix with a large identity matrix. The following lemma then guarantees a non-trivial lower bound on the size of any depth- linear circuit computing this larger block diagonal matrix.
Lemma 2.15**.**
Let be an matrix, such that any depth- linear circuit computing has size at least . Let be an matrix defined as , where denotes the Kronecker product, and the identity matrix. Then, any depth- linear circuit computing has size at least .
Proof.
A depth- linear circuit for gives a factorization of as for an matrix and an matrix for some parameter . We partition the rows of into contiguous blocks of size each, and let be the submatrix which consists of the block (i.e. rows of ). Similarly, we partition the columns of into contiguous blocks of size each and let be the submatrix of corresponding to the block. From the structure of , it follows that for every , . From the lower bound on the size of any depth- linear circuit for , we get that . Combining this lower bound for , we get . ∎
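The block argument in this proof can be checked on a small instance. The sketch below is illustrative only: the helper names, the 2 x 2 matrix A, and the unimodular matrix R used to produce a non-trivial factorization are all hypothetical choices. It builds the block diagonal matrix with k copies of A, factors it as P Q, and verifies that multiplying the i-th row block of P by the i-th column block of Q recovers A.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def block_diagonal(A, k):
    """Block diagonal matrix with k copies of the square matrix A on the diagonal."""
    m = len(A)
    B = [[0] * (k * m) for _ in range(k * m)]
    for b in range(k):
        for i in range(m):
            for j in range(m):
                B[b * m + i][b * m + j] = A[i][j]
    return B


def diagonal_block_products(P, Q, m):
    """Products P_i * Q_i of the i-th row block of P with the i-th column block of Q."""
    k = len(P) // m
    blocks = []
    for b in range(k):
        P_b = P[b * m:(b + 1) * m]                    # m rows of P
        Q_b = [row[b * m:(b + 1) * m] for row in Q]   # m columns of Q
        blocks.append(matmul(P_b, Q_b))
    return blocks


A = [[1, 2], [3, 4]]
B = block_diagonal(A, 2)

# A hypothetical factorization B = (B R)(R^{-1}) with an integer unimodular R.
R = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
R_inv = [[1, -1, 1, -1], [0, 1, -1, 1], [0, 0, 1, -1], [0, 0, 0, 1]]
P = matmul(B, R)
Q = R_inv
```

Since the row blocks of P and the column blocks of Q partition their non-zero entries, the sparsity of the whole factorization is the sum of the sparsities of the k factorizations of A, which is the source of the multiplicative factor in the lemma.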
We now note that Observation 2.14 and Lemma 2.15 imply another family of matrices for which Theorem 2.13 holds.
Second proof of Theorem 2.13.
Pick such that divides , and let be the matrix defined as . Let . Clearly, can be constructed in time . Moreover, from Observation 2.14 and Lemma 2.15 it follows that any depth- linear circuit computing has size at least . ∎
We note that even though the discussion in this section was confined to depth- linear circuit lower bounds over , similar ideas can be extended to other fields as well.
Extension of the direct sum based construction to arbitrary constant depth?
In light of the above construction, a natural question is to ask if this idea also extends to the construction of hard matrices for depth- circuits for arbitrary constant . While this is a reasonable conjecture, the easy proof of Lemma 2.15 breaks down even at depth .
There are some variations of this idea, such as looking at , where is the all-1 matrix, which would work equally well to prove a lower bound for depth-, but for which it is possible to prove an upper bound in depth-.
Furthermore, it can be seen that upper bounds on matrix multiplication in bounded depth will give small linear circuits for computing . Thus, improved lower bounds using this construction, even for depth-, will require proving new lower bounds for matrix multiplication in bounded depth (the current best lower bounds are again barely super-linear [RS03]).
3 Lower bounds via Hitting Sets
In this section, we prove lower bounds for several classes of depth 2 circuits using hitting sets for matrices. We first recall the definition.
Definition 3.1** (Hitting set for matrices, [FS12]).**
Let be a set of matrices. A set is said to be a hitting set for , if for every non-zero , there is a pair such that
[TABLE]
3.1 Matrices with no sparse vectors in their kernel
In this section, we recall some simple, deterministic and efficient constructions of matrices which do not have any sparse non-zero vector in their kernel. Such a construction forms the basic building block for building hard instances of matrices for various cases of the matrix factorization problem that we discuss in the rest of this paper. We start by describing such a construction over the field of real numbers.
3.1.1 Construction over
The following is a weak form of a classical lemma of Descartes.
Lemma 3.2** (Descartes' rule of signs).**
Let be non-negative integers, and let be arbitrary real numbers. Then, the number of distinct positive roots of the polynomial is at most .
Lemma 3.2 immediately gives the following construction of a small set of vectors, such that not all of them can lie in the kernel of any matrix with at least one sparse row.
Lemma 3.3**.**
For , let . Then, for every and for every matrix over real numbers that has a non-zero row with at most non-zero entries, there is an such that .
Proof.
Let be any non-zero vector with at most non-zero entries. So, the polynomial has sparsity at most . From Lemma 3.2, it follows that has at most positive real roots. Therefore, there exists an such that is not a root of , i.e., . The lemma now follows immediately by taking to be any non-zero -sparse row of . ∎
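The argument can be tested numerically on small integer vectors. The sketch below assumes that the vectors of Lemma 3.3 have the form v_i = (1, i, i^2, ..., i^(n-1)), which is consistent with the Vandermonde structure the paper uses in the proof of Lemma 3.9; the function names are hypothetical. For a row w with s non-zero entries, the inner product of w with v_i is the value at i of a polynomial with s monomials, so by Descartes' rule of signs at most s - 1 of the points 1, ..., s can be roots.

```python
from itertools import product


def inner_products(w, s):
    """Values <w, v_i> for i = 1..s, where v_i = (1, i, i^2, ..., i^(n-1))."""
    return [sum(c * i ** j for j, c in enumerate(w)) for i in range(1, s + 1)]


def some_vector_hits(w):
    """True if some v_i with i at most the sparsity of w has non-zero inner product with w."""
    s = sum(1 for c in w if c != 0)
    return any(val != 0 for val in inner_products(w, s))


def check_all_small_vectors(n, bound):
    """Exhaustively check every non-zero integer vector of length n with entries in [-bound, bound]."""
    return all(some_vector_hits(w)
               for w in product(range(-bound, bound + 1), repeat=n)
               if any(w))


# x^2 - 3x + 2 = (x - 1)(x - 2): the first two points are roots, the third is not.
values = inner_products([2, -3, 1, 0, 0], 3)
```

The example row has three non-zero entries and vanishes on the first two vectors, which shows that all s vectors may be needed before one of them hits.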
We remark that Lemma 3.3 also holds for matrices over which have a sparse non-zero row for the choice of the vectors as above. This follows from the application of Lemma 3.2 separately for the real and imaginary parts of a sparse complex polynomial, both of which are individually sparse, with real coefficients and at least one of them is not identically zero. This observation extends our results over in Section 3.2 to the field of complex numbers.
3.1.2 Construction over finite fields
We now recall some basic properties of Reed-Solomon codes, and observe they can be used as well in lieu of the construction in Lemma 3.3.
The proofs for these properties can be found in any standard reference on coding theory, e.g., Chapter 5 in [GRS18].
Definition 3.4** (Reed Solomon codes).**
Let be the finite field with elements and let . The Reed-Solomon code of block length and dimension is defined as follows.
[TABLE]
Lemma 3.5**.**
Let be the finite field with elements and let . The linear space as in Definition 3.4 satisfies the following properties.
- Every non-zero vector in has at least non-zero coordinates.
- The dual of is the space of Reed Solomon codes of block length and dimension .
Lemma 3.6**.**
Let be the finite field with elements. For any , let be the matrix over whose -th row is . Then, every non-zero vector in in the kernel of has at least non-zero coordinates.
Proof.
Observe that is precisely the generator matrix of Reed Solomon codes of block length and dimension over . In particular, the linear space as in Lemma 3.5 is spanned by the columns of . Thus any vector in the kernel of is in fact a codeword of the dual of these codes, which as we know from Item 2 of Lemma 3.5, is itself a Reed Solomon code of block length and dimension . From the first item of Lemma 3.5, it now follows that has at least non-zero coordinates. ∎
The following lemma is an analog of Lemma 3.3.
Lemma 3.7**.**
Let be the finite field with elements, be a parameter and let be the -th column of the matrix as in Lemma 3.6 for .
Then, for every matrix over that has a non-zero row with at most non-zero entries, there is an such that .
Proof.
The proof follows from the observation that any non-zero vector orthogonal to all the vectors must be in the kernel of the matrix and hence by Lemma 3.6 must have at least non-zero entries. ∎
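The sparsity property behind Lemma 3.6 and Lemma 3.7 can be verified by brute force over a tiny prime field. The sketch below is a sanity check under stated assumptions, not the paper's construction verbatim: it assumes a prime field F_q with all q field elements as evaluation points, and takes the s vectors to be the evaluation vectors of the monomials 1, x, ..., x^(s-1). The function name is hypothetical. It enumerates every non-zero vector orthogonal to all s vectors and reports the smallest Hamming weight found.

```python
from itertools import product


def min_weight_orthogonal(q, s):
    """Smallest weight of a non-zero w in F_q^q with sum_a w[a] * a^i = 0 (mod q) for all i < s."""
    # checks[i] is the evaluation vector of the monomial x^i on all points of F_q
    # (pow(0, 0, q) is 1, matching the convention x^0 = 1).
    checks = [[pow(a, i, q) for a in range(q)] for i in range(s)]
    best = None
    for w in product(range(q), repeat=q):
        if not any(w):
            continue  # skip the zero vector
        if all(sum(c * x for c, x in zip(row, w)) % q == 0 for row in checks):
            weight = sum(1 for x in w if x)
            best = weight if best is None else min(best, weight)
    return best


smallest = min_weight_orthogonal(5, 2)
```

For q = 5 the search covers only 5^5 vectors. Under the assumptions above, the minimum weight found is s + 1, matching the distance of the dual Reed-Solomon code.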
3.2 Lower bounds for symmetric circuits
We now prove our lower bounds for symmetric circuits. Recall that a symmetric circuit is a linear depth-2 circuit of the form .
Theorem 3.8**.**
There is an explicit family of positive semidefinite matrices such that every symmetric circuit computing has size at least .
For the proof of this theorem, we give an efficient deterministic construction of a hitting set for the set of matrices which factor as for of sparsity less than , and as outlined in Section 1.7, we construct a hard matrix which is not hit by such a hitting set and has a high rank.
We start by describing the construction of .
Lemma 3.9**.**
Let be the set of vectors defined in Lemma 3.3. There exists an explicit PSD matrix of rank such that for .
Proof.
We wish to find a matrix of high rank such that for . This can be done by completing to a basis (in an arbitrary way) and requiring that the other basis elements are mapped to linearly independent vectors under . Conveniently, the set is itself a basis for : the matrix whose rows are the 's is a Vandermonde matrix.
We now describe this in some more detail. For , let be the -th elementary basis vector. For a set of variables consider the system of (non-homogeneous) linear equations on the variables given by the constraints.
[TABLE]
Since the vectors are linearly independent, this system has a solution, which can be found in polynomial time using basic linear algebra. More explicitly the -th row of , , is given by the solution to the linear system for and for where is the Vandermonde matrix whose rows are the 's. Let be the matrix whose rows are the solution to the system above. Also, note that the rank of is at least , as linearly independent vectors are in the image of the linear transformation given by .
Now let , so that indeed is a positive semi-definite matrix, and as well. It immediately follows that
[TABLE]
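The construction in this proof can be carried out exactly with rational arithmetic. The sketch below is one reading of it under explicit assumptions: the vectors are taken to be v_i = (1, i, ..., i^(n-1)), the first n/2 of them are sent to zero, and the remaining ones are sent to the corresponding elementary basis vectors. All function names are hypothetical. The PSD matrix is then formed as the transpose of the solution matrix times the solution matrix.

```python
from fractions import Fraction


def solve(V, b):
    """Solve V x = b exactly by Gauss-Jordan elimination (V square and invertible)."""
    n = len(V)
    aug = [[Fraction(x) for x in row] + [Fraction(b[i])] for i, row in enumerate(V)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [x / piv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def build_psd_matrix(n):
    """Return (V, M): V has rows v_i, and M = A^T A where A v_i = 0 for the first n // 2
    vectors and A v_i = e_i for the rest (indices are 0-based in the code)."""
    half = n // 2
    V = [[i ** j for j in range(n)] for i in range(1, n + 1)]
    A = []
    for k in range(n):
        # Row k of A satisfies <row_k, v_i> = 1 if i == k and k >= half, else 0.
        target = [1 if (i == k and k >= half) else 0 for i in range(n)]
        A.append(solve(V, target))
    M = [[sum(A[k][a] * A[k][b] for k in range(n)) for b in range(n)] for a in range(n)]
    return V, M


def bilinear(u, M, w):
    """The scalar u^T M w."""
    return sum(u[a] * M[a][b] * w[b] for a in range(len(u)) for b in range(len(w)))


V, M = build_psd_matrix(4)
```

With n = 4, the quadratic form of M vanishes on the first two vectors and equals 1 on the last two, so M is a non-zero PSD matrix that the first half of the vectors fail to hit.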
We are now ready to prove Theorem 3.8.
Proof of Theorem 3.8.
Let be the matrix from Lemma 3.9. Let be a real matrix such that , and suppose towards contradiction that .
It follows that the rank of must be at least . Thus, must have at least non-zero rows. Now, since the total sparsity of is at most , there must be a non-zero row of with sparsity at most . From Lemma 3.3, it follows that there is an such that is non-zero. Thus, for this index , we have that
[TABLE]
contradicting Lemma 3.9. ∎
We remark that the proof of Theorem 3.8 goes through almost verbatim for symmetric circuits over (recall that over these are circuits of form , where is the conjugate transpose of ).
3.3 Lower bounds for invertible circuits
Recall that an invertible circuit is a circuit of the form where either or is invertible. In this section, we prove Theorem 1.5, which shows a quadratic lower bound for such circuits. For convenience, we restate the theorem.
Theorem 3.10**.**
There exists an explicit family of matrices , over any field such that , such that every invertible circuit computing has size .
Proof.
We give a proof over the field of real numbers and highlight the ideas necessary to extend the argument to work over large enough finite fields.
Fix , and let be the matrix constructed in Lemma 3.9. Let and be matrices over such that . Suppose first that is invertible and has sparsity less than .
Since , the same applies for , and hence the number of non-zero rows in must be at least . Thus, must have a non-zero row with at most non-zero entries. Along with Lemma 3.3, this implies that there is an such that , where is as in Lemma 3.3. Since is invertible, we get that is a non-zero vector, so for some ,
[TABLE]
However, as in the proof of Lemma 3.9
[TABLE]
since for all .
The case that is sparse and is invertible is virtually the same, by considering , and replacing the argument on the rows of by a similar one on the columns of .
For the proof over finite fields, we replace every application of Lemma 3.3 by Lemma 3.7. Note that this requires the -th matrix in the family to be defined over a field of size more than . The rest of the argument essentially remains the same. ∎
Over fixed finite fields (for example, ), it is possible to prove an analog of Theorem 3.10, with worse constants, by replacing the use of Reed-Solomon codes with any good explicit error-correcting code of dimension and distance for some fixed constants . The proof proceeds as above by finding a matrix of rank such that for every .
4 Open Problems
An important problem that continues to remain open is to prove a lower bound of the form for some constant for the depth-2 complexity of an explicit matrix. Such a lower bound would follow from an explicit hitting set of size at most for the class of polynomials of the form such that .
Another natural question here is to understand if this PIT based approach can be used for explicit constructions of rigid matrices, which improve the state of the art. One concrete question in this direction would be to construct explicit hitting sets for the set of matrices which are not rigid for . Using the techniques in this paper, it is possible to construct hitting sets of size for matrices which are not rigid. But, this is non-trivial only when for some constant , which is a regime of parameters for which explicit construction of rigid matrices is already known. A sequence of recent results [AW17, DE17, DL19] showed that many natural candidates for rigid matrices that possess certain symmetries are in fact not as rigid as suspected. This approach might circumvent these obstacles by giving an explicit construction which is not ruled out by these results.
A lower bound of on the size of depth linear circuits computing the linear transformation implies a lower bound of for depth algebraic circuits computing the degree-2 polynomial [BS83, KS91] (so, we can convert lower bounds for circuits with outputs to lower bounds for circuits with 1 output). A notable open problem in algebraic complexity, which is closely related to this work, is to prove any super-linear lower bound for algebraic circuits of depth computing a polynomial with constant total degree. We refer to [Raz10] for a discussion on the importance of this problem.
Acknowledgements
We thank Swastik Kopparty for an insightful discussion on explicit construction of Sidon sets over finite fields. We also thank Rohit Gurjar, Nutan Limaye, Srikanth Srinivasan and Joel Tropp for helpful discussions.
References
- [AGKS15] Manindra Agrawal, Rohit Gurjar, Arpita Korwar, and Nitin Saxena. Hitting-Sets for ROABP and Sum of Set-Multilinear Circuits. SIAM J. Comput., 44(3):669–697, 2015.
- [Agr05] Manindra Agrawal. Proving Lower Bounds Via Pseudo-random Generators. In Proceedings of the 25th International Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2005), pages 92–105, 2005.
- [AP94] Noga Alon and Pavel Pudlák. Superconcentrators of Depths 2 and 3; Odd Levels Help (Rarely). J. Comput. Syst. Sci., 48(1):194–202, 1994.
- [AV08] Manindra Agrawal and V. Vinay. Arithmetic Circuits: A Chasm at Depth Four. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2008), pages 67–75, 2008. Pre-print available at eccc:TR08-062.
- [AW17] Josh Alman and R. Ryan Williams. Probabilistic rank and matrix rigidity. In Proceedings of the 49th Annual ACM Symposium on Theory of Computing (STOC 2017), pages 641–652. ACM, 2017.
- [BCS97] Peter Bürgisser, Michael Clausen, and Mohammad A. Shokrollahi. Algebraic Complexity Theory, volume 315 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag, 1997.
- [BDT17] Avraham Ben-Aroya, Dean Doron, and Amnon Ta-Shma. An efficient reduction from two-source to non-malleable extractors: achieving near-logarithmic min-entropy. In Proceedings of the 49th Annual ACM Symposium on Theory of Computing (STOC 2017), pages 1185–1194. ACM, 2017.
- [BS83] Walter Baur and Volker Strassen. The Complexity of Partial Derivatives. Theoretical Computer Science, 22:317–330, 1983.
