Storage capacity in symmetric binary perceptrons
Benjamin Aubin, Will Perkins, Lenka Zdeborová

Benjamin Aubin
Institut de physique théorique, Université Paris Saclay, CNRS, CEA Saclay, F-91191 Gif-sur-Yvette, France
Will Perkins
Department of Mathematics, Statistics and Computer Science, University of Illinois, Chicago, USA
Lenka Zdeborová
Institut de physique théorique, Université Paris Saclay, CNRS, CEA Saclay, F-91191 Gif-sur-Yvette, France
Abstract
We study the problem of determining the capacity of the binary perceptron for two variants of the problem where the corresponding constraint is symmetric. We call these variants the rectangle-binary-perceptron (RBP) and the $u$-function-binary-perceptron (UBP). We show that, unlike for the usual step-function-binary-perceptron, the critical capacity in these symmetric cases is given by the annealed computation in a large region of parameter space (for all rectangular constraints and for narrow enough $u$-function constraints, $K < K^*$). We prove this fact (under two natural assumptions) using the first and second moment methods. We further use the second moment method to conjecture, without using the replica method, that solutions of the symmetric binary perceptrons are organized in a so-called frozen-1RSB structure. We then use the replica method to estimate the capacity threshold for the UBP case when the $u$-function is wide, $K > K^*$. We conclude that full-step replica symmetry breaking would have to be evaluated in order to obtain the exact capacity in this case.
I Introduction
In this paper we revisit the problem of computing the capacity of the binary perceptron 1 ; 2 for storing random patterns. This problem lies at the core of early statistical physics studies of neural networks and their learning and generalization properties, for reviews see e.g. watkin1993statistical ; seung1992statistical ; engel2001statistical ; NishimoriBook01 . While the perceptron problem is motivated by studies of simple artificial neural networks as discussed in detail in the above literature, in this paper we view it as a random constraint satisfaction problem (CSP) where the vector of binary weights (a solution) must satisfy step constraints of the type
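$$\frac{1}{\sqrt{N}} \sum_{i=1}^{N} X_{\mu i}\, w_i \geq \kappa , \qquad \mu = 1, \dots, M , \qquad (1)$$
where $w \in \{\pm 1\}^N$, $\kappa \in \mathbb{R}$ is the threshold, the random variables $X_{\mu i}$ are i.i.d. Gaussian variables with zero mean and unit variance, and the rows $X_\mu$ of the matrix $X \in \mathbb{R}^{M \times N}$ are called patterns. We define the indicator function associated to the perceptron with a step constraint as $\varphi_{\mathrm{SBP}}(z) = \mathbb{1}(z \geq \kappa)$.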
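The step constraint of eq. (1) can be checked directly on a simulated instance. The following minimal Python sketch is illustrative only: the helper names (`projections`, `phi_sbp`, `is_solution_sbp`) are hypothetical, and drawing the Gaussian patterns with the standard-library `random.gauss` is an assumption about how one would simulate the disorder.

```python
import math
import random

def projections(X, w):
    """Normalized projections (1/sqrt(N)) * sum_i X[mu][i] * w[i], one per pattern."""
    n = len(w)
    return [sum(x * s for x, s in zip(row, w)) / math.sqrt(n) for row in X]

def phi_sbp(z, kappa):
    """Indicator of the step constraint: 1 if z >= kappa, else 0."""
    return 1 if z >= kappa else 0

def is_solution_sbp(X, w, kappa):
    """w is a solution iff every pattern satisfies the step constraint."""
    return all(phi_sbp(z, kappa) for z in projections(X, w))

# Usage: a random instance with N = 20 binary weights and M = 5 patterns.
rng = random.Random(0)
N, M, kappa = 20, 5, 0.0
X = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
w = [rng.choice((-1, 1)) for _ in range(N)]
print(is_solution_sbp(X, w, kappa))
```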
We say that a given vector $w$ is a solution of the perceptron instance if all constraints given by eq. (1) are satisfied. The storage capacity is then defined similarly to the satisfiability threshold in random constraint satisfaction problems: we denote the constraint density as $\alpha = M/N$ and define the storage capacity $\alpha_c$ as the infimum of densities $\alpha$ such that, in the limit $N \to \infty$, with high probability (over the choice of the matrix $X$) there are no solutions. It is natural to conjecture that the converse also holds, i.e. that the storage capacity equals the supremum of $\alpha$ such that, in the limit $N \to \infty$, solutions exist with high probability. In this case we say that the storage capacity is a sharp threshold.
Gardner and Derrida, in their paper 1 , assume that the storage capacity is a sharp threshold and apply the replica calculation to compute it, but reach a result inconsistent with a simple upper bound obtained by the first moment method. Mézard and Krauth 2 found a way to obtain a consistent prediction from the replica calculation and concluded that the storage capacity for the step binary perceptron (SBP), i.e. the one associated with the constraint $\varphi_{\mathrm{SBP}}$, is given by the largest $\alpha$ for which the following quantity, the entropy in physics, is positive:
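$$s(\alpha) = \underset{q, \hat q}{\mathrm{extr}} \left\{ -\frac{\hat q\,(1-q)}{2} + \int Dz \, \log\left[ 2\cosh\left( \sqrt{\hat q}\, z \right) \right] + \alpha \int Dz \, \log H\left( \frac{\kappa - \sqrt{q}\, z}{\sqrt{1-q}} \right) \right\} , \qquad (2)$$
where $Dz$ is the standard Gaussian measure, $H(x) = \int_x^{\infty} Dt$, and "extr" means that the expression is evaluated at the point where the derivatives of the expression in curly brackets with respect to $q$ and $\hat q$ vanish.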
Several decades of subsequent research in the statistical physics of disordered systems are consistent with the conjectured Mézard-Krauth formula for the storage capacity of the binary perceptron. Despite the simplicity of the above conjecture and decades of impressive progress in the mathematics of spin glasses and related problems (see e.g. talagrand2006parisi ; talagrand2003spin ; 8 ; achlioptas2011solution ; panchenko2014parisi ; ding2015proof and many others), the storage capacity of the binary perceptron remains an open mathematical problem. In fact, even the very existence of a sharp threshold, i.e. the fact that in the limit $N \to \infty$ the probability that $M = \alpha N$ patterns can be stored drops sharply from one to zero at the capacity, is an open problem. Until very recently, only widely non-matching upper and lower bounds for the storage capacity of the binary perceptron were available kim1998covering ; stojnic2013discrete . As the present work was being finalized, Ding and Sun ding2018capacity proved, in a remarkable paper, a lower bound on the capacity that matches the Krauth and Mézard conjecture (note that, much like Theorem 4 below, the main theorem in ding2018capacity depends on a numerical hypothesis). A matching upper bound remains an open challenge in mathematical physics and probability theory.
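In this paper we introduce two simple symmetric variants of the binary perceptron problem. For a threshold $K > 0$, we consider two different types of symmetric constraints:
- The rectangle binary perceptron (RBP) requires $\left| \frac{1}{\sqrt{N}} \sum_{i=1}^{N} X_{\mu i} w_i \right| \leq K$ for all $\mu = 1, \dots, M$. Its associated indicator function is $\varphi_{\mathrm{RBP}}(z) = \mathbb{1}(|z| \leq K)$.
- The $u$-function binary perceptron (UBP) requires $\left| \frac{1}{\sqrt{N}} \sum_{i=1}^{N} X_{\mu i} w_i \right| \geq K$ for all $\mu = 1, \dots, M$. Its associated indicator function is $\varphi_{\mathrm{UBP}}(z) = \mathbb{1}(|z| \geq K)$.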
These constraints are symmetric in the sense that if $w$ is a solution then $-w$ is a solution as well. Our motivation behind these symmetric variants of the perceptron is that this symmetry greatly simplifies the mathematical treatment of the problem, while keeping the relevant physical properties intact. Thus, results that remain open questions for the canonical perceptron can be established rigorously for these symmetric versions. Symmetric perceptron models are also directly related to the problem of determining the discrepancy of a random matrix or set system BansalSpencer19 , a problem of interest in combinatorics.
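The sign-flip symmetry is easy to check numerically. The sketch below uses hypothetical helper names and verifies, on one random instance, that $w$ and $-w$ always satisfy the rectangle and $u$-function constraints together; it illustrates the symmetry and is not part of the paper's analysis.

```python
import math
import random

def projections(X, w):
    """Normalized projections (1/sqrt(N)) * <X_mu, w> for every pattern."""
    n = len(w)
    return [sum(x * s for x, s in zip(row, w)) / math.sqrt(n) for row in X]

def phi_rbp(z, K):
    """Rectangle constraint |z| <= K."""
    return abs(z) <= K

def phi_ubp(z, K):
    """u-function constraint |z| >= K."""
    return abs(z) >= K

def satisfies(X, w, phi, K):
    return all(phi(z, K) for z in projections(X, w))

rng = random.Random(1)
N, M, K = 12, 4, 1.0
X = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
for _ in range(200):
    w = [rng.choice((-1, 1)) for _ in range(N)]
    minus_w = [-s for s in w]
    for phi in (phi_rbp, phi_ubp):
        # Flipping w flips the sign of every projection, which |.| ignores.
        assert satisfies(X, w, phi, K) == satisfies(X, minus_w, phi, K)
print("sign-flip symmetry holds on all sampled vectors")
```

The step constraint would fail this check in general, since flipping $w$ changes the sign of each projection and $\mathbb{1}(z \geq \kappa)$ is not even in $z$.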
The main result of the present paper, presented in section II, is a proof, subject to a numerical hypothesis, of a formula for the storage capacity, defined in the same way as for the step-function binary perceptron above. In particular, we show that in these symmetric variants the first moment upper bound (corresponding to the annealed capacity in physics) on the storage capacity is tight (except for $K > K^*$ in the UBP case). We prove this statement using the second moment method. We note that the existing physics literature on perceptron-like problems contains other models where the first moment upper bound on the storage capacity was observed to be tight, in particular the parity machine opper1995statistical and the reversed-wedge binary perceptron bex1995storage ; hosaka2002statistical . Those works, however, rely on the comparison of the first moment bound on the capacity with the result of the replica method, rather than providing a rigorous justification.
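To formally state our main result, let $Z \sim \mathcal{N}(0,1)$ and, for $K > 0$, let $p_K = \Pr(|Z| \leq K)$ and $q_K = \Pr(|Z| \geq K) = 1 - p_K$.
- The storage capacity for the rectangle binary perceptron is:
$$\alpha_c^{\mathrm{RBP}}(K) = -\frac{\log 2}{\log p_K} \quad \text{for all } K > 0 . \qquad (3)$$
- The storage capacity for the $u$-function binary perceptron is:
$$\alpha_c^{\mathrm{UBP}}(K) = -\frac{\log 2}{\log q_K} \quad \text{for } K < K^* . \qquad (4)$$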
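Evaluating the annealed thresholds only requires the Gaussian tail probabilities $p_K = \Pr(|Z| \leq K) = \mathrm{erf}(K/\sqrt{2})$ and $q_K = \Pr(|Z| \geq K) = \mathrm{erfc}(K/\sqrt{2})$. The short Python sketch below (function names are illustrative) computes the first-moment capacities $-\log 2/\log p_K$ and $-\log 2/\log q_K$. Keep in mind that the second one is claimed to be the true UBP capacity only for $K < K^*$.

```python
import math

def p_K(K):
    """P(|Z| <= K) for a standard Gaussian Z."""
    return math.erf(K / math.sqrt(2.0))

def q_K(K):
    """P(|Z| >= K) = 1 - p_K, computed without cancellation."""
    return math.erfc(K / math.sqrt(2.0))

def alpha_rbp(K):
    """First-moment (annealed) capacity of the rectangle binary perceptron."""
    return -math.log(2.0) / math.log(p_K(K))

def alpha_ubp(K):
    """First-moment (annealed) capacity of the u-function binary perceptron."""
    return -math.log(2.0) / math.log(q_K(K))

for K in (0.5, 1.0, 2.0):
    print(f"K = {K}: alpha_RBP = {alpha_rbp(K):.3f}, alpha_UBP = {alpha_ubp(K):.3f}")
```

The RBP threshold grows with $K$ (wider rectangles are easier to satisfy), while the UBP threshold decreases.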
The constant $K^*$ stems from the properties of the second moment entropy eq. (10). In physics terms it is defined as the point of intersection between the annealed capacity and the local stability line of the RS solution eq. (17). That is, $K^*$ is the solution of the following equation:
$$-\frac{\log 2}{\log q_{K^*}} = \frac{\pi\, q_{K^*}^2\, e^{(K^*)^2}}{2\,(K^*)^2} . \qquad (5)$$
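$K^*$ can be located by one-dimensional root finding. The sketch below rests on a derivation that is ours, not a quotation of the paper. Expanding the two-replica probability $\Pr(|Z_1| \geq K, |Z_2| \geq K)$ in the correlation $\rho$ gives $q_K^2 + \frac{\rho^2}{2} a_2^2 + O(\rho^4)$, because the linear term vanishes by symmetry, with $a_2 = \mathbb{E}[\mathbb{1}(|Z| \geq K)(Z^2 - 1)] = 2K e^{-K^2/2}/\sqrt{2\pi}$. Balancing this against the curvature of the entropy of pairs of binary vectors yields the stability bound $\alpha < q_K^2/a_2^2 = \pi q_K^2 e^{K^2}/(2K^2)$. $K^*$ is taken where that bound meets the annealed capacity. The bisection bracket $[0.3, 1.5]$ is also an assumption.

```python
import math

def q_K(K):
    """P(|Z| >= K) for a standard Gaussian Z."""
    return math.erfc(K / math.sqrt(2.0))

def alpha_annealed_ubp(K):
    return -math.log(2.0) / math.log(q_K(K))

def alpha_stability_ubp(K):
    """Assumed stability bound pi * q_K^2 * exp(K^2) / (2 K^2) of the symmetric point."""
    return math.pi * q_K(K) ** 2 * math.exp(K * K) / (2.0 * K * K)

def find_k_star(lo=0.3, hi=1.5, iters=60):
    """Bisection on g(K) = annealed - stability, assumed to change sign once on [lo, hi]."""
    g = lambda K: alpha_annealed_ubp(K) - alpha_stability_ubp(K)
    if not g(lo) < 0 < g(hi):
        raise ValueError("bracket does not contain a sign change")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

K_star = find_k_star()
print(f"K* ~ {K_star:.4f}, annealed capacity there: {alpha_annealed_ubp(K_star):.4f}")
```

Under these assumptions the crossing lands slightly above $K = 0.8$. Since eq. (5) is the authoritative definition, treat the code as a consistency check rather than a reference value.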
The two symmetric variants of the perceptron problem considered here share many of the intriguing geometric properties of the original step-function binary perceptron problem. Most significant is the conjectured frozen-1RSB 2 nature of the space of solutions, which splits into well-separated clusters of vanishing entropy at any $\alpha > 0$. Remarkably, this frozen-1RSB property can be deduced from the form of the second moment entropy, as we explain in section III. Our justification of the frozen-1RSB property does not rely on the replica method and is hence of independent interest.
For the UBP and $K > K^*$, the second-moment proof technique fails, and this failure precisely marks the onset of the replica symmetry breaking region. In that region, we evaluate the one-step replica symmetry breaking (1RSB) approximation for the storage capacity, but conclude that full-step replica symmetry breaking (FRSB) would be needed to obtain the exact result. While the FRSB equations can be written along the lines of 20 , they are more involved than the ones for the Sherrington-Kirkpatrick model parisi1979infinite ; parisi1980sequence ; parisi1980order , and solving them numerically or extracting additional insight from them is a challenging task left for future work. We present the replica analysis in section IV. Table 1 summarizes our main results along with the predictions for the step-function perceptron.
Finally, let us comment on the simpler and more commonly considered case of the spherical perceptron, where the binary constraint on the vector $w$ is replaced by the spherical constraint $\|w\|_2^2 = N$. For $\kappa = 0$ the spherical perceptron reduces to the famous problem of the intersection of random half-spaces, with capacity $\alpha_c = 2$, as solved by Wendel wendel1962problem and Cover cover1965geometrical . For $\kappa > 0$ the Gardner-Derrida solution 1 is correct, as proven in shcherbina2003rigorous ; stojnic2013another . For $\kappa < 0$ the situation is more challenging and FRSB is needed to compute the storage capacity; for recent progress in physics see franz2016simplest ; 20 , while mathematical considerations about this case were presented in stojnic2013negative .
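The $\kappa = 0$ spherical case can be made concrete with Cover's function-counting theorem. $M$ points in general position in $\mathbb{R}^N$ admit $C(M,N) = 2\sum_{k=0}^{N-1}\binom{M-1}{k}$ sign patterns realizable by a hyperplane through the origin. By Wendel's theorem, $C(M,N)/2^M$ is also the probability that $M$ random symmetric half-spaces have a non-trivial common intersection. The Python sketch below (function name ours) shows that this fraction equals exactly $1/2$ at $M = 2N$ and sharpens into a step around $\alpha = 2$ as $N$ grows.

```python
from math import comb

def separable_fraction(M, N):
    """Fraction C(M, N) / 2^M of sign patterns on M points in general position in R^N
    realizable by a hyperplane through the origin (Cover's counting theorem)."""
    count = 2 * sum(comb(M - 1, k) for k in range(N))
    return count / 2 ** M

for N in (10, 50, 200):
    row = [separable_fraction(int(a * N), N) for a in (1.5, 2.0, 2.5)]
    print(N, [round(f, 4) for f in row])
```

Exact integer arithmetic (`math.comb`) avoids overflow and rounding in the binomial sums. Only the final ratio is converted to a float.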
II Proof of correctness of the annealed capacity
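To state the main results precisely we introduce some definitions. Let $X \in \mathbb{R}^{M \times N}$ be the random pattern matrix, with i.i.d. standard Gaussian entries. Define the partition functions
$$Z^{\mathrm{RBP}}(X) = \sum_{w \in \{\pm 1\}^N} \prod_{\mu=1}^{M} \mathbb{1}\left( \left| \frac{1}{\sqrt{N}} \sum_{i=1}^{N} X_{\mu i} w_i \right| \leq K \right) , \qquad Z^{\mathrm{UBP}}(X) = \sum_{w \in \{\pm 1\}^N} \prod_{\mu=1}^{M} \mathbb{1}\left( \left| \frac{1}{\sqrt{N}} \sum_{i=1}^{N} X_{\mu i} w_i \right| \geq K \right) ,$$
which count the number of solutions for the rectangle and $u$-function constraints, respectively. Let $\mathcal{E}^{\mathrm{RBP}}_{N,M}$ and $\mathcal{E}^{\mathrm{UBP}}_{N,M}$ be the events that $Z^{\mathrm{RBP}}(X) > 0$ and $Z^{\mathrm{UBP}}(X) > 0$. We formally define the storage capacity.
Definition 1.
The storage capacity is
$$\alpha_c^{\mathrm{RBP}}(K) = \inf\left\{ \alpha > 0 : \lim_{N \to \infty} \Pr\left( \mathcal{E}^{\mathrm{RBP}}_{N, \lfloor \alpha N \rfloor} \right) = 0 \right\}$$
and likewise for $\alpha_c^{\mathrm{UBP}}(K)$.
It is believed that there is a sharp threshold for the existence of solutions.
Conjecture 2.
The storage capacity is a sharp threshold:
$$\lim_{N \to \infty} \Pr\left( \mathcal{E}^{\mathrm{RBP}}_{N, \lfloor \alpha N \rfloor} \right) = \begin{cases} 1 & \text{if } \alpha < \alpha_c^{\mathrm{RBP}}(K) , \\ 0 & \text{if } \alpha > \alpha_c^{\mathrm{RBP}}(K) , \end{cases}$$
and likewise for $\alpha_c^{\mathrm{UBP}}(K)$.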
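For very small $N$, the partition functions and the events in Definition 1 can be evaluated by exhaustive enumeration. The sketch below (hypothetical helper names, Python standard library only) counts solutions over all $2^N$ sign vectors and estimates $\Pr(Z > 0)$ by Monte Carlo. With $N \approx 10$ finite-size effects are strong, so it illustrates the definitions rather than estimating the capacity.

```python
import itertools
import math
import random

def count_solutions(X, K, kind):
    """Brute-force partition function: the number of w in {-1,+1}^N satisfying every
    rectangle (kind="RBP") or u-function (kind="UBP") constraint."""
    N = len(X[0])
    if kind == "RBP":
        ok = lambda z: abs(z) <= K
    elif kind == "UBP":
        ok = lambda z: abs(z) >= K
    else:
        raise ValueError("kind must be 'RBP' or 'UBP'")
    return sum(
        all(ok(sum(x * s for x, s in zip(row, w)) / math.sqrt(N)) for row in X)
        for w in itertools.product((-1, 1), repeat=N)
    )

def prob_solvable(N, alpha, K, kind, trials, seed=0):
    """Monte Carlo estimate of P(Z > 0) with M = floor(alpha * N) >= 1 Gaussian patterns."""
    rng = random.Random(seed)
    M = int(alpha * N)
    hits = 0
    for _ in range(trials):
        X = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
        hits += count_solutions(X, K, kind) > 0
    return hits / trials

for alpha in (0.5, 1.5, 3.0):
    print(alpha, prob_solvable(10, alpha, 1.0, "RBP", trials=20))
```

The count returned for the symmetric constraints is always even, because solutions come in pairs $\pm w$.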
The corresponding conjecture for random $k$-SAT is the celebrated 'satisfiability threshold conjecture', proved for large $k$ by Ding, Sly, and Sun ding2015proof .
Next, couple two standard Gaussians by letting and be independent standard Gaussians and setting and . Let
[TABLE]
with the probability that two standard Gaussians with correlation are both at most in absolute value, that is:
[TABLE]
Note that and for . We now introduce the functions that dictate the effectiveness of the second moment bound. Let
[TABLE]
where is the Shannon entropy function.
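The two-replica probabilities and the resulting second moment entropy can be evaluated numerically. The following Python sketch fixes its own conventions, which are assumptions. Two solutions that agree on a fraction $\beta$ of coordinates have overlap $\rho = 2\beta - 1$, so their projections are standard Gaussians with correlation $\rho$. There are $2^N\binom{N}{\beta N}$ such ordered pairs. The joint probability is computed by conditioning on $Z_1$ and integrating with Simpson's rule. With these conventions the value at $\beta = 1/2$ equals twice the first-moment entropy $\log 2 + \alpha \log p_K$.

```python
import math

SQRT2 = math.sqrt(2.0)

def gauss_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def gauss_cdf(z):
    return 0.5 * math.erfc(-z / SQRT2)

def pair_inside(K, rho, n=2000):
    """P(|Z1| <= K, |Z2| <= K) for standard Gaussians with correlation rho.
    Given Z1 = z, Z2 ~ N(rho z, 1 - rho^2); z is integrated over [-K, K] by Simpson's rule."""
    if abs(rho) >= 1.0:
        return math.erf(K / SQRT2)  # Z2 = +/- Z1: both inside iff Z1 is inside
    s = math.sqrt(1.0 - rho * rho)
    h = 2.0 * K / n
    total = 0.0
    for j in range(n + 1):
        z = -K + j * h
        inner = gauss_cdf((K - rho * z) / s) - gauss_cdf((-K - rho * z) / s)
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * gauss_pdf(z) * inner
    return total * h / 3.0

def pair_outside(K, rho, n=2000):
    """P(|Z1| >= K, |Z2| >= K) by inclusion-exclusion."""
    return 1.0 - 2.0 * math.erf(K / SQRT2) + pair_inside(K, rho, n)

def shannon(beta):
    """Shannon entropy in nats, with H(0) = H(1) = 0."""
    if beta <= 0.0 or beta >= 1.0:
        return 0.0
    return -beta * math.log(beta) - (1.0 - beta) * math.log(1.0 - beta)

def second_moment_entropy(beta, alpha, K, kind="RBP"):
    """(1/N) log of the expected number of solution pairs agreeing on a fraction beta."""
    rho = 2.0 * beta - 1.0
    pair = pair_inside(K, rho) if kind == "RBP" else pair_outside(K, rho)
    return math.log(2.0) + shannon(beta) + alpha * math.log(pair)

K, alpha = 1.0, 1.0
first = math.log(2.0) + alpha * math.log(math.erf(K / SQRT2))
print(second_moment_entropy(0.5, alpha, K), 2 * first)
```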
We state a numerical hypothesis in terms of the derivatives of these two functions.
Hypothesis 3.
For all choices of and so that , there is exactly one so that . The same holds for .
Our main theorem is a proof, under Hypothesis 3, that the storage capacity is given by the annealed computation.
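Theorem 4.
Under the assumption of Hypothesis 3, the following hold.
1. For all $K > 0$, we have $\alpha_c^{\mathrm{RBP}}(K) = -\log 2 / \log p_K$.
2. For all $K < K^*$, we have $\alpha_c^{\mathrm{UBP}}(K) = -\log 2 / \log q_K$.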
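Under our definition of $\alpha_c^{\mathrm{RBP}}$ and $\alpha_c^{\mathrm{UBP}}$, we must prove two statements to show that $\alpha_c^{\mathrm{RBP}}(K) = -\log 2/\log p_K$ (and similarly for the UBP). We use the first moment method to show that for $\alpha > -\log 2/\log p_K$, $\Pr(Z^{\mathrm{RBP}}(X) > 0) \to 0$; then we use the second moment method to show that for $\alpha < -\log 2/\log p_K$, $\liminf_{N \to \infty} \Pr(Z^{\mathrm{RBP}}(X) > 0) > 0$ (a result analogous to what Ding and Sun prove for the more challenging step binary perceptron ding2018capacity ). Conjecture 2 asserts the stronger statement that for $\alpha < -\log 2/\log p_K$, $\Pr(Z^{\mathrm{RBP}}(X) > 0) \to 1$.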
II.1 First moment upper bound
Proposition 5.
1. If $\alpha > -\log 2/\log p_K$, then whp there is no satisfying assignment to the binary perceptron with the rectangle activation function.
2. If $\alpha > -\log 2/\log q_K$, then whp there is no satisfying assignment to the binary perceptron with the $u$-function activation function.
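Proof.
We give the proof for the rectangle function, as the proof for the $u$-function is identical. Let $\alpha > -\log 2/\log p_K$ and $M = \lfloor \alpha N \rfloor$. Let $\mathbf{1}$ denote the vector of dimension $N$ with all entries equal to $1$. Since the distribution of $X$ is invariant under flipping the signs of its columns, every $w \in \{\pm 1\}^N$ is a solution with the same probability as $\mathbf{1}$, and $\frac{1}{\sqrt{N}} \sum_i X_{\mu i}$ is a standard Gaussian, independently for each $\mu$. Hence, by Markov's inequality,
$$\Pr\left( Z^{\mathrm{RBP}}(X) > 0 \right) \leq \mathbb{E}\, Z^{\mathrm{RBP}}(X) = 2^N \Pr\left( \mathbf{1} \text{ is a solution} \right) = 2^N p_K^{M} = \exp\left[ N \left( \log 2 + \alpha \log p_K \right) + o(N) \right] \longrightarrow 0 .$$
∎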
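The only probabilistic input of this proof is the identity $\mathbb{E}\, Z^{\mathrm{RBP}} = 2^N p_K^M$ with $p_K = \Pr(|Z| \leq K)$. The sketch below (hypothetical names, and sizes tiny enough for brute force) checks it by Monte Carlo and evaluates the exponent $\log 2 + \alpha \log p_K$, whose sign decides whether the first moment vanishes.

```python
import itertools
import math
import random

def count_rbp(X, K):
    """Brute-force number of w in {-1,+1}^N with |<X_mu, w>| <= K sqrt(N) for all mu."""
    N = len(X[0])
    bound = K * math.sqrt(N)
    return sum(
        all(abs(sum(x * s for x, s in zip(row, w))) <= bound for row in X)
        for w in itertools.product((-1, 1), repeat=N)
    )

def first_moment_check(N=6, M=3, K=1.0, samples=1500, seed=0):
    """Return (Monte Carlo average of Z over random X, exact value 2^N p_K^M)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        X = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
        total += count_rbp(X, K)
    return total / samples, 2 ** N * math.erf(K / math.sqrt(2.0)) ** M

def first_moment_exponent(alpha, K):
    """(1/N) log E[Z]: negative exactly when alpha > -log 2 / log p_K."""
    return math.log(2.0) + alpha * math.log(math.erf(K / math.sqrt(2.0)))

empirical, exact = first_moment_check()
print(f"Monte Carlo E[Z] = {empirical:.2f}, exact 2^N p_K^M = {exact:.2f}")
```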
II.2 Second moment lower bound
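Proposition 6.
1. If $\alpha < -\log 2/\log p_K$, then
$$\liminf_{N \to \infty} \Pr\left( Z^{\mathrm{RBP}}(X) > 0 \right) > 0 .$$
2. If $K < K^*$ and $\alpha < -\log 2/\log q_K$, then
$$\liminf_{N \to \infty} \Pr\left( Z^{\mathrm{UBP}}(X) > 0 \right) > 0 .$$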
To prove Proposition 6 we apply the second moment method in a similar fashion to Achlioptas and Moore achlioptas2002asymptotic , who determined the satisfiability threshold of random $k$-SAT to within a factor of 2 by considering not-all-equal satisfying assignments (not-all-equal satisfiability (NAE-SAT) constraints are symmetric in the same way as the rectangle and $u$-function constraints). Recall the Paley-Zygmund inequality.
Lemma 7.
Let $Y$ be a non-negative random variable. Then
$$\Pr(Y > 0) \geq \frac{(\mathbb{E}\, Y)^2}{\mathbb{E}\, Y^2} .$$
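In the second moment method, Lemma 7 is applied to $Y = Z^{\mathrm{RBP}}(X)$: if the second moment is of the same exponential order as the squared first moment, $\Pr(Z > 0)$ stays bounded away from zero. As a sanity check on a toy variable of our choosing, $Y = \max(0, G)^2$ with $G$ standard Gaussian, the sketch below compares the empirical $\Pr(Y > 0)$, close to $1/2$, with the bound $(\mathbb{E}Y)^2/\mathbb{E}Y^2$, close to $(1/2)^2/(3/2) = 1/6$.

```python
import random

def paley_zygmund_check(samples):
    """Empirical P(Y > 0) and the second-moment lower bound (E Y)^2 / E[Y^2]."""
    n = len(samples)
    mean = sum(samples) / n
    second = sum(y * y for y in samples) / n
    frac_positive = sum(1 for y in samples if y > 0) / n
    return frac_positive, mean * mean / second

rng = random.Random(0)
ys = [max(0.0, rng.gauss(0.0, 1.0)) ** 2 for _ in range(100_000)]
frac_positive, bound = paley_zygmund_check(ys)
print(f"P(Y > 0) = {frac_positive:.4f} >= (E Y)^2 / E Y^2 = {bound:.4f}")
```

On any finite sample the inequality holds exactly, by the Cauchy-Schwarz inequality applied to the empirical distribution.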
We will also use the following application of Laplace’s method from Achlioptas and Moore achlioptas2002asymptotic .
Lemma 8.
Let be a real analytic function on and let
[TABLE]
If for all and , then there exist constants so that for all sufficiently large
[TABLE]
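Lemma 8 says that a sum of the form $\sum_{k}\binom{N}{k} f(k/N)^N$ is, up to constant factors, $e^{N \max_\beta [H(\beta) + \log f(\beta)]}$ when the maximum is unique and non-degenerate. The sketch below checks this numerically for a test function chosen purely for illustration, $f(\beta) = 1 + \beta(1-\beta)$. Its exponent is maximized at $\beta = 1/2$ with value $\log 2 + \log(5/4)$. Log-sum-exp keeps the computation stable for large $N$.

```python
import math

def log_binom(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def log_sum(N, log_f):
    """log sum_{k=0}^{N} C(N, k) f(k/N)^N, evaluated stably with log-sum-exp."""
    terms = [log_binom(N, k) + N * log_f(k / N) for k in range(N + 1)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def log_f(beta):
    """Illustrative test function f(beta) = 1 + beta (1 - beta), maximal at beta = 1/2."""
    return math.log(1.0 + beta * (1.0 - beta))

exponent = math.log(2.0) + math.log(1.25)  # max of H(beta) + log f(beta), at beta = 1/2
for N in (100, 1000, 10000):
    # Lemma 8 predicts a gap of order 1/N between the two exponents.
    print(N, log_sum(N, log_f) / N - exponent)
```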
II.2.1 Rectangle binary perceptron
We calculate
[TABLE]
where we recall from eq. (6). Define
[TABLE]
If we can show that for all and , then by Lemma 8, we have
[TABLE]
Then since is integer valued, we have
[TABLE]
It remains to show that when , then for all and . By eq. (9) and the fact that , it is enough to show the same for .
Certainly one necessary condition is that . This reduces to the condition or which is exactly the condition of Proposition 6. Next consider .
A calculation shows that
[TABLE]
In particular, if and only if
[TABLE]
But a calculation also shows that
[TABLE]
for all and so the condition of Proposition 6 implies that .
Moreover, since is symmetric around and it has a local maximum at , Hypothesis 3 implies that the global maximum of occurs at either or , and since , we have that for all , completing the proof of Proposition 6 for the rectangle binary perceptron.
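The conclusion can be spot-checked numerically. The sketch below makes two assumptions. Two solutions agreeing on a fraction $\beta$ of coordinates have projections with correlation $\rho = 2\beta - 1$, and there are $2^N\binom{N}{\beta N}$ such ordered pairs, so the RBP second moment entropy is $\log 2 + H(\beta) + \alpha \log \Pr(|Z_1| \leq K, |Z_2| \leq K)$. It evaluates this on a grid for the illustrative choice $K = 1$, $\alpha = 0.9\,\alpha_c^{\mathrm{RBP}}(1)$ and checks that the maximum sits at $\beta = 1/2$. This is one parameter point, not a substitute for Hypothesis 3.

```python
import math

SQRT2 = math.sqrt(2.0)

def gauss_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def gauss_cdf(z):
    return 0.5 * math.erfc(-z / SQRT2)

def pair_inside(K, rho, n=1000):
    """P(|Z1| <= K, |Z2| <= K) at correlation rho, Simpson's rule over Z1 in [-K, K]."""
    if abs(rho) >= 1.0:
        return math.erf(K / SQRT2)
    s, h = math.sqrt(1.0 - rho * rho), 2.0 * K / n
    total = 0.0
    for j in range(n + 1):
        z = -K + j * h
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        inner = gauss_cdf((K - rho * z) / s) - gauss_cdf((-K - rho * z) / s)
        total += weight * gauss_pdf(z) * inner
    return total * h / 3.0

def entropy2_rbp(beta, alpha, K):
    """Assumed RBP second moment entropy density at agreement fraction beta."""
    H = 0.0 if beta in (0.0, 1.0) else -beta * math.log(beta) - (1 - beta) * math.log(1 - beta)
    return math.log(2.0) + H + alpha * math.log(pair_inside(K, 2.0 * beta - 1.0))

K = 1.0
alpha_c = -math.log(2.0) / math.log(math.erf(K / SQRT2))
alpha = 0.9 * alpha_c
grid = [i / 100 for i in range(101)]
values = [entropy2_rbp(b, alpha, K) for b in grid]
best = max(range(len(grid)), key=values.__getitem__)
print(f"alpha = {alpha:.3f}: maximum at beta = {grid[best]}")
```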
II.2.2 $u$-function binary perceptron
The proof for the $u$-function is similar. We can calculate
[TABLE]
where we recall from eq. (6). Using Lemma 8 and Hypothesis 3 again, it suffices to show that for and we have and . The first follows immediately from the fact that . For the second, we have
[TABLE]
and so if and only if
[TABLE]
Unlike with the rectangle function it is not true that
[TABLE]
for all $K$: the left and right sides of the inequality cross at $K = K^*$, which implicitly defines $K^*$. Thus for $K < K^*$ and $\alpha$ below the annealed capacity the second condition holds as well, which completes the proof of Proposition 6 for the $u$-function binary perceptron.
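The failure of the second moment method for $K > K^*$ shows up as a change of sign of the curvature of the UBP second moment entropy at $\beta = 1/2$. The sketch below assumes that solutions agreeing on a fraction $\beta$ of coordinates have projections with correlation $\rho = 2\beta - 1$. It computes $\Pr(|Z_1| \geq K, |Z_2| \geq K)$ by inclusion-exclusion and Simpson's rule, and estimates the curvature by a central finite difference. It uses $\alpha = 0.95$ times the annealed capacity and two illustrative thresholds, $K = 0.6$ and $K = 1.2$, on either side of $K^*$. These parameter values are ours.

```python
import math

SQRT2 = math.sqrt(2.0)

def gauss_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def gauss_cdf(z):
    return 0.5 * math.erfc(-z / SQRT2)

def pair_outside(K, rho, n=2000):
    """P(|Z1| >= K, |Z2| >= K) at correlation |rho| < 1, by inclusion-exclusion
    from P(|Z1| <= K, |Z2| <= K), itself computed with Simpson's rule."""
    s, h = math.sqrt(1.0 - rho * rho), 2.0 * K / n
    inside = 0.0
    for j in range(n + 1):
        z = -K + j * h
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        inside += weight * gauss_pdf(z) * (gauss_cdf((K - rho * z) / s) - gauss_cdf((-K - rho * z) / s))
    inside *= h / 3.0
    return 1.0 - 2.0 * math.erf(K / SQRT2) + inside

def entropy2_ubp(beta, alpha, K):
    H = -beta * math.log(beta) - (1 - beta) * math.log(1 - beta)
    return math.log(2.0) + H + alpha * math.log(pair_outside(K, 2.0 * beta - 1.0))

def curvature_at_half(K, alpha, h=0.01):
    """Central second difference of the second moment entropy at beta = 1/2."""
    f = lambda b: entropy2_ubp(b, alpha, K)
    return (f(0.5 + h) - 2.0 * f(0.5) + f(0.5 - h)) / (h * h)

for K in (0.6, 1.2):
    alpha = 0.95 * (-math.log(2.0) / math.log(math.erfc(K / SQRT2)))
    print(f"K = {K}: alpha = {alpha:.3f}, curvature = {curvature_at_half(K, alpha):+.3f}")
```

A negative curvature means the symmetric point is a local maximum, as the second moment method requires. A positive one means it has become a local minimum.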
II.2.3 Illustration
As an illustration, fig. 1 shows the second moment entropy density for several constraint densities. For the rectangle function (a), the second moment method is tight: the maximum is reached at zero overlap for all $\alpha$ smaller than the first-moment capacity (dashed pink). Exactly the same happens for the $u$-function with $K < K^*$. However, for $K > K^*$ the second moment method fails (b): the zero-overlap point becomes a local minimum and the maximum is attained at non-trivial overlaps for constraint densities smaller than the first-moment capacity (dashed yellow).
III Frozen-1RSB structure of solutions in binary perceptrons
One of the most striking properties of the canonical step-function perceptron is the predicted frozen-1RSB 2 nature of the space of solutions. This means that the dominant (measure tending to one) part of the space of solutions splits into well-separated clusters, each of which has vanishing entropy density at any $\alpha > 0$. This frozen-1RSB scenario and quantitative properties of the solution space were recently studied in detail 16 ; huang2014origin . Following up on conjectures that such a frozen structure of solutions implies computational hardness in diluted constraint satisfaction problems zdeborova2008constraint , it was argued that finding a satisfying assignment in the binary perceptron should also be algorithmically hard, since its solution space is dominated by clusters of vanishing entropy density huang2014origin . Yet this conjecture contradicted the empirical results of braunstein2006learning . This paradox was resolved in baldassi2015subdominant , where the authors identified subdominant parts of the solution space (i.e. parts whose measure converges to zero as the system size diverges) that form extended clusters with large local entropy, and showed that all the algorithms that work well always find a solution belonging to one of those large-local-entropy clusters. These subdominant clusters are not frozen and, somewhat strangely, are not captured by the canonical 1RSB calculation baldassi2015subdominant . It was argued that the existence of these large-local-entropy clusters has more general consequences for the dynamics of learning algorithms in neural networks, see e.g. baldassi2016unreasonable .
While a frozen-1RSB structure has also been identified in constraint satisfaction problems on sparse graphs zdeborova2008locked ; zdeborova2011quiet , we note that its origin in the binary perceptron is rather different. In sparse systems, a simple argument using the expansion properties of the underlying graph and the properties of the constraints shows that each cluster with high probability contains only one solution. In the perceptron model, which has a fully connected bipartite interaction graph, this argument from sparse models does not apply.
In the present paper, we deduce from the second moment calculation of the previous section that the space of solutions in the symmetric binary perceptrons is also of the frozen-1RSB type, and that this property moreover extends to any finite temperature (with the energy defined as the number of unsatisfied constraints). This is different from the locked constraint satisfaction problems of zdeborova2008constraint ; zdeborova2011quiet living on diluted hypergraphs, where the solution clusters have extensive entropy at any non-zero temperature. Another difference is that, whereas in the locked constraint satisfaction problems the size of each cluster is one with high probability, in the binary perceptron there are still many solutions in each cluster; it is only their entropy density (i.e. the logarithm of their number per variable) that vanishes as $N \to \infty$.
Investigation of the large-local-entropy clusters and their implications for learning in the symmetric perceptrons is also of great interest, but is left for future work. Since the symmetric perceptrons are mathematically simpler than the step-function one, they should also provide a natural playground to deepen our understanding of the large-local-entropy clusters and their relation to learning and generalization.
We present the frozen-1RSB scenario as a conjecture and then below indicate how the second moment calculation gives evidence for this conjecture. Given an instance and a solution , let denote the set of solutions with Hamming distance at most from .
Conjecture 9.
For every and every there exists so that with high probability over the choice of the random instance from the RBP, the following property holds: for almost every solution ,
[TABLE]
as . The same holds for the UBP for all .
III.1 The link between the second-moment entropy and the size of clusters
In this section we use and note that the form of the second moment entropy density has very direct implications on the structure of solutions in the corresponding models. As we defined it above, the second moment entropy is the normalized logarithm of the expected number of pairs of solutions of overlap .
For problems such as the symmetric binary perceptrons where the quenched and annealed entropies are equal in leading order, there is a striking relation between the planted and the random ensemble of the model achlioptas2008algorithmic ; krzakala2009hiding . The random ensemble is the problem we have considered so far, while the planted ensemble is defined by starting with a configuration of the weights (a solution) and then including only constraints that are satisfied by this planted configuration. As long as the quenched and annealed entropies of the random ensemble are equal in leading order the planted and random ensembles should be contiguous, meaning that high-probability properties that hold in one ensemble also hold in the other. Moreover the planted configuration in the planted ensemble has all the properties of a configuration sampled uniformly at random in the random ensemble. These properties follow on the heuristic level from the cavity method reasoning krzakala2009hiding . They were established fully rigorously in a range of models, see e.g. achlioptas2008algorithmic ; mossel2015reconstruction ; coja2018information . In the present case of symmetric binary perceptrons we have not yet managed to prove contiguity between the random and the planted ensemble, and so we leave a rigorous mathematical result for future work. (In fact the missing ingredient is a version of Friedgut’s sharp threshold result friedgut1999sharp suitable for perceptrons; such a result combined with Theorem 4 would also prove Conjecture 2). We hence rely on the above heuristic argument and assume it holds in what follows.
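A planted RBP instance can be generated by rejection sampling. Because the patterns are independent, conditioning on the planted vector $w^*$ satisfying every constraint amounts to redrawing each pattern independently until it satisfies its own constraint. The Python sketch below is a hypothetical helper illustrating the definition of the planted ensemble; it is not taken from the paper.

```python
import math
import random

def planted_rbp_instance(N, M, K, seed=0):
    """Hypothetical helper: draw w* uniformly from {-1,+1}^N, then draw each of the M
    Gaussian patterns independently, redrawing it until w* satisfies its rectangle constraint."""
    rng = random.Random(seed)
    w_star = [rng.choice((-1, 1)) for _ in range(N)]
    patterns = []
    while len(patterns) < M:
        row = [rng.gauss(0.0, 1.0) for _ in range(N)]
        if abs(sum(x * s for x, s in zip(row, w_star))) / math.sqrt(N) <= K:
            patterns.append(row)
    return w_star, patterns

w_star, patterns = planted_rbp_instance(N=50, M=100, K=1.0)
print(len(patterns), "patterns, all satisfied by w*")
```

The acceptance rate per pattern is $p_K = \Pr(|Z| \leq K)$, so for $K = 1$ roughly two thirds of the proposals are kept.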
Given a planted solution and a configuration that agrees with on coordinates, the probability that is a solution in the planted model is , and thus the expected number of solutions at Hamming distance from the planted solution in the planted ensemble is
[TABLE]
and its entropy density is
[TABLE]
Recalling that contiguity implies that the planted solution has the properties of a uniformly chosen solution in the random ensemble, this entropy gives us direct access to properties of the solution space in the random ensemble at equilibrium. Most notably, we notice (see the derivation in section III.2 below) that the derivative of this entropy at distance zero from the planted solution diverges, thus implying that with high probability there are no solutions at small but non-zero distance from it. In turn, this means that the dominant (measure converging to one as $N \to \infty$) part of the solution space splits into clusters, each of which has vanishing entropy density (i.e. the logarithm of the number of solutions in the cluster divided by $N$ goes to zero as $N \to \infty$). The missing ingredient in a full proof of Conjecture 9 is a proof of the contiguity statement.
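The entropy of solutions near a planted solution can be evaluated directly. The sketch assumes that a configuration at Hamming distance $dN$ from $w^*$ satisfies each planted constraint independently with probability $\Pr(|Z_1| \leq K, |Z_2| \leq K)/p_K$, with correlation $\rho = 1 - 2d$. Its annealed entropy density is then $H(d) + \alpha \log[1 - \Pr(|Z_1| \leq K, |Z_2| > K)/p_K]$. The Python sketch below computes that "leak" probability directly to avoid cancellation. Helper names and the parameter choice $K = 1$, $\alpha = 1.5$ are illustrative. The entropy comes out negative for small $d$ but positive at larger distances, which is the forbidden interval of distances discussed in this section.

```python
import math

SQRT2 = math.sqrt(2.0)

def gauss_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def gauss_cdf(z):
    return 0.5 * math.erfc(-z / SQRT2)

def leak(K, rho, n=4000):
    """P(|Z1| <= K, |Z2| > K) for standard Gaussians with correlation rho in (-1, 1),
    by Simpson's rule over Z1 in [-K, K] (given Z1 = z, Z2 ~ N(rho z, 1 - rho^2))."""
    s = math.sqrt(1.0 - rho * rho)
    h = 2.0 * K / n
    total = 0.0
    for j in range(n + 1):
        z = -K + j * h
        stay = gauss_cdf((K - rho * z) / s) - gauss_cdf((-K - rho * z) / s)
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * gauss_pdf(z) * (1.0 - stay)
    return total * h / 3.0

def planted_entropy(d, alpha, K):
    """Annealed entropy density of solutions at Hamming distance d*N from the planted one."""
    p_K = math.erf(K / SQRT2)
    H = -d * math.log(d) - (1.0 - d) * math.log(1.0 - d)
    return H + alpha * math.log(1.0 - leak(K, 1.0 - 2.0 * d) / p_K)

for d in (1e-4, 1e-3, 1e-2, 0.1, 0.3):
    print(f"d = {d:g}: entropy density {planted_entropy(d, alpha=1.5, K=1.0):+.4f}")
```

Near $d = 0$ the constraint term decreases like $\sqrt{d}$ while $H(d)$ grows only like $d \log(1/d)$, which is why the entropy turns negative there.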
III.2 Form of the 2nd moment entropy implying frozen-1RSB
In fig. 2a we plot for the rectangle binary perceptron, at , . Thanks to the contiguity between the planted and random ensembles that holds as long as the second moment entropy density is twice the first moment entropy density, this curve represents also the annealed entropy of solutions at overlap with a random reference solution. We see notably that there is an interval of distances in which no solutions are present. Analytically we can see from the properties of the functions and that and the derivative of . This is in contrast with, for instance, the satisfiability problems studied in achlioptas2002asymptotic , where the function corresponding to would have a negative derivative in (see fig. 2b). There could still be an interval of forbidden distance, but the bump in entropy for corresponds to the size of the clusters to which typical solutions belong and those would be extensive.
III.2.1 Frozen 1RSB in rectangle binary perceptron
In the rectangle binary perceptron, the random and planted ensembles are conjectured to be contiguous for all $K$ and all $\alpha$ below the annealed capacity. Using eq. (8), the first derivative of eq. (11) is given by (see Appendix VI.5)
[TABLE]
and it diverges for all , in the limit :
[TABLE]
This implies vanishing entropy density of clusters to which typical solutions belong.
III.2.2 Frozen 1RSB in the $u$-function binary perceptron
In the $u$-function binary perceptron, the random and planted ensembles are conjectured to be contiguous for all $K < K^*$ and all $\alpha$ below the annealed capacity. Using eq. (8), the first derivative of eq. (11) is given by
[TABLE]
thus reaching the same conclusion on presence of frozen-1RSB.
In appendix VI.5 we extend the second moment calculation to finite temperature (for both the rectangle and the $u$-function case). We define the energy $E(w)$ of a configuration $w$ as the number of constraints violated by this configuration, and the corresponding partition function at temperature $T$ as $Z(T) = \sum_{w \in \{\pm 1\}^N} e^{-E(w)/T}$. There is a one-to-one mapping between the temperature $T$ and the energy density $e$; consequently, the corresponding finite-temperature second moment entropy density counts the number of pairs of configurations at a given overlap and energy density $e$. In appendix VI.5 we apply the same argument as here, connecting the random and planted ensembles, and deduce that the finite-temperature solution space of the models is also of the frozen-1RSB type at any finite temperature.
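The finite-temperature extension only changes what is counted: each configuration is weighted by $e^{-E(w)/T}$, where the energy $E(w)$ is the number of violated constraints. The following brute-force Python sketch evaluates this weighted sum for a tiny instance. Names are hypothetical and the RBP is chosen for concreteness. As $T \to 0$ the sum reduces to the number of solutions, and as $T \to \infty$ it tends to $2^N$.

```python
import itertools
import math
import random

def energy(X, w, K):
    """Number of rectangle constraints |<X_mu, w>| / sqrt(N) <= K violated by w."""
    bound = K * math.sqrt(len(w))
    return sum(abs(sum(x * s for x, s in zip(row, w))) > bound for row in X)

def partition_function(X, K, T):
    """Sum over w in {-1,+1}^N of exp(-E(w) / T), by exhaustive enumeration."""
    N = len(X[0])
    return sum(math.exp(-energy(X, w, K) / T) for w in itertools.product((-1, 1), repeat=N))

rng = random.Random(3)
N, M, K = 8, 6, 1.0
X = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
n_solutions = sum(energy(X, w, K) == 0 for w in itertools.product((-1, 1), repeat=N))
for T in (0.02, 0.5, 1e6):
    print(f"T = {T:g}: Z(T) = {partition_function(X, K, T):.4f}  (solutions: {n_solutions})")
```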
III.3 Frozen-1RSB as derived from the replica analysis
We stress that we derived the frozen-1RSB nature of the space of solutions without the use of replicas. For completeness we summarize here how this translates into properties of the one-step replica symmetry breaking solution. This is the way the phenomenon was originally discovered and described in 2 ; martin2004frozen ; 16 . Readers not familiar with the replica method should read this section after section IV.
In general, three kinds of fixed points of the 1RSB equations are possible:
- The replica symmetric (RS) solution, $q_0 = q_1$;
- The frozen-1RSB solution (f1RSB), $q_1 = 1$ and $q_0 = q_{\mathrm{RS}}$;
- The 1RSB solution with $q_0 < q_1 < 1$.
The frozen-1RSB solution is characterized by an inner-cluster overlap $q_1 = 1$ and an inter-cluster overlap $q_0 = q_{\mathrm{RS}}$, which means that clusters have vanishing entropy density and remain far from each other. Mathematically, the RS and f1RSB solutions are equivalent in the sense that they have the same free energy eq. (20), and the complexity of the f1RSB solution equals the RS entropy eq. (22, 15). However, RS and f1RSB do not describe the same configuration space. The RS phase is associated with a single cluster of solutions of extensive entropy, while the f1RSB configuration space is composed of many point-like clusters of vanishing entropy density that lie far from each other, see fig. 3. From this point of view f1RSB is the correct description of the phase space.
IV Replica calculation of the storage capacity
In this section we recall the replica calculation leading to the expression of the storage capacity in the step-function binary perceptron. We show that in the symmetric binary perceptrons the annealed calculation is reproduced by the replica symmetric result. For the $u$-function binary perceptron we show that $K^*$ coincides with the onset of replica symmetry breaking, and we evaluate the 1RSB capacity for $K > K^*$.
IV.1 Replica calculation
For the purpose of the calculations, we introduce the constraint function that returns $1$ if $w$ satisfies all the constraints and $0$ otherwise:
[TABLE]
Recall the partition function is the number of satisfying vectors , with prior distribution , for a given matrix
[TABLE]
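The replica method allows one to compute explicitly the quenched average $\mathbb{E}_X \log Z(X)$ 3 . More precisely, using the replica trick, the average of the logarithm can be expressed as the limit $n \to 0$ of the derivative with respect to $n$ of the average of the $n$-th moment of the partition function. The free entropy then reads:
$$\Phi(\alpha) = \lim_{N \to \infty} \frac{1}{N}\, \mathbb{E}_X \log Z(X) = \lim_{N \to \infty} \lim_{n \to 0} \frac{1}{N} \frac{\partial\, \mathbb{E}_X Z(X)^n}{\partial n} . \qquad (13)$$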
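The replica identity $\mathbb{E}\log Z = \lim_{n \to 0} \frac{1}{n}\log \mathbb{E}Z^n$ can be seen on a toy example where all moments are known in closed form. The sketch below uses a log-normal $Z = e^{G}$ with $G \sim \mathcal{N}(\mu, \sigma^2)$, a choice made purely for illustration. Then $\log \mathbb{E} Z^n = n\mu + n^2\sigma^2/2$, so $\frac{1}{n}\log\mathbb{E}Z^n \to \mu = \mathbb{E}\log Z$ as $n \to 0$, while $n = 1$ gives the annealed value $\mu + \sigma^2/2$.

```python
import math

def log_moment(n, mu, sigma):
    """log E[Z^n] for Z = exp(G), G ~ N(mu, sigma^2)."""
    return n * mu + 0.5 * n * n * sigma * sigma

mu, sigma = 0.3, 1.2  # illustrative parameters; the quenched value is E[log Z] = mu
for n in (1.0, 0.1, 0.01, 0.001):
    via_log = log_moment(n, mu, sigma) / n
    via_derivative = math.expm1(log_moment(n, mu, sigma)) / n  # (E[Z^n] - 1) / n
    print(f"n = {n:<6} (1/n) log E[Z^n] = {via_log:.5f}   (E[Z^n] - 1)/n = {via_derivative:.5f}")
```

Using `math.expm1` keeps $(\mathbb{E}Z^n - 1)/n$ accurate when $\mathbb{E}Z^n$ is very close to one.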
Computing the $n$-th moment of the partition function, $\mathbb{E}_X Z(X)^n$ for $n \in \mathbb{N}$, is equivalent to considering $n$ copies of the initial system, also called replicas. For a given disorder these replicas are non-interacting and can be treated independently. However, averaging over the "disorder" $X$ makes the replicas interact: the replicated weight vectors $w^a$ and $w^b$, for $a, b = 1, \dots, n$, are correlated through the overlap matrix $Q_{ab} = \frac{1}{N} w^a \cdot w^b$.
We start by averaging over the distribution of $X$, then use an analytic continuation to $n \in \mathbb{R}$ and reverse the limits $n \to 0$ and $N \to \infty$. This exchange of limits is a key and classical ingredient of replica calculations, and it renders the replica method heuristic and not rigorously justified. Using this latter point, we show in Appendix VI.1 that the free entropy eq. (13) can finally be expressed as a saddle point equation over symmetric matrices $Q$ and $\hat Q$
[TABLE]
where is a parameter involved in the change of variable between and and with
[TABLE]
In order to compute the derivative with respect to $n$ in eq. (14), we need an analytic expression of $Q$ and $\hat Q$ as functions of $n$.
IV.2 RS entropy
The simplest ansatz is to assume that the overlap matrix is replica symmetric (RS), which means that all replicas play the same role: the correlation between two arbitrary, but different, replicas is denoted $q$, and therefore the RS ansatz reads:
[TABLE]
It forces the matrix $\hat Q$ to have the same symmetry, with its own diagonal and off-diagonal parameters. Using this ansatz and the limit $n \to 0$, the replica symmetric (RS) entropy can be expressed as a set of saddle point equations over scalar parameters $q$ and $\hat q$, evaluated at the saddle point (Appendix VI.2):
[TABLE]
[TABLE]
Note that above and in what follows . In the binary perceptron case, the function is defined as (note that this is not a probability distribution because of the normalization), and recall is the indicator function, checking that a constraint on the argument is satisfied (e.g in the step case, if ).
While in the step binary perceptron (SBP) the fixed point solution is non-trivial, the symmetry of the activation function in the RBP and UBP cases makes the configuration space symmetric and guarantees the existence of the symmetric fixed point $q = 0$. If this symmetric fixed point is stable and has the lowest free energy, the RS free entropy matches the annealed entropy of section II.1.
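Why symmetry pins the fixed point at zero overlap can be seen from the first moment of the constraint. The sketch assumes the standard form of the RS equations for Gaussian patterns, which is not reproduced in this section. At $q = 0$ the conjugate parameter $\hat q$ is proportional to $\alpha g^2$ with $g = \mathbb{E}[\varphi(Z)Z]/\mathbb{E}[\varphi(Z)]$ and $Z \sim \mathcal{N}(0,1)$, and $q$ in turn equals $\int Dz \tanh^2(\sqrt{\hat q}z)$. For an interval, $\mathbb{E}[Z\,\mathbb{1}(a \leq Z \leq b)]$ equals the standard Gaussian density at $a$ minus its value at $b$. The sketch below uses this to show that $g$ vanishes for the rectangle and $u$-function but not for the step function.

```python
import math

def pdf(z):
    """Standard Gaussian density; evaluates to 0.0 at +/- infinity."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def drift(intervals):
    """E[Z 1(Z in A)] / P(Z in A) for Z ~ N(0, 1) and A a union of disjoint intervals [a, b]."""
    mass = sum(cdf(b) - cdf(a) for a, b in intervals)
    first = sum(pdf(a) - pdf(b) for a, b in intervals)
    return first / mass

inf, K, kappa = math.inf, 1.0, 0.0
print("step       :", drift([(kappa, inf)]))
print("rectangle  :", drift([(-K, K)]))
print("u-function :", drift([(-inf, -K), (K, inf)]))
```

For the step constraint with $\kappa = 0$ the drift is $\sqrt{2/\pi}$, so $\hat q > 0$ and the RS fixed point moves away from $q = 0$.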
IV.2.1 Rectangle
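Solving the corresponding saddle point equations numerically leads to the single symmetric fixed point $q = 0$. Hence the RS entropy saturates the first moment bound:
$$s_{\mathrm{RS}}^{\mathrm{RBP}}(\alpha) = \log 2 + \alpha \log p_K ,$$
and the RS capacity equals the annealed capacity eq. (II.1):
$$\alpha_{\mathrm{RS}}^{\mathrm{RBP}}(K) = -\frac{\log 2}{\log p_K} .$$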
IV.2.2 $u$-function
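- For $K < K^*$, only the symmetric fixed point $q = 0$ exists, which leads again to the annealed free entropy:
$$s_{\mathrm{RS}}^{\mathrm{UBP}}(\alpha) = \log 2 + \alpha \log q_K ,$$
and annealed capacity eq. (II.1):
$$\alpha_{\mathrm{RS}}^{\mathrm{UBP}}(K) = -\frac{\log 2}{\log q_K} .$$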
- For $K > K^*$, the RS entropy does not match the annealed entropy, because a fixed point with $q \neq 0$ has a lower free energy than the symmetric fixed point $q = 0$. The symmetric fixed point becomes unstable for $K > K^*$, where $K^*$ is, remarkably, given by the same value as in the independent section II.2.2. Hence it naturally satisfies eq. (5), even though here its definition derives from the stability of the RS solution, which we study in the next section.
IV.3 Stability
The local stability of the RS solution can be studied using de Almeida and Thouless (AT) method 22 , based on the positivity of the Hessian of . The replica symmetric AT-line is given by the solution of the following implicit equation (Appendix VI.4):
[TABLE]
As illustrated above, for the rectangle and the u-function, the symmetry of the weights and of the constraint imposes the existence of the symmetric fixed point $q=\hat q=0$. This simplifies the previous condition, which becomes equivalent to the linear stability condition of the symmetric fixed point (see Appendix VI.4):
[TABLE]
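For symmetric constraints this linear stability condition can be made explicit. The sketch below rests on an assumption from our own linearization of the standard RS equations around $q=\hat q=0$, not a transcription of the paper's formula: the symmetric point is stable iff $\alpha < \alpha_{\rm AT}(K) = (p/c_2)^2$, with $p=\mathbb{E}[\varphi(Z)]$ and $c_2=\mathbb{E}[\varphi(Z)(Z^2-1)]$. For both the rectangle and the u-function, $|c_2| = 2K\gamma(K)$, with $\gamma$ the standard Gaussian density. Function names are ours. The sketch compares this line with the first-moment threshold $-\log 2/\log p$.

```python
import math


def gauss_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def constraint_probability(K: float, constraint: str) -> float:
    """p = P(constraint holds) for a standard Gaussian pre-activation."""
    if constraint == "rectangle":
        return math.erf(K / math.sqrt(2.0))   # P(|Z| <= K)
    if constraint == "u":
        return math.erfc(K / math.sqrt(2.0))  # P(|Z| >= K)
    raise ValueError(f"unknown constraint {constraint!r}")


def alpha_at_symmetric(K: float, constraint: str) -> float:
    """Assumed AT line at q = q_hat = 0: (p / c2)^2 with |c2| = 2 K gamma(K)."""
    p = constraint_probability(K, constraint)
    c2 = 2.0 * K * gauss_pdf(K)
    return (p / c2) ** 2


def alpha_annealed(K: float, constraint: str) -> float:
    """First-moment threshold: log 2 + alpha log p = 0."""
    return -math.log(2.0) / math.log(constraint_probability(K, constraint))


if __name__ == "__main__":
    for K in (0.5, 1.0, 1.5, 2.0):
        print(f"rectangle K={K}: annealed {alpha_annealed(K, 'rectangle'):.3f}, "
              f"AT {alpha_at_symmetric(K, 'rectangle'):.3f}")
```

On a grid of widths, the rectangle's annealed capacity stays below this line, in line with the statement of IV.3.2. For the u-function, the ordering flips between $K=0.5$ and $K=1$.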
We plotted the annealed capacity, the replica symmetric capacity and the AT-line for the step, rectangle and u-function binary perceptrons as functions of the constraint parameter in figs. 4, 5 and 6.
IV.3.1 Step binary perceptron
We note that for the step binary perceptron the RS solution is always stable towards 1RSB, even for a negative threshold. This is interesting in view of recent work on the spherical perceptron with a negative threshold, where replica symmetry breaks for all negative thresholds and full-step RSB is needed to evaluate the storage capacity 20.
IV.3.2 Rectangle
As the RS capacity always lies below the AT-line, the RS solution is always locally stable.
IV.3.3 u-function
The RS capacity and the AT-line cross. The crossing implicitly defines the value $K^*$ and corresponds to the equality in eq. (10):
[TABLE]
For $K < K^*$ the RS solution is locally stable, while for $K > K^*$ it becomes unstable and a replica symmetry breaking solution appears.
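The crossing can be located numerically. The sketch below makes two assumptions:

- at the symmetric fixed point, the AT-line of the u-function takes the form $\alpha_{\rm AT}(K) = \big(p/(2K\gamma(K))\big)^2$ with $p = P(|Z|\ge K)$ (our own linearization of the standard RS equations, not the paper's eq. (10));
- the annealed capacity is $-\log 2/\log p$.

It bisects for the width at which the two coincide. Under these assumptions the crossing falls between 0.81 and 0.82. The names `gap` and `find_k_star` are ours.

```python
import math


def gauss_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def gap(K: float) -> float:
    """alpha_annealed(K) - alpha_AT(K) for the u-function (sketch)."""
    p = math.erfc(K / math.sqrt(2.0))          # P(|Z| >= K)
    alpha_ann = -math.log(2.0) / math.log(p)   # first-moment threshold
    alpha_at = (p / (2.0 * K * gauss_pdf(K))) ** 2
    return alpha_ann - alpha_at


def find_k_star(lo: float = 0.5, hi: float = 1.0, tol: float = 1e-12) -> float:
    """Bisection for gap(K) = 0, assuming gap(lo) < 0 < gap(hi)."""
    if not gap(lo) < 0.0 < gap(hi):
        raise ValueError("bracket does not contain a sign change")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(f"crossing width ~ {find_k_star():.4f}")
```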
IV.4 1RSB calculation
In the previous section we concluded that the replica symmetric solution is unstable in the u-function binary perceptron for $K > K^*$. In this section we therefore analyze the one-step replica symmetry breaking (1RSB) ansatz. This ansatz and the corresponding calculations are due to the seminal works of G. Parisi; they are classic in the field of disordered systems and well presented in the literature 13; parisi1979infinite; parisi1980sequence; parisi1980order. We thus mainly give the key formulas and defer the details to Appendix VI.3.
The 1RSB ansatz assumes that the space of configurations splits into many states. Consequently, replicas are no longer equivalent. Instead, they are organized in groups, each containing the same number of replicas:
[TABLE]
Following 25, the partition function of replicas falling in the same state is expressed as a sum over all possible states, each weighted by its free entropy:
[TABLE]
where we introduced the number of states with a given free entropy, and the complexity (also called the configurational entropy), defined as the logarithm of this number per variable.
Using the saddle point method in the large-size limit, the 1RSB replicated free entropy can be written as a function of the Parisi parameter, the internal free entropy of the states and the complexity:
[TABLE]
Inserting the 1RSB ansatz eq. (18) into the replica derivation eq. (14), the 1RSB replicated free entropy becomes a saddle point problem over the overlaps and their conjugate parameters (see Appendix VI.3):
[TABLE]
[TABLE]
[TABLE]
Taking the derivative of the replicated free entropy with respect to the Parisi parameter, the internal free entropy and the complexity can be written as:
[TABLE]
[TABLE]
IV.5 1RSB results for UBP
From now on, we only consider the u-function binary perceptron, whose RS solution is unstable for $K > K^*$. To describe the equilibrium of the system in the SAT phase, we need to find the equilibrium value of the Parisi parameter. The complexity is the entropy of clusters with a given internal entropy. To capture the clusters that carry almost all configurations, we maximize the total entropy under the constraints that the internal free entropy and the complexity are both positive and that the Parisi parameter lies between 0 and 1. Hence, from eq. (19), the equilibrium Parisi parameter satisfies
[TABLE]
Using the expressions eq. (22) and varying the Parisi parameter, we obtain the complexity curve shown in fig. 7. When the Parisi parameter equals 1, the complexity is negative. As the Parisi parameter decreases, the complexity increases and becomes positive at its equilibrium value. Besides, for small values of the Parisi parameter an unphysical (convex) branch appears, as commonly observed in other systems solved by the replica method.
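The way the internal entropy and the complexity are extracted from the replicated free entropy, and how the equilibrium Parisi parameter is then selected, can be sketched in a few lines of Python. We assume the standard Legendre relations $s = \partial_m\Phi$ and $\Sigma = \Phi - m\,\partial_m\Phi$, with $m$ the Parisi parameter and $\Phi(m)$ the replicated free entropy. We take these to be the content of eq. (22) up to notation. The sketch selects the largest $m\in[0,1]$ with non-negative complexity. `phi_toy` is a hypothetical quadratic stand-in, not the UBP result.

```python
def phi_toy(m: float) -> float:
    """Hypothetical replicated free entropy Phi(m): a toy quadratic, NOT the UBP result."""
    return 0.1 + 0.2 * m + 0.4 * m * m


def internal_entropy(Phi, m: float, h: float = 1e-5) -> float:
    """s(m) = dPhi/dm, by a central finite difference."""
    return (Phi(m + h) - Phi(m - h)) / (2.0 * h)


def complexity(Phi, m: float, h: float = 1e-5) -> float:
    """Sigma(m) = Phi(m) - m * dPhi/dm (Legendre transform)."""
    return Phi(m) - m * internal_entropy(Phi, m, h)


def equilibrium_parisi(Phi, tol: float = 1e-10) -> float:
    """Largest m in [0, 1] with non-negative complexity.

    Returns 1 if Sigma(1) >= 0; otherwise bisects Sigma(m) = 0 on [0, 1],
    assuming Sigma(0) > 0 and a single sign change (Phi convex).
    """
    if complexity(Phi, 1.0) >= 0.0:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if complexity(Phi, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    m_star = equilibrium_parisi(phi_toy)
    print(m_star, internal_entropy(phi_toy, m_star), complexity(phi_toy, m_star))
```

For the toy function, $\Sigma(m) = 0.1 - 0.4m^2$ is negative at $m=1$ and vanishes at $m=0.5$, mirroring the qualitative behaviour described above.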
We note that as $\alpha$ increases, both the equilibrium complexity and the free entropy decrease. In constraint satisfaction problems such as K-satisfiability or random graph coloring, the satisfiability threshold appears when the maximum of the complexity becomes negative. In the present UBP problem, it is instead the free entropy and the complexity that vanish together, as illustrated in fig. 7.
Once the equilibrium Parisi parameter is computed, we have access to the corresponding equilibrium inner and outer overlaps, which we compare with the RS overlap in fig. 8. The equilibrium Parisi parameter, as a function of $\alpha$, shows a non-monotonic behaviour, as previously observed, e.g., in the Sherrington-Kirkpatrick model as a function of temperature Mezard1987.
We also compute the 1RSB entropy, which is smaller than or equal to the RS entropy and vanishes at the 1RSB capacity, as depicted in fig. 9a. This inequality is as predicted by Parisi's replica theory Mezard1987, taking into account that we work at strictly zero energy, where the entropy equals minus the free energy.
The 1RSB solution provides a small correction to the RS storage capacity, as illustrated in fig. 9b, where we plot the difference between the annealed upper bound and, respectively, the RS and 1RSB capacities.
IV.6 1RSB Stability
In the previous section we evaluated the 1RSB storage capacity of the u-function binary perceptron for $K > K^*$. In this section we argue that this cannot be the exact solution of the problem.
We could investigate the stability of the 1RSB solution towards further levels of replica symmetry breaking, along the same lines as for the RS solution. In the present case, however, this is not needed to see that the obtained solution cannot be correct. The explanation lies in the breaking of the up-down symmetry of the problem. Either this symmetry is broken explicitly, as in a ferromagnet where the system acquires an overall magnetization (we have not observed any trace of this in the present problem), or it must be conserved in the final, correct solution. In the replica symmetric phase, the conservation of the up-down symmetry is manifested by a vanishing overlap $q = 0$. In the 1RSB solution evaluated above, the outer overlap does not vanish but is strictly positive; this is a sign that we are evaluating a wrong solution. The only way we foresee to obtain the exact solution is to evaluate full-step replica symmetry breaking, with a continuum of overlaps, the smallest of which should be $0$ in order to restore the up-down symmetry. We leave the evaluation of the full-RSB solution for future work.
Finally let us note that the 1RSB solution obtained in the previous section can be interpreted as frozen-2RSB. In 2RSB we would have 3 kinds of overlaps, , and . In frozen 2RSB we would have , , .
V Conclusion
The step-function binary perceptron still lacks a rigorous proof of its conjectured storage capacity, eq. (2). This prediction is expected to be exact because of the frozen-1RSB nature of the problem 2; 16. At the same time, the work of baldassi2015subdominant shows that the structure of the space of solutions is not fully described by the frozen-1RSB picture: rare, dense and unfrozen regions exist and are in fact accessible to dynamical procedures searching for solutions. It remains to be understood how the 1RSB calculation can fail to capture these dense unfrozen regions of solutions baldassi2015subdominant. They do not dominate the equilibrium, but the RSB calculation is expected to describe rare events via their large deviations, which in this case it does not.
In this paper we focus on two cases of the binary perceptron with symmetric constraints, the rectangle binary perceptron and the u-function binary perceptron. Using the second moment method, we prove (up to a numerical assumption) that in those cases the storage capacity agrees with the annealed upper bound, except for the u-function binary perceptron for $K > K^*$, eq. (5). We analyze the 1RSB solution in that case and indeed obtain a lower prediction for the storage capacity. However, we do not expect the 1RSB solution to be exact, because it does not respect the up-down symmetry of the problem. Though the precise nature of the satisfiable phase of the u-function binary perceptron for $K > K^*$ remains elusive, we conjecture that it is full-RSB parisi1979infinite; parisi1980sequence; parisi1980order. Establishing this rigorously would provide a much deeper understanding and remains a challenging subject for future work.
Acknowledgement
We thank Florent Krzakala, Joe Neeman, and Pierfrancesco Urbani for useful discussions. We acknowledge funding from the ERC under the European Union's Horizon 2020 Research and Innovation Programme Grant Agreement 714608-SMiLe. WP was supported in part by EPSRC grant EP/P009913/1.
VI Appendices
VI.1 General replica calculation
We present here the replica computation for a general prior distribution and a general constraint function. To compute the quenched average of the free entropy, we consider the partition function of several identical copies (replicas) of the initial system. Using the replica trick and an analytic continuation in the number of replicas, the averaged free entropy of the initial system reads:
[TABLE]
where the replicated partition function can be written as
[TABLE]
where the global constraint function enforces all the constraints simultaneously.
We suppose that the inputs are drawn i.i.d. from a distribution with zero mean and unit variance. Each pre-activation is then a sum of many independent random variables, and the central limit theorem ensures that it is asymptotically Gaussian, with first two moments:
[TABLE]
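The central-limit statement can be checked by simulation. This sketch makes two hypothetical choices, i.i.d. Rademacher ($\pm1$) inputs and pre-activations normalized by $1/\sqrt{N}$; any zero-mean, unit-variance input law gives the same first two moments. For two fixed binary configurations, the empirical second moment of each pre-activation should approach 1, and their cross moment should approach the overlap. Function names are ours.

```python
import math
import random


def local_field(x, sigma):
    """Normalized pre-activation sum_i x_i sigma_i / sqrt(N)."""
    return sum(xi * si for xi, si in zip(x, sigma)) / math.sqrt(len(sigma))


def empirical_field_moments(n=100, flips=25, samples=4000, seed=0):
    """Monte Carlo estimates of E[h_a^2] and E[h_a h_b] for two fixed binary vectors.

    Inputs are i.i.d. Rademacher (+1/-1), a hypothetical choice.
    """
    rng = random.Random(seed)
    sigma_a = [rng.choice((-1, 1)) for _ in range(n)]
    sigma_b = list(sigma_a)
    for i in range(flips):  # flipping `flips` coordinates gives overlap 1 - 2 * flips / n
        sigma_b[i] = -sigma_b[i]
    overlap = sum(a * b for a, b in zip(sigma_a, sigma_b)) / n
    sum_aa = sum_ab = 0.0
    for _ in range(samples):
        x = [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]
        h_a, h_b = local_field(x, sigma_a), local_field(x, sigma_b)
        sum_aa += h_a * h_a
        sum_ab += h_a * h_b
    return overlap, sum_aa / samples, sum_ab / samples


if __name__ == "__main__":
    q, var_a, cov_ab = empirical_field_moments()
    print(f"overlap {q:.3f}, E[h_a^2] ~ {var_a:.3f}, E[h_a h_b] ~ {cov_ab:.3f}")
```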
In the following we introduce the symmetric overlap matrix between replicas. The vector of replica pre-activations then follows a multivariate Gaussian distribution with zero mean and covariance given by the overlap matrix. We introduce this change of variables and the Fourier representation of the Dirac $\delta$-function, which involves a new conjugate parameter:
[TABLE]
the replicated partition function becomes an integral over the overlap matrix and its conjugate, which can be evaluated with the Laplace method in the large-size limit,
[TABLE]
where SP stands for saddle point and we defined
[TABLE]
Finally, using eq. (23) and exchanging the large-size and zero-replica limits, the quenched free entropy reduces to a saddle point equation
[TABLE]
over general symmetric matrices (the overlap matrix and its conjugate). In the following we assume simple ansätze for these matrices, which yield expressions that are analytic in the number of replicas, so that the derivative can be taken.
VI.2 RS entropy
Let us compute the functional appearing in the free entropy eq. (29) within the simplest ansatz, the Replica Symmetric ansatz. The latter assumes that all replicas remain equivalent, with a common overlap $q$ between distinct replicas and a common norm on the diagonal. This leads to the following expressions for the overlap matrix and its conjugate:
[TABLE]
We compute separately the three terms of the functional eq. (28): a trace term, a prior term and a term depending on the constraint.
Trace term
The trace term can be easily computed and takes the following form:
[TABLE]
Prior integral
Evaluated at the RS fixed point, and using a Gaussian identity known as the Hubbard-Stratonovich transformation, the prior integral simplifies to
[TABLE]
Constraint integral
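Recall that the vector of replica pre-activations follows a Gaussian distribution with zero mean and covariance given by the overlap matrix. In the RS ansatz, this covariance is a linear combination of the identity and the all-ones matrix, with coefficients equal to the norm minus the overlap and to the overlap, respectively. Each component can therefore be split into a Gaussian part shared by all replicas and an independent Gaussian part specific to each replica. Finally, the constraint integral reads: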
[TABLE]
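For the rectangle, this split can be checked by Monte Carlo. The sketch below assumes unit norm (so each field is $\sqrt q\,z + \sqrt{1-q}\,w_a$) and uses our own function names. The test case takes three replicas with overlap $q=0.4$ and $K=1$. The sketch computes $\int\mathcal{D}z\,\big[\int\mathcal{D}w\,\varphi(\sqrt q z + \sqrt{1-q}w)\big]^3$ by quadrature and compares it with the fraction of sampled correlated triples in which all three fields satisfy the constraint.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def gauss_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def gauss_cdf(x):
    return 0.5 * math.erfc(-x / SQRT2)


def gaussian_expectation(f, half_width=10.0, n=2000):
    """E[f(z)] for z ~ N(0, 1), by composite Simpson's rule (n even)."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        z = -half_width + i * h
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * f(z) * gauss_pdf(z)
    return total * h / 3.0


def rs_constraint_integral(q, K, replicas):
    """E_z[P(z)^replicas] with P(z) = P(|sqrt(q) z + sqrt(1-q) w| <= K), w ~ N(0, 1)."""
    s = math.sqrt(1.0 - q)

    def p_given_z(z):
        c = math.sqrt(q) * z
        return gauss_cdf((K - c) / s) - gauss_cdf((-K - c) / s)

    return gaussian_expectation(lambda z: p_given_z(z) ** replicas)


def monte_carlo(q, K, replicas, samples=100_000, seed=1):
    """Fraction of samples in which all RS-correlated replica fields satisfy |h_a| <= K."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        common = math.sqrt(q) * rng.gauss(0.0, 1.0)
        if all(abs(common + math.sqrt(1.0 - q) * rng.gauss(0.0, 1.0)) <= K
               for _ in range(replicas)):
            hits += 1
    return hits / samples


if __name__ == "__main__":
    print(rs_constraint_integral(0.4, 1.0, 3), monte_carlo(0.4, 1.0, 3))
```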
Summary and RS free entropy
Putting the pieces together, the functional evaluated at the RS fixed point has an explicit dependence on the number of replicas:
[TABLE]
Finally, taking the derivative with respect to the number of replicas and the zero-replica limit, the RS free entropy has the simple expression
[TABLE]
with the following notations:
[TABLE]
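For $\pm1$ weights with unit norm, a commonly used form of the RS free entropy and the saddle point equations it yields can be written as follows. This is a sketch in our own notation ($Z$, $g$, $\omega$, $V$ are ours), and signs or normalizations may differ from the expression above. The $\hat q$ equation follows from Stein's lemma. The $q$ equation combines Stein's lemma with the heat-equation identity $\partial_V Z = \tfrac12\partial_\omega^2 Z$.

```latex
\begin{align*}
\Phi_{\rm RS}(q,\hat q) &= -\frac{\hat q\,(1-q)}{2}
  + \int\!\mathcal{D}z\,\log\!\big(2\cosh\sqrt{\hat q}\,z\big)
  + \alpha\int\!\mathcal{D}z\,\log Z\big(\sqrt{q}\,z,\,1-q\big),\\
Z(\omega,V) &= \int\!\mathcal{D}w\;\varphi\big(\omega+\sqrt{V}\,w\big),\qquad
  g(\omega,V) = \partial_\omega\log Z(\omega,V),\\
\partial_{\hat q}\Phi_{\rm RS} = 0 &\;\Longrightarrow\;
  q = \int\!\mathcal{D}z\,\tanh^{2}\!\big(\sqrt{\hat q}\,z\big),\\
\partial_{q}\Phi_{\rm RS} = 0 &\;\Longrightarrow\;
  \hat q = \alpha\int\!\mathcal{D}z\;g\big(\sqrt{q}\,z,\,1-q\big)^{2},\\
\Phi_{\rm RS}(0,0) &= \log 2 + \alpha\log\int\!\mathcal{D}w\;\varphi(w).
\end{align*}
```

For an even constraint function, $g(0,1)=0$, so $q=\hat q=0$ always solves these equations. At that point the free entropy is the annealed value $\log 2 + \alpha\log p$.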
VI.3 1RSB entropy
The free entropy eq. (23) can also be evaluated within the simplest non-trivial ansatz beyond RS: the one-step Replica Symmetry Breaking (1RSB) ansatz. Instead of assuming that all replicas are equivalent, it assumes that the symmetry between replicas is broken and that replicas are clustered into different states. Replicas within the same state share an inner overlap, and replicas in different states share an outer overlap. In matrix form, the overlap matrix and its conjugate read
[TABLE]
Trace term
Again, the trace term can be easily computed
[TABLE]
Prior integral
Separating replicas with different overlaps, the prior integral can be written as
[TABLE]
Constraint integral
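Again, the vector of replica pre-activations is Gaussian with zero mean and covariance given by the 1RSB overlap matrix. It can be decomposed as a sum of independent standard Gaussian variables: one shared by all replicas, one shared within each state, and one specific to each replica. Their variances are, respectively, the outer overlap, the inner minus the outer overlap, and the norm minus the inner overlap. Finally, the constraint integral reads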
[TABLE]
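This decomposition can be checked mechanically. Each replica field is a linear combination of independent standard Gaussians, so its covariance is the matrix of inner products of the coefficient vectors. The sketch assumes unit norm ($\pm1$ weights), an outer overlap `q0`, an inner overlap `q1` and states of `block` replicas each; these names are ours.

```python
import math


def one_rsb_matrix(n_rep, block, q0, q1):
    """1RSB overlap matrix: 1 on the diagonal, q1 within blocks of size `block`, q0 across."""
    return [[1.0 if a == b else (q1 if a // block == b // block else q0)
             for b in range(n_rep)] for a in range(n_rep)]


def decomposition_coefficients(n_rep, block, q0, q1):
    """Coefficients of h_a = sqrt(q0) z + sqrt(q1 - q0) u_state(a) + sqrt(1 - q1) w_a.

    Column 0 is the variable shared by all replicas, the next n_rep // block
    columns hold one variable per state, and the last n_rep columns one per replica.
    """
    if n_rep % block:
        raise ValueError("block size must divide the number of replicas")
    n_states = n_rep // block
    rows = []
    for a in range(n_rep):
        row = [0.0] * (1 + n_states + n_rep)
        row[0] = math.sqrt(q0)
        row[1 + a // block] = math.sqrt(q1 - q0)
        row[1 + n_states + a] = math.sqrt(1.0 - q1)
        rows.append(row)
    return rows


def covariance(rows):
    """Covariance of the linear combinations: inner products of coefficient rows."""
    return [[sum(x * y for x, y in zip(r, s)) for s in rows] for r in rows]


if __name__ == "__main__":
    C = covariance(decomposition_coefficients(6, 3, 0.2, 0.7))
    for line in C:
        print(" ".join(f"{c:.2f}" for c in line))
```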
Summary and 1RSB free entropy
Gathering the previous computations eq. (42, 44, 46), the functional evaluated at the 1RSB fixed point reads:
[TABLE]
Let us introduce the replicated free entropy following [48]. We consider several real replicas of the same system and add a small coupling field that makes them fall into the same state. The replicated free entropy is the free entropy of these copies in the limit of vanishing coupling. To compute it, we replicate these coupled copies once more. With the corresponding notation, the replicated free entropy is then expressed in terms of the free entropy of replicas with the 1RSB structure:
[TABLE]
with the quantities defined in eq. (21) and
[TABLE]
VI.4 RS Stability
VI.4.1 De Almeida Thouless RS Stability
The stability of a given saddle point ansatz is related to the positivity of the Hessian of the functional. This stability analysis was first performed by de Almeida and Thouless. Following [46, 1, 5], the replicon eigenvalues of the RS ansatz can be expressed as functions of the quantities defined in eq. (16):
[TABLE]
The AT instability line is defined by the vanishing of the determinant of the Hessian. This translates into an implicit equation in the control parameters, where the overlaps are solutions of the saddle point equations eq. (15) at the same parameters:
[TABLE]
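However, when the symmetric fixed point $q=\hat q=0$ is the only solution, this expression simplifies (using the functions defined in eq. (58)). The simplification comes from the symmetry of the prior distribution and of the constraints in the rectangle and u-function cases. In fact, the symmetry forces several of these terms to vanish at the symmetric point, and the condition reads: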
[TABLE]
VI.4.2 Existence and stability of the RS fixed point
We provide an alternative derivation of the instability condition of the RS solution for a symmetric prior and constraint. In this symmetric case, the stability follows from the existence and stability of the symmetric fixed point $q=\hat q=0$. Let us define
[TABLE]
In fact, the RS saddle point equations eq. (15) can be written in terms of these functions and reduced to a single fixed point equation for the overlap $q$:
[TABLE]
As stressed above, the RS stability is equivalent to the existence and stability of the fixed point $q=0$. We therefore compute the stability of the fixed point equation eq. (59). Taking its derivative in the limit $q\to 0$, expanding the functions defined above at small $q$, and finally using the symmetry of the prior and of the constraint, we obtain:
[TABLE]
Finally, the existence and stability conditions of the fixed point translate into an explicit condition on $\alpha$, which defines the AT-line:
[TABLE]
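The linearization can be spelled out in a few lines. The sketch below uses the standard RS saddle point equations for $\pm1$ weights in our own notation, which may differ from the presentation above:

- $g$ is the derivative of the log-partition function of a single constraint;
- $\lambda$ is its slope at the origin;
- $t$ is an iteration index, $W$ a standard Gaussian, and $\varphi$ the even constraint function.

```latex
\begin{align*}
q^{t+1} &= \mathbb{E}_z\tanh^{2}\!\big(\sqrt{\hat q^{t}}\,z\big)
  = \hat q^{t} + O\big((\hat q^{t})^{2}\big),\\
\hat q^{t} &= \alpha\,\mathbb{E}_z\!\left[g\big(\sqrt{q^{t}}\,z,\,1-q^{t}\big)^{2}\right],
  \qquad g(\omega,V) = \partial_\omega\log\mathbb{E}_w\,\varphi\big(\omega+\sqrt{V}\,w\big),\\
g(\omega,1) &= \lambda\,\omega + O(\omega^{3}),\qquad
  \lambda = \frac{\mathbb{E}\left[\varphi(W)\,(W^{2}-1)\right]}{\mathbb{E}\left[\varphi(W)\right]}
  \quad(\varphi\ \text{even}),\\
q^{t+1} &= \alpha\,\lambda^{2}\,q^{t} + O\big((q^{t})^{2}\big)
  \;\Longrightarrow\; \text{$q=0$ is linearly stable iff } \alpha < \lambda^{-2}.
\end{align*}
```

For the rectangle and u-function constraints, $\lambda = \mp 2K\gamma(K)/p$, with $\gamma$ the standard Gaussian density and $p$ the probability that a single constraint holds. The threshold is therefore $\alpha < p^2/(2K\gamma(K))^2$.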
VI.5 Moments at finite temperature
In this section we generalize the definition of the partition function to any temperature. The energy of a configuration is defined as the number of unsatisfied constraints, and the partition function is the sum of the corresponding Boltzmann weights over all configurations. In particular, for the rectangle and u-function constraints, the partition functions at a given temperature read
[TABLE]
We define the probabilities that the constraints are satisfied at a given temperature:
[TABLE]
VI.5.1 First moment at finite temperature
Consider the event that a given constraint is satisfied. We compute the first moment in the rectangle case,
[TABLE]
and the derivation is similar for the step and u-function constraints.
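Each constraint contributes a Boltzmann factor 1 if satisfied and $e^{-\beta}$ otherwise, with $\beta$ the inverse temperature (our notation). For a fixed configuration, the pre-activations of different patterns are independent, so the first moment factorizes as $\mathbb{E}[Z_\beta] = 2^N p_\beta^{M}$ with $p_\beta = P(\text{sat}) + e^{-\beta}P(\text{unsat})$. Here $N$ is the number of weights and $M=\alpha N$ the number of constraints. A minimal sketch, with `p_beta` and `annealed_free_entropy` as our own names:

```python
import math


def p_beta(K, beta, constraint="rectangle"):
    """Average Boltzmann factor of one constraint: P(sat) + exp(-beta) * P(unsat)."""
    if constraint == "rectangle":
        p_sat = math.erf(K / math.sqrt(2.0))    # P(|Z| <= K)
    elif constraint == "u":
        p_sat = math.erfc(K / math.sqrt(2.0))   # P(|Z| >= K)
    else:
        raise ValueError(f"unknown constraint {constraint!r}")
    return p_sat + math.exp(-beta) * (1.0 - p_sat)


def annealed_free_entropy(alpha, K, beta, constraint="rectangle"):
    """(1/N) log E[Z_beta] = log 2 + alpha * log p_beta."""
    return math.log(2.0) + alpha * math.log(p_beta(K, beta, constraint))


if __name__ == "__main__":
    for beta in (0.0, 1.0, 5.0, 50.0):
        print(beta, annealed_free_entropy(1.0, 1.0, beta))
```

At $\beta=0$ every configuration counts and the free entropy is $\log 2$. As $\beta\to\infty$, $p_\beta$ tends to the zero-temperature satisfaction probability.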
VI.5.2 Second moment at finite temperature
Again, we show the computation for the rectangle; it is similar for the u-function.
Expression of the second moment
[TABLE]
where we defined the probability that two standard Gaussians with a given correlation are both at most $K$ in absolute value, at a given temperature. Defining
[TABLE]
the function at finite temperature can be written
[TABLE]
where
[TABLE]
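At zero temperature, the pair probability reduces to $P(|X|\le K,\,|Y|\le K)$ for standard Gaussians with correlation $\rho$. The sketch below computes it by one-dimensional quadrature, writing $Y = \rho X + \sqrt{1-\rho^2}\,W$. The finite-temperature weighting is not included, and the function name is ours. It checks three properties that follow from the Hermite (Mehler) expansion:

- the value $p^2$ at $\rho=0$;
- evenness in $\rho$ for this symmetric window;
- curvature $(2K\gamma(K))^2$ at $\rho=0$, with $\gamma$ the standard Gaussian density.

```python
import math

SQRT2 = math.sqrt(2.0)


def gauss_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def gauss_cdf(x):
    return 0.5 * math.erfc(-x / SQRT2)


def both_in_window(rho, K, n=1000):
    """P(|X| <= K and |Y| <= K) for standard Gaussians with correlation |rho| < 1.

    Uses Y = rho X + sqrt(1 - rho^2) W and integrates over X in [-K, K]
    with composite Simpson's rule (n must be even).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    s = math.sqrt(1.0 - rho * rho)
    h = 2.0 * K / n
    total = 0.0
    for i in range(n + 1):
        x = -K + i * h
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        inner = gauss_cdf((K - rho * x) / s) - gauss_cdf((-K - rho * x) / s)
        total += weight * gauss_pdf(x) * inner
    return total * h / 3.0


if __name__ == "__main__":
    K = 1.0
    for rho in (0.0, 0.25, 0.5, 0.75):
        print(rho, both_in_window(rho, K))
```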
Expression of the derivative of the second moment
To compute the derivative of this probability, we first introduce
[TABLE]
The derivative of each integral involved in eq. (71) can be easily computed as
[TABLE]
Hence, differentiating each term of this form and simplifying, the derivative of the probability reads:
[TABLE]
In the end, the derivative of the second moment can be evaluated for all parameters and at any temperature:
[TABLE]
In particular at ,
[TABLE]
Expression of the derivative for the u-function
Adapting the previous steps and using
[TABLE]
and eq. (74), the derivative for the u-function is straightforward to compute and is given by
[TABLE]
- [1] E. Gardner & B. Derrida. Optimal storage properties of neural network models. J. Phys. A: Math. and Gen., 1988.
- [2] W. Krauth & M. Mézard. Storage capacity of memory networks with binary couplings. J. Phys. France, 1989.
- [3] Timothy L. H. Watkin, Albrecht Rau, and Michael Biehl. The statistical mechanics of learning a rule. Reviews of Modern Physics, 65(2):499, 1993.
- [4] H. S. Seung, Haim Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review A, 45(8):6056, 1992.
- [5] A. Engel & C. Van den Broeck. Statistical mechanics of learning. Cambridge University Press, 2001.
- [6] H. Nishimori. Statistical Physics of Spin Glasses and Information Processing: An Introduction. Oxford University Press, Oxford, UK, 2001.
- [7] Michel Talagrand. The Parisi formula. Annals of Mathematics, pages 221–263, 2006.
- [8] Michel Talagrand. Spin glasses: a challenge for mathematicians: cavity and mean field models, volume 46. Springer Science & Business Media, 2003.
