Compact representation of the full Broyden class of quasi-Newton updates
Omar DeGuchy, Jennifer B. Erway, and Roummel F. Marcia

TL;DR
This paper develops a compact representation for matrices in the Broyden class of quasi-Newton updates, enabling efficient computation of inverses, solutions to linear systems, and eigenvalues, thus facilitating sensitivity analysis.
Contribution
It extends previous work by providing a compact representation for the full Broyden class, including rank-one and rank-two updates, with practical algorithms for inverse and linear system computations.
Findings
Accurately represents Broyden class matrices using the compact form.
Efficiently computes inverses and solves linear systems with these matrices.
Enables eigenvalue, condition number, and sensitivity analysis for Broyden matrices.
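Table 1. Scheme for choosing $\phi_k$ in each experiment.

| Experiment | | | | | |
|---|---|---|---|---|---|
| 1 | 1 | 0 | | | |
| 2 | 1 | 0 | | | |
| 3 | 1 | | | | |
| 4 | 1 | 0 | | | |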
Table 2. Average relative error, in the Frobenius norm, of the compact representation computed by Algorithm 1.

| n | Exp. 1 | Exp. 2 | Exp. 3 | Exp. 4 |
|---|---|---|---|---|
| 100 | 1.1315e-13 | 1.3383e-11 | 1.6749e-12 | 2.2855e-14 |
| 1,000 | 3.2039e-14 | 1.1225e-14 | 5.4247e-15 | 1.0155e-15 |
| 10,000 | 1.3426e-13 | 8.5453e-14 | 1.9969e-13 | 2.8354e-16 |
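Table 3. Average residual error, in the two-norm, of linear solves using the inverse compact representation (Algorithm 2).

| n | Exp. 1 | Exp. 2 | Exp. 3 | Exp. 4 |
|---|---|---|---|---|
| 100 | 4.0158e-13 | 1.342e-10 | 1.3065e-09 | 2.8160e-14 |
| 1,000 | 1.518e-14 | 7.6460e-14 | 6.1744e-14 | 1.8431e-13 |
| 10,000 | 2.4175e-12 | 1.6079e-12 | 4.3284e-12 | 1.8795e-14 |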
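Table 4. Average time to solve the linear system with the inverse compact representation (ICR, Algorithm 2) and with the MATLAB backslash command.

| Exp. | ICR, n = 100 | MATLAB, n = 100 | ICR, n = 1,000 | MATLAB, n = 1,000 | ICR, n = 10,000 | MATLAB, n = 10,000 |
|---|---|---|---|---|---|---|
| 1 | 9.9e-04 | 3.3e-04 | 1.1e-03 | 2.9e-02 | 3.7e-03 | 1.1e+01 |
| 2 | 7.2e-04 | 3.7e-04 | 1.1e-03 | 2.9e-02 | 3.5e-03 | 1.1e+01 |
| 3 | 6.7e-04 | 3.1e-04 | 1.1e-03 | 3.0e-02 | 3.2e-03 | 1.1e+01 |
| 4 | 5.8e-04 | 3.5e-04 | 9.8e-04 | 3.0e-02 | 2.8e-03 | 1.1e+01 |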
Compact Representation of the Full Broyden Class of Quasi-Newton Updates
Omar DeGuchy, School of Natural Sciences, University of California, Merced, 5200 N. Lake Road, Merced, CA 95343
Jennifer B. Erway, Department of Mathematics, PO Box 7388, Wake Forest University, Winston-Salem, NC 27109
Roummel F. Marcia, School of Natural Sciences, University of California, Merced, 5200 N. Lake Road, Merced, CA 95343
Abstract.
In this paper, we present the compact representation for matrices belonging to the Broyden class of quasi-Newton updates, where each update may be either rank-one or rank-two. This work extends previous results, which apply only to the restricted Broyden class of rank-two updates. In this article, it is not assumed that the same Broyden update is used at each iteration; rather, different members of the Broyden class may be used at each iteration. Numerical experiments suggest that a practical implementation of the compact representation is able to accurately represent matrices belonging to the Broyden class of updates. Furthermore, we demonstrate how to compute the compact representation for the inverse of these matrices, as well as a practical algorithm for solving linear systems with members of the Broyden class of updates. We demonstrate through numerical experiments that the proposed linear solver is able to efficiently solve linear systems with members of the Broyden class of matrices to high accuracy. As an immediate consequence of this work, it is now possible to efficiently compute the eigenvalues of any limited-memory member of the Broyden class of matrices, allowing for the computation of condition numbers and the ability to perform sensitivity analysis.
Key words and phrases:
Limited-memory quasi-Newton methods; quasi-Newton matrices; eigenvalues; spectral decomposition; inverses; condition numbers
Research supported in part by NSF grants CMMI-1334042 and CMMI-1333326.
1. Introduction
Quasi-Newton methods for minimizing a continuously differentiable function $f:\mathbb{R}^n \rightarrow \mathbb{R}$ generate a sequence of iterates $\{x_k\}$ such that $f(x_k)$ is strictly decreasing. Crucially, at each iteration a quasi-Newton matrix $B_k$ is used to approximate the Hessian $\nabla^2 f(x_k)$, which is assumed to be either too computationally expensive to compute or unavailable. The approximation to the Hessian is updated at each iteration using the most recently computed iterate $x_{k+1}$ by defining a new quasi-Newton pair $(s_k, y_k)$ given by

$$s_k = x_{k+1} - x_k \quad \text{and} \quad y_k = \nabla f(x_{k+1}) - \nabla f(x_k).$$
The quasi-Newton Broyden family of updates is given by
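$$B_{k+1} = B_k - \frac{1}{s_k^T B_k s_k} B_k s_k s_k^T B_k + \frac{1}{y_k^T s_k} y_k y_k^T + \phi_k \, (s_k^T B_k s_k) \, w_k w_k^T, \tag{1}$$

where $\phi_k \in \mathbb{R}$ and

$$w_k = \frac{y_k}{y_k^T s_k} - \frac{B_k s_k}{s_k^T B_k s_k}.$$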
For $\phi_k \in [0,1]$, $B_{k+1}$ is said to be in the restricted or convex Broyden class of updates. Setting $\phi_k = 0$ gives the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update, arguably the most widely used symmetric positive-definite update and a member of the restricted Broyden class. For $\phi_k < 0$, the sequence of quasi-Newton matrices generated by this update is not guaranteed to be positive definite. The most well-known update not in the restricted Broyden class is the symmetric rank-one (SR1) update, which is obtained by setting $\phi_k = \frac{s_k^T y_k}{s_k^T y_k - s_k^T B_k s_k}$.
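The update (1) and its BFGS and SR1 special cases can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the sketch assumes $y_k^T s_k \ne 0$, $s_k^T B_k s_k \ne 0$, and the standard form of (1) with $w_k = y_k/(y_k^T s_k) - B_k s_k/(s_k^T B_k s_k)$. Because $w_k^T s_k = 0$, every member of the class satisfies the secant condition $B_{k+1} s_k = y_k$, which the example can be used to check.

```python
import numpy as np

def broyden_update(B, s, y, phi):
    """Broyden-class update of a symmetric B from the pair (s, y).

    B_next = B - (B s)(B s)^T / (s^T B s) + y y^T / (y^T s) + phi (s^T B s) w w^T,
    with w = y / (y^T s) - B s / (s^T B s).  phi = 0 gives BFGS.
    """
    Bs = B @ s
    sBs = s @ Bs              # s^T B s
    ys = y @ s                # y^T s
    w = y / ys - Bs / sBs
    return (B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys
            + phi * sBs * np.outer(w, w))

def phi_sr1(B, s, y):
    """Value of phi for which the Broyden update coincides with SR1."""
    ys = y @ s
    return ys / (ys - s @ B @ s)

def sr1_update(B, s, y):
    """Direct symmetric rank-one formula, used only as a cross-check."""
    r = y - B @ s
    return B + np.outer(r, r) / (r @ s)

# Small illustrative example with B_k = I (hypothetical data).
rng = np.random.default_rng(0)
n = 6
B = np.eye(n)
s = rng.standard_normal(n)
y = 2.0 * s + 0.1 * rng.standard_normal(n)

B_bfgs = broyden_update(B, s, y, 0.0)               # restricted class, phi = 0
B_neg = broyden_update(B, s, y, -0.5)               # outside the restricted class
B_sr1 = broyden_update(B, s, y, phi_sr1(B, s, y))   # SR1 member of the class
```

Comparing `B_sr1` with `sr1_update(B, s, y)` shows that the rank-two formula collapses to the rank-one SR1 update at that particular value of $\phi_k$.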
Recently, there has been renewed interest in the entire Broyden class of updates and, in particular, in negative values of $\phi_k$. Research has shown that negative values of $\phi_k$ are desirable [7] and that, under some conditions, quasi-Newton methods based on negative values of $\phi_k$ exhibit superlinear convergence [7, 16]. There is also empirical evidence that negative values of $\phi_k$ may lead to more efficient algorithms than BFGS [16, 14].
In this paper, we present the compact representation for the full Broyden class of quasi-Newton matrices, allowing $\phi_k$ to be negative and to change at each iteration. We also demonstrate how to efficiently solve linear systems with any member of the Broyden class using the compact representation of its inverse. This paper can be viewed as an extension of the results in [10, 11], which presented the compact representation for members of the restricted Broyden class and their inverses, as well as a practical method for solving linear systems involving only restricted Broyden class matrices (i.e., $\phi_k \in [0,1]$).
One important application of the compact representation is the ability to efficiently compute the eigenvalues and a partial eigenbasis when the number of stored pairs is small [10], which is the case in large-scale optimization with so-called limited-memory quasi-Newton updates. In this setting, only the $m$ most recently computed quasi-Newton pairs are stored and used to form $B_{k+1}$ from $B_0$ through the recursive application of (1). Typically, in large-scale applications $m$ is small regardless of $n$, i.e., $m \ll n$ (see, e.g., [8]). With the eigenvalues in hand, it is possible to compute condition numbers and singular values, and to perform sensitivity analysis.
This paper is organized in seven sections. In the second section, we review the compact formulation for the restricted Broyden class of updates ($\phi_k \in [0,1]$) and give an overview of the efficient computation of their eigenvalues. The main result of the paper is in Section 3, where the compact representation is given for the entire Broyden class of updates, allowing $\phi_k$ to change at each update. In this section, we also present a practical iterative method to compute the compact representation. In Section 4, we show how to perform linear solves with any member of the Broyden class using the compact representation of its inverse. Numerical experiments are reported in Section 5. Finally, Section 6 contains concluding remarks, and Section 7 contains acknowledgements.
1.1. Notation and assumptions
Throughout this paper, we make use of the following matrices:
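$$S_k = \begin{bmatrix} s_0 & s_1 & \cdots & s_k \end{bmatrix} \in \mathbb{R}^{n \times (k+1)} \quad \text{and} \quad Y_k = \begin{bmatrix} y_0 & y_1 & \cdots & y_k \end{bmatrix} \in \mathbb{R}^{n \times (k+1)}.$$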
Furthermore, we make use of the following decomposition of :
$$S_k^T Y_k = L_k + D_k + U_k, \tag{2}$$

where $L_k$ is strictly lower triangular, $D_k$ is diagonal, and $U_k$ is strictly upper triangular. We assume that the matrix is nonsingular for each . Finally, throughout the manuscript, $I$ denotes the identity matrix.
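For concreteness, the following NumPy sketch forms $S_k$ and $Y_k$ column by column and splits $S_k^T Y_k$ into its strictly lower, diagonal, and strictly upper parts. The random data, the column indexing $s_0, \dots, s_k$, and the helper name `split_sty` are illustrative assumptions only.

```python
import numpy as np

def split_sty(S, Y):
    """Return (L, D, U) with S^T Y = L + D + U.

    L is strictly lower triangular, D is diagonal, U is strictly upper triangular.
    """
    StY = S.T @ Y              # (i, j) entry is s_i^T y_j
    L = np.tril(StY, k=-1)
    D = np.diag(np.diag(StY))
    U = np.triu(StY, k=1)
    return L, D, U

rng = np.random.default_rng(1)
n, k = 8, 4
S = rng.standard_normal((n, k + 1))   # columns s_0, ..., s_k (hypothetical data)
Y = rng.standard_normal((n, k + 1))   # columns y_0, ..., y_k
L, D, U = split_sty(S, Y)
```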
2. Compact representation for the restricted Broyden class
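Compact representations of matrices from the Broyden class of updates were first described by Byrd et al. [8] as matrix decompositions of the form

$$B_{k+1} = B_0 + \Psi_k M_k \Psi_k^T,$$

where $\Psi_k \in \mathbb{R}^{n \times l}$, $M_k \in \mathbb{R}^{l \times l}$, and $B_0$ is the initial matrix. The size of $M_k$ depends on the rank of the update: for a rank-two update, $l = 2(k+1)$, and for a rank-one update, $l = k+1$. For the BFGS update (i.e., $\phi_k = 0$), $\Psi_k$ and $M_k$ are given in [8]:

$$\Psi_k = \begin{bmatrix} B_0 S_k & Y_k \end{bmatrix}, \qquad M_k = -\begin{bmatrix} S_k^T B_0 S_k & L_k \\ L_k^T & -D_k \end{bmatrix}^{-1}, \tag{4}$$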
where $L_k$ and $D_k$ are defined in (2). In [10], we presented the compact representation for any matrix in the restricted Broyden class (i.e., $\phi_k \in [0,1]$); in particular, for any matrix in the restricted Broyden class,
[TABLE]
where and are given in (4) and is the diagonal matrix (), given by
[TABLE]
To our knowledge, the only compact formulation known for a member of the Broyden class of updates outside the restricted class is for an SR1 matrix. As with the BFGS case, it is given in [8]; in particular,

$$\Psi_k = Y_k - B_0 S_k, \qquad M_k = \left( D_k + L_k + L_k^T - S_k^T B_0 S_k \right)^{-1}.$$

Notice that $M_k$ in the compact representation for SR1 matrices is half the size of $M_k$ for the rank-two updates.
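The BFGS and SR1 compact representations of Byrd et al. [8] can be checked against the plain recursive updates with a short NumPy sketch. The test data are hypothetical: pairs are generated as $y_i = A s_i$ for a random symmetric positive-definite $A$, so that $y_i^T s_i > 0$, and $B_0 = I$ is chosen so that $A - B_0$ is positive definite, which keeps the SR1 denominators positive in this example. The function names are ours.

```python
import numpy as np

def bfgs_recursive(B0, S, Y):
    """Apply the BFGS update once per column pair (s_i, y_i), i = 0..k."""
    B = B0.copy()
    for s, y in zip(S.T, Y.T):
        Bs = B @ s
        B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
    return B

def sr1_recursive(B0, S, Y):
    """Apply the SR1 update once per column pair (s_i, y_i), i = 0..k."""
    B = B0.copy()
    for s, y in zip(S.T, Y.T):
        r = y - B @ s
        B = B + np.outer(r, r) / (r @ s)
    return B

def bfgs_compact(B0, S, Y):
    """B0 + Psi M Psi^T with Psi = [B0 S, Y], M = -[[S^T B0 S, L], [L^T, -D]]^{-1}."""
    StY = S.T @ Y
    L = np.tril(StY, k=-1)
    D = np.diag(np.diag(StY))
    Psi = np.hstack([B0 @ S, Y])
    M = -np.linalg.inv(np.block([[S.T @ B0 @ S, L], [L.T, -D]]))
    return B0 + Psi @ M @ Psi.T

def sr1_compact(B0, S, Y):
    """B0 + Psi M Psi^T with Psi = Y - B0 S, M = (D + L + L^T - S^T B0 S)^{-1}."""
    StY = S.T @ Y
    L = np.tril(StY, k=-1)
    D = np.diag(np.diag(StY))
    Psi = Y - B0 @ S
    M = np.linalg.inv(D + L + L.T - S.T @ B0 @ S)
    return B0 + Psi @ M @ Psi.T

rng = np.random.default_rng(1)
n, k = 10, 3
G = rng.standard_normal((n, n))
A = G @ G.T + 2.0 * np.eye(n)        # SPD "true Hessian" (hypothetical test data)
S = rng.standard_normal((n, k + 1))  # columns s_0, ..., s_k
Y = A @ S                            # y_i = A s_i, so y_i^T s_i > 0
B0 = np.eye(n)                       # A - B0 is positive definite
```

Note that $M_k$ has size $2(k+1)$ in the BFGS case and $k+1$ in the SR1 case, matching the rank of each update.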
2.1. Applications of the compact representation
In this section, we briefly review how the eigenvalues of any quasi-Newton matrix that admits a compact representation can be efficiently computed. The first method to compute eigenvalues of limited-memory quasi-Newton matrices was proposed by Lu [15]; it makes use of the singular value decomposition and an eigendecomposition of small matrices. An alternative approach, first described by Burdakov et al. [5], uses the QR factorization in lieu of the singular value decomposition. An overview of the method in [5] follows. For this section, we assume $k$ is small, as in the case of limited-memory quasi-Newton matrices; moreover, we assume $\Psi_k \in \mathbb{R}^{n \times l}$ has full column rank, where $l$ is either $k+1$ or $2(k+1)$. Finally, we assume $B_0 = \gamma I$, where $\gamma \in \mathbb{R}$.
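Let $\Psi_k = QR$ be the “thin” QR decomposition of $\Psi_k$, where $Q \in \mathbb{R}^{n \times l}$ has orthonormal columns and $R \in \mathbb{R}^{l \times l}$ is upper triangular (see, e.g., [12]). Then, since $B_0 = \gamma I$,

$$B_{k+1} = \gamma I + Q R M_k R^T Q^T.$$

The matrix $R M_k R^T$ is a real symmetric $l \times l$ matrix whose spectral decomposition can be computed explicitly since $l$ is small. Letting $R M_k R^T = V \hat{\Lambda} V^T$ be its spectral decomposition and $P = \begin{bmatrix} QV & Q_\perp \end{bmatrix}$, where the columns of $Q_\perp$ form an orthonormal basis for the orthogonal complement of the range of $Q$, gives

$$B_{k+1} = \gamma I + Q V \hat{\Lambda} V^T Q^T = P \, (\Lambda + \gamma I) \, P^T, \tag{7}$$

where $\Lambda$ is a diagonal matrix whose leading $l \times l$ block is $\hat{\Lambda}$ and whose remaining entries are zero. Thus, the spectral decomposition of $B_{k+1}$ is given by (7). (In practice, the $n \times n$ matrices in (7) are not stored.) The matrix $B_{k+1}$ has the eigenvalue $\gamma$ with multiplicity $n - l$ and the eigenvalues $\gamma + \hat{\lambda}_i$, where $\hat{\lambda}_i$ is the $i$th diagonal entry of $\hat{\Lambda}$ and $1 \le i \le l$. It is also possible to efficiently compute the eigenvectors associated with the nontrivial eigenvalues and one eigenvector associated with the trivial eigenvalue $\gamma$. (For more details, see [5, 10].)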
Generally speaking, computing the eigenvalues of an $n \times n$ matrix directly is an $O(n^3)$ process. In contrast, the above decomposition requires the QR factorization of $\Psi_k$ and the eigendecomposition of a small matrix, requiring $O(nl^2)$ flops and $O(l^3)$ flops, respectively, where $l$ is the number of columns of $\Psi_k$. Since $l \ll n$, the runtime of this method should increase only linearly with $n$. (For some details regarding updating the (full) QR factorization after a new quasi-Newton pair is computed, see [10].) This efficient computation of eigenvalues and a partial eigenbasis is used in recent methods for large-scale optimization [5, 4, 2, 1, 3].
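The eigenvalue computation reviewed above can be sketched in NumPy as follows. This is an illustrative sketch under the section's assumptions ($B_0 = \gamma I$, $\Psi_k$ of full column rank), with a random symmetric matrix standing in for $M_k$; it returns eigenvalues only, not the partial eigenbasis, and the function name is hypothetical.

```python
import numpy as np

def compact_eigenvalues(gamma, Psi, M):
    """All n eigenvalues of B = gamma*I + Psi M Psi^T, assuming Psi (n-by-l) has full column rank."""
    n, l = Psi.shape
    Q, R = np.linalg.qr(Psi)              # thin QR: Q is n-by-l, R is l-by-l (O(n l^2) flops)
    small = R @ M @ R.T
    small = 0.5 * (small + small.T)       # remove rounding asymmetry before eigvalsh
    lam_hat = np.linalg.eigvalsh(small)   # O(l^3) flops
    nontrivial = gamma + lam_hat          # gamma + lambda_hat_i, i = 1..l
    trivial = np.full(n - l, gamma)       # gamma with multiplicity n - l
    return np.sort(np.concatenate([nontrivial, trivial]))

rng = np.random.default_rng(2)
n, l, gamma = 50, 6, 1.5
Psi = rng.standard_normal((n, l))
C = rng.standard_normal((l, l))
M = 0.5 * (C + C.T)                       # any symmetric l-by-l middle matrix
eig_compact = compact_eigenvalues(gamma, Psi, M)
eig_dense = np.linalg.eigvalsh(gamma * np.eye(n) + Psi @ M @ Psi.T)  # O(n^3) reference
```

Only the dense reference call touches an $n \times n$ matrix; the compact route works with $\Psi_k$ and an $l \times l$ matrix.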
The compact representation is also useful for solving linear systems with quasi-Newton matrices. In [6], Burke et al. use the compact formulation of a BFGS matrix to solve a linear system involving a diagonally shifted BFGS matrix. In [11], the compact representation for the inverse of any member of the restricted Broyden class is given, as well as a practical method to solve linear systems involving these matrices using this representation.
3. Compact Representation for any member of the Broyden class
The main result of this section is a theorem giving the compact representation for any member of the Broyden class. The representation allows $\phi_k$ to change at each iteration and to be negative. In this section, we also present a practical algorithm for computing the compact representation.
We begin by observing that $B_{k+1}$ in (1) can be written as
[TABLE]
where
[TABLE]
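For readers who want the $2 \times 2$ structure explicitly, expanding (1) directly gives one such factorization. This is our own derivation from the standard form of (1), with the labels $\Xi_k$, $a_k$, and $b_k$ introduced here only for illustration; the matrix used in the paper may be ordered or scaled differently, but its singularity condition should be the same.

$$
B_{k+1} = B_k + \begin{bmatrix} B_k s_k & y_k \end{bmatrix} \Xi_k \begin{bmatrix} B_k s_k & y_k \end{bmatrix}^T, \qquad
\Xi_k = \begin{bmatrix} \dfrac{\phi_k - 1}{a_k} & -\dfrac{\phi_k}{b_k} \\[6pt] -\dfrac{\phi_k}{b_k} & \dfrac{1}{b_k} + \dfrac{\phi_k a_k}{b_k^2} \end{bmatrix},
$$

where $a_k = s_k^T B_k s_k$ and $b_k = y_k^T s_k$. Its determinant is

$$
\det \Xi_k = \frac{\phi_k (b_k - a_k) - b_k}{a_k b_k^2},
$$

which vanishes exactly when $\phi_k = b_k / (b_k - a_k)$, the SR1 value of $\phi_k$ (assuming $b_k \ne a_k$).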
We now state two lemmas about this matrix: the first gives the condition under which it is singular, and the second gives its inverse when it is nonsingular.
Lemma 1. The matrix is singular if and only if $\phi_k = \dfrac{s_k^T y_k}{s_k^T y_k - s_k^T B_k s_k}$.
Proof. The determinant of is given by
[TABLE]
Thus, is singular if and only if
[TABLE]
in other words, if and only if $\phi_k = \dfrac{s_k^T y_k}{s_k^T y_k - s_k^T B_k s_k}$.
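A quick numerical sanity check of Lemma 1 can be written as a hypothetical NumPy sketch. It factors the update through the $2 \times 2$ matrix obtained by expanding (1) (our own parameterization, with $a_k = s_k^T B_k s_k$ and $b_k = y_k^T s_k$), confirms that the factored form reproduces the update, and checks that the determinant vanishes at the SR1 value $\phi_k = b_k/(b_k - a_k)$ but not at $\phi_k = 0$. The data are chosen so that $b_k - a_k$ is safely away from zero.

```python
import numpy as np

def broyden_update(B, s, y, phi):
    """Broyden-class update (1) in its standard form."""
    Bs = B @ s
    a, b = s @ Bs, y @ s
    w = y / b - Bs / a
    return B - np.outer(Bs, Bs) / a + np.outer(y, y) / b + phi * a * np.outer(w, w)

def broyden_middle(B, s, y, phi):
    """2x2 matrix Xi with B_next = B + [Bs, y] Xi [Bs, y]^T (our own parameterization)."""
    a, b = s @ B @ s, y @ s
    return np.array([[(phi - 1.0) / a, -phi / b],
                     [-phi / b, 1.0 / b + phi * a / b**2]])

rng = np.random.default_rng(2)
n = 5
B = np.diag(np.linspace(1.0, 2.0, n))     # hypothetical SPD B_k
s = rng.standard_normal(n)
y = 3.0 * s + 0.1 * rng.standard_normal(n)
a, b = s @ B @ s, y @ s
phi_sr1 = b / (b - a)

# (i) The factored form reproduces the update for an arbitrary phi (here phi = -0.3).
C = np.column_stack([B @ s, y])
B_next = broyden_update(B, s, y, -0.3)
B_factored = B + C @ broyden_middle(B, s, y, -0.3) @ C.T

# (ii) The 2x2 matrix is singular at the SR1 value of phi, but not for BFGS (phi = 0).
det_sr1 = np.linalg.det(broyden_middle(B, s, y, phi_sr1))
det_bfgs = np.linalg.det(broyden_middle(B, s, y, 0.0))
```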
Lemma 1 states that is singular if and only if the SR1 update is used. Special care will be given to the SR1 case since, unlike the other members of the Broyden class, it is a rank-one update. For the duration of this manuscript, we let . For , is invertible and its inverse is given in Lemma 2. This result can be derived using the formula for the inverse of a $2 \times 2$ matrix.
Lemma 2. If is invertible, then
[TABLE]
where and
We now state the main theorem of this manuscript that presents the compact representation for any member of the Broyden class, while allowing the parameter to vary at each iteration. After proving this theorem, we discuss several aspects of this compact representation as well as the key differences between the compact representation for the Broyden class of matrices (Theorem 1) and the compact representation of the restricted Broyden class reviewed in Section 2.
Theorem 1.
Let . Let be the permutation matrix
[TABLE]
with . Additionally, let be defined recursively as
[TABLE]
with . Finally, let be a diagonal matrix such that
[TABLE]
If is a member of the Broyden class of updates, then
[TABLE]
where
[TABLE]
and are defined in (4), and
[TABLE]
Proof. This proof is by induction on . For the base case (), with , and is the scalar . Thus, defined in (15) reduces to
[TABLE]
By (8), is given by where and
[TABLE]
It remains to show that . Since the initial permutation matrix is defined as , we only need to show For simplicity, can be written as
[TABLE]
where
[TABLE]
From Lemma 1, is nonsingular if and only if . Thus, we consider the following two cases separately: (a) and (b) .
Case (a): If , then by (12). By (10), , and thus, can be simplified as
[TABLE]
Finally, since and , then
[TABLE]
and thus, , as desired.
Case (b): If , then is nonsingular and . Thus, it remains to show . By Lemma 1, , making well defined. By Lemma 2, the inverse of is given by
[TABLE]
Note that the last equality in (21) follows since .
For the induction step, assume
[TABLE]
where and
[TABLE]
From (8), we have
[TABLE]
where
[TABLE]
Multiplying (22) by on the right, we obtain
[TABLE]
Then, substituting this into (24) yields
[TABLE]
where . Equivalently,
[TABLE]
where
[TABLE]
Note that has the following decomposition:
[TABLE]
Thus, is nonsingular if and only if ; that is, is nonsingular if and only if (see Lemma 1). To complete the induction step, we will show that the last term in (27) is equal to by considering the following two cases separately: (i) and (ii) .
Case (i): If , then by Lemma 1, . Then
[TABLE]
where
[TABLE]
We now show that . By the inductive hypothesis, is nonsingular. Together with the fact that , it can be checked directly that
[TABLE]
The (2,2)-entry of can be simplified by substituting in for and using the inductive hypothesis (22):
[TABLE]
Substituting this into (32) and using the inductive hypothesis gives:
[TABLE]
where
[TABLE]
Note that the middle matrix in (33) can be expressed as
[TABLE]
which is equivalent to
[TABLE]
Substituting this into (33) yields
[TABLE]
where is defined in (11), replacing with , and is defined in (23), replacing with . Thus, , as defined in (15).
We finish this case of the proof by showing that the last term in (27) is equal to . Substituting in (30) gives that the last term in (27) can be written as
[TABLE]
Using (35) this simplifies to
[TABLE]
or, in other words,
[TABLE]
which is exactly . Thus, for , the inductive step is proven.
Case (ii): We consider the case that . We begin by showing that , given in (28). First, by Lemma 1, . Second, (see (12)), and is well defined (see (13)). Then, the inverse of can be computed using arguments similar to those found in [10]:
[TABLE]
where
[TABLE]
Simplifying the expressions in (37) yields
[TABLE]
We now simplify the entries of (36) using the same approach as in [10]. Since , then , giving us an expression for the (1,2) and (2,1) entries. The (2,2) block entry is simplified by first multiplying (25) by on the left to obtain . Then,
[TABLE]
Thus, using (34), (36) can be written as
[TABLE]
proving that . Finally, using arguments similar to those in case (i), it can be shown that
[TABLE]
as desired.
There are two main differences between the compact representation for the full Broyden class (Theorem 1) and that for the restricted Broyden class (Section 2). First, in Theorem 1, will always be the identity matrix for updates belonging to the restricted Broyden class. Second, in (14), the permutation matrices in the definitions of and (equations (15) and (16), respectively) always cancel out in the restricted Broyden case. To emphasize that the permutation matrices do not cancel out for general Broyden class updates, we use the notation and in lieu of and as in the restricted Broyden case.
Finally, we provide some insight regarding the permutation matrices in (11). The permutation matrix acts in the following manner:
[TABLE]
so that
[TABLE]
In other words, when applied on the right of , the product permutes the columns of and, using the matrices , combines columns of whenever the update is a rank-one update.
Unfortunately, computing is not straightforward. In particular, the diagonal matrix in (13) involves for each , which requires for . In the next subsection, we propose a recursive method for computing that does not require storing the matrices for .
3.1. Computing
In this section, we propose a recursive method for computing from . This method is based on the method proposed in [11] for solving a linear system whose system matrix is generated using the restricted Broyden class of updates. In the proof of Theorem 1, we showed that
[TABLE]
which are given in (29) and (31). We now relate some of the entries in to other stored or computable quantities involving the pairs . The vector can be computed as
[TABLE]
Note that in (42), the vector consists of the first entries of the last column of , and the vector consists of the first entries of the last column of . Moreover, the entry , given by , can be computed from the following:
[TABLE]
In (43), the quantity is the th diagonal entry in , and is the inner product of and , the latter vector already having been computed in (42). Recall that the entry is given by , where is the st diagonal entry in . Finally, , which uses the previously computed quantities and .
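The general idea of reusing previously computed inner products, rather than re-forming products of $n$-vectors from scratch, can be sketched generically. The snippet below is not Algorithm 1; it is a hypothetical helper showing how $S_k^T Y_k$ grows by one row and one column when a new pair is appended (about $2(k+1)+1$ new inner products of length $n$), and how the oldest row and column are discarded in a limited-memory setting.

```python
import numpy as np

def append_pair(S, Y, StY, s_new, y_new):
    """Append (s_new, y_new) and extend StY = S^T Y by one row and one column."""
    m = StY.shape[0]
    StY_new = np.empty((m + 1, m + 1))
    StY_new[:m, :m] = StY               # reuse all previously computed entries
    StY_new[:m, m] = S.T @ y_new        # new last column: s_i^T y_new
    StY_new[m, :m] = s_new @ Y          # new last row:    s_new^T y_i
    StY_new[m, m] = s_new @ y_new       # new diagonal entry
    return np.column_stack([S, s_new]), np.column_stack([Y, y_new]), StY_new

def drop_oldest(S, Y, StY):
    """Limited-memory step: discard the oldest pair and its row/column of StY."""
    return S[:, 1:], Y[:, 1:], StY[1:, 1:]

rng = np.random.default_rng(3)
n = 20
S = rng.standard_normal((n, 1))
Y = rng.standard_normal((n, 1))
StY = S.T @ Y
for _ in range(4):
    S, Y, StY = append_pair(S, Y, StY, rng.standard_normal(n), rng.standard_normal(n))
S, Y, StY = drop_oldest(S, Y, StY)
```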
For the initialization of , notice that in (17) can be written as
[TABLE]
where and are defined as in (20).
In Algorithm 1, we use the recursions described above to compute given in (41).
Note that the matrices and are not explicitly formed in Algorithm 1. Instead, (40) can be used to compute in line 4 of Algorithm 1.
4. Solving linear systems
Given the compact representation of , we can solve
[TABLE]
where , by computing the compact representation of the inverse of . Intuitively, the inverse admits a compact representation because can also be written using a recursion relation [9]:
[TABLE]
where , , and
[TABLE]
Note that when , the corresponding is given by
[TABLE]
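The correspondence between a Broyden-class update of $B_k$ and the inverse update of $H_k = B_k^{-1}$ can be checked numerically. The sketch below assumes the standard inverse form $H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{s_k^T y_k} + \Phi_k (y_k^T H_k y_k) v_k v_k^T$ with $v_k = \frac{s_k}{s_k^T y_k} - \frac{H_k y_k}{y_k^T H_k y_k}$, together with the duality relation $\Phi_k = \frac{1 - \phi_k}{1 - \phi_k + \phi_k \mu_k}$, where $\mu_k = \frac{(y_k^T H_k y_k)(s_k^T B_k s_k)}{(y_k^T s_k)^2}$. These may be parameterized differently from the expressions in [9] and above, and the function names are hypothetical. Under this relation, $\phi_k = 0$ (BFGS) maps to $\Phi_k = 1$, and the SR1 value of $\phi_k$ maps to $\Phi_k = \frac{y_k^T s_k}{y_k^T s_k - y_k^T H_k y_k}$.

```python
import numpy as np

def broyden_B(B, s, y, phi):
    """Broyden-class update of B (direct form)."""
    Bs = B @ s
    a, b = s @ Bs, y @ s
    w = y / b - Bs / a
    return B - np.outer(Bs, Bs) / a + np.outer(y, y) / b + phi * a * np.outer(w, w)

def broyden_H(H, s, y, Phi):
    """Broyden-class update of H = B^{-1} (inverse form)."""
    Hy = H @ y
    c, b = y @ Hy, y @ s
    v = s / b - Hy / c
    return H - np.outer(Hy, Hy) / c + np.outer(s, s) / b + Phi * c * np.outer(v, v)

def dual_parameter(B, H, s, y, phi):
    """Phi for the inverse update that matches phi for the direct update."""
    mu = (y @ H @ y) * (s @ B @ s) / (y @ s) ** 2
    return (1.0 - phi) / (1.0 - phi + phi * mu)

n = 6
rng = np.random.default_rng(4)
B = np.diag(np.linspace(1.0, 2.0, n))       # B_k (hypothetical, SPD)
H = np.linalg.inv(B)                        # H_k = B_k^{-1}
s = rng.standard_normal(n)
y = np.diag(np.linspace(1.0, 3.0, n)) @ s   # y^T s > 0 for this choice

# Largest entrywise gap between inv(B_{k+1}) and H_{k+1} for several phi values.
mismatch = [np.max(np.abs(np.linalg.inv(broyden_B(B, s, y, phi))
                          - broyden_H(H, s, y, dual_parameter(B, H, s, y, phi))))
            for phi in (-1.0, 0.0, 0.5, 2.0)]
```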
In this section, we derive the compact representation of the inverse of a Broyden class member. This derivation is similar to the process of finding the inverse of a member of the restricted Broyden class presented in [11].
Applying the Sherman-Morrison-Woodbury formula (see, e.g., [13]) to the compact representation of given in (14) gives
[TABLE]
For quasi-Newton matrices it is conventional to let denote the inverse of for each ; with this notation, the inverse of is given by
[TABLE]
Using the definition of in (16) gives that
[TABLE]
and thus,
[TABLE]
Substituting into (48) gives the compact representation for the inverse of any member of the full Broyden class:
[TABLE]
where and
[TABLE]
Computing . Using an approach similar to how is computed, can be computed as follows:
[TABLE]
where
[TABLE]
and . The initial matrix is given by the following:
[TABLE]
where and are defined as in (52) with . A practical iterative method to solve equations of the form (45) is given in Algorithm 2.
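As a generic illustration, and not Algorithm 2 itself, a system whose matrix has a compact representation $\gamma I + \Psi M \Psi^T$ can be solved with the Sherman-Morrison-Woodbury identity $(\gamma I + \Psi M \Psi^T)^{-1} = \frac{1}{\gamma}\left[I - \Psi(\gamma M^{-1} + \Psi^T \Psi)^{-1}\Psi^T\right]$, so only an $l \times l$ system is solved. The sketch assumes $B_0 = \gamma I$ with $\gamma \ne 0$ and an invertible middle matrix, and uses the BFGS compact representation of Byrd et al. [8] with hypothetical random pairs as test data; `compact_solve` is a name we introduce.

```python
import numpy as np

def compact_solve(gamma, Psi, M, rhs):
    """Solve (gamma*I + Psi M Psi^T) x = rhs with Sherman-Morrison-Woodbury.

    Uses (gamma I + Psi M Psi^T)^{-1} = (1/gamma) [I - Psi (gamma M^{-1} + Psi^T Psi)^{-1} Psi^T];
    requires gamma != 0 and M invertible, and only solves an l-by-l system.
    """
    small = gamma * np.linalg.inv(M) + Psi.T @ Psi
    return (rhs - Psi @ np.linalg.solve(small, Psi.T @ rhs)) / gamma

# Test data: BFGS compact representation with B_0 = gamma*I (hypothetical pairs).
rng = np.random.default_rng(5)
n, k, gamma = 200, 4, 1.0
G = rng.standard_normal((n, n)) / np.sqrt(n)
A = G @ G.T + np.eye(n)                  # SPD, so y_i^T s_i > 0
S = rng.standard_normal((n, k + 1))
Y = A @ S
StY = S.T @ Y
L = np.tril(StY, k=-1)
D = np.diag(np.diag(StY))
Psi = np.hstack([gamma * S, Y])          # [B_0 S, Y] with B_0 = gamma*I
K = np.block([[gamma * (S.T @ S), L], [L.T, -D]])
M = -np.linalg.inv(K)

rhs = rng.standard_normal(n)
x = compact_solve(gamma, Psi, M, rhs)
B = gamma * np.eye(n) + Psi @ M @ Psi.T  # dense matrix, formed only for checking
x_ref = np.linalg.solve(B, rhs)
```

The dense matrix `B` is formed here only to verify the answer; the solve itself touches $\Psi$ and a $2(k+1) \times 2(k+1)$ matrix.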
5. Numerical experiments
In this section, we test the accuracy of Algorithm 1 for computing the compact representation by comparing its output with the matrix obtained from the Broyden update formula (1). In addition, we demonstrate that Algorithm 2 solves linear systems with in (45) both accurately and efficiently. For these experiments, we used five (limited-memory) quasi-Newton pairs to compute . To generate quasi-Newton pairs, we simulated a line-search method where the iterates are updated as follows:
[TABLE]
where was generated randomly. To initialize the process, we randomly generated initial points and so that . The corresponding gradients, for , were also generated randomly in order to form for . The matrix was initially defined as , where was randomly generated. We considered four experiments in which the value of $\phi_k$ varies with the iteration $k$; the values of $\phi_k$ were chosen according to the scheme given in Table 1. We ran each experiment ten times with $n = 100$, $1{,}000$, and $10{,}000$ and report the average results.
5.1. Accuracy of the compact representation
To test the accuracy of the compact representation, we form each using (14) together with the proposed compact formulation given in Theorem 1. (In particular, we use Algorithm 1 to form .) We denote the resulting matrix by . In Table 2, we report the average relative error of the compact representation in the Frobenius norm:
[TABLE]
where is computed using (1).
The small relative errors in Table 2 are consistent with the proposed compact representation for the full Broyden class of quasi-Newton matrices being correct; moreover, they suggest that Algorithm 1 computes the compact representation to high accuracy.
5.2. Accuracy of the compact representation of the inverse
In these experiments, we test the accuracy of Algorithm 2 for solving linear systems of the form , where and is a quasi-Newton matrix. The matrix is generated using five quasi-Newton pairs as described at the beginning of this section. Moreover, the right-hand side is randomly generated for each experiment. In Table 3, we present the average residual error in the two-norm:
[TABLE]
where is the solution to using the inverse compact representation computed by Algorithm 2. These results suggest that the compact representation of the inverse can be used to solve linear systems to high accuracy.
In addition, the computational time of the proposed method was recorded and compared with that of a similar solve using the MATLAB backslash command. In particular, with the same quasi-Newton pairs, the backslash command was used to solve , where was formed using (1). The times were averaged for each experiment and for each value of $n$. These results are given in Table 4 and do not include the time MATLAB required to form . The average computational times in Table 4 indicate that, as $n$ increases, using Algorithm 2 becomes significantly less computationally expensive than using the backslash command.
6. Conclusion
We derived the compact formulation for members of the full Broyden class of quasi-Newton updates. The compact representation allows for a different $\phi_k$ at each iteration as well as updates of different ranks. With this compact formulation, we demonstrated how to solve linear systems defined by these limited-memory quasi-Newton matrices. Numerical results suggest that the compact representation can be computed to high accuracy and that (45) can be solved efficiently and accurately using the compact representation of the inverse of . Future work includes integrating this linear solver into large-scale optimization methods.
7. Acknowledgments
The authors would like to thank Lasith Adhikari and Johannes Brust for helpful discussions regarding this work. This research is supported by NSF grants CMMI-1334042 and CMMI-1333326.
- [1] L. Adhikari, J. B. Erway, S. Lockhart, and R. F. Marcia, Limited-memory trust-region methods for sparse relaxation, Technical Report 2016-1, Wake Forest University, 2016.
- [2] L. Adhikari, J. B. Erway, and R. F. Marcia, Trust-region methods for nonconvex sparse recovery optimization, in The International Symposium on Information Theory and Its Applications 2016, 2016, accepted.
- [3] J. Brust, O. Burdakov, J. B. Erway, R. F. Marcia, and Y.-X. Yuan, Shape-changing L-SR1 trust-region methods, Technical Report 2016-2, Wake Forest University, 2016.
- [4] J. Brust, J. B. Erway, and R. F. Marcia, On solving L-SR1 trust-region subproblems, Computational Optimization and Applications, (2016), pp. 1–22.
- [5] O. Burdakov, L. Gong, Y.-X. Yuan, and S. Zikrin, On efficiently combining limited memory and trust-region techniques, Tech. Rep. 2013:13, Linköping University, Optimization, 2013.
- [6] J. V. Burke, A. Wiegmann, and L. Xu, Limited memory BFGS updating in a trust-region framework, Technical Report, University of Washington, 1996.
- [7] R. H. Byrd, D. C. Liu, and J. Nocedal, On the behavior of Broyden’s class of quasi-Newton methods, SIAM Journal on Optimization, 2 (1992), pp. 533–557.
- [8] R. H. Byrd, J. Nocedal, and R. B. Schnabel, Representations of quasi-Newton matrices and their use in limited-memory methods, Math. Program., 63 (1994), pp. 129–156.
