Tensor-Free Proximal Methods for Lifted Bilinear/Quadratic Inverse Problems with Applications to Phase Retrieval
Robert Beinert, Kristian Bredies

TL;DR
This paper introduces tensor-free proximal algorithms for solving bilinear and quadratic inverse problems, notably phase retrieval, by leveraging low-rank representations and an augmented Lanczos process to improve efficiency and convergence.
Contribution
It develops tensor-free versions of singular value thresholding methods for inverse problems, enabling practical computation without high-dimensional tensor operations.
Findings
Efficient phase retrieval via tensor-free algorithms
Enhanced convergence with reweighting technique
Incorporation of smoothness constraints improves recovery
Abstract
We propose and study a class of novel algorithms that aim at solving bilinear and quadratic inverse problems. Using a convex relaxation based on tensorial lifting, and applying first-order proximal algorithms, these problems could be solved numerically by singular value thresholding methods. However, a direct realization of these algorithms for, e.g., image recovery problems is often impracticable, since computations have to be performed on the tensor-product space, whose dimension is usually tremendous. To overcome this limitation, we derive tensor-free versions of common singular value thresholding methods by exploiting low-rank representations and incorporating an augmented Lanczos process. Using a novel reweighting technique, we further improve the convergence behavior and rank evolution of the iterative algorithms. Applying the method to the two-dimensional masked Fourier phase…
| reweighting | ||||||
| Lanczos vectors | 100 | 50 | 20 | 10 | 10 | |
| iteration vectors | — | 50 | 25 | 10 | 5 | 5 |
| time in seconds | 383.23 | 139.51 | 67.94 | 60.03 | 57.62 | 35.74 |
| average restarts | — | 0.00 | 0.06 | 2.89 | 10.01 | 4.89 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
\useosf\KOMAoption
BCOR5mm \KOMAoptionDIVclassic \KOMAoptionheadincludefalse \KOMAoptionfootincludefalse \KOMAoptionpagesizeauto \recalctypearea\KOMAoptionheadingssmall \KOMAoptionnumbersendperiod
\clearscrheadfoot\ohead\pagemark \cehead \cohead
plainheaderbreak
Definition]Problem Definition]Assumption
Definition]Theorem Definition]Proposition Definition]Lemma Definition]Corollary
Definition]Algorithm
**Tensor-Free Proximal Methods for Lifted Bilinear/Quadratic Inverse Problems with Applications to Phase Retrieval
Robert Beinert1 and Kristian Bredies1
1
Institut für Mathematik und Wissenschaftliches Rechnen
Karl-Franzens-Universität Graz
Heinrichstraße 36
8010 Graz, Austria
**Correspondence
R. Beinert:
K. Bredies:**
**
Abstract. We propose and study a class of novel algorithms that aim at solving bilinear and quadratic inverse problems. Using a convex relaxation based on tensorial lifting, and applying first-order proximal algorithms, these problems could be solved numerically by singular value thresholding methods. However, a direct realization of these algorithms for, e.g., image recovery problems is often impracticable, since computations have to be performed on the tensor-product space, whose dimension is usually tremendous. To overcome this limitation, we derive tensor-free versions of common singular value thresholding methods by exploiting low-rank representations and incorporating an augmented Lanczos process. Using a novel reweighting technique, we further improve the convergence behavior and rank evolution of the iterative algorithms. Applying the method to the two-dimensional masked Fourier phase retrieval problem, we obtain an efficient recovery method. Moreover, the tensor-free algorithms are flexible enough to incorporate a-priori smoothness constraints that greatly improve the recovery results.
Keywords. Tensor-free proximal methods, tensorial lifting, nuclear norm relaxation, reweighting techniques, bilinear inverse problem, quadratic inverse problem, masked Fourier phase retrieval
AMS subject classification. 45Q05, 65F10, 65R32, 65K10
1 Introduction
The theory of inverse problems is nowadays one of the main tools to deal with recovery problems in medicine, engineering, and life sciences. The real-world applications of this theory embrace for instance computed tomography, magnetic resonance imaging, and deconvolution problems in microscopy, see [BB98, MS12, Ram05, SW13, Uhl03, Uhl13]. Besides these recent monographs, which are only a small selection, there exist many further publications about applications, regularization, and numerical solvers.
In this paper, we consider the subclass of bilinear and quadratic inverse problems [BB18]. Problem formulations of these kinds originate form real-world applications in imaging and physics [SGG*+*09] like blind deconvolution [BS01, JR06], deautoconvolution [GH94, GHB*+*14, ABHS16], phase retrieval [DF87, Mil90, SSD*+*06], parallel imaging in MRI [BBM*+*04], and parameter identification in EIT [MS12]. Although these problems may be seen as non-linear inverse problems and can be solved by appropriate non-linear solvers, the question arises if we can exploit the specific bilinear or quadratic structure of these problem.
One of the most famous approaches of this kind is PhaseLift [CSV13], where the generic phase retrieval problem is lifted to a linear inverse problem with rank-one constraint using the universal property of the tensor product. After a relaxation the lifted problem is solved by semi-definite programming. From the theoretical side, one has proved that this methodology yields the true solution with very high probability. Based on PhaseLift, we will derive convex lifting methods for general bilinear and quadratic inverse problems.
Similarly to PhaseLift, the relaxations here require the minimization of a nuclear norm functional on the tensor product, which can be done by applying proximal minimization methods, see for instance [CP16] and references therein. Since the dimension of the lifted problem can become huge, the question arises how one can implement these methods numerically. Therefore, we develop tensor-free, equivalent versions that can be performed in an efficient and memory-saving manner. The main benefit is here that the tensor-free reformulations of the required tensorial operations are performed exactly without additional error. To improve the convergence and solution behaviour, we additionally propose a novel reweighting technique
The paper is organized as follows: In Section 2, we introduce the considered bilinear and quadratic inverse problems in more detail. The focus here lies on the bilinear setting since quadratic formulations are based on underlying bilinear structures. Based on the universal property of bilinear mappings and the nuclear norm heuristic, we then derive a relaxed convex minimization problem with linear lifted forward operator. To stabilize the lifted problem regarding noise and measurement errors, we additionally consider a Tikhonov approach.
In Section 3, we develop a proximal solver based on the first-order primal-dual method of Chambolle and Pock [CP11] to solve the lifted problem numerically. The primal-dual iteration is here only one explicit example and can be replaced by any other proximal method. In so doing, we obtain a singular value thresholding depending on the actual Hilbert spaces building the domain of the original problem. Although the tensorial lifting allows us to apply linear methods, the dimension of the relaxed minimization problem becomes tremendous.
To overcome this issue, we derive a tensor-free representation of the suggested algorithm. The efficient computation of the required singular value thresholding is here ensured by exploiting an orthogonal power iteration or, alternatively, an augmented Lanczos process, see Section 4. Moreover, in Section 5, we introduce a novel Hilbert space reweighting to promote low-rank iterations and solutions. We complete the paper with a numerical study, where we consider the masked Fourier phase retrieval, see Section 6.
The contribution of the paper is twofold: Firstly, we derive a novel tensor-free proximal algorithm based on a convex lifting and relaxation. Secondly, we introduce a novel phase retrieval technique, which solves high-dimensional instances of the masked phase retrieval problem. Moreover, our approach allows us to incorporate smoothness constraints or relation between different pixels to improve the convergence of the algorithm.
2 Convex liftings of bilinear and quadratic inverse problems
Bilinear and quadratic problem formulations arise in a wide range of applications in imaging and physics [SGG*+*09] like blind deconvolution [BS01, JR06], deautoconvolution [GH94, GHB*+*14, ABHS16], phase retrieval [DF87, Mil90, SSD*+*06], parallel imaging in MRI [BBM*+*04], and parameter identification in EIT [MS12]. Since we are mainly interested in computing a numerical solution, we restrict ourselves to finite-dimensional bilinear problems of the form
[TABLE]
where is a bilinear operator from into , and where , , and are real finite-dimensional Hilbert spaces equipped with some inner product. Simultaneously, we study finite-dimensional quadratic problems of the form
[TABLE]
A quadratic operator is here the restriction of an associate bilinear operator to its diagonal and is thus given by . Again, the Hilbert spaces and are finite dimensional. Without loss of generality, one may assume that the associated bilinear operator is symmetric, i.e. for all , , which can be enforced by considering . However, we will make no use of this restriction.
For the sake of simplicity, we use the following notation and write an arbitrary vector in the form , which may be interpreted as the coefficient vector with respect to some finite basis. The inner product of can now be stated in the form , where is some symmetric, positive definite matrix in , and where the inner products on the right-hand side denote the usual Euclidian inner product . Here labels the transposition of a vector or matrix. For the remaining spaces and , we proceed in the same manner with the associate matrices and , respectively.
Although we restrict ourselves to the real-valued setting, all following algorithms and statements remain valid for the complex-valued setting, where one considers sesquilinear mappings with , , and . Changing the notation to , replacing the property ‘symmetric’ by ‘Hermitian,’ and using the real part of the inner products, e.g. , where is the transposition and conjugation, all considerations translate one to one. In the complex case, the associate matrices , , and may also be complex-valued, Hermitian, and positive definite.
Inspired by PhaseLift [CESV13, CSV13], which exploits the solution strategy developed for matrix completion problems [CCS10, MGC11], we suggest to tackle the general bilinear and quadratic inverse problem () and () by convex liftings and relaxations. Our approach is here based on the so-called universal property of the tensor product with respect to bilinear mappings, see for instance [Rya02, Theorem 2.9]. In the finite-dimensional setting, the lifting may be stated as follows.
{Theorem}
[Bilinear lifting, [Rya02]] For every bilinear mapping , there exists a unique linear mapping such that .
In the finite-dimensional setting considered by us, the tensor product can be simply identified with the matrix space , where the rank-one tensor corresponds to the matrix . The universal property is also applicable to quadratic mappings, where the tensor product is restricted to the subspace of symmetric tensors which is isomorphic to the space of symmetric matrices.
{Corollary}
[Quadratic lifting] For every bounded quadratic mapping , there exists a unique linear mapping such that .
Proof 1**.**
*Since the lifting of the associate bilinear mapping is unique, the restriction yields a unique linear mapping with the asserted properties. *
Due to the uniqueness of the lifting, the bilinear inverse problem () and the quadratic inverse problem () are equivalent to the linear inverse problems
[TABLE]
where the additional constraint means that is positive semi-definite. Although the positive semi-definiteness of the lifted quadratic problem is not mandatory, it strongly reduces the space of possible solutions. The central benefit of these reformulations is the shift of the non-linearity of the forward operator into the rank-one constraint. Although the problem is now linear, we have to deal with an additional non-convex side condition.
In order to eliminate the non-linear constraint for bilinear operators, we first rewrite () into the rank minimization problem
[TABLE]
and then relax the non-convex objective function by replacing it with the nuclear or projective norm of the tensor . Depending on the Hilbert spaces and , this norm is defined by
[TABLE]
where the infimum is taken over all finite representations of the tensor . In so doing, we finally obtain the convex minimization problem
[TABLE]
i.e., with linear constraints.
Since as well as is a Hilbert space, the nuclear norm here coincides with the trace class norm or with the Schatten one-norm; so the nuclear norm is simply the sum of the corresponding singular values of the matrix with respect to the Hilbert spaces and , cf. for instance [Wer02, Satz VI.5.5].
{Lemma}
[Projective norm, [Wer02]] For , the projective norm is given by , where denotes the -th singular value and the rank of .
The main idea behind the nuclear norm heuristic is that the projective norm is the convex envelope of the rank on the unit ball with respect to the spectral norm. Usually, the nuclear norm heuristic empirically yields low-rank solutions of the matrix equation , see for example [CCS10, MGC11, RFP10] and references therein. If the lifted operator fulfills an appropriate restricted isometry property, one can rigorously prove that the bilinear inverse problem () and the relaxed nuclear norm minimization problem () are equivalent, see [RFP10, Theorem 3.3].
For quadratic inverse problems, we follow a similar approach. This means that we first rewrite () into a rank minimization problem over the positive semi-definite (symmetric) matrix cone. In order to obtain a convex minimization problem, we again replace the objective function by the projective norm. Since the singular values of a symmetric tensor coincide with the absolute value of its eigenvalues , the positive semi-definiteness may be incorporated into the projective norm by
[TABLE]
where the indicator function is equal to [math] for arguments in and otherwise; so the modified projective “norm” sums up the non-negative eigenvalues for positive semi-definite tensors and is infinity otherwise. To solve the quadratic inverse problem () numerically, we thus consider the convex minimization problem
[TABLE]
Up to this point, the given data have been known exactly. A first approach to incorporate noisy measurements into the inverse problems () and () could be to extend the subspace of exact solutions with respect to a supposed error level. More precisely, one may consider the minimization problems
[TABLE]
In other words, we minimize over all solutions that approximate the given noisy data with up to the error level .
Another approach is to incorporate the data fidelity of the possible solutions directly into the objective function. Following this approach, we may minimize a Tikhonov functional with projective norm regularization to solve () and (). In more detail, we consider the problems
[TABLE]
All proposed convex relaxations of the lifted problems () and () have in common that the minimization of the projective norm usually promotes low-rank tensors and, in the best case, yields a rank-one solutions. The later case forms our basic premise to derive numerical solutions.
{Assumption}
[Basic premise] Suppose that the solutions of the relaxation (), (), or () are at most rank-one tensors such that (approximately) solves the bilinear inverse problem (). Likewise, suppose that the solutions of the relaxations (), (), or () have at most rank one so that they (approximately) solve the quadratic inverse problem ().
3 Proximal algorithms for the lifted problem
To exploit the nuclear norm heuristic, we have to solve the minimization problem (), (), (), (), (), and () in an efficient manner. Looking back at the comprehensive literature about matrix completion [CR09, CCS10, MGC11], low-rank solutions of matrix equations [RFP10], and PhaseLift [CESV13, CSV13], there exists several numerical methods like interior-point methods for semi-definite programming, fixed point iterations, singular value thresholding algorithm, projected subgradient methods, and low-rank parametrization approaches.
In order to solve the lifted bilinear and quadratic inverse problems, we follow another approach. Let us first consider the actual structure of the six derived minimization problems in Section 2, which is given by
[TABLE]
where denotes the lifted bilinear or quadratic forward operator, with describes the data fidelity, and is the (modified) projective norm. Since the regularization mapping and, in some circumstances, the data fidelity mapping are non-smooth but convex functions, we may apply proximal first-order methods like the forward-backward splitting, the primal-dual method by Chambolle–Pock, the alternating directions method of multipliers (ADMM), the Douglas–Rachford splitting, and several variants of these and other algorithms, see for instance [CP16].
Although we can apply any of these algorithms, we exemplarily consider the forward-backward splitting [LM79, CW05]
[TABLE]
and the primal-dual method [CP11, Algorithm 1]
[TABLE]
with fixed parameters and . The details of these methods are given below.
First, the forward-backward splitting and the primal-dual method are originally defined for linear forward operators between finite-dimensional Hilbert spaces; so we have to equip the tensor product with a corresponding structure. In the following, we always assume that this structure arises from the inner product defined by
[TABLE]
see for instance [KR83, Section 2.6]. Using the matrices and defining the inner products of and , we can write the resulting inner product of the Hilbertian tensor product in the form
[TABLE]
where the inner products on the right-hand side denote the Hilbert–Schmidt inner product for matrices. For the quadratic setting, the symmetric Hilbertian tensor product is the related subspace embracing the symmetric tensors.
Next, the above stated methods are mainly based on concepts from convex analysis, which is reflected in the presuppositions; so the data fidelity mapping and, similarly, the regularization mapping have to be convex and lower semi-continuous. Commonly, a function on the real Hilbert space is called convex when
[TABLE]
for all and all , and lower semi-continuous when
[TABLE]
for all sequences in with . Since and in the relaxed minimization problems of Section 2 represent norms or indicator functions on closed convex sets, here the assumptions for the primal-dual method are always fulfilled. The forward-backward splitting additionally requires a differentiable data fidelity with Lipschitz-continuous derivative; so this method can only be applied to the Tikhonov relaxations.
For the primal-dual iteration (3), the first proximity operator is computed with respect to the Legendre–Fenchel conjugate . For any function , where denotes an arbitrary Hilbert space, the Legendre–Fenchel conjugate is defined by
[TABLE]
and is always convex and lower semi-continuous, see [Roc70]. If the function is lower semi-continuous and convex, the subdifferential at a certain point is given by
[TABLE]
and figuratively consists of all linear minorants, see [Roc70]. Finally, the proximation of a lower semi-continuous, convex function is defined as the unique minimizer
[TABLE]
Using the subdifferential calculus, one can show that the proximation coincides with the resolvent, i.e.
[TABLE]
see for instance [Roc70, CP16].
The most crucial step in the forward-backward splitting (2) and the primal-dual iteration (3) is the application of the proximal projective norm , whereas the computation of proximal conjugated data fidelity is usually much simpler. To determine the proximal projective norm explicitly, we exploit the singular value decomposition of the argument with respect to the underlying Hilbert spaces and , which can be derived by an adaption of the classical singular value decomposition for matrices with respect to the Euclidian inner product.
{Lemma}
[Singular value decomposition] Let be a tensor in . The singular value decomposition of with respect to and is given by
[TABLE]
where is the classical singular value decomposition of with respect to the Euclidian inner product.
Remark \theDefinition.
Unless stated otherwise, the square roots and are taken with respect to the factorizations
[TABLE]
Allowing also non-symmetric but invertible factorizations, the root and are here non-unique. Possible candidates are the symmetric positive definite square root or the Cholesky decomposition of and . In the following, the roots are solely required to derive the proximal projective norm mathematically. Their actual computation is not necessary in the final tensor-free algorithm.
Proof 2** (§3).**
By assumption the possibly non-symmetric square roots and are invertible. Considering the classical Euclidian singular value decomposition of the matrix , we immediately obtain
[TABLE]
The last arrangement may be easily validated by using the matrix notation of the rank-one tensor . Due to the identity
[TABLE]
*for all , the singular vectors form an orthonormal system with respect to as well as their counterparts with respect to . *
Remark \theDefinition.
The singular value decomposition can also be interpreted as a matrix factorization of the tensor . In this case, we have the factorization
[TABLE]
where is the classical Euclidian singular value decomposition of with the left singular vectors , the right singular vectors , and the singular values .
For the quadratic setting, we will rely on the eigenvalue decomposition instead of the singular value decomposition.
{Corollary}
[Eigenvalue decomposition] Let be a tensor in . The eigenvalue decomposition of with respect to is given by
[TABLE]
where is the classical eigenvalue decomposition of with respect to the Euclidian inner product.
Proof 3**.**
*Due to the close relation between singular value decomposition and eigenvalue decomposition for symmetric matrices, the assertion follows from §3 with . *
With the adaption in §3, we can apply any numerical singular value method to compute the singular value decomposition of a given tensor. The next ingredient is the subdifferential of the nuclear norm based on and with respect to . In the following, the set-valued signum function is defined by
[TABLE]
{Lemma}
[Subdifferential] Let be a tensor in . Then the subdifferential of the projective norm at with respect to is given by
[TABLE]
where is a valid singular value decomposition of with respect to and .
Proof 4**.**
The central idea to compute the subdifferential is to rely on the corresponding statement for the Euclidian setting in [Lew95]. More precisely, if and are equipped with the Euclidian inner product, then the subdifferential with respect to the Hilbert–Schmidt inner product on is given by
[TABLE]
where is an Euclidian singular value decomposition of , see [Lew95, Corollary 2.5]. The upper bound is here some number less than or equal to , and the singular value decomposition of may contain zero as singular value.
Next, we adapt this result to our specific case. Therefore, we exploit that the projective norm is the sum of the singular values. Using §3, we notice that the generalized projective norm of a tensor is given by
[TABLE]
where the norm on the right-hand side is the usual projective norm with respect to the Euclidian inner product. In order to consider the inner product of in the subdifferential, we exploit that
[TABLE]
if and only if
[TABLE]
where the inner product on the right-hand side is the usual Hilbert–Schmidt scalar product for matrices, see (4). Thus, the subdifferential with respect to the scalar product can be expressed in terms of by
[TABLE]
*Plugging (5) into (6), and using the chain rule, we obtain the assertion. *
With the characterization of the subdifferential, we are ready to determine the proximity operator for the projective norm with respect to the underlying Hilbert spaces and . In the following, the soft-thresholding operator with respect to the level is defined by
[TABLE]
{Theorem}
[Proximal projective norm] Let be a tensor in . The proximation of the projective norm is given by
[TABLE]
where is a singular value decomposition of with respect to the Hilbert spaces and .
Proof 5**.**
In order to establish the statement, we only have to convince ourselves that is the resolvent for the given , i.e. . Since is already represented by its singular value decomposition, Proof 3 implies
[TABLE]
*If , the related summand becomes . Otherwise, the summand is with . Since the singular value decomposition of obviously has this form, the proof is completed. *
Remark \theDefinition (Singular value thresholding).
The proximation of the projective norm with respect to and is simply a soft thresholding of the corresponding singular values. In the following, we denote the matrix-valued operation
[TABLE]
as (soft) singular value thresholding .
The proximation associated to the modified projective norm used in the quadratic setting may be computed analogously. As preliminary step, we determine the subdifferential of the modified norm with respect to the symmetric Hilbertian tensor product. In a nutshell, we have simply to replace the set-valued signum function by the modified signum defined by
[TABLE]
Since the truncated absolute value is not an absolutely symmetric function, one cannot apply the subdifferential characterization in [Lew96]; there we compute the subdifferential directly.
{Lemma}
[Subdifferential] Let be a positive semi-definite tensor in . The subdifferential of the modified projective norm at with respect to is given by
[TABLE]
where is an eigenvalue decomposition of with respect to . If is not positive semi-definite, then the subdifferential is empty.
Proof 6**.**
Similarly to Proof 3, we rely on the corresponding statement for the Euclidian setting in [Lew99]. If we endow with the Euclidian inner product, then the subdifferential of the modified projective norm is determined by
[TABLE]
*where is a eigenvalue decomposition of , see [Lew99, Theorem 6]. The transformation to an arbitrary Hilbert space works exactly as in the proof of Proof 3. *
Together with the positive soft-thresholding operator defined by
[TABLE]
the computed subdifferential leads us to the following proximity operator.
{Theorem}
[Proximal projective norm] Let be a tensor in . Then the proximation of the modified projective norm is given by
[TABLE]
where is an eigenvalue decomposition of with respect to .
Proof 7**.**
*The assertion follows similarly to Proof 4 by replacing the subdifferential in Proof 3 by Proof 5 and the singular value decomposition by an eigenvalue decomposition. *
Remark \theDefinition.
Analogously to the tensor-valued singular value thresholding operator , we define the positive eigenvalue thresholding operator , which additionally projects the argument to the positive semi-definite cone, by
[TABLE]
Knowing the proximation of the (modified) projective norm, we are now able to perform proximal algorithms to solve the minimization problem in Section 2. Although we can use any of the mentioned method, here we restrict ourselves the primal-dual iteration (3). First, we consider the bilinear minimization problems (), (), and ().
For exactly given data corresponding to the minimization problem (), the data fidelity functional corresponds to with . A simple computation shows that the conjugate is given by and the associate proximal mapping by
[TABLE]
Thus, we obtain the following algorithm.
{Algorithm}
[Primal-dual for exact data]
- (i)
Initiation: Fix the parameters and . Choose an arbitrary start value in , and set to . 2. (ii)
Iteration: For , update , , and by
[TABLE]
Remark \theDefinition.
If the projective norm in () is weighed with a parameter in order to control the influence of the data fidelity and the regularization, cf. (), then the iteration in Proof 7 changes slightly. More precisely, one has to replace by .
For inexact data , we first consider the Tikhonov minimization (), whose data fidelity corresponds to . Here the conjugate is given by with subdifferential . Again a simple computation leads to the proximation
[TABLE]
which yields the following algorithm.
{Algorithm}
[Tikhonov regularization]
- (i)
Initiation: Fix the parameters and . Choose an arbitrary start value in , and set to . 2. (ii)
Iteration: For , update , , and by
[TABLE]
Remark \theDefinition.
Since the data fidelity for the Tikhonov functional is differentiable, one may here apply the forward-backward splitting as an alternative for the primal-dual iteration. In so doing, the whole iteration in Proof 7.ii becomes
[TABLE]
Analogously, one can here apply FISTA [BT09] in order to improve the convergence.
If we incorporate the measurement errors by extending the solution space as in (), then the data fidelity is chosen by , where denotes the closed unit ball in the Hilbert space , and is the indicator functional of the closed -ball in , i.e., if and otherwise. Since the conjugation of the unit ball yields the corresponding norm, we obtain with subdifferential
[TABLE]
cf. [RW09, Exercise 8.27]. Since the proximation is not as simple as in the previous cases, we give a more detailed computation.
{Lemma}
[Proximity operator] Let the functional be defined by . The proximation of is then given by
[TABLE]
Proof 8**.**
The vector is the resolvent if and only if
[TABLE]
which is an immediate consequence of (7). Bringing to the left-hand side, we are looking for a such that
[TABLE]
For , the last condition is fulfilled for . Otherwise, it follows that for some . With the notation , the first condition becomes
[TABLE]
*Since , we obtain , and consequently, the assertion. *
Remark \theDefinition.
The central part of the resolvent in Proof 7 is given by the operator with
[TABLE]
Pictorially, this operator may be interpreted as shrinkage or contraction around the origin.
After this small digression to compute the proximation of the conjugated data fidelity, the minimization problem () may be solved by the following primal-dual iteration.
{Algorithm}
[Primal-dual for inexact data]
- (i)
Initiation: Fix the parameters and . Choose an arbitrary start value in , and set to . 2. (ii)
Iteration: For , update , , and by
[TABLE]
The weighing between data fidelity and regularization in §3 analogously holds for Proof 8.
The central differences between the primal-dual iterations in Proof 7, 7, and 8 for the minimization problems (), (), and () are contained in the dual update of . If the parameter is chosen close to zero, the three iterations nearly coincide. Thus, all three iterations should yield similar results; so Proof 7 should also be able to deal with noisy measurements.
Remark \theDefinition.
For the relaxations (), (), and () of the quadratic inverse problem (), we can derive analogous algorithms. We do not state these variants more closely since the only differences are the application of the eigenvalue thresholding instead of the singular value thresholding and the usage of the quadratic lifting instead of the bilinear lifting . Up to these two small modifications the methods completely coincide with the derived algorithms for bilinear inverse problems.
4 Tensor-free singular value thresholding
Each of the proposed methods solving bilinear or quadratic inverse problems is based on a singular value thresholding on the tensor product or respectively. If the dimension of the original space or is already enormous, then the dimension of the tensor product literally explodes, which makes the computation of the required singular value decomposition impracticable. This difficulty occurs for nearly all bilinear and quadratic image recovery problems. However, since the tensor is generated by a singular value thresholding, the iterations usually possesses a very low rank. Hence, the involved tensors can be stored in an efficient and storage-saving manner. In order to determine this low-rank representation, we only compute a partial singular value decomposition of the argument of by deriving iterative algorithms only requiring the left- and right-hand actions of .
Our first algorithm is based on the orthogonal iteration with Ritz acceleration, see [Ste69, GV13]. In order to compute the leading singular values, the main idea is here a joint power iteration over two -dimensional subspaces and alternately generated by and . These subspaces are represented by orthonormal bases in and in .
{Algorithm}
[Subspace iteration] Input: , , .
- (i)
Choose , whose columns are orthonormal with respect to . 2. (ii)
For , repeat:
- (a)
Compute , and reorthonormalize the columns in . 2. (b)
Compute , and reorthonormalize the columns in . 3. (c)
Determine the Euclidian singular value decomposition
[TABLE]
and set and .
until singular vectors have converged, which means
[TABLE]
where denotes the operator norm estimated by .
Output: , , with .
Here, reorthonormalization means that for each applicable , the span of the first columns of the matrix and its reorthonormalization coincide, and that the reorthonormalized matrix has orthonormal columns. This can, for instance, be achieved by the well-known Gram–Schmidt procedure.
Under mild conditions on the subspace associated to , the matrices , , and converge to leading singular vectors as well as to the leading singular values of a singular value decomposition .
{Theorem}
[Subspace iteration] If none of the basis vectors in is orthogonal to the leading singular vectors , and if , then the singular values in §4 converge to with a rate of
[TABLE]
Proof 9**.**
By the construction in steps (a) and (b), the columns in and form orthonormal systems in and . In this proof, we denote the corresponding subspaces by and , which are related by and . Due to the basis transformation in (c), the columns of and also form orthonormal bases of and . Next, we exploit that the projection onto acts as identity on by construction. Since is a basis of , and since by the singular value decomposition in step (c), we have
[TABLE]
and diagonalizes on the subspace .
Using the substitutions
[TABLE]
as well as
[TABLE]
we notice that the iteration in §4 is composed of two main steps. First, in (a) and (b), we compute an orthonormal basis of
[TABLE]
Secondly, (8) implies that we determine an Euclidian eigenvalue decomposition on the subspace by
[TABLE]
and .
*This two-step iteration exactly coincides with the orthogonal iteration with Ritz acceleration for the matrix , see [Ste69, GV13]. Under the given assumptions, this iteration converges to the leading eigenvalues and eigenvectors with the asserted rates. In view of §3, the columns in and together with converge to the leading components of the singular value decomposition of with respect to and . *
Considering the subspace iteration (§4), notice that the algorithm does not need an explicit representation of its argument but the left- and right-hand actions of as a matrix-vector multiplication. We may thus use the subspace iteration to compute the singular value thresholding without a tensor representation of .
{Algorithm}
[Tensor-free singular value thresholding] Input: , , , .
- (i)
Apply §4 with the following modifications:
- •
If for all , increase and extend by further orthonormal columns, unless , i.e., when the columns of would become linearly dependent.
- •
Additionally, stop the subspace iterations when the first singular values with have converged and . Otherwise, continue the iteration until all non-zero singular values converge and set . 2. (ii)
Set , , and
[TABLE]
Output: , , with .
{Corollary}
[Exact singular value thresholding] If the non-zero singular values of are distinct, and if none of the columns in is orthogonal to the singular vectors with , then Proof 9 computes the low-rank representation of .
Although Proof 9 for generic start values always yields the singular value thresholding, the convergence of the subspace iteration is rather slow. Therefore, we now derive an algorithm that is based on the Lanczos-based bidiagonalization method proposed by Golub and Kahan in [GK65] and the Ritz approximation in [GV13]. This method again only require the left-hand and right-hand action of with respect to a given vector. For simplifying the following considerations, we initially present the employed Lanczos process with respect to the Euclidian singular value decomposition.
The central idea is here to construct, for fixed , orthonormal matrices and such that the transformed matrix
[TABLE]
is bidiagonal, and then to compute the singular value decomposition of by determining orthogonal matrices , , and in such that
[TABLE]
Defining and as
[TABLE]
we finally obtain a set of approximate right-hand and left-hand singular vectors, see [GK65, BR05, GV13].
The values and of the bidiagonal matrix and the related vectors and can be determined by the following iterative procedure [GK65]: Choose an arbitrary unit vector with respect to the Euclidian norm, and compute
[TABLE]
For the first iteration, we set and . If vanishes, then we stop the Lanczos process since we have found an invariant Krylov subspace such that the computed singular values become exact.
In order to compute an approximate singular value decomposition with respect to the Hilbert spaces and , we exploit §3 and perform the Lanczos bidiagonalization regarding the transformed matrix . Moreover, we incorporate the back transformation in §3 with the aid of the substitutions
[TABLE]
In this manner, the square roots and and their inverses cancel out, and we obtain the following algorithm, which only relies on the original matrices and .
{Algorithm}
[Lanczos bidiagonalization] Input: , .
- (i)
Initiation: Set and . Choose a unit vector . 2. (ii)
Lanczos bidiagonalization: For while , repeat:
- (a)
Compute , and reorthogonalize with in . 2. (b)
Determine , and set .
Compute and reorthogonalize with in . 3. (c)
Determine , and set . 3. (iii)
Compute the Euclidian singular value decomposition of according to (9), i.e. , and set and .
Output: , , with .
Remark \theDefinition.
The bidiagonalization by Golub and Kahan is based on a Lanczos-type process, which is numerically unstable in the computation of and . For this reason, we have to reorthogonalize all newly generated vectors and with the previously generated vectors, see [GK65]. This amounts to projecting to the orthogonal complement of the span of and the analog for , for instance, via the Gram–Schmidt procedure.
Remark \theDefinition.
The computation of the last seems to be superfluous since it is not needed for the determination of the matrix . On the other side, this vector represents the residuals of the approximate singular value decomposition. More precisely, we have
[TABLE]
for , see [BR05]. Here the vectors , , and denote the columns of the matrices , , and respectively; the singular values of are given by ; the vector represents the last unit vector .
Since the bidiagonalization method by Golub and Kahan is based on the Lanczos process for symmetric matrices, one can apply the related convergence theory to show that the approximate singular values and singular vectors – for increasing – converge to the wanted singular value decomposition of , see [GV13]. Since we are only interested in the leading singular values and singular vectors, and since we want to choose the matrix as small as possible, this convergence theory does not apply to our setting.
In order to improve the quality of the approximate singular value decomposition computed by Proof 9, we here use a restarting technique proposed by Baglama and Reichel [BR05]. The central idea is to adapt the Lanczos bidiagonalization such that the method can be restarted by a set of previously computed Ritz vectors. For this purpose, Baglama and Reichel suggest a modified bidiagonalization of the form
[TABLE]
where the first columns of the orthonormal matrices
[TABLE]
are predefined by the Ritz vectors of the previous iteration. For the computation of the first leading singular values and singular vectors, we employ the following algorithm [BR05], which has been adapted to our setting by incorporating §3 and the substitution (11).
{Algorithm}
[Augmented Lanczos Bidiagonalization] Input: , , .
- (i)
Apply Proof 9 to compute an approximate singular value decomposition . 2. (ii)
For , until singular vectors have converged, which means
[TABLE]
where denotes the operator norm estimated by , repeat:
- (a)
Initialize the new iteration by setting and for . Further, set and . 2. (b)
Compute , and reorthogonalize with in . 3. (c)
Determine , compute the inner products for , and reorthogonalize in by
[TABLE] 4. (d)
Set and . 5. (e)
Determine , and set . 6. (f)
Calculate the remaining values of by applying step (ii) of Proof 9 with . 7. (g)
Compute the Euclidian singular value decomposition of in (12), i.e. , and set and . 3. (iii)
Set , , and .
Output: , , with .
Remark \theDefinition.
The stopping criterion in step (ii) originates from the error representation in (11). For the operator norm , one may use the maximal leading singular values of the former iterations, which usually gives a sufficiently good approximation, see [BR05].
Although the numerical effort of the restarted augmented Lanczos process is enormously reduced compared with the subspace iteration, we are unfortunately not aware of a convergence and error analysis for this specific variant of Lanczos-type method. Nevertheless, we can employ the obtained partial singular value decomposition to determine the singular value thresholding.
{Algorithm}
[Tensor-free singular value thresholding] Input: , , , , .
- (i)
Apply Proof 9 with the following modifications:
- •
If for all , increase and with , unless , i.e., when in Proof 9 vanishes.
- •
Additionally, stop the augmented Lanczos method when the first singular values with have converged and . Otherwise, continue the iteration until all non-zero singular values converge and set . 2. (ii)
Set , , and
[TABLE]
Output: , , with .
Remark \theDefinition (Tensor-free eigenvalue thresholding).
Using the relation between eigenvalues and singular values, we apply §4 and 9 in the same manner to compute the positive eigenvalue thresholding. More precisely, with and , we obtain the eigenvalue decomposition from the singular value decomposition, i.e.,
[TABLE]
Analogously, we can transfer Proof 9 and 9 to the quadratic setting.
Besides the singular value thresholding, the proximal methods in Section 3 to solve the lifted and relaxed bilinear and quadratic problems in Section 2 require the application of the lifted operators and as well as their adjoints and . Both operations can be computed in a tensor-free manner. Assuming that has a low rank, one may compute the lifted bilinear forward operator with the aid of the universal property in §2.
{Corollary}
[Tensor-free bilinear lifting] Let be a bilinear mapping. If has the representation with , , and , then the lifted forward operator acts by
[TABLE]
Likewise, one may apply the lifted quadratic forward operator by using §2.
{Corollary}
[Tensor-free quadratic lifting] Let be a quadratic mapping. If has the representation with , , then the lifted forward operator acts by
[TABLE]
Considering the proximal methods, we see that the adjoint lifting only occurs in the argument of the singular value thresholding. If one applies the subspace iteration or the augmented Lanczos process, it is hence enough to study the left-hand and right-hand actions of the adjoint liftings. In the bilinear setting, these actions can be expressed by the left-hand or right-hand adjoint of the original bilinear mapping .
{Lemma}
[Tensor-free adjoint bilinear lifting] Let be a bilinear mapping. The left-hand and right-hand actions of the adjoint lifting with are given by
[TABLE]
for and .
Proof 10**.**
Testing the right-hand action of the image on with an arbitrary vector , we obtain
[TABLE]
*The left-hand action follows analogously. *
An similar observation holds for the quadratic setting, where the adjoint is taken with respect to the symmetric subspace . The associate bilinear mapping to is again denoted by .
{Lemma}
[Tensor-free adjoint quadratic lifting] Let denote a quadratic mapping. The action of the (symmetric) adjoint lifting with is given by
[TABLE]
for .
Proof 11**.**
Similarly to before, we test the action of the image on with an arbitrary vector . Exploiting the symmetry, we obtain
[TABLE]
Remark \theDefinition (Composed tensor-free adjoint lifting).
Since the left-hand and right-hand actions of the tensor are simply given by
[TABLE]
the right-hand action of the singular value thresholding argument within the proximal methods in Section 3 is given by
[TABLE]
and the left-hand action by
[TABLE]
where . Analogously, the actions in the quadratic setting are given by
[TABLE]
where .
Now we are ready to rewrite the proximal methods in Section 3 into tensor-free variants. Exemplarily, we consider the primal-dual method for bilinear operators and exact data, see Proof 7.
{Algorithm}
[Tensor-free primal-dual for exact data]
- (i)
Initiation: Fix the parameters and . Choose the start value in , and set to . 2. (ii)
Iteration: For , update and :
- (a)
Using the tensor-free computations in Proof 9, determine
[TABLE] 2. (b)
Compute a low-rank representation of the singular value threshold
[TABLE]
with Proof 9 (or 9). The required actions are given in (15) and (16).
Remark \theDefinition.
As starting value for the augmented Lanczos bidiagonalization according to Algorithm 9 required for step (ii.b) of Algorithm 11, we suggest a linear combination of the right-hand singular vectors of the previous iteration in the hope that they are good approximations of the new singular vectors.
Adapting the computation of , one may analogously apply Proof 7 and 8 in a complete tensor-free manner. Using Proof 9, §4, and Equation (17), we obtain the corresponding tensor-free proximal methods for quadratic inverse problems.
Because the singular value thresholding can be computed with arbitrary high accuracy, the convergence results for the primal-dual algorithm translates to our setting. The convergence analysis [CP11, Theorem 1] yields the following convergence guarantee, where the norm of the bilinear operator is defined by
[TABLE]
{Theorem}
[Convergence] Under Proof 1 and the parameter choice rule as well as , the iteration in Proof 11 converges to a point , where corresponds to a solution of the bilinear inverse problem ().
Proof 12**.**
For the general minimization problem (1), the related saddle-point problem is given by
[TABLE]
cf. [CP11]. Hence, the bilinear relaxation with exact data () corresponds to the primal-dual formulation
[TABLE]
Due to [Roc70, Theorem 28.3], the first components of the saddle-points of (19) are solutions of (). Vice versa, [Roc70, Corollary 28.2.2] implies that the solutions of () are saddle-points of (19). In particular, the saddle-point problem (19) has at least one solution since the given data are exact.
*Now, [CP11, Theorem 1] yields the convergence of the primal-dual iteration in Proof 11, where the limit denotes a saddle-point of (19), and thus a solution of (). Under Proof 1, this solution has at most rank one; so every with is a solution of the bilinear problem (). *
Similar convergence guarantees can be obtained for the bilinear relaxations () and () as well as for the quadratic variants (), (), and (). Depending on the considered problem – the bilinear or quadratic forward operator – and on the applied proximal algorithm, one may even obtain explicit convergence rates.
5 Reducing rank by Hilbert space reweighting
The projective norm heuristic usually ensures that the solutions of the relaxed lifted minimization problems in Section 2 have low rank, but how may we decrease the rank of the iteration even further to speed up the convergence, and how can we obtain meaningful solutions if Proof 1 is not fulfilled?
If we look back at the employed methods in compressed sensing, where one like to recover a sparse vector from a set of linear measurements, one idea to improve the sparsity of the solution is to reweight the -norm. Since the nuclear norm coincides with the -norm of the singular values, one can try to incorporate this reweighting approach, which however is not possible. The main reason here is that the singular values do not have a special ordering. If a tensor, for instance, possesses a multiple singular value, then the reweighting will be ambiguous.
Instead of reweighting the projective norm itself, we propose to modify the Hilbert norms of the underlying spaces and , where we initially consider the bilinear setting. From a heuristic point of view, we may interpret the leading singular value together with corresponding singular vectors of the variable as an approximate solution of the inverse problem (); so we may adapt the inner products of and in a way that promote vectors in this direction.
More generally, we want to reweight the norms of and with respect to some orthonormal bases, which, for instance, result from the singular value decomposition of the primal variable . In the following, we only consider the reweighting of . The reweighting of can be done completely analogously. If denotes an arbitrary orthonormal basis of , then Parseval’s identity states
[TABLE]
In other words, the matrix corresponding to the inner product on can be written in the form , which incidentally shows . To reweight the -norm (20) with respect to the basis , we introduce the weights and the adapted norm defined by
[TABLE]
In so doing, we obtain the updated inner product matrix with inverse . Depending on the weights, some directions are more promoted or penalized than others.
Unfortunately, for large-scale bilinear inverse problems, the proposed approach is impractical since we have to store a complete orthonormal basis. To overcome this disadvantages, we make a slightly refinement of our approach, where we only reweight a small set of basis vectors that we want to promote; so we choose the weights as with for and otherwise, and thus make the approach
[TABLE]
with . The inverse is here given by
[TABLE]
Hence, to update the inner product matrices, we only require the original matrices and , the (transformed) promoted vectors and , and the weights . For the second Hilbert space , we proceed completely analogously.
The reweighting of the Hilbert spaces has consequences for the proposed algorithms. On the one hand, notice that the adjoint of the lifted operator directly depends on the actual Hilbert spaces and . In order to update the adjoint, we first determine the standardized adjoint with respect to the Hilbert-Schmidt and Euclidian inner product. Afterwards, we transform this adjoint to the actual spaces by the following lemma.
{Lemma}
[Adjoint operator] The adjoint operator with respect to the Hilbert spaces and is given by
[TABLE]
where denotes the adjoint with respect to Hilbert–Schmidt and Euclidian inner product.
Proof 13**.**
The assertion immediately follows from
[TABLE]
*for all and all . *
On the one side, §5 allows us to transform the adjoint , which can usually be determined more easily, to the Hilbert spaces , , and . On the other side and more generally, we may transform the adjoint lifted operator between arbitrary Hilbert space structures. For our specific setting in (22) and (23), for instance, we obtain the following transformation rule, where denotes the transformation matrix
[TABLE]
with the identity . The transformation is defined analogously.
{Corollary}
[Adjoint operator] The adjoint operator with respect to the Hilbert spaces and is given by
[TABLE]
Proof 14**.**
*Apply §5 two times to transform the adjoint firstly from to and, secondly, from to . *
The main benefit of Proof 13 compared with §5 is that the transformation can be done without involving inverse matrices. On the other side, one relies on an efficient and direct implementation of for the unweighted spaces. Remember that this adjoint may be determined by Proof 9. The required actions to compute the singular value threshold regarding the reweighted spaces then have the following form.
{Corollary}
[Tensor-free adjoint bilinear lifting] The left-hand and right-hand actions of the reweighted adjoint with are given by
[TABLE]
for and .
Proof 15**.**
The assertion follows from Proof 13 and Equation (24) by
[TABLE]
*The second identity follows completely analogously. *
Thanks to the above transformations, we can expand the tensor-free primal-dual iteration for exact data in Proof 11 by an efficient reweighting step, which we perform every -th iteration. If the set of promoted directions and for and is empty, we perform the unweighted algorithm with and hence as well as .
In order to avoid a recursive reweighting, we always reweight the original Hilbert spaces. Thus, if we want to promote the leading singular vectors of , we first have to compute the singular value decomposition with respect to underlying, original Hilbert spaces and . Based on the tensor-free characterization obtained form the (weighted) singular value thresholding, this decomposition can be computed by §4 or Proof 9, where the required actions are simply given by (13) and (14). Note that the involved inner products are here again the unweighted versions.
After a reweighting step, the singular value thresholding must be computed with respect to the new weight. For this purpose, we adapt the actions in (15) and (16) by Proof 14. In so doing, for , we obtain the right-hand action
[TABLE]
and the left-hand action
[TABLE]
The definition of the transformations and is given in (24). The associated matrix in (22) leads to the inner product
[TABLE]
where is the number of promoted directions. For , we obtain a similar representation.
{Algorithm}
[reweighted tensor-free primal-dual for exact data]
- (i)
Initiation: Fix the parameters , , , and . Choose the start value in , and set to . Starting without weights, i.e. and . 2. (ii)
Iteration: For , update and :
- (a)
Using the tensor-free computations in Proof 9, determine
[TABLE] 2. (b)
Compute a low-rank representation of the singular value threshold
[TABLE]
with Proof 9. The required actions are given by (25) and (26). 3. (c)
Every iteration, re-weight the Hilbert spaces:
- •
When , set and .
- •
Otherwise, use Proof 9 to compute the (unweighted) singular value decomposition , i.e., with respect to , where the needed actions are given in (25) and (26).
- Set for , and as well as .
Analogously, one can apply the reweighting technique in Proof 15.ii.c to Proof 7 and 8. All observations and heuristics in this section also hold true or may be easily translated for the quadratic setting.
6 Masked Fourier phase retrieval
In this section, we apply the developed algorithm to the phase retrieval problem. Generally, the phase retrieval problem consists in the recovery of an unknown signal from its Fourier intensity. Problems of this kind occur, for instance, in crystallography [Mil90, Hau91], astronomy [BS79, DF87], and laser optics [SST04, SSD*+*06]. To be more precise, in the following, we consider the two-dimensional masked Fourier phase retrieval problem [LCL*+*08, CSV13, CESV13, CLS15, GKK17], where the true signal is firstly pointwise multiplied with a set of known masks and afterwards transformed by the two-dimensional -point Fourier transform defined by
[TABLE]
Denoting by the Hadamard (or pointwise) product, the masked Fourier phase retrieval problem can be stated as follows.
{Problem}
[Masked Fourier phase retrieval] Recover the unknown complex-valued image from the masked Fourier intensities with .
In general, phase retrieval problems are ill-posed due to the loss of the phase information in the frequency domain. In one dimension, the problem usually possesses an enormous set of non-trivial solutions, which heavily differ from the true signal, see for instance [BP15] and references therein. In the two-dimensional setting considered by us, the situation changes dramatically since here almost every signal can be uniquely recovered up to a global phase and up to reflection and conjugation, see [HM82, Hay82, BBE17].
Before considering some numerical simulations, we study the quadratic nature of the masked Fourier phase retrieval problem, where we employ the complex notation as mentioned in the introduction of Section 2. For this purpose, we interpret both, the domain and the image of the measurement operator in §6, as real Hilbert spaces. To simplify the notation, we vectorize the unknown image and the given Fourier intensities for a fixed mask columnwise. Henceforth, the vectorized variables are labeled with . On the Fourier side, we additionally attach the measurements for different masks to each other. Thus, the domain of the measurement operator in §6 becomes and the image . At the moment, the endowed real inner product is not specified in detail.
In order to derive an explicit representation of the (vectorized) forward operator, we write the two-dimensional -point Fourier transform as
[TABLE]
with the Fourier matrices
[TABLE]
where denotes the Kronecker product of two matrices. For the vectorized version of the Fourier transform , we henceforth use the notation .
In the same manner, we write the pointwise multiplication as matrix vector multiplication , where denotes the matrix with diagonal . Combining the interference with the given masks into one operator, we define the matrix
[TABLE]
The action thus attaches the single masked signals to each other.
Composing the two operations, and squaring the measurements, we notice that §6 is equivalent to
[TABLE]
where denotes the identity matrix, and the vectorized exact squared, masked Fourier intensities of the looked for signal. The associate bilinear operator of the quadratic forward operator in () is now given by
[TABLE]
where the function extracts the diagonal of a matrix. Since the last right-hand side is linear in or linear in for , the quadratic lifting has to be
[TABLE]
The last missing ingredient in order to apply our proximal algorithms is the action of the adjoint for a fixed .
{Lemma}
[Tensor-free adjoint lifting] If the Hilbert spaces and are endowed with the real Euclidian inner product, i.e. and , then the action of the adjoint operator with fixed is given by
[TABLE]
for .
Proof 16**.**
We compute the action of the adjoint operator with the aid of Proof 10. For this purpose, we first determine the left adjoint of by considering
[TABLE]
for all and fixed . For the right adjoint, we analogously obtain
[TABLE]
*where diagonal in the middle is not conjugated. Summation of the left and right adjoint as in Proof 10 yields the assertion. *
Remark \theDefinition.
Using §5, we can now transform the actions of the Euclidian adjoint to our actual Hilbert spaces. More precisely, we have
[TABLE]
One of the central reasons to choose the masked Fourier phase retrieval problem as application of the proposed algorithms and heuristics is that the phase retrieval problem (), although severely ill posed, behaves nicely under convex relaxation. More precisely, one can show that under certain conditions the solution of the convex relaxed problem
[TABLE]
is unique and coincides with the true lifted solution with high probability, see [CSV13, CESV13, CLS15, GKK17]. Therefore, we expect that the proposed tensor-free proximal algorithms converge to a unique rank-one solution.
6.1 Effects of bidiagonalization and reweighting
In the first numerical example, we want to study the effect of the applied augmented Lanczos bidiagonalization and of the reweighting heuristic to the computation time and the convergence behavior. For all simulations in this paper, the proposed methods have been implemented in MATLAB® (R2017a, 64-bit) and are performed using an Intel© Core™ i7-4790 CPU (43.60 GHz) and 32 GiB main memory. The employed true two-dimensional signal consists in a synthetic image referring to transmission electron microscopy experiments with nanoparticles. The test image is rather small and is composed of pixels. The corresponding tensor is already of dimension .
Based on the true image, we compute synthetic and noise-free data by applying the masked Fourier transform. The entries of the eight employed masks have been randomly generated with respect to independent Rademacher random variables. More precisely, the entries of the masks are distributed with respect to the model
[TABLE]
Masks of this kind have been studied in [CLS15, GKK17] in order to de-randomize the generic phase retrieval problem considered in [CSV13, CESV13]. In our experiment, we employ the -point Fourier transform such that the complete autocorrelation of the masked signals is encoded in the given Fourier intensities. A first reconstruction of the true signal based on Proof 7 is shown in Figure 1, where we compute the singular value threshold with the aid of a full singular value decomposition of the tensor . For the involved Hilbert spaces and , we simply choose the corresponding Euclidian spaces. In other words, we employ the Hilbert–Schmidt inner product for the (vectorized) matrices in .
Although the reconstruction is quite accurate, the main drawbacks of a direct application of Proof 7 are the computation time and memory requirements. For 1 000 iterations, we need about 6.38 minutes to recover the true signal. Next, we employ the tensor-free variant of the primal-dual iteration in Proof 11, where we apply the augmented Lanczos bidiagonalization to determine the singular value thresholding (Proof 9). Using this modification, we merely need about 58 seconds to perform the reconstruction. Since we can control the accuracy of the partial singular value decomposition, the performed iteration essentially coincides with the original iteration.
The influence of the augmented Lanczos bidiagonalization on the computation time is presented in Table 1. The parameter here describes the maximal size of the bidiagonal matrix . Further, denotes the number of fixed Ritz pairs in the augmented restarting procedure. Since the approximation property of the incomplete Lanczos method becomes worse for small , we require more restarts in order to observe an accurate partial singular value decomposition. For higher-dimensional input images, the time saving aspect becomes much more important.
Considering the evolution of the non-zero singular values of during the iteration, see Table 2, we observe that the projective norm heuristic here enforces a very low rank. After 500 iterations, we nearly obtain a rank-one tensor such that the additional reconstruction error caused by the rank-one projection of to extract the recovered image becomes negligible. After 2 000 iterations, the tensor has converged to a rank-one tensor.
In order to promote the rank-one solutions of the masked Fourier phase retrieval problem even further, we next exploit the reweighting approach proposed in Section 5. For our current simulation, this means that we replace the inner product matrices by the modified matrices defined in (22), where we build up the basis from the singular value decomposition of . In the Euclidian setting considered in this simulation, the transformed directions simply coincide with . The reweighting is here applied every ten iterations with relative weight .
The results of the tensor-free masked Fourier phase retrieval with augmented Lanczos process and Hilbert space reweighting (Proof 15) are shown in Figure 2. Although the reconstruction looks comparable, we want to point out that the absolute errors are several magnitudes smaller. If we compare the evolution of the ranks, see Figure 3, we can see that the proposed reweighting heuristic reduces the rank quite efficiently. More precisely, most of the iterations have rank one. Due to this reduction, the reweighting has also a positive effect on the computation time and the average number of restarts of the Lanczos process, see Table 1. Moreover, we may notice that the data fidelity term decreases with a higher rate. Here the convergences stops after about 650 iterations due to numerical reasons.
6.2 Incorporating smoothness properties
One of the central difference between our tensor-free reweighting algorithm and PhaseLift [CSV13, CESV13] consists in the modeling of the domain and image space of the phase retrieval problem. Where PhaseLift is based on the standard Euclidian setting, we rely on arbitrary discrete Hilbert spaces and . Especially, in two-dimensional phase retrieval, we may thus exploit relationships between neighbored pixels like finite differences. More precisely, we here study the influence of the a-priori smoothness property formulated in terms of the two-dimensional discretized Sobolev norm.
In order to discretize the (weighted) Sobolev space , we employ the forward differences
[TABLE]
to approximate the first partial derivatives. The associate linear mappings for the vectorized image are here denoted by and . Thus the weighted Sobolev space with weights corresponds to the matrix
[TABLE]
In comparison, the discretized -norm corresponds to the standard Euclidian setting associated with the identity matrix.
The masked Fourier intensities of the second example has been created on the basis of a transmission electron microscopy (TEM) reconstruction in [LSP*+*16a]. The image has a dimension of pixels such that the related tensor possesses complex-valued entries and requires 64 GiB memory (double precision complex numbers). Further, we apply four random masks of Rademacher-type (29). Since the masks are generated entirely random, about a one-sixteenth of the pixel are blocked by all masks. The test image together with the number of masks covering a certain pixel are shown in Figure 4.
To solve the corresponding masked phase retrieval problem, we apply Proof 15, where we reweight the Hilbert spaces every 10 steps with the relative weight . Since the algorithm tends to higher-rank tensors in the starting phase, we only compute a partial singular value decomposition with at most five leading singular values using ten Lanczos vectors and . Hence, we perform Proof 11 in an inexact manner. After a few iterations the rank of decreases such that the method becomes again exact, and that the convergence is ensured.
The reconstructions for the Euclidian and Sobolev setting are presented in Figure 5 and 6, respectively. Due to the small number of four masks, the convergence of the algorithm using the discretized -norm is very problematic. Although the method converges for the chosen parameters, the convergence rate is very low. Moreover, pixels that are not covered by any masks cannot be recovered and cause reconstruction defects characterized by black holes. Using instead the discretized Sobolev norm with weight and the same parameters, we obtain a much faster convergence and rank reduction, see Figure 7. Further, the required number of dual updates in order to produce a non-zero primal update is reduced. A small drawback is that the Sobolev norm tends to smooth out the edges of the particles in the reconstruction. One the other side, the a-priori smoothness condition allows us to recover pixels not covered by the given data.
6.3 Phase retrieval for large-scale images
Using the proposed reweighting heuristic to reduce the rank of the iteration , we are able to perform Proof 15 for much larger test images. In this numerical experiment, we consider an pixel image, whose Fourier data are again based on a TEM micrograph of gold nanoparticles [LSP*+*16a]. The lifted image here already requires 16 TiB memory in order to hold the complex-valued entries with double precision. Differently from the previous examples, we here apply eight Gaußian masks following the standard normal distribution
[TABLE]
The recovered signal for the Euclidian inner product is shown in Figure 8 together with the evolution of the rank and data fidelity in Figure 9. Analogously to the above experiments, the Hilbert space is reweighted every ten iterations with a relative weight .
6.4 Corruption by noise
In the last numerical example, we study the influence of noise to the proposed tensor-free primal-dual algorithm. For simplicity, we only study the behavior of the proposed method with respect to white or Gaußian noise of the form where is a normal distributed random vector. For the noise level , we consider different percentages of the norm . Similarly to the first numerical examples, we again apply four Rademacher-type masks of the form (29). The synthetic data for the test image are again based on a TEM reconstruction of gold nanoparticles [LSP*+*16b, Figure S1B] and the -point Fourier transform. The domain is again equipped with the discretized Sobolev norm with weight . Due to the noise, we have adapted Proof 15 to the Tikhonov regularization in Proof 7, which simply means that we have to multiply in step (ii.a) with additionally and to scale the threshold in step (ii.b) to . Since affects the influence of the projective norm heuristic, this parameter has to be chosen relatively large. In this brief test run with noisy measurements, we chose independent of the noise level. Surely, the results can be improved by more sophisticated parameter choice rules.
The recovered signals and the evolution of the rank and data fidelity are shown in Figure 10 and 11, respectively. Due to the noise, we cannot ensure that the recovered tensor has rank one and corresponds to a meaningful approximation of the true signal. If we endow the domain with the Euclidian inner product in analogy to classical PhaseLift, then we observed that the rank of the iteration increases uncontrollable, and the proposed algorithm diverges because of the limited data provided by only four masks. Using instead the weighted Sobolev norm and the Hilbert space reweighting, the rank becomes one after a short starting phase, where the maximal rank is restricted by five. Because of the same starting value and the same regularization parameter, the primal-dual iteration initially performs nearly identical for the three considered noise levels such that the rank evolutions coincide. Further, for all three cases, we here obtain reasonable reconstructions. Moreover, the pixels not covered by any mask are filled up, and the influence of the noise to the reconstruction is smoothed out. Further numerical experiments suggest that Proof 15 in combination with a-priori smoothness properties recovers the unknown signal in a stable manner.
7 Conclusion
In this paper, we developed a novel proximal algorithm to solve bilinear and quadratic inverse problems. The basic idea was to exploit the universal property to lift the considered problem to linear inverse problems on the tensor product. In order to deal with the rank-one constraint, we applied a nuclear or projective norm heuristic, which is known to produce low-rank solutions. The relaxation of the lifted problem yields a constrained minimization problem, which has been solved by applying the first-order primal-dual algorithm proposed by Chambolle and Pock. Since the choice of the underlying algorithm is somehow arbitrary, there are several further options to develop minimization methods for the lifted and relaxed problem. For instance, one may apply the alternating direction method of multipliers (ADMM) [BPC*+*11], forward-backward splitting [LM79, CW05], or FISTA [BT09].
The flexibility to adapt the domain of the bilinear or quadratic operator allows us to incorporate smoothness assumptions or neighborhood relations. As demonstrated for the masked Fourier phase retrieval problem, this freedom enables us to recover pixels that are blocked by the applied masks such that they do not contribute to the given measurements. Further, the smoothing properties of the discretized Sobolev norm greatly improve the numerical observed convergence rates. Moreover, one can exploit this flexibility to reweight the pre-image spaces in order to promote low-rank solutions. In the moment, we rely on a projective norm based on the corresponding Hilbert norms. Here, the question arises if one can employ nuclear “norms” based on semi-norms like the total variation or total generalized variation.
For the masked Fourier phase retrieval problem, we have shown that the developed algorithms seem to be stable under noise. More precisely, we have studied the influence of white noise, which can be treated by choosing the squared Euclidian or discretized -norm as data fidelity term of the Tikhonov functional in () and (). In order to incorporate more realistic noise models like Poisson noise into phase retrieval, one can, for instance, replace the data fidelity by the Kullback–Leibler divergence. In so doing, one only has to update the proximal mapping to compute the dual variable.
Acknowledgments. We gratefully acknowledges the funding of this work by the Austrian Science Fund (FWF) within the project P28858. The Institute of Mathematics and Scientific Computing off the University of Graz, with which the authors are affiliated, is a member of NAWI Graz (https://www.nawigraz.at/).
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1]
- 2[ABHS 16] Anzengruber , Stephan W. ; Bürger , Steven ; Hofmann , Bernd ; Steinmeyer , Günter: Variational regularization of complex deautoconvolution and phase retrieval in ultrashort laser pulse characterization. In: Inverse Probl 32 (2016), No. 3, pp. 035002(27)
- 3[BB 98] Bertero , Mario ; Boccacci , Patrizia: Introduction to Inverse Problems in Imaging . Bristol : Institute of Physics Publishing, 1998
- 4[BB 18] Beinert , Robert ; Bredies , Kristian: Non-convex regularization of bilinear and quadratic inverse problems by tensorial lifting. In: Inverse Probl 35 (2018), No. 1, pp. 015002
- 5[BBE 17] Bendory , Tamir ; Beinert , Robert ; Eldar , Yonina C.: Fourier phase retrieval: uniqueness and algorithms. In: Boche , H. (Ed.) ; Caire , G. (Ed.) ; Calderbank , R. (Ed.) ; März , M. (Ed.) ; Kutyniok , G. (Ed.) ; Mathar , R. (Ed.): Compressed Sensing and its Applications . Cham : Birkhäser, 2017 (Applied and Numerical Harmonic Analysis), Chapter 2, pp. 55–91
- 6[BBM + 04] Blaimer , M. ; Breuer , F. ; Mueller , M. ; Heidemann , R. M. ; Griswold , M. A. ; Jakob , P. M.: SMASH, SENSE, PILS, GRAPPA: how to choose the optimal method. In: Top Magn Reson Imaging 15 (2004), No. 4, pp. 233–236
- 7[BP 15] Beinert , Robert ; Plonka , Gerlind: Ambiguities in one-dimensional discrete phase retrieval from Fourier magnitudes. In: J Math Anal Appl 21 (2015), No. 6, pp. 1169–1198
- 8[BPC + 11] Boyd , Stephen ; Parikh , Neal ; Chu , Eric ; Peleato , Borja ; Eckstein , Jonathan: Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. In: Found Trends Mach Learn 3 (2011), No. 1, pp. 1–122
