This paper investigates the Douglas-Rachford algorithm's behavior when minimizing a convex function under a linear constraint, including cases with no common feasible points, and introduces new convergence results and parallel splitting methods.
$F:=\operatorname{Fix}T(\cdot+v)=\{x\in X\mid x=T(x+v)\}$ is convex, closed, and nonempty.
$(\forall n\in\mathbb{N})\quad T^{n}y=y-nv$;
$(\forall n\in\mathbb{N})\quad \|(n+1)v+T^{n+1}x-y\|\le\|nv+T^{n}x-y\|$;
$\sum_{n=0}^{+\infty}\|T^{n}x-T^{n+1}x-v\|^{2}<+\infty$,
$T^{n}x-T^{n+1}x\to v$;
$\lim_{n\to+\infty}P_{F}(nv+T^{n}x)\in F$
$v_D:=P_{S_1}(0),\quad v_R:=P_{S_2}(0),\quad v:=P_{S_1\cap S_2}(0)$.
$P_C=P_C\circ P_U$
$\langle c-P_CP_Ux,\,x-P_CP_Ux\rangle=\langle c-P_CP_Ux,\,P_Ux-P_CP_Ux\rangle\le 0$,
$(\forall z\in X)\quad h(z)\ge h(x)+\langle z-x,x^*\rangle$.
$\langle y-x,x^*\rangle=0$.
$h(x)=h(y)$.
$(\forall z\in X)\quad h(z)\ge h(x)+\langle z-x,x^*\rangle=h(y)+\langle z-y,x^*\rangle+\langle y-x,x^*\rangle=h(y)+\langle z-y,x^*\rangle$.
$\operatorname{argmin}(\iota_U+h)\rightrightarrows X\colon x\mapsto U^\perp\cap\partial h(x)$
$v=P_{\overline{U-\operatorname{dom}g}}(0)$.
On the behaviour of the Douglas–Rachford algorithm for minimizing a convex function subject to a linear constraint
Heinz H. Bauschke and Walaa M. Moursi
Heinz H. Bauschke: Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada. E-mail: [email protected].
Walaa M. Moursi: Department of Combinatorics and Optimization, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada. E-mail: [email protected].
(July 9, 2020)
Abstract
The Douglas-Rachford algorithm (DRA) is a powerful optimization method
for minimizing the sum of two convex (not necessarily smooth)
functions. The vast majority of previous research dealt with the
case when the sum has at least one minimizer.
In the absence of minimizers, it was recently shown that
for the case of two indicator functions,
the DRA converges to a best approximation solution.
In this paper, we present a new convergence result on
the DRA applied to the problem of minimizing a convex function
subject to a linear constraint.
Indeed, a normal solution may be found even when the domain of the
objective function and the linear subspace constraint have no point in common.
As an important application, a new parallel splitting result is provided.
We also illustrate our results through various examples.
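Throughout, we assume that $X$ is a real Hilbert space with inner product $\langle\cdot,\cdot\rangle\colon X\times X\to\mathbb{R}$ and induced norm $\|\cdot\|$.
We furthermore assume that $U$ is a closed linear subspace of $X$ and that $g\colon X\to\,]-\infty,+\infty]$ is convex, lower semicontinuous, and proper.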
Our aim is to discuss the behaviour of the Douglas–Rachford algorithm [17] applied to solving the optimization problem
$\underset{x\in X}{\operatorname{minimize}}\;\;\iota_U(x)+g(x)$, (4)
where $\iota_U(x)=0$ if $x\in U$ and $\iota_U(x)=+\infty$ if $x\notin U$.
(Footnote 1: Let us point out that if the constraint set is an affine subspace $u+U$ and the objective is a convex, lower semicontinuous, and proper function, say $\widetilde{g}$, then all our results are applicable by working with $U$ and $g=\widetilde{g}(\cdot-u)$ instead.)
Note that we do not assume a priori that (4) has a solution.
Given any starting point $x_0\in X$, the Douglas–Rachford algorithm generates the so-called governing sequence
$(T^{n}x_0)_{n\in\mathbb{N}}$, (5)
where
$T:=\operatorname{Id}-P_U+P_gR_U$ (6)
is the Douglas–Rachford operator, $P_U$ is the projector of $U$, $P_g$ is the proximal mapping of the function $g$, and $R_U=2P_U-\operatorname{Id}=P_U-P_{U^\perp}$ is the reflector of $U$.
The basic convergence result (see [22], [18], and [27]) guarantees that the shadow sequence
$(P_UT^{n}x_0)_{n\in\mathbb{N}}$ (7)
converges weakly to a solution of (4) provided that $(N_U+\partial g)^{-1}(0)\neq\varnothing$.
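The iteration can be sketched in a few lines of Python. The sketch below applies $T=\operatorname{Id}-P_U+P_gR_U$ in a deliberately simple setting that is an assumption for illustration and not taken from the paper: $X=\mathbb{R}^2$, $U=\mathbb{R}\times\{0\}$, and $g=\tfrac12\|\cdot-c\|^2$, whose proximal mapping is $z\mapsto(z+c)/2$. In this feasible case the shadow sequence should approach $P_Uc$, the minimizer of $g$ over $U$. All function names are hypothetical.

```python
# Sketch of the Douglas-Rachford iteration in R^2 (illustrative setting only).
# Assumptions, not from the paper: U = R x {0} and g(z) = 0.5*||z - c||^2.

def proj_U(x):
    """Projector onto the subspace U = R x {0}."""
    return (x[0], 0.0)

def reflect_U(x):
    """Reflector R_U = 2 P_U - Id."""
    p = proj_U(x)
    return (2.0 * p[0] - x[0], 2.0 * p[1] - x[1])

def make_prox_quadratic(c):
    """Proximal mapping of g(z) = 0.5*||z - c||^2, namely z -> (z + c)/2."""
    def prox(z):
        return ((z[0] + c[0]) / 2.0, (z[1] + c[1]) / 2.0)
    return prox

def dr_step(x, prox_g):
    """One application of T = Id - P_U + P_g R_U."""
    p = proj_U(x)
    q = prox_g(reflect_U(x))
    return (x[0] - p[0] + q[0], x[1] - p[1] + q[1])

def shadow_after(x0, prox_g, n):
    """Return P_U T^n x0, the n-th term of the shadow sequence."""
    x = x0
    for _ in range(n):
        x = dr_step(x, prox_g)
    return proj_U(x)

if __name__ == "__main__":
    c = (3.0, 4.0)
    print(shadow_after((-5.0, 7.0), make_prox_quadratic(c), 60))
```

Passing the proximal mapping as a parameter keeps `dr_step` reusable for other choices of $g$.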
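To deal with the potential lack of solutions of (4), we define the minimal displacement vector
$v:=P_{\overline{\operatorname{ran}}(\operatorname{Id}-T)}(0)$. (8)
This vector is well defined because $\overline{\operatorname{ran}}(\operatorname{Id}-T)$ is convex, closed, and trivially nonempty.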
We now assume that the so-called normal problem corresponding to (4), which asks to find a zero of the operator $-v+N_U+\partial g(\cdot-v)$, admits at least one normal solution (see [9, Definition 3.7]):
$Z:=\operatorname{zer}\big(-v+N_U+\partial g(\cdot-v)\big)\neq\varnothing$. (9)
(Footnote 2: Note that it is possible that $Z$ is empty: indeed, consider the case when $X=\mathbb{R}=U$ and $g=\exp$. In this case, $|T^{n}x|\to+\infty$ for every $x\in\mathbb{R}$.)
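The case $X=\mathbb{R}=U$ and $g=\exp$ can be explored numerically. Since $P_U=R_U=\operatorname{Id}$ there, the Douglas–Rachford operator reduces to $T=P_g$, and $P_g(x)$ is the unique $p$ with $p+\exp(p)=x$. The sketch below solves this equation with Newton's method, which is an implementation choice made here and not something prescribed by the paper; the function names are hypothetical.

```python
import math

def prox_exp(x, tol=1e-15, max_iter=100):
    """Proximal mapping of exp: the unique p with p + exp(p) = x (Newton's method)."""
    p = x  # f(p) = p + exp(p) - x is positive at p = x, so Newton decreases monotonically
    for _ in range(max_iter):
        step = (p + math.exp(p) - x) / (1.0 + math.exp(p))
        p -= step
        if abs(step) < tol:
            break
    return p

def governing_sequence(x0, n):
    """Return [x0, T x0, ..., T^n x0] with T = P_g, valid because U = X = R."""
    xs = [x0]
    for _ in range(n):
        xs.append(prox_exp(xs[-1]))
    return xs

if __name__ == "__main__":
    xs = governing_sequence(1.0, 1000)
    print(xs[-1], xs[-2] - xs[-1])  # the iterates drift downward; the gaps shrink
```

The iterates decrease without bound, although only logarithmically fast, while the successive differences tend to $0$, in line with $v=0$ and the absence of a minimizer.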
We also assume throughout that
[TABLE]
which is automatically the case when X is finite-dimensional, and
that
$0\in U^\perp+\operatorname{dom}g^*$, (11)
which is a rather mild constraint qualification that is satisfied, for instance, if $g$ has minimizers.
(Footnote 3: Also note that (11) implies that the Fenchel dual of (4) is feasible and hence that (4) is implicitly assumed to be bounded below.)
Note that if (4) has a solution
and ∂(ιU+g)=NU+∂g (this sum formula
is typically guaranteed through a regularity condition),
then v=0 and Z=argmin(ιU+g).
Our main result (see Theorem 5.1 below) can now be concisely stated as follows:
Under the above assumptions, which we assume for the rest of the paper,
we have
[TABLE]
This is a completely new (and very beautiful) variant of the classical result
which is proven with a careful function value analysis in Section 4!
It reveals the Douglas–Rachford algorithm to be a method for solving
the following bilevel optimization problem:
first, obtain the gap vector between U=domιU and domg.
This level is purely geometrical, depending on the sets U and domg,
and revealing the minimal displacement vector v.
Secondly, if $v\neq 0$, rather than minimizing the original $\iota_U+g$,
which would have the optimal value $+\infty$,
we then instead minimize the minimal perturbation function
ιU+g(⋅−v).
This has consequences for minimizing the sum
of convex functions by using a product space technique;
in fact, real world applications inspired this research
(see the last section).
Let us now comment on related previous works which
will illustrate the complementary nature of the present work.
To the best of our knowledge, none of these works contains
the result (12) in the generality of the setting of
Theorem 5.1.
The paper [2] by Banjac, Goulart, Stellato, and Boyd
applies the Douglas–Rachford algorithm with the function f
being the sum of a quadratic function and the indicator function of
an affine subspace rather
than ιU and with g being the indicator function of a
nonempty closed convex set. The Douglas–Rachford method (equivalent to
ADMM in this setting) is shown to be useful in providing
certificates of infeasibility.
The paper [8]
concerns the more restrictive case when g is the indicator function of a
nonempty closed convex set; however, the underlying assumptions there
do not require (10).
The paper [9] introduces the normal problem
but it does not contain any algorithmic/dynamic results.
Similarly to [8], the paper
[12] deals with the case when
g is assumed to be an indicator function of a closed affine subspace.
Under suitable assumptions,
the shadow sequence (PUTnx0)n∈N is shown to converge strongly.
The paper [13] considers an infinite-dimensional setting
that encompasses two indicator functions; however,
our present main result is not covered by these results
(see Remark 5.4 below).
In the paper [23] by Liu, Ryu, and Yin, the authors study the
behaviour of the Douglas–Rachford algorithm applied to
conic programming where g is the indicator function of a nonempty closed
convex cone while ιU is replaced by the sum of a linear function and
the indicator function of an affine subspace. The Douglas–Rachford method
is shown to reveal information on the type of pathologies the conic
program may exhibit.
Finally, the paper [26]
by Ryu, Liu, and Yin
is the first to provide a comprehensive
function-value analysis in pathological cases.
It differs from the present work in that
Ryu et al. allow for a general function f rather than the indicator function
ιU considered here. However, our main result Theorem 5.1 gives
information on the iterates and the function values that are not covered
by the results in [26] when strong duality fails.
The remainder of this paper is organized as follows.
In Section 2 we review known facts and present
new auxiliary results that are needed in the main analysis.
Section 3 presents new descriptions of the
minimal displacement vector and the set of minimizers
which are crucial in the convergence proofs.
The building blocks of our analysis and the main result are presented
in Sections 4 and 5 respectively.
In the final Section 6,
we provide a useful application of
our theory to describe the behaviour of a parallel splitting method.
We employ standard notation from convex analysis and optimization
as can be found, e.g., in [6] and [25].
2 Known and new auxiliary results
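Because $Z\neq\varnothing$ (see (9)), the generalized fixed point set introduced in [9] is very well behaved in the sense that
$F:=\operatorname{Fix}T(\cdot+v)=\{x\in X\mid x=T(x+v)\}$ is convex, closed, and nonempty. (13)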
The Douglas–Rachford operator T defined in (6)
enjoys the following nice properties which also underline
the importance of F for understanding
the Douglas–Rachford algorithm:
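Fact 2.1.
Let $x\in X$ and $y\in F$. Then
$(\forall n\in\mathbb{N})\quad T^{n}y=y-nv$; (14)
the sequence $(nv+T^{n}x)_{n\in\mathbb{N}}$ is Fejér monotone with respect to $F$, i.e.,
$(\forall n\in\mathbb{N})\quad \|(n+1)v+T^{n+1}x-y\|\le\|nv+T^{n}x-y\|$; (15)
$\sum_{n=0}^{+\infty}\|T^{n}x-T^{n+1}x-v\|^{2}<+\infty$, (16)
$T^{n}x-T^{n+1}x\to v$; (17)
and the limit
$\lim_{n\to+\infty}P_{F}(nv+T^{n}x)\in F$ (18)
exists.
(Footnote 4: We point out that Fact 2.1 holds in the more general setting when $T$ is any firmly nonexpansive mapping.)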
Proof. See [13, Corollary 4.2],
[12, Proposition 2.5(vi)]
and [6, Proposition 5.7].
■
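The Fejér monotonicity of $(nv+T^{n}x)_{n\in\mathbb{N}}$ and the summability of the squared gaps can be observed numerically in a small infeasible instance. The setting below is an assumption chosen for illustration: $X=\mathbb{R}^2$, $U=\mathbb{R}\times\{0\}$, and $g=\iota_W$ with $W$ the closed ball of radius $1$ centred at $(0,2)$. For this instance a hand computation gives $v=(0,-1)$, and $y=(0,0)$ satisfies $y=T(y+v)$, so $y\in F$. The function names are hypothetical.

```python
import math

CENTER, RADIUS = (0.0, 2.0), 1.0   # assumed ball W; the line U = R x {0} misses it
V = (0.0, -1.0)                    # minimal displacement vector for this instance
Y = (0.0, 0.0)                     # a point of F = Fix T(. + v) for this instance

def dr_step(x):
    """T = Id - P_U + P_W R_U with U = R x {0}."""
    px = (x[0], 0.0)               # P_U x
    rx = (x[0], -x[1])             # R_U x
    dx, dy = rx[0] - CENTER[0], rx[1] - CENTER[1]
    r = math.hypot(dx, dy)
    if r <= RADIUS:
        q = rx                     # R_U x already lies in W
    else:
        q = (CENTER[0] + RADIUS * dx / r, CENTER[1] + RADIUS * dy / r)
    return (x[0] - px[0] + q[0], x[1] - px[1] + q[1])

def fejer_distances(x0, n):
    """Distances ||k v + T^k x0 - y|| for k = 0, ..., n."""
    x, out = x0, []
    for k in range(n + 1):
        out.append(math.hypot(k * V[0] + x[0] - Y[0], k * V[1] + x[1] - Y[1]))
        x = dr_step(x)
    return out

def squared_gap_sum(x0, n):
    """Partial sum of ||T^k x0 - T^(k+1) x0 - v||^2 for k = 0, ..., n-1."""
    x, total = x0, 0.0
    for _ in range(n):
        nxt = dr_step(x)
        total += (x[0] - nxt[0] - V[0]) ** 2 + (x[1] - nxt[1] - V[1]) ** 2
        x = nxt
    return total

if __name__ == "__main__":
    print(fejer_distances((3.0, 0.0), 5))
    print(squared_gap_sum((3.0, 0.0), 40))
```

The printed distances should be nonincreasing, and the partial sums should stabilize quickly.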
Before we proceed, we recall the following useful fact
that will be used in the proofs of Proposition 2.3 and Proposition 3.1.
Fact 2.2.
Let $C$ be a nonempty closed convex subset of $X$.
Set $w=P_{\overline{U-C}}(0)$ and let $x\in X$. Then
$w=\lim_{n\to\infty}(P_U-\operatorname{Id})(P_CP_U)^{n}x\in\operatorname{ran}(P_U-\operatorname{Id})=-U^\perp=U^\perp$.
The next result will also be used in the proof of Proposition 3.1.
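Proposition 2.3.
Let $C_1$ and $C_2$ be nonempty closed convex subsets of $X$,
and set $S_1:=\overline{U-C_1}$ and $S_2:=\overline{U^\perp-C_2}$.
Define
$v_D:=P_{S_1}(0),\quad v_R:=P_{S_2}(0),\quad v:=P_{S_1\cap S_2}(0)$. (19)
Then the following hold:
(i) $(v_D,v_R)\in U^\perp\times U$.
(ii) $P_{U^\perp}(S_1)\subseteq S_1$.
(iii) $P_U(S_2)\subseteq S_2$.
(iv) $v_D+v_R\in S_1\cap S_2$.
(v) $v=v_D+v_R$.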
Proof. (i):
Apply Fact 2.2 with (C,w) replaced
by (C1,vD) (respectively (C,w) replaced
by (C2,vR)).
(ii):
Let y∈S1.
Then there exist (un)n∈N in U
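and $(c_{1,n})_{n\in\mathbb{N}}$ in $C_1$ such that
$u_n-c_{1,n}\to y$.
Now, $P_{U^\perp}y\leftarrow P_{U^\perp}(u_n-c_{1,n})=-P_{U^\perp}c_{1,n}=P_Uc_{1,n}-c_{1,n}\in U-C_1$.
Hence, $P_{U^\perp}y\in\overline{U-C_1}=S_1$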
and the claim follows.
(iii):
Proceed similarly to the proof of (ii).
(iv):
Indeed, note that by (i)
we have vR∈U, hence
$v_D+v_R\in S_1+v_R=\overline{U-C_1}+v_R=\overline{U-C_1+v_R}=\overline{U-C_1}=S_1$.
Similarly, we show that
vD+vR∈S2
and the conclusion follows.
(v):
Note that (ii) & (iii)
imply that (PUv,PU⊥v)∈S2×S1.
Consequently,
∥vR∥≤∥PUv∥
and
∥vD∥≤∥PU⊥v∥.
Altogether, in view of
(i),
we learn that
∥vD+vR∥2=∥vD∥2+∥vR∥2≤∥PUv∥2+∥PU⊥v∥2=∥v∥2.
Combining this with
(iv),
and the definition of v, we obtain the result.
■
The following simple result, which relies on the assumption
that U is a closed linear subspace, will be used in the proof of
Theorem 5.1.
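Lemma 2.4.
Let $C$ be a nonempty closed convex subset of $U$.
Then
$P_C=P_C\circ P_U$. (20)
Proof. Let $x\in X$ and let $c\in C\subseteq U$.
Then $P_CP_Ux\in C$ and
$\langle c-P_CP_Ux,\,x-P_CP_Ux\rangle=\langle c-P_CP_Ux,\,P_Ux-P_CP_Ux\rangle\le 0$, (21)
and we are done.
■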
We now turn to the minimization of
a convex function subject to a linear constraint.
The following result will be used in the proof of Theorem 3.4.
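Lemma 2.5.
Let $h\colon X\to\,]-\infty,+\infty]$ be a proper lower semicontinuous convex function.
Furthermore, let $x$ and $y$ be points in $U$, and let $x^*\in X$.
Then the following hold:
(i) If $U^\perp\cap\partial h(x)\neq\varnothing$, then $x$ is a minimizer of $\iota_U+h$.
(ii) If $x^*\in U^\perp\cap\partial h(x)$ and $y$ is a minimizer of $\iota_U+h$, then $x^*\in U^\perp\cap\partial h(y)$.
Proof. (i):
Suppose that $U^\perp\cap\partial h(x)\neq\varnothing$.
Then, since $U^\perp$ is a subspace,
$(-U^\perp)\cap\partial h(x)\neq\varnothing$.
Let $x^*\in(-U^\perp)\cap\partial h(x)$.
Then
$-x^*\in U^\perp=N_U(x)$.
It follows that
$0=(-x^*)+x^*\in N_U(x)+\partial h(x)=\partial\iota_U(x)+\partial h(x)\subseteq\partial(\iota_U+h)(x)$.
By Fermat’s rule, $x$ is a minimizer of $\iota_U+h$.
(ii):
Because $x^*\in\partial h(x)$, we have
$(\forall z\in X)\quad h(z)\ge h(x)+\langle z-x,x^*\rangle$. (22)
On the one hand, because $x$ and $y$ lie in $U$ and $x^*\in U^\perp$, we have
$\langle y-x,x^*\rangle=0$. (23)
On the other hand,
because $y$ is a minimizer of $\iota_U+h$,
we learn from (i) that
$h(x)=h(y)$. (24)
Altogether,
$(\forall z\in X)\quad h(z)\ge h(x)+\langle z-x,x^*\rangle=h(y)+\langle z-y,x^*\rangle+\langle y-x,x^*\rangle=h(y)+\langle z-y,x^*\rangle$. (25)
Therefore, $x^*\in\partial h(y)$.
■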
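The assumption that $U^\perp\cap\partial h(x)\neq\varnothing$
in Lemma 2.5(ii) is critical:
Example 2.6.
Suppose that $X=\mathbb{R}$, that $U=\{0\}$,
and that $h(\xi)=-\sqrt{\xi}$ if $\xi\ge 0$ and
$h(\xi)=+\infty$ if $\xi<0$.
Then $0$ minimizes $\iota_U+h=\iota_U$ yet
$U^\perp\cap\partial h(0)=\partial h(0)=\varnothing$.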
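Remark 2.7.
Let $h\colon X\to\,]-\infty,+\infty]$ be a proper lower semicontinuous convex function.
Then Lemma 2.5 implies that the set-valued operator
$\operatorname{argmin}(\iota_U+h)\rightrightarrows X\colon x\mapsto U^\perp\cap\partial h(x)$ (26)
is constant.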
3 New static results
We start with the following useful
result for the minimal displacement vector v from (8).
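Proposition 3.1.
Set $w=P_{\overline{U-\operatorname{dom}g}}(0)$.
Then the following hold:
(i) $w\in U^\perp$.
(ii) If $X$ is finite-dimensional, then $v=w=P_{\overline{U-\operatorname{dom}g}}(0)\in U^\perp$.
Proof. Clearly
$\overline{U-\operatorname{dom}g}=\overline{U-\overline{\operatorname{dom}}\,g}$
and
$\overline{U^\perp+\operatorname{dom}g^*}=\overline{U^\perp+\overline{\operatorname{dom}}\,g^*}$.
(i):
Apply Fact 2.2 with $C$ replaced by $\overline{\operatorname{dom}}\,g$.
(ii): Note that $\iota_U^*=\iota_{U^\perp}$ and
thus $\operatorname{dom}\iota_U^*=U^\perp$.
Hence (11) states exactly that $0\in\operatorname{dom}\iota_U^*+\operatorname{dom}g^*$.
It follows from [10, Proposition 6.1(ii) and Corollary 6.5(i)] that
$v=P_{\overline{(U-\operatorname{dom}g)}\cap\overline{(U^\perp+\operatorname{dom}g^*)}}(0)$.
By Proposition 2.3 applied with
$(C_1,C_2)$ replaced by $(\overline{\operatorname{dom}}\,g,-\overline{\operatorname{dom}}\,g^*)$
we have
$v=P_{\overline{U-\operatorname{dom}g}}(0)$.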
The result in Proposition 3.1(ii)
was first proved — in an even more general form —
by Ryu, Liu, and Yin
with a different argument relying on recession functions
(see [26, Lemma 3]).
From now on, we assume:
$v=P_{\overline{U-\operatorname{dom}g}}(0)\in U^\perp$. (29)
The fact that v belongs to U⊥ is new and crucial to our analysis.
We now turn towards alternative descriptions of the set Z of normal
solutions, defined in (9).
In passing, we mention that the next result is true even if Z=∅.
Proposition 3.2.
We have
[TABLE]
and
[TABLE]
Proof. Recall that
v∈U⊥
by (29).
Hence NU=−v+NU.
Now let x∈X.
Then
[TABLE]
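which proves (30), (31a), and (31b).
Turning to (31c),
let $x\in\operatorname{zer}(N_U+\partial g(\cdot-v))$.
On the one hand, $x\in\operatorname{dom}(N_U+\partial g(\cdot-v))$ and
thus $N_U(x)\neq\varnothing$ and $\partial g(x-v)\neq\varnothing$.
Hence $x\in U$ and $x-v\in\operatorname{dom}\partial g$, i.e.,
$x\in U\cap(v+\operatorname{dom}\partial g)$.
On the other hand, $\operatorname{zer}(N_U+\partial g(\cdot-v))=\operatorname{zer}(\partial\iota_U+\partial g(\cdot-v))$.
Hence
$0\in\partial\iota_U(x)+\partial g(\cdot-v)(x)\subseteq\partial(\iota_U+g(\cdot-v))(x)$
and therefore $x$ minimizes $\iota_U+g(\cdot-v)$.
Finally, (31d) and (31e) are obvious.
■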
Example 3.3 (linear-convex feasibility).
Suppose that g=ιW,
where W is a nonempty closed convex subset of X.
Then v=PU−W(0),
argming=dom∂g=W, and
v+argming=v+W=v+domg.
Thus
Proposition 3.2 yields
We are now ready for our first main result which
provides a useful description of Z:
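Theorem 3.4.
Because $Z$ is nonempty, we have
$Z=U\cap(v+\operatorname{dom}\partial g)\cap\operatorname{argmin}\big(\iota_U+g(\cdot-v)\big)=\operatorname{argmin}\big(\iota_U+g(\cdot-v)\big)$. (34)
Proof. Proposition 3.2 yields the inclusions
$Z\subseteq U\cap(v+\operatorname{dom}\partial g)\cap\operatorname{argmin}\big(\iota_U+g(\cdot-v)\big)\subseteq\operatorname{argmin}\big(\iota_U+g(\cdot-v)\big)$.
Because $Z\neq\varnothing$,
we let $x\in Z$, and also let $y\in\operatorname{argmin}(\iota_U+g(\cdot-v))\subseteq U$.
First, by (30),
$x\in U$ and $U^\perp\cap\partial g(x-v)\neq\varnothing$.
Secondly, it follows from Lemma 2.5 (applied with $h=g(\cdot-v)$)
that $U^\perp\cap\partial g(y-v)\neq\varnothing$.
Therefore, by using again (30), we obtain
$y\in Z$.
■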
Here is an example of a case where Z=∅.
Example 3.5.
Suppose that $g$ is polyhedral.
Then
[4, Theorem 5.6.1] implies that $U\cap(v+\operatorname{dom}g)=U\cap\operatorname{dom}g(\cdot-v)\neq\varnothing$.
Hence,
by [6, Corollary 27.3(c)]
we have $Z=\operatorname{argmin}\big(\iota_U+g(\cdot-v)\big)$.
The underlying assumption that Z be nonempty (see 9)
in Theorem 3.4 is critical:
Example 3.6.
Suppose that X=R2,
that U={0}×R
and that g is the Rockafellar function defined by
[TABLE]
(see [25, Example on page 218]).
Then v=0 and it follows from [24, Example 7.5]
that
Z=∅,
argmin(ιU+g(⋅−v))={0}×[−1,1],
and
U∩(v+dom∂g)∩argmin(ιU+g(⋅−v))={0}×{−1,1}.
Proof. Clearly we have $U^\perp=\mathbb{R}\times\{0\}$ and
$\operatorname{dom}g=\mathbb{R}_+\times\mathbb{R}$.
Moreover,
[24, Example 6.5] implies that
$\operatorname{dom}\partial g=\{(\xi_1,\xi_2)\mid\xi_1>0,\ \xi_2\in\mathbb{R}\}\cup\{(0,\xi_2)\mid\xi_2\ge 1\}$,
and
$\operatorname{dom}\partial g^*=\operatorname{dom}g^*=\{(\xi_1,\xi_2)\mid\xi_1\le 0,\ |\xi_2|\le 1\}$.
Therefore, using [10, Corollary 6.5(i)]
we learn that
$v=P_{\overline{(U-\operatorname{dom}g)}\cap\overline{(U^\perp+\operatorname{dom}g^*)}}(0)=0$.
It follows from Proposition 3.2 that
$Z=\{(0,\xi_2)\mid U^\perp\cap\partial g((0,\xi_2))\neq\varnothing\}$.
Now let $(0,\xi_2)\in U\cap\operatorname{dom}g$ and note that [24, Example 6.5]
implies that
[TABLE]
which proves the claim that $Z=\varnothing$.
Finally, using (35),
we see that
$\operatorname{argmin}(\iota_U+g(\cdot-v))=\operatorname{argmin}(\iota_U+g)=\{0\}\times[-1,1]$
and the conclusion follows.
■
When X=R, then we obtain the following positive result,
which holds even when Z=∅:
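Proposition 3.7.
Suppose that $X=\mathbb{R}$. Then
[TABLE]
More precisely, exactly one of the following cases holds:
(i) $U=\{0\}$, $v=P_{\overline{-\operatorname{dom}g}}(0)$, $Z=0\cdot\partial g(-v)$,
and either $\iota_U+g(\cdot-v)=\iota_{\{0\}}$ if $-v\in\operatorname{dom}g$
or $\iota_U+g(\cdot-v)=\iota_\varnothing$ if $-v\notin\operatorname{dom}g$.
(ii) $U=\mathbb{R}$, $v=0$, and $Z=\operatorname{dom}\partial g\cap\operatorname{argmin}g=\operatorname{argmin}g$.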
Proof. Denote the right side of (37) by R.
It is clear from Proposition 3.2 that Z⊆R.
Now let x∈R.
On the one hand,
[TABLE]
On the other hand,
x∈dom∂ιU∩dom∂g(⋅−v).
By the sum rule for the real line, we have
[TABLE]
Altogether,
0∈∂ιU(x)+∂g(x−v) and
thus x∈Z by Proposition 3.2.
The remaining statements follow readily.
■
The previous results make it tempting to conjecture that
when X=R and Z=∅, then we have
argmin(ιU+g(⋅−v))=∅.
Unfortunately, this conjecture is false:
Example 3.8.
Suppose that $X=\mathbb{R}$, that $U=\{0\}$ and
that $g(x)=-\sqrt{x}$ with $\operatorname{dom}g=\mathbb{R}_+$.
Then v=P−domg(0)=0.
Hence Z={0}⋅∂g(0)=∅ by
Proposition 3.7
while argmin(ιU+g(⋅−v))={0}
because ιU+g(⋅−v)=ιU+g=ιU=ι{0}.
We conclude this section with another
useful consequence of (29):
Proposition 3.9.
We have $Z=P_U(F)$ and
$P_UP_F=P_Z$. (40)
Proof. Set A=−v+NU and B=∂g(⋅−v), and
note that by (29) A=NU.
Then the Douglas–Rachford operator corresponding to (A,B) is
[9, Proposition 3.2]
[TABLE]
Moreover JA:=(Id+A)−1=PU.
Note that A and B are subdifferential operators,
hence paramonotone by [19, Theorem 2.2].
So [5, Corollary 5.6] yields
F=Z+K, Z=JA(F)=PU(F), where K:=(Id−JA−1)(F)=PU⊥(F)⊆U⊥.
Moreover, because Z−Z⊆U and so Z−Z⊥K,
we have
JAPZ+K=PZ,
equivalently, PUPF=PZ,
by [5, Theorem 6.7(ii)].
■
4 New dynamic results
Recall that
[TABLE]
We start with a result that provides some
information on the shadow sequence (PUTnx)n∈N.
(In passing, we note that only item (v) requires that Z be nonempty.)
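Lemma 4.1.
Let $x\in X$.
Then the following hold:
(i) $P_UT^{n}x-P_gR_UT^{n}x=T^{n}x-T^{n+1}x\to v\in U^\perp$.
(ii) $P_UT^{n}x-P_UP_gR_UT^{n}x=P_UT^{n}x-P_UT^{n+1}x\to 0$.
(iii) $-P_{U^\perp}P_gR_UT^{n}x=P_{U^\perp}T^{n}x-P_{U^\perp}T^{n+1}x\to v$.
(iv) All weak cluster points of $(P_UT^{n}x)_{n\in\mathbb{N}}$ lie in $U\cap(v+\overline{\operatorname{dom}}\,g)$.
(v) The sequences $(nv+T^{n}x)_{n\in\mathbb{N}}$, $(P_UT^{n}x)_{n\in\mathbb{N}}$, and $(P_gR_UT^{n}x)_{n\in\mathbb{N}}$ are bounded.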
Proof. (i): Clear from the definition of T, (17) and (29).
(ii):
Apply PU to (i).
(iii):
Apply PU⊥ to (i).
(iv):
On the one hand,
(Tnx−Tn+1x)+PgRUTnx=PUTnx∈U.
On the other hand,
PgRUTnx∈dom∂g⊆domg.
Altogether, combined with (i),
we obtain the desired result.
(v):
By Fact 2.1 and (13),
the sequence $(nv+T^{n}x)_{n\in\mathbb{N}}$ is Fejér monotone with respect to
$F\neq\varnothing$, hence it is bounded.
Therefore, $(P_UT^{n}x)_{n\in\mathbb{N}}=(P_U(nv+T^{n}x))_{n\in\mathbb{N}}$ is also bounded.
The boundedness of $(P_gR_UT^{n}x)_{n\in\mathbb{N}}$ follows from (i).
■
Note that Proposition 3.2 yields that $Z-v\subseteq(U-v)\cap\operatorname{dom}g$,
and thus $(U-v)\cap\operatorname{dom}g$ is nonempty.
The next result provides information on function values of g of
a sequence occurring in the Douglas–Rachford algorithm.
Lemma 4.2.
Let x∈X,
let y∈(U−v)∩domg,
and let n∈N.
Then
[TABLE]
Proof. The characterization of the prox operator Pg gives
[TABLE]
We also have
[TABLE]
Now write y=u−v, where u∈U.
Then, using also the identity in Lemma 4.1(iii)
to derive 46e,
we have
[TABLE]
Therefore,
substituting 45 and 46
into 44, we obtain
[TABLE]
which completes the proof.
■
We are now able to locate weak cluster points
of the shadow sequence (PUTnx)n∈N:
Lemma 4.3.
Let x∈X and
let y∈(U−v)∩domg.
Then there exists a sequence (εn)n∈N in R
such that
[TABLE]
and
for every n∈N, we have
[TABLE]
Moreover, the sequence
[TABLE]
[TABLE]
and
[TABLE]
Finally, the sequence
[TABLE]
Proof. Lemma 4.1(v)&(i)
yield that (y−PgRUTnx)n∈N is bounded
and that PUTnx−v−PgRUTnx→0.
Thus
[TABLE]
Lemma 4.1(iii)&(i)
yield that PU⊥Tnx−PU⊥Tn+1x−v→0 and
that (PU⊥(nv+Tnx))n∈N is bounded.
Hence
Now choosing y so that g(y) is as close to μ as we like,
we deduce from 60 and 62 that
[TABLE]
Hence c is a minimizer of ιU−v+g.
Because c was an arbitrary weak cluster point of
(PgRUTnx)n∈N, we obtain through a simple proof by contradiction that
Note that (52) is equivalent to
n⋅⟨Tnx−Tn+1x−v,v⟩→0.
On the other hand,
(15) and (16) combined with [21, Chapter III, Section 14, Theorem on p. 124]
(or [20, Problem 3.2.35]) yields
n⋅∥Tnx−Tn+1x−v∥2→0.
We do not know whether
n⋅∥Tnx−Tn+1x−v∥→0.
5 The main result
We are now ready for the main result.
In the following we set
Proof. For brevity, we write y=y(x). Because PU is continuous, we have
[TABLE]
On the other hand,
$P_UP_F=P_Z=P_ZP_U$
by (40) and (20).
Invoking the fact that $v\in U^\perp$ (see (29)),
we conclude altogether that
[TABLE]
Recall from (53) and (34) that
(PUTnx)n∈N is bounded and that
all its cluster points lie in argmin(ιU+g(⋅−v))=Z.
Now let z be an arbitrary weak cluster point of (PUTnx)n∈N,
say PUTknx⇀z∈Z⊆U.
Then PZPUTknx⇀PZz=z using (10).
Combining with (69),
we deduce that z=PUy.
Hence every weak cluster point of (PUTnx)n∈N
coincides with PUy.
In view of the boundedness of (PUTnx)n∈N,
we obtain (66).
The remainder follows from
Lemma 4.1(i) and
(51).
■
Example 5.2 (linear-convex feasibility).
Suppose that $g=\iota_W$,
where $W$ is a nonempty closed convex subset of $X$
such that $U\cap(v+W)\neq\varnothing$.
Then $0\in\operatorname{dom}g^*$, which implies that
$0\in U^\perp+\operatorname{dom}g^*$; hence (11) is verified.
Moreover,
$v=P_{\overline{U-W}}(0)$ by [9, Proposition 3.16]
and $(\forall x\in X)$ $P_UT^{n}x\rightharpoonup P_Uy\in U\cap(v+W)$,
where $y=\lim_{n\to\infty}P_F(nv+T^{n}x)$,
by Theorem 5.1.
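A small numerical sketch of the linear-convex feasibility case follows. The instance is an assumption made for illustration: $X=\mathbb{R}^2$, $U=\mathbb{R}\times\{0\}$, and $W$ the closed ball of radius $1$ centred at $(0,2)$, so that $U\cap W=\varnothing$. Here $\overline{U-W}=\mathbb{R}\times[-3,-1]$, hence $v=(0,-1)$ and $U\cap(v+W)=\{(0,0)\}$. The function names are hypothetical.

```python
import math

CENTER, RADIUS = (0.0, 2.0), 1.0  # assumed ball W; U = R x {0} does not meet it

def proj_U(x):
    """Projector onto U = R x {0}."""
    return (x[0], 0.0)

def proj_W(x):
    """Projector onto the closed ball W."""
    dx, dy = x[0] - CENTER[0], x[1] - CENTER[1]
    r = math.hypot(dx, dy)
    if r <= RADIUS:
        return x
    return (CENTER[0] + RADIUS * dx / r, CENTER[1] + RADIUS * dy / r)

def dr_step(x):
    """T = Id - P_U + P_W R_U."""
    p = proj_U(x)
    refl = (2.0 * p[0] - x[0], 2.0 * p[1] - x[1])
    q = proj_W(refl)
    return (x[0] - p[0] + q[0], x[1] - p[1] + q[1])

def run(x0, n):
    """Return the pair (P_U T^n x0, T^n x0 - T^(n+1) x0)."""
    x = x0
    for _ in range(n):
        x = dr_step(x)
    nxt = dr_step(x)
    return proj_U(x), (x[0] - nxt[0], x[1] - nxt[1])

if __name__ == "__main__":
    shadow, gap = run((3.0, 0.0), 50)
    print(shadow, gap)  # shadow near (0, 0), gap near v = (0, -1)
```

The governing sequence itself is unbounded in this instance, while its shadow settles at the unique point of $U\cap(v+W)$.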
Example 5.3.
Suppose that W is a linear subspace of X such that
{0}⫋W⫋U⊥.
Let w∈W∖{0},
let b∈(U⊥∩W⊥)∖{0},
and suppose that
g=21∥⋅∥2+⟨w,⋅⟩+ι−b+W.
Let x∈X.
Then the following hold:
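(i) $\partial g=w+\operatorname{Id}+N_{-b+W}$.
(ii) $U\cap W=\{0\}$.
(iii) $\operatorname{dom}g=\operatorname{dom}\partial g=-b+W$, $\operatorname{dom}g^*=X$, and $0\in U^\perp+\operatorname{dom}g^*=X$.
(iv)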
Proof. Note that U+W⫋U+U⊥=X and thus
U⊥∩W⊥=(U+W)⊥⫌{0}.
Hence the choice of b is possible.
(i): Clear.
(ii):
Indeed,
{0}⊆U∩W⊆U∩U⊥={0}.
(iii):
It is clear that domg=dom∂g=−b+W.
Because lim∥x∥→+∞g(x)/∥x∥=+∞,
it follows that domg∗=dom∂g∗=X by, e.g.,
[6, Proposition 14.15 and Proposition 16.27].
(iv):
Using (29) and (iii),
we obtain
v=PU−domg(0)=Pb+U+W(0)=b+PU+W(0−b)=P(U+W)⊥(b)=PU⊥∩W⊥(b)=b.
(v):
Clear from (iv).
(vi):
This follows from (9),
(i), (ii),
and (iii).
(vii):
Set y=−b−21w+21PWx.
Then y∈−b+W.
Thus,
PW⊥x∈−2b+W⊥⇔x∈2(−b−21w+21PWx)+w+W⊥=2y+w+W⊥=y+w+y+N−b+W(y)=(Id+∂g)(y)⇔y=Pg(x).
(viii):
This follows from (6) and (vii).
(ix):
Using (13) and (viii), we obtain
x∈F⇔x=T(x+v)=T(x+b)⇔x=−b−21w+x+b−PU(x+b)−21PW(x+b)⇔0=21w+21PUx+21PWx⇔
[x∈U⊥ and x∈−w+W⊥].
(x):
We have the equivalences
0∈F⇔0=T(0+v)⇔0=T(b)⇔0=−b−21w+b−PUb−21PWb⇔0=−21w, which is absurd.
(xi):
This follows from (ix) and induction.
(xii):
Clear from (xi).
■
Remark 5.4.
We point out that in [13, Theorem 4.4]
the authors provide an instance where the shadow sequence converges.
The proof in [13]
critically relies on the assumption
that Z⊆F.
Our new result does not require
this assumption.
Indeed, by
Example 5.3(vi)&(x),
Z={0} and Z∩F=∅.
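Example 5.5.
Suppose that $X$ is finite-dimensional, that $U\neq\{0\}$,
let $u^*\in U\smallsetminus\{0\}$,
suppose that
$g=\tfrac12\operatorname{dist}_U^2+\langle u^*,\cdot\rangle$,
and let $x\in X$.
(Footnote 5: We require the finite-dimensionality assumption in the proof of item (v), which relies on [10].)
(Footnote 6: Given a nonempty closed convex subset $C$ of $X$, the associated distance function to the set $C$ is denoted by $\operatorname{dist}_C$.)
Then the following hold:
(i) $\partial g=\nabla g=u^*+P_{U^\perp}$.
(ii) $U-\operatorname{dom}\nabla g=U-\operatorname{dom}g=X$.
(iii) $\operatorname{ran}N_U+\operatorname{ran}\partial g=U^\perp+\operatorname{dom}g^*=U^\perp+\operatorname{dom}\partial g^*=u^*+U^\perp$ is closed.
(iv) $0\notin U^\perp+\operatorname{dom}g^*=\operatorname{ran}N_U+\operatorname{ran}\partial g$.
(v) $v=u^*\in U\smallsetminus\{0\}$.
(vi) $Z=U$.
(vii) $P_g=-u^*+\operatorname{Id}-\tfrac12P_{U^\perp}$.
(viii) $T=P_g=-u^*+\operatorname{Id}-\tfrac12P_{U^\perp}$.
(ix) $F=U$.
(x) $(\forall n\in\mathbb{N})$ $T^{n}x=-nu^*+P_Ux+2^{-n}P_{U^\perp}x$.
(xi) $(\forall n\in\mathbb{N})$ $P_UT^{n}x=-nu^*+P_Ux$.
(xii) $(\forall n\in\mathbb{N})$ $\|T^{n}x\|\ge\|P_UT^{n}x\|\ge n\|u^*\|-\|P_Ux\|\to+\infty$.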
Proof. (i):
Clear since ∇21distU2=Id−PU=PU⊥.
Note that ∇g=u∗+Id−PU=u∗+PU⊥.
(ii):
U−dom∂g=U−X=X.
(iii):
dom∂g∗=ran∇g=u∗+U⊥ is closed.
On the other hand, dom∂g∗ is a dense
subset of domg∗.
Hence dom∂g∗=domg∗=u∗+U⊥
and thus ranNU+ran∂g=U⊥+(u∗+U⊥)=u∗+U⊥.
(iv):
Clear from (iii)
and the assumption that $u^*\neq 0$.
(v):
By [10, Proposition 6.1], (ii),
and (iii),
we have v=PU−domg∩U⊥+domg∗(0)=Pu∗+U⊥(0)=u∗+PU⊥(0−u∗)=PU(u∗)=u∗.
(vi):
Using (9), (i),
and (v), we have
x∈Z⇔v∈NU(x)+∂g(x−v)⇔
[x∈U and u∗∈U⊥+u∗+PU⊥(x−u∗)]
⇔x∈U.
(vii):
Set y=−u∗+x−21PU⊥x.
By (i) and (v),
y+∇g(y)=(−u∗+x−21PU⊥x)+(u∗+PU⊥(−u∗+x−21PU⊥x))=x.
Thus y=Pg(x) as claimed.
(viii):
Using (6) and (vii),
we obtain T=Id−PU+PgRU=PU⊥+Pg(PU−PU⊥)=PU⊥−u∗+(Id−21PU⊥)(PU−PU⊥)=−u∗+PU+21PU⊥=−u∗+PU+PU⊥−21PU⊥=−u∗+Id−21PU⊥=Pg.
(ix):
Using (13), (v), and
(viii), we have
x∈F⇔x=T(x+v)⇔x=−u∗+PUx+21PU⊥(x+v)⇔x=PUx+21PU⊥x⇔x∈U.
(x):
This follows from (viii) and
(v) by a straight-forward induction.
(xi):
Apply PU to (x) and use
(v).
(xii):
This follows from (xi).
■
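The closed form for $T^{n}x$ in Example 5.5 can be checked numerically. The sketch below fixes a concrete instance as an assumption for illustration: $X=\mathbb{R}^2$, $U=\mathbb{R}\times\{0\}$, and $u^*=(1,0)$, so that $g(z)=\tfrac12 z_2^2+z_1$ and $P_g(z)=(z_1-1,\,z_2/2)$. It iterates $T=\operatorname{Id}-P_U+P_gR_U$ and compares the result with $-nu^*+P_Ux+2^{-n}P_{U^\perp}x$. The function names are hypothetical.

```python
def proj_U(x):
    """Projector onto U = R x {0}."""
    return (x[0], 0.0)

def prox_g(z):
    """Prox of g(z) = 0.5*dist_U(z)^2 + <u*, z> with u* = (1, 0): z -> (z1 - 1, z2/2)."""
    return (z[0] - 1.0, z[1] / 2.0)

def dr_step(x):
    """T = Id - P_U + P_g R_U."""
    p = proj_U(x)
    refl = (2.0 * p[0] - x[0], 2.0 * p[1] - x[1])
    q = prox_g(refl)
    return (x[0] - p[0] + q[0], x[1] - p[1] + q[1])

def iterate(x0, n):
    """Return T^n x0."""
    x = x0
    for _ in range(n):
        x = dr_step(x)
    return x

def closed_form(x0, n):
    """-n u* + P_U x0 + 2^(-n) P_{U-perp} x0 for u* = (1, 0)."""
    return (x0[0] - n, x0[1] / 2.0 ** n)

if __name__ == "__main__":
    x0 = (2.5, 8.0)
    for n in (0, 1, 5, 10):
        print(n, iterate(x0, n), closed_form(x0, n))
```

The first coordinate of the shadow sequence decreases by $1$ at every step, so it cannot converge, which matches item (xii).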
Remark 5.6.
Example 5.5 illustrates the importance
of the constraint qualification (11);
indeed, it provides a scenario where
(11) fails (see item (iv)) and the
shadow sequence never converges (see item (xii)).
Remark 5.7.
While Theorem 5.1 guarantees that (PUTnx)n∈N
converges weakly to a minimizer of ιU+g(⋅−v),
we leave numerical experiments and the
development of
meaningful termination criteria as topics for future research.
A promising starting point appears to be the analysis in
[2, Section 5].
The remaining results in this section were
inspired by a referee’s question.
Theorem 5.8 (switching the order of the operators).
Set $\widetilde{T}=\operatorname{Id}-P_g+P_UR_g=\operatorname{Id}-P_g+P_U(2P_g-\operatorname{Id})$.
Suppose that
$P_{\overline{\operatorname{ran}}(\operatorname{Id}-\widetilde{T})}(0)=-v$.
(Footnote 7: This assumption is satisfied if, for instance, $X$ is finite-dimensional. To see this, proceed as in the proof of Proposition 3.1(ii), with the roles of $\iota_U$ and $g$ switched.)
Let $x\in X$.
Then the following hold:
Proof. Observe that PURU=PU and RU2=Id.
(i):
Using [14, Theorem 2.7(i)] we learn that
(∀n∈N)PUTn=PURUTnRURU=PUTnRU.
(ii):
Tn−Tn+1=PgTn−PURgTn=PgTn−2PUPgTn+PUTn=PUTn−RUPgTn.
Now combine with 17.
(iii):
Recall that −v∈U⊥ by 29.
Now combine with (ii).
(iv):
This is Theorem 5.1.
(v):
Combine (i)
and (iv) with x
replaced by RUx.
(vi):
It follows from (iii) and
(v)
that
PUPgTnx⇀PUy(RUx).
Now combine with (ii).
■
In the setting of Theorem 5.1, we point out
that no general conclusion can be drawn about the sequence
(PgTnx)n∈N as we illustrate below.
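Example 5.9 ($(P_gT^{n}x)_{n\in\mathbb{N}}$ may converge).
Suppose that $(U,g)=(X,\iota_X)$.
Then $P_U=P_g=T=\widetilde{T}=\operatorname{Id}$, where $\widetilde{T}$ denotes the operator $\operatorname{Id}-P_g+P_UR_g$ of Theorem 5.8.
Hence, $\operatorname{ran}(\operatorname{Id}-T)=\operatorname{ran}(\operatorname{Id}-\widetilde{T})=\{0\}$.
Consequently, $v=-v=0$ and
$(\forall n\in\mathbb{N})(\forall x\in X)$ $P_gT^{n}x=x=\lim_{n\to\infty}P_gT^{n}x$.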
Example 5.10 ($(P_gT^{n}x)_{n\in\mathbb{N}}$ may have no cluster points).
Suppose that X=R2, that U=R×{0},
that C=epi(∣⋅∣+1) and that g=ιC.
Let x∈[−1,1]×{0}.
Using induction, one can show that
(∀n∈{1,2,…})Tnx=(0,n)∈C.
Consequently, ∥PgTnx∥=∥PCTnx∥=n→+∞.
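The induction in Example 5.10 can be replayed numerically. The sketch below implements the projector onto $C=\operatorname{epi}(|\cdot|+1)$ by a three-case formula (point already in $C$, nearest point is the apex $(0,1)$, or nearest point lies on one of the two boundary rays), which is a derivation made here for illustration, and then iterates $T=\operatorname{Id}-P_U+P_CR_U$ with $U=\mathbb{R}\times\{0\}$. The function names are hypothetical.

```python
def proj_U(x):
    """Projector onto U = R x {0}."""
    return (x[0], 0.0)

def proj_C(x):
    """Projector onto C = epi(|.| + 1) = {(s, t) : t >= |s| + 1} in R^2."""
    s, t = x
    if t >= abs(s) + 1.0:
        return (s, t)                # already in C
    if t - 1.0 <= -abs(s):
        return (0.0, 1.0)            # nearest point is the apex (0, 1)
    # Otherwise the nearest point lies on the boundary ray on the side of s.
    lam = (abs(s) + t - 1.0) / 2.0
    sign = 1.0 if s > 0 else -1.0
    return (sign * lam, lam + 1.0)

def dr_step(x):
    """T = Id - P_U + P_C R_U."""
    p = proj_U(x)
    refl = (2.0 * p[0] - x[0], 2.0 * p[1] - x[1])
    q = proj_C(refl)
    return (x[0] - p[0] + q[0], x[1] - p[1] + q[1])

if __name__ == "__main__":
    x = (0.5, 0.0)                   # any starting point in [-1, 1] x {0}
    for n in range(1, 6):
        x = dr_step(x)
        print(n, x, proj_C(x))       # T^n x = (0, n) and P_C T^n x = T^n x
```

Each iterate already lies in $C$, so $P_gT^{n}x=T^{n}x$ and its norm grows linearly in $n$.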
6 Minimizing the sum of finitely many functions
In this section we assume for simplicity
that
[TABLE]
that m∈{2,3,…},
that I={1,2,…,m},
and that
[TABLE]
for every i∈I.
Furthermore, we set (see also [6] and [16])
[TABLE]
Remark 6.1.
In passing we point out that, by
[11, Theorem 2.16],
we have $(\forall i\in I)$ $D_i=\overline{\operatorname{dom}}\,\partial g_i=\overline{\operatorname{dom}}\,g_i$.
Fact 6.2.
Write x=(xi)i∈I∈X.
Then the following hold:
(i) g:X→]−∞,+∞] is convex, lower semicontinuous, and proper.
(ii)
Proof. (i):
Clear.
(ii):
This is [6, Proposition 13.30].
(iii):
This is [6, Proposition 16.9].
(iv):
This is [6, Proposition 26.4(ii)].
(v):
This is [6, Proposition 24.11].
(vi):
This is [6, Proposition 26.4(i)].
■
Next we define the set of least squares solutions
of (Di)i∈I
[TABLE]
Finally, throughout the remainder of this section, we assume that
[TABLE]
Remark 6.3.
In many applications,
the individual functions gi have minimizers.
In such cases, (∀i∈I)0∈dom∂gi∗⊆domgi∗, and therefore
0∈domg∗⊆Δ⊥+domg∗.
Proof. (i):
Observe that
$\overline{\Delta-\operatorname{dom}g}=\overline{\Delta-\overline{\operatorname{dom}}\,g}=\overline{\Delta-D}$.
Now combine this with (74)
and
Proposition 3.1(ii) applied with (X,U,g)
replaced by (X,Δ,g).
(ii)&(iii):
Combine [3, Lemma 2.2(i)&(iv)]
and 34
applied with (X,U,g)
replaced by (X,Δ,g).
(iv):
The first identity follows from
applying 30 with (X,U,g)
replaced by (X,Δ,g).
The second identity follows from [6, Proposition 26.4(vii)&(viii)].
(v):
This is a direct consequence of item (iv).
(vi):
Combine item (i),
[3, Lemma 2.2(i)]
and [8, Corollary 3.1].
(vii):
This is a direct consequence of
(iv) and (vi).
■
Proposition 6.5.
Suppose that j∈I satisfies that domgj=X.
Then vj=0.
Proof. Set A=argmin(ιΔ+g(⋅−v))
and observe that
Proposition 6.4(i)&(ii)
imply that
A⊆Δ∩(v+domg)⊆Δ∩(v+D)=FixPΔPD.
Note that
74 and Theorem 3.4
(applied with (U,g)
replaced by (Δ,g))
imply that
A=Z.
Hence, e(A)=e(Z)⊆L,
by Proposition 6.4(vii).
Now, let y∈FixPΔPD.
Then Proposition 6.4(iii) implies that
v=y−PD(y)=(y1,…,ym)−(PD1y1,…,PDmym).
Consequently, if
Dj=X then vj=yj−PDjyj=0.
■
Theorem 6.6.
Let x=(xi)i∈I∈X
and set y=limn→∞PFixT(nv+Tnx).
Then
[TABLE]
[TABLE]
Furthermore,
[TABLE]
Proof. (75) and (76) follow
from applying Theorem 5.1 with (X,U,g)
replaced by (X,Δ,g).
It follows from combining 75 and
Theorem 3.4 (applied with (U,g) replaced by (Δ,g))
that
PΔy∈argmin(ιΔ+g(⋅−v))=Z.
Now combine with
Proposition 6.4(vii).
■
Corollary 6.7.
Let
x0∈X, and set
x0=x0,1=⋯=x0,m=x0.
Update via (∀n∈N)
[TABLE]
Then $\overline{x}_n\to\overline{x}\in\operatorname{argmin}\big(\sum_{i\in I}g_i(\cdot-v_i)\big)$.
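A minimal sketch of a parallel splitting iteration of this kind is given below. The update rule is an assumption: it is the standard product-space form obtained by applying $T=\operatorname{Id}-P_U+P_gR_U$ with the diagonal subspace $\Delta$ in place of $U$, namely $\overline{x}_n=\frac1m\sum_{i\in I}x_{n,i}$ and $x_{n+1,i}=x_{n,i}-\overline{x}_n+P_{g_i}(2\overline{x}_n-x_{n,i})$, and it may differ in presentation from the update displayed in the corollary. The instance, with $X=\mathbb{R}$, $g_1=\iota_{[0,1]}$, and $g_2=\iota_{[3,4]}$, is also an illustrative assumption; the two intervals are disjoint and their least squares solution is $2$. The function names are hypothetical.

```python
def clamp(t, lo, hi):
    """Projector onto the interval [lo, hi] in R."""
    return max(lo, min(hi, t))

def parallel_dr(x0, proxes, n):
    """Run n steps of the product-space Douglas-Rachford iteration.

    Returns the list of averages computed before each update.
    """
    xs = [x0 for _ in proxes]          # x_{0,1} = ... = x_{0,m} = x0
    means = []
    for _ in range(n):
        mean = sum(xs) / len(xs)       # projection onto the diagonal
        means.append(mean)
        xs = [x - mean + prox(2.0 * mean - x) for x, prox in zip(xs, proxes)]
    return means

if __name__ == "__main__":
    proxes = [lambda t: clamp(t, 0.0, 1.0), lambda t: clamp(t, 3.0, 4.0)]
    print(parallel_dr(0.0, proxes, 6))  # the averages settle at 2.0
```

Here $v=(1,-1)$, so $(v_1+[0,1])\cap(v_2+[3,4])=\{2\}$, which is the value the averages reach even though the individual components drift apart.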
Suppose that J⊆I,
that
for every $i\in I\smallsetminus J$, $f_i\colon X\to\mathbb{R}$
is convex and satisfies $\operatorname{dom}f_i=X$ and $\operatorname{argmin}f_i\neq\varnothing$,
and that
for every i∈J, Ci=X is convex, closed, and nonempty.
Set LC=argmin∑i∈JdistCi2.
Consider the problem
[TABLE]
Suppose that $\operatorname{zer}\big(\sum_{i\in I\smallsetminus J}\partial f_i+\sum_{i\in J}N_{C_i}(\cdot-v_i)\big)\neq\varnothing$.
Let
x0∈X, and set
x0=x0,1=⋯=x0,m=x0.
Update via (∀n∈N)
[TABLE]
Then xn→x∈X,
and x is a solution of
[TABLE]
In particular, if $\bigcap_{i\in J}C_i\neq\varnothing$,
then $L_C=\bigcap_{i\in J}C_i\neq\varnothing$
and x is a solution of (79).
Proof. Suppose that gi=fi, if i∈I∖J;
and gi=ιCi, if i∈J,
and observe that 79 reduces to
[TABLE]
Note that combining
78
and [6, Example 23.4]
yields 80.
It follows from Proposition 6.5
that (∀i∈I∖J)vi=0.
Consequently,
$\operatorname{zer}\big(\sum_{i\in I}\partial g_i(\cdot-v_i)\big)=\operatorname{zer}\big(\sum_{i\in I\smallsetminus J}\partial f_i+\sum_{i\in J}N_{C_i}(\cdot-v_i)\big)\neq\varnothing$,
and by Corollary 6.7 we have xn→x∈X,
and $\overline{x}\in\operatorname{zer}\big(\sum_{i\in I\smallsetminus J}\partial f_i+\sum_{i\in J}N_{C_i}(\cdot-v_i)\big)$.
Finally, using Proposition 6.4(vi),
(∃u∈X)−u∈∑i∈I∖J∂fi(x)=∂(∑i∈I∖Jfi)(x)
and u∈∑i∈JNCi(x−vi)⊆N∩i∈J(vi+Ci)(x)=NLC(x).
Therefore, x solves 81.
■
Acknowledgements
The authors thank the editor and three anonymous referees for
insightful comments that led to a substantially improved manuscript.
The research of HHB was partially supported by a Discovery Grant
of the Natural Sciences and Engineering Research Council of
Canada.
The research of WMM was partially supported by
the Natural Sciences and Engineering Research Council of
Canada Postdoctoral Fellowship.
Bibliography
[1]
[2] G. Banjac, P. Goulart, B. Stellato, and S. Boyd, Infeasibility detection in the alternating direction method of multipliers for convex optimization, Journal of Optimization Theory and Applications 183 (2019), 490–519.
[3] H.H. Bauschke and J.M. Borwein, Dykstra’s alternating projection algorithm for two sets, Journal of Approximation Theory 79 (1994), 418–443.
[4] H.H. Bauschke, J.M. Borwein, and A.S. Lewis, The method of cyclic projections for closed convex sets in Hilbert space, in Recent Developments in Optimization Theory and Nonlinear Analysis (Jerusalem 1995), Contemporary Mathematics 204 (1997), 1–38.
[5] H.H. Bauschke, R.I. Boţ, W.L. Hare, and W.M. Moursi, Attouch-Théra duality revisited: paramonotonicity and operator splitting, Journal of Approximation Theory 164 (2012), 1065–1084.
[6] H.H. Bauschke and P.L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edition, Springer, 2017.
[7] H.H. Bauschke, P.L. Combettes, and D.R. Luke, Finding best approximation pairs relative to two closed convex sets in Hilbert spaces, Journal of Approximation Theory 127 (2004), 178–192.
[8] H.H. Bauschke, M.N. Dao, and W.M. Moursi, The Douglas–Rachford algorithm in the affine-convex case, Operations Research Letters 44 (2016), 379–382.