A general theory of singular values with applications to signal denoising
Harm Derksen

Abstract.
We study the Pareto frontier for two competing norms $\|\cdot\|_X$ and $\|\cdot\|_Y$ on a vector space. For a given vector $c$, the Pareto frontier describes the possible values of $(\|a\|_X, \|b\|_Y)$ for a decomposition $c = a + b$. The singular value decomposition of a matrix is closely related to the Pareto frontier for the spectral and nuclear norm. We will develop a general theory that extends the notion of singular values of a matrix to arbitrary finite dimensional euclidean vector spaces equipped with dual norms. This also generalizes the diagonal singular value decompositions for tensors introduced by the author in previous work. We can apply the results to denoising, where $c$ is a noisy signal, $a$ is a sparse signal and $b$ is noise. Applications include 1D total variation denoising, 2D total variation Rudin-Osher-Fatemi image denoising, LASSO, basis pursuit denoising and tensor decompositions.
The author was partially supported by NSF grant DMS 1601229.
1. Introduction
Sound, images, and videos can be corrupted by noise. Noise removal is a fundamental problem in signal and image processing. In the additive noise model, we have an original signal $a$, additive noise $b$ and a corrupted signal $c = a + b$. We will work with discrete signals and view $a$, $b$ and $c$ as vectors or arrays. The problem of noise removal can be framed in terms of competing norms on a vector space. The Pareto frontier defines the optimal trade-off between the two norms. The Pareto frontier was used in the L-curve method in Tikhonov regularization (see [44, 50, 27, 28] and [36, Chapter 26]), and in basis pursuit denoising (see [6, 48, 26]) to find optimal regularization parameters. The Pareto frontier is a continuous convex curve, and has a continuous derivative if one of the norms is the euclidean norm (see [6], Lemma 3.2 and Proposition 4.5).
We will assume that the original signal $a$ is sparse. A vector is sparse when it has few nonzero values. We will also consider other notions of sparseness. For example, a piecewise constant function on an interval can be considered sparse because its derivative has few nonzero values (or values where it is not defined), and a piecewise linear function can be considered sparse because the second derivative has few nonzero values. A sound signal from music is sparse because it contains only a few frequencies. A typical image is sparse, because it has large connected areas of the same color, i.e., the image is piecewise constant. This is exploited in the total variation image denoising method of Rudin, Osher and Fatemi ([46]). A matrix of low rank can also be viewed as sparse because it is the sum of a few rank 1 matrices. In this context, principal component analysis can be viewed as a method for recovering a sparse signal. The noise signal $b$, on the other hand, is not sparse. For example, white noise contains all frequencies in the sound spectrum. Gaussian additive noise in an image will be completely discontinuous and not locally constant at all.
There are many ways to measure sparseness. Examples of sparseness measures are the $\ell_0$ “norm” (which actually is not a norm), the rank of a matrix or the number of different frequencies in a sound signal. It is difficult to use these measures because they are not convex. We deal with this using convex relaxation, i.e., we replace the non-convex sparseness measure by a convex one. In this paper, we will measure sparseness using a norm $\|\cdot\|_X$ on the vector space $V$ of all signals. These norms coming from convex relaxation are typically $\ell_1$-type norms. For example, we may replace the $\ell_0$ “norm” by the $\ell_1$ norm, or replace the rank of a matrix by the nuclear norm. Noise will be measured using a different norm $\|\cdot\|_Y$. This typically will be a euclidean norm or perhaps an $\ell_\infty$ type norm. The quotient $\|a\|_Y/\|a\|_X$ is large for the sparse signal $a$, and $\|b\|_Y/\|b\|_X$ is small for the noise signal $b$. To denoise a signal $c$ we search for a decomposition $c = a + b$ where $\|a\|_X$ and $\|b\|_Y$ are small. Minimizing $\|a\|_X$ and $\|b\|_Y$ are two competing objectives. The trade-off between these two objectives is governed by the Pareto frontier. The concept of Pareto efficiency was used by Vilfredo Pareto (1848–1923) to describe economic efficiency. A point $(x, y)$ is called Pareto efficient if there exists a decomposition $c = a + b$ with $\|a\|_X = x$ and $\|b\|_Y = y$ such that for every decomposition $c = a' + b'$ we have $\|a'\|_X > x$, $\|b'\|_Y > y$ or $(\|a'\|_X, \|b'\|_Y) = (x, y)$. If $(x, y)$ is Pareto efficient then we will call the decomposition $c = a + b$ an $XY$-decomposition. Many methods such as LASSO, basis pursuit denoising, the Dantzig selector, total variation denoising and principal component analysis can be formulated as finding an $XY$-decomposition for certain norms $\|\cdot\|_X$ and $\|\cdot\|_Y$.
In most examples that we consider, the space $V$ of signals has a positive definite inner product $\langle\cdot,\cdot\rangle$ and the norms $\|\cdot\|_X$ and $\|\cdot\|_Y$ are dual to each other. The inner product gives a euclidean norm $\|\cdot\|_2$ defined by $\|c\|_2 = \sqrt{\langle c, c\rangle}$. We now have 3 different norms. We will show that $X2$-decompositions and $2Y$-decompositions are the same. We define the Pareto sub-frontier as the set of all points $(\|a\|_X, \|b\|_Y)$ where $c = a + b$ is an $X2$-decomposition (or equivalently, a $2Y$-decomposition). The Pareto sub-frontier lies on or above the Pareto frontier (by the definition of the Pareto frontier). A vector $c$ is called tight if its Pareto frontier and Pareto sub-frontier coincide. If every vector is tight then the norms $\|\cdot\|_X$ and $\|\cdot\|_Y$ are called tight. We show that for tight vectors, the Pareto (sub-)frontier is piecewise linear. For a tight vector $c$, we will define the slope decomposition of $c$ which can be thought of as a generalization of the singular value decomposition.
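The nuclear norm and spectral norm of a matrix are dual to each other (see for example [12, 45] and Lemma 7.1) and we will show that these norms are tight in Section 7. If the singular values of a matrix $A$ are $\lambda_1 > \lambda_2 > \cdots > \lambda_r > 0$ with multiplicities $m_1, m_2, \dots, m_r$ respectively, we have the following well-known formulas for the spectral, nuclear and euclidean (Frobenius) norm:
$$
\begin{aligned}
\|A\|_\sigma &= \lambda_1, && (1)\\
\|A\|_\star &= \sum_{i=1}^{r} m_i \lambda_i, && (2)\\
\|A\|_2 &= \sqrt{\sum_{i=1}^{r} m_i \lambda_i^2}. && (3)
\end{aligned}
$$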
The singular value region of $A$ is a bar of height $\lambda_1$ and width $m_1$, followed by a bar of height $\lambda_2$ and width $m_2$, etc. The singular values and their multiplicity can now easily be read off from the singular value region. For example, if a matrix has singular values with multiplicity 2, 1 with multiplicity 1 and 0.5 with multiplicity 2, then the singular value region is plotted below:
The height of the singular value region is the spectral norm, the width is the rank, the area is the nuclear norm, and if we integrate $2y$ over the region we obtain the square of the Frobenius norm. The Pareto frontier of a matrix (which is also its Pareto sub-frontier) encodes the singular values of the matrix, and the slope decomposition is closely related to the singular value decomposition. The slope decomposition is unique, but the singular value decomposition may not be unique if some of the singular values coincide.
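The relation between the singular values and the three norms can be checked numerically. The following is a minimal sketch using NumPy; the diagonal matrix and in particular its largest singular value 2.0 are illustrative assumptions, not values taken from the text.

```python
import numpy as np

def norms_from_singular_values(A):
    """Return the (spectral, nuclear, Frobenius) norms of A from its singular values."""
    s = np.linalg.svd(A, compute_uv=False)   # singular values, repeated by multiplicity
    spectral = s.max()                        # largest singular value
    nuclear = s.sum()                         # sum of the singular values
    frobenius = np.sqrt((s ** 2).sum())       # root of the sum of squares
    return spectral, nuclear, frobenius

# Hypothetical example: singular values 2.0 (twice), 1.0 (once), 0.5 (twice).
A = np.diag([2.0, 2.0, 1.0, 0.5, 0.5])
spectral, nuclear, frobenius = norms_from_singular_values(A)
```

For this matrix the singular value region would consist of three bars of widths 2, 1 and 2; its height is `spectral` and its area is `nuclear`.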
A tensor is a higher dimensional array. A $d$-way tensor is a vector for $d = 1$ and a matrix for $d = 2$. For $d \geq 3$, one can generalize the rank of a matrix to tensor rank ([31]). The tensor rank is closely related to the canonical polyadic decomposition of a tensor (CP-decomposition). This decomposition is also known as the PARAFAC ([29]) or the CANDECOMP model ([8]). The nuclear norm of a matrix can be generalized to the nuclear norm of a tensor ([25, 47]), and this can be viewed as a convex relaxation of the tensor rank. The spectral norm of a matrix can be generalized to a spectral norm of a tensor, and this norm is dual to the nuclear norm of a tensor ([19]). Not every tensor is tight. A tensor that is tight will have a slope decomposition which generalizes the diagonal singular value decomposition introduced by the author in [19]. Every tensor that has a diagonal singular value decomposition has a slope decomposition but the converse is not true. We will define singular values and multiplicities for tight tensors such that the formulas (1), (2), (3) are satisfied. The multiplicities of the singular values of tensors are nonnegative, but are not always integers. For example, in Section 13 we will show that the tensor
[TABLE]
is tight and has the singular value with multiplicity (and not singular value with multiplicity 6 as one might expect). For tensors that are not tight we can still define the singular value region, but the singular value interpretation may be more esoteric. For example, we will show that the tensor
[TABLE]
has the following singular value region:
We will define the singular value region in a very general context. Whenever is a finite dimensional euclidean vector space, and and are norms that are dual to each other, then we can define the singular value region for any .
2. Main Results
2.1. The Pareto frontier
Let us consider a finite dimensional $\mathbb{R}$-vector space $V$ equipped with two norms, $\|\cdot\|_X$ and $\|\cdot\|_Y$. Suppose that $c \in V$. We are looking for decompositions $c = a + b$ that are optimal in the sense that we cannot reduce $\|a\|_X$ without increasing $\|b\|_Y$ and we cannot reduce $\|b\|_Y$ without increasing $\|a\|_X$. We recall the definition from the introduction:
Definition 2.1**.**
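A pair $(x, y)$ is called Pareto efficient if there exists a decomposition $c = a + b$ with $\|a\|_X = x$, $\|b\|_Y = y$ such that for every decomposition $c = a' + b'$ we have $\|a'\|_X > x$, $\|b'\|_Y > y$ or $(\|a'\|_X, \|b'\|_Y) = (x, y)$. If $(x, y)$ is a Pareto efficient pair then we call $c = a + b$ an $XY$-decomposition.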
By symmetry, is an -decomposition if and only if is a -decomposition. The Pareto frontier consists of all Pareto efficient pairs (see [6]). The Pareto frontier is the graph of a strictly decreasing, continuous convex function
[TABLE]
(see [6] and Lemmas 3.2 and 3.3). If we change the role of and we get the graph of , so and are inverse functions of each other.
Example 2.2**.**
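Consider the vector space $V = \mathbb{R}^n$. Sparseness of vectors in $\mathbb{R}^n$ can be measured by the number of nonzero entries. For $c = (c_1, \dots, c_n) \in \mathbb{R}^n$ we define
$$
\|c\|_0 = \#\{\, i \mid 1 \leq i \leq n,\ c_i \neq 0 \,\}.
$$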
Note that is not a norm on because it does not satisfy for . Convex relaxation of gives us the norm . This means that the unit ball for the norm is the convex hull of all vectors with . Let us take and and describe the Pareto frontier. Suppose that and . If then we have
[TABLE]
If we take for all , then we have equality. So is an -decomposition where . This shows that
[TABLE]
For the vector we plotted and (see also Example 2.11)
For example if we take then we get the decomposition with and and we have and . The vector is sparser than . This procedure of noise reduction is soft thresholding in its simplest form.
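The soft thresholding step of this example can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the $\ell_1$ norm measures the sparse part and the $\ell_\infty$ norm measures the noise part; the function names and the sample vector are hypothetical.

```python
import numpy as np

def soft_threshold_decomposition(c, y):
    """Split c = a + b with ||b||_inf <= y and ||a||_1 as small as possible."""
    c = np.asarray(c, dtype=float)
    b = np.sign(c) * np.minimum(np.abs(c), y)   # clip every entry to [-y, y]
    a = c - b                                   # remaining entries shrink toward 0 by y
    return a, b

def frontier_l1_linf(c, y):
    """Smallest ||a||_1 over all decompositions c = a + b with ||b||_inf <= y."""
    c = np.asarray(c, dtype=float)
    return float(np.maximum(np.abs(c) - y, 0.0).sum())

c = np.array([3.0, -1.0, 0.5])                  # hypothetical noisy signal
a, b = soft_threshold_decomposition(c, 1.0)
```

Entrywise, $|a_i| = |c_i - b_i| \geq \max\{|c_i| - y, 0\}$ whenever $|b_i| \leq y$, which is why clipping gives the smallest possible $\ell_1$ norm for the sparse part.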
In Section 3, we will study the Pareto frontier and -decompositions in more detail.
2.2. Dual norms and the Pareto sub-frontier
We now assume that we have a positive definite bilinear form $\langle\cdot,\cdot\rangle$ on the finite dimensional vector space $V$. The euclidean norm $\|\cdot\|_2$ on $V$ is defined by $\|c\|_2 = \sqrt{\langle c, c\rangle}$. Suppose that $\|\cdot\|_X$ is another norm on $V$. We may think of $\|\cdot\|_X$ as a norm that measures sparseness. For denoising, we compare the norms $\|\cdot\|_X$ and $\|\cdot\|_2$. We also consider the dual norm $\|\cdot\|_Y$ on $V$ defined by
$$
\|c\|_Y = \max\{\, \langle c, v\rangle \mid v \in V,\ \|v\|_X \leq 1 \,\}.
$$
The dual norm of $\|\cdot\|_Y$ is $\|\cdot\|_X$ again. There is an interesting interplay between the 3 norms, and the $XY$-decompositions, $X2$-decompositions and $2Y$-decompositions are closely connected. The following proposition and other results in this section will be proved in Section 4.
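As a sanity check on the notion of a dual norm, the sketch below takes the sparseness norm to be the $\ell_1$ norm on $\mathbb{R}^n$ with the standard inner product (an assumption made for illustration). A linear functional attains its maximum over the $\ell_1$ unit ball at one of the extreme points $\pm e_i$, so the dual norm can be computed by enumerating them, and it agrees with the $\ell_\infty$ norm.

```python
import numpy as np

def dual_norm_of_l1(c):
    """max <c, v> over the l1 unit ball, evaluated at its extreme points +e_i and -e_i."""
    c = np.asarray(c, dtype=float)
    n = c.size
    extreme_points = np.vstack([np.eye(n), -np.eye(n)])   # rows are +e_i and -e_i
    return float((extreme_points @ c).max())

c = np.array([3.0, -1.0, 0.5])      # hypothetical vector
value = dual_norm_of_l1(c)          # coincides with the l_infinity norm of c
```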
Proposition 2.3**.**
For a vector $c \in V$, the following three statements are equivalent:
(1) $c = a + b$ is an $X2$-decomposition;
(2) $c = a + b$ is a $2Y$-decomposition;
(3) $c = a + b$ and $\langle a, b\rangle = \|a\|_X \|b\|_Y$.
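A small numerical check of this equivalence, assuming the sparseness norm is the $\ell_1$ norm and its dual is the $\ell_\infty$ norm (an illustrative choice). It relies on the standard fact that the euclidean projection of a vector onto an $\ell_1$ ball is a soft thresholding of that vector, and it tests the identity $\langle a, b\rangle = \|a\|_1 \|b\|_\infty$, which is the equality case of the duality inequality $\langle a, b\rangle \leq \|a\|_1 \|b\|_\infty$. The function names and sample data are hypothetical.

```python
import numpy as np

def x2_decomposition_l1(c, theta):
    """Soft-threshold c at level theta: a is the euclidean projection of c onto
    the l1 ball of radius ||a||_1, and b = c - a is the residual."""
    c = np.asarray(c, dtype=float)
    a = np.sign(c) * np.maximum(np.abs(c) - theta, 0.0)
    return a, c - a

def duality_gap(a, b):
    """||a||_1 * ||b||_inf - <a, b>, which is always nonnegative."""
    return float(np.abs(a).sum() * np.abs(b).max() - a @ b)

c = np.array([3.0, -1.0, 0.5])                    # hypothetical signal
a, b = x2_decomposition_l1(c, 1.0)
gap_optimal = duality_gap(a, b)                   # equality case: gap vanishes
gap_other = duality_gap(0.5 * c, 0.5 * c)         # the split c = c/2 + c/2 is not optimal
```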
Definition 2.4**.**
If the statements (1)–(3) in Proposition 2.3 imply
(4) $c = a + b$ is an $XY$-decomposition,
then $c$ is called tight. If all vectors in $V$ are tight then the norm $\|\cdot\|_X$ is called tight.
Definition 2.5**.**
The Pareto sub-frontier is the set of all pairs such that there exists an -decomposition with and .
We will show in Section 4 that the Pareto sub-frontier is the graph of a decreasing Lipschitz continuous function
[TABLE]
By symmetry, is the inverse function of . From the definitions it is clear that . If , then every -decomposition is automatically a -decomposition, and is tight.
Corollary 2.6**.**
A vector is tight if and only if .
If then there is no space between the two graphs and they fit together tightly, which explains the name of this property. Let us work out an example where the norms are not tight.
Example 2.7**.**
Define a norm on by
[TABLE]
Its dual norm is given by
[TABLE]
For , the decomposition
[TABLE]
is an -decomposition, where
[TABLE]
To verify this, we compute
[TABLE]
and
[TABLE]
The Pareto sub-frontier is parameterized by
[TABLE]
The -decompositions of are for , where
[TABLE]
The Pareto frontier is parameterized by
[TABLE]
We plotted the Pareto frontier in green and the Pareto sub-frontier in blue:
The Pareto frontier and sub-frontier are not the same, so the vector is not tight.
The Pareto sub-frontier encodes crucial information about the -decompositions. If is a point on the Pareto subfrontier and is an -decomposition with and , then we can read off , , , , , , and from the Pareto sub-frontier using the following proposition.
Proposition 2.8**.**
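Suppose that $(x_0, y_0)$ is a point on the $XY$-Pareto sub-frontier of $c$, and let $c = a + b$ be an $X2$-decomposition with $\|a\|_X = x_0$ and $\|b\|_Y = y_0$.
(1) The area below the sub-frontier is equal to $\frac{1}{2}\|c\|_2^2$.
(2) The area below the sub-frontier and to the right of $x = x_0$ is equal to $\frac{1}{2}\|b\|_2^2$.
(3) The area below the sub-frontier and above $y = y_0$ is equal to $\frac{1}{2}\|a\|_2^2$.
(4) The area of the rectangle $[0, x_0] \times [0, y_0]$ is $\langle a, b\rangle$.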
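In the $\ell_1$/$\ell_\infty$ setting of Example 2.2 these areas can be computed directly. The sketch below is an illustration under that assumption, with a hypothetical vector and hypothetical helper names. For this example the four areas come out as $\frac{1}{2}\|c\|_2^2$, $\frac{1}{2}\|b\|_2^2$, $\frac{1}{2}\|a\|_2^2$ and $\langle a, b\rangle$, consistent with the identity $\frac{1}{2}\|c\|_2^2 = \frac{1}{2}\|a\|_2^2 + \frac{1}{2}\|b\|_2^2 + \langle a, b\rangle$.

```python
import numpy as np

def frontier_x_of_y(c, y):
    """l1 norm of the soft-thresholded vector: the frontier written as x = g(y)."""
    return float(np.maximum(np.abs(c) - y, 0.0).sum())

def integrate_piecewise_linear(g, lo, hi, knots):
    """Exact trapezoid integral of a piecewise-linear g on [lo, hi] with kinks at `knots`."""
    pts = np.unique(np.concatenate(([lo, hi], [k for k in knots if lo < k < hi])))
    vals = np.array([g(t) for t in pts])
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(pts)))

c = np.array([3.0, -1.0, 0.5])                  # hypothetical signal
y0 = 1.0                                        # chosen noise level
b = np.sign(c) * np.minimum(np.abs(c), y0)      # noise part, ||b||_inf = y0
a = c - b                                       # sparse part
x0 = frontier_x_of_y(c, y0)                     # equals ||a||_1
knots, y_max = np.abs(c), np.abs(c).max()
g = lambda y: frontier_x_of_y(c, y)

area_total = integrate_piecewise_linear(g, 0.0, y_max, knots)
area_above_y0 = integrate_piecewise_linear(g, y0, y_max, knots)
area_right_of_x0 = integrate_piecewise_linear(lambda y: g(y) - x0, 0.0, y0, knots)
area_rectangle = x0 * y0
```

The integration is carried out along the $y$-axis because the frontier has a closed form as a function of $y$ in this example; the regions are the same as those described in terms of $x$.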
2.3. The slope decomposition
Proofs of results in this subsection will be given in Section 6. We define the slope of a nonzero vector as the ratio
[TABLE]
If is a norm that is small for sparse signals, then is large for sparse signals, and small for noise. Note that . Using the slope function, we define the slope decomposition. If is the nuclear norm for matrices, then the slope decomposition is closely related to the singular value decomposition. So we can think of the slope decomposition as a generalization of the singular value decomposition.
Definition 2.9**.**
An expression is called an -slope decomposition if are nonzero, for all and .
Note that because of symmetry, is an slope decomposition if and only if is a slope decomposition. There may be vectors that do not have a slope decomposition. We will prove the following result.
Theorem 2.10**.**
(1) A vector is tight if and only if has a slope decomposition.
(2) Suppose that is a slope decomposition, and let and for all . Then , and the Pareto frontier (which is the same as the Pareto sub-frontier) is the piecewise linear curve through the points
[TABLE]
Example 2.11**.**
Let us go back to Example 2.2. The norms and are dual to each other. We will show that these norms are tight. If we integrate the function
[TABLE]
from $0$ to , we get which is the area under the graph of . So the areas under the graphs of and are the same, and we deduce that . This shows that is tight. Since is arbitrary, the norms and are tight. By Theorem 2.10 above, every vector has a slope decomposition. For example
[TABLE]
is a slope decomposition. Let and . Then we have , , , , , . The Pareto curve is the piecewise linear function going through
[TABLE]
2.4. Geometry of the unit ball
For we define the -ball of radius by
[TABLE]
We explain denoising in terms of the geometry of the -balls. Suppose we want to denoise a signal , such that the denoised signal is sparse. We impose the constraint . Under this constraint, we minimize the amount of noise, by minimizing the norm of . This means that is the vector inside the ball that is closest to the vector . We call the projection of onto the ball and write . The function is a retraction of onto the ball . If , then it is clear that
[TABLE]
For one might expect that . This is not always true, but it is true in the case where is tight by Proposition 2.12 below.
We also define a shrinkage operator by
[TABLE]
If is an -decomposition, and then we have
[TABLE]
and
[TABLE]
The function can be seen as a denoising function where is the noise level.
A nice property of tight vectors is the transitivity of denoising.
Proposition 2.12**.**
If is tight, then we have
- (1)
* and* 2. (2)
.
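For the $\ell_1$ norm the projection onto a ball can be computed with a standard sorting-based method, which makes it easy to experiment with the transitivity of denoising. The sketch below is an illustration only: it assumes the $\ell_1$ norm as sparseness norm, and it reads transitivity as the statement that projecting onto a larger ball and then onto a smaller one gives the same result as projecting onto the smaller ball directly. The function name is hypothetical.

```python
import numpy as np

def project_l1_ball(c, radius):
    """Euclidean projection of c onto the l1 ball {a : ||a||_1 <= radius}."""
    c = np.asarray(c, dtype=float)
    if radius <= 0:
        return np.zeros_like(c)
    if np.abs(c).sum() <= radius:
        return c.copy()                           # already inside the ball
    u = np.sort(np.abs(c))[::-1]                  # magnitudes in descending order
    css = np.cumsum(u)
    k = np.arange(1, u.size + 1)
    rho = np.nonzero(u * k > css - radius)[0][-1] # last index kept by the threshold
    theta = (css[rho] - radius) / (rho + 1)       # soft-threshold level
    return np.sign(c) * np.maximum(np.abs(c) - theta, 0.0)

c = np.array([3.0, -1.0, 0.5])                    # hypothetical signal
direct = project_l1_ball(c, 2.0)
two_step = project_l1_ball(project_l1_ball(c, 3.0), 2.0)
```

Here `direct` and `two_step` agree, because composing two soft thresholdings is again a soft thresholding with the sum of the two levels.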
The unit ball is a closed convex set, but not always a polytope. We recall the definition of a face of a closed convex set:
Definition 2.13**.**
A face of a closed convex set is a convex closed subset with the following property: if , and then we have .
We will study faces of the unit ball , and the cones associated to it.
Definition 2.14**.**
A facial -cone is a cone of the form where is a (proper) face of the unit ball . The set is considered a facial -cone as well.
If is a nonzero facial -cone, then for some proper face of the unit ball . In that case, we have , where is the unit sphere. We now will discuss two notions of sparseness related to a norm .
Definition 2.15**.**
For a nonzero vector we define its -sparseness as the smallest nonnegative integer such that we can write
[TABLE]
where is an extremal point of the unit ball for . The geometric -sparseness is where is the smallest facial -cone containing .
Each notion of sparseness has its merits. We have
[TABLE]
but a similar inequality does not always hold for geometric sparseness. On the other hand, the set
[TABLE]
of -geometric -sparse vectors is closed, but the set
[TABLE]
of --sparse vectors is not always closed.
We have
[TABLE]
by the Carathéodory Theorem (see [4, Theorem 2.3]).
Example 2.16**.**
Consider , and let and . We have
[TABLE]
where
[TABLE]
The function is sometimes referred to as the norm, but is strictly speaking not a norm.
[TABLE]
We have
[TABLE]
but
[TABLE]
Example 2.17**.**
If $\|\cdot\|_X$ is the nuclear norm on a space of matrices and $A$ is a matrix, then the $X$-sparseness of $A$ is the rank of $A$.
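This can be illustrated with the singular value decomposition, which writes a matrix as a sum of as many rank 1 matrices as its rank. The sketch below is a minimal NumPy illustration; the matrix, the tolerance `tol` and the function name are assumptions.

```python
import numpy as np

def rank_one_terms(A, tol=1e-10):
    """Write A as a sum of rank-1 matrices s_i * u_i v_i^T, one per nonzero singular value."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)) if s[i] > tol]

# Hypothetical 3 x 3 matrix of rank 2, built from two rank-1 matrices.
A = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, 1.0]) + np.outer([0.0, 1.0, 0.0], [0.0, 1.0, 0.0])
terms = rank_one_terms(A)
```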
Example 2.18**.**
Let be the tensor product space. The value is the tensor rank of . It is known that the set of all tensors of rank is not closed. If is the standard orthogonal basis of , then the tensor
[TABLE]
has rank , but is a limit of tensors of rank . In this case, the geometric sparseness and sparseness are not the same. For example, if
[TABLE]
then is the tensor rank, but .
Theorem 2.19**.**
Suppose that is an -decomposition. Then we have
[TABLE]
where . Moreover, if is tight then we have
[TABLE]
Since , the theorem also implies that
[TABLE]
2.5. The Singular Value Region
We can generalize the notion of the singular value region to an arbitrary finite dimensional vector space with dual norms and . Let be the Pareto sub-frontier of . For the definition of the singular value region, we view as a function of which gives . The function is Lipschitz and decreasing and is differentiable almost everywhere.
Definition 2.20**.**
The singular value region is the region in the first quadrant to the left of the graph of .
If the graph of is a step function, then we can interpret the region as singular values with multiplicities similarly as in the case of matrices. If a vector has a slope decomposition, then we can easily find the singular values and multiplicities from Theorem 2.10. Recall that and .
Corollary 2.21**.**
If
[TABLE]
is an slope decomposition, then the singular values are
[TABLE]
with multiplicities
[TABLE]
respectively.
As we will see in Section 7, this notion of the singular value region is indeed a generalization of the singular value region defined for matrices in the introduction.
3. The Pareto Curve
3.1. Optimization problems
We will formulate signal denoising as an optimization problem. Suppose that $V$ is an $n$-dimensional $\mathbb{R}$-vector space equipped with two norms, $\|\cdot\|_X$ and $\|\cdot\|_Y$. Let $c \in V$ be a fixed vector. Given a noisy signal $c$ we would like to find a decomposition $c = a + b$ where both $\|a\|_X$ and $\|b\|_Y$ are small. We can do this by minimizing $\|b\|_Y$ under the constraint $\|a\|_X \leq x$ for some fixed $x$, or by minimizing $\|a\|_X$ under the constraint $\|b\|_Y \leq y$ for some fixed $y$. For $x$ with $0 \leq x \leq \|c\|_X$, we formally define the following optimization problem:
Problem : Minimize $\|c - a\|_Y$ for $a \in V$ under the constraint $\|a\|_X \leq x$.
This has a solution because the function is continuous, and the domain is compact. Let be the smallest value of . As we will see later in Lemma 3.3, the graph of consists of all Pareto efficient pairs , and we call the Pareto curve or Pareto frontier. Since will be fixed most of the time, we may just write and instead of and . We will prove some properties of the Pareto curve.
Lemma 3.1**.**
If and is a solution to , then .
Proof.
Suppose that is a solution to and . Note that , so and . Choose such that and define . Then we have and
[TABLE]
Contradiction. ∎
3.2. Properties of the Pareto curve
The following lemma gives some basic properties of the Pareto curve. In various contexts, these properties are already known. We formulate the properties for the problem of two arbitrary competing norms on a finite dimensional vector space.
Lemma 3.2**.**
The function is a convex, strictly decreasing, continuous function on the interval .
Proof.
Suppose that . There exist with and for .
Let for some . Then we have , so by definition we have
[TABLE]
This proves that is convex.
For some we can write . We have
[TABLE]
This shows that is strictly decreasing.
Since is convex, it is continuous on . Because is decreasing, it is continuous at . We will show that is also continuous at . Suppose that is a sequence in with . Choose such that and . Since , we have that . It follows that because is a continuous function. ∎
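The properties in Lemma 3.2 can be observed numerically in the $\ell_1$/$\ell_\infty$ example. The sketch below is an illustration under that assumption, with hypothetical function names: it evaluates the smallest achievable $\|b\|_\infty$ for a given bound on $\|a\|_1$ by bisection on the soft-threshold level, and then checks on a grid that the resulting curve is strictly decreasing and convex.

```python
import numpy as np

def min_l1_given_linf(c, y):
    """Smallest ||a||_1 subject to ||c - a||_inf <= y (soft thresholding)."""
    return float(np.maximum(np.abs(c) - y, 0.0).sum())

def min_linf_given_l1(c, x, iterations=100):
    """Smallest ||c - a||_inf subject to ||a||_1 <= x, found by bisection on y."""
    lo, hi = 0.0, float(np.abs(c).max())
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if min_l1_given_linf(c, mid) <= x:   # level mid is feasible, try a smaller one
            hi = mid
        else:
            lo = mid
    return hi

c = np.array([3.0, -1.0, 0.5])               # hypothetical signal
xs = np.linspace(0.0, np.abs(c).sum(), 46)   # grid on [0, ||c||_1]
ys = np.array([min_linf_given_l1(c, x) for x in xs])

strictly_decreasing = bool(np.all(np.diff(ys) < 0))
convex_on_grid = bool(np.all(np.diff(ys, n=2) >= -1e-9))
```

For this vector the curve is piecewise linear with slopes $-1$, $-\frac{1}{2}$ and $-\frac{1}{3}$, so both checks succeed.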
We show now that the graph of consists of all Pareto efficient pairs.
Lemma 3.3**.**
Suppose that .
(1) is an -decomposition if and only if is a solution to .
(2) The pair is Pareto efficient if and only if and .
Proof.
(1) Suppose that . If is an -decomposition, then it is clear from the definitions that is a solution to .
On the other hand, suppose that is a solution to and . Let and . Assume that . Then we have . If , then is also a solution to and by Lemma 3.1. This shows that is an -decomposition.
(2) Suppose that is Pareto efficient. Assume that . From the decomposition follows that . Contradiction. This shows that . There exists a decomposition with and . Because , we have . Suppose that is a solution of . Then by Lemma 3.1. Because is Pareto efficient, we have . We conclude that .
Conversely, suppose that and . Let be a solution of and . Suppose that is another decomposition. If , then we have . If then we have . We conclude that is Pareto efficient. ∎
Corollary 3.4**.**
(1) The function is a homeomorphism and its inverse is .
(2) A vector is a solution to if and only if is a solution to .
Proof.
(1) If and then we have
[TABLE]
So and are inverses of each other. Since both functions are continuous, the functions are homeomorphisms.
If and then we have
[TABLE]
∎
3.3. Rigid norms
The problem does not always have a unique solution.
Definition 3.5**.**
We say that is rigid if has a unique solution for all .
Let us give an example of a vector that is not rigid.
Example 3.6**.**
Suppose and for all . Then is rigid because for the vector is the unique solution to . The vector is not rigid: If then has infinitely many solutions, namely , .
If is rigid, then we can study how the unique solution of varies as we change the value of . The lemma below shows that the solution varies continuously. In various contexts, this property is well-known and used in homotopy continuation methods (see for example [42, 43, 21, 6]) for some optimization problems.
Lemma 3.7**.**
Suppose that is rigid and let be the unique solution to for . Then is continuous.
Proof.
Suppose that is a sequence for which exists. We assume that does not exist, or that it is not equal to . By replacing by a subsequence, we may assume that exists, but that it is not equal to . We have . Also, we get because is continuous. Because has a unique solution, we conclude that . Contradiction. We conclude that . This proves that is continuous. ∎
Definition 3.8**.**
The norm is called strictly convex if implies that and are linearly dependent.
The -norm on is strictly convex, for example.
Lemma 3.9**.**
If is strictly convex, then every vector is rigid.
Proof.
Suppose that is strictly convex and that . We will prove that is rigid. Suppose and , where and . Let . Then we have . By definition, we have . It follows that
[TABLE]
Since is strictly convex, and are linearly dependent. Because , it follows that . If then we have . From follows that and and the uniqueness is established. If , then and we have again uniqueness. ∎
3.4. The Pareto curve of sums
Next we will compare the Pareto curve of with the Pareto curves and . For this purpose, we introduce the concatenation of two functions. Suppose that and are two functions with . The concatenation is defined by
[TABLE]
Note that and . If and are decreasing, then so is . If and are continuous, then so is . Note that concatenation is associative: . However, it is not commutative.
Example 3.10**.**
Suppose that are defined by and :
The graphs of and are:
Lemma 3.11**.**
Suppose that . If then we have
[TABLE]
Proof.
Suppose that . Choose such that and . Then we have
[TABLE]
Reversing the roles of and , and and gives
[TABLE]
if . Substituting where yields
[TABLE]
Applying the decreasing function gives
[TABLE]
∎
4. Duality
4.1. - and -decompositions
Suppose that the vector space $V$ is equipped with a positive definite bilinear form $\langle\cdot,\cdot\rangle$ and a norm $\|\cdot\|_X$. The bilinear form gives an $\ell_2$-norm $\|\cdot\|_2$, and we let $\|\cdot\|_Y$ be the dual norm of $\|\cdot\|_X$. In this section we will study $X2$-decompositions, which turn out to be the same as $2Y$-decompositions. We start with the following characterization of an $X2$-decomposition:
Proposition 4.1**.**
The expression is an -decomposition if and only if .
Proof.
Suppose that is an -decomposition. Choose a vector such that and . Let . Define and . We have . Therefore, . It follows that
[TABLE]
Taking the limit yields the inequality
[TABLE]
so . The opposite inequality holds because the norms are dual to each other. We conclude that .
Conversely, suppose that . Let , let be the solution to and define . Then is an -decomposition.
[TABLE]
So we conclude that , and is an -decomposition. ∎
The equivalence between the -decomposition and -decomposition (Proposition 2.3) now easily follows.
Proof of Proposition 2.3.
In Proposition 2.3, (1) and (3) are equivalent because of Proposition 4.1. Dually, (2) and (3) are equivalent. ∎
4.2. The Pareto sub-frontier
We define by . The graph of is the Pareto subfrontier. Indeed, if is an -decomposition with and , then we have and . We now prove some properties of the Pareto sub-frontier.
Lemma 4.2**.**
We have .
Proof.
Let and . Then is an -decomposition, therefore also a -decomposition. So is a -decomposition. Let . Then we have . ∎
Lemma 4.3**.**
The function is a strictly decreasing homeomorphism and its inverse is .
Proof.
We have
[TABLE]
The function is injective, because the function is injective. It follows that . By symmetry, we also have , so is the inverse of .
The function is continuous, because and are continuous. This proves that is a homeomorphism. By the Intermediate Value Theorem, it has to be strictly increasing or strictly decreasing. Since , the function must be strictly decreasing. ∎
Proposition 4.4**.**
The function is Lipschitz continuous.
Proof.
Since and are norms on a finite dimensional vector space, there exists positive constant such that . Suppose that and are -decompositions, with . It follows that . We have
[TABLE]
We conclude that
[TABLE]
∎
4.3. Differentiating the Pareto curve
The function is differentiable. A special case (but with a similar proof) was treated in [6, §2].
Proposition 4.5**.**
The function is differentiable on , and
[TABLE]
Proof.
If then we have
[TABLE]
Reversing the roles of gives us
[TABLE]
If then we obtain
[TABLE]
and if then we have
[TABLE]
Since is continuous, it follows that is differentiable on with derivative . Since is positive on , it is differentiable on . We have
[TABLE]
∎
Proof of Proposition 2.8.
From Proposition 4.5 follows that
[TABLE]
So the area to the right of is
[TABLE]
Similarly, the area below the graph and above the line is equal to . The area below the graph of is equal to . The area of the rectangle is . ∎
The solution for can be obtained from a regularized quadratic minimization problem.
Proposition 4.6**.**
The vector is a solution to if and only if
[TABLE]
is minimal.
Proof.
We can choose such that is minimal. Let . Then is an -decomposition, so . The function
[TABLE]
has a minimum at . So we have
[TABLE]
and . This shows that is a solution to . Since has a unique solution, this unique solution must minimize . ∎
A similar argument shows that is a solution to if and only if is minimal.
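For the $\ell_1$ norm this regularized quadratic problem has a closed-form solution. The sketch below assumes the objective has the usual form $\frac{1}{2}\|c - a\|_2^2 + \lambda \|a\|_1$; this form, the symbol $\lambda$ and the function names are assumptions made for illustration. Its minimizer is the soft thresholding of $c$ at level $\lambda$, and the code compares the objective value at that point with the values at randomly perturbed points.

```python
import numpy as np

def objective(a, c, lam):
    """Regularized quadratic objective 0.5 * ||c - a||_2^2 + lam * ||a||_1."""
    return float(0.5 * np.sum((c - a) ** 2) + lam * np.abs(a).sum())

def soft_threshold(c, lam):
    """Entrywise soft thresholding, the minimizer of the objective above."""
    c = np.asarray(c, dtype=float)
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

c = np.array([3.0, -1.0, 0.5])       # hypothetical noisy signal
lam = 1.0                            # hypothetical regularization parameter
a_star = soft_threshold(c, lam)
best = objective(a_star, c, lam)

# Random perturbations of a_star never achieve a smaller objective value.
rng = np.random.default_rng(0)
perturbed = [objective(a_star + 0.1 * rng.normal(size=c.size), c, lam) for _ in range(200)]
```

In this example the residual $c - a^\ast$ has $\ell_\infty$ norm equal to $\lambda$, so the regularization parameter plays the role of the noise level.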
5. Tight vectors
Suppose that is an -dimensional vector space with a positive definite bilinear form, and that and are norms which are dual to each other. From the definitions it is clear that . Recall that is tight if we have equality for all . If is tight and rigid, then for . If every vector in is tight, then and are called tight norms. In this section we study properties of tight vectors and tight norms.
5.1. An example of a norm that is not tight
Consider the norm on defined by for . Its dual norm is given by . The unit balls are polar duals of each other:
Consider the vector . Below are the functions and . We see that and are not the same, so is not tight. The example shows that is not always convex.
The trajectories of (green) and (blue) are sketched in the graph below.
For every positive value of , lies on the unit ball . In fact, is the vector in that is closest to with respect to the euclidean distance. In the graph below, and are plotted for for various values of . Note that is constant on the intervals and . On these intervals moves on a line through the origin. On the other intervals and , moves on a line through .
The singular value region for the vector is as follows:
If we want to interpret the singular value region in terms of singular values, we must allow negative multiplicities. The singular values of are 21 with multiplicity , with multiplicity , with multiplicity and with multiplicity .
5.2. The Pareto sub-frontier of sums
We have defined the concatenation of two graphs, and now we define the concatenation of two paths in in a similar manner. If and are curves with , then we define the concatenation by
[TABLE]
Theorem 5.1**.**
Suppose that is tight and that is an -decomposition.
- (1)
* and are tight as well;* 2. (2)
* and ;* 3. (3)
; 4. (4)
.
Proof.
Suppose that . Choose a vector such that and . We have
[TABLE]
Suppose that one of the inequalities above is strict for some . Integrating from $0$ to yields
[TABLE]
Contradiction. We conclude that for all . In particular, is tight. By symmetry, is tight as well. This proves (1). For we get . So we have and by symmetry we also have . This proves (2).
If , then we have
[TABLE]
By symmetry, if , then we have
[TABLE]
It follows that
[TABLE]
Substituting yields
[TABLE]
This proves (3).
If , then we have
[TABLE]
So we get
[TABLE]
If then we have
[TABLE]
and
[TABLE]
So for all we have
[TABLE]
and
[TABLE]
So is the unique solution to and therefore equal to . This proves (4). ∎
Proof of Proposition 2.12.
We already know part (1) in the case . Assume that . Let be the -decomposition with . This is also a -decomposition and . Now and are tight by Theorem 5.1(1). Let be an -decomposition with . Now is also an -decomposition and . We have . So is an -decomposition (and -decomposition) and . This proves that and part (1) has been proved.
Suppose that is an -decomposition with . Then . Let be an -decomposition with . Then . Similar reasoning as before shows that is an -decomposition. Also is tight and is an -decomposition, so . Therefore, and we are done. ∎
Lemma 5.2.
If is an -decomposition and and are tight, then we have .
Proof.
Suppose that and let . We get
[TABLE]
It follows that
[TABLE]
So is an -decomposition and . We get
[TABLE]
Suppose that and define . Then we have and by symmetry we get
[TABLE]
and
[TABLE]
We conclude that . ∎
We will show later in Proposition 6.11 that under the assumptions of Lemma 5.2 the vector is tight.
6. The slope decomposition
6.1. Unitangent vectors
In this section we study the slope decomposition. We show that a vector is tight if and only if it has a slope decomposition. We also will show that the Pareto frontier is always piecewise linear for a tight vector. In that case, the different slopes in the Pareto frontier correspond to different summands in the slope decomposition of .
For a vector we have the inequality .
Definition 6.1.
We call a vector unitangent if .
Unitangent vectors are the simplest kind of tight vectors. As we will see, their Pareto frontier is linear, i.e., has only one slope. Recall that is the maximal value of the functional on the unit ball . Now is unitangent if and only if the maximum of is attained at .
Proposition 6.2.
If is unitangent, then it is tight and we have and for .
Proof.
Suppose that . Then we have
[TABLE]
It follows that
[TABLE]
Since was arbitrary, we conclude that .
If we take , then we have and . We conclude that and . ∎
6.2. Faces of the unit ball
Suppose that is a compact convex subset of a finite dimensional -vector space . Recall that a convex closed subset of is a face if , and implies that . The following lemma is easily shown by induction.
Lemma 6.3.
If is a face of , , such that and , then .
Proof.
We prove the statement by induction on . This is clear from the definition of a face for . Suppose . Let and . We have , so . Since and we get by induction. ∎
Lemma 6.4.
If is a compact convex set, then the smallest face containing is
[TABLE]
Proof.
Suppose that and for some . There exist such that , so we have
[TABLE]
and therefore . This proves that is a face of .
Suppose that is any face of containing . If then there exists such that . Since is a convex combination of , we have . So contains . ∎
Let be the unit ball for the norm .
Lemma 6.5.
A convex cone in is a facial -cone if and only if it has the following norm-sum property: If , and then we have .
Proof.
Suppose that is a cone in . If then is a facial -cone and has the norm-sum property. Assume now that . Let be the unit sphere. Take so that . We have to show that is a face of if and only if has the norm-sum property.
Suppose that is a face. If and then we have
[TABLE]
where and . If or then or and . Otherwise, and because is a face. We conclude that . So has the norm-sum property.
Conversely, suppose that has the norm-sum property, and such that then we have
[TABLE]
and the inequalities are equalities. It follows that The norm-sum property gives , so . We conclude that . ∎
Lemma 6.6.
Suppose that is nonzero, and is the set of all for which there exists such that . Then is the smallest facial -cone containing .
Proof.
If , then we have and every facial cone containing must also contain and by Lemma 6.5. Now itself is a facial cone: if then there exists such that for . We can replace and by the minimum of the two and assume that . We have
[TABLE]
We must have equalities everywhere, so and lie in by Lemma 6.5. ∎
Definition 6.7.
For and we define as the smallest face of containing .
Lemma 6.8.
Suppose that is a tight vector. If then we have .
Proof.
Suppose that is tight and . For we have
[TABLE]
which lies in the unit ball . This proves that lies in . We conclude that . ∎
Proposition 6.9.
If is tight, then and are piecewise linear.
Proof.
We can divide up the interval into finitely many intervals such that on each interval is constant. Suppose that is an open interval on which equal to . The affine hull of is of the form where is a subspace and . Now is the vector in closest to (in the euclidean norm). If we define , then is the vector closest to . So is the orthogonal projection of onto . Since , is the orthogonal projection of onto , so is constant. This proves that is linear.
Because , we have
[TABLE]
So is linear for . ∎
Proof of Theorem 2.19.
Suppose that is an -decomposition and let and . The smallest face of containing is and the smallest face containing is . For every and every we have . We have . Since lies in the relative interior of , we have for all . Since lies in the relative interior of , we have for all and all . It follows that
[TABLE]
If is tight, then we have and
[TABLE]
∎
6.3. Proof of Theorem 2.10
Suppose that is tight. Then is piecewise linear by Proposition 6.9. Suppose that such that is linear on each interval , and that is not differentiable at . Let , and define for . We have so is an -decomposition for all . By induction we get
[TABLE]
The area under the graph of is . The area under the graph of is
[TABLE]
So we have
[TABLE]
This proves that for all . This shows that is a slope decomposition.
Conversely, suppose that is a slope decomposition. We will show that is tight. Now is also a slope decomposition, so by induction we may assume that is tight, and
[TABLE]
Since is an -decomposition, it follows from Lemma 5.2 that
[TABLE]
Suppose that
[TABLE]
We have
[TABLE]
So we have
[TABLE]
It follows that
[TABLE]
We get , so is tight. We have proven part (1).
(2) Suppose that is tight and is a slope decomposition. Then the graph is a straight line segment from to where and . Since , we have that is the graph through the points for .
6.4. Properties of the slope decomposition
Lemma 6.10.
If is an -slope decomposition, then are linearly independent.
Proof.
Suppose that we have
[TABLE]
Because we get
[TABLE]
Contradiction. This proves that are linearly independent. ∎
Proposition 6.11.
Suppose that is an -decomposition and and are tight. Then is tight.
Proof.
Since and are tight, they have slope decompositions, say and . We have
[TABLE]
It follows that for all . If then
[TABLE]
is a slope decomposition.
Suppose that . We have
[TABLE]
so . Similarly, we have
[TABLE]
so . We conclude that and
[TABLE]
is a slope decomposition.
Since has a slope decomposition, it is tight. ∎
6.5. The unit ball of a tight norm
Proposition 6.12.
A norm is tight if and only if every face of the unit ball contains a unitangent vector that is perpendicular to .
Proof.
Suppose that is tight and that is a face of (other than itself). Choose in the relative interior. Then we have a slope decomposition
[TABLE]
If then we have and
[TABLE]
Now by Lemma 6.3. We have . For any other vector we have . So the functional on F is maximal at , and therefore maximal and constant on the face . It follows that is perpendicular to .
Now we show the converse. Suppose that every face of the unit ball contains a unitangent vector that is perpendicular to . Suppose that is a vector with . Let be the smallest face of that contains . By induction on we show that is tight. The case is clear. Suppose that . There exists a vector that is unitangent and orthogonal to . Choose maximal such that . Let and . Then we have . Since lies in a face of smaller dimension, we know by induction that and are tight. Because is unitangent, it is also tight. We have
[TABLE]
[TABLE]
[TABLE]
So . It follows that is an -decomposition. By Proposition 6.11, is tight. ∎
Example 6.13.
Consider again Examples 2.11 and 2.2. Suppose that and define by
[TABLE]
Let be the multiplicity of , i.e., is the number of values of for which . We define vectors as follows:
[TABLE]
and
[TABLE]
We use the convention . We have
[TABLE]
We have and
[TABLE]
If then we have
[TABLE]
We have
[TABLE]
This shows that (4) is a slope decomposition. So the norms and are tight.
7. The Singular Value Decomposition of a matrix
7.1. Matrix norms
In this section, we will study the singular value decomposition of a matrix using our terminology and the results we have obtained. Let be the space of -matrices. We have a bilinear form on defined by
[TABLE]
where denotes the conjugate transpose of and denotes the real part. We will study 3 norms on the vector space namely the euclidean norm, the nuclear norm and the spectral norm, and express each of these in terms of the singular values of a matrix.
The matrix is a nonnegative definite Hermitian matrix and its eigenvalues are nonnegative and real. The euclidean -norm of a matrix is given by
[TABLE]
Since is positive semi-definite Hermitian, we can choose a unitary matrix such that is a diagonal matrix with diagonal entries where are the singular values of . We have
[TABLE]
Let be the diagonal matrix
[TABLE]
Then is the unique positive semi-definite Hermitian matrix whose square is , and we will denote this matrix by .
We define the spectral norm or operator norm of by
[TABLE]
We have
[TABLE]
The nuclear norm is defined by
[TABLE]
The proof of the following well-known result will be useful for the discussion that follows.
Lemma 7.1.
The norms and are dual to each other.
Proof.
Let . We use the notation as before, where , and is the diagonal matrix whose diagonal entries are the singular values . Let be the columns of . These vectors form an orthonormal basis and for any we have
[TABLE]
Because , the columns of are orthogonal. The matrix is unitary. So . If are the columns of , then the singular value decomposition of is
[TABLE]
Let be maximal such that and define the block matrix
[TABLE]
For we have
[TABLE]
From (5) and (6) it follows that is the dual norm of . ∎
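The three norms and the duality statement of Lemma 7.1 can be checked numerically. The following is a minimal sketch, assuming NumPy is available and taking the bilinear form to be the real part of the trace of the conjugate transpose of the first matrix times the second; the helper names `matrix_norms` and `inner` are illustrative and not taken from the paper.

```python
import numpy as np

def matrix_norms(a):
    """Return (euclidean, spectral, nuclear) norms computed from the singular values."""
    s = np.linalg.svd(a, compute_uv=False)
    return np.sqrt(np.sum(s ** 2)), s.max(), s.sum()

def inner(a, b):
    """Real bilinear form Re trace(A^* B) (assumed convention)."""
    return float(np.real(np.trace(a.conj().T @ b)))

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
b = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

_, _, nuclear_a = matrix_norms(a)
_, spectral_b, _ = matrix_norms(b)

# Duality inequality: <A, B> <= ||A||_nuclear * ||B||_spectral.
duality_gap = nuclear_a * spectral_b - inner(a, b)

# Equality is reached at B0 = U V^*, a matrix of spectral norm 1.
u, s, vh = np.linalg.svd(a, full_matrices=False)
b0 = u @ vh
attained = inner(a, b0)
```

Here `duality_gap` should be nonnegative and `attained` should agree with the nuclear norm of `a` up to rounding, which is the numerical counterpart of the two halves of the duality proof.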
7.2. Slope decomposition for matrices
Suppose that is a complex matrix and let be the nonzero singular values of with multiplicities respectively. We can write where are unitary and is of the form
[TABLE]
where is the identity matrix and the zeros represent possible empty zero blocks. The norms can be expressed as follows:
[TABLE]
Define
[TABLE]
for where and . We have
[TABLE]
Proposition 7.2.
The expression (7) is a slope decomposition. In particular, the spectral and the nuclear norms are tight.
Proof.
We have
[TABLE]
In particular, is strictly decreasing as increases.
If then we have
[TABLE]
This proves that (7) is the slope decomposition. ∎
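As an illustration, here is one possible reading of expression (7) in code. It is a sketch under two assumptions that are not confirmed by the text as extracted: that each summand is the gap between consecutive distinct singular values times the partial isometry on the leading singular directions, and that a vector is unitangent when the product of its two norms equals its inner product with itself. The function name `slope_pieces` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def slope_pieces(a, tol=1e-10):
    """Split a into summands, one per distinct nonzero singular value (assumed form of (7))."""
    u, s, vh = np.linalg.svd(a, full_matrices=False)
    pieces = []
    for k in range(len(s)):
        next_value = s[k + 1] if k + 1 < len(s) else 0.0
        gap = s[k] - next_value
        if gap > tol:
            # Partial isometry on the top k+1 singular directions, scaled by the gap.
            pieces.append(gap * (u[:, :k + 1] @ vh[:k + 1, :]))
    return pieces

rng = np.random.default_rng(1)
a = rng.standard_normal((4, 3))
pieces = slope_pieces(a)
reconstruction = sum(pieces)

# For each summand compare nuclear * spectral with the squared euclidean norm.
defects = [
    np.linalg.norm(p, "nuc") * np.linalg.norm(p, 2) - np.linalg.norm(p) ** 2
    for p in pieces
]
```

Each summand has all its nonzero singular values equal, so the ratio of its spectral norm to its nuclear norm is one over its rank, which decreases strictly from one summand to the next.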
7.3. Principal Component Analysis
In Principal Component Analysis (PCA), one finds a low rank matrix that approximates the matrix by truncating the singular value decomposition. For a given threshold , let be maximal such that . Then
[TABLE]
is a low rank approximation of . This method is called hard thresholding. Replacing by the approximation is an effective way to reduce the dimension of a large scale problem. Let us compare this to the -decomposition (or equivalently - or -decomposition) of . Let us define
[TABLE]
and
[TABLE]
Lemma 7.3.
The expression is an -decomposition.
Proof.
We have and . Now the lemma follows from
[TABLE]
∎
In particular, . The operator is soft thresholding with threshold level (see [20]). Unlike hard thresholding, soft thresholding is continuous. The Pareto frontier (which is also the sub-frontier) is given by
[TABLE]
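The difference between hard and soft thresholding is easy to see on a small example. The sketch below assumes NumPy; the function names are illustrative, and the tie-breaking rule in `hard_threshold` (strictly larger than the threshold) is an assumption, since the exact inequality is not recoverable from the text.

```python
import numpy as np

def hard_threshold(a, t):
    """Truncated SVD: keep the singular values larger than t unchanged (tie rule assumed)."""
    u, s, vh = np.linalg.svd(a, full_matrices=False)
    return (u * np.where(s > t, s, 0.0)) @ vh

def soft_threshold(a, t):
    """Lower every singular value by t, clipping at zero."""
    u, s, vh = np.linalg.svd(a, full_matrices=False)
    return (u * np.maximum(s - t, 0.0)) @ vh

a = np.diag([5.0, 3.0, 1.0])
t = 2.0
low_rank = soft_threshold(a, t)   # singular values 3, 1, 0
residual = a - low_rank           # singular values 2, 2, 1
```

The residual of soft thresholding has all singular values at most `t`, so it lies in the spectral ball of radius `t`, while the soft-thresholded part carries the reduced nuclear norm.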
7.4. The singular value region for matrices
The Pareto frontier of encodes the singular values. We will describe in detail how to obtain the singular values from the Pareto frontier. Note that we consider as a function of , rather than considering its inverse function . If we differentiate with respect to we get
[TABLE]
if and (with the convention that ).
If we plot against then we get the singular value region. From the descriptions above, it is now clear that this region can be described as an bar, followed by an bar etc. So the singular value region from Definition 2.20 is the same as the singular value region for matrices as described in the introduction.
7.5. Rank minimization and low rank matrix completion
Let be the set of matrices. We will study the low rank matrix completion from the viewpoint of competing dual norms.
Problem 7.4 (Low Rank Matrix Completion (LRMC)).
Given a matrix where the entries are missing. Fill in the missing entries such that the resulting matrix has minimal rank.
The low rank matrix completion problem has applications in collaborative filtering and recommender systems such as the Netflix problem. The low rank matrix completion problem is a special case of the rank minimization problem.
Problem 7.5 (Rank Minimization (RM)).
Suppose that is a subspace of , and let . Find a matrix of minimal rank.
The Low Rank Matrix Completion problem can be formulated as a rank minimization problem as follows. Complete to a matrix in some way (for example, set all the missing entries equal to 0). Then let be the subspace spanned by all matrices , . Here is the matrix with all 0's except for a 1 in position . Find with minimal rank using RM. Then is also the solution to the LRMC problem.
Let us consider the Rank Minimization problem. Using the philosophy of convex relaxation, we consider the following problem instead (see [12, 15, 45]):
Problem 7.6.
Find a matrix with minimal.
Let be the orthogonal complement of and let be the orthogonal projection onto . The problem does not change when we replace by , so we may assume that without loss of generality. We define a norm on by
[TABLE]
So Problem 7.6 is essentially the problem of determining the value of . In the presence of noise, we would like to find a matrix such that and are small. This leads to the following optimization problem.
Problem 7.7.
For a fixed parameter , minimize
[TABLE]
We can write such that . We can reformulate the problem as:
Problem 7.8.
For a fixed parameter , minimize
[TABLE]
The dual norm to is defined by
[TABLE]
Problem 7.7 is equivalent to
Problem 7.9.
Minimize under the constraint .
8. Restricting Norms
Suppose that is a finite dimensional -vector space with a positive definite bilinear form , and is a norm on . For a subspace of , it is natural to ask whether the -decompositions of vectors in are always within the space . In this section we will give a sufficient criterion for to have this property.
Definition 8.1.
A subspace is called a nice slice if we have for all , where is the orthogonal projection.
Lemma 8.2.
If is a nice slice, then we also have for all , where is the norm dual to .
Proof.
Choose a vector with and . We have
[TABLE]
∎
Let be the orthogonal group consisting of all with the property
[TABLE]
for all .
Lemma 8.3.
Suppose that is a subgroup with the properties for all and all . Then the space of -invariant vectors is a nice slice.
Proof.
We can replace with its closure, so without loss of generality we may assume that is a compact Lie group. Let be the projection onto . We have
[TABLE]
where is the normalized Haar measure. This shows that lies in the convex hull of all , . Since for all , we also have for all . ∎
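For a finite group the Haar integral in this proof is just an average over the group elements. The sketch below illustrates this for the cyclic group of coordinate shifts acting on a real coordinate space with the 1-norm, which is shift invariant. The choice of group and norm, and the reading of the nice slice condition as "projecting does not increase the norm", are assumptions made for illustration.

```python
import numpy as np

def average_over_cyclic_shifts(v):
    """Average of g.v over all cyclic coordinate shifts g (finite analogue of the Haar integral)."""
    v = np.asarray(v, dtype=float)
    return np.mean([np.roll(v, k) for k in range(len(v))], axis=0)

v = np.array([3.0, -1.0, 2.0, 0.0])
p = average_over_cyclic_shifts(v)   # constant vector with entries mean(v)

# The average is a convex combination of vectors of equal 1-norm,
# so its 1-norm cannot exceed that of v.
norm_before = np.abs(v).sum()
norm_after = np.abs(p).sum()
```

The invariant vectors here are the constant vectors, and the average coincides with the orthogonal projection onto them.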
Suppose that is a nice slice. Let and be the restrictions of the norms and to .
Lemma 8.4.
The norms and are also dual to each other.
Proof.
If and then we have
[TABLE]
For a given , there exists with and
[TABLE]
We get
[TABLE]
Define . We have . It follows that
[TABLE]
because . So . ∎
Lemma 8.5.
Suppose that .
(1) If is an -decomposition, then and is an decomposition.
(2) If is an decomposition then is an decomposition, and .
Proof.
(1) If is an decomposition, then we have
[TABLE]
and . It follows that is also an -decomposition and by uniqueness we have and . Suppose that with and . We get
[TABLE]
because is an -decomposition. This shows that is an -decomposition.
(2) Suppose that is an -decomposition. We get
[TABLE]
and . It follows that is also an decomposition. A similar argument as in (1) shows that this is also an -decomposition. Since , we must have , because is an -decomposition. It follows that
[TABLE]
Similarly, we get . ∎
In particular, we have and .
Example 8.6.
Suppose that and . The orthogonal projection is given by . Suppose that where . Choose a unit vector such that . Then we have
[TABLE]
This shows that is a nice slice.
9. 1D total variation denoising
9.1. The total variation norm and its dual
In this section we discuss the application to 1-dimensional total variation denoising. This example is particularly interesting because the corresponding norms are tight.
Define the difference map by
[TABLE]
The map is surjective, and the kernel is spanned by the vector . The dual map is given by
[TABLE]
If we compose the two maps we get
[TABLE]
The linear map is invertible. Let be the subspace defined by
[TABLE]
The image of is If , then is the vector of minimal length with the property .
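The statements about the difference map can be verified on a small instance. This is a sketch assuming NumPy, with the difference map taken to send a vector of length n to its n-1 forward differences; the sign and size conventions, as well as the names `difference_matrix`, `gram` and `x_min`, are assumptions for illustration.

```python
import numpy as np

def difference_matrix(n):
    """(n-1) x n matrix with (D x)_i = x_{i+1} - x_i (sign and size conventions assumed)."""
    d = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    d[idx, idx] = -1.0
    d[idx, idx + 1] = 1.0
    return d

n = 6
d = difference_matrix(n)

# Composition of D with its dual: tridiagonal with 2 on the diagonal and -1 next to it.
gram = d @ d.T

# Minimal-length preimage of y under D, obtained from the inverse of the composition.
y = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
x_min = d.T @ np.linalg.solve(gram, y)
```

The vector `x_min` has entries summing to zero, so it is orthogonal to the kernel of the difference map, which is why it is the shortest vector mapping to `y`.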
We define a norm on by
[TABLE]
Another norm on is given by
[TABLE]
Lemma 9.1.
Suppose that is a vector with . Let and . Then we have .
Proof.
The vectors that map to under are of the form . We have . This quantity is minimal if . In that case we have . ∎
Lemma 9.2.
The norms and are dual to each other.
Proof.
Suppose that . Choose such that and . Then we have
[TABLE]
Suppose that is nonzero and let . Define
[TABLE]
Because , the set has positive and negative elements. This implies that and . If then we have .
[TABLE]
This shows that the norms and are dual. ∎
For a vector , is its total variation. Given a signal and an , a solution to the problem minimizes the total variation under the constraint . The function is typically a piecewise constant function. Below is an example, where the blue function is and the red function is .
As we increase the value of , the sparsity decreases. Below we draw the signal in blue, and (a vertical translation of) the denoised signal for various values of in red.
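A simple way to experiment with 1D total variation denoising is projected gradient descent on the dual problem. The sketch below is not the method of the paper: it solves the penalized form (half the squared euclidean distance plus a multiple `lam` of the total variation), which is a standard reformulation, and the function name, step size 0.25 and iteration count are choices made here for illustration.

```python
import numpy as np

def tv_denoise_1d(y, lam, iters=5000):
    """Minimize 0.5*||y - x||^2 + lam * sum_i |x[i+1] - x[i]| by projected gradient on the dual."""
    y = np.asarray(y, dtype=float)
    z = np.zeros(len(y) - 1)  # dual variable, one entry per difference
    for _ in range(iters):
        # Primal point x = y - D^T z, where D takes forward differences.
        x = y + np.diff(z, prepend=0.0, append=0.0)
        # Gradient step (step 1/4 is safe because ||D D^T|| < 4), then clip to the box.
        z = np.clip(z + 0.25 * np.diff(x), -lam, lam)
    return y + np.diff(z, prepend=0.0, append=0.0)

y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
x = tv_denoise_1d(y, 0.5)
```

For a very large penalty the output becomes the constant signal equal to the mean of the input, and for a zero penalty the input is returned unchanged; intermediate values give piecewise constant approximations.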
9.2. Description of the unit ball
We now will describe the unit balls and .
Definition 9.3.
A vector is called a signature sequence if and . For a signature sequence , we define as the set of all vectors such that when , and if . We define .
Lemma 9.4.
The set is a face of the unit ball .
Proof.
The set is a face of the unit ball for the norm. Suppose that with and . Then we can write with for . We have
[TABLE]
for some . We get that and differ by a multiple of . Since the maximum and minimum entry of are and respectively, and we deduce that . Now lie in the unit ball and lies in the face . It follows that and . ∎
The dimension of is equal to the number of zeros in the signature sequence . The restriction of to is injective, so and have the same dimension.
Proposition 9.5.
The faces of of dimension are exactly all where is a signature sequence.
Proof.
Suppose that is a proper face of the polytope . Then there exists a vector with and . Let and for . Since , and , the vector must have positive and negative coordinates. In the sequence the elements and both must appear. For a vector with , let be the unique vector with and . Then we have
[TABLE]
with . Now if and only if for every , or . In other words, if and only if . So if and only if . ∎
Theorem 9.6.
The norms and are tight.
Proof.
Suppose that is a signature vector. It suffices to construct a unitangent vector by Proposition 6.12. We define a vector as follows. If then . The other coordinates of are obtained by linear interpolation. If , and then we define
[TABLE]
whenever . If and then we define for . If and then we define for . For example, if then .
From the construction it follows that for every we have . We define . We have . From the construction it is clear that and . We have
[TABLE]
so is unitangent. ∎
We briefly discuss the combinatorics of the polytope . Let be the number of faces of of dimension . For , is the number of signature sequences with zeroes. We also have . The generating function for is . The generating function for the set is . The generating function for and for is . The generating function for is . So the generating function for the set of signature sequences, using inclusion-exclusion, is . There is one face of dimension that does not correspond to a signature sequence, so we have
[TABLE]
In particular, is the number of vertices of the ball , and is the number of facets of , which is the number of vertices of . The total number of faces of (and ) is .
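The inclusion-exclusion count can be checked by brute force. The sketch below assumes that a signature sequence is a vector with entries in {-1, 0, 1} in which both 1 and -1 occur, which is what the inclusion-exclusion argument suggests; the length `m` of the sequences relative to the dimension used in the paper, and both function names, are assumptions. The one extra face that does not come from a signature sequence is not counted here.

```python
from itertools import product
from math import comb

def count_signature_sequences(m):
    """Count length-m vectors over {-1, 0, 1} containing both 1 and -1, grouped by number of zeros."""
    counts = [0] * (m + 1)
    for a in product((-1, 0, 1), repeat=m):
        if 1 in a and -1 in a:
            counts[a.count(0)] += 1
    return counts

def closed_form(m):
    """Choose the k zero positions, then fill the rest with signs using both 1 and -1."""
    return [comb(m, k) * max(2 ** (m - k) - 2, 0) for k in range(m + 1)]

by_zeros = count_signature_sequences(4)
total = sum(by_zeros)
```

For length 4 the total agrees with the inclusion-exclusion value 3^4 - 2 * 2^4 + 1 = 50.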
Example 9.7.
Let . We have the following signature sequences and corresponding unitangent vectors (written as row vectors):
[TABLE]
Every vertex of the unit ball is unitangent. These vectors are marked with and they correspond to signature sequences that have no zeroes. Below is a 2-dimensional projection of the unit ball .
The unit ball is dual to the polytope and is shown below:
An example of an -slope decomposition is:
[TABLE]
If , then we have
[TABLE]
9.3. The taut string method
The restriction of to gives an isomorphism between and . Now we can view the bilinear form as a bilinear form on and for we have
[TABLE]
In particular, we have
[TABLE]
We can also view and as norms on and for we have
[TABLE]
and
[TABLE]
For , we consider the following optimization problem:
: Find a vector with such that is minimal.
We call this the Taut String problem. This problem, and some generalizations to higher order, were studied in [39]. If the vectors are discretized functions, then the graph lies between and . Now is a discrete version of the second derivative. The value is a measure of how much the graph bends at vertex . So is the total amount of bending and we try to minimize this. Visually we can see as a string between and and we pull on both ends so that the string is taut. The Taut String Algorithm described in [18] computes in time (see also [11]). The function is piecewise linear and is piecewise constant. Total Variation Denoising of time signals has applications in statistics to estimate a density function from a collection of measurements (see [3]). It was also used in [5, 40] for analysing heart rate variability signals to predict hemodynamic decompensation. In the graph below, we have drawn and for a function , as well as the solution to the Taut String problem . It appears as a tight string that is in between the graphs of and and is a piecewise linear approximation of .
As the value of increases, the sparsity decreases:
Let be the projection onto defined by
[TABLE]
Lemma 9.8.
Suppose that and minimizes under the constraint . Then is a solution to and is an -decomposition, where .
Proof.
We have . Suppose that with and . Then there is a vector with and . If we define , then and we have
[TABLE]
This shows that is a solution to . ∎
9.4. An ECG example
For a Hz noisy electrocardiogram (ECG) signal of 4 seconds long, we graph the Pareto frontier (and sub-frontier) of :
(If , then we can remove the baseline by replacing with .) The graph is -shaped, where the vertical leg corresponds to the sparse signal, and the horizontal leg corresponds to noise. The vertical leg starts near the point . This means that there exists a decomposition with and . The signal is the denoised signal. The singular value region for is shown below:
The -axis is cut off here in order to better visualize the graph. The graph approaches the -axis slowly and meets the -axis near . The horizontal leg corresponds to noise. To estimate the noise level, we look at where the horizontal leg starts. To find a cutoff for the singular values one proceeds as in principal component analysis. A reasonable cutoff is again . The value 8 is an estimate of the maximum amplitude of noise, which is more than the standard deviation (which is closer to in this case). Below we graph the noisy signal, and the denoised signals for . The denoised signal for is colored green.
9.5. Higher order total variation denoising
Suppose that is a function. One can also denoise by using a higher derivative as the regularizer, choosing a decomposition such that
[TABLE]
is small. For we just get 1D Total Variation Denoising. We consider a discrete version.
Define as the composition . We can view as the -th discrete derivative. For example, for we have
[TABLE]
Define a semi-norm on by . Restricting the norm to
[TABLE]
gives a norm, and let be the dual to this norm.
Problem 9.9.
Given , minimize .
For , this problem is called -trend filtering. An overview of -trend filtering is given in [33]. Some applications are in financial time series ([53]), macroeconomics ([52]), automatic control ([41]), oceanography ([51]) and geophysics ([34]). In -trend filtering, the function is piecewise linear. More generally, for , the -th derivative of will be piecewise constant. This case has been studied in [39]. For , the norms and are not tight.
10. The ISTA algorithm for -decompositions
A general formulation of the Iterative Shrinkage-Thresholding Algorithm (ISTA) was given in [17]. A map is called non-expansive if it is Lipschitz with Lipschitz constant 1, i.e., for all .
Lemma 10.1.
If is a norm on , then the functions and are non-expansive.
Proof.
Suppose that , and let and . For we have . By definition of we have
[TABLE]
Squaring both sides yields
[TABLE]
If we take the limit , we get
[TABLE]
By symmetry, we also get
[TABLE]
Adding both equations, yields
[TABLE]
It follows that
[TABLE]
This shows that and are nonexpansive.
∎
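Lemma 10.1 can be illustrated in the simplest case, where the two dual norms are the 1-norm and the maximum norm. The sketch below assumes that the two functions in the lemma are the euclidean projection onto the ball of radius t for one norm and the complementary shrinkage map; for this pair of norms these are coordinatewise clipping and soft thresholding. The function names and the random test are illustrative only.

```python
import random

def proj_inf_ball(c, t):
    """Euclidean projection onto {y : max|y_i| <= t}: clip each coordinate."""
    return [max(-t, min(t, ci)) for ci in c]

def shrink_l1(c, t):
    """Complement of the projection, c - proj(c): coordinatewise soft thresholding."""
    return [ci - pi for ci, pi in zip(c, proj_inf_ball(c, t))]

def dist(u, v):
    """Euclidean distance between two vectors given as lists."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

random.seed(0)
worst_ratio = 0.0
for _ in range(1000):
    a = [random.uniform(-3, 3) for _ in range(5)]
    b = [random.uniform(-3, 3) for _ in range(5)]
    d = dist(a, b)
    worst_ratio = max(
        worst_ratio,
        dist(proj_inf_ball(a, 1.0), proj_inf_ball(b, 1.0)) / d,
        dist(shrink_l1(a, 1.0), shrink_l1(b, 1.0)) / d,
    )
```

In every trial both maps bring the two points at least as close together as they were, so `worst_ratio` never exceeds 1.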
Suppose that . Let be a surjective linear map and suppose that is a norm on . We define a norm on by
[TABLE]
The dual norm of is defined by
[TABLE]
where is the dual norm to . Assume that we can easily compute the norms , and the projection function . We will also assume that the singular values of all lie in . We have the following algorithm for computing for and . There is one more parameter, , which specifies the accuracy of the output. We assume and the closer is to , the more accurate the output will be.
1:function ()
2:
3: while do
4:
5: end while
6: return
7:end function
11. Basis Pursuit Denoising, LASSO and the Dantzig selector
In Basis Pursuit (BP) one tries to solve the equation where is a given matrix and is a sparse vector (few nonzero entries). We will assume that has rank and that the system is underdetermined, (). It was shown in [13] that often can be found by minimizing under the constraint . This can be done efficiently using linear programming. We define a norm on by
[TABLE]
Note that evaluating the norm of some vector is a Basis Pursuit problem. We also have
[TABLE]
In the presence of noise, one minimizes under the constraint . This is called Basis Pursuit Denoising (BPDN), see [9, 16]. If we set , then we minimize under the constraint . If we set , then we minimize under the constraint . This is the optimization problem .
Sometimes BPDN is formulated as the problem of minimizing the -regularized function
[TABLE]
This is the same problem as minimizing
[TABLE]
The equivalence between the two formulations of BPDN is well-known (see also Proposition 4.6).
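The regularized formulation can be solved with the textbook form of ISTA. The sketch below assumes NumPy and the common scaling in which half the squared residual is added to `lam` times the 1-norm; this scaling, the step size rule and the names `soft`, `ista` and `objective` are assumptions and may differ from the conventions of the paper.

```python
import numpy as np

def soft(v, t):
    """Coordinatewise soft thresholding with threshold t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(a, b, lam, x):
    """0.5*||a @ x - b||^2 + lam*||x||_1 (assumed scaling)."""
    return 0.5 * np.sum((a @ x - b) ** 2) + lam * np.abs(x).sum()

def ista(a, b, lam, iters=500):
    """Iterative shrinkage-thresholding for the objective above."""
    step = 1.0 / np.linalg.norm(a, 2) ** 2  # 1 / (largest singular value)^2
    x = np.zeros(a.shape[1])
    for _ in range(iters):
        # Gradient step on the quadratic term, then shrink.
        x = soft(x - step * a.T @ (a @ x - b), step * lam)
    return x

rng = np.random.default_rng(0)
a = rng.standard_normal((5, 10))   # underdetermined system
b = rng.standard_normal(5)
x_hat = ista(a, b, 0.1)
```

When the matrix is the identity, a single iteration already returns the soft-thresholded data, which gives a quick sanity check of the update rule.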
The LASSO problem asks to minimize under the constraint . This is equivalent to minimizing under the constraint . This is the optimization problem .
The dual norm of is defined by
[TABLE]
Since the -decompositions and the decompositions are the same, we have two more approaches for finding the -decompositions (dual LASSO/BPDN):
(1) : Under the constraint we minimize ([43, 6]).
(2) : Under the constraint , we minimize .
To find an -decomposition we can minimize under the constraint (). If we set then this is equivalent to minimizing under the constraint . This optimization problem is called the Dantzig selector ([14]).
The Dantzig selector does not always have the same solution as LASSO (or the other equivalent problems). Conditions under which the Dantzig selector and LASSO have the same solutions were given in [1]. If are the columns of , then the unit ball is the convex hull of . Proposition 6.12 gives a necessary and sufficient condition on this polytope for the Dantzig selector and LASSO to have the same solutions for every .
12. Total Variation Denoising in Imaging
12.1. The 2D total variation norm
We can view a grayscale image as a function . In the anisotropic Rudin-Osher-Fatemi ([46]) total variation model, we seek a decomposition such that
[TABLE]
is small. (In the isotropic model we replace by .)
A discrete formulation of the model is as follows. We can view a grayscale image of pixels as a matrix . We define a map by
[TABLE]
where and for all . We define a total variation semi-norm by
[TABLE]
The restriction of to the set
[TABLE]
is a norm. We can always normalize an image by subtracting the average value to obtain an element of . Let be the dual of . We have
[TABLE]
with the conventions that for all .
The dual norm of is
[TABLE]
Total Variation Denoising is usually formulated as follows:
Problem 12.1.
Minimize
[TABLE]
Since the paper of Rudin, Osher and Fatemi ([46]), several other algorithms have been proposed, for example Chambolle’s algorithm ([10]), the split Bregman method ([24]) and the efficient primal-dual hybrid gradient algorithm ([54]).
12.2. Sparseness and total variation
We will study the notion of -sparsity in this context. Suppose that is an image. An -region is a maximal connected subset of on which is constant.
Proposition 12.2.
For an image we have where is the number of connected regions on which is constant.
Proof.
Let be the smallest facial -cone containing . By Lemma 6.6, lies in if and only if for some . So lies in if and only if the following properties are satisfied for all :
(1) implies ;
(2) implies ;
(3) implies ;
(4) implies .
Taking the contrapositive in each statement (and changing the indexing), we see that for all we have:
(1) implies ;
(2) implies ;
(3) implies ;
(4) implies .
It is clear that for all , we have that is constant on the connected regions on which is constant. The function can have arbitrary values on these connected regions as long as the 4 inequalities above and the linear constraint are satisfied. It follows that . ∎
12.3. The total variation norm is not tight
Below we denoise the image of the letter C using ROF total variation denoising. The original image has only 2 colors, black and white, and has geometric -sparsity 1. In the denoised images, there are various shades of grey, and the geometric -sparsity is more than 1. Hence ROF denoising may increase the geometric sparsity, and therefore the norms and are not tight.
13. Tensor Decompositions
13.1. CP decompositions
One of the motivations for writing this paper is the study of tensor decompositions. Suppose that is the field or , and that are finite dimensional -vector spaces. Define
[TABLE]
By a tensor we mean an element of . Elements of can be thought of as multi-way arrays of size where . A simple tensor (also called a rank one tensor) is a tensor of the form
[TABLE]
where for all . Not every tensor is simple, but every tensor can be written as a sum of simple tensors.
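In coordinates, a simple tensor is an outer product of vectors, and any multi-way array is a sum of such outer products built from standard basis vectors. The sketch below assumes NumPy and real entries; the names `simple_tensor` and `as_sum_of_simple` are illustrative. The decomposition it produces uses one term per nonzero entry and is usually far from minimal.

```python
import numpy as np

def simple_tensor(*vectors):
    """Outer product v1 (x) v2 (x) ... (x) vd as a d-way array."""
    t = np.asarray(vectors[0], dtype=float)
    for v in vectors[1:]:
        t = np.multiply.outer(t, np.asarray(v, dtype=float))
    return t

def as_sum_of_simple(t):
    """Write t as a sum of simple tensors, one per nonzero entry (not a minimal decomposition)."""
    terms = []
    for index in np.ndindex(t.shape):
        if t[index] != 0:
            basis = [np.eye(n)[i] for n, i in zip(t.shape, index)]
            basis[0] = t[index] * basis[0]
            terms.append(simple_tensor(*basis))
    return terms

s = simple_tensor([1.0, 2.0], [0.0, 1.0, -1.0], [3.0, 4.0])
rng = np.random.default_rng(0)
t = rng.standard_normal((2, 3, 2))
terms = as_sum_of_simple(t)
```

The number of terms returned is an upper bound for the rank of the tensor, which illustrates why every tensor has finite rank.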
Problem 13.1 (Tensor Decomposition).
Given a tensor , find a decomposition where are simple tensors and is minimal.
Hitchcock defined in [31] the rank of the tensor as the smallest for which such a decomposition exists and this minimal rank decomposition is called the canonical polyadic decomposition. Problem 13.1 is also known as the PARAFAC ([29]) or CANDECOMP ([8]) model. Finding the rank of a tensor is an NP-hard problem. Over this was shown in [30] and in our case, or , this was proved in [32].
13.2. The CoDe model and the nuclear norm
Even in relatively small dimensions, there are examples of tensors for which the rank is unknown. Using the heuristic of convex relaxation, we consider the following problem:
Problem 13.2 (CoDe model).
Given a tensor , find a decomposition where are simple tensors and is minimal.
The nuclear norm for tensors was explicitly given in [37, 38], but the ideas go back to [25] and [47]:
Definition 13.3.
The nuclear norm of the tensor is defined as the smallest possible value of such that are simple tensors and .
A matrix $A$ can be viewed as a 2-way tensor, and in this case the nuclear norm for tensors coincides with the nuclear norm of the matrix, which is defined as
$$\|A\|_\star = \operatorname{trace}\big(\sqrt{A^\star A}\big) = \lambda_1 + \lambda_2 + \cdots + \lambda_r,$$
where $A^\star$ is the complex conjugate transpose of $A$, $\sqrt{A^\star A}$ is the unique nonnegative definite Hermitian matrix whose square is $A^\star A$, and $\lambda_1, \dots, \lambda_r$ are the singular values of $A$. Although finding the nuclear norm of a higher order tensor is also NP-complete (see [23]), it is often easier than determining its rank. In [19] some examples of tensors are given for which the nuclear norm and the optimal decomposition can be computed, but where the rank of the tensor is unknown.
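For matrices, the two descriptions of the nuclear norm (sum of the singular values, and trace of the nonnegative definite square root of the Gram matrix) can be compared directly. The following is a minimal sketch using NumPy; the function names are illustrative, and the square root is computed from an eigendecomposition, which is one of several possible choices.

```python
import numpy as np

def nuclear_norm_via_svd(A):
    """Sum of the singular values of A."""
    return float(np.linalg.svd(A, compute_uv=False).sum())

def nuclear_norm_via_trace(A):
    """Trace of the nonnegative definite Hermitian square root of A* A."""
    gram = A.conj().T @ A
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Clip tiny negative eigenvalues caused by rounding errors.
    eigvals = np.clip(eigvals, 0.0, None)
    root = (eigvecs * np.sqrt(eigvals)) @ eigvecs.conj().T
    return float(np.trace(root).real)

# Hypothetical example: the singular values are 3*sqrt(5) and sqrt(5).
A = np.array([[3.0, 0.0], [4.0, 5.0]])
norm_svd = nuclear_norm_via_svd(A)
norm_trace = nuclear_norm_via_trace(A)
```

Both values agree with `np.linalg.norm(A, 'nuc')` up to rounding. No comparably simple formula is available for tensors of order 3 or higher.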
Let be the nuclear norm. The dual norm, , is equal to the spectral norm:
Definition 13.4.
The spectral norm of a tensor is defined by
[TABLE]
Finding the spectral norm of a higher-order tensor is also an NP-complete problem (see [32]).
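Because the exact spectral norm is hard to compute in general, one common heuristic is an alternating power iteration, which only yields a lower bound. The following is a minimal sketch for real 3-way arrays, assuming the usual description of the spectral norm as the maximum of $|\langle T, x\otimes y\otimes z\rangle|$ over unit vectors $x, y, z$. The function name and its parameters are illustrative, and this is not a method proposed in the paper.

```python
import numpy as np

def _normalize(v):
    """Scale v to unit length; leave the zero vector unchanged."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def spectral_norm_lower_bound(T, n_restarts=20, n_iters=200, seed=0):
    """Lower bound for the spectral norm of a real 3-way array T.

    Alternately updates x, y, z so that each one maximizes the
    contraction <T, x (x) y (x) z> with the other two held fixed.
    The iteration can stop at a local maximum, hence only a lower bound.
    """
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_restarts):
        x = _normalize(rng.standard_normal(T.shape[0]))
        y = _normalize(rng.standard_normal(T.shape[1]))
        z = _normalize(rng.standard_normal(T.shape[2]))
        for _ in range(n_iters):
            x = _normalize(np.einsum('ijk,j,k->i', T, y, z))
            y = _normalize(np.einsum('ijk,i,k->j', T, x, z))
            z = _normalize(np.einsum('ijk,i,j->k', T, x, y))
        value = abs(float(np.einsum('ijk,i,j,k->', T, x, y, z)))
        best = max(best, value)
    return best

# Hypothetical example: a diagonal 2 x 2 x 2 tensor with entries 3 and 1.
D = np.zeros((2, 2, 2))
D[0, 0, 0], D[1, 1, 1] = 3.0, 1.0
bound = spectral_norm_lower_bound(D)
```

Several random restarts are used because a single run may converge to a local rather than a global maximum. The returned value never exceeds the euclidean norm of the array.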
From now on we consider the case , where is the tensor product (over ) of several finite dimensional Hilbert spaces. We have a positive definite Hermitian inner product on . A real inner product is given by .
13.3. Examples of unitangent tensors
The following examples come from [19]:
Example 13.5.
The space of complex matrices has the usual basis where and . Define and define the tensor
[TABLE]
This tensor is related to matrix multiplication. It is known that if , then two matrices can be multiplied using arithmetic operations in . For most , the rank of is unknown. For example, the best known lower bound for is 19, and follows from [7]. The best known upper bound is 23 and comes from [35]. It was shown in [19] that and . Now (8) is a convex decomposition, because
[TABLE]
Also, we have
[TABLE]
so is unitangent. This means that
[TABLE]
is an -decomposition, a -decomposition and an -decomposition (as well as , and ) if . If we take , then we have . It follows that we get
[TABLE]
Similarly, we get inequalities
[TABLE]
and
[TABLE]
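The relation between this tensor and matrix multiplication can be made concrete. The following is a minimal sketch, assuming the standard definition of the matrix multiplication tensor as the sum of $e_{i,j}\otimes e_{j,k}\otimes e_{k,i}$ over all $i, j, k$; the index convention and the function names are assumptions, since the displayed formula is not legible here. Under this convention, contracting the tensor with three matrices gives the trace of their product.

```python
import numpy as np

def matrix_multiplication_tensor(n):
    """Return the n^2 x n^2 x n^2 array sum_{i,j,k} e_ij (x) e_jk (x) e_ki.

    The basis matrix e_ij is identified with position i * n + j
    (row-major flattening of an n x n matrix).
    """
    T = np.zeros((n * n, n * n, n * n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                T[i * n + j, j * n + k, k * n + i] = 1.0
    return T

def contract(T, A, B, C):
    """Contract the 3-way array T with the flattened matrices A, B, C."""
    return float(np.einsum('abc,a,b,c->', T, A.ravel(), B.ravel(), C.ravel()))

# Hypothetical example with random 3 x 3 matrices.
n = 3
T3 = matrix_multiplication_tensor(n)
rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))
value = contract(T3, A, B, C)
```

The array has exactly $n^3$ nonzero entries, all equal to 1, so its squared euclidean norm is $n^3$.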
Example 13.6.
Let be the set of permutations of , and for a permutation denote its sign by . The determinant tensor is defined by
[TABLE]
It was shown in [19] that , and . In particular, is unitangent. Let
[TABLE]
be the permanent tensor. In [19] it was calculated that , and . The tensor is also unitangent.
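The determinant and permanent tensors can be built directly from the permutations. The following is a minimal sketch, assuming the unnormalized definitions (a sum over all permutations of signed, respectively unsigned, tensor products of standard basis vectors); any scaling factor used in the paper is not legible here, and the helper names are illustrative. Contracting the determinant tensor with the rows of a matrix returns its determinant.

```python
import itertools
import numpy as np

def permutation_sign(perm):
    """Sign of a permutation given as a tuple, via counting inversions."""
    sign = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                sign = -sign
    return sign

def det_and_perm_tensors(n):
    """Return the n-way determinant and permanent tensors (unnormalized)."""
    det = np.zeros((n,) * n)
    perm = np.zeros((n,) * n)
    for sigma in itertools.permutations(range(n)):
        det[sigma] = permutation_sign(sigma)
        perm[sigma] = 1.0
    return det, perm

def evaluate(T, M):
    """Contract the n-way tensor T with the rows of the n x n matrix M."""
    result = T
    for row in M:
        # Each step contracts the current first axis with the next row.
        result = np.tensordot(row, result, axes=(0, 0))
    return float(result)

# Hypothetical example for n = 3.
det3, perm3 = det_and_perm_tensors(3)
M = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [4.0, 0.0, 1.0]])
det_value = evaluate(det3, M)
```

Both tensors have $n!$ nonzero entries of absolute value 1, so with this normalization their squared euclidean norm is $n!$.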
13.4. The diagonal SVD and the slope decomposition
Following [19], we make the following definitions.
Definition 13.7.
Suppose that are simple tensors with for all . For a real number we say that are -orthogonal if
[TABLE]
for every simple tensor with .
Note that -orthogonality implies orthogonality because we can take so that
[TABLE]
implies that is orthogonal to all with . By Pythagoras’ theorem, orthogonality is equivalent to -orthogonality.
Definition 13.8.
The expression
[TABLE]
is called a Diagonal Singular Value Decomposition (DSVD) if are -orthogonal simple tensors of length and .
If (9) is a DSVD, then we have
[TABLE]
Theorem 13.9.
Suppose that a tensor has a DSVD with singular values and multiplicities respectively. Then we can write
[TABLE]
such that
[TABLE]
for all and
[TABLE]
is a sequence of -orthogonal simple unit tensors. Then the slope decomposition of is given by
[TABLE]
where
[TABLE]
Proof.
We have
[TABLE]
so we have
[TABLE]
For we have
[TABLE]
So we also have
[TABLE]
This proves that is the slope decomposition.
∎
It was shown in [19] that the tensor has a diagonal singular value decomposition, but and do not for .
13.5. Group algebra tensors
Suppose that is a finite group of order . Suppose that there are irreducible representations of dimension . Then we have . Let , be an orthonormal basis of and consider the tensor
[TABLE]
which is related to the multiplication in the group algebra. Then has singular value with multiplicity for all . We have
[TABLE]
In the figure we drew the Pareto sub-frontier of for all groups of order . The blue graph represents the abelian groups and with only -dimensional representations, the red graph represents the dihedral group and and the semi-direct product with representations of dimension , and the yellow graph represents the alternating group with representations of dimension .
13.6. Symmetric tensors in
For the remainder of the section, let us consider the tensor product space . In particular, for , we will study the symmetric tensor
[TABLE]
Proposition 13.10.
(1) We have
[TABLE]
(2) We have
[TABLE]
Proof.
(1) By the definition of the spectral norm, we have
[TABLE]
By Banach’s Theorem (see [2, 22]), we may take . If we write with , then we have
[TABLE]
Let for . We get . If then has no roots in and
[TABLE]
If , then the roots of are
[TABLE]
We have
[TABLE]
(2) Note that
[TABLE]
This implies that
[TABLE]
If , then let . We get
[TABLE]
So it follows that
[TABLE]
so we must have equality.
For we get . For we get
[TABLE]
We have
[TABLE]
so this shows that .
∎
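The use of Banach's Theorem in part (1) of the proof can be illustrated numerically: for a symmetric tensor, the maximum defining the spectral norm is attained at a symmetric simple tensor $v\otimes v\otimes v$. The following is a minimal sketch for real symmetric $2\times 2\times 2$ arrays, using a grid search over angles. The test tensor is a hypothetical random symmetric array, not the specific tensor studied in this section, and the function names are illustrative.

```python
import numpy as np

def symmetrize(T):
    """Average a 3-way array over all six permutations of its axes."""
    perms = [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
    return sum(np.transpose(T, p) for p in perms) / 6.0

def unit_vectors(num_angles):
    """Unit vectors (cos t, sin t) for t on a grid in [0, pi).

    Half the circle suffices because only absolute values are compared.
    """
    theta = np.linspace(0.0, np.pi, num_angles, endpoint=False)
    return np.stack([np.cos(theta), np.sin(theta)], axis=1)

def symmetric_maximum(T, num_angles=3600):
    """Grid maximum of |<T, v (x) v (x) v>| over unit vectors v."""
    V = unit_vectors(num_angles)
    values = np.einsum('ijk,ai,aj,ak->a', T, V, V, V)
    return float(np.abs(values).max())

def general_maximum(T, num_angles=120):
    """Grid maximum of |<T, x (x) y (x) z>| over unit vectors x, y, z."""
    V = unit_vectors(num_angles)
    values = np.einsum('ijk,ai,bj,ck->abc', T, V, V, V)
    return float(np.abs(values).max())

# Hypothetical symmetric 2 x 2 x 2 tensor.
rng = np.random.default_rng(0)
S = symmetrize(rng.standard_normal((2, 2, 2)))
sym_max = symmetric_maximum(S)
gen_max = general_maximum(S)
```

Up to the discretization error of the grids, the two maxima agree, which is what allows the proof to restrict to a single angle.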
Corollary 13.11.
For
[TABLE]
is a -decomposition, and for
[TABLE]
is a -decomposition. A parametrization of the Pareto sub-frontier is given by
[TABLE]
if and
[TABLE]
if .
Let us plot the Pareto sub-frontier:
The blue part of the graph is linear, but the red part is non-linear. The graph is not piecewise linear, so does not have a slope decomposition. This shows that the nuclear norm and the spectral norm on are not tight.
Below we have plotted the singular value region. The singular value appears with multiplicity , and the singular value appears with multiplicity . The singular values between and appear with infinitesimal multiplicities. The height of the region is the spectral norm , the area of the region is the nuclear norm , and if we integrate over the region we get the square of the euclidean norm, which is .
References
- [1] M. S. Asif and J. Romberg, On the LASSO and Dantzig selector equivalence, Conference on Information Sciences and Systems (CISS), Princeton, NJ, 2010.
- [2] S. Banach, Über homogene Polynome in ($L^{2}$), Studia Math. 7 (1938), 36–44.
- [3] R. Barlow, D. Bartholomew, J. Bremner and H. Brunk, Statistical Inference under Order Restrictions, Wiley, New York, 1972.
- [4] A. Barvinok, A Course in Convexity, Graduate Studies in Mathematics 54, American Mathematical Society, 2002.
- [5] A. Belle, S. Asgari, M. Spadafore, V. A. Convertino, K. R. Ward, H. Derksen and K. Najarian, A signal processing approach for detection of hemodynamic instability before decompensation, PLoS ONE 11(2): e0148544 (2016).
- [6] E. van den Berg and M. P. Friedlander, Probing the Pareto frontier for basis pursuit solutions, SIAM J. Sci. Comput. 31 (2008), no. 2, 890–912.
- [7] M. Bläser, On the complexity of the multiplication of matrices of small formats, J. of Complexity 19 (2003), no. 1, 43–60.
- [8] J. D. Carroll and J. Chang, Analysis of individual differences in multidimensional scaling via an $N$-way generalization of an Eckart-Young decomposition, Psychometrika 35 (1970), 283.
