A linear programming approach to sparse linear regression with quantized data

Vito Cerone; Sophie M. Fosson; Diego Regruto

arXiv:1903.07156·math.OC·March 22, 2019·ACC

TL;DR

This paper introduces a linear programming method for sparse linear regression with quantized data, providing theoretical robustness guarantees and demonstrating improved performance over existing methods.

Contribution

It presents a novel linear programming approach specifically designed for sparse regression with low-precision data, addressing non-convexity issues.

Findings

1. Proves robustness guarantees for the proposed method.
2. Shows improved numerical performance compared to state-of-the-art techniques.
3. Effectively handles quantized and low-precision data in sparse regression.

Abstract

The sparse linear regression problem is difficult to handle with usual sparse optimization models when both predictors and measurements are either quantized or represented in low-precision, due to non-convexity. In this paper, we provide a novel linear programming approach, which is effective to tackle this problem. In particular, we prove theoretical guarantees of robustness, and we present numerical results that show improved performance with respect to the state-of-the-art methods.

Equations

$$\min_{x\in\mathbb{R}^{n}}\|x\|_{1}\quad\text{s.t.}\quad y=Ax,\ \ \mathcal{Q}(y)=y+\delta_{y},\ \ \mathcal{Q}(A)=A+\delta_{A},\ \ \|\delta_{y}\|_{\infty}\leq\Delta_{y},\ \ \|\delta_{A}\|_{\infty}\leq\Delta_{A}.$$

$$\begin{split}\min_{x\in\mathbb{R}^{n}}\|x\|_{1}\ &\text{ s.t. }Cx\preceq c\\ &\text{ where }\\ &C=\begin{pmatrix}\mathcal{Q}(A)-\Delta_{A}\mathbf{1}_{m}\mathbf{1}_{n}^{T}\\ -\mathcal{Q}(A)-\Delta_{A}\mathbf{1}_{m}\mathbf{1}_{n}^{T}\end{pmatrix}\in\mathbb{R}^{2m,n},\\ &c=\begin{pmatrix}\mathcal{Q}(y)+\Delta_{y}\mathbf{1}_{m}\\ -\mathcal{Q}(y)+\Delta_{y}\mathbf{1}_{m}\end{pmatrix}\in\mathbb{R}^{2m}.\end{split}$$

$$\min_{x\in\mathbb{R}^{n}}\|x\|_{1}\quad\text{s.t.}\quad\|\mathcal{Q}(y)-(\mathcal{Q}(A)-\delta_{A})x\|_{\infty}\leq\Delta_{y},\ \ \|\delta_{A}\|_{\infty}\leq\Delta_{A}.$$

$$\mu:=\max_{j\neq h}\left|\mathcal{Q}(A)_{j}^{T}\mathcal{Q}(A)_{h}\right|,\qquad j,h\in\{1,\dots,n\}$$

$$k\leq\frac{1}{2}\,\frac{2-\rho^{2}+\mu}{\mu+\Delta_{A}^{2}+2\Delta_{A}\rho+(\rho\sqrt{m}+\Delta_{A}m)\,2\Delta_{y}/T}.$$

$$\|x^{\star}-\alpha\|_{1}<T.$$

$$\mathcal{D}:=\{x\in\mathbb{R}^{n}:\|\mathcal{Q}(y)-Ax\|_{\infty}\leq\Delta_{y}\}$$

$$\min_{\beta}\ \|\beta\|_{1}-\|\alpha\|_{1}\quad\text{s.t.}\quad\beta\in\mathcal{G},\ \alpha\in\mathcal{D}$$

$$\|\beta\|_{1}-\|\alpha\|_{1}\geq\|w\|_{1}-2\sum_{h\in\mathcal{S}}|w_{h}|$$

$$\|Aw\|_{\infty}\leq 2\Delta_{y}$$

$$|A_{j}^{T}Aw|\leq\|A_{j}\|_{2}\|Aw\|_{2}\leq(\rho+\Delta_{A}\sqrt{m})(2\Delta_{y}\sqrt{m}).$$

$$|w|=|(A^{T}A-A^{T}A+I_{n})w|\preceq|A^{T}Aw|+|A^{T}A-I_{n}||w|.$$

$$A_{i}^{T}A_{j}=(\mathcal{Q}(A)_{i}^{T}+\delta_{A,i}^{T})(\mathcal{Q}(A)_{j}+\delta_{A,j})\leq\mu+\Delta_{A}^{2}+2\Delta_{A}\rho.$$

$$A_{i}^{T}A_{i}\leq\rho^{2}+\Delta_{A}^{2}+2\Delta_{A}\rho.$$

$$\begin{split}|A^{T}A-I_{n}|&\preceq(\mu+\Delta_{A}^{2}+2\Delta_{A}\rho)(\mathbf{1}_{n,n}-I_{n})+(\rho^{2}+\Delta_{A}^{2}+2\Delta_{A}\rho)I_{n}-I_{n}\\ &=(\mu+\Delta_{A}^{2}+2\Delta_{A}\rho)\mathbf{1}_{n,n}+(\rho^{2}-\mu-1)I_{n}\end{split}$$

$$|w|\preceq(\rho+\Delta_{A}\sqrt{m})(2\Delta_{y}\sqrt{m})\mathbf{1}_{n}+(\mu+\Delta_{A}^{2}+2\Delta_{A}\rho)\mathbf{1}_{n,n}|w|+(\rho^{2}-\mu-1)|w|.$$

$$|w|\preceq\frac{(\rho\sqrt{m}+\Delta_{A}m)\,2\Delta_{y}}{2-\rho^{2}+\mu}\mathbf{1}_{n}+\frac{\mu+\Delta_{A}^{2}+2\Delta_{A}\rho}{2-\rho^{2}+\mu}\mathbf{1}_{n,n}|w|.$$

$$\left(I_{n}-\frac{\mu+\Delta_{A}^{2}+2\Delta_{A}\rho}{2-\rho^{2}+\mu}\mathbf{1}_{n,n}\right)|w|\preceq\frac{(\rho\sqrt{m}+\Delta_{A}m)\,2\Delta_{y}}{2-\rho^{2}+\mu}\mathbf{1}_{n}.$$

$$\begin{split}\min_{v\in\mathbb{R}^{n}}\ &(\mathbf{1}_{n}-2\,\mathbf{1}_{n}^{\mathcal{S}})^{T}v\\ \text{s.t. }&\left(I_{n}-\frac{\mu+\Delta_{A}^{2}+2\Delta_{A}\rho}{2-\rho^{2}+\mu}\mathbf{1}_{n,n}\right)v\preceq\frac{(\rho\sqrt{m}+\Delta_{A}m)\,2\Delta_{y}}{2-\rho^{2}+\mu}\mathbf{1}_{n},\\ &\mathbf{1}_{n}^{T}v\geq T,\quad v\succeq 0\end{split}$$

$$\begin{split}\max_{u\in\mathbb{R}^{n}}\ &-\frac{(\rho\sqrt{m}+\Delta_{A}m)\,2\Delta_{y}}{2-\rho^{2}+\mu}\mathbf{1}_{n}^{T}u+Tu_{0}\\ \text{s.t. }&\mathbf{1}_{n}u_{0}-\left(I_{n}-\frac{\mu+\Delta_{A}^{2}+2\Delta_{A}\rho}{2-\rho^{2}+\mu}\mathbf{1}_{n,n}\right)u\preceq\mathbf{1}_{n}-2\,\mathbf{1}_{n}^{\mathcal{S}},\\ &u\succeq 0,\quad u_{0}\geq 0\end{split}$$

Full text

A linear programming approach to sparse linear regression with quantized data

V. Cerone, S. M. Fosson∗, D. Regruto

∗ Corresponding author. The authors are with the Dipartimento di Automatica e Informatica, Politecnico di Torino, corso Duca degli Abruzzi 24, 10129 Torino, Italy.

I Introduction

Sparse optimization refers to those optimization problems where the solution is encouraged to be sparse, i.e., to have few non-zero components. This research area has grown dramatically in the last decades in many different fields, including signal processing, machine learning, and system identification. In signal processing, the widespread presence of signals that admit sparse representations has promoted research on new sparse optimization problems and algorithms, see, e.g., [1, 2]. In machine learning and system identification, sparsity is desirable to reduce the complexity of the estimated models as much as possible. In the literature, sparsity is exploited for the identification of linear systems, see, e.g., [3, 4, 5]; non-linear functions [6]; polynomial models [7]; and time-varying systems [8, 9].

A popular paradigm in sparse optimization is given by the case where the collected measurements $y$ are linearly related, through a predictor matrix $A$, to the sparse signal or vector of parameters $x$ to be estimated. This paradigm leads to the problem of computing the sparsest solution of the linear system of equations $Ax=y$, $x\in\mathbb{R}^{n}$, $y\in\mathbb{R}^{m}$, $A\in\mathbb{R}^{m,n}$. In the literature, this problem is generically referred to as sparse linear regression. The underdetermined case, $m<n$, has attracted the attention of many researchers, whose work originated the theory of compressed sensing (CS) [1, 2]. Finding the sparsest solution of an underdetermined linear system is an NP-hard problem. However, it is well known that sparsity can also be achieved by minimizing the $\ell_{1}$-norm of $x$ under suitable constraints accounting for the linear structure of the problem, which makes the problem convex. Specifically, the minimization of the $\ell_{1}$-norm of $x$ subject to $Ax=y$ is known as Basis Pursuit [2, Chapter 4]. When measurement noise is present, the constraint is generally formulated as $\|Ax-y\|_{p}\leq\epsilon$, where $\|\cdot\|_{p}$ is a suitable norm and $\epsilon>0$ is a known bound; this is referred to as Basis Pursuit Denoising (BPDN$_p$).

In the literature, $p=2$ and $p=\infty$ are the most common choices. In particular, BPDN$_2$ is very popular, as it is suitable to cope with Gaussian noise, which is the typical model in a number of applications, such as transmission systems. The choice $p=2$ provides solutions that are more tolerant to possible outliers, since it bounds the mean energy of the error.

The case $p=\infty$ was first analyzed in [10], where results on robustness to noise are proven based on the coherence properties of $A$, and has recently been revisited to deal with quantized or low-precision measurements in the CS setting. When $y$ is quantized, in fact, there is a bounded error on each component $y_{i}$, $i=1,\dots,m$, which makes the $\ell_{\infty}$-norm description more suitable than the $\ell_{2}$ one, as illustrated in [11, 12]. In particular, the $\ell_{\infty}$-norm supports the consistency principle: the measurements obtained from the recovered signal lie in the same quantization intervals as the observed measurements, as considered in [13, 14, 11, 12].

As to CS, the study of quantization is strongly motivated from the practical point of view. As a matter of fact, the CS paradigm moves the computational burden from the acquisition-compression phase (which simply consists in computing $y=Ax$ ) to the recovery phase. This feature is successfully exploited in systems where signals’ acquisition is performed by remote devices with reduced computational capability, e.g., either space probes or environmental sensors, while recovery is performed in powerful computational centers. However, for transmission purposes, measurements are often not only compressed, but also quantized. We refer the interested reader to [15, 16] for a complete overview on CS with quantized measurements.

In many applications, $A$ is also generated by the remote device and has to be transmitted. Therefore, it is more realistic to assume that $A$ undergoes quantization as well. It is worth noticing that, in some cases, $A$ can be designed on purpose so that it is already quantized in its original form, which avoids any additional quantization error. For example, Bernoulli matrices (whose entries are binary) are suitable for CS. However, in most applications, $A$ is not arbitrarily chosen; instead, it depends either on the hardware of the remote device or on the physical nature of the problem.

The quantization of $A$ plays the role of an undesired perturbation in the recovery phase. Inaccuracies in $A$ are a serious drawback because they lead to an uncertain sparse linear regression problem which is not convex anymore. The problem of perturbed $A$ in sparse optimization and CS is addressed in [17, 18, 19].

The goal of this paper is to tackle the problem of sparse linear regression when both $A$ and $y$ are quantized, with a main focus on the CS setting. To the best of our knowledge, this joint problem has been considered only in [20], while, as already mentioned, the two single problems of quantized $y$ and perturbed $A$ have already been studied. In [20], the normalized iterative hard thresholding algorithm [21] is adapted to tackle the problem of quantized $A$ and $y$. The algorithm is tested in the case of a specific stochastic quantization function, with possible applications in the radio astronomy framework. In this setting, a robustness result is proved, which shows that the mean recovery error is controlled by the quantization error.

In this paper, we propose a novel approach to this problem, which can be applied in the presence of any quantization function with bounded error. We extend the formulation of BPDN$_\infty$ and the results in [10] to the case where $A$ is also quantized. First, we recast the considered problem into the framework of the static errors-in-variables estimation considered in [22]. Then, by exploiting results from [22], we show that the solution can be computed by solving a suitable number of linear programming (LP) problems.

The paper is organized as follows. In Section II, we formally state the problem and specify the considered assumptions. In Section III, we introduce the novel LP formulation. In Section IV, we prove theoretical results on the robustness of the proposed approach. In Section V, we show some numerical simulations that support the efficiency of the proposed method with respect to the state-of-the-art. Finally, some conclusions are drawn in Section VI.

II Problem statement

Let us consider a device performing compressed data acquisition according to the equation $Ax=y$, $A\in\mathbb{R}^{m,n}$, where $m<n$. The device is assumed to transmit quantized/low-precision versions of $A$ and $y$ to a recovery center. We denote by $\mathcal{Q}(A)$ and $\mathcal{Q}(y)$ the quantized versions of $A$ and $y$, respectively. The problem considered in this work is to recover a sparse $x$ such that $Ax=y$, given $\mathcal{Q}(A)$ and $\mathcal{Q}(y)$. The quantization strategy is assumed to be unknown, though a bound on the maximum quantization error is given. Apart from the quantization, for simplicity, no other sources of uncertainty/noise are considered here, although such an extension is under investigation. More precisely, we formulate the following optimization problem.

Problem 1

Given $\mathcal{Q}(y)$ , $\mathcal{Q}(A)$ , $\Delta_{A}>0$ , and $\Delta_{y}>0$ ,

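$$\min_{x\in\mathbb{R}^{n}}\|x\|_{1}\quad\text{s.t.}\quad y=Ax,\ \ \mathcal{Q}(y)=y+\delta_{y},\ \ \mathcal{Q}(A)=A+\delta_{A},\ \ \|\delta_{y}\|_{\infty}\leq\Delta_{y},\ \ \|\delta_{A}\|_{\infty}\leq\Delta_{A}.\tag{1}$$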

In the rest of the paper, we assume that the signs of the components of $x$ are known, in the sense that for each $x_{i}$ , $i=1,\dots,n$ , we know if it is either non-negative or non-positive. More specifically, without loss of generality, we assume the non-negativity of $x$ .

Assumption 1

$x_{i}\geq 0$ for all $i=1,\dots,n$.

This assumption naturally occurs in a number of applications, such as localization problems [23], image processing [24], and power allocation [25]. However, extensions to more general classes are possible and will be studied in future work. Here, we only observe that if the signs are not known, one can split the problem into $2^{n}$ LP problems, trying all the possible combinations of signs. Therefore, there is a way to compute the global minimum of the general non-convex problem, though it is not computationally efficient for large $n$. In future work, we will also analyze possible pre-processing methods to obtain prior information on the signs from available data.

III A linear programming approach

Thanks to the following result, we show that, under Assumption 1, Problem 1 is equivalent to an LP problem.

In the following, given two vectors $a,b\in\mathbb{R}^{n}$, we write $a\succeq b$ to indicate that $a_{i}\geq b_{i}$ for each $i=1,\dots,n$. We denote by $I_{n}\in\mathbb{R}^{n,n}$ the identity matrix. Moreover, $\mathbf{1}_{n}:=(1,1,\dots,1)^{T}\in\mathbb{R}^{n}$, where the superscript $T$ denotes transposition.

Result 1

Under Assumption 1, Problem 1 can be equivalently formulated as the following LP problem:

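$$\begin{split}\min_{x\in\mathbb{R}^{n}}\|x\|_{1}\ &\text{ s.t. }Cx\preceq c\\ &\text{ where }\\ &C=\begin{pmatrix}\mathcal{Q}(A)-\Delta_{A}\mathbf{1}_{m}\mathbf{1}_{n}^{T}\\ -\mathcal{Q}(A)-\Delta_{A}\mathbf{1}_{m}\mathbf{1}_{n}^{T}\end{pmatrix}\in\mathbb{R}^{2m,n},\\ &c=\begin{pmatrix}\mathcal{Q}(y)+\Delta_{y}\mathbf{1}_{m}\\ -\mathcal{Q}(y)+\Delta_{y}\mathbf{1}_{m}\end{pmatrix}\in\mathbb{R}^{2m}.\end{split}\tag{2}$$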

Proof of Result 1 is obtained by first noticing that Problem 1 can be equivalently rewritten in a more compact way as follows:

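$$\min_{x\in\mathbb{R}^{n}}\|x\|_{1}\quad\text{s.t.}\quad\|\mathcal{Q}(y)-(\mathcal{Q}(A)-\delta_{A})x\|_{\infty}\leq\Delta_{y},\ \ \|\delta_{A}\|_{\infty}\leq\Delta_{A}.\tag{3}$$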

Model (2) is obtained by applying the results about bounded errors-in-variables identification of static linear systems presented in [22] to the set of constraints of problem (3), under Assumption 1. Model (2) with Assumption 1 is an LP problem whose solution is straightforward. Moreover, its formulation is compliant with the quantization consistency principle.
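As an illustration of Result 1, the following minimal sketch assembles the matrix $C$ and the vector $c$ of Model (2) and passes them to a generic LP solver. It assumes NumPy and SciPy are available; the function name `solve_quantized_lp` and its arguments are hypothetical and are not taken from the paper. Under Assumption 1, $\|x\|_{1}$ is simply the sum of the entries of $x$, so the cost vector is $\mathbf{1}_{n}$ and non-negativity is imposed through the variable bounds.

```python
import numpy as np
from scipy.optimize import linprog


def solve_quantized_lp(QA, Qy, delta_A, delta_y):
    """Solve min sum(x) s.t. C x <= c, x >= 0 (Model (2) under Assumption 1)."""
    m, n = QA.shape
    ones = np.ones((m, n))  # the matrix 1_m 1_n^T
    # Stack the two one-sided constraints that define C and c.
    C = np.vstack([QA - delta_A * ones, -QA - delta_A * ones])
    c = np.concatenate([Qy + delta_y, -Qy + delta_y])
    # For x >= 0 the l1 norm equals the plain sum of the entries.
    res = linprog(np.ones(n), A_ub=C, b_ub=c, bounds=(0, None))
    if not res.success:
        raise RuntimeError(res.message)
    return res.x
```

Calling `solve_quantized_lp(QA, Qy, delta_A, delta_y)` with the quantized data and the two error bounds returns a non-negative vector that is consistent with every quantization interval. Since the true $x$ is feasible for Model (2), the returned solution never has a larger $\ell_{1}$-norm than the true one.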

Remark 1

It is worth noting that, by suitably exploiting results in [22], Result 1 can be extended to cover the general case where the signs of $x\in\mathbb{R}^{n}$ are unknown, although in that case the solution is obtained by solving a larger number of LPs. However, this general case is outside the scope of this conference contribution and will be the subject of future work.
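For very small $n$, the brute-force enumeration of sign patterns mentioned in Section II can be sketched as follows. This is only an illustration, not the extension based on [22] that the remark refers to. It relies on one assumption that is not spelled out in the paper: for a fixed sign vector $s\in\{-1,1\}^{n}$, the substitution $x=\text{diag}(s)z$ with $z\succeq 0$ turns the problem into the non-negative case with $\mathcal{Q}(A)\text{diag}(s)$ in place of $\mathcal{Q}(A)$, because flipping the sign of a column does not change the bound $\Delta_{A}$ on its entries. The function name is hypothetical.

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def solve_all_sign_patterns(QA, Qy, delta_A, delta_y):
    """Try all 2^n sign patterns and keep the solution with the smallest l1 norm."""
    m, n = QA.shape
    best_x, best_val = None, np.inf
    for signs in itertools.product((1.0, -1.0), repeat=n):
        s = np.array(signs)
        QAs = QA * s  # flip columns so that z = s * x is non-negative
        C = np.vstack([QAs - delta_A, -QAs - delta_A])
        c = np.concatenate([Qy + delta_y, -Qy + delta_y])
        res = linprog(np.ones(n), A_ub=C, b_ub=c, bounds=(0, None))
        if res.success and res.fun < best_val:
            best_x, best_val = s * res.x, res.fun
    return best_x
```

The loop solves $2^{n}$ LPs, so it is usable only for toy dimensions; it is meant to make the combinatorial cost of unknown signs concrete.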

Remark 2

We notice that a working hypothesis similar to Assumption 1 is considered in [19, Theorem 6]. In [19], the perturbation on $A$ is assumed to have a specific structure, namely, it is equal to $B\text{diag}(\beta_{0})$, where $B\in\mathbb{R}^{m,n}$ is known, and $\beta_{0}\in[-r,r]^{n}$, $r>0$, is unknown. In other terms, the direction of each column of the perturbation is known, and the dimension of the unknown is reduced from $mn$ to $n$. The proposed model [19, Equation 11] is a BPDN$_2$ with an additive perturbation bounded in the $\ell_{\infty}$-norm. This model is biconvex and can be approached with alternating minimization, which only achieves a local minimum. However, in [19, Theorem 6], it is observed that if $x_{i}\geq 0$ for all $i=1,\dots,n$, the problem admits a convex formulation, which guarantees that the global minimum is attained.

IV Analysis of robustness

In this section, we show that Model (2) with Assumption 1 is robust, i.e., the distance between its solution and the original signal is bounded by a quantity $T>0$ , which is controlled by $\Delta_{A}$ and $\Delta_{y}$ . In particular, if $\Delta_{A}=0$ , the robustness result in [10] is obtained.

The result that we now prove is based on the mutual coherence of $\mathcal{Q}(A)$ , which is defined as:

$$\mu:=\max_{j\neq h}\left|\mathcal{Q}(A)_{j}^{T}\mathcal{Q}(A)_{h}\right|,\qquad j,h\in\{1,\dots,n\}$$

where the index $j$ denotes the $j$ -th column.
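The mutual coherence is cheap to evaluate for any given matrix. A minimal sketch, assuming NumPy and following the definition above literally (largest absolute inner product between two distinct columns, with no column normalization); the function name is hypothetical:

```python
import numpy as np


def mutual_coherence(QA):
    """Largest absolute inner product between two distinct columns of QA."""
    gram = np.abs(QA.T @ QA)
    np.fill_diagonal(gram, 0.0)  # exclude the pairs with j == h
    return float(gram.max())
```

For instance, for the matrix with columns $(1,0)^{T}$, $(0,1)^{T}$, and $(1,1)^{T}$, the three pairwise inner products are 0, 1, and 1, so the function returns 1.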

From the theory on underdetermined linear systems and CS, it is well known that a sufficiently small coherence guarantees a successful sparse linear regression, up to errors due to noise [26, 27, 10]. The following theorem shows that sparse linear regression from quantized data is successful when coherence is sufficiently small, up to quantization errors.

Theorem 1

Let $\mathcal{Q}(y)=A\alpha+\delta_{y}$ , where the unknown $\alpha\in\mathbb{R}^{n}$ has $k\ll n$ non-zero components. $\mathcal{Q}(A)=A+\delta_{A}$ and $\mathcal{Q}(y)$ are known. Let us assume that: $\|\mathcal{Q}(A)_{j}\|_{2}\leq\rho$ for some $\rho>0$ , for any $j=1,\dots,n$ , $2(\Delta_{A}+\rho)^{2}<2-\mu-\rho^{2}$ , and

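$$k\leq\frac{1}{2}\,\frac{2-\rho^{2}+\mu}{\mu+\Delta_{A}^{2}+2\Delta_{A}\rho+(\rho\sqrt{m}+\Delta_{A}m)\,2\Delta_{y}/T}.$$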

Then, the solution $x^{\star}$ of problem (2) is robust, that is,

$$\|x^{\star}-\alpha\|_{1}<T.$$
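To get a feeling for the hypotheses of Theorem 1, the sketch below evaluates the right-hand side of the sparsity condition for given $\mu$, $\rho$, $\Delta_{A}$, $\Delta_{y}$, $m$, and $T$. It assumes that the last term of the denominator reads $(\rho\sqrt{m}+\Delta_{A}m)\,2\Delta_{y}/T$, which is what the bounds $\|Aw\|_{2}\leq 2\Delta_{y}\sqrt{m}$ and $\|A_{j}\|_{2}\leq\rho+\Delta_{A}\sqrt{m}$ used in the proof produce. The function name is hypothetical.

```python
import math


def max_sparsity(mu, rho, delta_A, delta_y, m, T):
    """Right-hand side of the sparsity condition in Theorem 1.

    Returns None when the hypothesis 2(delta_A + rho)^2 < 2 - mu - rho^2 fails.
    """
    if 2 * (delta_A + rho) ** 2 >= 2 - mu - rho ** 2:
        return None
    num = 2 - rho ** 2 + mu
    den = (mu + delta_A ** 2 + 2 * delta_A * rho
           + (rho * math.sqrt(m) + delta_A * m) * 2 * delta_y / T)
    return 0.5 * num / den
```

With $\mu=0.05$, $\rho=0.5$, and no quantization error ($\Delta_{A}=\Delta_{y}=0$), the function gives $0.5\cdot 1.8/0.05=18$; increasing $\Delta_{y}$ or decreasing $T$ enlarges the denominator and lowers the admissible sparsity level.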

Proof:

Let

$$\mathcal{D}:=\{x\in\mathbb{R}^{n}:\|\mathcal{Q}(y)-Ax\|_{\infty}\leq\Delta_{y}\}$$

be the feasible set of Model (2). By definition, $\alpha\in\mathcal{D}$ . Let us consider the subset $\mathcal{G}:=\{\beta\in\mathcal{D}:\|\beta-\alpha\|_{1}\geq T\}$ . We then prove that for any $\beta\in\mathcal{G}$ we have $\|\beta\|_{1}\geq\|\alpha\|_{1}$ , which means that there is no solution in $\mathcal{G}$ . This implies that all the solutions are in $\mathcal{D}\setminus\mathcal{G}$ , which proves the thesis. Thus, we study the problem:

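$$\min_{\beta}\ \|\beta\|_{1}-\|\alpha\|_{1}\quad\text{s.t.}\quad\beta\in\mathcal{G},\ \alpha\in\mathcal{D}\tag{5}$$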

Let $w:=\beta-\alpha.$ As illustrated in [28], it is straightforward to prove that

$$\|\beta\|_{1}-\|\alpha\|_{1}\geq\|w\|_{1}-2\sum_{h\in\mathcal{S}}|w_{h}|\tag{6}$$

where $\mathcal{S}$ is the support of $\alpha$ . Since $\alpha,\beta\in\mathcal{D}$ , we obtain

$$\|Aw\|_{\infty}\leq 2\Delta_{y}\tag{7}$$

from which we also get $\|Aw\|_{2}\leq 2\Delta_{y}\sqrt{m}.$ By assuming $\|\mathcal{Q}(A)_{j}\|_{2}\leq\rho$ for any column $j$ , we have:

$$|A_{j}^{T}Aw|\leq\|A_{j}\|_{2}\|Aw\|_{2}\leq(\rho+\Delta_{A}\sqrt{m})(2\Delta_{y}\sqrt{m}).\tag{8}$$

where we use the triangle inequality $\|A_{j}\|_{2}\leq\|\mathcal{Q}(A)_{j}\|_{2}+\|\delta_{A,j}\|_{2}$ . Now, we notice that

$$|w|=|(A^{T}A-A^{T}A+I_{n})w|\preceq|A^{T}Aw|+|A^{T}A-I_{n}||w|.\tag{9}$$

where $|w|=(|w_{1}|,\dots,|w_{n}|)^{T}$. From (8), we obtain a bound for $|A^{T}Aw|$. Furthermore, for the off-diagonal elements of $A^{T}A$, we have:

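$$A_{i}^{T}A_{j}=(\mathcal{Q}(A)_{i}^{T}+\delta_{A,i}^{T})(\mathcal{Q}(A)_{j}+\delta_{A,j})\leq\mu+\Delta_{A}^{2}+2\Delta_{A}\rho.$$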

Similarly, for the diagonal elements, we obtain:

$$A_{i}^{T}A_{i}\leq\rho^{2}+\Delta_{A}^{2}+2\Delta_{A}\rho.$$

Therefore,

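$$\begin{split}|A^{T}A-I_{n}|&\preceq(\mu+\Delta_{A}^{2}+2\Delta_{A}\rho)(\mathbf{1}_{n,n}-I_{n})+(\rho^{2}+\Delta_{A}^{2}+2\Delta_{A}\rho)I_{n}-I_{n}\\ &=(\mu+\Delta_{A}^{2}+2\Delta_{A}\rho)\mathbf{1}_{n,n}+(\rho^{2}-\mu-1)I_{n}\end{split}$$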

where $\mathbf{1}_{n,n}:=\mathbf{1}_{n}\mathbf{1}_{n}^{T}$ . Coming back to (9),

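$$|w|\preceq(\rho+\Delta_{A}\sqrt{m})(2\Delta_{y}\sqrt{m})\mathbf{1}_{n}+(\mu+\Delta_{A}^{2}+2\Delta_{A}\rho)\mathbf{1}_{n,n}|w|+(\rho^{2}-\mu-1)|w|.$$

Moving the last term to the left-hand side and assuming $2-\rho^{2}+\mu>0$, we can divide by this quantity and write

$$|w|\preceq\frac{(\rho\sqrt{m}+\Delta_{A}m)\,2\Delta_{y}}{2-\rho^{2}+\mu}\mathbf{1}_{n}+\frac{\mu+\Delta_{A}^{2}+2\Delta_{A}\rho}{2-\rho^{2}+\mu}\mathbf{1}_{n,n}|w|.$$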

Finally,

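$$\left(I_{n}-\frac{\mu+\Delta_{A}^{2}+2\Delta_{A}\rho}{2-\rho^{2}+\mu}\mathbf{1}_{n,n}\right)|w|\preceq\frac{(\rho\sqrt{m}+\Delta_{A}m)\,2\Delta_{y}}{2-\rho^{2}+\mu}\mathbf{1}_{n}.$$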

Let $v=|w|.$ We can now rewrite problem (5), using (6), as follows:

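$$\begin{split}\min_{v\in\mathbb{R}^{n}}\ &(\mathbf{1}_{n}-2\,\mathbf{1}_{n}^{\mathcal{S}})^{T}v\\ \text{s.t. }&\left(I_{n}-\frac{\mu+\Delta_{A}^{2}+2\Delta_{A}\rho}{2-\rho^{2}+\mu}\mathbf{1}_{n,n}\right)v\preceq\frac{(\rho\sqrt{m}+\Delta_{A}m)\,2\Delta_{y}}{2-\rho^{2}+\mu}\mathbf{1}_{n},\\ &\mathbf{1}_{n}^{T}v\geq T,\quad v\succeq 0\end{split}\tag{11}$$

where $\mathbf{1}_{n}^{\mathcal{S}}$ is the $n$-dimensional column vector with entries equal to 1 in the positions of the support $\mathcal{S}$ of $\alpha$, and 0 otherwise.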

As in [10, equations (20)-(21)], we consider the dual problem:

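$$\begin{split}\max_{u\in\mathbb{R}^{n}}\ &-\frac{(\rho\sqrt{m}+\Delta_{A}m)\,2\Delta_{y}}{2-\rho^{2}+\mu}\mathbf{1}_{n}^{T}u+Tu_{0}\\ \text{s.t. }&\mathbf{1}_{n}u_{0}-\left(I_{n}-\frac{\mu+\Delta_{A}^{2}+2\Delta_{A}\rho}{2-\rho^{2}+\mu}\mathbf{1}_{n,n}\right)u\preceq\mathbf{1}_{n}-2\,\mathbf{1}_{n}^{\mathcal{S}},\\ &u\succeq 0,\quad u_{0}\geq 0\end{split}\tag{12}$$

Exploiting the zero duality gap between primal and dual in LP problems [29], if (12) has a solution that yields a positive objective value, then the objective value of (11) is positive as well, which is our final aim. From this point, the thesis can be obtained following the same procedure used in [10, pages 518-519], since problem (12) is analogous to problem (21) in [10], with different constants. We omit the details for brevity. We just notice that the constraint $2(\Delta_{A}+\rho)^{2}<2-\mu-\rho^{2}$ is necessary to fulfill equations (22)-(23) in [10]. ∎

We remark that if $\Delta_{A}=0$, Theorem 1 provides the same bound as Theorem 3 in [10] (with $\delta=\epsilon$).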

It is worth noticing that, in CS, coherence-based analyses [26, 27] have the drawback of providing less tight bounds with respect to other properties, such as the restricted isometry property (RIP) [2]. Nevertheless, RIP is difficult to assess for a specific matrix (in the literature, RIP is proved for some classes of random matrices). Coherence, instead, can be easily computed for any matrix. This work provides results in terms of coherence, while future extensions might envisage other properties.

V Numerical results

We propose the following experiment. A system acquires a sparse vector $x\in\mathbb{R}^{n}$ through a predictor matrix $A\in\mathbb{R}^{m,n}$. The dimensions are $n=100$, $m=40$, $k=10$. The entries of $A$ are generated according to a Gaussian distribution $\mathcal{N}(0,\frac{1}{m})$; the support of $x$ is generated uniformly at random, while the non-zero entries are uniformly distributed in $[0,r]$ with $r=10$. The quantization is uniform: given a fixed number of equidistant quantization levels, each entry of $A$ and of $y=Ax$ is approximated with the closest point in the quantization codebook. We assume a quantization range sufficiently large so that saturation problems are negligible.
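The data generation and the uniform quantizer described above can be sketched as follows, assuming NumPy. The codebook layout (equidistant levels over $[-r,r]$ with both endpoints included), the clipping of out-of-range values, and the function names are assumptions made for illustration; the paper only specifies that the levels are equidistant and that each entry is mapped to the closest one.

```python
import numpy as np


def uniform_quantize(values, levels, r):
    """Map each entry to the closest of `levels` equidistant points in [-r, r]."""
    step = 2.0 * r / (levels - 1)
    clipped = np.clip(values, -r, r)  # saturation, assumed negligible in the text
    return -r + step * np.round((clipped + r) / step)


def make_instance(n=100, m=40, k=10, r=10.0, seed=0):
    """Draw A with N(0, 1/m) entries, a k-sparse non-negative x, and y = A x."""
    rng = np.random.default_rng(seed)
    A = rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n))
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.uniform(0.0, r, size=k)
    return A, x, A @ x
```

With this codebook, the quantization error of any in-range entry is at most half a step, i.e., $r/(\text{levels}-1)$, which is the value one would use for $\Delta_{A}$ and $\Delta_{y}$.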

We compare the proposed LP method to three state-of-the-art methods: BPDN$_\infty$, BPDN$_2$, and the normalized iterative hard thresholding (NIHT), as presented in [20].

To design the measurement noise bounds for BPDN$_\infty$ and BPDN$_2$, we propose two different settings.

Setting 1: quantization of $A$ is ignored, while $\Delta_{y}$ is known. Therefore, we impose $\|Ax-y\|_{\infty}\leq\Delta_{y}$ and, as a consequence, $\|Ax-y\|_{2}\leq\sqrt{m}\Delta_{y}$.
Setting 2: quantization of $A$ is known, though the related error is transferred to the measurements. Assuming knowledge of $\Delta_{A}$, $k$, and a bound $r>0$ such that $x_{i}<r$ for any $i=1,\dots,n$, from $\mathcal{Q}(A)x-\mathcal{Q}(y)=\delta_{A}x-\delta_{y}$ we obtain $\|\mathcal{Q}(A)x-\mathcal{Q}(y)\|_{\infty}\leq\Delta_{A}kr+\Delta_{y}$ and, as a consequence, $\|\mathcal{Q}(A)x-\mathcal{Q}(y)\|_{2}\leq\sqrt{m}(\Delta_{A}kr+\Delta_{y})$.
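The two settings differ only in the size of the noise bound handed to BPDN$_\infty$ and BPDN$_2$. A small helper (hypothetical name, standard library only) makes the gap concrete:

```python
import math


def noise_bounds(delta_y, m, delta_A=0.0, k=0, r=0.0):
    """Return the (l_inf, l_2) bounds used in the BPDN constraints.

    Setting 1: leave delta_A, k and r at their defaults.
    Setting 2: pass delta_A, k and r to add the term delta_A * k * r.
    """
    eps_inf = delta_A * k * r + delta_y
    return eps_inf, math.sqrt(m) * eps_inf
```

As a purely illustrative computation with $\Delta_{A}=\Delta_{y}=0.1$, $k=10$, $r=10$, and $m=40$, Setting 1 gives an $\ell_{\infty}$ bound of $0.1$, while Setting 2 gives $0.1\cdot 10\cdot 10+0.1=10.1$, i.e., a feasible set about a hundred times wider in each coordinate.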

We notice that NIHT requires the knowledge of $k$ .

In Fig. 1 and Fig. 2, we show the performance with respect to different numbers of quantization levels, from 100 to 5000. For simplicity, we consider the same number of quantization levels for $A$ and for $y$, although distinct approximations could be suitably designed. The total range is assumed to be $[-r,r]$ with $r=10$, which is generally sufficient to avoid saturation in the proposed setting. We show the relative square $\ell_{2}$ error, defined as $\|\widehat{x}-x\|_{2}^{2}/\|x\|_{2}^{2}$; the relative $\ell_{1}$ error, defined as $\|\widehat{x}-x\|_{1}/\|x\|_{1}$; the normalized sparsity level of the estimate; the false positive rate, defined as the number of events where $\widehat{x}_{i}\neq 0$ while $x_{i}=0$, over $n-k$; and the false negative rate, defined as the number of events where $\widehat{x}_{i}=0$ while $x_{i}\neq 0$, over $k$.
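The five performance measures listed above can be computed as in the following sketch, assuming NumPy. The threshold `tol`, below which an estimated entry is treated as zero, is an assumption added here because numerical solvers rarely return exact zeros; the paper does not state how zeros are detected. The function name is hypothetical.

```python
import numpy as np


def recovery_metrics(x_hat, x, tol=1e-6):
    """Relative errors, normalized sparsity, and false positive/negative rates."""
    nz_hat = np.abs(x_hat) > tol  # estimated support
    nz = x != 0                   # true support
    n, k = x.size, int(nz.sum())
    return {
        "rel_sq_l2": float(np.sum((x_hat - x) ** 2) / np.sum(x ** 2)),
        "rel_l1": float(np.sum(np.abs(x_hat - x)) / np.sum(np.abs(x))),
        "sparsity": float(nz_hat.sum() / n),
        "false_pos": float(np.sum(nz_hat & ~nz) / (n - k)),
        "false_neg": float(np.sum(~nz_hat & nz) / k),
    }
```

For example, with $x=(0,2,0,4)^{T}$ and $\widehat{x}=(1,2,0,0)^{T}$, the relative square $\ell_{2}$ error is $17/20$, the relative $\ell_{1}$ error is $5/6$, and the sparsity level, false positive rate, and false negative rate are all $0.5$.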

In Fig. 1, we show the simulations in Setting 1. In this case, the distance from the desired vector is smaller for BPDN$_p$, $p=2,\infty$, than for our LP: the smaller feasible set forces a closer consistency with the data. However, the smaller feasible set limits the sparsity of the obtained solution, which is between $30\%$ and $40\%$ for BPDN$_p$, while the correct one is $10\%$. The false positive rate is then large, which conflicts with the goal of producing sparse solutions.

In order to obtain sparser solutions for BPDN$_p$, $p=2,\infty$, we implement Setting 2, which is depicted in Fig. 2. By assuming knowledge of $\Delta_{A},k,r$, we can suitably enlarge the feasible set and obtain sparse solutions. However, in this case BPDN$_p$, $p=2,\infty$, has a larger number of false negatives, and the recovery accuracy is worse than that of the proposed LP.

Finally, we notice that the low-precision NIHT [20, Algorithm 1] does not show good performance in this experiment. NIHT is proved to be robust when a specific stochastic quantizer is used [20, Section 3.1], while this experiment shows that further adjustments are needed for other quantizers. We point out that our approach does not require a specific quantization operator and uses only information about the maximum quantization error.

In conclusion, the proposed LP method provides the best performance in this experiment, since at similar sparsity levels, it obtains the best recovery accuracy.

VI Conclusions

In this paper, we have addressed the problem of sparse linear regression, with particular attention to the compressed case, when only quantized versions of the predictors and of the measurements are available. This problem is relevant in applications, but difficult to tackle due to its intrinsic non-convexity. In this work, we have undertaken an $\ell_{\infty}$ approach, based on results in errors-in-variables system identification, which allows us to recast the problem into a linear programming model, under suitable sign assumptions. The proposed approach is theoretically proved to be robust, i.e., a finer quantization leads to a smaller recovery error. Moreover, numerical simulations show an improved recovery accuracy with respect to known methods. Generalizations of the sign assumption are possible, as well as the addition of other sources of noise. This will be the subject of future work.

Bibliography

[1] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[2] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing. New York: Springer, 2013.
[3] Y. Gu, J. Jin, and S. Mei, “$\ell_{0}$ norm constraint LMS algorithm for sparse system identification,” IEEE Signal Process. Lett., vol. 16, no. 9, pp. 774–777, 2009.
[4] R. Tóth, B. M. Sanandaji, K. Poolla, and T. L. Vincent, “Compressive system identification in the linear time-invariant framework,” in Proc. IEEE Conf. Decision Control (CDC), 2011, pp. 783–790.
[5] R. Tóth, H. Hjalmarsson, and C. Rojas, “Sparse estimation of rational dynamical models,” in Proc. IFAC SYSID, 2012, pp. 983–988.
[6] C. Novara, “Sparse identification of nonlinear functions and parametric set membership optimality analysis,” in Proc. American Control Conf. (ACC), 2011, pp. 663–668.
[7] G. Calafiore, L. E. Ghaoui, and C. Novara, “Sparse identification of polynomial and posynomial models,” in Proc. IFAC World Congress, 2014, pp. 3239–3243.
[8] B. M. Sanandaji, T. L. Vincent, M. B. Wakin, and R. Tóth, “Compressive system identification of LTI and LTV ARX models,” in Proc. IEEE Conf. Decision Control (CDC), 2011, pp. 783–790.