Graph Sampling for Covariance Estimation

Sundeep Prabhakar Chepuri; Geert Leus

arXiv:1704.07661·cs.IT·May 8, 2018

TL;DR

This paper introduces methods for efficiently estimating second-order statistics of signals on graphs by subsampling vertices, enabling accurate reconstruction without spectral priors, applicable to various graph types and real-world datasets.

Contribution

It proposes a novel subsampling approach for covariance estimation on graphs that requires fewer samples and no spectral priors, including algorithms for different models and graph types.

Findings

1. Successful reconstruction of graph signal statistics from reduced samples.

2. Development of near-optimal greedy algorithms for subsampling design.

3. Validation on synthetic and real datasets demonstrating effectiveness.

Equations

$$\boldsymbol{S}=\boldsymbol{U}\boldsymbol{\Lambda}\boldsymbol{U}^{H}=[\boldsymbol{u}_{1},\ldots,\boldsymbol{u}_{N}]\,{\rm diag}[\lambda_{1},\ldots,\lambda_{N}]\,[\boldsymbol{u}_{1},\ldots,\boldsymbol{u}_{N}]^{H},$$

$$\boldsymbol{x}_{f}:=\boldsymbol{U}^{H}\boldsymbol{x}\;\Leftrightarrow\;\boldsymbol{x}=:\boldsymbol{U}\boldsymbol{x}_{f}.$$

$$\boldsymbol{H}=\sum_{l=0}^{L-1}h_{l}\boldsymbol{S}^{l}=\boldsymbol{U}\left[h_{0}\boldsymbol{I}+h_{1}\boldsymbol{\Lambda}+\cdots+h_{L-1}\boldsymbol{\Lambda}^{L-1}\right]\boldsymbol{U}^{H},$$

$$\boldsymbol{H}_{f}=\sum_{l=0}^{L-1}h_{l}\boldsymbol{\Lambda}^{l}={\rm diag}[\boldsymbol{V}_{L}\boldsymbol{h}]={\rm diag}[h_{f,1},\ldots,h_{f,N}]$$

$$\boldsymbol{R}_{\boldsymbol{x}}=\boldsymbol{U}\,{\rm diag}\left[|h_{f,1}|^{2},\ldots,|h_{f,N}|^{2}\right]\boldsymbol{U}^{H}=\boldsymbol{U}\,{\rm diag}[\boldsymbol{p}]\,\boldsymbol{U}^{H},$$

$$p_{n}=\boldsymbol{u}_{n}^{H}\boldsymbol{R}_{\boldsymbol{x}}\boldsymbol{u}_{n},\quad n=1,2,\ldots,N.$$

$$\boldsymbol{y}=\boldsymbol{\Phi}\boldsymbol{x},$$

$$\boldsymbol{x}=\sum_{i=1}^{N}x_{f,i}\boldsymbol{u}_{i}.$$

$$\boldsymbol{R}_{\boldsymbol{x}}=\sum_{i=1}^{N}\mathbb{E}\{|x_{f,i}|^{2}\}\boldsymbol{u}_{i}\boldsymbol{u}_{i}^{H}=\sum_{i=1}^{N}p_{i}\boldsymbol{u}_{i}\boldsymbol{u}_{i}^{H}=\sum_{i=1}^{N}p_{i}\boldsymbol{Q}_{i},$$

$$\boldsymbol{r}_{\boldsymbol{x}}={\rm vec}(\boldsymbol{R}_{\boldsymbol{x}})=\sum_{i=1}^{N}p_{i}\,{\rm vec}(\boldsymbol{Q}_{i})=\boldsymbol{\Psi}_{\rm s}\boldsymbol{p},$$

$$\boldsymbol{\Psi}_{\rm s}=[\bar{\boldsymbol{u}}_{1}\otimes\boldsymbol{u}_{1},\ldots,\bar{\boldsymbol{u}}_{N}\otimes\boldsymbol{u}_{N}]=\bar{\boldsymbol{U}}\circ\boldsymbol{U}.$$

$$\boldsymbol{R}_{\boldsymbol{y}}=\boldsymbol{\Phi}\boldsymbol{R}_{\boldsymbol{x}}\boldsymbol{\Phi}^{T}=\sum_{i=1}^{N}p_{i}\boldsymbol{\Phi}\boldsymbol{Q}_{i}\boldsymbol{\Phi}^{T}.$$

$$\boldsymbol{r}_{\boldsymbol{y}}={\rm vec}(\boldsymbol{R}_{\boldsymbol{y}})=(\boldsymbol{\Phi}\otimes\boldsymbol{\Phi})\,{\rm vec}(\boldsymbol{R}_{\boldsymbol{x}})\in\mathbb{C}^{K^{2}}$$

$$\boldsymbol{r}_{\boldsymbol{y}}=(\boldsymbol{\Phi}\otimes\boldsymbol{\Phi})\boldsymbol{\Psi}_{\rm s}\boldsymbol{p}.$$

$$\boldsymbol{p}=[(\boldsymbol{\Phi}\otimes\boldsymbol{\Phi})\boldsymbol{\Psi}_{\rm s}]^{\dagger}\boldsymbol{r}_{\boldsymbol{y}}.$$

$$\underset{\boldsymbol{p}\in\mathcal{P}}{\text{minimize}}\quad\|\boldsymbol{r}_{\boldsymbol{y}}-(\boldsymbol{\Phi}\otimes\boldsymbol{\Phi})\boldsymbol{\Psi}_{\rm s}\boldsymbol{p}\|_{2}^{2}.$$

$$\boldsymbol{u}_{n}=[\omega_{n}^{0},\omega_{n},\omega_{n}^{2},\ldots,\omega_{n}^{N-1}]^{T}$$

$$\boldsymbol{x}=\boldsymbol{H}(\boldsymbol{h})\boldsymbol{n}=\sum_{l=0}^{L-1}h_{l}\boldsymbol{S}^{l}\boldsymbol{n}=\boldsymbol{U}\left(\sum_{l=0}^{L-1}h_{l}\boldsymbol{\Lambda}^{l}\right)\boldsymbol{U}^{H}\boldsymbol{n}$$

$$\boldsymbol{R}_{\boldsymbol{x}}=\boldsymbol{U}\left(\sum_{l=0}^{L-1}h_{l}\boldsymbol{\Lambda}^{l}\right)\left(\sum_{l=0}^{L-1}\bar{h}_{l}\boldsymbol{\Lambda}^{l}\right)\boldsymbol{U}^{H},$$

$$\boldsymbol{R}_{\boldsymbol{x}}=\sum_{k=0}^{Q-1}b_{k}\boldsymbol{S}^{k},$$

$$\boldsymbol{R}_{\boldsymbol{x}}=h_{0}^{2}\boldsymbol{I}+2h_{0}h_{1}\boldsymbol{S}+(h_{1}^{2}+2h_{0}h_{2})\boldsymbol{S}^{2}+2h_{1}h_{2}\boldsymbol{S}^{3}+h_{2}^{2}\boldsymbol{S}^{4}.$$

$$\boldsymbol{r}_{\boldsymbol{x}}={\rm vec}(\boldsymbol{R}_{\boldsymbol{x}})=\sum_{k=0}^{Q-1}b_{k}\,{\rm vec}(\boldsymbol{S}^{k})=\boldsymbol{\Psi}_{\rm MA}\boldsymbol{b},$$

$$\boldsymbol{\Psi}_{\rm MA}=[{\rm vec}(\boldsymbol{S}^{0}),{\rm vec}(\boldsymbol{S}^{1}),\ldots,{\rm vec}(\boldsymbol{S}^{Q-1})],$$

$$\boldsymbol{R}_{\boldsymbol{y}}=\boldsymbol{\Phi}\boldsymbol{R}_{\boldsymbol{x}}\boldsymbol{\Phi}^{T}=\sum_{k=0}^{Q-1}b_{k}\boldsymbol{\Phi}\boldsymbol{S}^{k}\boldsymbol{\Phi}^{T}.$$

$$\boldsymbol{r}_{\boldsymbol{y}}={\rm vec}(\boldsymbol{R}_{\boldsymbol{y}})=(\boldsymbol{\Phi}\otimes\boldsymbol{\Phi})\boldsymbol{\Psi}_{\rm MA}\boldsymbol{b}.$$

$$\boldsymbol{b}=[(\boldsymbol{\Phi}\otimes\boldsymbol{\Phi})\boldsymbol{\Psi}_{\rm MA}]^{\dagger}\boldsymbol{r}_{\boldsymbol{y}}.$$
Full text

Graph Sampling for Covariance Estimation

Sundeep Prabhakar Chepuri, Geert Leus

The authors are with the Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, The Netherlands. Email: {s.p.chepuri;g.j.t.leus}@tudelft.nl. This work was supported by the KAUST-MIT-TUD consortium grant OSR-2015-Sensors-2700. A conference precursor of this manuscript appeared in the Ninth IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), Rio de Janeiro, Brazil, June 2016 [1].

Abstract

In this paper the focus is on subsampling as well as reconstructing the second-order statistics of signals residing on nodes of arbitrary undirected graphs. Second-order stationary graph signals may be obtained by graph filtering zero-mean white noise and they admit a well-defined power spectrum whose shape is determined by the frequency response of the graph filter. Estimating the graph power spectrum forms an important component of stationary graph signal processing and related inference tasks such as Wiener prediction or inpainting on graphs. The central result of this paper is that by sampling a significantly smaller subset of vertices and using simple least squares, we can reconstruct the second-order statistics of the graph signal from the subsampled observations, and more importantly, without any spectral priors. To this end, both a nonparametric approach as well as parametric approaches including moving average and autoregressive models for the graph power spectrum are considered. The results specialize for undirected circulant graphs in that the graph nodes leading to the best compression rates are given by the so-called minimal sparse rulers. A near-optimal greedy algorithm is developed to design the subsampling scheme for the non-parametric and the moving average models, whereas a particular subsampling scheme that allows linear estimation for the autoregressive model is proposed. Numerical experiments on synthetic as well as real datasets related to climatology and processing handwritten digits are provided to demonstrate the developed theory.

Index Terms

Graph signal processing, stationary graph signals, sparse sampling, graph power spectrum estimation, compressive covariance sensing.

I Introduction

Graphs are mathematical objects that can be used for describing and explaining relationships in complex datasets, which appear commonly in modern data analysis. The nodes of the graph denote the entities themselves and the edges encode the pairwise relationship between these entities. Some examples of such complex-structured data beyond traditional time-series include gene regulatory networks [2], brain networks [3], transportation networks [4], social and economic networks [5], and so on. Processing signals residing on the nodes of a graph taking into account the relationships between them as explained by the edges of the graph is recently receiving a significant amount of interest. In particular, generalizing as well as drawing parallels of classical time-frequency analysis tools to graph data analysis while incorporating the irregular structure on which the graph signals are defined is an emerging area of research [6, 7].

Graph signals could be stochastic in nature and they can be modeled as the output of a graph filter [8] whose input is also a random signal (e.g., white noise). We are interested in sampling and processing stationary graph signals, which are stochastic signals defined on graphs with second-order statistics that are invariant similar to time series, but in the graph setting. Second-order stationary graph signals are characterized by a well-defined graph power spectrum. They can be generated by graph filtering white noise (or any other stationary graph signal) and the graph power spectrum of the filtered signal will be characterized by the squared magnitude of the frequency response of the filter; see [9, 10, 11, 12].

The second-order statistics of graph signals, or equivalently the graph power spectrum, are essential to solve inference problems on graphs in the Bayesian setting such as smoothing, prediction, inpainting, and deconvolution; see [13] and [10] for some Bayesian inference problems. These inference problems are solved by designing optimum (in the minimum mean squared error sense) Wiener-like filters and the graph power spectrum forms a crucial component of such filter designs. In order to compute the graph power spectrum, traditional methods require the processing of signals on all graph nodes. The sheer quantity of data and scale of the graph often inhibit this reconstruction method. Therefore, the main question that we address in this paper is, can we reconstruct the graph power spectrum by observing a small subset of graph nodes?

I-A Related works and main results

The notion of stationarity of signals on graphs and related definitions can be found in [9, 10, 11, 12], and it will be briefly explained in the next section as well. Several techniques for graph power spectrum estimation have been discussed in [10] and [11], and they are based on observations from all the nodes. In this paper, we consider the problem of reconstructing the second-order statistics of signals on graphs, but from subsampled observations. The fact that we are reconstructing the graph power spectrum, instead of the graph signal, enables us to subsample the graph signal (or sparsely sample the graph nodes), even without any spectral priors (e.g., sparsity, bandlimited with known support). This is a new and different perspective as compared to subsampling for graph signal reconstruction [14, 15, 16, 17], which imposes some spectral prior that enables graph signal reconstruction. The proposed concept basically generalizes the field of compressive covariance sensing [18, 19, 20] to the graph setting.

The aim of this paper is to reconstruct second-order statistics of stationary graph signals from observations available at a few nodes using simple reconstruction methods such as least squares. The contributions are summarized as the following main results:

• Non-parametric approach: Without any spectral priors, second-order statistics of length-$N$ stationary graph signals can be recovered using least squares from a reduced subset of $\mathcal{O}(\sqrt{N})$ observations, i.e., by observing $\mathcal{O}(\sqrt{N})$ graph nodes. In this case, the processing is done in the graph spectral domain.

• Circulant graphs: As a special case, when the graphs are circulant, the identifiability results are elegant. That is, the subsets of nodes resulting in the best compression rates are given by the so-called minimal sparse rulers. This is reminiscent of compressive covariance sensing [20] for data that reside on a regular support such as time series, which is a specific instance of a circulant graph.

• Parametric approach: It is also possible to model the graph power spectrum using a small number of parameters, e.g., the graph signals may be modeled by moving average or autoregressive graph filters. The reconstruction of the second-order statistics of the graph signal then boils down to the estimation of moving average or autoregressive coefficients. Such a parameterization allows for a higher compression. When the graph power spectrum is modeled using a moving average graph filter, the second-order statistics can be recovered using least squares from $\mathcal{O}(\sqrt{Q})$ observations, where $Q=\min\{2L-1,N\}$ with $L$ being the number of moving average filter coefficients. When the graph power spectrum is modeled using an autoregressive graph filter, $P$ autoregressive filter coefficients can be recovered using linear least squares by observing $\mathcal{O}(P)$ nodes.

• Subsampler design: The proposed samplers are deterministic and they perform node subsampling. Subsampler design, therefore, becomes a discrete combinatorial optimization problem. For the spectral domain and moving average cases, the subsampler can be designed using a near-optimal greedy algorithm. However, for the autoregressive approach, the sampler design also depends on (unobserved) data, and thus a mean squared error optimal design is not possible. This is due to the fact that we restrict ourselves to a low-complexity linear estimator for the autoregressive filter coefficients. Nevertheless, we present a suboptimal technique to design a subsampler for the autoregressive case as well.

I-B Outline and notation

The remainder of the paper is organised as follows. The preliminary concepts of graph signal processing are discussed in Section II. The proposed least squares based reconstruction of the second-order statistics from the subsampled observations is discussed in Section III. Connections between compressive covariance sensing for time series and sensing data residing on circulant graphs are discussed in Section IV. In Section V, the graph power spectrum is represented with a small number of parameters under moving average and autoregressive models, and these parameters are then reconstructed using least squares from subsampled observations. In Section VI, we discuss the validity of the results provided in this paper for finite data records. Under the assumption that the data follows a Gaussian distribution, the maximum likelihood estimator and the related Cramér-Rao bound are also derived. In Section VII, the design of sparse sampling matrices based on low-complexity greedy algorithms is discussed. A few examples to illustrate the proposed framework are provided in Section VIII. Finally, the paper concludes with Section IX.

The notation used in this paper is described as follows. Upper (lower) boldface letters are used for matrices (column vectors). Overbar $\bar{(\cdot)}$ denotes complex conjugation, $(\cdot)^{T}$ denotes the transpose, and $(\cdot)^{H}$ denotes the complex conjugate (Hermitian) transpose. $(\cdot)^{-T}$ is a shorthand notation for $\left((\cdot)^{-1}\right)^{T}$ . $\mathrm{diag}[\cdot]$ refers to a diagonal matrix with its argument on the main diagonal. ${\rm diag_{r}}[\cdot]$ represents a diagonal matrix with the argument on its diagonal, but with the all-zero rows removed. ${\boldsymbol{1}}$ $({\boldsymbol{0}})$ denotes the vector of all ones (zeros). ${\boldsymbol{I}}$ is an identity matrix. $\mathbb{E}\{\cdot\}$ denotes the expectation operation. The $\ell_{0}$ -(quasi) norm of ${\boldsymbol{w}}=[w_{1},w_{2},\ldots,w_{N}]^{T}$ refers to the number of non-zero entries in ${\boldsymbol{w}}$ , i.e., ${\|{\boldsymbol{w}}\|}_{0}:=|\{n\,:\,w_{n}\neq 0\}|$ . The $\ell_{1}$ -norm of ${\boldsymbol{w}}$ is denoted by ${\|{\boldsymbol{w}}\|}_{1}=\sum_{n=1}^{N}|w_{n}|$ . The notation $\thicksim$ is read as “is distributed according to”. Unless otherwise noted, logarithms are natural. ${\rm tr}\{\cdot\}$ is the matrix trace operator. ${\rm det}\{\cdot\}$ is the matrix determinant. ${\rm rank}(\cdot)$ denotes the rank of a matrix. $\lambda_{\rm min}\{{\boldsymbol{A}}\}$ ( $\lambda_{\rm max}\{{\boldsymbol{A}}\}$ ) denotes the minimum (maximum) eigenvalue of a symmetric matrix ${\boldsymbol{A}}$ . ${\boldsymbol{A}}\succeq{\boldsymbol{B}}$ means that ${\boldsymbol{A}}-{\boldsymbol{B}}$ is a positive semidefinite matrix. $\mathbb{S}^{N}$ ( $\mathbb{S}^{N}_{+}$ ) denotes the set of symmetric (symmetric positive semi-definite) matrices of size $N\times N$ . $|\mathcal{U}|$ denotes the cardinality of the set $\mathcal{U}$ .
$\otimes$ denotes the Kronecker product, $\circ$ denotes the Khatri-Rao or columnwise Kronecker product, and ${\rm vec}(\cdot)$ refers to the matrix vectorization operator. For a full column rank tall matrix ${\boldsymbol{A}}$ , the left inverse is given by ${\boldsymbol{A}}^{\dagger}=({\boldsymbol{A}}^{H}{\boldsymbol{A}})^{-1}{\boldsymbol{A}}^{H}$ . The column span of ${\boldsymbol{A}}$ and row null space of ${\boldsymbol{A}}$ are denoted by ${\rm ran}({\boldsymbol{A}})$ and ${\rm null}({\boldsymbol{A}})$ , respectively. Properties that are frequently used in this paper:

• ${\rm vec}({\boldsymbol{A}}{\boldsymbol{B}}{\boldsymbol{C}})=({\boldsymbol{C}}^{T}\otimes{\boldsymbol{A}}){\rm vec}({\boldsymbol{B}});$

• ${\rm vec}({\boldsymbol{A}}{\rm diag}[{\boldsymbol{b}}]{\boldsymbol{C}})=({\boldsymbol{C}}^{T}\circ{\boldsymbol{A}}){\boldsymbol{b}}.$
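The two vectorization properties can be checked numerically. The following is a minimal NumPy sketch, not part of the paper: the helper names `vec` and `khatri_rao` and the random test matrices are assumptions made for illustration, and `vec` is taken to be column-stacking.

```python
import numpy as np

rng = np.random.default_rng(0)

def vec(M):
    # Column-stacking vectorization (column-major order), assumed convention for vec(.)
    return M.reshape(-1, order="F")

def khatri_rao(A, B):
    # Columnwise Kronecker product: column k equals kron(A[:, k], B[:, k])
    return np.einsum("ik,jk->ijk", A, B).reshape(A.shape[0] * B.shape[0], A.shape[1])

# Arbitrary illustrative matrices with compatible sizes
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 2))

# Property 1: vec(ABC) = (C^T kron A) vec(B)
lhs1 = vec(A @ B @ C)
rhs1 = np.kron(C.T, A) @ vec(B)

# Property 2: vec(A diag[b] C2) = (C2^T khatri-rao A) b
b = rng.standard_normal(4)
C2 = rng.standard_normal((4, 2))
lhs2 = vec(A @ np.diag(b) @ C2)
rhs2 = khatri_rao(C2.T, A) @ b

print(np.allclose(lhs1, rhs1), np.allclose(lhs2, rhs2))
```

The second property is the one that turns a covariance of the form ${\boldsymbol{U}}{\rm diag}[{\boldsymbol{p}}]{\boldsymbol{U}}^{H}$ into a linear system in ${\boldsymbol{p}}$.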

II Preliminaries

In this section, we introduce some preliminary concepts related to deterministic and stochastic signals defined on graphs.

II-A Graph signals and filtering

Consider a dataset with $N$ elements denoted as ${\boldsymbol{x}}\in\mathbb{C}^{N}$ , which live on an irregular structure represented by an undirected graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ , where the vertex set $\mathcal{V}=\{v_{1},\cdots,v_{N}\}$ denotes the set of nodes, and the edge set $\mathcal{E}$ reveals any connection between the nodes, i.e., $(i,j)\in\mathcal{E}$ means that node $i$ is connected to node $j$ . The $n$ th entry of ${\boldsymbol{x}}$ , i.e., $x_{n}$ , is indexed by node $v_{n}$ of the graph $\mathcal{G}$ . Therefore, we refer to the dataset ${\boldsymbol{x}}$ as a length- $N$ graph signal.

Let us introduce an operator ${\boldsymbol{S}}\in\mathbb{C}^{N\times N}$ , where the $(i,j)$ th entry of ${\boldsymbol{S}}$ , denoted by $s_{i,j}$ , can be nonzero only if $i=j$ or $(i,j)\in\mathcal{E}$ , and is zero otherwise. The pattern of ${\boldsymbol{S}}$ captures the local structure of the graph. More specifically, for a graph signal ${\boldsymbol{x}}$ , the signal ${\boldsymbol{S}}{\boldsymbol{x}}$ denotes the unit shifted version of ${\boldsymbol{x}}$ . Hence ${{\boldsymbol{S}}}$ is referred to as the graph-shift operator [8]. Different choices for ${\boldsymbol{S}}$ include the graph Laplacian ${\boldsymbol{L}}$ [6], the adjacency matrix ${\boldsymbol{A}}$ [8], or their respective variants. For undirected graphs, ${\boldsymbol{S}}$ is symmetric (more generally, Hermitian), and thus it admits the following eigenvalue decomposition

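$${\boldsymbol{S}}={\boldsymbol{U}}{\boldsymbol{\Lambda}}{\boldsymbol{U}}^{H}=[{\boldsymbol{u}}_{1},\ldots,{\boldsymbol{u}}_{N}]\,{\rm diag}[\lambda_{1},\ldots,\lambda_{N}]\,[{\boldsymbol{u}}_{1},\ldots,{\boldsymbol{u}}_{N}]^{H},\qquad(1)$$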

where the eigenvectors $\{{\boldsymbol{u}}_{n}\}_{n=1}^{N}$ and the eigenvalues $\{\lambda_{n}\}_{n=1}^{N}$ of ${\boldsymbol{S}}$ provide the notion of frequency in the graph setting [6, 7]. Specifically, $\{{\boldsymbol{u}}_{n}\}_{n=1}^{N}$ forms an orthonormal Fourier-like basis for graph signals with the graph frequencies denoted by $\{\lambda_{n}\}_{n=1}^{N}$ . Hence, the graph Fourier transform of a graph signal, ${\boldsymbol{x}}_{f}=[x_{f,1},x_{f,2},\ldots,x_{f,N}]^{T}\in\mathbb{C}^{N}$ , is given by

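$${\boldsymbol{x}}_{f}:={\boldsymbol{U}}^{H}{\boldsymbol{x}}\;\Leftrightarrow\;{\boldsymbol{x}}=:{\boldsymbol{U}}{\boldsymbol{x}}_{f}.\qquad(2)$$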

The frequency content of graph signals can be modified using linear shift-invariant graph filters [8, 6]. Let us call the system ${\boldsymbol{H}}\in\mathbb{C}^{N\times N}$ a graph filter. If the eigenvalues of ${\boldsymbol{S}}$ are distinct, a shift-invariant graph filter, which satisfies ${\boldsymbol{H}}({\boldsymbol{S}}{\boldsymbol{x}})={\boldsymbol{S}}({\boldsymbol{H}}{\boldsymbol{x}})$ , can be expressed as a polynomial in ${\boldsymbol{S}}$ as [8]

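$${\boldsymbol{H}}=\sum_{l=0}^{L-1}h_{l}{\boldsymbol{S}}^{l}={\boldsymbol{U}}\left[h_{0}{\boldsymbol{I}}+h_{1}{\boldsymbol{\Lambda}}+\cdots+h_{L-1}{\boldsymbol{\Lambda}}^{L-1}\right]{\boldsymbol{U}}^{H},\qquad(3)$$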

where the filter ${\boldsymbol{H}}$ is of degree $L-1$ with filter coefficients ${\boldsymbol{h}}=[h_{0},h_{1},\ldots,h_{L-1}]^{T}\in\mathbb{C}^{L}$ , and $L\leq N$ as $N$ is the degree of the minimal polynomial (equal to the characteristic polynomial) of ${\boldsymbol{S}}$ . The diagonal matrix

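$${\boldsymbol{H}}_{f}=\sum_{l=0}^{L-1}h_{l}{\boldsymbol{\Lambda}}^{l}={\rm diag}[{\boldsymbol{V}}_{L}{\boldsymbol{h}}]={\rm diag}[h_{f,1},\ldots,h_{f,N}]\qquad(4)$$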

can be viewed as the frequency response of the graph filter. Here, ${\boldsymbol{V}}_{L}$ is an $N\times L$ Vandermonde matrix with the $(i,j)$ th entry as $\lambda_{i}^{j-1}$ .
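The relations between the shift operator, the graph Fourier transform, and a polynomial graph filter can be illustrated with a small NumPy sketch. Everything specific here is an assumption for illustration: the random weighted undirected graph used as ${\boldsymbol{S}}$, the filter taps in `h`, and the variable names.

```python
import numpy as np

rng = np.random.default_rng(1)
N, L = 8, 3

# Hypothetical shift operator: symmetric weighted adjacency matrix of a random undirected graph
W = np.triu(rng.random((N, N)), k=1)
S = W + W.T

# Eigenvalue decomposition S = U diag(lam) U^H (eigh applies because S is symmetric)
lam, U = np.linalg.eigh(S)

# Polynomial graph filter H = sum_l h_l S^l with illustrative taps
h = np.array([1.0, 0.5, 0.25])
H = sum(h[l] * np.linalg.matrix_power(S, l) for l in range(L))

# Frequency response h_f = V_L h, where V_L is the N x L Vandermonde matrix with entries lam_i^(j-1)
V_L = np.vander(lam, N=L, increasing=True)
h_f = V_L @ h

# Graph Fourier transform of a test signal and of its filtered version
x = rng.standard_normal(N)
x_f = U.conj().T @ x
y_f = U.conj().T @ (H @ x)

print(np.allclose(H, U @ np.diag(h_f) @ U.conj().T))  # filter is diagonal in the graph Fourier basis
print(np.allclose(H @ S, S @ H))                      # shift invariance
print(np.allclose(y_f, h_f * x_f))                    # filtering is pointwise in the frequency domain
```

The last check makes the term "frequency response" concrete: in the graph Fourier domain the filter multiplies the $n$th spectral coefficient by $h_{f,n}$.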

II-B Stationary graph signals

Let ${\boldsymbol{x}}=[x_{1},x_{2},\cdots,x_{N}]^{T}\in\mathbb{C}^{N}$ be a stochastic signal defined on the vertices of the graph $\mathcal{G}$ with expected value ${\boldsymbol{m}}_{\boldsymbol{x}}=\mathbb{E}\{{\boldsymbol{x}}\}$ and covariance matrix ${\boldsymbol{R}}_{\boldsymbol{x}}=\mathbb{E}\{({\boldsymbol{x}}-{\boldsymbol{m}}_{\boldsymbol{x}})({\boldsymbol{x}}-{\boldsymbol{m}}_{\boldsymbol{x}})^{H}\}$ . Efforts to generalize some of the concepts of statistical time invariance or stationarity of signals defined over regular structures to random graph signals have been made in [9, 10, 11, 12]. For the sake of completeness, we will summarize the definitions from [9, 10, 11, 12] as follows.

Definition 1 (Second-order stationarity).

A random graph signal ${\boldsymbol{x}}$ is second-order stationary, if and only if, the following properties hold:

1. The mean of the graph signal is collinear to an eigenvector of ${\boldsymbol{S}}$ corresponding to the smallest eigenvalue, i.e., ${\boldsymbol{m}}_{\boldsymbol{x}}=m_{\boldsymbol{x}}{\boldsymbol{u}}_{1}$ .

2. Matrices ${\boldsymbol{S}}$ and ${\boldsymbol{R}}_{\boldsymbol{x}}$ can be simultaneously diagonalized.

Since we assume that the eigenvalues of ${\boldsymbol{S}}$ are distinct and ${\boldsymbol{U}}$ forms an orthonormal basis, property 2 in the above definition essentially means the statistical orthogonality of spectral components, i.e., $\mathbb{E}\{x_{f,i}{\bar{x}_{f,j}}\}=0$ for $i\neq j$ [12].

For simplicity, from now on we will focus on graph signals with zero mean, where we assume that $m_{\boldsymbol{x}}$ is either known or $m_{\boldsymbol{x}}$ can be set to zero by preprocessing the data as discussed in Section VIII. We can generate zero-mean second-order stationary graph signals by graph filtering zero-mean white noise. Let ${\boldsymbol{n}}=[n_{1},n_{2},\ldots,n_{N}]^{T}\in\mathbb{C}^{N}$ be zero-mean unit-variance noise with covariance matrix ${\boldsymbol{R}}_{\boldsymbol{n}}={\boldsymbol{I}}$ . Then, a zero-mean second-order stationary graph signal ${\boldsymbol{x}}$ can be modeled as ${\boldsymbol{x}}={\boldsymbol{H}}{\boldsymbol{n}},$ where ${\boldsymbol{H}}$ can be any valid graph filter. The filtered signal will have zero mean and covariance matrix ${\boldsymbol{R}}_{\boldsymbol{x}}=\mathbb{E}\{({\boldsymbol{H}}{\boldsymbol{n}})({\boldsymbol{H}}{\boldsymbol{n}})^{H}\}$ given by

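$${\boldsymbol{R}}_{\boldsymbol{x}}={\boldsymbol{U}}\,{\rm diag}\left[|h_{f,1}|^{2},\ldots,|h_{f,N}|^{2}\right]{\boldsymbol{U}}^{H}={\boldsymbol{U}}\,{\rm diag}[{\boldsymbol{p}}]\,{\boldsymbol{U}}^{H},\qquad(5)$$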

where $h_{f,n}=h_{0}+h_{1}\lambda_{n}+\cdots+h_{L-1}\lambda_{n}^{L-1}$ is defined in (4). This conforms to the second property listed in Definition 1. More generally, graph filtering any second-order stationary graph signal also results in a second-order stationary graph signal (it is easy to verify this using property $2$ in Definition 1). The nonnegative vector ${\boldsymbol{p}}$ in (5) is referred to as the graph power spectral density or graph power spectrum. We now formally introduce the graph power spectrum through the following definition.

Definition 2 (Graph power spectrum).

The graph power spectral density of a second-order stationary graph signal is a real-valued nonnegative length- $N$ vector ${\boldsymbol{p}}=[p_{1},p_{2},\ldots,p_{N}]^{T}\in\mathbb{R}_{+}^{N}$ with entries given by

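$$p_{n}={\boldsymbol{u}}_{n}^{H}{\boldsymbol{R}}_{\boldsymbol{x}}{\boldsymbol{u}}_{n},\quad n=1,2,\ldots,N.\qquad(6)$$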

Alternatively, $p_{n}=|h_{f,n}|^{2}\geq 0$ , for $n=1,2,\ldots,N$ , where $h_{f,n}=h_{0}+h_{1}\lambda_{n}+\cdots+h_{L-1}\lambda_{n}^{L-1}$ is defined in (4).

Second-order stationarity is preserved by linear graph filtering. This means that stationary graph signals with a prescribed graph power spectrum can be generated by filtering white noise, where the graph power spectrum of the filtered signal is reshaped according to the frequency response of the graph filter [9, 10, 11]. As a result, the graph power spectrum reveals critical information about the second-order stationary graph signal, and thus estimating the graph power spectrum or recovering the second-order statistics of a graph signal is useful in many applications.
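As a rough illustration of how a stationary graph signal arises from filtered white noise, the sketch below generates snapshots ${\boldsymbol{x}}={\boldsymbol{H}}{\boldsymbol{n}}$ and compares the power spectrum obtained from the covariance with $|h_{f,n}|^{2}$. The random graph, the filter taps, the Gaussian noise, and the number of snapshots `Ns` are assumptions chosen for the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
N, Ns = 6, 50_000

# Hypothetical symmetric shift operator and its eigendecomposition
W = np.triu(rng.random((N, N)), k=1)
S = W + W.T
lam, U = np.linalg.eigh(S)

# Illustrative real filter taps, frequency response, and filter matrix H = U diag(h_f) U^T
h = np.array([1.0, -0.4, 0.1])
h_f = np.vander(lam, N=len(h), increasing=True) @ h
H = U @ np.diag(h_f) @ U.T

# True power spectrum and covariance of x = H n when the noise covariance is the identity
p_true = np.abs(h_f) ** 2
R_x = H @ H.T

# p_n = u_n^H R_x u_n evaluated for all n at once (U is real here)
p_from_cov = np.einsum("in,ij,jn->n", U, R_x, U)

# Sample-based estimate from Ns white Gaussian noise realizations
noise = rng.standard_normal((N, Ns))
X = H @ noise
R_hat = X @ X.T / Ns
p_hat = np.einsum("in,ij,jn->n", U, R_hat, U)

# The covariance is diagonalized by U, as required by Definition 1
off_diag = U.T @ R_x @ U - np.diag(p_true)

print(np.max(np.abs(p_from_cov - p_true)), np.max(np.abs(off_diag)))
print(np.max(np.abs(p_hat - p_true) / p_true))
```

With a finite number of snapshots the estimate `p_hat` only approximates `p_true`; the exact relation holds for the true covariance.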

We end this section by summarizing the list of assumptions made in this paper.

1. The shift operator ${\boldsymbol{S}}$ is known.

2. The orthonormal basis ${\boldsymbol{U}}$ and the distinct eigenvalues $\{\lambda_{n}\}_{n=1}^{N}$ of ${\boldsymbol{S}}$ are known a priori.

III Non-parametric Spectral Domain Approach

The size of the datasets often inhibits a direct computation of the second-order statistics, e.g., by observing all the $N$ nodes and using (6) to compute the graph power spectrum. This would computationally cost $\mathcal{O}(N^{3})$ . As such, compression or data reduction is preferred especially for large-scale data in the graph setting [7]. In the context of graph signal processing, most works consider subsampling the graph signal ${\boldsymbol{x}}$ assuming some spectral prior to reconstruct it [14, 15, 16, 17]. This approach is, in principle, also possible for recovering the second-order statistics of ${\boldsymbol{x}}$ . However, when the goal is to reconstruct the second-order statistics of ${\boldsymbol{x}}$ (and not ${\boldsymbol{x}}$ itself), it is computationally advantageous, and allows for a stronger compression, when we avoid the intermediate step of reconstructing and storing ${\boldsymbol{x}}$ . In this paper, we will therefore focus on recovering graph second-order statistics directly from subsampled graph signals. We refer to this problem as graph covariance subsampling.

The extension of compressive covariance sensing [18, 19, 20] to graph covariance subsampling is non-trivial. This is because for second-order (or wide-sense) stationary signals with a regular support, the covariance matrix has a clear structure (e.g., Toeplitz, circulant) that enables an elegant subsampler design, but for second-order stationary graph signals residing on arbitrary graphs, the covariance matrix does not admit any clear structure that can be easily exploited, in general.

Consider the problem of estimating the graph power spectrum of the second-order stationary graph signal ${\boldsymbol{x}}\in\mathbb{C}^{N}$ from a set of $K\ll N$ linear observations stacked in the vector ${\boldsymbol{y}}\in\mathbb{C}^{K}$ , given by

$${\boldsymbol{y}}={\boldsymbol{\Phi}}{\boldsymbol{x}},\qquad(7)$$

where ${\boldsymbol{\Phi}}$ is a known $K\times N$ selection matrix with Boolean entries, i.e., ${\boldsymbol{\Phi}}\in\{0,1\}^{K\times N}$ (we will discuss the subsampler design in Section VII) and where several realizations of ${\boldsymbol{y}}$ may be available. The matrix ${\boldsymbol{\Phi}}$ is referred to as the subsampling or sparse sampling matrix, where the compression is achieved by setting $K\ll N$ . For applications where graph nodes correspond to sensing devices (e.g., weather stations in climatology, electroencephalography (EEG) probes in brain networks), such a sparse sampling scheme results in a significant reduction in the hardware, storage, and communications costs, in addition to the reduction in the processing costs.

The covariance matrices ${\boldsymbol{R}}_{\boldsymbol{x}}=\mathbb{E}\{{\boldsymbol{x}}{\boldsymbol{x}}^{H}\}\in\mathbb{C}^{N\times N}$ and ${\boldsymbol{R}}_{\boldsymbol{y}}=\mathbb{E}\{{\boldsymbol{y}}{\boldsymbol{y}}^{H}\}\in\mathbb{C}^{K\times K}$ contain the second-order statistics of ${\boldsymbol{x}}$ and ${\boldsymbol{y}}$ , respectively. In practice, typically, multiple snapshots, say $N_{s}$ snapshots, are observed to form a sample covariance matrix. Forming the sample covariance matrix from $N_{s}$ snapshots of ${\boldsymbol{x}}$ costs $\mathcal{O}(N^{2}N_{s})$ , while forming the sample covariance matrix from $N_{s}$ snapshots of ${\boldsymbol{y}}$ only costs $\mathcal{O}(K^{2}N_{s})$ . We now state the problem of interest as follows.

Problem.

(Recovering second-order statistics) For a known undirected graph $\mathcal{G}$ , given a number of realizations, say $N_{s}$ , of the subsampled length-$K$ graph signal ${\boldsymbol{y}}$ or the subsampled covariance matrix ${\boldsymbol{R}}_{\boldsymbol{y}}$ , recover the graph power spectrum ${\boldsymbol{p}}$ and thus the covariance matrix ${\boldsymbol{R}}_{\boldsymbol{x}}$ .

Let us decompose the graph signal ${\boldsymbol{x}}$ in terms of its graph Fourier transform coefficients as [cf. (2)]

$${\boldsymbol{x}}=\sum_{i=1}^{N}x_{f,i}{\boldsymbol{u}}_{i}.$$

This allows us to represent the covariance matrix ${\boldsymbol{R}}_{\boldsymbol{x}}=\mathbb{E}\{{\boldsymbol{x}}{\boldsymbol{x}}^{H}\}$ in the graph Fourier domain using the graph power spectrum ${\boldsymbol{p}}$ as

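$${\boldsymbol{R}}_{\boldsymbol{x}}=\sum_{i=1}^{N}\mathbb{E}\{|x_{f,i}|^{2}\}{\boldsymbol{u}}_{i}{\boldsymbol{u}}_{i}^{H}=\sum_{i=1}^{N}p_{i}{\boldsymbol{u}}_{i}{\boldsymbol{u}}_{i}^{H}=\sum_{i=1}^{N}p_{i}{\boldsymbol{Q}}_{i},\qquad(8)$$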

where we use the fact that for $i\neq j$ we have $\mathbb{E}\{x_{f,i}\bar{x}_{f,j}\}=0$ and ${\boldsymbol{Q}}_{i}={\boldsymbol{u}}_{i}{\boldsymbol{u}}_{i}^{H}$ is a size- $N$ rank-one matrix. Here, we expand ${\boldsymbol{R}}_{\boldsymbol{x}}$ using a set of $N$ Hermitian matrices $\{{\boldsymbol{Q}}_{1},{\boldsymbol{Q}}_{2},\ldots,{\boldsymbol{Q}}_{N}\}$ as a basis. Vectorizing ${\boldsymbol{R}}_{x}$ in (8) results in

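$${\boldsymbol{r}}_{\boldsymbol{x}}={\rm vec}({\boldsymbol{R}}_{\boldsymbol{x}})=\sum_{i=1}^{N}p_{i}\,{\rm vec}({\boldsymbol{Q}}_{i})={\boldsymbol{\Psi}}_{\rm s}{\boldsymbol{p}},$$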

where we have stacked ${\rm vec}({\boldsymbol{Q}}_{i})=\bar{{\boldsymbol{u}}}_{i}\otimes{\boldsymbol{u}}_{i}$ to form the $N^{2}\times N$ matrix ${\boldsymbol{\Psi}}_{\rm s}$ as

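$${\boldsymbol{\Psi}}_{\rm s}=[\bar{{\boldsymbol{u}}}_{1}\otimes{\boldsymbol{u}}_{1},\ldots,\bar{{\boldsymbol{u}}}_{N}\otimes{\boldsymbol{u}}_{N}]=\bar{{\boldsymbol{U}}}\circ{\boldsymbol{U}}.$$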

The subscript “ ${\rm s}$ ” in the matrix ${\boldsymbol{\Psi}}_{\rm s}$ , which is constructed using the graph Fourier basis vectors, stands for spectral domain.
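The construction of ${\boldsymbol{\Psi}}_{\rm s}$ and the relation ${\rm vec}({\boldsymbol{R}}_{\boldsymbol{x}})={\boldsymbol{\Psi}}_{\rm s}{\boldsymbol{p}}$ can be sketched as follows. The complex Hermitian shift operator, the random spectrum `p`, and the helper `khatri_rao` are assumptions for illustration; a Hermitian (rather than real symmetric) operator is used only to exercise the complex conjugate in $\bar{{\boldsymbol{U}}}\circ{\boldsymbol{U}}$.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 5

def khatri_rao(A, B):
    # Columnwise Kronecker product: column k equals kron(A[:, k], B[:, k])
    return np.einsum("ik,jk->ijk", A, B).reshape(A.shape[0] * B.shape[0], A.shape[1])

# Hypothetical Hermitian shift operator, so that U is complex in general
W = np.triu(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)), k=1)
S = W + W.conj().T
lam, U = np.linalg.eigh(S)

# Psi_s = conj(U) khatri-rao U, of size N^2 x N
Psi_s = khatri_rao(U.conj(), U)

# Covariance with an illustrative nonnegative power spectrum: R_x = U diag(p) U^H
p = rng.random(N)
R_x = (U * p) @ U.conj().T
r_x = R_x.reshape(-1, order="F")  # column-stacking vec

print(np.allclose(r_x, Psi_s @ p))
print(np.linalg.matrix_rank(Psi_s))
```

Because ${\boldsymbol{U}}$ is unitary, the columns $\bar{{\boldsymbol{u}}}_{i}\otimes{\boldsymbol{u}}_{i}$ are themselves orthonormal, which is one way to see why ${\boldsymbol{\Psi}}_{\rm s}$ has full column rank before any subsampling is applied.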

Using the compression scheme described in (7), the covariance matrix ${\boldsymbol{R}}_{\boldsymbol{y}}\in\mathbb{C}^{K\times K}$ of the subsampled graph signal ${\boldsymbol{y}}$ can be related to ${\boldsymbol{R}}_{\boldsymbol{x}}$ as

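$${\boldsymbol{R}}_{\boldsymbol{y}}={\boldsymbol{\Phi}}{\boldsymbol{R}}_{\boldsymbol{x}}{\boldsymbol{\Phi}}^{T}=\sum_{i=1}^{N}p_{i}{\boldsymbol{\Phi}}{\boldsymbol{Q}}_{i}{\boldsymbol{\Phi}}^{T}.$$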

This means that the expansion coefficients of ${\boldsymbol{R}}_{\boldsymbol{y}}$ with respect to the set $\{{\boldsymbol{\Phi}}{\boldsymbol{Q}}_{1}{\boldsymbol{\Phi}}^{T},{\boldsymbol{\Phi}}{\boldsymbol{Q}}_{2}{\boldsymbol{\Phi}}^{T},\cdots,{\boldsymbol{\Phi}}{\boldsymbol{Q}}_{N}{\boldsymbol{\Phi}}^{T}\}$ are the same as those of ${\boldsymbol{R}}_{\boldsymbol{x}}$ with respect to the set $\{{\boldsymbol{Q}}_{1},{\boldsymbol{Q}}_{2},\cdots,{\boldsymbol{Q}}_{N}\}$ , and they are preserved under linear compression. It is not yet clear though whether these expansion coefficients, which basically represent the power spectrum, can be uniquely recovered from ${\boldsymbol{R}}_{\boldsymbol{y}}$ .

Vectorizing ${\boldsymbol{R}}_{\boldsymbol{y}}$ as

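$${\boldsymbol{r}}_{\boldsymbol{y}}={\rm vec}({\boldsymbol{R}}_{\boldsymbol{y}})=({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}})\,{\rm vec}({\boldsymbol{R}}_{\boldsymbol{x}})\in\mathbb{C}^{K^{2}}$$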

we obtain

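$${\boldsymbol{r}}_{\boldsymbol{y}}=({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s}{\boldsymbol{p}}.\qquad(10)$$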

This linear system with $N$ unknowns has a unique solution if $({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s}$ has full column rank, which requires $K^{2}\geq N$ . Assuming that this is the case, the graph power spectrum (thus the second-order statistics of ${\boldsymbol{x}}$ ) can be estimated in closed form via least squares:

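$${\boldsymbol{p}}=[({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s}]^{\dagger}{\boldsymbol{r}}_{\boldsymbol{y}}.\qquad(11)$$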

Computing this least squares solution costs $\mathcal{O}(K^{2}N^{2})$ [21]. Although, for the non-parametric approach, the cost of computing (11) is of the same order as that of the uncompressed case, the cost reduction will be prominent for the problems discussed later in Section V. Further, to compute (11), we have assumed that the true covariance matrix ${\boldsymbol{R}}_{\boldsymbol{y}}$ is available; the practical scenario with finite data records is discussed in Section VI.
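The least squares recovery of the power spectrum from the subsampled covariance can be sketched end to end as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the graph is a random weighted graph, the true covariance ${\boldsymbol{R}}_{\boldsymbol{y}}$ is used instead of a sample estimate, and the observed nodes are drawn uniformly at random rather than by the greedy design of Section VII. The sketch forms $({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s}$ as $({\boldsymbol{\Phi}}\bar{{\boldsymbol{U}}})\circ({\boldsymbol{\Phi}}{\boldsymbol{U}})$, which avoids building the $K^{2}\times N^{2}$ Kronecker product.

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 20, 8  # K^2 = 64 >= N, so the system matrix is tall

def khatri_rao(A, B):
    # Columnwise Kronecker product: column k equals kron(A[:, k], B[:, k])
    return np.einsum("ik,jk->ijk", A, B).reshape(A.shape[0] * B.shape[0], A.shape[1])

# Hypothetical symmetric shift operator with known eigenvectors
W = np.triu(rng.random((N, N)), k=1)
S = W + W.T
lam, U = np.linalg.eigh(S)

# Illustrative true power spectrum and covariance R_x = U diag(p) U^H
p_true = rng.random(N) + 0.1
R_x = (U * p_true) @ U.conj().T

# Random node subset (an assumption; not the greedy sampler of the paper)
nodes = np.sort(rng.choice(N, size=K, replace=False))
Phi = np.eye(N)[nodes]          # K x N Boolean selection matrix

# Subsampled covariance and its vectorization
R_y = Phi @ R_x @ Phi.T
r_y = R_y.reshape(-1, order="F")

# System matrix (Phi kron Phi) Psi_s computed as (Phi conj(U)) khatri-rao (Phi U)
PU = Phi @ U
A = khatri_rao(PU.conj(), PU)   # K^2 x N

# Least squares solution, equivalent to applying the left inverse when A has full column rank
p_hat, *_ = np.linalg.lstsq(A, r_y, rcond=None)

print(np.linalg.matrix_rank(A) == N)
print(np.max(np.abs(p_hat - p_true)))
```

Whether a given node subset works depends on the rank of `A`; a subset for which `A` loses column rank is not a valid subsampler and would make the recovery non-unique.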

Definition 3.

A wide matrix ${\boldsymbol{\Phi}}$ is a valid graph covariance subsampler if it yields a full column rank matrix $({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s}$ .

We now derive the conditions under which ${\boldsymbol{\Phi}}$ is a valid graph covariance subsampler. To do this, we first introduce two important lemmas.

Lemma 1.

Since the matrix ${\boldsymbol{U}}\in\mathbb{C}^{N\times N}$ is full rank, the matrix ${\boldsymbol{\Psi}}_{\rm s}=\bar{{\boldsymbol{U}}}\circ{\boldsymbol{U}}$ of size ${N^{2}\times N}$ has full column rank.

Proof.

See Appendix A. ∎

Lemma 2.

If the matrix ${\boldsymbol{\Phi}}\in\mathbb{R}^{K\times N}$ has full row rank, then the matrix ${\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}$ of size $K^{2}\times N^{2}$ also has full row rank.

Proof.

Follows from the singular value decomposition of ${\boldsymbol{\Phi}}$ and the property $({\boldsymbol{A}}\otimes{\boldsymbol{B}})({\boldsymbol{C}}\otimes{\boldsymbol{D}})=({\boldsymbol{A}}{\boldsymbol{C}}\otimes{\boldsymbol{B}}{\boldsymbol{D}})$ . ∎
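Both lemmas are easy to check numerically on a small instance. The sketch below assumes a hypothetical random symmetric shift operator (so that ${\boldsymbol{U}}$ is orthonormal) and a selection matrix as ${\boldsymbol{\Phi}}$ ; it is an illustration, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 8, 3

# Hypothetical symmetric shift operator and its orthonormal eigenvectors.
A = rng.standard_normal((N, N))
_, U = np.linalg.eigh((A + A.T) / 2)

# Lemma 1: Psi_s = conj(U) Khatri-Rao U has full column rank N.
Psi_s = np.stack([np.kron(U[:, n].conj(), U[:, n]) for n in range(N)], axis=1)
gram = Psi_s.conj().T @ Psi_s     # identity here, since <vec(u_m u_m^H), vec(u_n u_n^H)> = |u_m^H u_n|^2
rank_psi = np.linalg.matrix_rank(Psi_s)

# Lemma 2: a full-row-rank Phi (K x N) gives a full-row-rank Phi kron Phi (K^2 x N^2).
Phi = np.eye(N)[[0, 3, 5]]
rank_kron = np.linalg.matrix_rank(np.kron(Phi, Phi))

print(rank_psi, rank_kron)
```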

Using the above two lemmas, we can provide the necessary and sufficient conditions under which the solution in (11) is unique.

Theorem 1.

A full row rank matrix ${\boldsymbol{\Phi}}\in\mathbb{R}^{K\times N}$ is a valid graph covariance subsampler if and only if the matrix $({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s}$ is tall, i.e., $K^{2}\geq N$ , and ${\rm null}({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}})\cap{\rm ran}({\boldsymbol{\Psi}}_{\rm s})=\{{\boldsymbol{0}}\}$ .

Proof.

See Appendix B. ∎

Although the linear system of equations (10) can be solved using (unconstrained) least squares, nonnegativity constraints or any spectral prior can be easily accounted for while solving (10) as summarized in the following remark.

Remark 1 (Spectral priors).

Any available prior information about the graph spectrum might allow for a higher compression with $K^{2}<N$ , or an improvement of the solution (11). Suppose we have some prior knowledge about the graph spectrum, i.e., ${\boldsymbol{p}}\in\mathcal{P}$ with $\mathcal{P}$ being the constraint set. For instance, suppose we know a priori that (a) the spectrum is bandlimited (e.g., lowpass) with known support such that $\mathcal{P}=\{{\boldsymbol{p}}\,|\,p_{n}=0,n\notin[N_{l},N_{u}]\}$ , where $[N_{l},N_{u}]$ denotes the support set, (b) the spectrum is sparse, but with unknown support such that $\mathcal{P}:=\{{\boldsymbol{p}}\,|\,\sum_{n=1}^{N}p_{n}=S\}$ , where $S$ denotes the sparsity order (here, we use the convex relaxation of the cardinality constraint), or (c) the power spectrum is nonnegative (by definition), for which $\mathcal{P}:=\{{\boldsymbol{p}}\,|\,p_{n}\geq 0,\forall n\}$ . With such spectral priors, the following constrained least squares problem may be solved

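$$\underset{{\boldsymbol{p}}\in\mathcal{P}}{\text{minimize}}\quad\|{\boldsymbol{r}}_{\boldsymbol{y}}-({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s}{\boldsymbol{p}}\|_{2}^{2}.$$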

In what follows, we will discuss and illustrate the connections with compressive covariance sensing [18, 20] for datasets that reside on regular structures (e.g., time series) using a circulant graph (e.g., a cycle graph). We will also see that designing a compression matrix is much more elegant for such circulant graphs.

IV Circulant Graphs

Discrete-time finite or periodic data can be represented using directed cycle graphs, where the direction of the edge represents the evolution of time from past to future. The edge directions may be ignored in some cases, e.g., when we are only interested in exploiting the regular Fourier transform, when we are dealing with the spatial domain, or when the underlying data is a time-reversible stochastic process that is invariant under the reversal of the time scale [22]. In such cases, the data can be represented using an undirected cycle graph, see Fig. 1.

Consider the adjacency matrix of this undirected cycle graph as its graph-shift operator, which will be an $N\times N$ symmetric circulant matrix. We know that a circulant matrix can be diagonalized with a discrete Fourier transform matrix. In other words, the graph Fourier transform matrix ${\boldsymbol{U}}$ related to this graph will consist of the orthonormal vectors

[TABLE]

with ${\omega}_{n}=\exp(-\imath 2\pi n/N)/\sqrt{N}$ and it will be a Vandermonde matrix (here, $\imath^{2}=-1$ ). In general, for circulant graphs with circulant graph-shift operators, an eigenvalue decomposition is not required to compute the graph Fourier transform matrix ${\boldsymbol{U}}$ or the model matrix ${\boldsymbol{\Psi}}_{\rm s}$ , which was introduced in Section III.

Let the set $\mathcal{K}\subset\mathcal{N}$ denote the indices of the selected graph nodes. Now, if we can smartly select the entries of ${{\boldsymbol{u}}}_{n}$ such that the related entries of $\bar{{\boldsymbol{u}}}_{n}\otimes{\boldsymbol{u}}_{n}$ contain all the distinct values $\{{\omega}_{n}^{m}\}$ for $m=0,\cdots,N-1$ , the matrix $({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s}$ will be a full-column rank Vandermonde matrix. In particular, this means that, for every $m=0,\ldots,N-1$ , there must exist at least one pair of elements $n_{i},n_{j}\in\mathcal{K}$ that satisfies $n_{i}-n_{j}=m$ , where the difference $n_{i}-n_{j}$ is due to the Kronecker product $\bar{{\boldsymbol{u}}}_{n}\otimes{\boldsymbol{u}}_{n}$ . Sets $\mathcal{K}$ having this property are called sparse rulers [20]. Furthermore, if the set contains a minimum number of elements, they are called minimal sparse rulers, which results in the best possible compression.

Let us illustrate this with an example for $N=10$ . In this case, the set $\mathcal{K}=\{0,1,4,7,9\}$ with $K=|\mathcal{K}|=5$ elements is a minimal sparse ruler. In other words, by choosing the subsampling matrix ${\boldsymbol{\Phi}}=\operatorname*{\mathrm{diag}}_{\rm r}[{\boldsymbol{w}}]$ with ${\boldsymbol{w}}=[1,1,0,0,1,0,0,1,0,1]^{T}$ we can ensure that the matrix $({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s}$ is full column rank, and hence the second-order statistics of ${\boldsymbol{x}}$ can be estimated using (11) by subsampling only $K=5$ nodes. Here, we achieve a compression rate of $K/N=0.5$ . Similarly, for $N=80$ , the minimal sparse ruler has $K=15$ elements, and this results in a compression rate of $K/N=0.1875$ (we will see an example related to $N=80$ and $K=15$ in Section VIII). Sparse rulers for other values of $N$ are tabulated in [23].
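The $N=10$ example can be verified directly. The sketch below checks the sparse ruler property of $\mathcal{K}=\{0,1,4,7,9\}$ and the rank of $({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s}$ , assuming the normalized DFT matrix is used as the graph Fourier basis of the cycle graph. The helper `is_sparse_ruler` is a hypothetical name introduced for illustration.

```python
import numpy as np

def is_sparse_ruler(marks, N):
    """True if every lag m = 0, ..., N-1 is a difference of two marks."""
    diffs = {a - b for a in marks for b in marks if a >= b}
    return all(m in diffs for m in range(N))

N = 10
marks = [0, 1, 4, 7, 9]

# Normalized DFT matrix as the graph Fourier basis of the cycle graph.
n = np.arange(N)
U = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)
Psi_s = np.stack([np.kron(U[:, k].conj(), U[:, k]) for k in range(N)], axis=1)

Phi = np.eye(N)[marks]
G = np.kron(Phi, Phi) @ Psi_s     # K^2 x N = 25 x 10

ruler_ok = is_sparse_ruler(marks, N)
rank_G = np.linalg.matrix_rank(G)
print(ruler_ok, rank_G)
```

By contrast, a set of five contiguous nodes such as $\{0,1,2,3,4\}$ only generates the lags $0,\ldots,4$ and fails the ruler test.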

Computing minimal sparse rulers is a combinatorial problem with no known closed-form solution. Nevertheless, subsamplers such as coprime [24] and nested sparse samplers [25], which can be computed using a closed-form expression for any $N$ , are also valid covariance subsamplers. However, they are not minimal sparse rulers and thus do not provide the best compression rate.

Subsampler design for reconstructing the second-order statistics of signals residing on a circulant graph is as elegant as that for reconstructing the second-order statistics of stationary time-series. The design of subsamplers for general graphs, however, is more challenging. This is the subject of Section VII.

V Parametric Models

In this section, we focus on a parametric representation of the graph power spectrum, in particular on moving average and autoregressive models. Typically, the model order (i.e., the number of parameters) is much smaller than the length of the graph signal, and since we now have to recover only these parameters, a much stronger compression can be achieved. This also means that we need to store or transmit only a few parameters, which can be used to generate realizations of second-order stationary graph signals (we will illustrate this with an example in Section VIII).

Parametric methods can be viewed as an alternative approach, where going to the graph spectral domain may be avoided, and instead, all the processing is done directly in the graph vertex domain.

V-A Graph moving average models

As before, we assume that the stationary graph signal ${\boldsymbol{x}}$ is generated by graph filtering zero-mean unit-variance white noise. Recall that in Section III, we did not impose any structure on the graph filter, but now we will assume that the graph filter has a finite impulse response with an all-zero form as in (3); see [10, 11].

Let us begin by writing the graph signal ${\boldsymbol{x}}$ as

[TABLE]

with covariance matrix

[TABLE]

where ${\boldsymbol{x}}$ is a moving average graph signal (G-MA) of order $L-1$ with G-MA coefficients $\{h_{k}\}_{k=0}^{L-1}$ , and the length- $L$ vector ${\boldsymbol{h}}$ collects the G-MA coefficients as ${\boldsymbol{h}}=[h_{0},h_{1},\ldots,h_{L-1}]^{T}$ . Moving average models are particularly useful to represent a smooth graph power spectrum [10, 11].

The expression (12) basically means that we can express the covariance matrix ${\boldsymbol{R}}_{\boldsymbol{x}}$ as a polynomial of the graph shift operator:

$${\boldsymbol{R}}_{\boldsymbol{x}}=\sum_{k=0}^{Q-1}b_{k}{\boldsymbol{S}}^{k}, \tag{13}$$

where $Q=\min\{2L-1,N\}$ unknown expansion coefficients $\{b_{k}\}_{k=0}^{Q-1}$ collected in the vector ${\boldsymbol{b}}=[b_{0},b_{1},\cdots,b_{Q-1}]^{T}\in\mathbb{R}^{Q}$ completely characterize the covariance matrix ${\boldsymbol{R}}_{\boldsymbol{x}}$ . In other words, we assume a linear parametrization of the covariance matrix ${\boldsymbol{R}}_{\boldsymbol{x}}$ using the set of $Q$ Hermitian matrices $\{{\boldsymbol{S}}^{0},{\boldsymbol{S}},\cdots,{\boldsymbol{S}}^{Q-1}\}$ as a basis.

The expansion coefficients ${\boldsymbol{b}}$ depend on the G-MA coefficients ${\boldsymbol{h}}$ . To see this, let us consider an example G-MA model with $L=3$ having coefficients ${\boldsymbol{h}}=[h_{0},h_{1},h_{2}]^{T}$ , for which (13) simplifies to

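$${\boldsymbol{R}}_{\boldsymbol{x}}=h_{0}^{2}{\boldsymbol{I}}+2h_{0}h_{1}{\boldsymbol{S}}+(h_{1}^{2}+2h_{2}h_{0}){\boldsymbol{S}}^{2}+2h_{2}h_{1}{\boldsymbol{S}}^{3}+h_{2}^{2}{\boldsymbol{S}}^{4}. \tag{14}$$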

This means that ${\boldsymbol{b}}({\boldsymbol{h}})$ will be of length $2L-1$ with entries ${\boldsymbol{b}}({\boldsymbol{h}})=[h_{0}^{2},2h_{0}h_{1},h_{1}^{2}+2h_{2}h_{0},2h_{2}h_{1},h_{2}^{2}]^{T}$ that are related to the G-MA parameters ${\boldsymbol{h}}$ . To arrive at a simple (unconstrained) least squares estimator, we will ignore this structure in ${\boldsymbol{b}}$ (we will discuss how to account for this structure at the end of this subsection). Therefore, with a slight abuse of notation, we will henceforth refer to ${\boldsymbol{b}}({\boldsymbol{h}})$ as the G-MA coefficients.
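The entries of ${\boldsymbol{b}}({\boldsymbol{h}})$ are the autoconvolution of ${\boldsymbol{h}}$ , which can be checked numerically. The sketch below assumes real G-MA coefficients and a hypothetical random real symmetric shift operator, and compares the covariance obtained from the filter with the one obtained from the polynomial expansion.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 6
h = np.array([1.0, 0.5, -0.25])          # G-MA coefficients, L = 3

# Hypothetical real symmetric shift operator.
A = rng.standard_normal((N, N))
S = (A + A.T) / 2

# b(h) is the autoconvolution of h: [h0^2, 2 h0 h1, h1^2 + 2 h0 h2, 2 h1 h2, h2^2].
b = np.convolve(h, h)

# Covariance from the filter, R_x = H(h) H(h)^H, with H(h) = sum_k h_k S^k.
H = sum(hk * np.linalg.matrix_power(S, k) for k, hk in enumerate(h))
R_filter = H @ H.conj().T

# Covariance from the polynomial expansion, R_x = sum_k b_k S^k.
R_poly = sum(bk * np.linalg.matrix_power(S, k) for k, bk in enumerate(b))

print(b)
print(np.allclose(R_filter, R_poly))
```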

Depending on the shape of the power spectrum, $Q$ can be much smaller than the number of graph nodes (i.e., the length of the vector ${\boldsymbol{p}}$ ), thus allowing a higher compression. In any case, the value of $Q$ will be at most $N$ , recalling that $N$ is the degree of the minimal (and characteristic) polynomial of ${\boldsymbol{S}}$ . That is to say, for $Q>N$ , the matrices $\{{\boldsymbol{S}}^{0},{\boldsymbol{S}},\cdots,{\boldsymbol{S}}^{Q-1}\}$ are linearly dependent.

Vectorizing ${\boldsymbol{R}}_{\boldsymbol{x}}$ in (13) yields

[TABLE]

where we have stacked ${\rm vec}({\boldsymbol{S}}^{q})$ to form the columns of the matrix ${\boldsymbol{\Psi}}_{\rm MA}\in\mathbb{R}^{N^{2}\times Q}$ as

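$${\boldsymbol{\Psi}}_{\rm MA}=[{\rm vec}({\boldsymbol{S}}^{0}),{\rm vec}({\boldsymbol{S}}),\cdots,{\rm vec}({\boldsymbol{S}}^{Q-1})],$$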

and the subscript “ ${\rm MA}$ ” in ${\boldsymbol{\Psi}}_{\rm MA}$ stands for moving average.

The covariance matrix of the subsampled graph signal ${\boldsymbol{y}}$ in (7) will then be

[TABLE]

As in the graph spectral domain approach discussed in Section III, the G-MA coefficients $\{b_{k}\}_{k=0}^{Q-1}$ of ${\boldsymbol{R}}_{\boldsymbol{y}}$ with respect to the set $\{{\boldsymbol{\Phi}}{\boldsymbol{S}}^{0}{\boldsymbol{\Phi}}^{T},{\boldsymbol{\Phi}}{\boldsymbol{S}}{\boldsymbol{\Phi}}^{T},\cdots,{\boldsymbol{\Phi}}{\boldsymbol{S}}^{Q-1}{\boldsymbol{\Phi}}^{T}\}$ are the same as those of ${\boldsymbol{R}}_{\boldsymbol{x}}$ with respect to the set $\{{\boldsymbol{S}}^{0},{\boldsymbol{S}},\cdots,{\boldsymbol{S}}^{Q-1}\}$ .

Vectorizing ${\boldsymbol{R}}_{\boldsymbol{y}}$ , we get a set of $K^{2}$ equations in $Q$ unknowns, given by

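$${\boldsymbol{r}}_{\boldsymbol{y}}=({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm MA}{\boldsymbol{b}}. \tag{17}$$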

If the matrix $({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm MA}$ has full column rank, which requires $K^{2}\geq Q$ , then the overdetermined system (17) can be uniquely solved using least squares as

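$$\widehat{\boldsymbol{b}}=[({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm MA}]^{\dagger}{\boldsymbol{r}}_{\boldsymbol{y}}.$$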
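A small numerical sketch of this estimator is given below. It is not the authors' code: the shift operator is a hypothetical random symmetric matrix scaled to unit spectral norm (an assumption made here only to keep the powers of ${\boldsymbol{S}}$ well conditioned), the subsampler selects `K` random nodes, and the exact ${\boldsymbol{R}}_{\boldsymbol{y}}$ is assumed to be known.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, L = 20, 6, 3
Q = min(2 * L - 1, N)

# Hypothetical symmetric shift operator, scaled to unit spectral norm for conditioning.
A = rng.standard_normal((N, N))
S = (A + A.T) / 2
S /= np.linalg.norm(S, 2)

# Psi_MA = [vec(S^0), vec(S^1), ..., vec(S^{Q-1})].
powers = [np.linalg.matrix_power(S, q) for q in range(Q)]
Psi_MA = np.stack([P.flatten(order="F") for P in powers], axis=1)

# True expansion coefficients b(h) and the exact compressed covariance.
h = np.array([1.0, 0.6, 0.3])
b = np.convolve(h, h)
R_x = sum(bq * P for bq, P in zip(b, powers))
nodes = np.sort(rng.choice(N, size=K, replace=False))
Phi = np.eye(N)[nodes]
R_y = Phi @ R_x @ Phi.T

# Least squares on r_y = (Phi kron Phi) Psi_MA b.
G = np.kron(Phi, Phi) @ Psi_MA
b_hat, *_ = np.linalg.lstsq(G, R_y.flatten(order="F"), rcond=None)
print("max |b_hat - b|:", np.max(np.abs(b_hat - b)))
```

Here only $K=6$ of the $N=20$ nodes are observed, yet the $Q=5$ coefficients are recovered because $K^{2}\geq Q$ and the observation matrix has full column rank for this instance.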

Corollary 1.

A full row rank matrix ${\boldsymbol{\Phi}}\in\mathbb{R}^{K\times N}$ is a valid graph covariance subsampler if and only if the matrix $({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm MA}$ is tall, i.e., $K^{2}\geq Q$ , and ${\rm null}({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}})\cap{\rm ran}({\boldsymbol{\Psi}}_{\rm MA})=\{{\boldsymbol{0}}\}$ .

Proof.

Follows from Theorem 1. ∎

Although knowing the moving average filter coefficients ${\boldsymbol{b}}$ is equivalent to knowing ${\boldsymbol{R}}_{\boldsymbol{x}}$ , it might be interesting to study the relation between ${\boldsymbol{b}}$ and the power spectrum ${\boldsymbol{p}}$ . We can relate the vector ${\boldsymbol{p}}$ and the vector ${\boldsymbol{b}}$ , by using (6) and (13). That is, we can write $p_{n}=\sum_{k=0}^{Q-1}b_{k}{\lambda}_{n}^{k}$ , or in matrix-vector form we have

$${\boldsymbol{p}}={\boldsymbol{V}}_{Q}{\boldsymbol{b}},$$

where ${\boldsymbol{V}}_{Q}$ is an $N\times Q$ Vandermonde matrix with $(i,j)$ th entry equal to $\lambda_{i}^{j-1}$ . To recover ${\boldsymbol{p}}$ from ${\boldsymbol{b}}$ , however, we need all the $N$ eigenvalues of ${\boldsymbol{S}}$ to construct ${\boldsymbol{V}}_{Q}$ .

This relation between ${\boldsymbol{p}}$ and ${\boldsymbol{b}}$ can be used to show the equivalence between the linear models (10) and (17) as follows. The fact that ${\boldsymbol{S}}^{q}={\boldsymbol{U}}{\boldsymbol{\Lambda}}^{q}{\boldsymbol{U}}^{H}$ from (1) allows us to express ${\boldsymbol{\Psi}}_{\rm MA}$ in (17) as ${\boldsymbol{\Psi}}_{\rm MA}=({\bar{\boldsymbol{U}}}\circ{\boldsymbol{U}}){\boldsymbol{V}}_{Q}.$ Using this in (17), we obtain ${\boldsymbol{r}}_{\boldsymbol{y}}=({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}})({\bar{\boldsymbol{U}}}\circ{\boldsymbol{U}}){\boldsymbol{V}}_{Q}{\boldsymbol{b}}=({\boldsymbol{\Phi}}{\bar{\boldsymbol{U}}}\circ{\boldsymbol{\Phi}}{\boldsymbol{U}}){\boldsymbol{p}}.$
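The relation between ${\boldsymbol{p}}$ and ${\boldsymbol{b}}$ can be checked numerically. The sketch below assumes a hypothetical random symmetric shift operator and real G-MA coefficients, builds the Vandermonde matrix with `numpy.vander`, and compares the result with the power spectrum read from the diagonal of ${\boldsymbol{U}}^{H}{\boldsymbol{R}}_{\boldsymbol{x}}{\boldsymbol{U}}$ .

```python
import numpy as np

rng = np.random.default_rng(4)
N = 8
h = np.array([1.0, -0.4, 0.2])
b = np.convolve(h, h)                    # Q = 2L - 1 = 5 expansion coefficients
Q = b.size

# Hypothetical symmetric shift operator, scaled to unit spectral norm.
A = rng.standard_normal((N, N))
S = (A + A.T) / 2
S /= np.linalg.norm(S, 2)
lam, U = np.linalg.eigh(S)

# N x Q Vandermonde matrix with (i, j)th entry lam_i^(j-1).
V_Q = np.vander(lam, Q, increasing=True)
p_from_b = V_Q @ b

# Reference: power spectrum read off the diagonal of U^H R_x U.
R_x = sum(bk * np.linalg.matrix_power(S, k) for k, bk in enumerate(b))
p_ref = np.real(np.diag(U.conj().T @ R_x @ U))

print(np.allclose(p_from_b, p_ref))
```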

In the following, we exploit the structure in ${\boldsymbol{b}}$ , which we ignored while solving (17), to develop a constrained least squares estimator.

Remark 2 (Constrained least squares).

To reveal the structure in ${\boldsymbol{b}}({\boldsymbol{h}})$ , let us recall the example (14) with $L=3$ . The coefficients in ${\boldsymbol{b}}({\boldsymbol{h}})$ are related to the squared polynomial $p(t)=(h_{0}+h_{1}t+h_{2}t^{2})^{2}$ , which can also be written as

[TABLE]

The polynomial $p(t)$ can more generally be written as

[TABLE]

where the $L\times L$ Hankel matrix ${\boldsymbol{\Theta}}$ is related to the model order $L-1$ ,

[TABLE]

is an $L\times L$ matrix with ones on its $l$ th anti-diagonal and zeros elsewhere (e.g., ${\boldsymbol{\Theta}}_{0}$ will have a one on its (1,1) entry and zeros elsewhere),

[TABLE]

and ${\boldsymbol{t}}=[1,t,\cdots,t^{2L-2}]$ contains monomials up to order $2(L-1)$ . This means that, we can write

[TABLE]

which together with (17) leads to the constrained least squares:

[TABLE]

with ${\boldsymbol{C}}:=({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm MA}{\boldsymbol{M}}$ . The above least squares problem, which accounts for the Kronecker structure in the unknowns, can be solved using the algebraic methods developed in [26], or by introducing a rank-1 matrix ${{\boldsymbol{H}}}_{\rm kr}={{\boldsymbol{h}}}{{\boldsymbol{h}}}^{H}$ and then solving for ${\boldsymbol{H}}_{\rm kr}$ and ${\boldsymbol{h}}$ using standard rank relaxation techniques [27].

In sum, if the subsampling matrix ${\boldsymbol{\Phi}}$ is carefully designed (subject of Section VII), we can recover the moving average graph power spectrum of a length- $N$ graph signal by observing only $\mathcal{O}(\sqrt{Q})$ nodes.

V-B Graph autoregressive models

A graph autoregressive signal (G-AR) of order $P$ may be generated by filtering zero-mean unit-variance white noise, ${\boldsymbol{n}}$ , with an all-pole filter of the form [11]

[TABLE]

where the G-AR coefficients $\{\alpha_{k}\}_{k=1}^{P}$ are collected in the length- $P$ vector ${\boldsymbol{\alpha}}$ . Such all-pole filters are useful to model, e.g., diffusion processes [11] and graph power spectra with sharp transitions.

The covariance matrix ${\boldsymbol{R}}_{\boldsymbol{x}}$ of the G-AR signal, ${\boldsymbol{x}}={\boldsymbol{H}}({\boldsymbol{\alpha}}){\boldsymbol{n}}$ , given by

[TABLE]

does not admit a linear parameterization in ${\boldsymbol{\alpha}}$ (unlike the moving average approach that we have seen earlier). The subsampled covariance matrix ${\boldsymbol{R}}_{\boldsymbol{y}}\in\mathbb{C}^{{K}\times{K}}$ of the subsampled observations ${\boldsymbol{y}}={\boldsymbol{\Phi}}{\boldsymbol{x}}={\boldsymbol{\Phi}}{\boldsymbol{H}}({\boldsymbol{\alpha}}){\boldsymbol{n}}\in\mathbb{C}^{K}$ , given by

[TABLE]

is also non-linear in ${\boldsymbol{\alpha}}$ . Consequently, vectorizing ${\boldsymbol{R}}_{\boldsymbol{y}}$ leads to a set of $K^{2}$ non-linear equations in $P$ unknowns

[TABLE]

Solving this system of non-linear equations is not trivial (e.g., it has to be solved iteratively using Newton's method). Therefore, in what follows, we will develop a technique for G-AR modeling as well as for graph sampling so that the G-AR parameters can be recovered using non-iterative linear estimators.

The all-pole filter (19) can be alternatively expressed as

[TABLE]

where $\{a_{k}\}_{k=1}^{P}$ are the so-called G-AR parameters. Thus, the G-AR signal satisfies the equations

[TABLE]

In other words, the graph signal ${\boldsymbol{x}}$ depends linearly on the $P$ -shifted graph signals $\{{\boldsymbol{S}}^{k}{\boldsymbol{x}}\}_{k=1}^{P}$ according to the above autoregressive model. So the covariance matrix of ${\boldsymbol{x}}$ can be expressed as

[TABLE]

which is also linear in the G-AR parameters, and where ${\boldsymbol{R}}_{{\boldsymbol{n}}{\boldsymbol{x}}}=\mathbb{E}\{{\boldsymbol{n}}{\boldsymbol{x}}^{H}\}$ may be seen as an error term. Given the (uncompressed) observations, ${\boldsymbol{x}}$ , the above linear model can be used to compute the G-AR coefficients using least squares.

Let $\mathcal{N}_{k}(p)$ denote the set of nodes in the $p$ -hop neighborhood of the $k$ th node, i.e.,

[TABLE]

Using this notation, we will now describe the specific subsampling scheme that we adopt for G-AR models, and we will explain later the advantage of this particular subsampling scheme. Suppose we observe $K_{0}$ graph nodes through a sparse subsampling matrix ${\boldsymbol{\Phi}}_{0}\in\{0,1\}^{K_{0}\times N}$ . Let us denote the set containing the indices of the subsampled nodes by $\mathcal{K}_{0}$ such that $|\mathcal{K}_{0}|=K_{0}$ . Furthermore, we will also observe nodes in the $P$ -hop neighborhood of those $K_{0}$ nodes through $\{{\boldsymbol{\Phi}}_{p}\}_{p=1}^{P}$ . More specifically, with ${\boldsymbol{\Phi}}_{p}$ we observe nodes in the set $\mathcal{N}_{k}(p)$ for ${k\in\mathcal{K}_{0}}$ such that the matrix ${\boldsymbol{\Phi}}_{p}$ will have $K_{p}:=\sum_{k\in\mathcal{K}_{0}}|\mathcal{N}_{k}(p)|$ rows with ${{\boldsymbol{\Phi}}_{p}\in\{0,1\}^{K_{p}\times N}}$ . Mathematically, the above subsampling scheme ${{\boldsymbol{y}}}={\boldsymbol{\Phi}}{\boldsymbol{x}}$ can be expressed as follows:

[TABLE]

where ${\boldsymbol{y}}$ is a vector of length $K=\sum_{l=0}^{P}K_{l}$ , which is also the total number of observations we gather. This sampling scheme is inspired by [28], and we extend it for reconstructing second-order statistics by recognizing the fact that the compressed observations (and their covariance matrices) satisfy the G-AR model. For the sake of presentation, we ignore the redundancies in the observations ${\boldsymbol{y}}$ that may arise due to the nonzero diagonal entries of the powers of the shift operator or due to overlapping nodes within different neighborhoods. Note that the subsampling scheme for the G-AR model is different from the subsampling schemes discussed in Sections III and V-A, as we observe a subset of nodes as well as its related neighborhood. For example, suppose each node has degree $n$ ; then we acquire $\mathcal{O}(K_{0}[1+n+n^{2}+\cdots+n^{P}])=\mathcal{O}(K_{0}(1-n^{P+1})/(1-n))$ observations in total.

Using (22), we can express the observations ${\boldsymbol{y}}_{0}={\boldsymbol{\Phi}}_{0}{\boldsymbol{x}}$ as

[TABLE]

where the second equality is due to the structure of the shift operator that operates (locally) on the neighboring nodes, and thus can be expressed via a column selection operation ${\boldsymbol{\Phi}}_{k}^{T}\in\{0,1\}^{N\times K_{k}}$ . Due to the choice of this particular subsampling scheme, the compressed observation ${\boldsymbol{y}}_{0}$ can be expressed as a linear combination of the compressed observations $\{{\boldsymbol{y}}_{k}\}_{k=1}^{P}$ with the G-AR parameters being the combining weights.

By defining ${\boldsymbol{R}}_{p,q}=\mathbb{E}\{{\boldsymbol{y}}_{p}{\boldsymbol{y}}_{q}^{H}\}={\boldsymbol{\Phi}}_{p}{\boldsymbol{R}}_{\boldsymbol{x}}{\boldsymbol{\Phi}}_{q}^{T}\in\mathbb{C}^{K_{p}\times K_{q}}$ , we can express the covariance matrix ${\boldsymbol{R}}_{0,0}$ in terms of the available observations as

[TABLE]

which on vectorizing leads to $K_{0}^{2}$ equations in $P$ unknowns given by

[TABLE]

where $\approx$ is due to the error term. Here, we have stacked ${\rm vec}({\boldsymbol{\Phi}}_{0}{\boldsymbol{S}}^{k}{\boldsymbol{\Phi}}_{k}^{T}{\boldsymbol{R}}_{k,0})$ to form the columns of the matrix ${\boldsymbol{G}}_{0}\in\mathbb{R}^{K_{0}^{2}\times P}$ as

[TABLE]

If the $K_{0}^{2}\times P$ matrix ${\boldsymbol{G}}_{0}$ has full column rank, which requires $K_{0}^{2}\geq P$ , then the overdetermined system (26) can be solved using least squares as

[TABLE]

Therefore, with a carefully chosen subsampling matrix ${\boldsymbol{\Phi}}$ , we can recover a G-AR spectrum of a length- $N$ graph signal, residing on a graph with per node degree $n$ with $\mathcal{O}(\sqrt{P}(1-n^{P+1})/(1-n))$ samples.

Previously in (25), we used only the equations related to the covariance matrix of ${\boldsymbol{y}}_{0}$ , i.e., ${\boldsymbol{\Phi}}_{0}{\boldsymbol{R}}_{\boldsymbol{x}}{\boldsymbol{\Phi}}_{0}^{H}$ , which resulted in $K^{2}_{0}$ equations in $P$ unknowns. In addition to this, since we have access to $\{{\boldsymbol{y}}_{k}\}_{k=1}^{P}$ , we can also use the equations corresponding to the covariances between ${\boldsymbol{y}}_{0}$ and observations $\{{\boldsymbol{y}}_{k}\}_{k=1}^{P}$ . This results in the following system of equations for $q=0,1,\ldots,P$ :

[TABLE]

where ${\boldsymbol{R}}_{0,q}\in\mathbb{C}^{K_{0}\times K_{q}}$ . Vectorizing ${\boldsymbol{R}}_{0,q}$ in (27) for $q=0,1,\ldots,P$ , we get

[TABLE]

where we have stacked ${\rm vec}\left({\boldsymbol{\Phi}}_{0}{\boldsymbol{S}}^{k}{\boldsymbol{\Phi}}_{k}^{T}{\boldsymbol{R}}_{k,q}\right)$ to form the columns of the matrix ${\boldsymbol{G}}_{q}\in\mathbb{R}^{K_{0}K_{q}\times P}$ as

[TABLE]

Now, collecting $\{{\boldsymbol{r}}_{0,q}\}_{q=0}^{P}$ in ${\boldsymbol{r}}_{y}$ as

[TABLE]

and $\{{\boldsymbol{G}}_{q}\}_{q=0}^{P}$ in ${\boldsymbol{G}}$ as

[TABLE]

we have $K_{0}\sum_{q=0}^{P}K_{q}$ equations in $P$ unknowns, i.e.,

[TABLE]

where we recall that $K=\sum_{q=0}^{P}K_{q}$ is the total number of observed nodes, so that (29) contains $K_{0}K$ equations. It can be shown that the observation matrix ${\boldsymbol{G}}$ can be expressed as $({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}_{0}){\boldsymbol{\Psi}}_{\rm AR}$ for some matrix ${\boldsymbol{\Psi}}_{\rm AR}$ (“AR” stands for autoregressive), which now depends on the compressed observations, sampling matrices, and the graph shift operator.

The above linear system (29) can be solved using least squares as

[TABLE]

if the observation matrix ${\boldsymbol{G}}$ has full column rank. This requires $K_{0}\sum_{q=0}^{P}K_{q}\geq P$ . Suppose the graph is connected such that every node has at least one neighbor; then picking a single node would already lead to an overdetermined system. In other words, we can recover a G-AR spectrum with $K_{0}=1$ , which amounts to observing more than $P$ nodes. For example, recall the cycle graph in Fig. 1 with $N$ nodes, where every node has a degree of two. In order to recover two G-AR parameters on such graphs (more generally, for any arbitrary graph with per node degree 2), we need to observe at least $K_{0}+K_{1}+K_{2}=5$ nodes using this technique. Depending on the graph, this scheme as such might not lead to any compression at all (e.g., in dense graphs) because all $N$ nodes might be in the $P$ -hop neighborhoods of the $K_{0}$ selected nodes. In other words, the proposed scheme is more useful for sparse graphs or for small $P$ .
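The node count for the cycle graph example can be reproduced with a short breadth-first search. The sketch below is illustrative only: it assumes that $\mathcal{N}_{k}(p)$ is taken as the set of nodes exactly $p$ hops away from node $k$ (redundancies ignored, as in the text), and the function names are hypothetical.

```python
from collections import deque

def hop_distances(adj, source):
    """Breadth-first search: shortest-path hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def observation_counts(adj, selected, P):
    """[K_0, K_1, ..., K_P], taking N_k(p) as the nodes exactly p hops from k (assumption)."""
    counts = [0] * (P + 1)
    for k in selected:
        for d in hop_distances(adj, k).values():
            if d <= P:
                counts[d] += 1
    return counts

# Undirected cycle graph with N nodes: node i is connected to i-1 and i+1 (mod N).
N = 10
cycle = {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}

counts = observation_counts(cycle, selected=[0], P=2)
print(counts, sum(counts))
```

For the cycle graph with one selected node and $P=2$ , this gives $K_{0}=1$ , $K_{1}=2$ , and $K_{2}=2$ , i.e., five observed nodes in total. On a dense graph the same routine quickly returns counts that sum to $N$ , which is the no-compression situation mentioned above.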

VI Finite Data Records

So far, to recover the graph second-order statistics, we have assumed that the true compressed covariance matrix ${\boldsymbol{R}}_{\boldsymbol{y}}=\mathbb{E}\{{\boldsymbol{y}}{\boldsymbol{y}}^{H}\}\in\mathbb{C}^{K\times K}$ is available. However, in practice we only have a finite number of snapshots, call it ${N_{s}}$ , available. Suppose we observe ${N_{s}}$ subsampled graph signals denoted by the vectors $\{{\boldsymbol{y}}[k]\}_{k=1}^{N_{s}}$ , and they are collected in a $K\times{N_{s}}$ matrix ${\boldsymbol{Y}}:=\left[{\boldsymbol{y}}[1],{\boldsymbol{y}}[2],\ldots,{\boldsymbol{y}}[{N_{s}}]\right]$ . It is common to use the sample data covariance matrix $\widehat{{\boldsymbol{R}}}_{\boldsymbol{y}}=\frac{1}{{N_{s}}}{\boldsymbol{Y}}{\boldsymbol{Y}}^{H}\in\mathbb{C}^{K\times K}$ as an estimate of ${\boldsymbol{R}}_{\boldsymbol{y}}$ . We have seen in Sections III and V that the compressed covariance matrix ${\boldsymbol{R}}_{\boldsymbol{y}}$ has a special (linear) structure and it is parameterized by a small number of parameters ${\boldsymbol{\theta}}$ . In this section, we will provide the least squares estimator, maximum likelihood estimator, and the Cramér-Rao lower bound for this finite data records scenario.

Let us denote the structured matrix ${\boldsymbol{R}}_{\boldsymbol{y}}$ as ${\boldsymbol{R}}_{\boldsymbol{y}}({\boldsymbol{\theta}})$ . Generally, ${\boldsymbol{r}}_{\boldsymbol{y}}={\rm vec}({\boldsymbol{R}}_{\boldsymbol{y}}({\boldsymbol{\theta}}))$ can be expressed as

$${\boldsymbol{r}}_{\boldsymbol{y}}={\boldsymbol{G}}{\boldsymbol{\theta}}, \tag{30}$$

where from (10) we have ${\boldsymbol{G}}:=({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s}$ and ${\boldsymbol{\theta}}:={\boldsymbol{p}}$ for the nonparametric spectral domain approach, from (17) we have ${\boldsymbol{G}}:=({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm MA}$ and ${\boldsymbol{\theta}}:={\boldsymbol{b}}$ for the parametric moving average model, and from (29) we have ${\boldsymbol{G}}:=({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}_{0}){\boldsymbol{\Psi}}_{\rm AR}$ and ${\boldsymbol{\theta}}:={\boldsymbol{a}}$ for the parametric autoregressive model. Before we present the least squares solution in the next subsection, we recall that, although we perform a linear compression on ${\boldsymbol{R}}_{\boldsymbol{x}}$ as ${\boldsymbol{R}}_{\boldsymbol{y}}={\boldsymbol{\Phi}}{\boldsymbol{R}}_{\boldsymbol{x}}{\boldsymbol{\Phi}}^{T}$ , the linear structure in ${\boldsymbol{R}}_{\boldsymbol{x}}({\boldsymbol{\theta}})$ is maintained in ${\boldsymbol{R}}_{\boldsymbol{y}}({\boldsymbol{\theta}})$ as well, as long as the compression matrix is a valid covariance subsampler.

VI-A Least squares estimator

Under the abstraction in (30), the question now is how the estimated covariance matrix, vectorized as $\widehat{{\boldsymbol{r}}}_{\boldsymbol{y}}={\rm vec}(\widehat{{\boldsymbol{R}}}_{\boldsymbol{y}})$ , can be matched to the true covariance matrix ${\boldsymbol{R}}_{\boldsymbol{y}}$ , which has a linear structure. This can, for instance, be done in the least squares sense as

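$$\widehat{\boldsymbol{\theta}}=\underset{{\boldsymbol{\theta}}}{\arg\min}\,\|\widehat{{\boldsymbol{r}}}_{\boldsymbol{y}}-{\boldsymbol{G}}{\boldsymbol{\theta}}\|_{2}^{2}={\boldsymbol{G}}^{\dagger}\widehat{{\boldsymbol{r}}}_{\boldsymbol{y}}. \tag{31}$$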

Therefore, to summarize, the results derived so far in this paper (including estimators and subsampler designs) for infinite data records are also valid for scenarios with finite data records. Furthermore, the above least squares problem may be also solved with a constraint on ${\boldsymbol{\theta}}$ , which leads to a constrained least squares problem [cf. Remarks 1 and 2].
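The finite-data procedure can be sketched end to end for the nonparametric model. The example below is an illustration under stated assumptions: real Gaussian white noise, a hypothetical random symmetric shift operator, a random node selection, and ${\boldsymbol{\theta}}={\boldsymbol{p}}$ . The resulting error depends on the number of snapshots and on the random draw, so no particular value is claimed.

```python
import numpy as np

rng = np.random.default_rng(5)
N, K, Ns = 12, 8, 5000

# Hypothetical symmetric shift operator, its eigenvectors, and a true power spectrum.
A = rng.standard_normal((N, N))
_, U = np.linalg.eigh((A + A.T) / 2)
p = rng.uniform(0.5, 2.0, size=N)

# Stationary graph signals: x = U diag(sqrt(p)) U^H n with white noise n.
H = (U * np.sqrt(p)) @ U.conj().T
X = H @ rng.standard_normal((N, Ns))

# Subsample K nodes and form the sample covariance R_hat = Y Y^H / Ns.
nodes = np.sort(rng.choice(N, size=K, replace=False))
Phi = np.eye(N)[nodes]
Y = Phi @ X
R_hat = (Y @ Y.conj().T) / Ns

# Least squares fit of theta = p to vec(R_hat).
Psi_s = np.stack([np.kron(U[:, n].conj(), U[:, n]) for n in range(N)], axis=1)
G = np.kron(Phi, Phi) @ Psi_s
p_hat, *_ = np.linalg.lstsq(G, R_hat.flatten(order="F"), rcond=None)

print("relative error:", np.linalg.norm(p_hat - p) / np.linalg.norm(p))
```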

The least squares estimators derived thus far do not assume any data distribution and they are reasonable for any data probability density function. In what follows, we will discuss a special case, where the observations are Gaussian distributed.

VI-B Maximum likelihood estimator and Cramér-Rao bound

Suppose the compressed data consists of realizations from a sequence of independent and identically distributed (i.i.d.) Gaussian random vectors $\{{\boldsymbol{y}}[k]\}_{k=1}^{N_{s}}$ , where for each $k$ , the length- $K$ vector ${\boldsymbol{y}}[k]\thicksim\mathcal{CN}({\boldsymbol{0}},{\boldsymbol{R}}_{\boldsymbol{y}}({\boldsymbol{\theta}}))$ with the (positive definite) covariance matrix ${\boldsymbol{R}}_{\boldsymbol{y}}({\boldsymbol{\theta}})$ being a function of the parameters ${\boldsymbol{\theta}}$ as in (30).

The maximum likelihood estimate of ${\boldsymbol{\theta}}$ given ${\boldsymbol{Y}}$ is obtained by solving the optimization problem

[TABLE]

with log-likelihood function (with terms that depend only on the unknowns) [29, 30]

[TABLE]

where $\nu=1$ if ${\boldsymbol{R}}_{\boldsymbol{y}}$ has complex entries and $\nu=0.5$ if ${\boldsymbol{R}}_{\boldsymbol{y}}$ has real entries.

The maximum likelihood estimate of ${\boldsymbol{\theta}}$ can then be computed by setting the derivative of $l({\boldsymbol{Y}};{\boldsymbol{\theta}})$ with respect to ${\boldsymbol{\theta}}$ to zero, and it is the solution to the regression equation [30]:

[TABLE]

where ${\boldsymbol{g}}_{i}$ is the $i$ th column of ${\boldsymbol{G}}$ . The above equations must be solved iteratively using algorithms provided in [31, 19, 29, 32]. The above equations would hold, if ${\boldsymbol{r}}_{\boldsymbol{y}}=\widehat{{\boldsymbol{r}}}_{\boldsymbol{y}}$ . The solution (31) approximates ${\boldsymbol{r}}_{\boldsymbol{y}}\approx\widehat{{\boldsymbol{r}}}_{\boldsymbol{y}}$ , in the least squares sense. Also, from (32), we can recognize that the maximum likelihood estimator reduces to a weighted least squares problem

[TABLE]

with weighting matrix ${\boldsymbol{C}}_{w}={\nu N_{s}}({\boldsymbol{R}}_{\boldsymbol{y}}^{-T}({\boldsymbol{\theta}})\otimes{\boldsymbol{R}}_{\boldsymbol{y}}^{-1}({\boldsymbol{\theta}}))$ . For the weighting matrix, we may use the estimate $\widehat{\boldsymbol{C}}_{w}$ obtained by using $\widehat{{\boldsymbol{R}}}_{\boldsymbol{y}}$ instead of ${\boldsymbol{R}}_{\boldsymbol{y}}$ .
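A weighted least squares step with the estimated weighting matrix could look as follows. This is a sketch rather than the authors' implementation: `weighted_ls` is a hypothetical helper, the linear covariance model is a toy example with made-up positive definite basis matrices, and the real-valued case ( $\nu=0.5$ ) is assumed.

```python
import numpy as np

def weighted_ls(G, R_hat, Ns, nu=0.5):
    """Weighted least squares with C_w = nu * Ns * (R_hat^{-T} kron R_hat^{-1})."""
    R_inv = np.linalg.inv(R_hat)
    C_w = nu * Ns * np.kron(R_inv.T, R_inv)
    r_hat = R_hat.flatten(order="F")
    GhC = G.conj().T @ C_w
    return np.linalg.solve(GhC @ G, GhC @ r_hat)

# Toy linear covariance model R_y(theta) = sum_i theta_i A_i (hypothetical basis).
rng = np.random.default_rng(6)
K, n_par, Ns = 4, 3, 100
basis = []
for _ in range(n_par):
    B = rng.standard_normal((K, K))
    basis.append(B @ B.T + np.eye(K))    # symmetric positive definite
G = np.stack([A_i.flatten(order="F") for A_i in basis], axis=1)

theta = np.array([1.0, 0.5, 2.0])
R_y = sum(t * A_i for t, A_i in zip(theta, basis))

# With the exact covariance in place of the sample covariance, theta is recovered.
theta_hat = weighted_ls(G, R_y, Ns)
print(theta_hat)
```

In practice `R_hat` would be the sample covariance, which must be invertible; this requires at least $N_{s}\geq K$ snapshots.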

Next, we will provide the Cramér-Rao bound, which is a lower bound on the variance of the developed least squares estimators when the available data records are finite. (Note that this is a bound on the variance of $\widehat{{\boldsymbol{p}}}$ obtained from the nonparametric approach, and the Cramér-Rao bound for the power spectrum estimates from the parametric methods may be derived using transformation of parameters.) The Cramér-Rao bound matrix is the inverse of the Fisher information matrix. The $(i,j)$ th entry of the Fisher information matrix, ${\boldsymbol{F}}$ , is given by [30]

[TABLE]

It can be seen from the expression of the Cramér-Rao bound that the developed least squares estimators ignore the color of the residual, $\widehat{{\boldsymbol{r}}}_{\boldsymbol{y}}-{{\boldsymbol{r}}}_{\boldsymbol{y}}$ , which has a covariance matrix ${\boldsymbol{C}}_{w}^{-1}$ (not scaled identity). This means that the developed estimators are not efficient (i.e., they will not achieve the Cramér-Rao bound), but are computationally cheap as compared to the asymptotically efficient maximum likelihood estimators.
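For a linearly parameterized covariance, the Fisher information can be evaluated with the same weighting matrix ${\boldsymbol{C}}_{w}$ . The sketch below assumes the standard form for zero-mean Gaussian data, $F_{ij}=\nu N_{s}\,{\rm tr}\{{\boldsymbol{R}}_{\boldsymbol{y}}^{-1}(\partial{\boldsymbol{R}}_{\boldsymbol{y}}/\partial\theta_{i}){\boldsymbol{R}}_{\boldsymbol{y}}^{-1}(\partial{\boldsymbol{R}}_{\boldsymbol{y}}/\partial\theta_{j})\}$ , and cross-checks it against ${\boldsymbol{G}}^{H}{\boldsymbol{C}}_{w}{\boldsymbol{G}}$ on a toy model with hypothetical basis matrices (real-valued case, $\nu=0.5$ ).

```python
import numpy as np

def fisher_information(G, R_y, Ns, nu=0.5):
    """F = G^H C_w G with C_w = nu * Ns * (R_y^{-T} kron R_y^{-1})."""
    R_inv = np.linalg.inv(R_y)
    C_w = nu * Ns * np.kron(R_inv.T, R_inv)
    return G.conj().T @ C_w @ G

# Toy linear covariance model R_y(theta) = sum_i theta_i A_i (hypothetical basis).
rng = np.random.default_rng(7)
K, n_par, Ns = 4, 3, 200
basis = []
for _ in range(n_par):
    B = rng.standard_normal((K, K))
    basis.append(B @ B.T + np.eye(K))
G = np.stack([A_i.flatten(order="F") for A_i in basis], axis=1)
theta = np.array([1.0, 0.5, 2.0])
R_y = sum(t * A_i for t, A_i in zip(theta, basis))

F = fisher_information(G, R_y, Ns)

# Cross-check against the trace form nu * Ns * tr(R^{-1} A_i R^{-1} A_j).
R_inv = np.linalg.inv(R_y)
F_trace = np.array([[0.5 * Ns * np.trace(R_inv @ Ai @ R_inv @ Aj) for Aj in basis]
                    for Ai in basis])

crb = np.linalg.inv(F)           # Cramér-Rao bound matrix
print(np.allclose(F, F_trace), np.diag(crb))
```

The diagonal of `crb` lower-bounds the variance of any unbiased estimator of the corresponding entry of ${\boldsymbol{\theta}}$ under this model.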

VII Sparse Sampler Design

We have seen so far that the design of the subsampling matrix ${\boldsymbol{\Phi}}$ is crucial for the reconstruction of the graph second-order statistics. From Theorem 1, we know the conditions under which a subsampling matrix will be a valid covariance subsampler, but still it has to be designed. Alternatively, random compression matrices drawn from a certain probability space (e.g., entries of the subsampling matrix are drawn from a Gaussian or Bernoulli distribution) may be used as they almost surely satisfy the conditions in Theorem 1 (see e.g., [33]). However, they might not be practical in the graph setting, because random compression matrices are usually dense, and computing linear combinations of the uncompressed graph signals requires all of these signals to be made available at a central location. On the other hand, if we choose a sparse sampling matrix, which essentially does node selection, only the subsampled graph signals (very few samples as compared to the number of nodes) have to be processed. Therefore, in what follows, we will develop an algorithm to design a sparse subsampling matrix.

Consider a structured sparse sampling matrix ${\boldsymbol{\Phi}}\in\{0,1\}^{K\times N}$ , such that the entries of this matrix are determined by a binary sampling vector ${\boldsymbol{w}}$ . More specifically, let us denote the structured subsampling matrix ${\boldsymbol{\Phi}}$ as ${\boldsymbol{\Phi}}({\boldsymbol{w}})={\rm diag_{r}}[{\boldsymbol{w}}]\in\{0,1\}^{K\times N}$ , which is guided by a component selection vector ${\boldsymbol{w}}=[w_{1},\cdots,w_{N}]^{T}\in\{0,1\}^{N}$ , where $w_{i}=1$ indicates that the $i$ th graph node is selected, otherwise it is not selected. That is, ${\boldsymbol{\Phi}}({\boldsymbol{w}})$ essentially performs graph sampling.
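As a minimal sketch of this construction (the helper name `selection_matrix` and the toy signal are illustrative assumptions, not part of the paper), ${\boldsymbol{\Phi}}({\boldsymbol{w}})$ can be formed by keeping the rows of the identity matrix indexed by the nonzero entries of ${\boldsymbol{w}}$:

```python
import numpy as np

def selection_matrix(w):
    """Phi(w): rows of the N x N identity that correspond to w_i = 1."""
    w = np.asarray(w, dtype=int)
    return np.eye(w.size, dtype=int)[w == 1]

w = np.array([1, 0, 1, 1, 0])                  # select nodes 1, 3 and 4 (1-based)
Phi = selection_matrix(w)                      # shape (K, N) = (3, 5)
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])   # toy graph signal
y = Phi @ x                                    # subsampled graph signal
```

Applying `Phi` only reads the selected entries of `x`, which is why no linear combinations of the uncompressed signal need to be formed at a central location.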

VII-A Spectral domain and moving average case

In this subsection, we will design the subsampling matrix for the estimators based on the spectral domain approach [cf. Section III] and the vertex domain parametric moving average model [cf. Section V-A] as the observation matrices in these cases share a common structure. In particular, the aim is to design a full-column rank observation matrix ${{\boldsymbol{G}}}=[{\boldsymbol{\Phi}}({\boldsymbol{w}})\otimes{\boldsymbol{\Phi}}({\boldsymbol{w}})]{\boldsymbol{\Psi}}$ with ${\boldsymbol{\Psi}}:={\boldsymbol{\Psi}}_{\rm s}$ or ${\boldsymbol{\Psi}}:={\boldsymbol{\Psi}}_{\rm MA}$ , so that we can perfectly recover the second-order statistics by observing a reduced set of only $K$ graph nodes. To do this, we assume ${\boldsymbol{\Psi}}$ is perfectly known.

The uniqueness and sensitivity of the least squares solution developed in Sections III and V-A depend on the spectrum (i.e., the set of eigenvalues) of the matrix

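$${\boldsymbol{T}}({\boldsymbol{w}})={\boldsymbol{\Psi}}^{T}({\rm diag}[{\boldsymbol{w}}]\otimes{\rm diag}[{\boldsymbol{w}}]){\boldsymbol{\Psi}}.$$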

In other words, the performance of least squares is better if the spectrum of the matrix $({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}$ is more uniform [21]. Thus, a good sparse sampler ${\boldsymbol{w}}$ can be obtained by solving:

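$$\underset{{\boldsymbol{w}}\in\{0,1\}^{N}}{\arg\max}\;\;f({\boldsymbol{w}})\quad{\rm s.to}\quad\|{\boldsymbol{w}}\|_{0}=K\qquad(34)$$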

with either $f({\boldsymbol{w}})=-{\rm tr}\{{\boldsymbol{T}}^{-1}({\boldsymbol{w}})\}$ , $f({\boldsymbol{w}})=\lambda_{\rm min}\{{\boldsymbol{T}}({\boldsymbol{w}})\}$ , or $f({\boldsymbol{w}})=\log\det\{{\boldsymbol{T}}({\boldsymbol{w}})\}$ , each of which tries to balance the spectrum of ${\boldsymbol{T}}({\boldsymbol{w}})$ . Alternatively, the Fisher information matrix (33) can be used instead of ${\boldsymbol{T}}({\boldsymbol{w}})$ to design samplers using techniques discussed in [34].
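The three candidate costs can be evaluated numerically as in the following sketch. It assumes a real-valued ${\boldsymbol{\Psi}}$ and uses the form ${\boldsymbol{T}}({\boldsymbol{w}})={\boldsymbol{\Psi}}^{T}({\rm diag}[{\boldsymbol{w}}]\otimes{\rm diag}[{\boldsymbol{w}}]){\boldsymbol{\Psi}}$ that is written out in the circulant graph experiment of Section VIII. The random matrix standing in for ${\boldsymbol{\Psi}}$ and the names `T_of_w` and `costs` are illustrative assumptions.

```python
import numpy as np

def T_of_w(Psi, w):
    """T(w) = Psi^T (diag[w] kron diag[w]) Psi for a real-valued Psi."""
    mask = np.kron(w, w).astype(bool)    # structured row selection of Psi
    G = Psi[mask]                        # observation matrix with K^2 rows
    return G.T @ G

def costs(T):
    """The three sampler design costs, computed from the eigenvalues of T."""
    eig = np.linalg.eigvalsh(T)          # T is symmetric positive semidefinite
    return {
        "neg_trace_inv": -np.sum(1.0 / eig),
        "lambda_min": eig[0],
        "log_det": np.sum(np.log(eig)),
    }

rng = np.random.default_rng(0)
N, M = 6, 4                              # M unknowns (hypothetical size)
Psi = rng.standard_normal((N * N, M))    # stand-in for Psi_s or Psi_MA
w = np.array([1, 1, 0, 1, 0, 0])         # K = 3 selected nodes
T = T_of_w(Psi, w)
c = costs(T)
```

All three costs are functions of the eigenvalues of ${\boldsymbol{T}}({\boldsymbol{w}})$ only, so a single symmetric eigendecomposition suffices to compare candidate samplers.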

VII-A1 Convex relaxation

The above Boolean nonconvex problem with any one of the cost functions can be relaxed and solved using convex optimization (e.g., see [34, 35]). To express (34) as a convex optimization problem, we will introduce an auxiliary variable ${\boldsymbol{Z}}={{\boldsymbol{w}}}{{\boldsymbol{w}}}^{T}$ and its related length- $N^{2}$ vector ${\boldsymbol{z}}:={\rm vec}({\boldsymbol{Z}})$ . Since ${\rm diag}[{\boldsymbol{w}}]\otimes{\rm diag}[{\boldsymbol{w}}]={\rm diag}[{\boldsymbol{z}}]$ , we can write $f({\boldsymbol{w}})$ as $f({\boldsymbol{z}})$ , and relaxing (a) Boolean constraints on ${\boldsymbol{w}}$ to the box constraints, (b) the cardinality constraint to an $\ell_{1}$ -norm constraint, and (c) the rank-1 constraint on ${\boldsymbol{Z}}$ , we obtain the following optimization problem

[TABLE]

where ${\boldsymbol{Z}}\succeq{\boldsymbol{w}}{\boldsymbol{w}}^{T}$ can be expressed as a linear matrix inequality that is linear in ${\boldsymbol{w}}$ .
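The lifting step can be checked numerically. The sketch below verifies, for a small Boolean ${\boldsymbol{w}}$ chosen for illustration, that ${\rm diag}[{\boldsymbol{w}}]\otimes{\rm diag}[{\boldsymbol{w}}]={\rm diag}[{\boldsymbol{z}}]$, and it forms the block matrix that, by the standard Schur complement argument, is one way to write ${\boldsymbol{Z}}\succeq{\boldsymbol{w}}{\boldsymbol{w}}^{T}$ as a linear matrix inequality. The specific block form is our assumption of what is meant; the text does not spell it out.

```python
import numpy as np

w = np.array([1, 0, 1, 1])
Z = np.outer(w, w)                        # rank-one matrix Z = w w^T
z = Z.reshape(-1, order="F")              # z = vec(Z), column-major stacking

lhs = np.kron(np.diag(w), np.diag(w))     # diag[w] kron diag[w]
rhs = np.diag(z)                          # diag[z]

# Schur complement form: Z >= w w^T  <=>  [[Z, w], [w^T, 1]] >= 0.
lmi = np.block([[Z, w[:, None]], [w[None, :], np.ones((1, 1))]])
min_eig = np.linalg.eigvalsh(lmi).min()   # nonnegative up to rounding here
```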

VII-A2 Submodular greedy optimization

Due to the involved complexity of solving the relaxed convex problem of Section VII-A1 and keeping in mind the large scale problems that arise in the graph setting, we will now focus on the optimization problem (34) with $f({\boldsymbol{w}})=\log\det\{{\boldsymbol{T}}({\boldsymbol{w}})\}$ as it can be solved near-optimally using a low-complexity greedy algorithm.

Let us define an index set $\mathcal{X}$ that is related to the component selection vector ${\boldsymbol{w}}$ as $\mathcal{X}=\{m\,|\,w_{m}=1,m=1,\ldots,N\},$ where $\mathcal{X}\subseteq\mathcal{N}$ with $\mathcal{N}=\{1,\ldots,N\}$ . We can now express the cost function $f({\boldsymbol{w}})=\log\det\{{\boldsymbol{T}}({\boldsymbol{w}})\}$ equivalently as the set function given by

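$$f(\mathcal{X})=\log\det\Big\{\sum\nolimits_{(i,j)\in\mathcal{X}\times\mathcal{X}}{\boldsymbol{\psi}}_{i,j}{\boldsymbol{\psi}}_{i,j}^{T}\Big\},\qquad(36)$$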

where the length- $N^{2}$ column vectors $\{{\boldsymbol{\psi}}_{1,1},{\boldsymbol{\psi}}_{1,2},\cdots,{\boldsymbol{\psi}}_{N,N}\}$ are used to form the rows of ${\boldsymbol{\Psi}}$ as ${\boldsymbol{\Psi}}=[{\boldsymbol{\psi}}_{1,1},{\boldsymbol{\psi}}_{1,2},\cdots,{\boldsymbol{\psi}}_{N,N}]^{T}$ . We use such an indexing because the sampling matrix ${\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}$ results in a structured (row) subset selection. The notation $\sum\nolimits_{(i,j)}$ denotes a double summation. As an example, for $\mathcal{X}=\{1,2\}$ , we have $\sum\nolimits_{(i,j)\in\mathcal{X}\times\mathcal{X}}{\boldsymbol{\psi}}_{i,j}={\boldsymbol{\psi}}_{1,1}+{\boldsymbol{\psi}}_{1,2}+{\boldsymbol{\psi}}_{2,1}+{\boldsymbol{\psi}}_{2,2}$ .

Submodularity, a notion based on the property of diminishing returns, is useful for solving discrete combinatorial optimization problems of the form (34) (see e.g., [36]). Submodularity can be formally defined as follows.

Definition 4 (Submodular function).

A set function $f:2^{N}\rightarrow\mathbb{R}$ defined on the subsets of $\mathcal{N}$ is said to be submodular if, for every $\mathcal{X}\subseteq\mathcal{Y}\subseteq\mathcal{N}$ and $s\in\mathcal{N}\backslash\mathcal{Y}$ , it satisfies

$$f(\mathcal{X}\cup\{s\})-f(\mathcal{X})\geq f(\mathcal{Y}\cup\{s\})-f(\mathcal{Y}).$$

Suppose the submodular function is monotone nondecreasing, i.e., $f(\mathcal{X})$ $\leq f(\mathcal{Y})$ for all $\mathcal{X}\subseteq\mathcal{Y}\subseteq\mathcal{N}$ and normalized, i.e., $f(\emptyset)=0$ , then a greedy maximization of such a function as summarized in Algorithm 1 is near optimal with an approximation factor of $(1-1/e)$ , where $e$ is Euler’s number [37]. In other words, we can achieve

$$f(\mathcal{X})\geq(1-1/e)\,f({\rm opt}),$$

where $f({\rm opt})$ is the optimal value of the problem

$$\underset{\mathcal{X}\subseteq\mathcal{N},\,|\mathcal{X}|=K}{\rm maximize}\quad f(\mathcal{X}).$$

So that the set function is well defined for the empty input set and normalized, i.e., $f(\emptyset)=0$ , the cost function (36) is slightly modified with a diagonal loading. The modified function satisfies the above properties, as stated in the following theorem.

Theorem 2.

The set function $f:2^{N}\rightarrow\mathbb{R}$ given by

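$$f(\mathcal{X})=\log\det\Big\{\sum\nolimits_{(i,j)\in\mathcal{X}\times\mathcal{X}}{\boldsymbol{\psi}}_{i,j}{\boldsymbol{\psi}}_{i,j}^{T}+\epsilon{\boldsymbol{I}}\Big\}-N\log\epsilon\qquad(37)$$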

is a normalized, nonnegative monotone, submodular function on the set $\mathcal{X}\subset\mathcal{N}$ . Here, $\epsilon>0$ is a small constant.

In (37), $\epsilon{\boldsymbol{I}}$ is needed to carry out the first few iterations of Algorithm 1 and $-N\log\epsilon$ ensures that $f(\emptyset)$ is zero. Using the result from [38] that the set function $g:2^{N}\rightarrow\mathbb{R}$ , given by

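$$g(\mathcal{X})=\log\det\Big\{\sum\nolimits_{i\in\mathcal{X}}{\boldsymbol{a}}_{i}{\boldsymbol{a}}_{i}^{T}+\epsilon{\boldsymbol{I}}\Big\}-N\log\epsilon\qquad(38)$$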

with column vectors $\{{\boldsymbol{a}}_{i}\}_{i=1}^{N}$ is a normalized, nonnegative monotone, submodular function on the set $\mathcal{X}\subseteq\mathcal{N}$ , we can prove Theorem 2. Therefore, the solution based on the greedy algorithm summarized in Algorithm 1 results in a $(1-1/e)$ -optimal solution for (34). Note that the numbers of summands in (38) and (37) are, respectively, $|\mathcal{X}|$ and $|\mathcal{X}|^{2}$ . It is worth mentioning that the greedy algorithm is linear in $K$ , while computing (37) remains the dominating cost.
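A minimal sketch of such a greedy selection is given below; it is not a transcription of Algorithm 1. It assumes that the rows of ${\boldsymbol{\Psi}}$ are ordered as ${\boldsymbol{\psi}}_{1,1},{\boldsymbol{\psi}}_{1,2},\cdots,{\boldsymbol{\psi}}_{N,N}$ and that the normalizing term uses the number of columns $M$ of ${\boldsymbol{\Psi}}$ (which equals $N$ for ${\boldsymbol{\Psi}}_{\rm s}$). The names `loaded_logdet` and `greedy_logdet_sampler` and the default value of $\epsilon$ are illustrative assumptions.

```python
import numpy as np

def loaded_logdet(Psi, nodes, eps):
    """Diagonally loaded log-det cost of a node subset (0-based indices)."""
    N2, M = Psi.shape
    N = int(round(np.sqrt(N2)))
    # Row i*N + j of Psi holds psi_{i,j}^T; keep all pairs (i, j) in X x X.
    idx = np.array([i * N + j for i in nodes for j in nodes], dtype=int)
    G = Psi[idx]
    _, logdet = np.linalg.slogdet(G.conj().T @ G + eps * np.eye(M))
    return logdet - M * np.log(eps)       # equals zero for the empty set

def greedy_logdet_sampler(Psi, K, eps=1e-2):
    """Add, K times, the node that yields the largest cost value."""
    N = int(round(np.sqrt(Psi.shape[0])))
    selected, values = [], []
    for _ in range(K):
        candidates = [s for s in range(N) if s not in selected]
        gains = [loaded_logdet(Psi, selected + [s], eps) for s in candidates]
        best = int(np.argmax(gains))
        selected.append(candidates[best])
        values.append(gains[best])
    return selected, values

rng = np.random.default_rng(0)
Psi = rng.standard_normal((25, 3))        # stand-in: N = 5 nodes, M = 3 unknowns
selected, values = greedy_logdet_sampler(Psi, K=3)
```

Each iteration evaluates the cost once per remaining node, so the number of cost evaluations grows linearly with $K$ for a fixed $N$; the sketch recomputes the determinant from scratch for clarity rather than speed.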

Other submodular functions that promote full-column rank model matrices, e.g., the frame potential [39] defined as $f({\boldsymbol{w}})={\rm tr}\{{\boldsymbol{T}}^{H}({\boldsymbol{w}}){\boldsymbol{T}}({\boldsymbol{w}})\}$ , are also reasonable costs to optimize. Finally, random subsampling (i.e., ${\boldsymbol{w}}$ has random $0$ or $1$ entries) is not suitable as it might not always result in a full-column rank model matrix.

VII-B Autoregressive case

The subsampling matrix for the spectral domain and moving average approaches can be designed offline because the observation matrix ${\boldsymbol{\Psi}}$ does not depend on the data, but only on the graphical model (i.e., either ${\boldsymbol{U}}$ or ${\boldsymbol{S}}$ ). In contrast, an optimal offline subsampler design for the autoregressive case is not possible because the observation matrix depends on the data, and choosing the best subset of nodes requires observations from all the nodes. This is the side effect of modeling the graph autoregressive signal as (21) to arrive at an elegant linear estimator.

Nevertheless, suppose the second-order statistics are available, e.g., from training data, estimated from subsampled observations using the nonparametric or moving average approach (where the sampler is designed using Algorithm 1 as discussed in Section VII-A), or by approximating the second-order statistics with white noise; then a suboptimal sampler can be designed with techniques similar to those in Section VII-A.

Alternatively, if a high-complexity non-linear estimator can be afforded, then by modeling the graph autoregressive process using (19), the dependence of the observation matrix on the data can be avoided [cf. (20)]. In that case, the subsampler can be designed offline using techniques in [34, 40].

We underline that the algorithms provided here to design sparse samplers for different cases can also be used to design mean squared error optimal sparse samplers for the compressive covariance sensing framework [18, 19, 20]. In other words, although minimal sparse rulers satisfy the identifiability conditions to reconstruct the second-order statistics of stationary time-series, the algorithms provided in this paper are needed to guarantee a desired reconstruction performance.

VIII Numerical Experiments

The developed framework of sampling on graphs for power spectrum estimation is illustrated with numerical experiments on synthetic as well as real datasets, as discussed next. Software and datasets to reproduce the results of this paper can be downloaded from http://cas.et.tudelft.nl/~sundeep/sw/jstsp16gpsd.zip.

Synthetic data (random graph)

For experiments using synthetic data, a random sensor graph with $N=100$ nodes is generated using the GSPBOX [41]. The generated graph topology can be seen in Figure 2, where the colored nodes represent the value of the graph signal for one realization. Graph stationary signals are generated by graph filtering zero-mean unit-variance white noise with a filter, which has a squared magnitude frequency response as shown in Figure 3(a) (labeled as “True graph power spectrum”); such a frequency response can be, for instance, approximated using a filter with $L=7$ coefficients. For the shift operator, we use the graph Laplacian matrix. We use $N_{s}=1000$ snapshots to form a sample covariance matrix, which we use in the experiments.
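The data generation described above can be sketched as follows. This is a hedged illustration and not the released software: the random graph is a simple stand-in for the GSPBOX sensor graph, the power spectrum `p_true` is a hypothetical smooth choice, and the sizes are reduced to keep the example light.

```python
import numpy as np

rng = np.random.default_rng(1)
N, Ns = 20, 1000                          # nodes and snapshots (N reduced here)

# Stand-in random undirected graph (the paper uses a GSPBOX sensor graph).
A = np.triu((rng.random((N, N)) < 0.3).astype(float), k=1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A            # graph Laplacian as the shift operator
lam, U = np.linalg.eigh(L)                # L = U diag(lam) U^T

p_true = 1.0 / (1.0 + lam)                # hypothetical true power spectrum
h_f = np.sqrt(p_true)                     # frequency response with |h_f|^2 = p

W = rng.standard_normal((N, Ns))          # zero-mean unit-variance white noise
X = U @ (h_f[:, None] * (U.T @ W))        # graph-filtered snapshots
R_hat = X @ X.T / Ns                      # sample covariance matrix
p_hat = np.diag(U.T @ R_hat @ U)          # spectrum estimate from all N nodes
```

With all nodes observed, the diagonal of the spectral covariance already estimates the power spectrum; the subsampled estimators work instead with the rows and columns of `R_hat` that correspond to the selected nodes.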

For the non-parametric model, using Algorithm 1, we first design the subsampler by selecting rows of the matrix ${\boldsymbol{\Psi}}_{\rm s}$ in a structured manner determined by ${\boldsymbol{w}}$ . We show in Figure 3(a), that the least squares estimate of the graph power spectrum obtained by observing $K=50$ out of $N=100$ nodes ( $50\%$ compression) fits reasonably well to the true power spectrum. In Figure 2(a), the selected graph nodes are indicated with a black circle. However, no particular sampling pattern can be seen here.

For the parametric moving average model, recall that the graph power spectrum is parameterized with $Q$ parameters; we use $Q=13$ in this example. As before, we perform a row subset selection of the matrix ${\boldsymbol{\Psi}}_{\rm MA}$ in a structured manner using Algorithm 1. We show in Figure 3(a), the (unconstrained) least squares estimate of the graph power spectrum computed using observations from $K=26$ nodes out of $N=100$ nodes ( $74\%$ compression). The sampling pattern in this case is shown in Figure 2(b). It can be seen that the greedy algorithm selects graph nodes in a clustered manner as the moving average model assumes that the power spectrum is smooth.

For the parametric autoregressive approach, the graph power spectrum is parameterized with $P=3$ parameters. In this case, we choose $K_{0}=1$ graph node (indicated with a red circle) having the largest degree and we also observe nodes in the $3$ -hop neighborhood of the selected node; the observed nodes (indicated with black circles) are shown in Figure 2(c). In this example, we observe $K=26$ nodes out of $N=100$ nodes to reconstruct the graph power spectrum. The least squares estimate of the G-AR power spectrum can be seen in Figure 3(a). Although we had to recover only $P=3$ parameters, we observe all the nodes in the $P$ -hop neighborhood of every selected node (i.e., we observe much more than $K_{0}P$ nodes).
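The node set observed in this setting can be computed with a breadth-first expansion from the largest-degree node, as in the following sketch. The helper `p_hop_neighborhood` and the toy path graph are illustrative assumptions.

```python
import numpy as np

def p_hop_neighborhood(A, seed, P):
    """Indices of all nodes within P hops of `seed`, including `seed`."""
    reached = np.zeros(A.shape[0], dtype=bool)
    reached[seed] = True
    frontier = reached.copy()
    for _ in range(P):
        # Neighbors of the current frontier that were not reached before.
        nxt = (A[frontier].sum(axis=0) > 0) & ~reached
        reached |= nxt
        frontier = nxt
    return np.flatnonzero(reached)

# Toy example: path graph 0-1-2-3-4-5.
A = np.diag(np.ones(5), 1)
A = A + A.T
seed = int(np.argmax(A.sum(axis=1)))      # largest degree (first index on ties)
nodes = p_hop_neighborhood(A, seed, P=2)
```

The size of the returned set depends on the local connectivity around the seed, which is why the number of observed nodes can be much larger than $K_{0}P$.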

In Figure 3, we also provide some performance results based on the synthetic dataset. In particular, we show, for different numbers of snapshots, the performance of the estimators in terms of the normalized mean squared error (NMSE) defined in dB as ${\rm NMSE}=10\log_{10}\,\sum_{m=1}^{N_{\rm exp}}\|{\boldsymbol{p}}-\widehat{{\boldsymbol{p}}}_{m}\|_{2}^{2}/(N_{\rm exp}\|{\boldsymbol{p}}\|_{2}),$ where $\widehat{{\boldsymbol{p}}}_{m}$ denotes the graph power spectrum estimate during the $m$ th Monte-Carlo experiment and $N_{\rm exp}$ is the number of Monte-Carlo experiments. Here, we use $N_{\rm exp}=1000$ .
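For concreteness, the NMSE can be computed as in the sketch below, which follows the expression as printed above, including its normalization by $N_{\rm exp}\|{\boldsymbol{p}}\|_{2}$. The function name `nmse_db` and the toy numbers are illustrative assumptions.

```python
import numpy as np

def nmse_db(p, p_hats):
    """NMSE in dB; p_hats holds one spectrum estimate per row."""
    p = np.asarray(p, dtype=float)
    p_hats = np.atleast_2d(np.asarray(p_hats, dtype=float))   # (N_exp, N)
    sq_err = np.sum((p_hats - p) ** 2, axis=1)                # per experiment
    return 10.0 * np.log10(sq_err.sum() / (p_hats.shape[0] * np.linalg.norm(p)))

p = np.array([3.0, 4.0])
p_hats = np.array([[3.0, 5.0], [4.0, 4.0]])   # two toy Monte-Carlo estimates
value = nmse_db(p, p_hats)
```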

To begin with, Figure 3(b) shows the performance of the developed least squares estimator for the nonparametric approach with $K=50$ (50 $\%$ compression), and with $K=100$ , i.e., no compression. For this example, we can see about a $4$ dB performance loss due to compression, and this gap reduces as $K$ increases. Furthermore, we can also see that, although the least squares estimator has the same slope as that of the Cramér-Rao lower bound (labeled as “CRLB (50 $\%$ compression)”), it does not achieve the Cramér-Rao lower bound. This gap can be reduced by using a weighted least squares estimator, but this incurs an additional computational cost due to inverting and updating the weighting matrix. For this particular scenario, although a full-column rank matrix $({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s}$ can be obtained for $K\geq 20$ , $K=20$ results in a very poor performance as ${\boldsymbol{\Psi}}_{\rm s}$ is highly sensitive to perturbations due to the finite sample effects. Nevertheless, the performance improves with the number of snapshots.

In Figure 3(c), we can see the performance of the moving average approach for $Q=13$ , for $K=10$ (90 $\%$ compression, which is also the maximum possible compression for this example), $K=26$ (74 $\%$ compression) and $K=100$ (i.e., no compression). As before, we see a performance loss due to compression, but also, as the number of snapshots increases, the performance saturates. This is due to the limited filter order, and the performance gets better with increasing filter order. However, increasing the filter order worsens the condition number of ${\boldsymbol{\Psi}}_{\rm MA}$ , and we might have to resort to singular value decomposition based techniques to solve the least squares problem (now we simply solve (31) using a QR factorization through MATLAB’s backslash “\” operator). For this example, a full-column rank matrix $({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm MA}$ is obtained for $K\geq 10$ . Such a high compression is possible because of the low value of $Q$ that is assumed to be known. Also, as compared to the non-parametric model, due to a smaller filter order, ${\boldsymbol{\Psi}}_{\rm MA}$ is less sensitive to perturbations. This can be seen in Figure 3(c), where we get a reasonable performance for the maximum possible compression with $K=10$ .

Finally, in Figure 3(d), we show the performance of the autoregressive model for $P=3$ with $K=1$ and $K=100$ , and for $P=6$ with $K=100$ ; in each case we solve (23) using least squares. Although we can see a similar behavior with respect to the performance loss due to compression and with respect to the error saturation due to a limited filter order, a more important thing to notice is that the autoregressive model has a similar performance as that of the moving average model, but with about $50\%$ fewer parameters.

Synthetic dataset (circulant graph)

We illustrate the graph sampling theory developed for circulant graphs using a Möbius ladder, which due to its structure finds applications within molecular chemistry (e.g., see [42]). A Möbius ladder with $N=80$ nodes is shown in Figure 4(a). This graph has a circulant adjacency matrix, which we use as the shift operator.

We have seen in Section IV that for such circulant graphs it is possible to elegantly compute the optimal sparse samplers. For $N=80$ , the minimal sparse rulers have length $K=15$ and one such (non-unique) sampling set is given by $\mathcal{K}=\{1,2,3,6,11,16,27,38,49,60,66,72,78,79,80\}$ ; see the corresponding selected nodes in Figure 4(a). Alternatively, we can also determine the sampling set using Algorithm 1; we show the selected nodes in Figure 4(b). Now, the question is: how does this greedily designed sparse sampler compare with the minimal sparse ruler? To answer this, we plot in Figure 4(c) the singular values (i.e., the spectrum) of ${\boldsymbol{T}}({\boldsymbol{w}})={\boldsymbol{\Psi}}_{\rm s}^{T}({\rm diag}[{\boldsymbol{w}}]\otimes{\rm diag}[{\boldsymbol{w}}]){\boldsymbol{\Psi}}_{\rm s}$ for ${\boldsymbol{w}}$ being the minimal sparse ruler and for ${\boldsymbol{w}}$ computed using the greedy submodular design. For this example, we can see that the resulting spectra from both sparse samplers are very similar, and that the greedy submodular design has a slightly worse condition number (i.e., the ratio of the maximal singular value to the minimal singular value).
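The sampling set above can be inspected with a short script. The sketch assumes the usual linear definition of a sparse ruler, namely a set of integer marks whose pairwise differences realize every lag from $0$ to $N-1$; Section IV may use a variant of this definition, and the helper name `missing_lags` is an assumption.

```python
def missing_lags(marks, n):
    """Lags in 0..n-1 that are not a difference of two marks."""
    diffs = {abs(a - b) for a in marks for b in marks}
    return sorted(set(range(n)) - diffs)

# Sampling set quoted in the text for the Moebius ladder with N = 80 nodes.
K_set = [1, 2, 3, 6, 11, 16, 27, 38, 49, 60, 66, 72, 78, 79, 80]
print(len(K_set), missing_lags(K_set, 80))

# Small sanity check on a classical four-mark ruler of length 6.
print(missing_lags([0, 1, 4, 6], 7))
```

An empty list of missing lags indicates that every lag is observed at least once by the selected nodes.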

Real dataset (climatology)

For the real dataset, we use temperature measurements collected across $32$ different weather stations in the French region of Brittany (this dataset was used in the context of stationary graph signal processing in [9, 10]; we would like to thank the authors of [10] for making this as well as the preprocessed USPS datasets public). A nearest neighbor graph is constructed as in [10] using the available coordinates of the weather stations such that each node has at least five neighbours. The reconstructed graph can be seen in Figure 5. Alternatively, the method suggested in [43] can be used to construct a sparse graph based on training data. There are $N_{s}=744$ observations (for 31 days and 24 observations per day) per weather station available. We use the adjacency matrix as the shift operator in this example.

We have removed the (sample) mean from each station independently, thus forcing the first moment to zero [10]. This way we artificially obtain ${\boldsymbol{m}}_{\boldsymbol{x}}=m_{\boldsymbol{x}}{\boldsymbol{u}}_{1}$ with $m_{\boldsymbol{x}}=0$ . After removing the mean, the temperature data records are nearly stationary on this graph, i.e., the sample covariance matrix (denoted by $\widehat{{\boldsymbol{R}}}_{\boldsymbol{x}}$ ) in the graph spectral domain (i.e., the spectral covariance matrix ${\boldsymbol{U}}^{H}\widehat{{\boldsymbol{R}}}_{\boldsymbol{x}}{\boldsymbol{U}}$ ) has most of its energy, i.e., about $89\%$ of the energy of ${\boldsymbol{U}}^{H}\widehat{{\boldsymbol{R}}}_{\boldsymbol{x}}{\boldsymbol{U}}$ , along the main diagonal; see the spectral covariance in Figure 5(d). The stationarity of this dataset on the shift operator increases when processing the so-called intrinsic mode functions of the temperature recordings instead of the raw data as detailed in [12], but we will simply use the mean-removed raw dataset here.
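This stationarity check can be sketched as follows. We assume here that "energy" means the sum of squared magnitudes of the matrix entries; the text does not state which norm is used, so the $89\%$ figure need not be reproduced by this exact measure. The function names are illustrative.

```python
import numpy as np

def remove_station_means(X):
    """Subtract the sample mean of each node (row) over the snapshots."""
    return X - X.mean(axis=1, keepdims=True)

def diagonal_energy_fraction(R, U):
    """Share of the energy of the spectral covariance U^H R U on its diagonal."""
    C = U.conj().T @ R @ U
    return np.sum(np.abs(np.diag(C)) ** 2) / np.sum(np.abs(C) ** 2)

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))    # toy orthonormal basis
R = U @ np.diag([1.0, 2.0, 3.0, 4.0]) @ U.T         # exactly stationary case
fraction = diagonal_energy_fraction(R, U)           # close to 1 by construction
```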

We carry out the same experiments as for the synthetic data. For the non-parametric and moving average approaches, the samplers are designed using a greedy algorithm as discussed in Section VII-A. In particular, for the non-parametric approach, we observe $K=20$ nodes out of $N=32$ nodes as shown with black circles in Figure 5(a). For the moving average approach, we use $Q=11$ , and observe $K=20$ out of $N=32$ nodes to recover the G-MA parameters. Finally, for the autoregressive approach, we model the graph power spectrum with $P=1$ scalar parameter. We select one node (i.e., $K_{0}=1$ ) that has the largest degree as indicated with a red circle in Figure 5(c), and we also observe nodes in the one-hop neighborhood of the selected node. So, we observe 9 nodes in total in this case. The uncompressed graph power spectrum computed from all the available temperature measurements as well as the least squares estimate of the graph power spectrum computed from the subsampled observations using the non-parametric and parametric approaches can be seen in Figure 5(e), where we can see that the shape of estimated power spectrum from different approaches is similar to that of the empirical graph power spectrum.

Real dataset (USPS handwritten digits)

Before concluding, we will demonstrate the potential of parametric modeling as well as sampling in the graph setting with an example using the USPS dataset, where we will focus only on digit 3 for the sake of illustration. We construct a 20 nearest neighbor graph with 50 images each containing $16\times 16$ pixels as in [10]. This means that the graph signal ${{\boldsymbol{x}}}$ is of length $256$ , where each pixel corresponds to a graph node, and the covariance matrix ${\boldsymbol{R}}_{\boldsymbol{x}}$ is of size $256\times 256$ . The stationarity of this dataset on such a graph has been demonstrated in [10]; see the diagonal dominance (with about $82\%$ of the energy in the diagonal entries) of the spectral covariance matrix in Figure 6(a).

We have seen in Section V that it is possible to model the graph power spectrum with fewer parameters, which means that (a) we need to store or transmit only a few parameters, and (b) we can achieve stronger compression rates. To illustrate this, we perform an experiment, where we view digit 3 of the USPS dataset as a realization of a graph second-order stationary signal obtained by graph filtering white noise using a graph moving average filter with $Q=7$ . In Figure 6(b), we show the empirical graph power spectrum computed from $50$ images and the graph power spectrum computed using the moving average method by sampling only $K=15$ pixels (94 $\%$ compression) as well as $K=256$ (i.e., no compression). That is to say, we can quickly learn the parameters of interest without visiting the entire training set. Next, based on the reconstructed graph power spectrum obtained by sampling $K=15$ pixels, we generate $25$ realizations of graph signals by graph filtering white noise, where the frequency response of the graph filter is simply computed as $h_{f,n}=|p_{n}|^{1/2}$ for $n=1,\ldots,N$ (here, we use the absolute value because we do not solve (31) with a nonnegativity constraint). These 25 realizations are shown in Figure 6(c), where we can see that the resulting signals have the shape of digit 3 corroborating that the signal is stationary on the nearest neighbor graph, and more importantly these signals can be generated from fewer parameters, which are estimated by observing only a small subset of pixels.
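The synthesis step can be sketched as below. The function name `draw_realizations` and the toy inputs are illustrative assumptions; in the experiment, `U` would be the eigenvector matrix of the shift operator of the nearest neighbor graph and `p_hat` the reconstructed power spectrum.

```python
import numpy as np

def draw_realizations(U, p_hat, n_draws, rng):
    """Filter white noise with frequency response h_{f,n} = |p_n|^{1/2}."""
    h_f = np.sqrt(np.abs(p_hat))          # absolute value guards negative estimates
    noise = rng.standard_normal((U.shape[0], n_draws))
    return U @ (h_f[:, None] * (U.conj().T @ noise))

rng = np.random.default_rng(0)
U = np.eye(3)                             # toy basis, for illustration only
p_hat = np.array([4.0, -1.0, 0.0])        # toy estimate with one negative entry
X = draw_realizations(U, p_hat, n_draws=5, rng=rng)   # one realization per column
```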

IX Concluding Remarks

In this paper we have focused on sampling and reconstructing the second-order statistics of stationary graph signals. The main contribution of the paper is that by observing a significantly smaller subset of vertices and using simple least squares estimators, we can reconstruct the second-order statistics of the graph signal from the subsampled observations, and more importantly, without any spectral priors. The results provided here generalize the compressive covariance sensing framework to the graph setting. Both a nonparametric approach as well as parametric approaches including moving average and autoregressive models for the graph power spectrum are discussed. A near-optimal low-complexity greedy algorithm is developed to design a sparse sampling matrix that selects the subset of graph nodes.

Appendix A Lemma 1: Rank of self Khatri-Rao products

By the definition in (1), ${\boldsymbol{U}}$ forms an orthogonal basis and hence has full rank. As a result, the sum $a_{1}{\boldsymbol{u}}_{1}+a_{2}{\boldsymbol{u}}_{2}+\cdots+a_{N}{\boldsymbol{u}}_{N}$ equals zero only when $a_{1}=a_{2}=\cdots=a_{N}=0$ .

The remainder of the proof is based on contradiction. Assume that the matrix $\bar{{\boldsymbol{U}}}\circ{\boldsymbol{U}}=[\bar{{\boldsymbol{u}}}_{1}\otimes{\boldsymbol{u}}_{1},\cdots,\bar{{\boldsymbol{u}}}_{N}\otimes{\boldsymbol{u}}_{N}]$ does not have full column rank. This means that the sum

[TABLE]

when one or more $b_{i}\bar{u}_{i,j}$ are nonzero. This is possible only if ${\boldsymbol{U}}$ is singular. Hence a contradiction, implying that ${\rm rank}(\bar{{\boldsymbol{U}}}\circ{\boldsymbol{U}})=N$ .

Appendix B Theorem 1: Conditions for a Valid Sampler

The rank of the product of two matrices ${\boldsymbol{A}}$ and ${\boldsymbol{B}}$ is given by [44] ${\rm rank}({\boldsymbol{A}}{\boldsymbol{B}})\leq\min\{{\rm rank}({\boldsymbol{A}}),{\rm rank}({\boldsymbol{B}})\},$ and equality holds if and only if ${\rm null}({\boldsymbol{A}})\cap{\rm ran}({\boldsymbol{B}})=\{{\boldsymbol{0}}\}$ , i.e., the two subspaces intersect only trivially.

We know from Lemma 2 that ${\rm rank}({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}})$ is $K^{2}$ if ${\rm rank}({\boldsymbol{\Phi}})=K$ and from Lemma 1 that ${\boldsymbol{\Psi}}_{\rm s}$ has full column rank. This implies that if $K^{2}\geq N$ , then $({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s}$ has full column rank provided that the null space of ${\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}$ (which is generated by the basis vectors in the null space of ${\boldsymbol{\Phi}}$ ) does not intersect with the space spanned by the columns of ${\boldsymbol{\Psi}}_{\rm s}$ .
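These rank conditions can be explored numerically, as in the following sketch. The random symmetric matrix standing in for the shift operator and the particular choice of ${\boldsymbol{w}}$ are illustrative assumptions; the script reports the ranks rather than proving anything.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
S = rng.standard_normal((N, N))
S = S + S.T                                # stand-in symmetric shift operator
_, U = np.linalg.eigh(S)                   # orthonormal eigenvectors

# Self Khatri-Rao product: column n is conj(u_n) kron u_n.
Psi_s = np.stack([np.kron(U[:, n].conj(), U[:, n]) for n in range(N)], axis=1)

w = np.array([1, 1, 0, 1, 1, 0])           # K = 4 nodes, so K^2 = 16 >= N
G = Psi_s[np.kron(w, w).astype(bool)]      # rows kept by Phi kron Phi
print(np.linalg.matrix_rank(Psi_s), np.linalg.matrix_rank(G))
```

If the second rank is smaller than $N$ for a candidate ${\boldsymbol{w}}$, that sampler is not valid for the given graph, even when $K^{2}\geq N$ holds.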

Bibliography

[1] S. P. Chepuri and G. Leus, “Subsampling for graph power spectrum estimation,” in IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), Rio de Janeiro, Brazil, July 2016.
[2] A.-L. Barabasi and Z. N. Oltvai, “Network biology: understanding the cell’s functional organization,” Nature Reviews Genetics, vol. 5, no. 2, pp. 101–113, 2004.
[3] E. Bullmore and O. Sporns, “Complex brain networks: graph theoretical analysis of structural and functional systems,” Nature Reviews Neuroscience, vol. 10, no. 3, pp. 186–198, 2009.
[4] R. Guimera, S. Mossa, A. Turtschi, and L. N. Amaral, “The worldwide air transportation network: Anomalous centrality, community structure, and cities’ global roles,” Proc. of the National Acad. of Sciences, vol. 102, no. 22, pp. 7794–7799, 2005.
[5] M. O. Jackson, Social and economic networks. Princeton University Press, 2010.
[6] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, 2013.
[7] A. Sandryhaila and J. M. Moura, “Big data analysis with signal processing on graphs: Representation and processing of massive data sets with irregular structure,” IEEE Signal Process. Mag., vol. 31, no. 5, pp. 80–90, 2014.
[8] ——, “Discrete signal processing on graphs,” IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, 2013.
