Network Weight Estimation for Binary-Valued Observation Models

Yu Xing; Xingkang He; Haitao Fang; Karl Henrik Johansson

arXiv:1903.07350·cs.SY·March 19, 2019


TL;DR

This paper introduces a recursive estimation algorithm for network weights in systems with binary, quantized observations, addressing challenges posed by unknown quantization and system coupling, with proven consistency and applicability to real-time tasks.

Contribution

It presents a novel recursive estimation method using stochastic approximation for systems with binary observations, ensuring strong consistency and handling unknown quantization effects.

Findings

1. The proposed algorithm is strongly consistent.

2. The objective function is strictly concave and has a unique maximum.

3. The algorithm is applicable to online real-time decision-making and surveillance.


Equations

$Y_{t}$, $S_{t}$, $\tilde{Y}_{t}$, $l(n;\theta)$

$$P\{\tilde{S}_{t+1}=(\bm{s}_{t+1}^{T}~\bm{s}_{t}^{T})^{T}\mid\tilde{S}_{t}=(\bm{s}_{t}^{T}~\bm{s}_{t-1}^{T})^{T}\}=P\{S_{t+1}=\bm{s}_{t+1}\mid S_{t}=\bm{s}_{t}\}.$$

$$P\{\tilde{S}_{t+1}=(\bm{s}^{T}~\bm{u}^{T})^{T}\mid\tilde{S}_{t}=(\bm{x}^{T}~\bm{y}^{T})^{T}\}>0,$$

$$P\{\bar{S}=\bar{\bm{s}}\mid S=\bm{s}\}=P\{S_{1}=\bar{\bm{s}}\mid S_{0}=\bm{s}\},$$

$$\begin{aligned}l(n;\theta)&=\log P\{S_{t}=\bm{s}^{t},0\leq t\leq T\}\\&=\log\Big(\prod_{1\leq t\leq T}P\{S_{t}=\bm{s}^{t}\mid S_{t-1}=\bm{s}^{t-1}\}\,P\{S_{0}=\bm{s}^{0}\}\Big)\\&=\log P\{S_{0}=\bm{s}^{0}\}+\sum_{1\leq t\leq T}\log P\{S_{t}=\bm{s}^{t}\mid S_{t-1}=\bm{s}^{t-1}\}\\&=\log P\{S_{0}=\bm{s}^{0}\}+\sum_{1\leq t\leq T}\sum_{1\leq i\leq n}\log g_{i}(\tilde{\bm{s}}^{t}\mid\theta^{(i)}),\end{aligned}$$

$$\lim_{T\to\infty}\frac{1}{T}\sum_{1\leq t\leq T}\sum_{1\leq i\leq n}\log g_{i}(\tilde{S}_{t}\mid\theta^{(i)})=E\Big\{\sum_{1\leq i\leq n}\log g_{i}(\tilde{S}\mid\theta^{(i)})\Big\},$$

$$\lim_{T\to\infty}\frac{1}{T}\sum_{1\leq t\leq T}\sum_{1\leq i\leq n}\nabla_{\theta^{(i)}}\log g_{i}(\tilde{S}_{t}\mid\theta^{(i)})=E\Big\{\sum_{1\leq i\leq n}\nabla_{\theta^{(i)}}\log g_{i}(\tilde{S}\mid\theta^{(i)})\Big\},$$

$$E\Big\{\sum_{1\leq i\leq n}\log g_{i}(\tilde{S}\mid\theta^{(i)})\Big\}=\sum_{1\leq i\leq n}E\big\{\log g_{i}(\tilde{S}\mid\theta^{(i)})\big\}$$

$$K_{i}(\theta^{(i)},\tilde{S}_{t+1}):=\nabla_{\theta^{(i)}}\log g_{i}(\tilde{S}_{t+1}\mid\theta^{(i)}),$$

$$K(\theta,\tilde{S}_{t+1}):=\big(K_{1}(\theta^{(1)},\tilde{S}_{t+1})^{T},\dots,K_{n}(\theta^{(n)},\tilde{S}_{t+1})^{T}\big)^{T},$$

$$\theta_{t+1}=\theta_{t}+a_{t}K(\theta_{t},\tilde{S}_{t+1}),$$

$$\tilde{A}=\begin{pmatrix}0.220&0.120&0.360&0.300\\0.147&0.215&0.344&0.294\\0&0&1&0\\0.090&0.178&0.446&0.286\end{pmatrix}.$$

$$\begin{aligned}P\{S_{1}=\bm{s}\mid S_{0}=\bm{u}\}&=P\{A_{i}S_{0}+D_{1,i}>c_{i}~\forall i\text{ s.t. }s_{i}=1,\ A_{j}S_{0}+D_{1,j}\leq c_{j}~\forall j\text{ s.t. }s_{j}=0\mid S_{0}=\bm{u}\}\\&=P\{A_{i}\bm{u}+D_{1,i}>c_{i}~\forall i\text{ s.t. }s_{i}=1,\ A_{j}\bm{u}+D_{1,j}\leq c_{j}~\forall j\text{ s.t. }s_{j}=0\mid S_{0}=\bm{u}\}\\&=P\{A_{i}\bm{u}+D_{1,i}>c_{i}~\forall i\text{ s.t. }s_{i}=1,\ A_{j}\bm{u}+D_{1,j}\leq c_{j}~\forall j\text{ s.t. }s_{j}=0\}\\&=\prod_{1\leq i\leq n}\big(1-F(c_{i}-A_{i}\bm{u})\big)^{s_{i}}F(c_{i}-A_{i}\bm{u})^{1-s_{i}}>0,\end{aligned}$$

$$P\{\tilde{S}=(\bar{\bm{s}}^{T}~\bm{s}^{T})^{T}\}=\sum_{\tilde{\bm{s}}\in\{0,1\}^{2n}}P\{\tilde{S}=\tilde{\bm{s}}\}\tilde{P}(\tilde{\bm{s}},(\bar{\bm{s}}^{T}~\bm{s}^{T})^{T}).$$

$$\mathscr{S}_{1}:=\{\tilde{\bm{s}}\in\{0,1\}^{2n}:\text{the first }n\text{ entries of }\tilde{\bm{s}}\text{ are }\bm{s}\},$$

$$P\{\tilde{S}=(\bar{\bm{s}}^{T}~\bm{s}^{T})^{T}\}=\sum_{\tilde{\bm{s}}\in\mathscr{S}_{1}}P\{\tilde{S}=\tilde{\bm{s}}\}\tilde{P}(\tilde{\bm{s}},(\bar{\bm{s}}^{T}~\bm{s}^{T})^{T}).$$

$$P\{S=\bm{s}\}=\sum_{\tilde{\bm{s}}^{1}\in\mathscr{S}_{1}}\sum_{\tilde{\bm{s}}^{2}\in\mathscr{S}_{2}}P\{\tilde{S}=\tilde{\bm{s}}^{1}\}\tilde{P}(\tilde{\bm{s}}^{1},\tilde{\bm{s}}^{2}),$$

$$\mathscr{S}_{2}:=\{\tilde{\bm{s}}\in\{0,1\}^{2n}:\text{the last }n\text{ entries of }\tilde{\bm{s}}\text{ are }\bm{s}\}.$$

$$P\{\tilde{S}=(\bar{\bm{s}}^{T}~\bm{s}^{T})^{T}\}=\sum_{\tilde{\bm{s}}\in\mathscr{S}_{1}}P\{\tilde{S}=\tilde{\bm{s}}\}P(\bm{s},\bar{\bm{s}}),$$

$$P\{S=\bm{s}\}=\sum_{\tilde{\bm{s}}^{1}\in\mathscr{S}_{1}}\sum_{\bar{\bm{s}}^{2}\in\{0,1\}^{n}}P\{\tilde{S}=\tilde{\bm{s}}^{1}\}P(\bm{s},\bar{\bm{s}}^{2}),$$

$$P\{\bar{S}=\bar{\bm{s}}\mid S=\bm{s}\}$$


Taxonomy

Topics: Opinion Dynamics and Social Influence · Complex Network Analysis Techniques · Distributed Sensor Networks and Detection Algorithms

Full text


Yu Xing, Xingkang He, Haitao Fang, Karl Henrik Johansson

This work is supported by National Key R&D Program of China (2016YFB0901900), the National Natural Science Foundation of China (61573345), Knut & Alice Wallenberg foundation of Sweden, and Swedish Research Council. Yu Xing and Haitao Fang are with Key Lab of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, and School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, P. R. China. Xingkang He and Karl Henrik Johansson are with Division of Decision and Control Systems, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, SE-10044 Stockholm, Sweden.

Abstract

This paper studies the estimation of network weights for a class of systems with binary-valued observations. In these systems, only quantized observations are available for network estimation. Furthermore, system states are coupled with observations, and the quantization parts are unknown inherent components, which hinder the design of inputs and quantizers. To carry out the estimation, we propose a recursive algorithm based on stochastic approximation techniques. More precisely, to deal with the temporal dependency of observations and achieve the recursive estimation of network weights, a deterministic objective function is constructed based on the likelihood function by extending the dimension of observations and applying ergodic properties of Markov chains. It is shown that this function is strictly concave and has a unique maximum identical to the true parameter vector. Finally, the strong consistency of the algorithm is established. Our recursive algorithm can be applied to online tasks like real-time decision-making and surveillance for networked systems. This work also provides a new scheme for the identification of systems with quantized observations.

I INTRODUCTION

Network estimation for dynamical systems is a fundamental problem in diverse domains such as bioinformatics, communication, and social networks. For example, knowledge of gene regulatory networks can deepen our understanding of diseases and development [1]. Moreover, relationship networks among individuals contain information about group structures, which is crucial for predicting group behavior [2]. There are various formulations of network estimation, e.g., topological inference [3] and latent node identification [4]. This paper focuses on the former, and we define networks as weighted graphs.

The estimation of network weights has attracted multidisciplinary attention over the last decades. [3] reviews methods for recovering complex networks from nonlinear dynamics. Also for nonlinear systems, [5] uses input design and a passivity approach to solve the estimation problem. Network estimation for consensus dynamics is considered in [6], where the estimation problem is converted into a convex optimization problem. Many network estimation methods have also been developed for opinion dynamics such as the DeGroot and Friedkin-Johnsen models, including compressed sensing [2], vector autoregressive processes [7], and least-squares algorithms [8].

Most existing works concentrate on systems with continuous observations. In practical scenarios, however, agents often present discrete outputs rather than continuous ones [9, 10]. For instance, binary-valued signals may be the only information transmitted and observed in communication networks because of limited storage and bandwidth resources. Therefore, the study of network estimation for systems with quantized observations is necessary. To tackle this challenge, we resort to identification methods for quantized output systems.

The identification of quantized systems has developed rapidly in recent years. Based on full-rank periodic inputs, [11] introduces the optimal quasi-convex combination estimator. [12] replaces the full-rank periodic input assumption with general quantized inputs. Under conditions of sufficiently rich inputs and prior knowledge of parameters, [13, 14] study a recursive projection algorithm for finite impulse response (FIR) systems. Besides, input conditions can be relaxed by designing adaptive quantizers [15, 16]. Expectation-maximization (EM) algorithms are used to solve maximum likelihood estimation (MLE) problems for FIR systems in [17] and for ARX systems in [18], but they are batch algorithms. Finally, [19] investigates recursive identification of systems with binary outputs and ARMA noises using stochastic approximation (SA) algorithms.

In this paper, we study the estimation of network weights for a class of binary-valued observation systems, which may not allow the design of inputs and quantizers. In these systems, agents present binary-valued outputs, which can be interpreted as true/false or active/inactive signals, and update their states based on these binary outputs. An example is quantized opinion dynamics [10], in which agents display discrete opinions and update based on these quantized values. Other examples can be found in quantized consensus algorithms for engineering [9] and human face-to-face interactions [20]. This update rule implies that system states are coupled with observations that cannot be modeled as selected or i.i.d. inputs as in [12, 13, 19]. Additionally, the quantization parts of the systems are unknown inherent components and cannot be designed like in [15, 16].

Our contributions are summarized as follows. We formulate a dynamical model over networks with binary-valued observations. The stability of outputs and the identifiability of the model are investigated in detail. To estimate network weights for this model, a recursive algorithm based on SA techniques [21] is proposed. More precisely, to deal with the temporal dependency of observations and achieve the recursive estimation of network weights, a deterministic objective function is constructed based on the likelihood function by extending the dimension of observations and applying ergodic properties of Markov chains. It is shown that this function is strictly concave and has a unique maximum identical to the true parameter vector. Finally, the strong consistency of the proposed algorithm is established. Unlike the batch algorithms solving MLE problems in [17, 18, 22], our recursive algorithm can be applied to online tasks like real-time decision-making and surveillance for networked systems. This work also provides a new scheme for the identification of systems with quantized observations.

The remainder of this paper is organized as follows. Section II introduces some notations. We formulate the estimation problem in Section III, and study the model and its identifiability in Section IV. The estimation algorithm and numerical simulations are given in Section V. Section VI concludes the paper.

II NOTATIONS

In this paper, we use boldfaced lower-case or Greek letters to represent column vectors. Their entries are represented by lower-case letters with corresponding subscripts, e.g., $a_{i}$ is the $i$ -th entry of $\bm{a}$ . Matrices and random vectors are written as upper-case letters such as $A$ and $X$ , but we will not emphasize the meaning unless this causes ambiguity. The expectation of a random variable $X$ is denoted by $E\{X\}$ .

For a matrix $A$, its entries, rows, and transpose are denoted by $a_{ij}$, $A_{i}$, and $A^{T}$, respectively. For a sequence of random vectors, say $\{X_{t}\}_{t\geq 0}$, $X_{k,i}$ is used to represent the $i$-th entry of $X_{k}$. Denote $|\bm{a}|=(|a_{1}|,\dots,|a_{n}|)^{T}$ and $|A|=(|a_{ij}|)$, where $|x|$ is the absolute value of the real number $x$. The $n$-length all-zeros and all-ones vectors are written as $\bm{0}_{n}$ and $\bm{1}_{n}$, or simply $\bm{0}$ and $\bm{1}$. The symbol $\bm{e}_{i}$ denotes the unit vector whose $i$-th entry is $1$. Denote $a\vee b:=\max\{a,b\}$ and $a\wedge b:=\min\{a,b\}$. We use $\{0,1\}^{m}:=\times_{i=1}^{m}\{0,1\}$ to represent the Cartesian product of $m$ identical binary sets $\{0,1\}$.

For a Markov chain $\{X_{t}\}$ in $\Omega$ , the transition probability from $x$ to $y$ is $P(x,y)=P\{X_{1}=y|X_{0}=x\}$ , and the t-step transition probability from $x$ to $y$ is $P^{t}(x,y)=P\{X_{t}=y|X_{0}=x\}$ , $x,y\in\Omega$ . We say that $y$ is reachable from $x$ , if there exists $t\geq 1$ such that $P^{t}(x,y)>0$ .

The Markov chain is said to be irreducible if $y$ is reachable from $x$ for all $x,y\in\Omega$. The greatest common divisor of the set $\{t\geq 1:P^{t}(x,x)>0\}$ is called the period of $x$, denoted by $d(x)$. The Markov chain is aperiodic if $d(x)=1$ for all $x\in\Omega$. We call a probability distribution $\pi$ on $\Omega$ a stationary distribution if $\pi(y)=\sum_{x\in\Omega}\pi(x)P(x,y)$ for all $y\in\Omega$.

III PROBLEM FORMULATION

In the sequel, suppose that the network size $n\geq 2$ . The binary observation model is as follows,

$$Y_{t}=AS_{t-1}+D_{t},\qquad S_{t}=\mathcal{Q}(Y_{t},\bm{c}),\tag{1}$$

where $t\geq 1$ , $Y_{t}=(Y_{t,1},\dots,Y_{t,n})^{T}$ , $D_{t}=(D_{t,1},\dots,D_{t,n})^{T}$ , $S_{t}=(S_{t,1},\dots,S_{t,n})^{T}$ are the state vector, the disturbance, and the observation vector at time $t$ respectively. $A$ is the weight matrix, and $\bm{c}=(c_{1},\dots,c_{n})^{T}$ is the unknown quantized threshold vector. $\mathcal{Q}(Y_{t},\bm{c})=(\mathbb{I}_{[Y_{t,1}>c_{1}]},\dots,\mathbb{I}_{[Y_{t,n}>c_{n}]})^{T}$ is the quantizer. Here $\mathbb{I}_{A}(x)$ is the indicator function such that $\mathbb{I}_{A}(x)=1$ for $x\in A$ and $\mathbb{I}_{A}(x)=0$ for $x\not\in A$ .
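To make the data-generating process concrete, here is a minimal simulation sketch of model (1) in Python (standard library only). It assumes the model reads $Y_{t}=AS_{t-1}+D_{t}$, $S_{t}=\mathcal{Q}(Y_{t},\bm{c})$, which is the form used in the Appendix, with standard normal disturbances as in Assumption 1. The function name `simulate` and the two-agent parameters are illustrative choices, not values from the paper. Only the binary outputs $S_{t}$ are returned, since the states $Y_{t}$ are not observed.

```python
import random

def simulate(A, c, T, s0=None, seed=0):
    """Return the binary observations S_0, ..., S_T of model (1).

    Assumes Y_t = A S_{t-1} + D_t with i.i.d. N(0, 1) disturbances and
    S_{t,i} = 1 if Y_{t,i} > c_i, else 0.
    """
    rng = random.Random(seed)
    n = len(c)
    s = list(s0) if s0 is not None else [0] * n  # initial observation S_0
    path = [tuple(s)]
    for _ in range(T):
        # hidden states Y_t are driven by the previous binary outputs
        y = [sum(A[i][j] * s[j] for j in range(n)) + rng.gauss(0.0, 1.0)
             for i in range(n)]
        s = [1 if y[i] > c[i] else 0 for i in range(n)]  # quantizer Q(Y_t, c)
        path.append(tuple(s))
    return path

# illustrative two-agent network with one negative (antagonistic) weight
path = simulate(A=[[0.5, -0.3], [0.2, 0.4]], c=[0.1, -0.2], T=10)
```

Because only `path` is kept, any estimator built on top of this sketch sees exactly the information assumed in the paper: past and present binary outputs.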

In this model, the outputs rather than states or inputs are available for individual updates. This takes place in a variety of systems such as quantized opinion dynamics [10], human face-to-face interactions [23, 20], and quantized consensus algorithms [9]. Our main aim in this paper is to estimate the network weight matrix $A$ and the quantization threshold vector $\bm{c}$ . We propose a recursive algorithm based on stochastic approximation techniques, and prove the strong consistency of the algorithm.

For weight matrix $A$ , the $ij$ -th entry represents the influence weight of $j$ to $i$ . To cover more situations, we do not assume that the row sums of $A$ are 1. Negative weights are permitted, which represent antagonistic relationships. Without loss of generality, we assume that $|A|$ has no row with zero sum, i.e., $|A_{i}|\bm{1}>0$ for all $i$ , which means that every agent has certain connections with others.

Although the observations in this paper are binary, the model is still rich enough to characterize diverse scenarios. For example, in the voter model [24], agents have only two choices, i.e., to vote ($1$) or not ($0$), and in human-human interactions, speaking or not can be defined as the individual output [20].

The disturbance $D_{t}$ can be interpreted as the unmodeled part of the process or the summation of observation noises. We give the following standard normal assumption for it. The normal distribution assumption is not unusual for quantized systems, since it facilitates the approximation of the MLE [17, 18, 22].

Assumption 1

$\{D_{t,i}\}_{1\leq i\leq n,t\geq 1}$ are i.i.d. standard normal random variables.

IV THE MODEL AND THE IDENTIFIABILITY

IV-A STOCHASTIC STABILITY

This section investigates the stability of observations and the identifiability of the model in detail.

By (1), the observation sequence $\{S_{t}\}$ is a Markov chain with a finite state space. The existence of stationary distributions is a significant aspect of the stochastic stability of Markov chains [25], and Assumption 1 guarantees the stability of the observations of our model, as the following result shows.

Theorem 1

Suppose that Assumption 1 holds. The Markov chain $\{S_{t}\}_{t\geq 0}$ defined by (1) is irreducible and aperiodic, and hence, from any initial distribution, it converges in distribution to its unique stationary distribution, which is positive on $\{0,1\}^{n}$.

Define $\tilde{S}_{t}:=(S_{t}^{T}~{}S_{t-1}^{T})^{T}$ , $t\geq 1$ . This chain is critical for our estimation. Note that $\{\tilde{S}_{t}\}_{t\geq 1}$ taking values in $\{0,1\}^{2n}$ is also a Markov chain. For $t\geq 1$ and $\bm{s}_{t-1},\bm{s}_{t}$ , $\bm{s}_{t+1}\in\{0,1\}^{n}$ ,

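$$P\{\tilde{S}_{t+1}=(\bm{s}_{t+1}^{T}~\bm{s}_{t}^{T})^{T}\mid\tilde{S}_{t}=(\bm{s}_{t}^{T}~\bm{s}_{t-1}^{T})^{T}\}=P\{S_{t+1}=\bm{s}_{t+1}\mid S_{t}=\bm{s}_{t}\}.\tag{2}$$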

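Since every transition probability of $\{S_{t}\}$ is positive (see the proof of Theorem 1), (2) gives $P\{\tilde{S}_{t+1}=(\bm{s}^{T}~\bm{s}^{T})^{T}\mid\tilde{S}_{t}=(\bm{s}^{T}~\bm{s}^{T})^{T}\}=P(\bm{s},\bm{s})>0$, so every state of the form $(\bm{s}^{T}~\bm{s}^{T})^{T}$ has period $1$. Once irreducibility is shown below, $\{\tilde{S}_{t}\}$ is therefore aperiodic, because all states of an irreducible chain share the same period. For states $(\bm{s}^{T}~\bm{u}^{T})^{T},(\bm{x}^{T}~\bm{y}^{T})^{T}\in\{0,1\}^{2n}$, since $\{S_{t}\}$ is irreducible, there exists $t\geq 1$ such that $P^{t}(\bm{x},\bm{u})>0$. Moreover, $P(\bm{u},\bm{s})>0$ holds by the proof of Theorem 1. Hence it follows from (2) that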

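$$P\{\tilde{S}_{t+2}=(\bm{s}^{T}~\bm{u}^{T})^{T}\mid\tilde{S}_{1}=(\bm{x}^{T}~\bm{y}^{T})^{T}\}=P^{t}(\bm{x},\bm{u})P(\bm{u},\bm{s})>0,$$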

which implies that $\{\tilde{S}_{t}\}$ is also irreducible, and further we have the following result:

Theorem 2

Suppose that Assumption 1 holds. The Markov chain $\{\tilde{S}_{t}\}_{t\geq 1}$ converges in distribution, from any initial distribution, to its unique stationary distribution, which is positive on $\{0,1\}^{2n}$.

The next lemma illustrates the relation between $\{S_{t}\}$ and the stationary distribution of $\{\tilde{S}_{t}\}$ .

Lemma 1

Suppose that Assumption 1 holds, and $\tilde{S}$ is distributed according to the stationary distribution of $\{\tilde{S}_{t}\}$. Then

$$P\{\bar{S}=\bar{\bm{s}}\mid S=\bm{s}\}=P\{S_{1}=\bar{\bm{s}}\mid S_{0}=\bm{s}\}$$

for all $\bar{\bm{s}},\bm{s}\in\{0,1\}^{n}$ , where $\bar{S}$ and $S$ are the first and last $n$ entries of $\tilde{S}$ respectively, i.e., $\tilde{S}=(\bar{S}^{T}~{}S^{T})^{T}$ .
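Lemma 1 can be checked numerically on a small example. The Python sketch below (standard library only) builds the transition matrix of $\{S_{t}\}$ from the product formula $P(\bm{u},\bm{s})=\prod_{i}(1-\Phi(c_{i}-A_{i}\bm{u}))^{s_{i}}\Phi(c_{i}-A_{i}\bm{u})^{1-s_{i}}$, forms the transition matrix of $\tilde{S}_{t}=(S_{t}^{T}~S_{t-1}^{T})^{T}$ as in (2), approximates its stationary distribution by power iteration, and compares $P\{\bar{S}=\bar{\bm{s}}\mid S=\bm{s}\}$ with $P(\bm{s},\bar{\bm{s}})$. The parameters and helper names are illustrative assumptions; power iteration is our choice of solver, which converges here because the chain is irreducible and aperiodic (Theorem 2).

```python
import itertools
import math

def Phi(z):
    """Standard normal c.d.f."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def transition_matrix(A, c):
    """P[k][m] = P{S_1 = states[m] | S_0 = states[k]} under Assumption 1."""
    n = len(c)
    states = list(itertools.product((0, 1), repeat=n))

    def z(i, u):  # c_i - A_i u
        return c[i] - sum(A[i][j] * u[j] for j in range(n))

    # 1 - Phi(z) is written as Phi(-z)
    P = [[math.prod(Phi(-z(i, u)) if s[i] else Phi(z(i, u)) for i in range(n))
          for s in states] for u in states]
    return states, P

def augmented_chain(states, P):
    """Transition matrix of S~_t; a state is a pair (a, b) = (S_t, S_{t-1})."""
    idx = {s: k for k, s in enumerate(states)}
    pairs = [(a, b) for a in states for b in states]
    # (a, b) -> (a2, b2) is possible only if b2 == a, with probability P(a, a2), cf. (2)
    Pt = [[P[idx[a]][idx[a2]] if b2 == a else 0.0 for (a2, b2) in pairs]
          for (a, b) in pairs]
    return pairs, Pt

def stationary(M, iters=2000):
    """Stationary distribution of the row-stochastic matrix M by power iteration."""
    m = len(M)
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[x] * M[x][y] for x in range(m)) for y in range(m)]
    return pi

states, P = transition_matrix(A=[[0.5, -0.3], [0.2, 0.4]], c=[0.1, -0.2])
pairs, Pt = augmented_chain(states, P)
pi_t = stationary(Pt)
idx = {s: k for k, s in enumerate(states)}

# Lemma 1: P{S_bar = a | S = b} should equal P(b, a) = P{S_1 = a | S_0 = b}
max_gap = 0.0
for k, (a, b) in enumerate(pairs):
    p_b = sum(pi_t[m] for m, (_, b2) in enumerate(pairs) if b2 == b)
    max_gap = max(max_gap, abs(pi_t[k] / p_b - P[idx[b]][idx[a]]))
```

A gap at the level of floating-point error is what Lemma 1 predicts; it is a sanity check on one example, not a proof.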

IV-B IDENTIFIABILITY

One of the central concerns in system identification is whether different parameter values can determine an identical model [26]. For model (1), when the distribution of the disturbances is fixed in advance, the answer is negative, as the following result shows.

Theorem 3

Suppose that Assumption 1 holds. Then distinct parameters $(A~{}\bm{c})$ correspond to distinct Markov chains $\{S_{t}\}$ defined by (1), where $(A~{}\bm{c})$ is the parameter matrix of dimension $n\times(n+1)$. That is to say, for two parameter matrices $(A~{}\bm{c})$ and $(\hat{A}~{}\hat{\bm{c}})$ such that $a_{ij}\not=\hat{a}_{ij}$ or $c_{i}\not=\hat{c}_{i}$ for some $i,j\in\{1,2,\dots,n\}$, the corresponding Markov chains $\{S_{t}\}$ and $\{\hat{S}_{t}\}$ are not the same in the sense that their transition probability matrices differ.

If the noise assumption is relaxed to i.i.d. normal random variables with zero mean and variance $\sigma^{2}$, then the noise distribution function is $F(x)=\Phi(\frac{x}{\sigma})$, where $\Phi(\cdot)$ is the cumulative distribution function (c.d.f.) of the standard normal distribution. It follows from the proof of Theorem 3 that ${c_{i}}/{\sigma}={\hat{c}_{i}}/{\hat{\sigma}}$ and ${a_{ij}}/{\sigma}={\hat{a}_{ij}}/{\hat{\sigma}}$ for all $1\leq i,j\leq n$. This implies that model (1) is unique only up to a common scaling of the parameters and the noise standard deviation. If the quantization thresholds $c_{i}\not=0$ are known for all $i$, the model is uniquely determined. This is not true in general, but we can assume that $\sigma=1$: the only concern is the proportion of network weights that each agent gives to different agents, and this proportion remains the same when the weight matrix $A$ is multiplied by a diagonal matrix with nonzero diagonal entries.
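The scale ambiguity can be seen directly in the transition probabilities. The Python sketch below assumes that the probability that agent $i$ outputs $1$ given $S_{t-1}=\bm{u}$ is $1-F(c_{i}-A_{i}\bm{u})$ with $F(x)=\Phi(x/\sigma)$, and checks that multiplying $A_{i}$, $c_{i}$, and $\sigma$ by the same factor leaves this probability unchanged. The row, threshold, factor, and the helper name `prob_one` are arbitrary illustrative choices.

```python
import math

def Phi(z):
    """Standard normal c.d.f."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def prob_one(A_i, c_i, u, sigma):
    """P{S_{t,i} = 1 | S_{t-1} = u} when D_{t,i} ~ N(0, sigma^2), i.e. F(x) = Phi(x / sigma)."""
    z = c_i - sum(a * x for a, x in zip(A_i, u))
    return 1.0 - Phi(z / sigma)

A_i, c_i = [0.5, -0.3, 0.8], 0.1  # illustrative row of A and threshold
k = 3.0                            # common scaling factor
gaps = [abs(prob_one(A_i, c_i, u, 1.0)
            - prob_one([k * a for a in A_i], k * c_i, u, k))
        for u in ([0, 0, 0], [1, 0, 1], [1, 1, 1])]
```

Since every transition probability depends on $(A,\bm{c},\sigma)$ only through $A/\sigma$ and $\bm{c}/\sigma$, fixing $\sigma=1$ removes this ambiguity.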

In the literature on quantized consensus and opinion dynamics [9, 10], the influence weight matrix is assumed to be row stochastic ($A_{i}\bm{1}=1$ for all $1\leq i\leq n$, and $a_{ij}\geq 0$ for all $1\leq i,j\leq n$) or absolutely row stochastic ($|A_{i}|\bm{1}=1$ for all $1\leq i\leq n$). Our model can in fact capture this assumption. Indeed, denoting by $B=\text{diag}(a^{1},\dots,a^{n})$ the diagonal matrix with diagonal entries $a^{1},\dots,a^{n}$, where $a^{i}=|A_{i}|\bm{1}$, (1) can be written as

$$\tilde{Y}_{t}=\tilde{A}S_{t-1}+\tilde{D}_{t},\qquad S_{t}=\tilde{\mathcal{Q}}(\tilde{Y}_{t}),\tag{3}$$

where $\tilde{Y}_{t}=B^{-1}Y_{t}$ , $\tilde{A}=B^{-1}A$ , $\tilde{D}_{t}=B^{-1}D_{t}$ , and $\tilde{\mathcal{Q}}(\tilde{Y}_{t})=(\mathbb{I}_{[\tilde{Y}_{t,1}>\tilde{c}_{1}]},\dots,\mathbb{I}_{[\tilde{Y}_{t,n}>\tilde{c}_{n}]})^{T}$ . Here $\tilde{c}_{i}=(a^{i})^{-1}c_{i}$ , and $B^{-1}$ exists since $|A_{i}|\bm{1}>0$ . So $\tilde{A}$ is absolutely row stochastic in (3), and $\tilde{D}_{t,i}$ , $1\leq i\leq n$ , become heterogeneous Gaussian noises with different variances. Under this condition, the identifiability still holds.
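A minimal sketch of the change of variables leading to (3), assuming standard normal $D_{t}$ as in Assumption 1. The Python function below (a hypothetical helper, not from the paper) computes $a^{i}=|A_{i}|\bm{1}$, $\tilde{A}=B^{-1}A$, $\tilde{c}_{i}=c_{i}/a^{i}$, and the standard deviations $1/a^{i}$ of the heterogeneous noises $\tilde{D}_{t,i}$.

```python
def normalize(A, c):
    """Rewrite model (1) in form (3).

    Returns A~ = B^{-1} A, c~ with c~_i = c_i / a^i, and the standard deviations
    1 / a^i of the rescaled noises D~_{t,i}, where a^i = |A_i| 1.
    """
    a = [sum(abs(x) for x in row) for row in A]
    if any(ai == 0 for ai in a):
        raise ValueError("every row of |A| must have a positive sum")
    A_tilde = [[x / ai for x in row] for row, ai in zip(A, a)]
    c_tilde = [ci / ai for ci, ai in zip(c, a)]
    noise_std = [1.0 / ai for ai in a]  # heterogeneous Gaussian noises
    return A_tilde, c_tilde, noise_std

A_tilde, c_tilde, noise_std = normalize(A=[[0.5, -0.3], [0.2, 0.4]], c=[0.1, -0.2])
```

Because $a^{i}>0$, the signs of the weights are preserved, so antagonistic relationships survive the normalization.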

V THE IDENTIFICATION ALGORITHM

V-A THE OBJECTIVE FUNCTION AND ITS PROPERTY

Our goal is to estimate the parameter vector $\theta:=\text{vec}\big{\{}(A~{}\bm{c})\big{\}}$, where $(A~{}\bm{c})$ is a matrix of dimension $n\times(n+1)$, and the operator $\text{vec}\{\cdot\}$ generates a vector from a matrix by stacking the transposes of its rows on top of one another. Denote $\theta^{(i)}=(A_{i}~{}c_{i})^{T}$. To avoid ambiguity, $\theta^{*}:=\text{vec}\big{\{}(A^{*}~{}\bm{c}^{*})\big{\}}=\big(((\theta^{*})^{(1)})^{T},\dots,((\theta^{*})^{(n)})^{T}\big)^{T}$ denotes the true parameter vector. Given observation data $\{\bm{s}^{t},0\leq t\leq T\}$, the log-likelihood function is

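$$\begin{aligned}l(n;\theta)&=\log P\{S_{t}=\bm{s}^{t},0\leq t\leq T\}\\&=\log\Big(\prod_{1\leq t\leq T}P\{S_{t}=\bm{s}^{t}\mid S_{t-1}=\bm{s}^{t-1}\}\,P\{S_{0}=\bm{s}^{0}\}\Big)\\&=\log P\{S_{0}=\bm{s}^{0}\}+\sum_{1\leq t\leq T}\log P\{S_{t}=\bm{s}^{t}\mid S_{t-1}=\bm{s}^{t-1}\}\\&=\log P\{S_{0}=\bm{s}^{0}\}+\sum_{1\leq t\leq T}\sum_{1\leq i\leq n}\log g_{i}(\tilde{\bm{s}}^{t}\mid\theta^{(i)}),\end{aligned}$$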

where $g_{i}(\tilde{\bm{x}}|\theta^{(i)}):=(1-\Phi(c_{i}-A_{i}\bm{x}))^{\tilde{x}_{i}}\Phi(c_{i}-A_{i}\bm{x})^{1-\tilde{x}_{i}}$ with $\tilde{\bm{x}}$ in $\{0,1\}^{2n}$ and $\bm{x}$ identical to the last $n$ entries of $\tilde{\bm{x}}$ , and $\tilde{\bm{s}}^{t}:=((\bm{s}^{t})^{T}~{}(\bm{s}^{t-1})^{T})^{T}$ .
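The double sum in the log-likelihood is easy to evaluate from an observed path. The following Python sketch (hypothetical helper names `log_g` and `log_likelihood`) computes it with $g_{i}$ as defined above, writing $1-\Phi(z)$ as $\Phi(-z)$ to avoid cancellation in the tail. It drops $\log P\{S_{0}=\bm{s}^{0}\}$, under the assumption that the initial distribution does not depend on $\theta$.

```python
import math

def Phi(z):
    """Standard normal c.d.f."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def log_g(i, s_next, s_prev, A, c):
    """log g_i(s~ | theta^(i)) with s~ = (s_next, s_prev) and theta^(i) = (A_i, c_i)."""
    z = c[i] - sum(A[i][j] * s_prev[j] for j in range(len(c)))
    # 1 - Phi(z) is evaluated as Phi(-z) for numerical accuracy
    return math.log(Phi(-z)) if s_next[i] == 1 else math.log(Phi(z))

def log_likelihood(path, A, c):
    """Double sum over t and i in the log-likelihood.

    The term log P{S_0 = s^0} is dropped, assuming the initial
    distribution does not depend on (A, c).
    """
    return sum(log_g(i, path[t], path[t - 1], A, c)
               for t in range(1, len(path)) for i in range(len(c)))

# with A = 0 and c = 0 every agent outputs 1 with probability 1/2
ll = log_likelihood([(0, 0), (1, 0), (1, 1)], A=[[0, 0], [0, 0]], c=[0.0, 0.0])
```

In the example, two time steps and two agents each contribute $\log(1/2)$.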

For fixed $\theta$, $\tilde{\bm{x}}$ takes only finitely many values in $\{0,1\}^{2n}$ and $g_{i}(\tilde{\bm{x}}|\theta^{(i)})>0$, so $\log g_{i}(\tilde{\bm{x}}|\theta^{(i)})$ and $\nabla\log g_{i}(\tilde{\bm{x}}|\theta^{(i)})$ are bounded. Thus, by the strong law of large numbers for Markov chains (Theorem 17.1.7 in [25]), the following hold a.s. for the chain $\{\tilde{S}_{t}\}$ and fixed $\theta$:

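$$\lim_{T\to\infty}\frac{1}{T}\sum_{1\leq t\leq T}\sum_{1\leq i\leq n}\log g_{i}(\tilde{S}_{t}\mid\theta^{(i)})=E\Big\{\sum_{1\leq i\leq n}\log g_{i}(\tilde{S}\mid\theta^{(i)})\Big\},$$

$$\lim_{T\to\infty}\frac{1}{T}\sum_{1\leq t\leq T}\sum_{1\leq i\leq n}\nabla_{\theta^{(i)}}\log g_{i}(\tilde{S}_{t}\mid\theta^{(i)})=E\Big\{\sum_{1\leq i\leq n}\nabla_{\theta^{(i)}}\log g_{i}(\tilde{S}\mid\theta^{(i)})\Big\},$$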

where $\tilde{S}$ is distributed according to the stationary distribution of $\{\tilde{S}_{t}\}$.

Therefore, the function

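$$E\Big\{\sum_{1\leq i\leq n}\log g_{i}(\tilde{S}\mid\theta^{(i)})\Big\}=\sum_{1\leq i\leq n}E\big\{\log g_{i}(\tilde{S}\mid\theta^{(i)})\big\}$$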

will be used to estimate $\theta^{*}$. It has the following property:

Theorem 4

Under Assumption 1, the true parameter vector $\theta^{*}$ is the unique maximum of the function $E\{\sum_{1\leq i\leq n}\log g_{i}(\tilde{S}|\theta^{(i)})\}$, and the unique solution of the equation $\nabla_{\theta}E\{\sum_{1\leq i\leq n}\log g_{i}(\tilde{S}|\theta^{(i)})\}=0$, where $\tilde{S}$ is distributed according to the stationary distribution of $\{\tilde{S}_{t}\}$.
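Theorem 4 can be illustrated numerically for a small network. By Lemma 1, together with the fact that $S$ is then distributed according to the stationary distribution $\pi$ of $\{S_{t}\}$, the stationary law of $\tilde{S}$ is $P\{\tilde{S}=(\bm{a}^{T}~\bm{b}^{T})^{T}\}=\pi(\bm{b})P(\bm{b},\bm{a})$, so the objective can be computed exactly by enumerating $\{0,1\}^{2n}$. The Python sketch below does this, using power iteration for $\pi$, and compares the objective at the true parameters with a few perturbed ones. The two-agent parameters and perturbations are illustrative assumptions, and probing a few directions only illustrates the theorem; it does not prove it.

```python
import itertools
import math

def Phi(z):
    """Standard normal c.d.f."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def log_g(i, a, b, A, c):
    """log g_i((a, b) | theta^(i)), where a = S_t and b = S_{t-1}."""
    z = c[i] - sum(A[i][j] * b[j] for j in range(len(c)))
    return math.log(Phi(-z)) if a[i] == 1 else math.log(Phi(z))

def objective(A, c, A_true, c_true, iters=500):
    """E{sum_i log g_i(S~ | theta^(i))} with S~ stationary under the true parameters."""
    n = len(c_true)
    states = list(itertools.product((0, 1), repeat=n))
    m = len(states)
    # true transition matrix: P[k][l] = P{S_1 = states[l] | S_0 = states[k]}
    P = [[math.exp(sum(log_g(i, s, u, A_true, c_true) for i in range(n)))
          for s in states] for u in states]
    pi = [1.0 / m] * m
    for _ in range(iters):  # power iteration for the stationary law of S_t
        pi = [sum(pi[x] * P[x][y] for x in range(m)) for y in range(m)]
    # stationary law of S~ = (S_t, S_{t-1}): pi(b) P(b, a)
    return sum(pi[kb] * P[kb][ka] * sum(log_g(i, a, b, A, c) for i in range(n))
               for ka, a in enumerate(states) for kb, b in enumerate(states))

A_true, c_true = [[0.5, -0.3], [0.2, 0.4]], [0.1, -0.2]
J_true = objective(A_true, c_true, A_true, c_true)
J_moved_A = objective([[0.6, -0.3], [0.2, 0.4]], c_true, A_true, c_true)
J_moved_c = objective(A_true, [0.1, 0.0], A_true, c_true)
```

The difference $J(\theta^{*})-J(\theta)$ is an expected Kullback-Leibler divergence between transition laws, which is why it cannot be negative; identifiability (Theorem 3) is what makes it strictly positive.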

V-B THE ESTIMATION ALGORITHM

We use the SA algorithm to address the estimation problem for the binary observation model. For $1\leq i\leq n$ and $t\geq 1$ , denote

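$$K_{i}(\theta^{(i)},\tilde{S}_{t+1}):=\nabla_{\theta^{(i)}}\log g_{i}(\tilde{S}_{t+1}\mid\theta^{(i)}),$$

$$K(\theta,\tilde{S}_{t+1}):=\big(K_{1}(\theta^{(1)},\tilde{S}_{t+1})^{T},\dots,K_{n}(\theta^{(n)},\tilde{S}_{t+1})^{T}\big)^{T},$$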

where $\theta=((\theta^{(1)})^{T},\dots,(\theta^{(n)})^{T})^{T}$ , and $g_{i}(\tilde{\bm{x}}|\theta^{(i)}):=(1-\Phi(c_{i}-A_{i}\bm{x}))^{\tilde{x}_{i}}\Phi(c_{i}-A_{i}\bm{x})^{1-\tilde{x}_{i}}$ with $\tilde{\bm{x}}$ in $\{0,1\}^{2n}$ and $\bm{x}$ identical to the last $n$ entries of $\tilde{\bm{x}}$ .
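For completeness, the gradient in $K_{i}$ can be written in closed form. The following worked derivation is our own (it is not stated in the text above); it follows from the definition of $g_{i}$ with $z_{i}:=c_{i}-A_{i}\bm{x}$, the standard normal density $\phi$, and $\theta^{(i)}=(A_{i}~c_{i})^{T}$:

```latex
\begin{aligned}
\log g_i(\tilde{\bm{x}}\mid\theta^{(i)})
  &= \tilde{x}_i\log\big(1-\Phi(z_i)\big)+(1-\tilde{x}_i)\log\Phi(z_i),
  \qquad z_i := c_i - A_i\bm{x},\\
\frac{\partial}{\partial z_i}\log g_i(\tilde{\bm{x}}\mid\theta^{(i)})
  &= (1-\tilde{x}_i)\frac{\phi(z_i)}{\Phi(z_i)}-\tilde{x}_i\frac{\phi(z_i)}{1-\Phi(z_i)},
  \qquad \nabla_{\theta^{(i)}} z_i = \begin{pmatrix}-\bm{x}\\ 1\end{pmatrix},\\
K_i(\theta^{(i)},\tilde{\bm{x}})
  &= \Big[(1-\tilde{x}_i)\frac{\phi(z_i)}{\Phi(z_i)}-\tilde{x}_i\frac{\phi(z_i)}{1-\Phi(z_i)}\Big]
     \begin{pmatrix}-\bm{x}\\ 1\end{pmatrix}.
\end{aligned}
```

Each $K_{i}$ depends only on $\theta^{(i)}$, so the update of every row of $(A~\bm{c})$ can be computed separately.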

The estimation algorithm is as follows:

$$\theta_{t+1}=\theta_{t}+a_{t}K(\theta_{t},\tilde{S}_{t+1}),\tag{5}$$

where $\theta_{t}=((\theta^{(1)}_{t})^{T},\dots,(\theta^{(n)}_{t})^{T})^{T}$ is the estimation of $\theta$ at time step $t$ , and $a_{t}$ is the step size.
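To illustrate recursion (5), here is a minimal Python sketch (standard library only) that simulates model (1), assumed to read $Y_{t}=AS_{t-1}+D_{t}$ as in the Appendix, and updates each $\theta^{(i)}=(A_{i}~c_{i})^{T}$ with $K_{i}$. The gradient uses the closed form $\nabla_{\theta^{(i)}}\log g_{i}=w_{i}\,(-\bm{x}^{T}~1)^{T}$, where $w_{i}$ is the derivative of $\log g_{i}$ with respect to $z_{i}=c_{i}-A_{i}\bm{x}$; this follows from the definition of $g_{i}$. The zero initial value, the two-agent parameters, and the names `K_i` and `run_sa` are illustrative assumptions; the step size $a_{t}=10/(t+200)$ is the one used in Section V-C. No truncation is applied.

```python
import math
import random

def Phi(z):
    """Standard normal c.d.f."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def K_i(theta_i, x_next_i, x_prev):
    """Gradient of log g_i w.r.t. theta^(i) = (A_i, c_i) at s~ = (x_next, x_prev)."""
    n = len(x_prev)
    z = theta_i[n] - sum(theta_i[j] * x_prev[j] for j in range(n))  # z_i = c_i - A_i x
    # derivative of log g_i with respect to z_i
    w = -phi(z) / Phi(-z) if x_next_i == 1 else phi(z) / Phi(z)
    return [-w * x_prev[j] for j in range(n)] + [w]  # dz/dA_i = -x, dz/dc_i = 1

def run_sa(A_true, c_true, T, seed=1):
    """Run recursion (5) on data simulated from the true parameters."""
    rng = random.Random(seed)
    n = len(c_true)

    def observe(s):  # one step of model (1): S_{t+1} = Q(A S_t + D_{t+1}, c)
        return [1 if sum(A_true[i][j] * s[j] for j in range(n)) + rng.gauss(0.0, 1.0) > c_true[i]
                else 0 for i in range(n)]

    theta = [[0.0] * (n + 1) for _ in range(n)]  # theta_1 = 0; row i holds (A_i, c_i)
    s = observe([0] * n)                          # S_1, generated from S_0 = 0
    for t in range(1, T + 1):
        s_next = observe(s)                       # S_{t+1}
        a_t = 10.0 / (t + 200)                    # step size from Section V-C
        for i in range(n):
            grad = K_i(theta[i], s_next[i], s)
            theta[i] = [th + a_t * g for th, g in zip(theta[i], grad)]
        s = s_next
    return theta

A_true, c_true = [[0.5, -0.3], [0.2, 0.4]], [0.1, -0.2]
theta = run_sa(A_true, c_true, T=100000)
```

A single run only shows the typical behavior of the recursion; it does not verify Theorem 5.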

Remark 1

In this algorithm, we assume that $\theta_{t}$ is bounded. If this assumption does not hold, one can apply the SA algorithm with expanding truncations [21], in which the estimate $\theta_{t}$ is bounded by construction. It can also be verified that the number of truncations is finite a.s.
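The following Python sketch shows the basic mechanism of SA with expanding truncations as we understand it: whenever a candidate update leaves the current truncation radius $M_{\sigma}$, the iterate is reset to the initial value and the radius is enlarged. The function name, the reset rule, the radii $M_{k}=1+k$, and the toy mean field $3-\theta$ are illustrative assumptions; see [21] for the precise algorithm and its conditions.

```python
import math

def sa_expanding_truncations(step, direction, theta0, T, bound):
    """Stochastic approximation with expanding truncations (illustrative sketch).

    step(t): step size a_t; direction(t, theta): (noisy) update direction;
    bound(k): truncation radius M_k, assumed increasing to infinity.
    Returns the final estimate and the number of truncations.
    """
    theta = list(theta0)
    sigma = 0  # number of truncations so far
    for t in range(1, T + 1):
        a_t = step(t)
        cand = [x + a_t * d for x, d in zip(theta, direction(t, theta))]
        if math.sqrt(sum(x * x for x in cand)) <= bound(sigma):
            theta = cand            # regular SA step
        else:
            theta = list(theta0)    # truncate: reset and enlarge the radius
            sigma += 1
    return theta, sigma

# toy root-finding problem with root 3 and an initially too small radius
theta, n_trunc = sa_expanding_truncations(
    step=lambda t: 1.0 / t, direction=lambda t, th: [3.0 - th[0]],
    theta0=[0.0], T=10000, bound=lambda k: 1.0 + k)
```

In the toy run, the radius only has to grow until it contains the root, after which no further truncation happens, mirroring the a.s. finiteness of truncations mentioned above.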

Assumption 2

Let $a_{t}$ be the step size in (5), satisfying $a_{t}>0$ , $\sum\nolimits_{t=1}^{\infty}a_{t}=\infty$ , and $\sum\nolimits_{t=1}^{\infty}a_{t}^{2}<\infty$ .
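As a quick check, the step size $a_{t}=10/(t+200)$ used in Section V-C satisfies Assumption 2; the second bound compares the tail sum with an integral:

```latex
\sum_{t=1}^{\infty}\frac{10}{t+200}=10\sum_{k=201}^{\infty}\frac{1}{k}=\infty,
\qquad
\sum_{t=1}^{\infty}\frac{100}{(t+200)^{2}}=100\sum_{k=201}^{\infty}\frac{1}{k^{2}}
<100\int_{200}^{\infty}\frac{dx}{x^{2}}=\frac{1}{2}<\infty.
```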

Theorem 5

Suppose that Assumptions 1 and 2 hold. Then the estimates $\theta_{t}$ of the algorithm (5) converge to $\theta^{*}$ a.s. from any fixed initial value, where $\theta^{*}$ is the true parameter vector.

V-C NUMERICAL SIMULATIONS

We use an influence weight matrix for a group of four individuals from an empirical study [27] to illustrate the consistency of the above algorithm. The weight matrix $\tilde{A}$ is given by

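$$\tilde{A}=\begin{pmatrix}0.220&0.120&0.360&0.300\\0.147&0.215&0.344&0.294\\0&0&1&0\\0.090&0.178&0.446&0.286\end{pmatrix}.$$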

The noises are set to be independent Gaussian with zero mean and variance $4$, and $\tilde{\bm{c}}$ is randomly selected as $\tilde{\bm{c}}=(0.13~{}0.28~{}0.08~{}0.24)^{T}$. By the scaling discussion in Section IV-B (dividing by the noise standard deviation $2$), this is equivalent to model (1) with standard normal noise and parameters $A=\tilde{A}/2$ and $\bm{c}=(0.065~{}0.14~{}0.04~{}0.12)^{T}$.
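The parameter conversion in the simulation can be reproduced with a few lines of Python. The layout of $\tilde{A}$ below is our reading of the flattened matrix in this copy of the paper (with this layout every row sums to one, consistent with a row-stochastic influence matrix); treat it as an assumption. Dividing weights and thresholds by the noise standard deviation $2$ gives the standard-normal form used in the paper.

```python
# Rows of A~ as read from the flattened extraction (an assumption, see above)
A_tilde = [[0.220, 0.120, 0.360, 0.300],
           [0.147, 0.215, 0.344, 0.294],
           [0.0, 0.0, 1.0, 0.0],
           [0.090, 0.178, 0.446, 0.286]]
c_tilde = [0.13, 0.28, 0.08, 0.24]
sigma = 2.0  # noise variance 4 -> standard deviation 2

# divide weights and thresholds by sigma to obtain standard normal noise
A = [[x / sigma for x in row] for row in A_tilde]
c = [x / sigma for x in c_tilde]
row_sums = [sum(row) for row in A_tilde]
```

Agent 3 puts all of its weight on itself in this reading, so $a_{33}=0.5$ after scaling.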

We set the step size $a_{t}=10/(t+200)$ and run the algorithm for $100$ trials. Fig. $1$ illustrates the strong consistency of the algorithm for two parameters, $a_{12}$ and $a_{33}$: the blue line represents one sample path, the red line represents the true value, and the gray lines are the sample paths of all $100$ trials. Fig. $2$ shows the mean square error (MSE), defined as $\text{MSE}_{k}:=\frac{1}{N}\sum_{j=1}^{N}\|\theta_{k}^{[j]}-\theta^{*}\|^{2}$, where $\theta_{k}^{[j]}$ is the estimate at step $k$ in the $j$-th trial and $N=100$ is the number of trials.
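The MSE curve can be computed from the trial estimates at each step. A minimal Python sketch, where `estimates` holds the $N$ vectors $\theta_{k}^{[j]}$ (the estimate of trial $j$ at step $k$; the bracket notation and the example values are ours, for illustration only):

```python
def mse(estimates, theta_star):
    """MSE_k = (1/N) sum_j ||theta_k^[j] - theta*||^2 over the N trial estimates at step k."""
    N = len(estimates)
    return sum(sum((x - y) ** 2 for x, y in zip(est, theta_star))
               for est in estimates) / N

# two hypothetical trials, each with squared error 1
print(mse([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))  # prints 1.0
```

Evaluating `mse` at every step $k$ gives the curve plotted in Fig. $2$.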

VI CONCLUSION

In this paper, we study the estimation of network weights for a class of binary observation systems. These systems differ distinctly from the models studied in the literature on quantized identification, because there is no room for the design of inputs and quantizers. We propose a recursive algorithm based on stochastic approximation techniques and prove its strong consistency. Future work includes investigation of the convergence rate and asymptotic efficiency, generalization of the model and noise conditions, and application of the algorithm in practice.

APPENDIX

Proof of Theorem 1:

Under Assumption 1, the transition probabilities of $\{S_{t}\}$ can be computed as follows:

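$$\begin{aligned}P\{S_{1}=\bm{s}\mid S_{0}=\bm{u}\}&=P\{A_{i}S_{0}+D_{1,i}>c_{i}~\forall i\text{ s.t. }s_{i}=1,\ A_{j}S_{0}+D_{1,j}\leq c_{j}~\forall j\text{ s.t. }s_{j}=0\mid S_{0}=\bm{u}\}\\&=P\{A_{i}\bm{u}+D_{1,i}>c_{i}~\forall i\text{ s.t. }s_{i}=1,\ A_{j}\bm{u}+D_{1,j}\leq c_{j}~\forall j\text{ s.t. }s_{j}=0\mid S_{0}=\bm{u}\}\\&=P\{A_{i}\bm{u}+D_{1,i}>c_{i}~\forall i\text{ s.t. }s_{i}=1,\ A_{j}\bm{u}+D_{1,j}\leq c_{j}~\forall j\text{ s.t. }s_{j}=0\}\\&=\prod_{1\leq i\leq n}\big(1-F(c_{i}-A_{i}\bm{u})\big)^{s_{i}}F(c_{i}-A_{i}\bm{u})^{1-s_{i}}>0,\end{aligned}$$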

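for all $\bm{s},\bm{u}\in\{0,1\}^{n}$, where $F=\Phi$ is the common c.d.f. of the disturbances $D_{1,i}$, $1\leq i\leq n$, which are independent of each other and of $S_{0}$. Since every entry of the transition matrix of $\{S_{t}\}$ is positive, the chain is irreducible and aperiodic, and the conclusion holds by Corollary 1.17 and Theorem 4.9 in [28]. $\Box$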
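The transition probabilities have the product form $P(\bm{u},\bm{s})=\prod_{i}(1-\Phi(c_{i}-A_{i}\bm{u}))^{s_{i}}\Phi(c_{i}-A_{i}\bm{u})^{1-s_{i}}$, which is straightforward to evaluate. The following Python sketch (standard library only) builds the full $2^{n}\times 2^{n}$ transition matrix of $\{S_{t}\}$ under Assumption 1, so one can confirm that every entry is positive and every row sums to one, which is exactly what the irreducibility and aperiodicity argument uses. The function names and the two-agent parameters are illustrative, not from the paper.

```python
import itertools
import math

def Phi(z):
    """Standard normal c.d.f., computed with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def transition_matrix(A, c):
    """Return the states of {0,1}^n and P with P[k][m] = P{S_1 = states[m] | S_0 = states[k]}."""
    n = len(c)
    states = list(itertools.product((0, 1), repeat=n))
    P = []
    for u in states:
        # F(c_i - A_i u): probability that agent i outputs 0 given S_0 = u
        q = [Phi(c[i] - sum(A[i][j] * u[j] for j in range(n))) for i in range(n)]
        row = []
        for s in states:
            p = 1.0
            for i in range(n):
                p *= (1.0 - q[i]) if s[i] == 1 else q[i]
            row.append(p)
        P.append(row)
    return states, P

states, P = transition_matrix(A=[[0.5, -0.3], [0.2, 0.4]], c=[0.1, -0.2])
```

Strict positivity holds for any finite parameters because $0<\Phi(z)<1$ for every real $z$.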

Proof of Lemma 1:

Let $\tilde{P}$ be the transition probability matrix of $\{\tilde{S}_{t}\}$ . From the definition of stationary distribution, we have that

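$$P\{\tilde{S}=(\bar{\bm{s}}^{T}~\bm{s}^{T})^{T}\}=\sum_{\tilde{\bm{s}}\in\{0,1\}^{2n}}P\{\tilde{S}=\tilde{\bm{s}}\}\tilde{P}(\tilde{\bm{s}},(\bar{\bm{s}}^{T}~\bm{s}^{T})^{T}).$$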

Define

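$$\mathscr{S}_{1}:=\{\tilde{\bm{s}}\in\{0,1\}^{2n}:\text{the first }n\text{ entries of }\tilde{\bm{s}}\text{ are }\bm{s}\},$$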

and it follows from the definition of $\{\tilde{S}_{t}\}$ that $\tilde{P}(\tilde{\bm{s}},(\bar{\bm{s}}^{T}~{}\bm{s}^{T})^{T})=0$ for $\tilde{\bm{s}}\not\in\mathscr{S}_{1}$. Hence,

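$$P\{\tilde{S}=(\bar{\bm{s}}^{T}~\bm{s}^{T})^{T}\}=\sum_{\tilde{\bm{s}}\in\mathscr{S}_{1}}P\{\tilde{S}=\tilde{\bm{s}}\}\tilde{P}(\tilde{\bm{s}},(\bar{\bm{s}}^{T}~\bm{s}^{T})^{T}).\tag{7}$$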

Similarly, we have that

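$$P\{S=\bm{s}\}=\sum_{\tilde{\bm{s}}^{1}\in\mathscr{S}_{1}}\sum_{\tilde{\bm{s}}^{2}\in\mathscr{S}_{2}}P\{\tilde{S}=\tilde{\bm{s}}^{1}\}\tilde{P}(\tilde{\bm{s}}^{1},\tilde{\bm{s}}^{2}),\tag{8}$$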

where

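$$\mathscr{S}_{2}:=\{\tilde{\bm{s}}\in\{0,1\}^{2n}:\text{the last }n\text{ entries of }\tilde{\bm{s}}\text{ are }\bm{s}\}.$$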

Combining (2), (7), and (8),

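$$P\{\tilde{S}=(\bar{\bm{s}}^{T}~\bm{s}^{T})^{T}\}=\sum_{\tilde{\bm{s}}\in\mathscr{S}_{1}}P\{\tilde{S}=\tilde{\bm{s}}\}P(\bm{s},\bar{\bm{s}}),$$

$$P\{S=\bm{s}\}=\sum_{\tilde{\bm{s}}^{1}\in\mathscr{S}_{1}}\sum_{\bar{\bm{s}}^{2}\in\{0,1\}^{n}}P\{\tilde{S}=\tilde{\bm{s}}^{1}\}P(\bm{s},\bar{\bm{s}}^{2}),$$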

where the entries of $\bar{\bm{s}}^{2}$ are identical to the first $n$ entries of $\tilde{\bm{s}}^{2}$ . Hence,

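$$P\{\bar{S}=\bar{\bm{s}}\mid S=\bm{s}\}=\frac{P\{\tilde{S}=(\bar{\bm{s}}^{T}~\bm{s}^{T})^{T}\}}{P\{S=\bm{s}\}}=\frac{P(\bm{s},\bar{\bm{s}})\sum_{\tilde{\bm{s}}\in\mathscr{S}_{1}}P\{\tilde{S}=\tilde{\bm{s}}\}}{\sum_{\tilde{\bm{s}}^{1}\in\mathscr{S}_{1}}P\{\tilde{S}=\tilde{\bm{s}}^{1}\}}=P(\bm{s},\bar{\bm{s}})=P\{S_{1}=\bar{\bm{s}}\mid S_{0}=\bm{s}\},$$

where the denominator follows from $\sum_{\bar{\bm{s}}^{2}\in\{0,1\}^{n}}P(\bm{s},\bar{\bm{s}}^{2})=1$.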

$\Box$

Proof of Theorem 3:

From the transition probabilities computed in the proof of Theorem 1, we have the following

[TABLE]

where $1\leq i,j,k\leq n$ and $k\not=j$, and the same holds for $\{\hat{S}_{t}\}$. Here $\Phi$ is the c.d.f. of the standard normal distribution.

Suppose that $\{S_{t}\}$ and $\{\hat{S}_{t}\}$ have the same probability transition matrices. From Assumption 1 and the above equations, it follows that

[TABLE]

where $1\leq i,j,k\leq n$ and $k\not=j$ . Hence by the strictly increasing property of $\Phi$ ,

[TABLE]

where $1\leq i,j,k\leq n$ and $k\not=j$ . Therefore, if we set $j=k+1$ when $k<n$ , and $j=1$ when $k=n$ , then we have for all $i,k\in\{1,\dots,n\}$ , $a_{ik}=\hat{a}_{ik}$ . Consequently $c_{i}=\hat{c}_{i}$ holds for all $i\in\{1,\dots,n\}$ . $\Box$

Bibliography

[1] P. D'haeseleer, S. Liang, and R. Somogyi, “Genetic network inference: from co-expression clustering to reverse engineering,” Bioinformatics, vol. 16, no. 8, pp. 707–726, 2000.
[2] C. Ravazzi, R. Tempo, and F. Dabbene, “Learning influence structure in sparse social networks,” IEEE Transactions on Control of Network Systems, 2017.
[3] M. Timme and J. Casadiego, “Revealing networks from dynamics: an introduction,” Journal of Physics A: Mathematical and Theoretical, vol. 47, no. 34, p. 343001, 2014.
[4] E. Nozari, Y. Zhao, and J. Cortés, “Network identification with latent nodes via autoregressive models,” IEEE Transactions on Control of Network Systems, vol. 5, no. 2, pp. 722–736, 2018.
[5] M. Sharf and D. Zelazo, “Network identification: A passivity and network optimization approach,” in 2018 IEEE Conference on Decision and Control (CDC), pp. 2107–2113, IEEE, 2018.
[6] S. Segarra, M. T. Schaub, and A. Jadbabaie, “Network inference from consensus dynamics,” in 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 3212–3217, IEEE, 2017.
[7] C. Ravazzi, S. Hojjatinia, C. M. Lagoa, and F. Dabbene, “Randomized opinion dynamics over networks: influence estimation from partial observations,” in 2018 IEEE Conference on Decision and Control (CDC), pp. 2452–2457, IEEE, 2018.
[8] Y. Dong, W. Zhao, et al., “The identification of social networks by the least-square algorithm,” in 2018 37th Chinese Control Conference (CCC), pp. 1931–1936, IEEE, 2018.