Multi-GCN: Graph Convolutional Networks for Multi-View Networks, with Applications to Global Poverty

Muhammad Raza Khan; Joshua E. Blumenstock

arXiv:1901.11213 · cs.LG · February 1, 2019

TL;DR

This paper introduces Multi-GCN, a graph convolutional network designed for multi-view networks, demonstrating superior performance in global poverty prediction and other multi-view learning tasks across various datasets.

Contribution

The paper presents a novel Multi-GCN model that effectively captures multi-view relations in graphs, improving semi-supervised learning in poverty research and beyond.

Findings

01. Outperforms state-of-the-art algorithms on poverty prediction tasks.

02. Achieves better results on multi-view node labeling in citation networks.

03. Effective across datasets from multiple developing countries.

Tables

Table 1: Summary statistics. The Label Rate indicates the fraction of nodes that are labeled for training.

Dataset	Data Type	Nodes	Edges (view 1)	Edges (view 2)	Classes	Features	Label Rate
Product Adoption	Phone logs (West Africa)	17,000	23,032	18,371	2	132	0.002
Poverty Prediction	Phone logs (East Africa)	422	544	1,799	2	1,709	0.094
Gender Prediction	Phone logs (South Asia)	958	992	978	2	821	0.042
Citeseer	Citation network	3,327	4,732	3,492	6	3,703	0.036
Cora	Citation network	2,708	5,429	2,846	7	1,433	0.052

Table 2: Classification accuracy on mobile phone data. Numbers indicate mean classification accuracy (percentage) and standard error over 10 randomly selected dataset splits of equal size.

Method	Product Adoption	Poverty Prediction	Gender Prediction
DeepWalk (first view)	56.43 ± 0.187	51.91 ± 0.62	53.18 ± 0.55
DeepWalk (second view)	51.97 ± 0.112	50.34 ± 0.36	50.84 ± 0.64
DeepWalk (view union)	56.81 ± 0.114	50.87 ± 0.95	52.34 ± 0.50
Node2vec (first view)	53.87 ± 0.20	52.26 ± 0.58	50.12 ± 0.40
Node2vec (second view)	50.50 ± 0.11	49.70 ± 0.23	51.68 ± 0.40
Node2vec (view union)	54.50 ± 0.11	50.52 ± 0.63	51.64 ± 0.53
LINE (first view)	51.11 ± 0.01	50.15 ± 0.02	51.56 ± 0.001
LINE (second view)	50.83 ± 0.01	52.29 ± 0.001	50.00 ± 0.001
LINE (view union)	56.26 ± 0.003	50.18 ± 0.001	51.33 ± 0.002
GCN (first view)	70.74 ± 2.2	55.19 ± 2.33	63.97 ± 1.29
GCN (second view)	71.40 ± 1.81	50.06 ± 0.81	63.01 ± 0.013
GCN (view union)	71.90 ± 0.9	50.22 ± 0.56	63.90 ± 1.32
Multi-GCN (this paper)	73.47 ± 0.91	59.23 ± 0.20	66.34 ± 1.03

Table 3: Classification accuracy on citation networks. The top panel shows the mean classification accuracy (percentage) for the predefined train-test splits described by ? (?). The bottom panel shows the mean classification accuracy (percentage) and standard error over 10 randomly selected dataset splits of equal size.

Predefined train-test splits
Method	Citeseer	Cora
ManiReg (first view) - ? (?)	60.1	59.5
DeepWalk (first view) - ? (?)	43.2	67.2
Planetoid (first view) - ? (?)	64.7	75.7
GCN (first view)	70.3	81.5
GCN (second view)	50.7	53.6
GCN (view union)	70.7	80.4
Multi-GCN (this paper)	71.3	82.5
Randomized train-test splits
GCN (first view)	67.9 ± 0.5	80.1 ± 0.5
GCN (second view)	53.6 ± 0.1	56.9 ± 0.3
GCN (view union)	67.9 ± 0.3	78.5 ± 0.1
Multi-GCN (this paper)	70.5 ± 0.2	81.1 ± 0.2

Equations

$$L_{i} = D_{i}^{-1/2}\,(D_{i} - W_{i})\,D_{i}^{-1/2}$$

$$\min_{U_{i} \in \mathbb{R}^{n \times k}} \operatorname{tr}\left(U_{i}^{\top} L_{i} U_{i}\right) \quad \text{s.t.} \quad U_{i}^{\top} U_{i} = I$$

$$d_{\mathrm{proj}}^{2}(Y_{1}, Y_{2}) = \sum_{i=1}^{k} \sin^{2}\theta_{i} = k - \operatorname{tr}\left(Y_{1} Y_{1}^{\top} Y_{2} Y_{2}^{\top}\right)$$

$$d_{\mathrm{proj}}^{2}\left(U, \{U_{i}\}_{i=1}^{M}\right) = \sum_{i=1}^{M} d_{\mathrm{proj}}^{2}(U, U_{i}) = kM - \sum_{i=1}^{M} \operatorname{tr}\left(U U^{\top} U_{i} U_{i}^{\top}\right)$$

$$\min_{U \in \mathbb{R}^{n \times k}} \sum_{i=1}^{M} \left[\operatorname{tr}\left(U^{\top} L_{i} U\right) + \alpha_{i}\left(k - \operatorname{tr}\left(U U^{\top} U_{i} U_{i}^{\top}\right)\right)\right] \quad \text{s.t.} \quad U^{\top} U = I$$

$$\min_{U \in \mathbb{R}^{n \times k}} \operatorname{tr}\left[U^{\top}\left(\sum_{i=1}^{M} L_{i} - \sum_{i=1}^{M} \alpha_{i} U_{i} U_{i}^{\top}\right) U\right] \quad \text{s.t.} \quad U^{\top} U = I$$

$$L_{\mathrm{mod}} = \sum_{i=1}^{M} L_{i} - \sum_{i=1}^{M} \alpha_{i} U_{i} U_{i}^{\top}$$

$$f^{*} = \left(I - \beta L_{\mathrm{mod}}\right)^{-1} q$$

$$g_{\theta} \star x = g_{\theta}(L)\,x = U g_{\theta} U^{\top} x$$

$$g_{\theta'} \star x \approx \sum_{k=0}^{K} \theta'_{k}\, T_{k}(\tilde{L})\, x$$

$$g_{\theta} \star x \approx \theta\, \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} x$$

$$Z = f(X, A) = \operatorname{softmax}\left(\hat{A}\, \operatorname{ReLU}\left(\hat{A} X W^{0}\right) W^{1}\right)$$

Full text

Multi-GCN: Graph Convolutional Networks for Multi-View Networks, with Applications to Global Poverty

Muhammad Raza Khan, Joshua E. Blumenstock

University of California, Berkeley

Abstract

With the rapid expansion of mobile phone networks in developing countries, large-scale graph machine learning has gained sudden relevance in the study of global poverty. Recent applications range from humanitarian response and poverty estimation to urban planning and epidemic containment. Yet the vast majority of computational tools and algorithms used in these applications do not account for the multi-view nature of social networks: people are related in myriad ways, but most graph learning models treat relations as binary. In this paper, we develop a graph-based convolutional network for learning on multi-view networks. We show that this method outperforms state-of-the-art semi-supervised learning algorithms on three different prediction tasks using mobile phone datasets from three different developing countries. We also show that, while designed specifically for use in poverty research, the algorithm also outperforms existing benchmarks on a broader set of learning tasks on multi-view networks, including node labelling in citation networks.

1 Introduction

Over the past several years, large-scale graph machine learning has gained increasing relevance in the domain of international poverty research (?). Driven largely by the expansion of mobile phone networks throughout developing countries – roughly 95% of the world population now has mobile phone coverage (?) – vast quantities of network data are constantly being generated by people living in even extremely poor and marginalized communities. Recent work has shown how such data can be used to inform critical policy decisions, including the measurement of living conditions (?), the spread of infectious diseases (?), and the management of humanitarian crises (?). Private companies are also taking advantage of this new source of data, for instance by using data from mobile phones to generate credit scores that can expand credit to millions of people historically shut out of the formal banking ecosystem (?).

However, a critical constraint on the use of these data in settings related to economic development is the lack of scalable algorithms for prediction tasks on sparse multi-view networks. Multi-view networks (also referred to as multiplex or multi-modal networks) are networks in which nodes can be related in multiple ways. They are the natural abstraction for mobile phone networks, where individuals have different types of relationships and can interact through different modalities (such as phone calls, text messages, money transfers, and app-based activity). Yet the vast majority of applied research using mobile phone data, in developing and developed countries alike, ignores the multi-view nature of phone networks.

This paper develops a novel approach for learning on multi-view networks that bridges two strands of the research literature. The first strand involves methods for efficient analysis of multi-view networks; the second explores algorithms for semi-supervised graph learning (see Related Work, below). The method we develop provides an efficient way to apply convolutional neural networks to multi-view graph-structured data. We call this new method Multi-GCN (short for Multi-View Graph Convolutional Networks). We benchmark it on three mobile network datasets, each paired with a prediction task relevant to the international development community:

(1) predicting the adoption of a new “financial inclusion” technology in a West African country;
(2) predicting whether an individual is living below the poverty line in an East African country; and
(3) predicting the gender of mobile phone subscribers in a South Asian country.

In all cases, we find that Multi-GCN outperforms state-of-the-art benchmarks, including standard Graph Convolutional Networks (?), Node2vec (?), DeepWalk (?), and LINE (?).

While designed specifically with the developing-country context in mind (where the sparsity and multi-view properties of networks are very salient), we show that Multi-GCN can be more generally applied to a wide range of problems involving multi-view networks. Indeed, most real-world networks are multi-view, including the network data most frequently used by AI researchers (e.g., data from Twitter, Amazon, Netflix, etc.). Our second set of results shows that Multi-GCN can improve upon state-of-the-art algorithms not just in poverty-related contexts, but also in traditional classification problems. In particular, we show that Multi-GCN outperforms competing algorithms on citation labeling tasks (using benchmark datasets from Citeseer and Cora) that have been studied extensively in prior work.

2 Related Work

2.1 Technical Related Work

Our goal is to develop an efficient method for node-level transductive semi-supervised learning over multi-view graphs. Here, we begin with a general overview of semi-supervised learning, then focus on various approaches to graph-based semi-supervised learning, and finally discuss related work on multi-view networks.

Graph-Based Semi-Supervised Learning

One of the biggest issues with applying supervised learning algorithms in developing countries is that labels for training are often costly to collect. For instance, when using mobile phone data to predict the wealth of subscribers, ? (?) manually conducted a survey of roughly 1,000 subscribers. Semi-supervised learning addresses this problem by using unlabeled data along with labeled data to train better classifiers (see (?) for a survey). Our focus is on transductive semi-supervised learning. This setting assumes that all unlabeled data are available at training time and does not attempt to generalize to data unseen during training.

Graph-based semi-supervised learning (GSSL) is a popular approach for semi-supervised learning that treats labeled and unlabeled instances as graph vertices, and relationships between instances as edges (?). GSSL algorithms try to learn a classifier that is consistent with the labeled data while ensuring that similar (connected) nodes receive similar predictions. This is achieved by minimizing a loss function with two terms: a) a supervised loss over the labeled instances, and b) a graph-based regularization term. Different GSSL algorithms use different functions for graph regularization. Label propagation-based approaches, for instance, use a constrained label lookup function (e.g., ? (?)). Relatedly, kernel-based approaches parameterize the regularization term in a Reproducing Kernel Hilbert Space (RKHS).
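To make the two-term structure concrete, here is a minimal NumPy sketch. It assumes a squared-error supervised loss and uses the Laplacian smoothness penalty as the graph regularizer. Both choices, and the names `gssl_loss` and `lam`, are illustrative and are not taken from any specific GSSL algorithm cited above.

```python
import numpy as np


def gssl_loss(f, y, labeled, W, lam=1.0):
    """Generic two-term GSSL objective (illustrative instantiation).

    f       : (n,) predicted scores for every node, labeled or not
    y       : (n,) targets; only entries where `labeled` is True are used
    labeled : (n,) boolean mask of labeled nodes
    W       : (n, n) symmetric, non-negative edge weights
    lam     : hypothetical trade-off weight between the two terms
    """
    # (a) supervised loss: squared error on the labeled nodes only
    supervised = np.sum((f[labeled] - y[labeled]) ** 2)
    # (b) graph regularizer f^T L f with L = D - W; it equals
    #     1/2 * sum_ij W_ij (f_i - f_j)^2, so it penalizes connected
    #     nodes that receive different scores
    L = np.diag(W.sum(axis=1)) - W
    smoothness = f @ L @ f
    return supervised + lam * smoothness


# Path graph 0 - 1 - 2; nodes 0 and 2 are labeled 1 and 0
W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
y = np.array([1.0, 0.0, 0.0])
labeled = np.array([True, False, True])
smooth = gssl_loss(np.array([1.0, 0.5, 0.0]), y, labeled, W)
abrupt = gssl_loss(np.array([1.0, 1.0, 0.0]), y, labeled, W)
print(smooth, abrupt)  # 0.5 1.0: both fit the labels, the smoother f scores lower
```

Both candidate score vectors fit the two labels exactly, so only the regularizer separates them. That term is what lets unlabeled nodes inherit information from their labeled neighbours.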

Learning Over Graphs

The success of word embedding algorithms like Word2Vec (?) has inspired similar algorithms for graphs. For instance, DeepWalk (?) learns embeddings by predicting the neighborhood of nodes based on random walks over the graphs, while LINE (?) and Node2vec (?) allow for advanced sampling schemes. More recently, neural network-based approaches have been proposed to perform learning over graphs. These have been extended to the task of semi-supervised learning (?; ?), including recent work by ? (?) that proposes a Graph Convolutional Network (GCN), which we take as a starting point for our approach.
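DeepWalk's first stage turns a graph into a corpus of node "sentences" via truncated random walks. It can be sketched in plain Python as shown below. The parameter names and defaults are illustrative. The skip-gram training stage that consumes the walks is omitted. Node2vec would replace the uniform neighbour choice with a biased one.

```python
import random


def random_walks(adj, walks_per_node=2, walk_length=5, seed=0):
    """Truncated uniform random walks in the style of DeepWalk.

    adj : dict mapping each node to a list of its neighbours
    Each returned walk is a list of nodes; a skip-gram model would treat
    it like a sentence of words.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # new random order of start nodes on every pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj[walk[-1]]
                if not neighbours:  # isolated node: the walk cannot continue
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


adj = {0: [1], 1: [0, 2], 2: [1], 3: []}
for walk in random_walks(adj)[:3]:
    print(walk)
```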

Learning Over Multi-View Graphs

The key distinction between our approach and prior work is our desire to handle graphs with multiple views, i.e., graphs where vertices can be connected in more than one way. In recent years, many different algorithms have been proposed for learning on multi-view graphs. These algorithms can be broadly divided into three main categories: 1) co-training algorithms, 2) learning with multiple kernels, and 3) subspace learning (see ? (?) for a survey). Recent work by ? (?) shows that subspace approaches, which find a latent subspace shared by multiple views, perform well relative to co-training and kernelized approaches on a range of tasks. We therefore focus our attention on integrating subspace learning approaches with recent innovations in graph convolutional networks.

Comparison with existing work

Our main contribution is an efficient method for adapting GSSL to multi-view contexts. Existing GSSL approaches cannot be readily applied to such data, and those algorithms that do handle multiple views generally treat all views and vertices equally. We show that current state-of-the-art methods like Graph Convolutional Networks (?) can be enhanced by augmenting the input graph using subspace analysis over Grassmann manifolds. ? (?) have demonstrated that a subspace merging approach can be quite accurate for cross-domain recommendation. That problem differs from the experimental setting and context described in Section 4.

2.2 Empirical Related Work

Our experimental results focus on three prediction tasks of relevance to the international development community:

Predicting poverty.

A large number of humanitarian applications — from poverty targeting to program monitoring — require accurate estimates of the welfare for beneficiary populations. Recently, several papers have shown how digital trace data can be used to estimate the socioeconomic status of individuals, households, and villages. For instance, ? (?) show that daytime satellite imagery can be used to estimate village wealth; ? (?) find that Twitter data can be used to estimate levels of deprivation, and ? (2015) shows that mobile phone metadata can be used to estimate the welfare of individuals and regions.

Product adoption.

We focus on the adoption of “mobile money”, a suite of phone-based financial services that are designed to promote financial inclusion among those traditionally shut out of the formal banking ecosystem (?). Within this literature, our work relates most closely to ? (2016), who analyze the predictors of mobile money adoption in three different developing countries.

Gender prediction.

Gender equality and women’s empowerment form one of the Sustainable Development Goals, and recent work explores how digital trace data can be used to assess progress toward this goal (?). ? (?) and ? (?) show that gender can be predicted from social media and mobile phone data.

Broadly, these prior studies demonstrate a proof of concept: that digital trace data can be used to predict the characteristics and outcomes of individuals. However, such analyses rely on off-the-shelf algorithms that rarely, if ever, account for the multi-view nature of real-world social networks. This paper shows that a simple approach to multi-view learning can yield substantial improvements on these real-world prediction tasks.

3 Multi-GCN: Multi-View Graph Convolutional Networks

Our approach to semi-supervised learning on multi-view graphs integrates three steps, depicted in Figure 1. First, we use methods from subspace analysis to efficiently merge multiple views of the same graph. Second, we use a manifold ranking procedure to identify the most informative sub-components of the graph and to prune the graph upon which learning is performed. Finally, we apply a convolutional neural network, adapted to graph-structured data, to allow for semi-supervised node classification.

3.1 Merging Subspace Representations

Consider an undirected multilayer graph with $M$ layers $G=\{G_{i}\}_{i=1}^{M}$. Each layer $G_{i}$ shares the vertex set $V$ but may have a different edge set $E_{i}$. We first calculate the graph Laplacian for each of the individual layers. If $D_{i}$ and $W_{i}$ represent the degree matrix and the adjacency matrix for the $i^{th}$ view of the graph, then the normalized graph Laplacian is defined as

$$L_{i} = D_{i}^{-1/2}\,(D_{i} - W_{i})\,D_{i}^{-1/2} \tag{1}$$
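Below is a minimal dense NumPy sketch of Eq. 1, applied to each view separately. The two toy views (calls and texts over four subscribers) are hypothetical. Zero-degree nodes get zero rows and columns. This is a convention chosen for the sketch, because a node can be isolated in one view but not in another.

```python
import numpy as np


def normalized_laplacian(W):
    """Eq. 1: L = D^{-1/2} (D - W) D^{-1/2} for one symmetric view W.

    Nodes with zero degree in this view get a zero row and column,
    rather than a division by zero.
    """
    W = np.asarray(W, dtype=float)
    deg = W.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    # Scaling rows and columns is the same as D^{-1/2} (...) D^{-1/2}
    return inv_sqrt[:, None] * (np.diag(deg) - W) * inv_sqrt[None, :]


# Two hypothetical views over the same four subscribers
calls = np.array([[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]])
texts = np.array([[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]])
laplacians = [normalized_laplacian(W) for W in (calls, texts)]
```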

Given the graph Laplacian $L_{i}$ for each layer of the graph, we calculate the spectral embedding matrix $U_{i}$ through trace minimization:

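$$\min_{U_{i} \in \mathbb{R}^{n \times k}} \operatorname{tr}\left(U_{i}^{\top} L_{i} U_{i}\right) \quad \text{s.t.} \quad U_{i}^{\top} U_{i} = I \tag{2}$$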

This trace minimization problem can be solved by the Rayleigh-Ritz theorem: the solution $U_{i}$ contains the $k$ eigenvectors corresponding to the $k$ smallest eigenvalues of $L_{i}$. The spectral embedding maps the nodes of the original graph into a low-dimensional spectral domain (see ? (?) for details).
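In code, the Rayleigh-Ritz solution amounts to one symmetric eigendecomposition. This dense NumPy sketch is illustrative only. For graphs the size of those in Table 1, a sparse iterative solver such as SciPy's `scipy.sparse.linalg.eigsh` would be the more practical choice.

```python
import numpy as np


def spectral_embedding(L, k):
    """Eq. 2 via Rayleigh-Ritz: minimize tr(U^T L U) subject to U^T U = I.

    np.linalg.eigh returns the eigenvalues of a symmetric matrix in
    ascending order, so the first k eigenvector columns are the minimizer.
    """
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, :k]


# Unnormalized Laplacian of the path 0 - 1 - 2 (eigenvalues 0, 1, 3)
L = np.array([[1, -1, 0], [-1, 2, -1], [0, -1, 1]], dtype=float)
U = spectral_embedding(L, k=2)
print(np.trace(U.T @ L @ U))  # ~1.0, the sum of the two smallest eigenvalues
```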

A Grassmann manifold $\mathcal{G}(k,n)$ is the set of $k$-dimensional linear subspaces of $\mathbb{R}^{n}$, where each unique subspace is mapped to a unique point on the manifold. Each point on the manifold can be represented by an orthonormal matrix $Y\in\mathbb{R}^{n\times k}$ whose columns span the corresponding $k$-dimensional subspace of $\mathbb{R}^{n}$. The distance between two subspaces can be calculated from the set of principal angles $\{\theta_{i}\}_{i=1}^{k}$ between them. ? (?) show that the projection distance between two subspaces $Y_{1}$ and $Y_{2}$ can be expressed as a trace:

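$$d_{\mathrm{proj}}^{2}(Y_{1}, Y_{2}) = \sum_{i=1}^{k} \sin^{2}\theta_{i} = k - \operatorname{tr}\left(Y_{1} Y_{1}^{\top} Y_{2} Y_{2}^{\top}\right) \tag{3}$$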

Based on Eq. 3, the projection distance between the target representative subspace $U$ and the individual subspaces $\{U_{i}\}_{i=1}^{M}$ can be calculated as:

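$$d_{\mathrm{proj}}^{2}\left(U, \{U_{i}\}_{i=1}^{M}\right) = \sum_{i=1}^{M} d_{\mathrm{proj}}^{2}(U, U_{i}) = kM - \sum_{i=1}^{M} \operatorname{tr}\left(U U^{\top} U_{i} U_{i}^{\top}\right) \tag{4}$$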

Minimization of Eq. 4 ensures that individual subspaces are close to the final representative subspace $U$ .
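Eqs. 3 and 4 translate directly into NumPy. The sketch below uses hypothetical function names. It computes the trace through the equivalent $k \times k$ product $\|Y_{1}^{\top}Y_{2}\|_{F}^{2}$. SciPy's `subspace_angles` appears only as an independent check of the principal-angle form.

```python
import numpy as np
from scipy.linalg import subspace_angles


def projection_distance_sq(Y1, Y2):
    """Eq. 3: k - tr(Y1 Y1^T Y2 Y2^T) for (n, k) orthonormal bases.

    Uses the identity tr(Y1 Y1^T Y2 Y2^T) = ||Y1^T Y2||_F^2, which needs
    only a k x k product instead of two n x n projection matrices.
    """
    k = Y1.shape[1]
    return k - np.linalg.norm(Y1.T @ Y2, "fro") ** 2


def total_projection_distance_sq(U, subspaces):
    """Eq. 4: sum of squared projection distances from U to each U_i."""
    return sum(projection_distance_sq(U, U_i) for U_i in subspaces)


rng = np.random.default_rng(0)
Y1, _ = np.linalg.qr(rng.standard_normal((6, 2)))
Y2, _ = np.linalg.qr(rng.standard_normal((6, 2)))
theta = subspace_angles(Y1, Y2)  # principal angles, in radians
print(projection_distance_sq(Y1, Y2), np.sum(np.sin(theta) ** 2))  # equal up to rounding
```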

Finally, to ensure that the original vertex connectivity in each graph layer is preserved, we include a separate term that minimizes the quadratic-form Laplacian (evaluated on the columns of U):

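$$\min_{U \in \mathbb{R}^{n \times k}} \sum_{i=1}^{M} \left[\operatorname{tr}\left(U^{\top} L_{i} U\right) + \alpha_{i}\left(k - \operatorname{tr}\left(U U^{\top} U_{i} U_{i}^{\top}\right)\right)\right] \quad \text{s.t.} \quad U^{\top} U = I \tag{5}$$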

In Eq. 5, $\alpha_{i}$ is the regularization parameter for view $i$ that balances the trade-off between the two terms in the objective function. Rearranging Eq. 5 and ignoring the constant terms yields

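$$\min_{U \in \mathbb{R}^{n \times k}} \operatorname{tr}\left[U^{\top}\left(\sum_{i=1}^{M} L_{i} - \sum_{i=1}^{M} \alpha_{i} U_{i} U_{i}^{\top}\right) U\right] \quad \text{s.t.} \quad U^{\top} U = I \tag{6}$$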

As before, the Rayleigh-Ritz theorem can be used to solve Eq. 6. The solution is given by the $k$ eigenvectors corresponding to the $k$ smallest eigenvalues of the modified Laplacian:

$$L_{\mathrm{mod}} = \sum_{i=1}^{M} L_{i} - \sum_{i=1}^{M} \alpha_{i} U_{i} U_{i}^{\top} \tag{7}$$
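The full merging step (Eqs. 2, 6 and 7) can be sketched in a few lines of NumPy. The dense `eigh` calls, the function name `merge_views`, and the example $\alpha_i$ values are illustrative assumptions. Section 4.2 selects $\alpha$ by cross-validation instead.

```python
import numpy as np


def merge_views(laplacians, alphas, k):
    """Subspace merging (Eqs. 2, 6 and 7).

    laplacians : list of (n, n) symmetric Laplacians L_i, one per view
    alphas     : list of per-view regularization weights alpha_i
    k          : dimension of each subspace
    Returns L_mod and its k eigenvectors with the smallest eigenvalues (U).
    """
    n = laplacians[0].shape[0]
    L_mod = np.zeros((n, n))
    for L_i, alpha_i in zip(laplacians, alphas):
        U_i = np.linalg.eigh(L_i)[1][:, :k]     # per-view embedding, Eq. 2
        L_mod += L_i - alpha_i * (U_i @ U_i.T)  # accumulate Eq. 7 term by term
    U = np.linalg.eigh(L_mod)[1][:, :k]         # Rayleigh-Ritz solution of Eq. 6
    return L_mod, U


# Unnormalized Laplacians of two hypothetical views on four nodes
L_calls = np.array([[1, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 1]], float)
L_texts = np.array([[1, 0, 0, -1], [0, 1, -1, 0], [0, -1, 1, 0], [-1, 0, 0, 1]], float)
L_mod, U = merge_views([L_calls, L_texts], alphas=[0.5, 0.5], k=2)
```

Because each $U_{i}U_{i}^{\top}$ is a projection matrix, the subtracted term rewards directions that the per-view embeddings agree on. Those directions become cheaper in the merged objective.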

3.2 Graph-Based Manifold Ranking

Though the modified Laplacian calculated above can be fed directly to the downstream graph convolutional networks, model performance can be increased by ranking the nodes in the manifold based on their saliency with respect to some critical nodes (?). To rank points on the manifold, we use the closed form function,

$$f^{*} = \left(I - \beta L_{\mathrm{mod}}\right)^{-1} q \tag{8}$$

Here, $I$ represents the identity matrix, $L_{mod}$ is the modified Laplacian calculated in Eq. 7, and $\beta$ is the regularization parameter. Let $q$ be a vector that marks the query nodes, with $q_{j}=1$ if node $j$ is a query node and $0$ otherwise. Eq. 8 then calculates the saliency of the other nodes with respect to the query nodes. These saliency scores can be used to add or prune edges in the induced underlying graph. Manifold-based ranking suits our approach because the modified Laplacian representing the merged subspaces can be used directly for saliency detection. The query nodes can be selected as the centroids determined by any clustering algorithm over the manifold.
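A minimal NumPy sketch of Eq. 8 follows. It also includes the fixed-point iteration $f \leftarrow \beta L_{mod} f + q$, whose fixed point satisfies Eq. 8. The sketch assumes $q$ is a 0/1 indicator of the query nodes, and the function names are hypothetical. The iteration converges only when the spectral radius of $\beta L_{mod}$ is below 1. For that reason the demo uses a normalized Laplacian and $\beta = 0.1$, not the $\beta = 0.99$ reported in Section 4.2.

```python
import numpy as np


def _indicator(n, query_nodes):
    """0/1 vector q with ones at the query nodes."""
    q = np.zeros(n)
    q[list(query_nodes)] = 1.0
    return q


def manifold_rank(L_mod, query_nodes, beta):
    """Closed form f* = (I - beta L_mod)^{-1} q (Eq. 8).

    Solves the linear system instead of forming the inverse explicitly;
    assumes I - beta * L_mod is non-singular.
    """
    n = L_mod.shape[0]
    return np.linalg.solve(np.eye(n) - beta * L_mod, _indicator(n, query_nodes))


def manifold_rank_iterative(L_mod, query_nodes, beta, iters=200):
    """Fixed-point iteration f <- beta L_mod f + q.

    Converges to the closed form only if the spectral radius of
    beta * L_mod is below 1.
    """
    q = _indicator(L_mod.shape[0], query_nodes)
    f = q.copy()
    for _ in range(iters):
        f = beta * (L_mod @ f) + q
    return f


# Normalized Laplacian of the path 0 - 1 - 2 - 3 stands in for L_mod here
W = np.diag(np.ones(3), 1)
W = W + W.T
d = 1.0 / np.sqrt(W.sum(axis=1))
L = np.eye(4) - d[:, None] * W * d[None, :]
scores = manifold_rank(L, query_nodes=[0], beta=0.1)
print(scores)
```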

The subspace merging and subsequent manifold ranking are summarized in Algorithm 1. For a graph with $M$ layers and $N$ users per layer, the time complexity of Algorithm 1 is $O(MN^{3}+MN^{2}K+N^{2}C^{2}+tN)$. Here $K$ is the number of eigenvectors to be calculated, $C$ is the number of centroids, and $t$ is the number of ranking iterations. The terms break down as follows:
- $O(MN^{3})$ is the cost of computing the Laplacians and eigenvector matrices for all $M$ layers.
- $O(MN^{2}K)$ is the cost of computing the modified Laplacian.
- $O(N^{2}C^{2})$ is the cost of computing $C$ clusters using k-means clustering.
- $O(tN)$ is the cost of manifold ranking using the iterative version described by (?).
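As a rough illustration of where the cost of Algorithm 1 goes, the following sketch evaluates each term of the complexity expression, ignoring constants. It uses the size of the Product Adoption dataset in Table 1. Taking $C$ equal to the number of query points (ten times the number of classes) is an assumption. The values of $K$ and $t$ are hypothetical.

```python
def algorithm1_cost_terms(M, N, K, C, t):
    """Size of each term in O(M N^3 + M N^2 K + N^2 C^2 + t N).

    Constant factors are dropped, so only the relative magnitudes matter.
    """
    return {
        "Laplacians and eigenvectors (M N^3)": M * N**3,
        "modified Laplacian (M N^2 K)": M * N**2 * K,
        "k-means centroids (N^2 C^2)": N**2 * C**2,
        "manifold ranking (t N)": t * N,
    }


# Product Adoption (Table 1): N = 17,000 users and M = 2 views;
# C = 10 x 2 classes = 20 query points is an assumption.
# K = 16 and t = 100 are hypothetical values chosen for illustration.
terms = algorithm1_cost_terms(M=2, N=17_000, K=16, C=20, t=100)
for name, ops in sorted(terms.items(), key=lambda item: -item[1]):
    print(f"{name}: {ops:.1e}")
```

Under these assumptions, the cubic eigendecomposition term dominates by roughly two orders of magnitude.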

3.3 Graph Convolution Networks

Convolutional neural networks can be applied to irregular or non-Euclidean domains, such as graphs, because convolution corresponds to multiplication in the Fourier domain. Graph convolutions can therefore be expressed as the multiplication of a signal $x\in\mathbb{R}^{N}$ with a filter $g_{\theta}$ in the graph Fourier domain (see ? (?)):

$$g_{\theta} \star x = g_{\theta}(L)\,x = U g_{\theta} U^{\top} x \tag{9}$$

Here, $U$ is the matrix of eigenvectors of the normalized graph Laplacian $L=I-D^{-1/2}AD^{-1/2}=U\Lambda U^{\top}$, and $I$, $D$, $A$ represent the identity, degree, and adjacency matrices, respectively. Graph convolutions can be further approximated by a truncated expansion in Chebyshev polynomials as

$$g_{\theta'} \star x \approx \sum_{k=0}^{K} \theta'_{k}\, T_{k}(\tilde{L})\, x \tag{10}$$

where $\tilde{L}=2L/\lambda_{max}-I$ is the rescaled Laplacian, $T_{k}$ is the Chebyshev polynomial of order $k$, and $\theta^{\prime}$ is the vector of Chebyshev coefficients. Following ? (?), we truncate the expansion at $K=1$, approximate the largest eigenvalue as $\lambda_{max}\approx 2$, and constrain the number of free parameters to a single $\theta$. The convolution operation can then be represented as

$$g_{\theta} \star x \approx \theta\, \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} x \tag{11}$$

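where $\tilde{A}=A+I$ and $\tilde{D}$, with $\tilde{D}_{ii}=\sum_{j}\tilde{A}_{ij}$, are the renormalized versions of $A$ and $D$. Repeatedly applying the unnormalized operator $I+D^{-1/2}AD^{-1/2}$ can cause numerical instabilities and exploding/vanishing gradients; this renormalization avoids them (?).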
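The spectral definition (Eq. 9) and the Chebyshev expansion (Eq. 10) should give the same result, and this can be checked numerically. The sketch below assumes $\lambda_{max}=2$ (the upper bound for a normalized Laplacian) and hypothetical coefficients $\theta'$. The recurrence version needs only matrix-vector products, which is what makes the expansion attractive on large sparse graphs.

```python
import numpy as np
from numpy.polynomial import chebyshev


def chebyshev_filter(L, x, theta, lam_max=2.0):
    """Eq. 10: sum_k theta_k T_k(L_tilde) x with L_tilde = 2 L / lam_max - I.

    Uses the recurrence T_0 x = x, T_1 x = L_tilde x,
    T_k x = 2 L_tilde T_{k-1} x - T_{k-2} x (matrix-vector products only).
    """
    L_tilde = 2.0 * L / lam_max - np.eye(L.shape[0])
    t_prev, t_curr = x, L_tilde @ x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2.0 * L_tilde @ t_curr - t_prev
        out = out + theta[k] * t_curr
    return out


def spectral_filter(L, x, theta, lam_max=2.0):
    """Same filter computed as U g(Lambda) U^T x (Eq. 9), for checking."""
    lam, U = np.linalg.eigh(L)
    g = chebyshev.chebval(2.0 * lam / lam_max - 1.0, theta)
    return U @ (g * (U.T @ x))


rng = np.random.default_rng(0)
W = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
d = 1.0 / np.sqrt(W.sum(axis=1))
L = np.eye(4) - d[:, None] * W * d[None, :]  # normalized Laplacian
x = rng.standard_normal(4)
theta = [0.5, -0.2, 0.1]
print(np.allclose(chebyshev_filter(L, x, theta), spectral_filter(L, x, theta)))  # True
```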

The modified graph ($A_{mod}$ in Algorithm 1), which results from merging the Laplacians through subspace analysis and manifold ranking, can be fed directly into the graph convolutional network defined above. The forward propagation model for a two-layer network can then be represented as

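$$Z = f(X, A) = \operatorname{softmax}\left(\hat{A}\, \operatorname{ReLU}\left(\hat{A} X W^{0}\right) W^{1}\right) \tag{12}$$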

Here, $\hat{A}=\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ is calculated as a preprocessing step before the input is given to the neural network. $W^{0}$ and $W^{1}$ are the input-to-hidden and hidden-to-output weight matrices of the two-layer network, and can be trained using gradient descent. ReLU and softmax are the activation functions of the hidden and output layers, respectively.
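The preprocessing step $\hat{A}=\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ and the two-layer forward pass of Eq. 12 can be sketched in NumPy as follows. The weights are random placeholders and no training is performed; the paper trains $W^{0}$ and $W^{1}$ with Adam. The function names are hypothetical, and the adjacency matrix is assumed to be non-negative.

```python
import numpy as np


def renormalized_adjacency(A):
    """A_hat = D_tilde^{-1/2} (A + I) D_tilde^{-1/2} (renormalization trick).

    With non-negative A, the self-loops make every degree at least 1, so
    the inverse square root is always defined.
    """
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def gcn_forward(A_hat, X, W0, W1):
    """Eq. 12: Z = softmax(A_hat ReLU(A_hat X W0) W1)."""
    H = np.maximum(A_hat @ X @ W0, 0.0)  # hidden layer with ReLU
    return softmax(A_hat @ H @ W1)       # row-wise class probabilities


rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # stands in for A_mod
X = np.eye(3)                                          # placeholder one-hot features
W0 = rng.standard_normal((3, 4))                       # untrained, random weights
W1 = rng.standard_normal((4, 2))
Z = gcn_forward(renormalized_adjacency(A), X, W0, W1)
print(Z.sum(axis=1))  # each row is a probability distribution over 2 classes
```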

4 Experiments and Data

4.1 Datasets

Our first set of experiments tests Multi-GCN on three prediction tasks relevant to international development. Each task uses a different dataset of mobile phone Call Detail Records (CDR), obtained from three developing countries with GDP per capita below US$1,600. These datasets contain detailed metadata on all communication events (calls and text messages) that occur on the mobile phone network. Each CDR dataset contains multiple possible relationships between nodes (views). We extract one view corresponding to phone calls between users and another corresponding to text messages. We separately construct a large set of features for each user (such as total call volume and degree centrality), using the combinatoric approach described in ? (?).

Table 1 presents summary statistics for each of these datasets. The connections and sparsity of each network are shown in Figure 2. These spy plots help visualize the structure of the adjacency matrices for each graph view, where a dot indicates that an edge exists between those two individuals on the corresponding view.

Product adoption dataset

The first dataset is a sample of mobile phone activity from a West African country. Here, the classification task is to predict whether a user eventually adopts a new financial inclusion product. There are two classes: (1) did not adopt; (2) adopted and used the product. Following the experimental setup described in ? (?), we randomly selected 20 users from each class (40 total) for the training set. The validation and test sets consist of 500 and 1,000 randomly selected users, respectively.
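One such split (20 training users per class, then disjoint validation and test sets) can be drawn as shown below. The sketch assumes every node passed in has a known label and uses NumPy's random `Generator`. The function `sample_split` is hypothetical; its defaults mirror the sizes stated above.

```python
import numpy as np


def sample_split(labels, n_per_class=20, n_val=500, n_test=1000, seed=0):
    """Random split: n_per_class training nodes per class, then disjoint
    validation and test sets drawn from the remaining nodes.

    Returns three arrays of node indices.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=n_per_class, replace=False)
        for c in np.unique(labels)
    ])
    rest = rng.permutation(np.setdiff1d(np.arange(len(labels)), train))
    return train, rest[:n_val], rest[n_val:n_val + n_test]


labels = np.arange(2_000) % 2  # hypothetical: 2,000 labeled users, two classes
train_idx, val_idx, test_idx = sample_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 40 500 1000
```

Repeating this with ten different seeds gives the kind of randomized splits over which Table 2 reports means and standard errors.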

Poverty prediction dataset

The poverty prediction dataset consists of several thousand transactions by mobile phone users in an East African country. We classify users as poor or non-poor, using labels obtained by ? (?) through a small set of phone surveys conducted with mobile phone subscribers. Again, we randomly selected 20 users from each class for training, while the validation and test sets contain 100 and 200 users, respectively.

Gender prediction dataset

The gender prediction dataset originates from a developing country in South Asia. Here, the classification task is to predict the gender of the mobile phone users, where gender labels are provided by the operator for a small number of labeled instances. We randomly select 20 users from each category for training; the size of the validation and the testing datasets are 100 and 800, respectively.

Citation classification datasets

A final set of experiments replicates the experimental design of ? (?) to test Multi-GCN on more standard node labelling tasks. In these datasets, nodes are documents and the first view corresponds to the citation links between the research papers. We construct the second view from the textual similarity of the papers. Specifically, if the normalized cosine similarity between documents is greater than 0.8, then we create an edge in the second view of the citation network.
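The second citation view could be built as sketched below. The text does not say which document representation the cosine similarity is computed on. This sketch assumes the bag-of-words feature vectors counted in Table 1, and the function name is hypothetical.

```python
import numpy as np


def similarity_view(features, threshold=0.8):
    """Second view: connect documents whose cosine similarity exceeds threshold.

    features : (n, d) non-negative bag-of-words matrix (one row per paper)
    Returns a symmetric 0/1 adjacency matrix without self-loops.
    """
    X = np.asarray(features, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave empty documents as zero vectors
    X_unit = X / norms
    sim = X_unit @ X_unit.T  # pairwise cosine similarities
    A = (sim > threshold).astype(float)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A


# Four toy documents over a three-word vocabulary
docs = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]])
print(similarity_view(docs))
```

In the toy example, documents 0 and 3 share two of three words (cosine about 0.82), so they are linked. Documents 2 and 3 (cosine about 0.58) are not.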

4.2 Experimental setup

In general, our goal is to correctly classify nodes in a network, where only a very small fraction of nodes are labeled. In the experiments, we start from a small sample of labeled nodes and test the ability of Multi-GCN, as well as several state-of-the-art algorithms, to correctly classify unlabeled nodes in the validation and testing sets. We use three popular node embedding algorithms (Node2vec, Deepwalk, and LINE) as a first set of baselines. In addition, we provide three baselines based on graph convolutional networks (?). The first two, GCN (first view) and GCN (second view), apply GCN over the two respective adjacency matrices from phone and text message activity. The third, GCN (view union), operates on the union of the adjacency matrices of the first view and the second view. In each GCN baseline, the node features are constructed from the adjacency matrix of the first view.
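The view-union adjacency can be formed as shown below. The text does not say whether the baseline binarizes or sums edge weights, so binarizing is an assumption of this sketch.

```python
import numpy as np


def view_union(*views):
    """'View union' baseline: keep an edge if it appears in any view.

    Edge weights are binarized, which is an assumption of this sketch.
    """
    present = np.stack([np.asarray(v) != 0 for v in views])
    return present.any(axis=0).astype(float)


calls = np.array([[0, 3, 0], [3, 0, 0], [0, 0, 0]])  # 3 calls between users 0 and 1
texts = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]])  # 1 text between users 0 and 2
print(view_union(calls, texts))
```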

After merging the views, we use Eq. 8 to rank the interactions between nodes by their salience with respect to the query points. The regularization parameter $\alpha$ (see Eq. 7) is selected through 10-fold cross-validation. We similarly tune the hyper-parameter $\beta$, setting it to 0.99, and set the number of query points to ten times the number of classes.

Salient edges are then added and non-salient edges removed through the ranking process. The adjacency matrix of the modified graph and the node features are passed as input to a two-layer graph convolutional network as described in Section 3. All of the GCN-based models, including Multi-GCN, are trained for a maximum of 200 iterations using Adam (adaptive moment estimation, an extension of stochastic gradient descent; see ? (?)) with a learning rate of 0.01. Other GCN hyper-parameters are set to the values reported in ? (?).

5 Results

Experimental results for the three developing-country datasets are shown in Table 2. Each row reports the mean and standard error of the classification accuracy over 10 randomly drawn train-test splits of the same size for each dataset, constructed as described in Section 4. The last row shows the performance of Multi-GCN. On all three datasets, Multi-GCN outperforms existing state-of-the-art benchmarks. The margin of improvement over the strongest baseline is greatest in the poverty prediction task and smallest in the product adoption task.
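The margins over the strongest baseline can be recomputed directly from the mean accuracies in Table 2 (standard errors are ignored here):

```python
# Mean accuracies (%) from Table 2: (Multi-GCN, strongest baseline) per task
table2 = {
    "Product Adoption": (73.47, 71.90),    # strongest baseline: GCN (view union)
    "Poverty Prediction": (59.23, 55.19),  # strongest baseline: GCN (first view)
    "Gender Prediction": (66.34, 63.97),   # strongest baseline: GCN (first view)
}
margins = {task: round(ours - best, 2) for task, (ours, best) in table2.items()}
print(margins)
# {'Product Adoption': 1.57, 'Poverty Prediction': 4.04, 'Gender Prediction': 2.37}
```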

The second set of experimental results compares Multi-GCN to recent benchmarks on a more standard node classification task; these results are shown in Table 3. In addition to a comparison over randomly drawn train-test splits, we also evaluate Multi-GCN on the predefined train-test splits used in the original tests by ? (?). An additional validation set of 500 instances is used for hyper-parameter tuning. In all cases, Multi-GCN achieves higher predictive accuracy than existing approaches.

6 Discussion

This paper proposes a new approach to semi-supervised learning on multi-view graphs. Through a series of experiments, we show that this approach improves upon state-of-the-art embedding- and convolution-based algorithms on a variety of prediction tasks related to both poverty research and to node labelling in general.

Relative to single-view learning algorithms, the main value of the Multi-GCN approach is that it incorporates non-redundant information from multiple views into the learning process. Thus, the gains from Multi-GCN depend on the prediction task and on the importance of multi-view graph structure to that task. Intuitively, this depends on the mutual information between the different views. A closer look at the results in Table 2 supports this intuition. For product adoption, Multi-GCN provides its biggest gains relative to DeepWalk, Node2vec, and LINE, but its gains relative to single-view GCN are more modest. By contrast, on the poverty and gender prediction tasks the performance gain for Multi-GCN is substantially higher, even relative to the single-view GCN benchmarks. The spy plots in Figures 2(a)-2(c) help explain this pattern. The different views in the product adoption setting appear somewhat redundant, whereas for poverty and gender prediction the views appear more independent.

We believe future work should address several limitations of the current analysis. In particular, there is much to be learned from a more systematic exploration of the value of additional views. The same holds for different methods of merging views, beyond the subspace learning approach developed in Section 3.1. We are also exploring how graph sparsity and the fraction of labeled nodes affect the performance of Multi-GCN relative to alternative approaches.

7 Conclusion

Graph convolutional networks have recently achieved considerable success in a variety of learning tasks on irregular, graph-structured data. Leveraging insights from spectral graph theory, GCNs are beginning to replicate the success that CNNs have seen on more regular image and text data. This is a promising development for a wide variety of learning tasks on graph-structured data, in contexts ranging from advertising in online networks to intervening in the spread of a contagious disease.

In this paper, we have shown that state-of-the-art GCNs can achieve even greater performance on a variety of classification tasks when the multi-view nature of the underlying network is incorporated into the learning process. While motivated by three applications in global poverty research, the performance gains appear to generalize to other graph-based classification problems. We therefore view Multi-GCN as an important first step in adapting neural network-based approaches to multi-view networks and hope that it provides a foundation for future work in this space.

8 Acknowledgements

This research was supported by the National Science Foundation under award CCF-1637360 (Algorithms in the Field) and by the Office of Naval Research (Minerva Initiative) under award N00014-17-1-2313.

Bibliography


Blumenstock, J.; Cadamuro, G.; and On, R. 2015. Predicting poverty and wealth from mobile phone metadata. Science 350(6264):1073–1076.
Blumenstock, J. E. 2014. Calling for Better Measurement: Estimating an Individual’s Wealth and Well-Being from Mobile Phone Transaction Records. In The 20th ACM Conference on Knowledge Discovery and Mining (KDD ’14), Workshop on Data Science for Social Good.
Blumenstock, J. E. 2016. Fighting poverty with data. Science 353(6301):753–754.
Bruna, J.; Zaremba, W.; Szlam, A.; and LeCun, Y. 2013. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203.
Defferrard, M.; Bresson, X.; and Vandergheynst, P. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, 3844–3852.
Dong, X.; Frossard, P.; Vandergheynst, P.; and Nefedov, N. 2014. Clustering on multi-layer graphs via subspace analysis on Grassmann manifolds. IEEE Transactions on Signal Processing 62(4):905–918.
Farseev, A.; Samborskii, I.; Filchenkov, A.; and Chua, T.-S. 2017. Cross-domain recommendation via clustering on multi-layer graphs. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 195–204. ACM.
Fatehkia, M.; Kashyap, R.; and Weber, I. 2018. Using Facebook ad data to track the global digital gender gap. World Development 107:189–209.