Session-based Social Recommendation via Dynamic Graph Attention Networks

Weiping Song; Zhiping Xiao; Yifan Wang; Laurent Charlin; Ming Zhang; Jian Tang

arXiv:1902.09362·cs.IR·April 17, 2019


TL;DR

This paper introduces a dynamic graph attention neural network for session-based social recommendation, effectively modeling users' evolving interests and context-dependent social influences to improve recommendation accuracy.

Contribution

It presents a novel neural network model combining recurrent and graph attention mechanisms to capture dynamic user interests and social influences in online communities.

Findings

01. Outperforms several baseline models on real-world datasets.

02. Effectively models dynamic interests and social influence.

03. Scalable to large-scale data.



Tables

Table 1. Descriptive statistics of our three data sets.

Statistic	Douban	Delicious	Yelp
# Users	32,314	1,650	141,804
# Items	14,109	4,282	17,625
# Events	3,493,821	296,705	1,200,503
# Social links	331,315	15,328	6,818,026
Start Date	01/12/2008	08/12/2009	01/01/2009
End Date	07/22/2016	07/01/2016	10/15/2010
Avg. friends/user	10.25	9.00	48.08
Avg. events/user	108.12	179.82	8.47
Avg. session length	4.38	4.30	3.63

Table 2. Quantitative results of different algorithms. We highlight that DGRec outperforms all other baselines across all three data sets and both metrics. Further analysis is provided in §5.2.

Model Class	Model	Douban Recall@20	Douban NDCG	Delicious Recall@20	Delicious NDCG	Yelp Recall@20	Yelp NDCG
Classical	ItemKNN (Linden et al., 2003)	0.1431	0.1635	0.2729	0.2241	0.0441	0.0989
Classical	BPR-MF (Rendle et al., 2009)	0.0163	0.1110	0.2775	0.2293	0.0365	0.1190
Social	SoReg (Ma et al., 2011)	0.0177	0.1113	0.2703	0.2271	0.0398	0.1218
Social	SBPR (Zhao et al., 2014)	0.0171	0.1059	0.2948	0.2391	0.0417	0.1207
Social	TranSIV (Xiao et al., 2017)	0.0173	0.1102	0.2588	0.2158	0.0420	0.1187
Temporal	RNN-Session (Hidasi et al., 2016)	0.1643	0.1854	0.3445	0.2581	0.0756	0.1378
Temporal	NARM (Li et al., 2017)	0.1755	0.1872	0.3776	0.2768	0.0765	0.1380
Social + Temporal (Ours)	DGRec	0.1861	0.1950	0.4066	0.2944	0.0842	0.1427

Table 3. Ablation study comparing the performance of the complete model (DGRec) with two variations.

Data Set	Model	Recall@20	NDCG
Douban	DGRec$_{self}$	0.1643	0.1854
Douban	DGRec$_{social}$	0.1185	0.1591
Douban	DGRec	0.1861	0.1950
Delicious	DGRec$_{self}$	0.3445	0.2581
Delicious	DGRec$_{social}$	0.3306	0.2516
Delicious	DGRec	0.4066	0.2944
Yelp	DGRec$_{self}$	0.0756	0.1378
Yelp	DGRec$_{social}$	0.0690	0.1356
Yelp	DGRec	0.0842	0.1427

Table 4. Performance of our model w.r.t. different numbers of convolution layers.

Data Set	Conv. Layers	Recall@20	NDCG
Douban	1	0.1726	0.1886
Douban	2	0.1861	0.1950
Douban	3	0.1793	0.1894
Delicious	1	0.4017	0.2883
Delicious	2	0.4066	0.2944
Delicious	3	0.4037	0.2932
Yelp	1	0.0760	0.1387
Yelp	2	0.0842	0.1427
Yelp	3	0.0846	0.1423

Equations

$$h_{n}=f(i_{T+1,n}^{u},h_{n-1}),$$

$x_{n}$, $f_{n}$, $o_{n}$, $\tilde{c}_{n}$, $c_{n}$, $h_{n}$

$s_{k}^{s}$

$$s_{k}^{l}=\textbf{W}_{u}[k,:],$$

$$s_{k}=ReLU(\textbf{W}_{1}[s_{k}^{s};s_{k}^{l}]),$$

$$\alpha_{uk}^{(l)}=\frac{\exp(f(h_{u}^{(l)},h_{k}^{(l)}))}{\sum_{j\in N(u)\cup\{u\}}\exp(f(h_{u}^{(l)},h_{j}^{(l)}))},$$

$$\tilde{h}_{u}^{(l)}=\sum_{k\in N(u)\cup\{u\}}\alpha_{uk}^{(l)}h_{k}^{(l)},$$

$$\hat{h}_{n}=\textbf{W}_{2}[h_{n};h_{u}^{(L)}],$$

$$p(y\mid i_{T+1,1}^{u},\ldots,i_{T+1,n}^{u};\{\vec{S}_{T}^{k},k\in N(u)\})=\frac{\exp(\hat{h}_{n}^{\top}z_{y})}{\sum_{j=1}^{|I|}\exp(\hat{h}_{n}^{\top}z_{j})},$$

$$\sum_{u\in U}\sum_{t=2}^{T}\sum_{n=1}^{N_{u,t}-1}\log p(i_{t,n+1}^{u}\mid i_{t,1}^{u},\ldots,i_{t,n}^{u};\{\vec{S}_{t-1}^{k},k\in N(u)\}).$$


Full text

Session-based Social Recommendation via Dynamic Graph Attention Networks

Weiping Song (School of EECS, Peking University), Zhiping Xiao (School of EECS, UC Berkeley), Yifan Wang (School of EECS, Peking University), Laurent Charlin (Mila & HEC Montreal), Ming Zhang (School of EECS, Peking University), and Jian Tang (Mila & HEC Montreal)

(2019)

Abstract.

Online communities such as Facebook and Twitter are enormously popular and have become an essential part of the daily life of many of their users. Through these platforms, users can discover and create information that others will then consume. In that context, recommending relevant information to users becomes critical for viability. However, recommendation in online communities is a challenging problem: 1) users’ interests are dynamic, and 2) users are influenced by their friends. Moreover, the influencers may be context-dependent. That is, different friends may be relied upon for different topics. Modeling both signals is therefore essential for recommendations.

We propose a recommender system for online communities based on a dynamic-graph-attention neural network. We model dynamic user behaviors with a recurrent neural network, and context-dependent social influence with a graph-attention neural network, which dynamically infers the influencers based on users’ current interests. The whole model can be efficiently fit on large-scale data. Experimental results on several real-world data sets demonstrate the effectiveness of our proposed approach over several competitive baselines including state-of-the-art models. The source code and data are available at https://github.com/DeepGraphLearning/RecommenderSystems.

Keywords: Dynamic interests; social network; graph convolutional networks; session-based recommendation

Published in: The Twelfth ACM International Conference on Web Search and Data Mining (WSDM ’19), February 11–15, 2019, Melbourne, VIC, Australia. DOI: 10.1145/3289600.3290989. ISBN: 978-1-4503-5940-5/19/02.

CCS concepts: Information systems → Social recommendation; Computing methodologies → Ranking; Computing methodologies → Learning latent representations.

1. Introduction

Online social communities are an essential part of today’s online experience. Platforms such as Facebook, Twitter, and Douban enable users to create and share information as well as consume the information created by others. Recommender systems for these platforms are therefore critical to surface information of interest to users and to improve long-term user engagement. However, online communities come with extra challenges for recommender systems.

First, user interests are dynamic by nature. A user may be interested in sports items for a period of time and then search for new music groups. Second, since online communities often promote sharing information among friends, users are also likely to be influenced by their friends. For instance, a user looking for a movie may be influenced by what her friends have liked. Further, the set of influencers can be dynamic since they can be context-dependent. For instance, a user will trust a set of friends who like comedies when searching for funny films; while she could be influenced by another set of friends when searching for action movies.

Motivating Example. Figure 1 presents the behavior of Alice and her friends in an online community. Behaviors are described by a sequence of actions (e.g., item clicks). To capture users’ dynamic interests, their actions are segmented into sub-sequences called sessions. We are therefore interested in session-based recommendations (Schafer et al., 1999): within each session, we recommend the next item Alice should consume based on the items she has consumed so far in the current session. Figure 1 presents two sessions, (a) and (b). In addition, the items consumed by Alice’s friends are also available, and we would like to use them for better recommendations. We are thus in a session-based social recommendation setting.

In session (a), Alice browses sports items. Two of her friends, Bob and Eva, are avid sports fans (long-term interests), and they have been browsing sports items recently (short-term interests). Considering both facts, Alice may be influenced by the two and, e.g., decide to learn more about Ping Pong next. In session (b), Alice is interested in “literature & art” items. The situation differs from session (a) since none of her friends have consumed such items recently. But David is generally interested in this topic (long-term interests). In this case, it would make sense for Alice to be influenced by David and, say, be recommended a book that David enjoyed. These examples show how a user’s current interests, combined with the (short- and long-term) interests of different friends, inform session-based social recommendations. In this paper, we present a recommendation model based on both.

The current recommendation literature has modeled either users’ dynamic interests or their social influences but, as far as we know, has never combined both (as in the example above). A recent study (Hidasi et al., 2016) models session-level user behaviors using recurrent neural networks, ignoring social influences. Others studied only social influences (Ma et al., 2011; Zhao et al., 2014; Chaney et al., 2015). For example, Ma et al. (2011) explore the social influence of friends’ long-term preferences on recommendations. However, the influences from different users are static and do not reflect the users’ current interests.

We propose an approach to model both users’ session-based interests and dynamic social influences, that is, which subset of a user’s friends influences her (the influencers) given her current session. Our recommendation model is based on dynamic-graph-attention networks. Our approach first models user behaviors within a session using a recurrent neural network (RNN) (Elman, 1990). According to the user’s current interests, captured by the hidden representation of the RNN, we capture the influences of friends using a graph-attention network (Velickovic et al., 2018). To provide session-level recommendations, we distinguish the model of friends’ short-term preferences from that of the long-term ones. The influence of each friend, given the user’s current interests, is then determined automatically using an attention mechanism (Bahdanau et al., 2015; Xu et al., 2015).

We conduct extensive experiments on data sets collected from several online communities (Douban, Delicious, and Yelp). Our proposed approach outperforms the well-known competitive baselines by modeling both users’ dynamic behaviors and dynamic social influences.

To summarize, we make the following contributions:

• We propose to study both dynamic user interests and context-dependent social influences for recommendation in online communities.

• We propose a novel recommendation approach based on dynamic-graph-attention networks for modeling both dynamic user interests and context-dependent social influences. The approach can effectively scale to large data sets.

• We conduct extensive experiments on real-world data sets. Experimental results demonstrate the effectiveness of our model over strong and state-of-the-art baselines.

Organization. §2 discusses related works. In §3 we give a formal definition of the session-based social recommendation problem. Our session-based social recommendation approach is described in §4. §5 presents the experimental results, followed by concluding remarks in §6.

2. Related Work

We discuss three lines of research that are relevant to our work: 1) recommender systems that model dynamic user behaviors, 2) social recommender systems that take social influence into consideration, and 3) recent progress on convolutional networks developed for graph-structured data.

2.1. Dynamic Recommendation

Modeling user interests that change over time has already received some attention (Xiong et al., 2010; Koren, 2010; Charlin et al., 2015). Most of these models are based on (Gaussian) matrix factorization (Mnih and Salakhutdinov, 2008). For example, Xiong et al. (2010) learned temporal representations by factorizing the (user, item, time) tensor. Koren (2010) developed a similar model named timeSVD++. Charlin et al. (2015) took a similar approach but used Poisson factorization (Gopalan et al., 2015). However, these approaches assume that the interests of users change slowly and smoothly over long-term horizons, typically on the order of months or years. To effectively capture users’ short-term interests, recent works introduce RNNs to model their recent (ordered) behaviors. For example, Hidasi et al. (2016) first proposed Session-RNN to model a user’s interest within a session. Li et al. (2017) further extended Session-RNN with an attention mechanism to capture both the local and global interests of a user. Wu et al. (2017) used two separate RNNs to update the representations of both users and items based on new observations. Beutel et al. (2018) built an RNN-based recommender that can incorporate auxiliary context information. These models assume that items exhibit coherence within a period of time, and we use a similar approach to model session-based user interests.

2.2. Social Recommendation

Modeling the influence of friends on user interests has also received attention (Massa and Avesani, 2007; Ma et al., 2008, 2011; Jamali and Ester, 2010; Jiang et al., 2012). Most proposed models are (also) based on Gaussian or Poisson matrix factorization. For example, Ma et al. (2011) studied social recommendations by regularizing latent user factors such that the factors of connected users are close by. Chaney et al. (2015) weighted the contribution of friends on a user’s recommendation using a learned “trust factor”. Zhao et al. (2014) proposed an approach to leverage social networks for active learning. Xiao et al. (2017) framed the problem as transfer learning between the social domain and the recommendation domain. These approaches model social influences under the assumption that influences are uniform across friends and independent of the user’s preferences. Tang et al. (2012b) and Tang et al. (2012a) proposed multi-facet trust relations, which rely on additional side information (e.g., item category) to define facets. Wang et al. (2016) and Wang et al. (2017) distinguished strong and weak ties among users for recommendation in social networks. However, they ignore the user’s short-term behaviors and integrate context-independent social influences. Our proposed approach models dynamic social influences by capturing both dynamic user interests and context-dependent social influences.

2.3. Graph Convolutional Networks

Graph convolutional networks (GCNs) build on convolutional neural networks (CNNs), which have achieved great success in computer vision and several other applications. CNNs are mainly developed for data with 2-D grid structures such as images (Krizhevsky et al., 2012). Recent works focus on modeling more general graph-structured data using CNNs (Bruna et al., 2014; Henaff et al., 2015; Defferrard et al., 2016; Kipf and Welling, 2017). Specifically, Kipf and Welling (2017) proposed graph-convolutional networks (GCNs) for semi-supervised node classification. The model learns node representations by leveraging both the node attributes and the graph structure. It is composed of multiple graph-convolutional layers, each of which updates node representations using a combination of the current node’s representation and that of its neighbors. Through this process, the dependency between nodes is captured. However, in the original formulation, all neighbors are given a static “weight” when updating the node representations. Velickovic et al. (2018) addressed this problem by proposing graph-attention networks. They weighed the contribution of neighbors differently using an attention mechanism (Bahdanau et al., 2015; Xu et al., 2015).

We propose a dynamic-graph-attention network. Compared to previous work, we focus on a different application (modeling the context-dependent social influences for recommendations). Besides, we model a dynamic graph, where the features of nodes evolve over time, and the attention between nodes also changes along with the current context over time.

3. Problem Definition

Recommender systems suggest relevant items to their users according to their historical behaviors. In classical recommendation models (e.g., matrix factorization (Mnih and Salakhutdinov, 2008)), the order in which a user consumes items is ignored. However, in online communities, user preferences change rapidly, and the order of user behaviors must be considered so as to model users’ dynamic interests. In practice, since a user’s entire history can be extremely long (e.g., certain online communities have existed for years) and users’ interests switch quickly, a common approach is to segment user behaviors into different sessions (e.g., using timestamps and considering each user’s behavior within a week as a session) and provide recommendations at the session level (Hidasi et al., 2016). We define this problem as follows:

DEFINITION 1. (Session-based Recommendation) Let $U$ denote the set of users and $I$ be the set of items. Each user $u$ is associated with a set of sessions up to time step $T$, $I^{u}_{T}=\{\vec{S}_{1}^{u},\vec{S}_{2}^{u},\ldots,\vec{S}_{T}^{u}\}$, where $\vec{S}_{t}^{u}$ is the $t^{th}$ session of user $u$. Each session $\vec{S}_{t}^{u}$ consists of a sequence of user behaviors $\{i_{t,1}^{u},i_{t,2}^{u},\ldots,i_{t,N_{u,t}}^{u}\}$, where $i_{t,p}^{u}$ is the $p^{th}$ item consumed by user $u$ in the $t^{th}$ session, and $N_{u,t}$ is the number of items in the session. For each user $u$, given a new session $\vec{S}_{T+1}^{u}=\{i_{T+1,1}^{u},\ldots,i_{T+1,n}^{u}\}$, the goal of session-based recommendation is to recommend a set of items from $I$ that the user is likely to be interested in during the next step $n+1$, i.e., $i_{T+1,n+1}^{u}$.
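The segmentation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' preprocessing code: the function name `segment_sessions`, the `(user, item, timestamp)` event format, and the choice to count fixed-length windows from the earliest event are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def segment_sessions(events, session_days=7):
    """Group (user, item, timestamp) events into fixed-length sessions.

    Returns {user: {session_index: [items in chronological order]}}.
    Session windows are counted from the earliest timestamp in `events`
    (an assumption of this sketch).
    """
    if not events:
        return {}
    origin = min(ts for _, _, ts in events)
    sessions = defaultdict(lambda: defaultdict(list))
    # Sort by time so that items inside each session keep their order.
    for user, item, ts in sorted(events, key=lambda e: e[2]):
        index = (ts - origin).days // session_days
        sessions[user][index].append(item)
    return {user: dict(by_index) for user, by_index in sessions.items()}


events = [
    ("alice", "a", datetime(2019, 1, 1)),
    ("alice", "b", datetime(2019, 1, 3)),
    ("alice", "c", datetime(2019, 1, 9)),
    ("bob", "a", datetime(2019, 1, 2)),
]
result = segment_sessions(events, session_days=7)
```

With week-long windows, Alice's first two events fall into one session and her third into the next, which mirrors the ordered item sequences $\vec{S}_{t}^{u}$ of the definition.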

In online communities, users’ interests are not only correlated to their historical behaviors, but also commonly influenced by their friends. For example, if a friend watches a movie, I may also be interested in watching it. This is known as social influence (Tang et al., 2009). Moreover, the influences from friends are context-dependent. In other words, the influences from friends vary from one situation to another. For example, if a user wants to buy a laptop, she is more likely to consult friends who are keen on high-tech devices, while she may be influenced by photographer friends when shopping for a camera. As in Figure 1, a user can be influenced by both her friends’ short- and long-term preferences.

To provide an effective recommendation to users in online communities, we propose to model both users’ dynamic interests and context-dependent social influences. We define the resulting problem as follows:

DEFINITION 2. (Session-based Social Recommendation) Let $U$ denote the set of users, $I$ be the set of items, and $G=(U,E)$ be the social network, where $E$ is the set of social links between users. Given a new session $\vec{S}_{T+1}^{u}=\{i_{T+1,1}^{u},\ldots,i_{T+1,n}^{u}\}$ of user $u$, the goal of session-based social recommendation is to recommend a set of items from $I$ that $u$ is likely to be interested in during the next time step $n+1$ by utilizing information from both her dynamic interests (i.e., information from $\cup_{t=1}^{T+1}\vec{S}_{t}^{u}$) and the social influences (i.e., information from $\cup_{k\in N(u)}\cup_{t=1}^{T}\vec{S}_{t}^{k}$, where $N(u)$ is the set of $u$’s friends).

4. Dynamic Social Recommender Systems

As discussed previously, users are guided not only by their current preferences but also by their friends’ preferences. We propose a novel dynamic graph attention model, Dynamic Graph Recommendation (DGRec), which models both types of preferences.

DGRec is composed of four modules (Figure 2). First (§4.1), a recurrent neural network (RNN) (Elman, 1990) models the sequence of items consumed in the (target) user’s current session. Her friends’ interests are modeled using a combination of their short- and long-term preferences (§4.2). The short-term preferences, or items in their most recent session, are also encoded using an RNN. Friends’ long-term preferences are encoded with a learned individual embedding. The model then combines the representation of the current user with the representations of her friends using a graph-attention network (§4.3). This is a key part of our model and contribution: our proposed mechanism learns to weigh the influence of each friend based on the user’s current interests. At the final step (§4.4), the model produces recommendations by combining a user’s current preferences with her (context-dependent) social influences.

4.1. Dynamic Individual Interests

To capture a user’s rapidly-changing interests, we use an RNN to model the actions (e.g., clicks) of the (target) user in the current session. RNNs are standard for modeling sequences and have recently been used for modeling user (sequential) preference data (Hidasi et al., 2016). The RNN infers the representation of a user’s session $\vec{S}_{T+1}^{u}=\{i_{T+1,1}^{u},\ldots,i_{T+1,n}^{u}\}$ token by token, recursively combining the representation of all previous tokens with the latest token, i.e.,

$$h_{n}=f(i_{T+1,n}^{u},h_{n-1}),$$

where $h_{n}$ represents a user’s interests and $f(\cdot,\cdot)$ is a non-linear function combining both sources of information. In practice, the long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997) unit is often used as the combination function $f(\cdot,\cdot)$ :

[TABLE]

where $\sigma$ is the sigmoid function: $\sigma(x)=(1+\exp(-x))^{-1}$ .
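For reference, the LSTM update referred to above can be written in its standard form as follows. This is a sketch under stated assumptions rather than the exact parameterization used here: $x_{n}$ is taken to be the input gate, $f_{n}$ the forget gate, $o_{n}$ the output gate, and the weight matrices $\textbf{W}_{x},\textbf{W}_{f},\textbf{W}_{o},\textbf{W}_{c}$ and biases $b_{x},b_{f},b_{o},b_{c}$ are names introduced for illustration; $i_{T+1,n}^{u}$ stands for the embedding of the $n^{th}$ item.

```latex
\begin{aligned}
x_{n} &= \sigma(\textbf{W}_{x}[h_{n-1}; i_{T+1,n}^{u}] + b_{x}) \\
f_{n} &= \sigma(\textbf{W}_{f}[h_{n-1}; i_{T+1,n}^{u}] + b_{f}) \\
o_{n} &= \sigma(\textbf{W}_{o}[h_{n-1}; i_{T+1,n}^{u}] + b_{o}) \\
\tilde{c}_{n} &= \tanh(\textbf{W}_{c}[h_{n-1}; i_{T+1,n}^{u}] + b_{c}) \\
c_{n} &= f_{n} \odot c_{n-1} + x_{n} \odot \tilde{c}_{n} \\
h_{n} &= o_{n} \odot \tanh(c_{n})
\end{aligned}
```

Here $\odot$ denotes element-wise multiplication and $c_{n}$ is the cell state carried between steps.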

4.2. Representing Friends’ Interests

We consider both friends’ short- and long-term interests. Short-term interests are modeled using the sequence of recently-consumed items (e.g., a friend’s latest online session). Long-term interests represent a friend’s average interest and are modeled using an individual embedding.

Short-term preference: For a target user’s current session $\vec{S}_{T+1}^{u}$, her friends’ short-term interests are represented using their sessions right before session $T+1$ (our model generalizes beyond a single session but this is effective empirically). Each friend $k$’s actions $\vec{S}_{T}^{k}=\{i_{T,1}^{k},i_{T,2}^{k},\ldots,i_{T,N_{k,T}}^{k}\}$ are modeled using an RNN. In fact, here we reuse the RNN for modeling the target user’s session (§4.1). In other words, both RNNs share the same weights. We represent friend $k$’s short-term preference $s_{k}^{s}$ by the final output of the RNN:

[TABLE]

Long-term preference: Friends’ long-term preferences reflect their average interests. Since long-term preferences are not time-sensitive, we use a single vector to represent them. Formally,

$$s_{k}^{l}=\textbf{W}_{u}[k,:],$$

where friend $k$ ’s long-term preference $s_{k}^{l}$ is the $k_{th}$ row of the user embedding matrix $\textbf{W}_{u}$ .

Finally, we concatenate friends’ short- and long-term preferences using a non-linear transformation:

$$s_{k}=ReLU(\textbf{W}_{1}[s_{k}^{s};s_{k}^{l}]),$$

where $ReLU(x)=max(0,x)$ is a non-linear activation function and $\textbf{W}_{1}$ is the transformation matrix.
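A minimal NumPy sketch of how a friend's representation could be assembled from the two parts is given below. The function names and toy dimensions are assumptions for illustration; the short-term vector `s_short` is taken as given (it would be the final RNN output for the friend's latest session).

```python
import numpy as np


def long_term_preference(W_u, k):
    """Look up friend k's long-term preference: the k-th row of W_u."""
    return W_u[k, :]


def friend_representation(s_short, s_long, W1):
    """Combine both preferences: s_k = ReLU(W1 [s_short; s_long])."""
    concat = np.concatenate([s_short, s_long])
    return np.maximum(0.0, W1 @ concat)


# Toy example with 2-dimensional preferences (values chosen by hand).
W_u = np.array([[0.1, 0.2],
                [0.5, 2.0]])          # user embedding matrix, one row per user
W1 = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, -1.0]])  # maps the 4-d concatenation back to 2-d
s_short = np.array([1.0, -1.0])       # stand-in for the RNN's final output
s_long = long_term_preference(W_u, 1)
s_k = friend_representation(s_short, s_long, W1)
```

In a trained model `W_u` and `W1` would be learned jointly with the RNN; here they are fixed so the arithmetic can be checked by hand.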

4.3. Context-dependent Social Influences

We have described how we obtain representations of the target user (§4.1) and her friends (§4.2). We now combine both into a single representation that we then use downstream (§4.4). The combined representation is a mixture of the target user’s interest and her friends’ interest.

We obtain this combined representation using a novel graph-attention network. First, we encode the friendship network in a graph where nodes correspond to users (i.e., target users and their friends) and edges denote friendship. In addition, each node uses its corresponding user’s representation (§4.1 & §4.2) as (dynamic) features. Second, these features are propagated along the edges using a message-passing algorithm (Gilmer et al., 2017). The main novelty of our approach lies in using an attention mechanism to weigh the features traveling along each edge. A weight corresponds to the level of a friend’s influence. After a fixed number of iterations of message passing, the resulting features at the target user’s node are the combined representation.

Below we detail how we design the node features as well as the accompanying graph-attention mechanism.

4.3.1. Dynamic feature graph

For each user, we build a graph where nodes correspond to that user and her friends. For target user $u$ with $|N(u)|$ friends, the graph has $|N(u)|+1$ nodes. User $u$ ’s initial representation $h_{n}$ is used as node $u$ ’s features $h_{u}^{(0)}$ (the features are updated whenever $u$ consumes a new item in $\vec{S}_{T+1}^{u}$ ). For a friend $k$ , the corresponding node feature is set to $s_{k}$ and remains unchanged for the duration of time step $T+1$ . Formally, the node features are $h_{u}^{(0)}=h_{n}$ and { $h_{k}^{(0)}=s_{k},k\in{N(u)}$ }.

4.3.2. Graph-Attention Network

With the node features defined as above, we then pass messages (features) to combine friends’ and the target user’s interests. This procedure is formalized as inference in a graph convolutional network (Kipf and Welling, 2017).

Kipf and Welling (2017) introduce graph convolutional networks for semi-supervised node representation learning. In these networks, the convolutional layers “pass” the information between nodes. The number of layers $L$ of the networks corresponds to the number of iterations of message passing. (Footnote 1: We propagate information on a graph that also contains higher-order relationships (e.g., friends of friends of friends) in practice. In the $l^{th}$ layer of the network, the target user then receives information from users that are $l$ degrees away.) However, all neighbors are treated equally. Instead, we propose a novel dynamic graph attention network to model context-dependent social influences.

The fixed symmetric normalized Laplacian is widely used as a propagation strategy in existing graph convolutional networks (Defferrard et al., 2016; Kipf and Welling, 2017). In order to distinguish the influence of each friend, we must break the static propagation schema first. We propose to use an attention mechanism to guide the influence propagation. The process is illustrated in Figure 3. We first calculate the similarity between the target user’s node representation $h_{u}^{(l)}$ and all of its neighbors’ representations $h_{k}^{(l)}$ :

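$$\alpha_{uk}^{(l)}=\frac{\exp(f(h_{u}^{(l)},h_{k}^{(l)}))}{\sum_{j\in N(u)\cup\{u\}}\exp(f(h_{u}^{(l)},h_{j}^{(l)}))},$$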

where $h_{u}^{(l)}$ is the representation of node/user $u$ at layer $l$ , and $f(h_{u}^{(l)},h_{k}^{(l)})={h_{u}^{(l)}}^{\top}h_{k}^{(l)}$ is the similarity function between two elements. Intuitively, $\alpha_{uk}^{(l)}$ is the level of influence or weight of friend $k$ on user $u$ (conditioned on the current context $h_{u}^{(l)}$ ). Note that we also include a self-connection edge to preserve a user’s revealed interests. $\alpha_{u:}^{(l)}$ then provide the weights to combine the features:

$$\tilde{h}_{u}^{(l)}=\sum_{k\in N(u)\cup\{u\}}\alpha_{uk}^{(l)}h_{k}^{(l)},$$

where $\tilde{h}_{u}^{(l)}$ is a mixture of user $u$’s friends’ interests at layer $l$, followed by a non-linear transformation: $h_{u}^{(l+1)}=ReLU(\textbf{W}^{(l)}\tilde{h}_{u}^{(l)})$. $\textbf{W}^{(l)}$ is the shared and learnable weight matrix at layer $l$. We obtain the final representation of each node by stacking this attention layer $L$ times. (Footnote 2: We also tested our model with two popular context-independent propagation strategies that do not use an attention mechanism: a) averaging friends’ interests and b) element-wise max-pooling over their interests, similar to techniques for aggregating word-level embeddings (Weston et al., 2014). Mean aggregation outperforms the latter, but both are inferior to our proposed attention model.) The combined (social-influenced) representation is denoted by $h_{u}^{(L)}$.
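One attention layer, as described above, can be sketched in NumPy for a single target user. This is an illustrative simplification: it updates only the target user's node, whereas a multi-layer model would also update the friends' nodes from their own neighborhoods. The function name `attention_layer` and the toy inputs are assumptions.

```python
import numpy as np


def attention_layer(h_u, h_friends, W):
    """One graph-attention step for a target user.

    h_u:       (d,)   target user's features at layer l
    h_friends: (m, d) features of the m friends at layer l
    W:         (d_out, d) layer weight matrix
    Returns the updated target features and the attention weights,
    ordered as [self, friend_1, ..., friend_m].
    """
    # Stack the user with her friends so the self-connection is included.
    nodes = np.vstack([h_u[None, :], h_friends])
    # Dot-product similarity f(h_u, h_k) = h_u^T h_k for every node k.
    scores = nodes @ h_u
    # Softmax over N(u) and u itself (max subtracted for numerical stability).
    weights = np.exp(scores - scores.max())
    alpha = weights / weights.sum()
    # Attention-weighted mixture, then the non-linear transformation.
    h_tilde = alpha @ nodes
    return np.maximum(0.0, W @ h_tilde), alpha


h_u = np.array([1.0, 0.0])
h_friends = np.array([[1.0, 0.0],   # friend whose interests match the user's
                      [0.0, 1.0]])  # friend with unrelated interests
h_new, alpha = attention_layer(h_u, h_friends, np.eye(2))
```

In this toy case the first friend receives the same weight as the self-connection and the unrelated friend receives less, which is the context-dependent behavior the attention mechanism is meant to provide.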

4.4. Recommendation

Since a user’s interest depends on both her recent behaviors and social influences, her final representation is obtained by combining them using a fully-connected layer:

$$\hat{h}_{n}=\textbf{W}_{2}[h_{n};h_{u}^{(L)}],$$

where $\textbf{W}_{2}$ is a linear transformation matrix, and $\hat{h}_{n}$ is the final representation of the user $u$ ’s current interest.

We then obtain the probability that the next item will be $y$ using a softmax function:

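$$p(y\mid i_{T+1,1}^{u},\ldots,i_{T+1,n}^{u};\{\vec{S}_{T}^{k},k\in N(u)\})=\frac{\exp(\hat{h}_{n}^{\top}z_{y})}{\sum_{j=1}^{|I|}\exp(\hat{h}_{n}^{\top}z_{j})},$$

where $N(u)$ is user $u$’s set of friends according to the social network $G$, $z_{y}$ is the embedding of item $y$, and $|I|$ is the total number of items.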
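The scoring step can be sketched as follows: combine the two representations with a linear layer, take a softmax over item embeddings, and keep the highest-scoring items. The function names, the item embedding matrix `Z` (one row per item), and the toy values are assumptions for illustration.

```python
import numpy as np


def next_item_probabilities(h_n, h_social, W2, Z):
    """Softmax over items of h_hat^T z_j, with h_hat = W2 [h_n; h_social]."""
    h_hat = W2 @ np.concatenate([h_n, h_social])
    logits = Z @ h_hat
    # Subtract the max before exponentiating for numerical stability.
    unnormalized = np.exp(logits - logits.max())
    return unnormalized / unnormalized.sum()


def top_k(probs, k=20):
    """Indices of the k most probable items, best first."""
    return np.argsort(-probs)[:k]


h_n = np.array([1.0, 0.0])        # user's own session representation
h_social = np.array([0.0, 1.0])   # social-influenced representation
W2 = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]])
Z = np.array([[1.0, 1.0],
              [0.0, 0.0],
              [-1.0, -1.0]])      # three items with 2-d embeddings
probs = next_item_probabilities(h_n, h_social, W2, Z)
ranked = top_k(probs, k=2)
```

For a large item catalogue, `np.argpartition` would avoid a full sort; the full sort is kept here for clarity.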

4.5. Training

We train the model by maximizing the log-likelihood of the observed items in all user sessions:

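$$\sum_{u\in U}\sum_{t=2}^{T}\sum_{n=1}^{N_{u,t}-1}\log p(i_{t,n+1}^{u}\mid i_{t,1}^{u},\ldots,i_{t,n}^{u};\{\vec{S}_{t-1}^{k},k\in N(u)\}).$$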

This function is optimized using gradient descent.
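The contribution of one session to this objective can be illustrated with a short sketch. It assumes the model has already produced, for each prefix of the session, a probability distribution over items; the name `session_log_likelihood` and the toy numbers are hypothetical. Summing this quantity over users and sessions gives the log-likelihood; its negative would serve as the loss to minimize with gradient descent.

```python
import numpy as np


def session_log_likelihood(step_probs, session):
    """Sum of log p(next item | prefix) over one session.

    step_probs[n] is the predicted distribution over all items after the
    first n + 1 items of `session` have been observed, so it is scored
    against the item at position n + 1.
    """
    targets = session[1:]
    return sum(np.log(step_probs[n][target]) for n, target in enumerate(targets))


session = [0, 2, 1]                       # item indices in consumption order
step_probs = [
    np.array([0.1, 0.2, 0.7]),            # after seeing item 0, predicts item 2
    np.array([0.25, 0.5, 0.25]),          # after seeing items 0 and 2, predicts item 1
]
log_lik = session_log_likelihood(step_probs, session)
loss = -log_lik
```

A session of length $N_{u,t}$ yields $N_{u,t}-1$ prediction targets, matching the upper limit of the innermost sum.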

5. Experiments

Studying the effectiveness of our DGRec using real-world data sets, we highlight the following results:

• DGRec significantly outperforms all seven methods that it is compared to under all experimental settings.

• Ablation studies demonstrate the usefulness of the different components of DGRec.

• Exploring the fitted models shows that attention contextually weighs the influences of friends.

5.1. Experimental Setup

5.1.1. Data Sets

We study all models using data collected from three well-known online communities. Descriptive statistics for all data sets are in Table 1.

Douban.333http://www.douban.com A popular site on which users can review movies, music, and books they consume. We crawled the data using the identities of the users in the movie community, obtaining every movie they reviewed along with associated timestamps. We also crawled the users’ social networks. We construct our data set by using each review as an evidence that a user consumed an item. Users tend to be highly active on Douban so we segment users’ behaviors (movie consumption) into week-long sessions.

Delicious.444Data set available from https://grouplens.org/datasets/hetrec-2011/ An online bookmarking system where users can store, share, and discover web bookmarks and assign them a variety of semantic tags. The task we consider is personalized tag recommendations for bookmarks. Each session is a sequence of tags a user has assigned to a bookmark (tagging actions are timestamped). This differs from the ordinary definition of sessions as a sequence of consumptions over a short horizon.

Yelp (data set available from https://www.yelp.com/dataset). An online review system where users review local businesses (e.g., restaurants and shops). As for Douban, we treat each review as an observation. Based on the empirical frequency of the reviews, we segment the data into month-long sessions.

We also tried different segmentation strategies. Preliminary results showed that our method consistently outperformed RNN-Session and NARM for other session lengths. We leave a systematic study of session segmentation to future work.
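The time-based segmentation used for Douban and Yelp can be sketched as follows. This is a minimal illustration that assumes fixed-width windows aligned to timestamp zero and events given as hypothetical `(user, item, timestamp)` tuples with timestamps in seconds; the paper does not state how windows are aligned.

```python
from collections import defaultdict

SECONDS_PER_DAY = 24 * 60 * 60

def segment_sessions(events, window_days):
    """Group (user, item, timestamp) events into fixed-width time windows per user.

    Assumption: windows are aligned to timestamp 0, not to each user's first event.
    """
    window = window_days * SECONDS_PER_DAY
    buckets = defaultdict(list)
    # Sorting by timestamp keeps items in chronological order inside each window.
    for user, item, ts in sorted(events, key=lambda e: e[2]):
        buckets[(user, ts // window)].append(item)
    sessions = defaultdict(list)
    for (user, _), items in sorted(buckets.items()):
        sessions[user].append(items)
    return dict(sessions)

day = SECONDS_PER_DAY
events = [("u1", "a", 0), ("u1", "b", 3 * day), ("u1", "c", 8 * day), ("u2", "d", 1 * day)]
weekly = segment_sessions(events, window_days=7)
```

Calling the same function with `window_days=30` would give an approximation of the month-long sessions used for Yelp.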

5.1.2. Train/valid/test splits

We reserve the sessions of the last $d$ days for testing and filter out items that did not appear in the training set. Due to the different sparseness of the three data sets, we choose $d=180$, $50$ and $25$ for the Douban, Yelp and Delicious data sets, respectively. We randomly and equally split the held-out sessions into validation and test sets.
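The split protocol can be sketched as below. The sketch assumes that the last $d$ days are counted back from the latest timestamp in the data and that held-out sessions left empty after filtering are discarded; both choices, along with the function name and the `(start_timestamp, items)` session format, are assumptions made for illustration.

```python
import random

SECONDS_PER_DAY = 24 * 60 * 60

def split_sessions(sessions, d_days, seed=0):
    """Split a list of (start_timestamp, [item, ...]) sessions into train/valid/test."""
    cutoff = max(ts for ts, _ in sessions) - d_days * SECONDS_PER_DAY
    train = [s for s in sessions if s[0] <= cutoff]
    held_out = [s for s in sessions if s[0] > cutoff]

    # Drop held-out items that never occur in training; discard emptied sessions.
    train_items = {item for _, items in train for item in items}
    filtered = []
    for ts, items in held_out:
        kept = [i for i in items if i in train_items]
        if kept:
            filtered.append((ts, kept))

    # Random, equal-sized split of the held-out sessions into validation and test.
    rng = random.Random(seed)
    rng.shuffle(filtered)
    half = len(filtered) // 2
    return train, filtered[:half], filtered[half:]

day = SECONDS_PER_DAY
toy = [(0, ["a", "b"]), (10 * day, ["b", "c"]), (95 * day, ["a", "x"]),
       (98 * day, ["c"]), (99 * day, ["b", "a"]), (100 * day, ["y"])]
train, valid, test = split_sessions(toy, d_days=25)
```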

5.1.3. Competing Models

We compare DGRec to three classes of recommenders: (A) classical methods that utilize neither social nor temporal factors; (B) social recommenders, which take context-independent social influences into consideration; and (C) session-based recommendation methods, which model user interests in sessions. (Below, we indicate a model’s class next to its name.)

• ItemKNN (Linden et al., 2003) (A): inspired by the classic KNN model, it looks for items that are similar to items liked by a user in the past.

• BPR-MF (Rendle et al., 2009) (A): matrix factorization (MF) technique trained using a ranking objective as opposed to a regression objective.

• SoReg (Ma et al., 2011) (B): uses the social network to regularize the latent user factors of matrix factorization.

• SBPR (Zhao et al., 2014) (B): an approach for social recommendations based on BPR-MF. The social network is used to provide additional training samples for matrix factorization.

• TranSIV (Xiao et al., 2017) (B): uses shared latent factors to transfer the learned information from the social domain to the recommendation domain.

• RNN-Session (Hidasi et al., 2016) (C): recent state-of-the-art approach that uses recurrent neural networks for session-based recommendations.

• NARM (Li et al., 2017) (C): a hybrid model of both session-level preferences and the user’s “main purpose”, where the main purpose is obtained via attending on previous behaviors within the session.

5.1.4. Evaluation Metrics

We evaluate all models with two widely used ranking-based metrics: Recall@K and Normalized Discounted Cumulative Gain (NDCG).

Recall@K measures the proportion of test examples whose ground-truth item appears among the top-K recommended items. We use $K=20$.

NDCG is a standard ranking metric. In the context of session-based recommendation, it is formulated as: $\text{NDCG}=\frac{1}{\log_{2}(1+\text{rank}_{pos})}$ , where $\text{rank}_{pos}$ denotes the rank of a positive item. We report the average value of NDCG over all the testing examples.
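Both metrics can be computed from the rank of the positive item. The sketch below assumes each test example has a single ground-truth next item, as the NDCG formula above implies, so that Recall@K reduces to an indicator of whether that item is ranked in the top K. Ties are resolved optimistically, which is an assumption; the function names are illustrative.

```python
import math

def rank_of(scores, positive_item):
    """1-based rank of the positive item when items are sorted by descending score."""
    target = scores[positive_item]
    return 1 + sum(1 for s in scores if s > target)

def recall_at_k(scores, positive_item, k=20):
    """1.0 if the positive item is ranked within the top k, else 0.0."""
    return 1.0 if rank_of(scores, positive_item) <= k else 0.0

def ndcg(scores, positive_item):
    """NDCG for a single positive item: 1 / log2(1 + rank)."""
    return 1.0 / math.log2(1 + rank_of(scores, positive_item))

# Toy example with four items; item 2 is the ground truth and is ranked second.
scores = [0.1, 0.5, 0.3, 0.05]
hit_at_1 = recall_at_k(scores, positive_item=2, k=1)
hit_at_2 = recall_at_k(scores, positive_item=2, k=2)
gain = ndcg(scores, positive_item=2)
```

Averaging these per-example values over all test examples gives the reported numbers.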

5.1.5. Hyper-parameter Settings

For RNN-Session, NARM and our models, we use a batch size of 200. We use Adam (Kingma and Ba, 2014) for optimization due to its effectiveness, with $\beta_{1}=0.9$, $\beta_{2}=0.999$ and $\epsilon=10^{-8}$ as suggested in TensorFlow (Abadi et al., 2015). The initial learning rate is empirically set to 0.002 and decayed at the rate of 0.98 every 400 steps. For all models, the dimensions of the user (when needed) and item representations are fixed to 100 following Hidasi et al. (2016). We cross-validated the number of hidden units of the LSTMs, and the performance plateaued around 100 hidden units. The neighborhood sample sizes are empirically set to 10 and 15 in the first and second convolutional layers, respectively. We tried to use more friends in each layer but observed no significant improvement. In our models, dropout (Srivastava et al., 2014) with rate $0.2$ is used to avoid overfitting.
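The learning-rate schedule can be written as a small function. This sketch assumes a staircase schedule, in which the rate drops once every 400 steps; the text does not say whether the decay is applied in steps or continuously, and the function name is illustrative.

```python
def learning_rate(step, initial=0.002, decay_rate=0.98, decay_steps=400):
    """Staircase exponential decay: multiply by decay_rate once every decay_steps steps."""
    return initial * decay_rate ** (step // decay_steps)

rates = [learning_rate(s) for s in (0, 399, 400, 800)]
```

Under this assumption the rate stays at 0.002 for the first 400 steps, then drops to 0.00196, and so on.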

5.1.6. Implementation Details

We implement our model using TensorFlow (Abadi et al., 2015). Training graph attention networks on our data with mini-batch gradient descent is not trivial since node degrees have a large range. We found the neighbor sampling technique proposed by Hamilton et al. (2017) to be effective. Further, to reduce the computational cost of training DGRec, we represent friends’ short-term interests using only their most recent sessions.
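Fixed-size neighbor sampling keeps the cost per node constant regardless of its degree. The sketch below shows one common way to do this, assuming uniform sampling and sampling with replacement for users who have fewer friends than the sample size; the handling of low-degree users, the function name, and the adjacency-dictionary format are assumptions and not details given in the paper.

```python
import random

def sample_neighbors(adjacency, node, sample_size, rng):
    """Return sample_size friends of node (an empty list if the node has none).

    Assumption: sample without replacement when enough friends exist,
    otherwise with replacement so that every node yields the same count.
    """
    friends = adjacency.get(node, [])
    if not friends:
        return []
    if len(friends) >= sample_size:
        return rng.sample(friends, sample_size)
    return [rng.choice(friends) for _ in range(sample_size)]

rng = random.Random(0)
graph = {"u": list(range(50)), "v": [1, 2, 3]}
many = sample_neighbors(graph, "u", 10, rng)  # high-degree user
few = sample_neighbors(graph, "v", 10, rng)   # low-degree user, padded by resampling
```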

5.2. Quantitative Results

The performance of the different algorithms is summarized in Table 2. ItemKNN and BPR-MF perform very similarly, except on Douban. A particularity of Douban is that users typically consume each item only once (unlike on Delicious and Yelp). MF-based methods tend to recommend previously consumed items, which explains BPR-MF’s poor performance. By modeling social influence, the social recommenders improve over BPR-MF in most cases. However, the improvement is marginal because these three algorithms (B) only model context-independent social influence. By modeling dynamic user interests, RNN-Session significantly outperforms ItemKNN and BPR-MF, which is consistent with the results in Hidasi et al. (2016). Further, NARM extends RNN-Session by explicitly modeling the user’s main purpose and is the strongest baseline. Our proposed model DGRec achieves the best performance among all the algorithms by modeling both users’ dynamic interests and context-dependent social influences. Moreover, the improvement of DGRec over RNN-Session and NARM is larger than that of SoReg over BPR-MF, which shows the necessity of modeling context-dependent social influences.

5.3. Variations of DGRec

To justify and gain further insights into the specifics of DGRec’s architecture, we now study and compare variations of our model.

5.3.1. Self vs. Social

DGRec obtains a user’s final preferences as a combination of the user’s consumed items in the current session and context-dependent social influences (see Eq. 8). To tease apart the contribution of both sources of information, we compare DGRec against two submodels: a) DGRec$_{\text{self}}$, a model of the user’s current session only (Eq. 8 without social influence features $h_{u}^{(L)}$); and b) DGRec$_{\text{social}}$, a model using context-dependent social influence features only (Eq. 8 without individual features $h_{n}$). Note that when using individual features only, DGRec$_{\text{self}}$ is identical to RNN-Session (hence the results are reproduced from Table 2). Table 3 reports the performance of all three models on our data sets. DGRec$_{\text{self}}$ consistently outperforms DGRec$_{\text{social}}$ across all three data sets, which means that overall users’ individual interests have a higher impact on recommendation quality. Both DGRec$_{\text{self}}$ and DGRec$_{\text{social}}$ perform significantly worse than the full model DGRec. To achieve good recommendation performance in online communities, it is therefore crucial to model both a user’s current interests and her (dynamic) social influences.

5.3.2. Short-term vs. Long-term

DGRec provides a mechanism for encoding friends’ short- as well as long-term interests (see § 4.2). We study the impact of each on the model’s performance. As above, we compare using either short- or long-term interests to the results of using both. Figure 4 shows that on Douban, using friends’ short-term interests is far more predictive than using their long-term interests, and performs comparably to the full model. This is reasonable, since the interests of users in online communities (e.g., Douban) change frequently, and exploiting short-term interests lets the model react to these changes more quickly. Interestingly, different results are observed on Delicious: using long-term interests yields more accurate predictions than using short-term interests. This is not surprising since, on the Delicious website, users tend to have static interests.

5.3.3. Number of Convolutional Layers

DGRec aggregates friends’ interests using a multi-layer graph convolutional network. More convolutional layers capture influences from higher-order friends. In our study so far we have used two-layer graph convolutional networks. To validate this choice we compare the performance to one- and three-layer networks, keeping the number of selected friends at 10 and 5 in the first and third layer, respectively. Table 4 shows a significant decline in performance when using a single layer. This implies that the interests of friends’ friends (obtained by 2 layers) are important for recommendations.

Next, we test our model using three convolutional layers to explore the influences of even higher-order friends. The influence of the third layer on the performance is small. There is a small improvement for Yelp but a slightly larger drop in performance for both Douban and Delicious, which may be attributed to model overfitting or noise introduced by higher-order friends. This confirms that two convolutional layers are enough for our data sets.

5.4. Exploring Attention

DGRec uses an attention mechanism to weigh the contribution of different friends based on a user’s current session. We hypothesized that while friends have varying interests, a user’s session typically explores only a subset of these interests. As a consequence, for a target user, different subsets of her friends should be relied upon in different situations. We now explore the attention learned by our model.

First, we randomly select a Douban user from those who have at least 5 test sessions as well as 5 friends and plot her attention weights (Eq. 6) within and across session(s) in Figure 5. For the inter-session level plot (left), we plot the average attention weight of a friend within a session. For the intra-session level plot (right), the user’s attention weights within one session (i.e., SessionId=7) are presented. We make the following observations. First, the user allocates her attention to different friends across different sessions. This indicates that social influence is indeed conditioned on context (i.e., the target user’s current interests). Further, friend #8 obtains little attention in all sessions, which means that social links do not necessarily lead to observed shared interest. Second, the distribution of attention is relatively stable within a single session. This confirms that the user’s behaviors are coherent over a short period and are suitable for processing session by session.

As a second exploration of the behavior of the attention mechanism, we take a macro approach and analyze the attention across all users (as opposed to a single user across friends). We use the attention levels inferred on the Douban test set. Figure 6 reports the empirical distributions of the inter-session (brown) and intra-session (blue) attention variance (i.e., how much the attention weights vary in each case). The intra-session variance is lower on average. This agrees with our assumption that users’ interests tend to be focused within a short time, so that the same set of friends is attended to for the duration of a session. In contrast, a user is more likely to trust different friends in different sessions, which further validates modeling context-dependent social influences via attention-based graph convolutional networks.
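One way to compute the two kinds of variance for a single user is sketched below. The exact definition used for Figure 6 is not given in the text, so this is only an assumed formulation: intra-session variance is the variance of each friend’s attention weight over the steps of one session, and inter-session variance is the variance of each friend’s session-averaged weight across sessions, both averaged over friends. The function names and toy weights are hypothetical.

```python
from statistics import mean, pvariance

def intra_session_variance(session):
    """session[t][f] is the attention on friend f at step t.

    Variance over steps for each friend, averaged over friends.
    """
    n_friends = len(session[0])
    return mean(pvariance([step[f] for step in session]) for f in range(n_friends))

def inter_session_variance(sessions):
    """Variance of session-averaged attention across sessions, averaged over friends."""
    n_friends = len(sessions[0][0])
    session_means = [[mean(step[f] for step in s) for f in range(n_friends)]
                     for s in sessions]
    return mean(pvariance([m[f] for m in session_means]) for f in range(n_friends))

# Toy user with two friends: stable attention within sessions, a switch across sessions.
toy = [
    [[0.8, 0.2], [0.7, 0.3]],
    [[0.2, 0.8], [0.3, 0.7]],
]
intra = intra_session_variance(toy[0])
inter = inter_session_variance(toy)
```

In this toy case the intra-session variance is much smaller than the inter-session variance, which is the pattern the paragraph above describes.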

6. Conclusions

We propose a model based on graph convolutional networks for session-based social recommendation in online communities. Our model first learns individual user representations by modeling the users’ current interests. Each user’s representation is then aggregated with her friends’ representations using a graph convolutional network with a novel attention mechanism. The combined representation, along with the user’s original representation, is then used to form item recommendations. Experimental results on three real-world data sets demonstrate the superiority of our model compared to several state-of-the-art models. Next steps involve exploring user and item features indicative of preferences and further improving the performance of recommender systems for online communities.

7. Acknowledgement

This paper is partially supported by the Beijing Municipal Commission of Science and Technology under Grant No. Z181100008918005 as well as the National Natural Science Foundation of China (NSFC Grant Nos. 61772039, 61472006 and 91646202). We would like to thank Haoran Shi for collecting the Douban data used in this paper.

Bibliography


1(1)
2. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations.
3. Alex Beutel, Paul Covington, Sagar Jain, Can Xu, Jia Li, Vince Gatto, and Ed H Chi. 2018. Latent Cross: Making Use of Context in Recurrent Recommender Systems. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 46–54.
4. Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2014. Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations.
5. Allison JB Chaney, David M Blei, and Tina Eliassi-Rad. 2015. A probabilistic model for using social networks in personalized item recommendation. In Proceedings of the 9th ACM Conference on Recommender Systems. ACM, 43–50.
6. Laurent Charlin, Rajesh Ranganath, James McInerney, and David M Blei. 2015. Dynamic Poisson factorization. In Proceedings of the 9th ACM Conference on Recommender Systems. ACM, 155–162.
7. Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems. 3844–3852.
8. Jeffrey L Elman. 1990. Finding structure in time. Cognitive Science 14, 2 (1990), 179–211.