Effective Community Search on Large Attributed Bipartite Graphs

Zongyu Xu; Yihao Zhang; Long Yuan; Yuwen Qian; Zi Chen; Mingliang Zhou; Qin Mao; Weibin Pan

arXiv:2302.14498 · cs.SI · March 2, 2023


TL;DR

This paper introduces new algorithms for community search in large attributed bipartite graphs, effectively incorporating node attributes to improve the cohesion of the identified communities.

Contribution

The paper proposes the first algorithms that integrate node attributes into community search on bipartite graphs, enhancing result relevance and cohesion.

Findings

1. Algorithms are effective on eight large graphs
2. Proposed methods outperform baseline approaches
3. Query efficiency and community quality are improved


Tables

Table 1. Symbols and meanings

Symbol	Meaning
$G(U,V,E)$	An attributed bipartite graph with vertex sets $U$ and $V$ and edge set $E$
$W_{U}(u)$	The keyword set of vertex $u$ in $U(G)$
$W_{V}(v)$	The keyword set of vertex $v$ in $V(G)$
$deg(u,G)$	The degree of vertex $u$ in $U(G)$
$deg(v,G)$	The degree of vertex $v$ in $V(G)$
$G[S_{u}^{\prime},S_{v}^{\prime}]$	The largest connected subgraph of $G$ such that $q\in G[S_{u}^{\prime},S_{v}^{\prime}]$, $S_{u}^{\prime}\subseteq W_{U}(u)$ for every $u\in U(G[S_{u}^{\prime},S_{v}^{\prime}])$, and $S_{v}^{\prime}\subseteq W_{V}(v)$ for every $v\in V(G[S_{u}^{\prime},S_{v}^{\prime}])$
$G_{(\alpha,\beta)}[S_{u}^{\prime},S_{v}^{\prime}]$	The largest connected subgraph of $G$ such that $q\in G_{(\alpha,\beta)}[S_{u}^{\prime},S_{v}^{\prime}]$, $deg(u,G)\geq\alpha$ and $S_{u}^{\prime}\subseteq W_{U}(u)$ for every $u\in U(G_{(\alpha,\beta)}[S_{u}^{\prime},S_{v}^{\prime}])$, and $deg(v,G)\geq\beta$ and $S_{v}^{\prime}\subseteq W_{V}(v)$ for every $v\in V(G_{(\alpha,\beta)}[S_{u}^{\prime},S_{v}^{\prime}])$

Table 2. Datasets used in our experiments

ID	Dataset	$|U|$	$|V|$	$|E|$	$\hat{d}$ (average degree)
D0	Enwikibooks(Wikibooks edits)	79,268	249,725	766,272	4.66
D1	Movie(Actor movies)	127,823	383,640	1,470,404	5.75
D2	IMDB(komarix-imdb)	685,568	186,414	2,715,604	6.23
D3	Actor(actor2)	303,617	896,302	3,782,463	6.30
D4	Discogs(Discogs)	1,754,823	270,771	5,302,276	5.24
D5	Idwiki(edit-idwiki)	125,481	2,183,494	6,126,592	5.31
D6	Plwiki(edit-plwiki)	207,781	2,664,432	21,219,204	14.78
D7	Nlwiki(edit-nlwiki)	220,847	3,800,349	22,142,951	11.01
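The $\hat{d}$ column can be reproduced from the other three columns: every reported value agrees, after rounding to two decimals, with the mean degree $2|E|/(|U|+|V|)$ (each edge adds one to the degree of a $U$-vertex and one to the degree of a $V$-vertex). The short Python check below recomputes it from the table's numbers; the names `TABLE2` and `average_degree` are ours, for illustration only.

```python
# Dataset sizes copied from Table 2: (ID, |U|, |V|, |E|, reported d_hat)
TABLE2 = [
    ("D0", 79_268, 249_725, 766_272, 4.66),
    ("D1", 127_823, 383_640, 1_470_404, 5.75),
    ("D2", 685_568, 186_414, 2_715_604, 6.23),
    ("D3", 303_617, 896_302, 3_782_463, 6.30),
    ("D4", 1_754_823, 270_771, 5_302_276, 5.24),
    ("D5", 125_481, 2_183_494, 6_126_592, 5.31),
    ("D6", 207_781, 2_664_432, 21_219_204, 14.78),
    ("D7", 220_847, 3_800_349, 22_142_951, 11.01),
]


def average_degree(n_u: int, n_v: int, m: int) -> float:
    """Mean vertex degree of a bipartite graph: the degree sum is 2|E|."""
    return 2 * m / (n_u + n_v)


for ds_id, n_u, n_v, m, d_hat in TABLE2:
    print(f"{ds_id}: computed {average_degree(n_u, n_v, m):.2f}, reported {d_hat:.2f}")
```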


Code & Models

Repositories

892681347/attributebigraph (official)


Taxonomy

Topics: Complex Network Analysis Techniques · Caching and Content Delivery · Web Data Mining and Analysis

Full text

Effective Community Search on Large Attributed Bipartite Graphs

Zongyu Xu, Nanjing University of Science and Technology

Yihao Zhang, Nanjing University of Science and Technology

Long Yuan, Nanjing University of Science and Technology

Yuwen Qian, Nanjing University of Science and Technology

Zi Chen, East China Normal University

Mingliang Zhou, Chongqing University

Qin Mao, Qiannan Normal Coll Nationalities

Weibin Pan, North Information Control Research Academy Group Co.

Abstract.

Community search over bipartite graphs has attracted significant interest recently. In many applications, such as user-item bipartite graphs in e-commerce and customer-movie bipartite graphs on movie-rating websites, nodes carry attributes; however, previous community search algorithms on bipartite graphs ignore these attributes, so the returned communities are poorly cohesive with respect to node attributes. In this paper, we study the community search problem on attributed bipartite graphs. Given a query vertex $q$, we aim to find attributed $\left(\alpha,\beta\right)$-communities of $G$, where the structural cohesiveness of a community is described by the $\left(\alpha,\beta\right)$-core model and the attribute similarity of the two groups of nodes in the subgraph is maximized. To retrieve attributed communities from bipartite graphs, we first propose a basic algorithm composed of two steps, the generation and the verification of candidate keyword sets, and then propose two improved query algorithms, Inc and Dec. Inc exploits the anti-monotonicity property of attributed bipartite graphs, while Dec adopts a different generation method and verification order for candidate keyword sets. Experiments on eight large graphs demonstrate that our methods are effective and efficient in querying attributed communities on bipartite graphs.

Community search; Bipartite graphs; Attributed graphs.


1. Introduction

With the proliferation of graph data, research efforts have been devoted to many fundamental problems in managing and analyzing graph data (Yuan et al., 2016, 2017a; Chen et al., 2018, 2021; Huang et al., 2017; Yang et al., 2018; Wu et al., 2019; Chen et al., 2020; Zhang et al., 2021b; Hao et al., 2021; Yang et al., 2021c, a; Yang et al., 2021b). Bipartite graphs are widely used to represent relationships between two different types of entities in many real-world applications, such as user-page networks (Beutel et al., 2013; Qiao et al., 2021), customer-product networks (Wang et al., 2006; Qi et al., 2022), collaboration networks (Ley, 2002; Cai et al., 2022), and gene co-expression networks (Kaytoue et al., 2011; Zhu et al., 2022). Community structure naturally exists in these networks, and a number of cohesive subgraph models (e.g., the $\left(\alpha,\beta\right)$-core (Liu et al., 2019), bitruss (Wang et al., 2020), and biclique (Lyu et al., 2020)) have been proposed to capture communities in bipartite graphs. Building on these models, community search over bipartite graphs, which aims to find densely connected subgraphs satisfying specified structural cohesiveness conditions, has been studied in applications such as anomaly detection (Lyu et al., 2020), personalized recommendation (Kumar et al., 1999), and gene expression analysis (Madeira and Oliveira, 2004).

In the aforementioned real-world applications, the entities modeled by the vertices of bipartite graphs often have properties represented by text strings or keywords. When performing community search over such bipartite graphs, previous studies focus only on the structural cohesiveness of communities and ignore vertex attributes. However, these attributes are important for making sense of communities (Berahmand et al., 2020; Berahmand et al., 2021, 2022), and taking them into consideration makes the returned results more personalized and interpretable (Huang and Lakshmanan, 2017; Fang et al., 2016). Nevertheless, little research has addressed community search on attributed bipartite graphs.

Motivated by this, we study the attributed $(\alpha,\beta)$-community search problem on attributed bipartite graphs. Specifically, given an attributed bipartite graph $G$ and a query vertex $q\in G$, we aim to find one or more attributed communities in $G$ that satisfy both structure cohesiveness (i.e., each vertex in the upper layer has at least $\alpha$ neighbors and each vertex in the lower layer has at least $\beta$ neighbors) and keyword cohesiveness (i.e., vertices in the same layer share the most keywords).

Applications. Attributed $(\alpha,\beta)$ -community has many real-world applications. For example,

•

Personalized product recommendation. Attributed $(\alpha,\beta)$-communities can be used to recommend personalized products. Consider a customer-movie subnetwork of IMDB (https://www.imdb.com), where the vertices in the upper layer represent customers, whose attributes describe their movie preferences, and the vertices in the lower layer represent movies, whose attributes describe their genres. A platform can use the attributed $(\alpha,\beta)$-community model to provide personalized recommendations. For example, as Fig.1 shows, if we regard $u_{2}$ as the query customer, we can find a (2,2)-community composed of viewers $\{u_{2},u_{3},u_{4},u_{5}\}$ and movies $\{v_{2},v_{3},v_{4},v_{5},v_{6}\}$. In this community, $u_{2}$, who prefers “Drama” and “Romance” movies, may not be interested in $v_{2}$. If we further consider keyword cohesiveness, we find an attributed (2,2)-community containing viewers $\{u_{2},u_{3},u_{4}\}$, who share a preference for “Drama” movies, and movies $\{v_{3},v_{4},v_{5}\}$ of genre “Drama”. We can then recommend movie $v_{5}$, which the query viewer $u_{2}$ is likely to be interested in.

•

Team Formation. In a bipartite graph of developers and projects, an edge between a developer and a project indicates that the developer participates in the project; the keywords of a developer describe his or her skills, while the keywords of a project indicate the technologies it requires. When a new project arrives, a developer may wish to form a team that is as cohesive as possible and in which every member has the skills the project requires. This can be supported by an attributed $(\alpha,\beta)$-community search over the bipartite graph with the keywords of the new project specified.

Although attributed $(\alpha,\beta)$-community search is useful in real applications, it is of little use if the search cannot be finished efficiently: attributed bipartite graphs can be very large, and the structure and keyword cohesiveness criteria can be complex to handle. A simple approach is to first consider all possible attribute combinations and then return the corresponding $(\alpha,\beta)$-communities that have the most shared attributes. However, the number of possible attribute combinations is exponential, which makes this approach infeasible in practice.

To address this problem, we observe that the attributed $(\alpha,\beta)$-community has the anti-monotonicity property: for a given set $\mathcal{A}$ of attributes, if $\mathcal{A}$ appears in every vertex of an attributed $(\alpha,\beta)$-community, then for every subset $\mathcal{A}^{\prime}\subseteq\mathcal{A}$ there exists an attributed $(\alpha,\beta)$-community in which every vertex contains $\mathcal{A}^{\prime}$. Following this observation, we devise efficient algorithms that significantly reduce the search space when computing the results.

Contributions. In this paper, we make the following contributions.

•

The first work on attributed $(\alpha,\beta)$-community search over attributed bipartite graphs. In this paper, we propose the attributed $(\alpha,\beta)$-community search problem. To the best of our knowledge, this is the first work on attributed $(\alpha,\beta)$-community search.

•

Efficient algorithms for attributed $(\alpha,\beta)$-community search. Based on the anti-monotonicity property, we devise efficient algorithms for attributed $(\alpha,\beta)$-community search.

•

Extensive experiments on real datasets. We conduct extensive experiments to evaluate the performance of the proposed algorithms. The experimental results demonstrate the efficiency of our proposed algorithms.

Outline. The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 defines the problem. Section 4 describes a basic solution that enumerates all possible keyword sets and searches for the $(\alpha,\beta)$-communities with the most shared keywords. Section 5 describes two more efficient algorithms that generate and verify candidate keyword sets in different ways. Section 6 discusses the experimental results, and Section 7 concludes the paper.

2. Related Work

2.1. Community search on unipartite graphs.

Community search on unipartite graphs typically uses cohesiveness models such as k-core (Seidman, 1983), k-truss (Cohen, 2008), and clique (Fang et al., 2019b). For a detailed survey, see (Fang et al., 2020). For k-core community search on unipartite graphs, two online algorithms and one index-based algorithm have been studied: Cui et al. (Cui et al., 2014) propose a local search algorithm, Sozio et al. (Sozio and Gionis, 2010) propose a global search algorithm, and Barbieri et al. (Barbieri et al., 2015) propose a tree-like index structure; in addition, Wu et al. (Wu et al., 2021) study maximal personalized influential community search. Using k-core, Fang et al. (Fang et al., 2016, 2017a, 2019a) further integrate vertex attributes to identify communities, and the spatial locations of vertices have also been considered (Fang et al., 2017b; Wang et al., 2018; Ji et al., 2021). For truss-based community search, Huang et al. (Huang et al., 2014) propose the triangle-connected k-truss community model and then study the closest truss community model (Huang et al., 2015); Akbas et al. (Akbas and Zhao, 2017) also study the triangle-connected k-truss community model and propose an index-based search algorithm. Acquisti et al. (Acquisti and Gross, 2006) present an efficient k-clique component detection algorithm, and Yuan et al. (Yuan et al., 2017b) study the problem of densest clique percolation community search.

2.2. Community search/detection on bipartite graphs.

On bipartite graphs, several existing works (Ding et al., 2017; He et al., 2021; Liu et al., 2019, 2020) extend the k-core model on unipartite graphs to the $(\alpha,\beta)$-core model. Ding et al. (Ding et al., 2017) extend the linear k-core mining algorithm to compute the $(\alpha,\beta)$-core. He et al. (He et al., 2021) are the first to consider both tie strength and vertex engagement on bipartite graphs and propose a novel cohesive subgraph model. Liu et al. (Liu et al., 2019, 2020) present an efficient algorithm based on a novel index that computes the $(\alpha,\beta)$-core in time linear in the result size. Based on the butterfly structure, Sarıyüce et al. (Sarıyüce and Pinar, 2018), Wang et al. (Wang et al., 2019, 2020), and Zou et al. (Zou, 2016) study the bitruss model in bipartite graphs, i.e., the maximal subgraph in which each edge is contained in at least k butterflies. Zhang et al. (Zhang et al., 2014) study the biclique enumeration problem. Zhang et al. (Zhang et al., 2021a) are the first to consider both structure cohesiveness and vertex weights on bipartite graphs and propose a novel cohesive subgraph model. Wang et al. (Wang et al., 2021) present a novel index structure and study the significant community search problem on weighted bipartite graphs, the first work to study community search on bipartite graphs. However, community search on attributed bipartite graphs remains largely unexplored.

3. Problem Definition

Our problem is defined over an undirected attributed bipartite graph $G=(U,V,E)$, whose nodes are divided into two disjoint sets $U$ and $V$ such that every edge connects a node in $U$ to a node in $V$. We use $U(G)$ and $V(G)$ to denote the two node sets of $G$ and $E(G)$ to denote its edge set. Each vertex $u\in U(G)$ ($v\in V(G)$) is associated with a set of keywords denoted by $W_{U}(u)$ ($W_{V}(v)$). An edge $e$ between vertices $u$ and $v$ in $G$ is denoted as $(u,v)$. We denote the numbers of nodes in $U(G)$ and $V(G)$ as $n_{u}$ and $n_{v}$, the total number of nodes as $n$, and the number of edges in $E(G)$ as $m$. The set of neighbors of a vertex $u$ in $G$ is denoted as $N(u,G)=\{v\in V(G)\mid(u,v)\in E(G)\}$, and the degree of $u$ is $deg(u,G)=|N(u,G)|$. Table 1 lists the symbols used in the paper.

Definition 3.1 ($(\alpha,\beta)$-Core).

Given a bipartite graph $G$ and two positive integers $\alpha$ and $\beta$ , a subgraph $C_{\alpha,\beta}$ is an $(\alpha,\beta)$ -core of $G$ if $deg(u,C_{\alpha,\beta})\geq\alpha$ for each $u\in U(C_{\alpha,\beta})$ and $deg(v,C_{\alpha,\beta})\geq\beta$ for each $v\in V(C_{\alpha,\beta})$ .
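A common way to obtain the $(\alpha,\beta)$-core is peeling: repeatedly delete upper-layer vertices with degree below $\alpha$ and lower-layer vertices with degree below $\beta$, updating neighbor degrees, until no such vertex remains. The Python sketch below assumes the graph is given as two adjacency dictionaries; the names `build_adjacency` and `alpha_beta_core` and the toy edge list are hypothetical, and this is a textbook peeling routine rather than the paper's own implementation.

```python
from collections import deque


def build_adjacency(edges):
    """Split an edge list [(u, v), ...] into upper- and lower-layer adjacency sets."""
    adj_u, adj_v = {}, {}
    for u, v in edges:
        adj_u.setdefault(u, set()).add(v)
        adj_v.setdefault(v, set()).add(u)
    return adj_u, adj_v


def alpha_beta_core(adj_u, adj_v, alpha, beta):
    """Return (U_core, V_core), the vertex sets of the (alpha, beta)-core."""
    deg_u = {u: len(nbrs) for u, nbrs in adj_u.items()}
    deg_v = {v: len(nbrs) for v, nbrs in adj_v.items()}
    # Vertices violating the degree bound are marked as soon as they are queued.
    gone_u = {u for u, d in deg_u.items() if d < alpha}
    gone_v = {v for v, d in deg_v.items() if d < beta}
    queue = deque([("U", u) for u in gone_u] + [("V", v) for v in gone_v])
    while queue:
        side, x = queue.popleft()
        if side == "U":
            for v in adj_u[x]:
                if v not in gone_v:
                    deg_v[v] -= 1
                    if deg_v[v] < beta:
                        gone_v.add(v)
                        queue.append(("V", v))
        else:
            for u in adj_v[x]:
                if u not in gone_u:
                    deg_u[u] -= 1
                    if deg_u[u] < alpha:
                        gone_u.add(u)
                        queue.append(("U", u))
    return set(adj_u) - gone_u, set(adj_v) - gone_v


# Hypothetical toy graph: K_{3,2} on u1..u3 / v1..v2 plus a pendant edge u4-v3.
edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2"),
         ("u3", "v1"), ("u3", "v2"), ("u4", "v3")]
adj_u, adj_v = build_adjacency(edges)
print(alpha_beta_core(adj_u, adj_v, 2, 2))
```

Each vertex is queued at most once and each adjacency list is scanned at most once, so this sketch runs in $O(n+m)$ time.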

Example 3.2.

In Fig.2(a), $\{A,C,D,E,G,H,I\}$ is a (2,2)-core. The (1,1)-core has vertices $\{A,B,C,D,E,F,G,H,I,J,K\}$ , and is composed of two (1,1)-core components: $\{A,B,C,D,E,F,G,H,I\}$ and $\{J,K\}$ . Each $(\alpha,\beta)$ -core in Fig.2(a) is listed in Fig.2(b).

Definition 3.3 ($(\alpha,\beta)$-Connected Component).

Given a bipartite graph $G$ and its $(\alpha,\beta)$ -core, $C_{\alpha,\beta}$ , a subgraph $G_{\alpha,\beta}$ is an $(\alpha,\beta)$ -connected component if (1) $G_{\alpha,\beta}\subseteq C_{\alpha,\beta}$ and $G_{\alpha,\beta}$ is connected; (2) $G_{\alpha,\beta}$ is maximal.

Definition 3.4 ($(\alpha,\beta)$-Community).

Given a vertex $q$ , we call the $(\alpha,\beta)$ -connected component containing $q$ the $(\alpha,\beta)$ -community, denoted as $G_{\alpha,\beta}(q)$ .
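The connected-component and community definitions translate directly into a breadth-first search restricted to the core. The sketch below assumes the core vertex sets have already been computed (for example by peeling) and that $q$ is an upper-layer vertex; `community_of` and the toy data are hypothetical.

```python
from collections import deque


def community_of(q, adj_u, adj_v, core_u, core_v):
    """Connected component containing upper-layer vertex q in the subgraph
    induced by core_u / core_v, as (U_set, V_set); None if q is not in the core."""
    if q not in core_u:
        return None
    comp_u, comp_v = {q}, set()
    queue = deque([("U", q)])
    while queue:
        side, x = queue.popleft()
        if side == "U":
            for v in adj_u[x]:
                if v in core_v and v not in comp_v:
                    comp_v.add(v)
                    queue.append(("V", v))
        else:
            for u in adj_v[x]:
                if u in core_u and u not in comp_u:
                    comp_u.add(u)
                    queue.append(("U", u))
    return comp_u, comp_v


# Hypothetical core made of two disjoint K_{2,2} blocks (every degree is 2).
adj_u = {"u1": {"v1", "v2"}, "u2": {"v1", "v2"}, "u3": {"v3", "v4"}, "u4": {"v3", "v4"}}
adj_v = {"v1": {"u1", "u2"}, "v2": {"u1", "u2"}, "v3": {"u3", "u4"}, "v4": {"u3", "u4"}}
core_u, core_v = set(adj_u), set(adj_v)
print(community_of("u1", adj_u, adj_v, core_u, core_v))
```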

Definition 3.5 (Attributed $(\alpha,\beta)$-Community).

Given an attributed bipartite graph $G$, two positive integers $\alpha$ and $\beta$, a query vertex $q\in U(G)$, and a keyword set $S\subseteq W_{U}(q)$, a subgraph $g$ is an attributed $(\alpha,\beta)$-community of $G$ if it satisfies the following constraints:

(1) Connectivity Constraint. $g$ is a connected subgraph that contains $q$.

(2) Structure Cohesiveness Constraint. $\forall u\in U(g)$, $deg(u,g)\geq\alpha$, and $\forall v\in V(g)$, $deg(v,g)\geq\beta$.

(3) Keyword Cohesiveness Constraint. $|L_{U}(g)|+|L_{V}(g)|$ is maximal, where $L_{U}(g)=\cap_{u\in U(g)}(W_{U}(u)\cap S)$ is the set of keywords in $S$ shared by all vertices of $U(g)$, and $L_{V}(g)=\cap_{v\in V(g)}W_{V}(v)$ is the set of keywords shared by all vertices of $V(g)$.

(4) Maximality Constraint. There exists no other $g^{\prime}\supset g$ satisfying the above constraints with $L_{U}(g^{\prime})=L_{U}(g)$ and $L_{V}(g^{\prime})=L_{V}(g)$.
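Scoring a candidate subgraph under the keyword cohesiveness constraint above only needs set intersections. This Python sketch computes $L_U(g)$ and $L_V(g)$ for a given split of $g$ into upper and lower vertices; the function name and the keyword data are invented for illustration and do not reproduce Fig.3.

```python
from functools import reduce


def shared_keywords(U_g, V_g, W_U, W_V, S):
    """Return (L_U, L_V) for a subgraph g with upper vertices U_g and lower
    vertices V_g. L_U: keywords of S held by every upper vertex; L_V: keywords
    held by every lower vertex. Assumes U_g and V_g are nonempty."""
    L_U = reduce(lambda acc, u: acc & W_U[u], U_g, set(S))
    first_v = next(iter(V_g))
    L_V = reduce(lambda acc, v: acc & W_V[v], V_g, set(W_V[first_v]))
    return L_U, L_V


# Hypothetical keyword sets, not taken from the paper's figures.
W_U = {"u1": {"a", "b", "c"}, "u2": {"b", "c"}, "u3": {"b", "c", "d"}}
W_V = {"v1": {"w", "x", "y"}, "v2": {"x", "y"}}
L_U, L_V = shared_keywords(set(W_U), set(W_V), W_U, W_V, {"a", "b", "c"})
print(sorted(L_U), sorted(L_V), len(L_U) + len(L_V))
```

The value $|L_U(g)|+|L_V(g)|$ printed last is the quantity the keyword cohesiveness constraint maximizes.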

Example 3.6.

Considering the bipartite graph $G$ in Fig.2(a), let $q=A$, $\alpha=2$, and $\beta=2$. If $S=\{a,b,c\}$, we can find the attributed $(2,2)$-community $g$ illustrated in Fig.3 (in red), whose shared keyword sets are $L_{U}(g)=\{b,c\}$ and $L_{V}(g)=\{x,y\}$.

Problem Statement. Given an attributed bipartite graph $G$, parameters $\alpha$ and $\beta$, a query vertex $q$, and a keyword set $S\subseteq W(q)$, the attributed $(\alpha,\beta)$-community search problem aims to find the attributed $(\alpha,\beta)$-communities in $G$. For ease of presentation, we regard $q$ as a vertex in $U(G)$ throughout this paper. Since the final result must contain $q$, we regard $S$ as $S_{U}$, the largest keyword set that can possibly be shared by all upper-layer vertices of a community.

4. Basic Solution

We use $G[S_{u},S_{v}]$ to denote the largest connected subgraph of $G$ such that $q\in G[S_{u},S_{v}]$ and each vertex in $U(G[S_{u},S_{v}])$ ($V(G[S_{u},S_{v}])$) contains $S_{u}$ ($S_{v}$). We use $G_{\alpha,\beta}[S_{u},S_{v}]$ to denote the largest connected subgraph of $G[S_{u},S_{v}]$ in which every vertex in $U(G_{\alpha,\beta}[S_{u},S_{v}])$ has degree at least $\alpha$ and every vertex in $V(G_{\alpha,\beta}[S_{u},S_{v}])$ has degree at least $\beta$. We call $\{S_{u},S_{v}\}$ a qualified keyword set for the query vertex $q$ on the graph $G$ if $G_{\alpha,\beta}[S_{u},S_{v}]$ exists.
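Combining the keyword filter, the degree constraints, and the connectivity requirement gives one way to compute $G_{\alpha,\beta}[S_u,S_v]$ and thereby test whether $\{S_u,S_v\}$ is qualified. The Python sketch below is an illustration under stated assumptions (hypothetical function name, adjacency-dictionary input, $q$ in the upper layer, keyword sets given as Python sets); it is not the paper's Algorithm 1.

```python
from collections import deque


def keyword_core_community(q, adj_u, adj_v, W_U, W_V, S_u, S_v, alpha, beta):
    """Sketch of G_{alpha,beta}[S_u, S_v] as (U_set, V_set), or None if the
    keyword set {S_u, S_v} is not qualified for q."""
    # 1. Keyword filter: keep vertices containing the whole candidate set.
    U = {u for u in adj_u if S_u <= W_U[u]}
    V = {v for v in adj_v if S_v <= W_V[v]}
    # 2. Peel the induced subgraph G[U, V] down to its (alpha, beta)-core.
    deg_u = {u: len(adj_u[u] & V) for u in U}
    deg_v = {v: len(adj_v[v] & U) for v in V}
    queue = deque()
    for u in [u for u in U if deg_u[u] < alpha]:
        U.discard(u)
        queue.append(("U", u))
    for v in [v for v in V if deg_v[v] < beta]:
        V.discard(v)
        queue.append(("V", v))
    while queue:
        side, x = queue.popleft()
        if side == "U":
            for v in adj_u[x]:
                if v in V:
                    deg_v[v] -= 1
                    if deg_v[v] < beta:
                        V.discard(v)
                        queue.append(("V", v))
        else:
            for u in adj_v[x]:
                if u in U:
                    deg_u[u] -= 1
                    if deg_u[u] < alpha:
                        U.discard(u)
                        queue.append(("U", u))
    if q not in U:
        return None
    # 3. Keep only the connected component that contains q.
    comp_u, comp_v = {q}, set()
    frontier = deque([("U", q)])
    while frontier:
        side, x = frontier.popleft()
        if side == "U":
            for v in adj_u[x]:
                if v in V and v not in comp_v:
                    comp_v.add(v)
                    frontier.append(("V", v))
        else:
            for u in adj_v[x]:
                if u in U and u not in comp_u:
                    comp_u.add(u)
                    frontier.append(("U", u))
    return comp_u, comp_v


# Hypothetical toy graph: K_{4,2} between u1..u4 and v1, v2.
adj_u = {u: {"v1", "v2"} for u in ("u1", "u2", "u3", "u4")}
adj_v = {v: {"u1", "u2", "u3", "u4"} for v in ("v1", "v2")}
W_U = {"u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"a"}, "u4": {"b"}}
W_V = {"v1": {"x"}, "v2": {"x", "y"}}
print(keyword_core_community("u1", adj_u, adj_v, W_U, W_V, {"a", "b"}, {"x"}, 2, 2))
```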

Given a query vertex $q$, a straightforward method finds the attributed $(\alpha,\beta)$-communities in $G$ in three steps. First, taking the layer that contains $q$ to be $U(G)$ and $S$ to be $S_{U}$, it enumerates all nonempty subsets $S_{U1},S_{U2},\ldots,S_{U(2^{l}-1)}$ of $S_{U}$, where $l=|S_{U}|$; it also puts every distinct keyword of $W_{V}(v)$, over all $v\in V(G)$, into $S_{V}$ and enumerates all nonempty subsets $S_{V1},S_{V2},\ldots,S_{V(2^{k}-1)}$ of $S_{V}$, where $k=|S_{V}|$. Second, for each pair $\{S_{Ui},S_{Vj}\}$ ($1\leq i\leq 2^{l}-1$, $1\leq j\leq 2^{k}-1$), it verifies whether $G_{(\alpha,\beta)}[S_{Ui},S_{Vj}]$ exists and computes it if so. Finally, it outputs the subgraphs with the most shared keywords among all $G_{(\alpha,\beta)}[S_{Ui},S_{Vj}]$.
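The candidate generation step of the straightforward method is a pair of power-set enumerations. The Python sketch below (hypothetical helper names, toy keyword sets) makes the $(2^{l}-1)(2^{k}-1)$ growth concrete.

```python
from itertools import combinations


def nonempty_subsets(keywords):
    """All 2^n - 1 nonempty subsets of a keyword set, as frozensets."""
    items = sorted(keywords)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def straightforward_candidates(S_U, W_V):
    """Every pair (S_Ui, S_Vj) the straightforward method has to verify.
    S_V gathers all distinct keywords found on lower-layer vertices."""
    S_V = set().union(*W_V.values())
    return [(s_u, s_v) for s_u in nonempty_subsets(S_U)
            for s_v in nonempty_subsets(S_V)]


# Hypothetical keywords: l = |S_U| = 3 and k = |S_V| = 4.
pairs = straightforward_candidates({"a", "b", "c"}, {"v1": {"w", "x"}, "v2": {"y", "z"}})
print(len(pairs))  # (2**3 - 1) * (2**4 - 1) = 105
```

Even with seven keywords in total, the method already faces 105 verifications, each of which is a full community computation.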

We can summarize the straightforward method as a two-step framework: generation and verification of candidate keyword sets. Considering the bipartite graph $G$ in Fig.2(a), let $q=A$, $\alpha=2$, $\beta=2$, and $S=\{a,b,c\}$; Fig.4 shows how the attributed $(2,2)$-communities are found through the two-step framework. The computational complexity of this framework is the same as that of the $Basic$ algorithm analyzed below.

Here we first give the procedure that verifies the existence of $G_{\alpha,\beta}(q,G^{\prime})$ in a given subgraph $G^{\prime}$ of $G$ for each candidate keyword set.

Theorem 4.1.

Given a bipartite graph $G$, it takes $O(d_{umax}\cdot(n_{u}+n_{v}\cdot d_{vmax}))$ time to compute $G_{\alpha,\beta}(q,G^{\prime})$.

Proof.

There are $n_{u}$ nodes in $U(G^{\prime})$ and $n_{v}$ nodes in $V(G^{\prime})$, and we denote the largest degree of the nodes in $U(G^{\prime})$ ($V(G^{\prime})$) as $d_{umax}$ ($d_{vmax}$). Removing all $u\in U(G^{\prime})$ with degree less than $\alpha$ costs $O(n_{u}\cdot d_{umax})$, and the while loop in lines 4-15 costs $O(n_{v}\cdot d_{vmax}\cdot d_{umax})$. Adding the two terms gives $O(d_{umax}\cdot(n_{u}+n_{v}\cdot d_{vmax}))$. ∎

Based on the straightforward method, we present Algorithm 2, a baseline query algorithm called $Basic$. The input of $Basic$ is a bipartite graph $G$, a query vertex $q$, two positive integers $\alpha$ and $\beta$, and a keyword set $S$. It first initializes a set $\psi$ of candidate keyword sets, each being a nonempty subset of $S$ (i.e., $S_{1},S_{2},\ldots,S_{2^{l}-1}$ with $l=|S|$) (line 1). After that, for each vertex $v\in V(G)$, it enumerates all nonempty subsets of $W_{V}(v)$ and puts them into $\varphi$, ensuring that each element of $\varphi$ appears only once. In the while loop (lines 2–12), it first sets $m=0$, the size of the current keyword sets, $max=0$, the maximal size over all keyword sets, and an empty set $\phi_{m}$ for collecting the qualified keyword sets (line 3). Then, for each $\psi^{\prime}\in\psi$ and each $\varphi^{\prime}\in\varphi$, it finds $G_{\alpha,\beta}[\psi^{\prime},\varphi^{\prime}]$ from $G_{\alpha,\beta}$ by considering the keyword and degree constraints (lines 4-7). If $G_{\alpha,\beta}[\psi^{\prime},\varphi^{\prime}]$ exists, $m$ records the total number of elements in $\psi^{\prime}$ and $\varphi^{\prime}$, and $m$ is compared with $max$: if $max\leq m$, it assigns $m$ to $max$ and puts the current keyword sets $\psi^{\prime}$ and $\varphi^{\prime}$ into $\phi_{m}$ (lines 10-12). After all candidate keyword sets in $\psi$ and $\varphi$ have been checked, if there is at least one qualified keyword set in $\phi_{m}$, it outputs the communities of the keyword sets in $\phi_{m}$ (lines 13-14).
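The selection logic of $Basic$ can be sketched independently of how each community is computed by passing the verification routine in as a callback. In this Python sketch, `verify` is a hypothetical stand-in for computing $G_{\alpha,\beta}[\psi^{\prime},\varphi^{\prime}]$, and $\varphi$ is built from the nonempty subsets of each $W_V(v)$ as described above. One simplification is ours: instead of accumulating into $\phi_m$ whenever $max\leq m$, the sketch discards earlier results whenever a strictly larger total is found, so it returns exactly the pairs with the maximum number of shared keywords.

```python
from itertools import combinations


def nonempty_subsets(keywords):
    """All nonempty subsets of a keyword set, as frozensets."""
    items = sorted(keywords)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def basic_search(S, W_V, verify):
    """Sketch of Basic's selection loop; returns (s_u, s_v, community) triples
    whose total keyword count |s_u| + |s_v| is maximal."""
    psi = nonempty_subsets(S)
    # phi: distinct nonempty subsets of some W_V(v), deduplicated by the set.
    phi = {sub for kws in W_V.values() for sub in nonempty_subsets(kws)}
    best, results = 0, []
    for s_u in psi:
        for s_v in phi:
            community = verify(s_u, s_v)
            if community is None:
                continue
            m = len(s_u) + len(s_v)
            if m > best:  # strictly better: restart the result list
                best, results = m, [(s_u, s_v, community)]
            elif m == best:
                results.append((s_u, s_v, community))
    return results


def toy_verify(s_u, s_v):
    """Hypothetical stand-in: only subsets of ({b, c}, {x, y}) qualify."""
    return "g" if s_u <= {"b", "c"} and s_v <= {"x", "y"} else None


results = basic_search({"a", "b", "c"}, {"v1": {"x", "y"}, "v2": {"w", "x"}}, toy_verify)
print(results)
```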

Theorem 4.2.

Given a bipartite graph $G$, $Basic$ computes $G_{\alpha,\beta}[S_{u},S_{v}]$ in $O\left(n_{v}\cdot 2^{|S_{v}|_{max}}\log(n_{v}\cdot 2^{|S_{v}|_{max}})+2^{|S|}\cdot 2^{|S_{v}|_{max}}\cdot O(\text{compute } G_{\alpha,\beta}(q,G^{\prime}))\right)$ time.

Proof.

We use $|S_{v}|_{max}$ to denote the largest size of $W_{V}(v)$ over all $v\in V(G)$. Initializing $\psi$ and $\varphi$ can be completed in $O(2^{|S|}+n_{v}\cdot 2^{|S_{v}|_{max}}\log(n_{v}\cdot 2^{|S_{v}|_{max}}))$ time, and the while loop in lines 2-12 costs $O(2^{|S|}\cdot 2^{|S_{v}|_{max}}\cdot O(\text{compute } G_{\alpha,\beta}(q,G^{\prime})))$. The $O(2^{|S|})$ initialization term is dominated by the loop cost, which gives the bound. ∎

One major drawback of the straightforward method is that it must enumerate $(2^{l}-1)\times(2^{k}-1)$ pairs of attribute subsets and verify the existence of the corresponding subgraphs (i.e., $G_{(\alpha,\beta)}[S_{Ui},S_{Vj}]$). For large values of $l$ and $k$, this overhead makes the method impractical. To alleviate this problem, we study methods that simplify the generation and verification of candidate keyword sets and propose two improved algorithms.

5. Improved Attributed $(\alpha,\beta)$ -community Search Algorithm

In this section, we shrink the range of possible candidate keyword sets and develop two more efficient algorithms: the incremental algorithm ($Inc$) verifies candidate sets from smaller to larger ones, while the decremental algorithm ($Dec$) examines them from larger to smaller ones.

5.1. The Incremental Algorithm

Attributed bipartite graphs have the anti-monotonicity property regarding the attributed $(\alpha,\beta)$ -community search, which is shown in the following lemma:

Lemma 5.1.

Given a graph $G$, a vertex $q\in G$, and keyword sets $S_{u}$ and $S_{v}$, if $G_{\alpha,\beta}[S_{u},S_{v}]$ exists, then for any subsets $S_{u}^{\prime}\subseteq S_{u}$ and $S_{v}^{\prime}\subseteq S_{v}$, $G_{\alpha,\beta}[S_{u}^{\prime},S_{v}^{\prime}]$ exists and $G_{\alpha,\beta}[S_{u}^{\prime},S_{v}^{\prime}]\supseteq G_{\alpha,\beta}[S_{u},S_{v}]$.

Proof.

By the definition of $G_{\alpha,\beta}[S_{u},S_{v}]$, each vertex in $U(G_{\alpha,\beta}[S_{u},S_{v}])$ contains $S_{u}$ and each vertex in $V(G_{\alpha,\beta}[S_{u},S_{v}])$ contains $S_{v}$. For any $S_{u}^{\prime}\subseteq S_{u}$ and $S_{v}^{\prime}\subseteq S_{v}$, it follows that each vertex in $U(G_{\alpha,\beta}[S_{u},S_{v}])$ contains $S_{u}^{\prime}$ and each vertex in $V(G_{\alpha,\beta}[S_{u},S_{v}])$ contains $S_{v}^{\prime}$. Moreover, $q\in G_{\alpha,\beta}[S_{u},S_{v}]$. Hence $G_{\alpha,\beta}[S_{u},S_{v}]$ itself is a connected subgraph of $G$ containing $q$ in which every upper-layer vertex has degree at least $\alpha$ and contains $S_{u}^{\prime}$, and every lower-layer vertex has degree at least $\beta$ and contains $S_{v}^{\prime}$. Therefore the largest such subgraph, $G_{\alpha,\beta}[S_{u}^{\prime},S_{v}^{\prime}]$, exists and contains $G_{\alpha,\beta}[S_{u},S_{v}]$. ∎

Lemma 5.2.

Given two groups of keyword sets $\{S_{u1},S_{v1}\}$ and $\{S_{u2},S_{v2}\}$, if $G_{\alpha,\beta}[S_{u1},S_{v1}]$ and $G_{\alpha,\beta}[S_{u2},S_{v2}]$ exist, then $G_{\alpha,\beta}[S_{u1\cup u2},S_{v1\cup v2}]\subseteq G_{\alpha,\beta}[S_{u1},S_{v1}]\cap G_{\alpha,\beta}[S_{u2},S_{v2}]$, where $S_{u1\cup u2}=S_{u1}\cup S_{u2}$ and $S_{v1\cup v2}=S_{v1}\cup S_{v2}$.

Proof.

Since $S_{u1}\subseteq S_{u1\cup u2}$ and $S_{v1}\subseteq S_{v1\cup v2}$, Lemma 5.1 implies that whenever $G_{\alpha,\beta}[S_{u1\cup u2},S_{v1\cup v2}]$ exists, $G_{\alpha,\beta}[S_{u1\cup u2},S_{v1\cup v2}]\subseteq G_{\alpha,\beta}[S_{u1},S_{v1}]$. For the same reason, $G_{\alpha,\beta}[S_{u1\cup u2},S_{v1\cup v2}]\subseteq G_{\alpha,\beta}[S_{u2},S_{v2}]$. The lemma follows directly. ∎

This lemma implies that if $\{S_{u}^{\prime},S_{v}^{\prime}\}$ is generated from $\{S_{u1},S_{v1}\}$ and $\{S_{u2},S_{v2}\}$, we can search for $G_{\alpha,\beta}[S_{u}^{\prime},S_{v}^{\prime}]$ directly within $G_{\alpha,\beta}[S_{u1},S_{v1}]\cap G_{\alpha,\beta}[S_{u2},S_{v2}]$. Since every vertex in this intersection contains both $\{S_{u1},S_{v1}\}$ and $\{S_{u2},S_{v2}\}$, we do not need to check the keyword constraint again when finding $G_{\alpha,\beta}[S_{u}^{\prime},S_{v}^{\prime}]$.
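Lemma 5.2 reduces each join to two set operations: take the union of the keyword sets and the intersection of the vertex sets of the two communities. A minimal Python sketch, assuming candidates are pairs of frozensets and communities are pairs of vertex sets (`join_candidates` and the example data are hypothetical):

```python
def join_candidates(c1, comm1, c2, comm2):
    """Return the joined candidate (union of keyword sets) and the restricted
    search space (intersection of the two communities). Every vertex in that
    space already contains both keyword pairs, so only the degree and
    connectivity constraints remain to be checked there."""
    (su1, sv1), (su2, sv2) = c1, c2
    (U1, V1), (U2, V2) = comm1, comm2
    return (su1 | su2, sv1 | sv2), (U1 & U2, V1 & V2)


c, space = join_candidates(
    (frozenset({"b"}), frozenset({"x"})), ({"u1", "u2", "u3"}, {"v1", "v2", "v3"}),
    (frozenset({"c"}), frozenset({"y"})), ({"u1", "u2", "u4"}, {"v1", "v2"}),
)
print(c, space)
```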

In addition, the degree constraint of $G_{\alpha,\beta}[S_{u}^{\prime},S_{v}^{\prime}]$ yields a key observation: if $\{S_{u}^{\prime},S_{v}^{\prime}\}$ is a qualified keyword set, then at least $\beta$ vertices in $U(G_{\alpha,\beta}[S_{u}^{\prime},S_{v}^{\prime}])$ contain $S_{u}^{\prime}$ (every lower-layer vertex of the community has at least $\beta$ neighbors) and at least $\alpha$ vertices in $N(q)$ contain $S_{v}^{\prime}$ (since $q$ has at least $\alpha$ neighbors in the community). This observation implies that we can generate all the candidate keyword sets directly from the query vertex $q$ and $q$'s neighbors, without touching other vertices.
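The counting observation turns candidate generation into a frequency count over $N(q)$. The Python sketch below is hypothetical in its names and data layout; for the upper layer it counts over all of $U(G)$, as the $|P_{i}|<\beta$ test of Algorithm 3 does, which is weaker than counting inside the community but still a necessary condition.

```python
from collections import Counter


def initial_candidates(q, adj_u, W_U, W_V, S, alpha, beta):
    """Single-keyword candidates for an upper-layer query vertex q.

    Lower layer: keep keywords carried by at least alpha neighbors of q.
    Upper layer: keep keywords of S carried by q and by at least beta
    upper-layer vertices (counted over all of U(G))."""
    nbr_count = Counter(kw for v in adj_u[q] for kw in W_V[v])
    phi = sorted(kw for kw, cnt in nbr_count.items() if cnt >= alpha)
    psi = sorted(kw for kw in set(S) & W_U[q]
                 if sum(kw in kws for kws in W_U.values()) >= beta)
    return psi, phi


# Hypothetical data: q = u1 with neighbors v1, v2, v3.
adj_u = {"u1": {"v1", "v2", "v3"}}
W_U = {"u1": {"a", "b"}, "u2": {"b"}, "u3": {"a", "b", "c"}}
W_V = {"v1": {"x", "y"}, "v2": {"x"}, "v3": {"y", "z"}}
print(initial_candidates("u1", adj_u, W_U, W_V, {"a", "b"}, 2, 3))
```

Keyword `z` appears on only one neighbor of `u1`, so with $\alpha=2$ it can never be shared by a qualified lower layer and is dropped before any community is computed.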

Based on the above lemmas and observation, we introduce the algorithm $Inc$. Compared with $Basic$, it shrinks the initial candidate keyword sets and always verifies the existence of $G_{(\alpha,\beta)}[\psi^{\prime},\varphi^{\prime}]$ within a subgraph of $G$ rather than the entire graph $G$; moreover, the subgraph used for verification shrinks as the candidate set $\{\psi^{\prime},\varphi^{\prime}\}$ expands. Therefore, a large amount of redundant computation is avoided during verification.

Algorithm 3 presents $Inc$. First, it initializes a set $\psi=\{\psi_{1},\psi_{2},\ldots,\psi_{i}\}$ of candidate keyword sets, each consisting of a single keyword of $S$. Then, for each $v\in N(q)$, it puts each keyword in $W_{V}(v)$ into $S_{V}$ and initializes a set $\varphi=\{\varphi_{1},\varphi_{2},\ldots,\varphi_{j}\}$ of candidate keyword sets, each consisting of a single keyword of $S_{V}$ (line 1). For each candidate keyword set $\psi_{i}$ ($\varphi_{j}$) in $\psi$ ($\varphi$), it traverses $G$ and puts the nodes containing $\psi_{i}$ ($\varphi_{j}$) into $P_{i}$ ($Q_{j}$) (line 2). By the key observation, if $\psi_{i}$ ($\varphi_{j}$) belongs to a qualified keyword set, then at least $\beta$ nodes in $U(G)$ contain $\psi_{i}$ and at least $\alpha$ nodes in $V(G)$ contain $\varphi_{j}$; so it removes $P_{i}$ and $\psi_{i}$ if $|P_{i}|<\beta$, and removes $Q_{j}$ and $\varphi_{j}$ if $|Q_{j}|<\alpha$ (line 3). Then it sets $l=0$, indicating the size level of the current keyword sets, and initializes a set $\phi$ of $\langle c,G_{\alpha,\beta}[c]\rangle$ pairs. In a pair $\langle c,G_{\alpha,\beta}[c]\rangle$, $c$ contains a set $\psi^{\prime}$ of keywords from $\psi$ and a set $\varphi^{\prime}$ of keywords from $\varphi$, and $G_{\alpha,\beta}[c]$ is an $(\alpha,\beta)$-community of $G$ in which each vertex in $U(G_{\alpha,\beta}[c])$ contains $\psi^{\prime}$ and each vertex in $V(G_{\alpha,\beta}[c])$ contains $\varphi^{\prime}$ (line 4). For all $\psi^{\prime}\in\psi$ and $\varphi^{\prime}\in\varphi$, it verifies the existence of $G_{(\alpha,\beta)}[\psi^{\prime},\varphi^{\prime}]$ and puts the qualified $\langle c,G_{\alpha,\beta}[c]\rangle$ pairs into $\phi_{l}$ (lines 5-10).
In the while loop (lines 11–18), for every two pairs $\langle c_{1},G_{\alpha,\beta}[c_{1}]\rangle$ and $\langle c_{2},G_{\alpha,\beta}[c_{2}]\rangle$ in $\phi_{l}$, it finds $G_{(\alpha,\beta)}[c_{1}\cup c_{2}]$ from $G[c_{1}\cup c_{2}]$, the shared subgraph of $G_{\alpha,\beta}[c_{1}]$ and $G_{\alpha,\beta}[c_{2}]$ (lines 12-15). If $G_{(\alpha,\beta)}[c_{1}\cup c_{2}]$ exists, it puts the pair of $c_{1}\cup c_{2}$ and $G_{(\alpha,\beta)}[c_{1}\cup c_{2}]$ into the set $\phi_{l+1}$ (lines 16-17). The loop stops when $\phi_{l}$ is empty. Next, it looks for the qualified keyword sets $c$ with the most keywords among $\phi_{0}$ to $\phi_{l-1}$. Finally, it outputs the communities of these keyword sets $c$.
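The level-wise loop of $Inc$ can be sketched with a pluggable verification callback. In this Python sketch, candidates are `(S_u, S_v)` pairs of frozensets, `verify_within` is a hypothetical routine that computes a community inside a restricted vertex set, and a union produced twice within one level is verified only once. The toy callback at the end simply declares every candidate inside $(\{b,c\},\{x,y\})$ qualified; it is not derived from Fig.5.

```python
def inc_search(level0, verify_within):
    """Level-wise join loop in the spirit of Inc (simplified sketch).

    level0: {(S_u, S_v): (U_set, V_set)} for every qualified single-keyword pair.
    verify_within(c, U_allowed, V_allowed): community of c inside the allowed
    vertices, or None. Returns {candidate: community} for the qualified
    candidates with the most keywords found on any level."""
    levels = [level0]
    while levels[-1]:
        items = list(levels[-1].items())
        nxt = {}
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                (c1, (U1, V1)), (c2, (U2, V2)) = items[i], items[j]
                c = (c1[0] | c2[0], c1[1] | c2[1])
                if c in nxt:
                    continue
                # Lemma 5.2: search only inside the intersection of both communities.
                comm = verify_within(c, U1 & U2, V1 & V2)
                if comm is not None:
                    nxt[c] = comm
        levels.append(nxt)
    found = {c: comm for level in levels for c, comm in level.items()}
    if not found:
        return {}
    best = max(len(su) + len(sv) for su, sv in found)
    return {c: comm for c, comm in found.items() if len(c[0]) + len(c[1]) == best}


def toy_verify(c, U_allowed, V_allowed):
    """Hypothetical stand-in: candidates inside ({b, c}, {x, y}) qualify."""
    s_u, s_v = c
    if s_u <= {"b", "c"} and s_v <= {"x", "y"} and U_allowed and V_allowed:
        return U_allowed, V_allowed
    return None


comm = ({"u1", "u2", "u3"}, {"v1", "v2"})
level0 = {(frozenset({k}), frozenset({w})): comm for k in "bc" for w in "xy"}
result = inc_search(level0, toy_verify)
print(result)
```

The loop terminates because the smallest candidate size on each new level is strictly larger than on the previous one, and a level holding a single candidate produces no pairs.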

Theorem 5.3.

Given a bipartite graph G, $Inc$ computes $G_{\alpha,\beta}[c]$ in $O((|S|+|S_{V}|-1)\cdot|S|\cdot|S_{V}|(|S|\cdot|S_{V}|+O(Compute\ G_{\alpha,\beta}(q,G))))$ .

Proof.

In Algorithm 3, let $d$ denote the degree of $q$ and $|S_{v}|_{max}$ the size of the largest $W_{V}(v)$ among all $v\in N(q)$. Line 1 can be completed in $O(|S|+d\cdot|S_{v}|_{max}\log(d\cdot|S_{v}|_{max}))$ time, and lines 2-3 in $O(n_{u}\cdot|S|+n_{v}\cdot d\cdot|S_{v}|_{max}\log(d\cdot|S_{v}|_{max}))$ time. Lines 5-10 can be completed in $O(|S|\cdot|S_{V}|\cdot O(Compute\ G_{\alpha,\beta}(q,G)))$ time. Each iteration of the while loop takes $O(|S|\cdot|S_{V}|\cdot(|S|\cdot|S_{V}|+O(Compute\ G_{\alpha,\beta}(q,G))))$ time to find the qualified communities and put them into the new set $\phi_{l+1}$. Since every keyword set in $\phi_{l+1}$ contains more keywords than the smallest keyword set in $\phi_{l}$, and no keyword set contains more than $|S|+|S_{V}|$ keywords, the loop runs at most $(|S|+|S_{V}|-1)$ times in the worst case. ∎

Example 5.4.

Consider $G$ in Fig.2(a), and let $q=A$, $\alpha=2$, $\beta=2$ and $S=\{a,b,c\}$. Fig.5(a) shows a (2,2)-core of $G$. Following Algorithm 3, we first obtain the candidate sets $\psi=\{\{a\},\{b\},\{c\}\}$ and $\varphi=\{\{w\},\{x\},\{y\},\{z\}\}$, and then verify that $G_{2,2}[\{b\},\{x\}]$, $G_{2,2}[\{b\},\{y\}]$, $G_{2,2}[\{c\},\{x\}]$ and $G_{2,2}[\{c\},\{y\}]$ exist, as Fig.5(b) and Fig.5(c) show. In the first iteration of the while loop, we choose two qualified keyword sets from $\{\{b,x\},\{b,y\},\{c,x\},\{c,y\}\}$ and take their union (e.g., $\{bc,xy\}$ from $\{b,x\}$ and $\{c,y\}$). By Lemma 5.2, we only need to verify the new candidate keyword set on the nodes shared by $G_{2,2}[\{b\},\{x\}]$ and $G_{2,2}[\{c\},\{y\}]$. Fig.5(d) shows the final attributed community $G_{2,2}[\{b,c\},\{x,y\}]$.

5.2. The Decremental Algorithm

The decremental algorithm, denoted by $Dec$, differs from the incremental algorithm in both the generation and the verification of candidate keyword sets.

5.2.1. Generation of candidate keyword sets

Lemma 5.5.

Given the vertex set $V$ of $q$'s neighbors, a qualified keyword set $S_{u}$, and a set $S_{V}$ containing all nonempty subsets of $W_{V}(v)$ for $v\in V$: for each $S_{v}\in S_{V}$, if fewer than $\alpha$ vertices in $V$ contain $S_{v}$, then $G_{\alpha,\beta}[S_{u},S_{v}]$ does not exist.

Proof.

Assume that $\{S_{u},S_{v}\}$ is a qualified keyword set. Then at least $\beta$ vertices in $U(G_{\alpha,\beta}[S_{u},S_{v}])$ contain $S_{u}$. Moreover, since $q\in U(G_{\alpha,\beta}[S_{u},S_{v}])$ has at least $\alpha$ neighbors in this community and every vertex in $V(G_{\alpha,\beta}[S_{u},S_{v}])$ contains $S_{v}$, at least $\alpha$ of $q$'s neighbors contain $S_{v}$. This contradicts the condition that fewer than $\alpha$ vertices in $V$ contain $S_{v}$, which proves Lemma 5.5. ∎

We generate the candidate keyword sets $\psi$ of $U(G)$ by enumerating all nonempty subsets of $S_{U}$. For each vertex $v\in N(q)$, we enumerate all nonempty subsets of $W_{V}(v)$ and put them into a new set $\varphi$ without duplicates. Then, following Lemma 5.5, we update the candidate keyword sets by removing those contained in the keyword sets of fewer than $\alpha$ of $q$'s neighbors.

Example 5.6.

Consider a query vertex $Q$ ($\alpha=3$) with 5 neighbors in Fig.6(a), where the selected keywords of each vertex are listed in curly braces. For each neighbor of $Q$, all nonempty subsets of its keyword set are generated, as shown in Fig.6(b). Keeping only the subsets that occur at least three times yields $\varphi=\{\{x\},\{y\},\{z\},\{x,y\}\}$.
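Candidate generation in $Dec$, including the pruning rule of Lemma 5.5, can be sketched in Python as below. It is a minimal sketch: the function names are hypothetical, keyword sets are plain Python sets, and the neighbor keyword sets in the usage example are made up (they are not read from Fig.6), although they are chosen so that the result has the same form as in Example 5.6.

```python
from collections import Counter
from itertools import combinations


def nonempty_subsets(keywords):
    """Yield every nonempty subset of `keywords` as a frozenset."""
    items = sorted(keywords)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def dec_candidates(S, neighbor_keywords, alpha):
    """Hypothetical sketch of Dec's candidate generation.

    psi: all nonempty subsets of the query keyword set S.
    phi: nonempty subsets of the neighbours' keyword sets that are contained
         in at least alpha neighbours (Lemma 5.5 prunes the rest).
    """
    psi = list(nonempty_subsets(S))
    counts = Counter()
    for kws in neighbor_keywords:
        # Subsets of one keyword set are distinct, so each neighbour adds at
        # most 1 to the count of any given subset.
        counts.update(nonempty_subsets(kws))
    phi = [s for s, n in counts.items() if n >= alpha]
    return psi, phi


# Made-up keyword sets of five neighbours of a query vertex, alpha = 3.
neighbors = [{"x", "y"}, {"x", "y", "z"}, {"x", "y"}, {"z"}, {"y", "z"}]
psi, phi = dec_candidates({"a", "b"}, neighbors, alpha=3)
print(sorted(sorted(s) for s in phi))  # [['x'], ['x', 'y'], ['y'], ['z']]
```

Here {y, z} occurs in only two neighbors and {x, z} in one, so both are pruned before any community verification is attempted.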

5.2.2. Verification of candidate keyword sets

Since the candidates can be obtained directly from $S$ and $q$'s neighbors, we can verify them in a decremental manner (larger candidate keyword sets first, smaller ones later). During verification, once $Dec$ finds attributed $(\alpha,\beta)$-communities for candidate keyword sets of a given size, it does not need to verify smaller candidate keyword sets. Therefore, compared with the incremental algorithm, $Dec$ saves the cost of verifying smaller candidate keyword sets and may thus be faster in practice.

Based on the above discussion, we design $Dec$ as shown in Algorithm 4. We first generate the candidate keyword sets $\psi$ and $\varphi$ from $S$ and $q$'s neighbors, respectively; $P_{i}$ denotes the set of nodes containing $\psi_{i}$ and $Q_{j}$ denotes the set of nodes containing $\varphi_{j}$ (lines 1-2). Next, we update $\psi,\varphi,P,Q$ by removing the vertex sets, and the corresponding keyword sets, that violate the structure cohesiveness constraint (line 3). Then we set $max=0$, indicating the maximal size of the qualified candidate keyword sets found so far, and initialize the sets $S$ and $c$, where $S$ collects the candidates $c$ and each $c$ consists of a keyword set $\psi'$ from $\psi$ and a keyword set $\varphi'$ from $\varphi$ (line 4). For all $\psi'\in\psi$ and $\varphi'\in\varphi$, we generate the $|\psi|\times|\varphi|$ candidates $c$ and put them into $S$ (lines 5-8). We then sort the candidates in $S$ in descending order of the number of keywords they contain (line 9). After that, we verify the existence of $G_{(\alpha,\beta)}[S_{k}]$ for each $S_{k}\in S$ in this order. If $G_{(\alpha,\beta)}[S_{k}]$ exists, we put it into the set $ans$ and set $max$ to $|S_{k}|$. As soon as we reach a remaining set with fewer than $max$ elements, we stop the verification and output the desired $(\alpha,\beta)$-communities in $ans$.
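The decremental verification order can be sketched in Python as follows. This is a minimal sketch under assumptions: candidates are pairs of frozensets, the size of a candidate is its total number of keywords, and `exists` is a hypothetical callable standing in for the $(\alpha,\beta)$-community check (it returns a community or `None`).

```python
def dec_verify(psi, phi, exists):
    """Hypothetical sketch of Dec's verification loop (Algorithm 4, lines 5-18)."""
    # Lines 5-8: pair every psi' with every phi'.
    cands = [(p, f) for p in psi for f in phi]
    # Line 9: largest candidates first (Python's sort is stable, so ties keep
    # their generation order even with reverse=True).
    cands.sort(key=lambda c: len(c[0]) + len(c[1]), reverse=True)
    best, ans = 0, []
    for p, f in cands:
        size = len(p) + len(f)
        if size < best:
            break  # every remaining candidate is smaller than a qualified one
        g = exists(p, f)
        if g is not None:
            ans.append(((p, f), g))
            best = size
    return ans


# Toy usage: a stand-in check that accepts a candidate iff phi' lacks "y".
psi = [frozenset({"b"}), frozenset({"c"}), frozenset({"b", "c"})]
phi = [frozenset({"x"}), frozenset({"y"}), frozenset({"x", "y"})]
calls = []


def fake_exists(p, f):
    calls.append((p, f))
    return "G" if "y" not in f else None


ans = dec_verify(psi, phi, fake_exists)
print(len(calls), ans)
```

Only 5 of the 9 candidates are checked here: once ({b, c}, {x}) qualifies with 3 keywords, the remaining 3-keyword candidate is still verified, and the loop stops at the first 2-keyword candidate, even though some of those would also pass the stand-in check.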

Theorem 5.7.

Given a bipartite graph $G$, $Dec$ computes $G_{\alpha,\beta}[S_{k}]$ in $O((2^{|S|}\cdot d\cdot 2^{|S_{v}|_{max}})\cdot O(Compute\ G_{\alpha,\beta}(q,G))+n_{v}\cdot d\cdot 2^{|S_{v}|_{max}}\log(d\cdot 2^{|S_{v}|_{max}}))$ time.

Proof.

In Algorithm 4, let $d$ denote the degree of $q$ and $|S_{v}|_{max}$ the size of the largest $W_{V}(v)$ among all $v\in N(q)$. We can initialize $\psi$ and $\varphi$ in $O(2^{|S|}+d\cdot 2^{|S_{v}|_{max}}\log(d\cdot 2^{|S_{v}|_{max}}))$ time. Lines 2-3 can be completed in $O(n_{u}\cdot 2^{|S|}+n_{v}\cdot d\cdot 2^{|S_{v}|_{max}}\log(d\cdot 2^{|S_{v}|_{max}}))$ time. In lines 5-8, the sets $c$ can be generated in $O(2^{|S|}\cdot d\cdot 2^{|S_{v}|_{max}})$ time. Sorting $S$ in descending order of the number of elements in each set then takes $O(2^{|S|}\cdot d\cdot 2^{|S_{v}|_{max}}\log(2^{|S|}\cdot d\cdot 2^{|S_{v}|_{max}}))$ time. In the worst case, finding all qualified $G_{(\alpha,\beta)}[S_{k}]$ in lines 10-18 costs $O((2^{|S|}\cdot d\cdot 2^{|S_{v}|_{max}})\cdot O(Compute\ G_{\alpha,\beta}(q,G)))$ time. In practice, however, it is much faster, because the verification stops at the first set with fewer than $max$ elements once a qualified set has been found. ∎
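To make the gap between the bounds of Theorem 5.3 and Theorem 5.7 concrete, the following illustrative arithmetic plugs in values suggested by the experimental setup. This is an assumption: $|S|=10$ (about 10 keywords per vertex, with $S$ being all keywords of $q$) and $|S_{v}|_{max}=13$ (the largest keyword count assigned to a vertex). The figures are worst-case candidate counts before any pruning, not measured values.

```latex
\begin{aligned}
\textit{Inc (initial single keywords)}:&\quad |\psi| \le |S| = 10,\qquad |\varphi| \le d\cdot|S_{v}|_{max} = 13d,\\
\textit{Dec}:&\quad |\psi| \le 2^{|S|}-1 = 1023,\qquad |\varphi| \le d\,(2^{|S_{v}|_{max}}-1) = 8191d,\\
&\quad |\psi|\cdot|\varphi| \le 1023\cdot 8191\,d = 8{,}379{,}393\,d \approx 8.4\times 10^{6}\,d.
\end{aligned}
```

This exponential factor is what the pruning of Lemma 5.5 and the early termination of the verification loop are meant to offset.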

6. Experiments

This section presents our experimental results. We evaluate the efficiency of the techniques for retrieving attributed $(\alpha,\beta)$ -communities.

6.1. Experiments setting

Algorithms. We implement and compare the following algorithms: 1) the baseline algorithm $Basic$ proposed in Section 4; 2) an improved algorithm $Basic^{+}$ based on $Basic$; 3) the incremental attributed $(\alpha,\beta)$-community search algorithm $Inc$; and 4) the decremental attributed $(\alpha,\beta)$-community search algorithm $Dec$, both proposed in Section 5.

Datasets. We evaluate the algorithms on eight real graphs: $Enwikibooks$, $Movie$, $IMDB$, $Actor$, $Discogs$, $Idwiki$, $Plwiki$ and $Nlwiki$. All datasets are available from KONECT (http://konect.cc/networks). For the datasets without attributes, we generate a different kind of keyword set for the vertices in each of the two layers of the bipartite graph. In each experiment, we randomly select 8-13 keywords (10 on average) for each vertex. The datasets are summarized in Table 2, where $U$ and $V$ are the vertex layers, $|E|$ is the number of edges, and $\widehat{d}$ is the average vertex degree.

The algorithms are implemented in C++, and the experiments are run on a machine with two 14-core Intel Xeon E5-2680 v4 processors and 251 GB of memory, running Ubuntu. We set the maximum running time of each test to 3 days; if a test does not finish within this limit, we denote its processing time as INF. The code is open source at https://github.com/892681347/AttributeBigraph.

6.2. Evaluation of retrieving attributed $(\alpha,\beta)$ -community

Here we evaluate the performance of the algorithms ($Basic$, $Basic^{+}$, $Inc$ and $Dec$) for querying attributed $(\alpha,\beta)$-communities. We set the default values of $\alpha$ and $\beta$ to 3, and the input keyword set $S$ is the full set of keywords of the query vertex. For each dataset, we randomly select 300 query vertices whose core numbers are at least the core number we set, and each reported value is the average over these 300 queries. For each dataset, we also randomly select $20\%$, $40\%$, $60\%$ and $80\%$ of its vertices to obtain four subgraphs induced by these vertex sets, and $20\%$, $40\%$, $60\%$ and $80\%$ of its keywords to obtain four keyword sets.
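The vertex sampling behind the scalability experiments can be sketched in Python as below. This is a hypothetical helper under assumptions the paper does not state: the graph is a dict of neighbor sets, vertices are sampled uniformly without replacement, and the seed is arbitrary.

```python
import random


def induced_sample(adj, fraction, seed=0):
    """Hypothetical helper: keep a random `fraction` of the vertices and
    return the subgraph induced by them (only edges with both ends kept)."""
    rng = random.Random(seed)
    vertices = sorted(adj)  # fixed order so a given seed is reproducible
    keep = set(rng.sample(vertices, round(fraction * len(vertices))))
    return {x: adj[x] & keep for x in keep}


# Toy bipartite graph: U = {1, 2}, V = {3, 4}, complete between the layers.
adj = {1: {3, 4}, 2: {3, 4}, 3: {1, 2}, 4: {1, 2}}
for frac in (0.25, 0.5, 0.75):
    sub = induced_sample(adj, frac)
    print(frac, len(sub), sum(len(n) for n in sub.values()) // 2)
```

Because edges are kept only when both endpoints survive, the number of edges shrinks faster than the number of vertices as the sampled fraction decreases.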

$Basic$ exceeds the 3-day limit in all experiments, and $Basic^{+}$ does not finish within the limit on the large graphs (Idwiki, Plwiki and Nlwiki), so we record these runs as INF; accordingly, $Basic$ and $Basic^{+}$ are not discussed separately in the corresponding experiments.

Evaluating the effect of query parameters $\alpha$ and $\beta$. We vary $\alpha$ and $\beta$ to assess the performance of the algorithms. In Fig.7(a)-7(h), $\beta$ is fixed and $\alpha$ increases from 2 to 6. As $\alpha$ increases, the running time of $Basic^{+}$, $Inc$ and $Dec$ decreases. This is because only a few vertices and edges are removed from the original graph when $\alpha$ is small, whereas when $\alpha$ is large, the resulting $(\alpha,\beta)$-communities are much smaller than the original graph. Thus the size of the remaining subgraph directly affects the running time of $Basic^{+}$, $Inc$ and $Dec$. $Dec$ takes less time than $Basic^{+}$ and $Inc$ in all cases. In Fig.8(a)-8(h), we fix $\alpha$ and increase $\beta$ from 2 to 6 to compare query efficiency; the results are similar to those obtained when $\alpha$ increases. As $\beta$ increases, the running time of $Basic^{+}$, $Inc$ and $Dec$ decreases. This is also because a higher $\beta$ yields a subgraph with fewer vertices, and $Basic^{+}$ and $Inc$ are more sensitive to the number of vertices.

Evaluating the scalability w.r.t. keyword. In this experiment, we evaluate scalability with respect to the fraction of keywords of each vertex, randomly sampling from $20\%$ to $100\%$ of the keywords. As shown in Fig.9(a)-9(h), the running time of $Basic^{+}$, $Inc$ and $Dec$ increases steadily with the number of keywords. This is because, as the number of keywords increases, the number of subgraphs derived from the keywords and the numbers of vertices and edges in each subgraph increase accordingly. The running time of $Basic^{+}$ and $Inc$ grows faster than that of $Dec$ as more keywords are involved, which indicates that $Dec$ performs best and scales well in practice.

Evaluating the scalability w.r.t. vertex. In this experiment, we evaluate scalability with respect to the fraction of vertices. We vary the graph size by randomly sampling $20\%$ to $100\%$ of the vertices and using the induced subgraphs as input graphs; all keywords of the vertices are considered. Fig.10(a)-10(h) show that, as the fraction of vertices increases from $20\%$ to $100\%$, the running time of $Basic^{+}$, $Inc$ and $Dec$ increases steadily, and that of $Basic^{+}$ and $Inc$ increases faster than that of $Dec$. For example, on IMDB, when the fraction of vertices increases from $20\%$ to $100\%$, the running time of $Dec$ increases from 0.30s to 0.75s, while that of $Basic^{+}$ increases from 3.38s to 29.93s and that of $Inc$ increases from 0.28s to 3.32s. $Dec$ outperforms $Inc$ in most cases, but the opposite may occur when there are few vertices (e.g., 0.28s for $Inc$ versus 0.30s for $Dec$ on the $20\%$ IMDB subgraph). This is because $Inc$ is more sensitive to the number of vertices than $Dec$.

Evaluating the effect of $S$. In this experiment, we evaluate the effect of the query keyword set $S$ on the efficiency of the algorithms. For each query vertex, we randomly sample $20\%$, $40\%$, $60\%$, $80\%$ and $100\%$ of its keywords to form $S$. As shown in Fig.11(a)-11(h), as $|S|$ increases, the running time of $Basic^{+}$ and $Inc$ increases rapidly, while that of $Dec$ increases slowly or remains almost unchanged. For example, on Actor, the running time of $Dec$ increases from 1.08s to 1.13s, while that of $Basic^{+}$ increases from 2.32s to 14.68s and that of $Inc$ increases from 1.65s to 4.70s. These results show that $Dec$ performs better than $Basic^{+}$ and $Inc$.

Case study. We conduct queries on the real dataset Southern women (small) from KONECT (http://konect.cc/networks/), where each vertex in $U$ represents a woman, each vertex in $V$ represents a social activity, and each edge indicates that the woman participates in the social activity.

We use $A$ as the query vertex, set both $\alpha$ and $\beta$ to 2, and let $S$ contain the keyword “environmental”. The query result is the circled part of Fig.12, which contains women $\{A,B\}$ and activities $\{w,x\}$. The returned women $A$ and $B$ actively participate in environmental activities, and the social activities $w$ and $x$ are both environmental activities with multiple participants from $U$. In this case, if an environmental social activity needs to recruit team members, $A$ and $B$ can be given priority, because they not only prefer environmental social activities but also have experience cooperating with each other. If we search for a (2,2)-community without considering keywords, the result contains all the women and activities in Fig.12, including those who do not often participate in environmental activities. Such candidates are not the team members an environmental activity expects, because this search considers only the structure cohesiveness constraint and ignores the keyword cohesiveness constraint.

7. Conclusion

In this paper, we study the attributed $(\alpha,\beta)$-community search problem. To solve this problem efficiently, we follow a two-step framework that first generates candidate keyword sets and then verifies the existence of an attributed $(\alpha,\beta)$-community for each candidate keyword set. We then develop a basic algorithm and two improved query algorithms that retrieve the $(\alpha,\beta)$-community by verifying the candidate keyword sets in different orders. We conduct extensive experiments on real-world graphs, and the results demonstrate the effectiveness of the attributed $(\alpha,\beta)$-community model and the proposed techniques.
