Effective Community Search on Large Attributed Bipartite Graphs
Zongyu Xu, Yihao Zhang, Long Yuan, Yuwen Qian, Zi Chen, Mingliang Zhou, Qin Mao, Weibin Pan

TL;DR
This paper introduces new algorithms for community search in large attributed bipartite graphs, effectively incorporating node attributes to improve the cohesion of the identified communities.
Contribution
The paper proposes the first algorithms that integrate node attributes into community search on bipartite graphs, enhancing result relevance and cohesion.
Findings
Algorithms are effective on eight large graphs
Proposed methods outperform baseline approaches
Query efficiency and community quality are improved
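| Symbol | Meaning |
|---|---|
| $G(U,V,E)$ | An attributed bipartite graph with vertex sets $U$ and $V$, and edge set $E$ |
| $W(u)$ | The keyword set of vertex $u$ in $U(G)$ |
| $W(v)$ | The keyword set of vertex $v$ in $V(G)$ |
| $\deg(u)$ | The degree of vertex $u$ in $U(G)$ |
| $\deg(v)$ | The degree of vertex $v$ in $V(G)$ |
| $G_{S_U,S_V}$ | The largest connected subgraph of $G$ s.t. $q \in G_{S_U,S_V}$, $S_U \subseteq W(u)$ for every $u$ in its upper layer, and $S_V \subseteq W(v)$ for every $v$ in its lower layer |
| $G_{\alpha,\beta}(q,G')$ | The largest connected subgraph of $G'$ s.t. $q \in G_{\alpha,\beta}(q,G')$, $\deg(u) \geq \alpha$ for every $u$ in its upper layer, and $\deg(v) \geq \beta$ for every $v$ in its lower layer |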
| ID | Dataset | \|U\| | \|V\| | \|E\| | Avg. degree |
|---|---|---|---|---|---|
| D0 | Enwikibooks(Wikibooks edits) | 79,268 | 249,725 | 766,272 | 4.66 |
| D1 | Movie(Actor movies) | 127,823 | 383,640 | 1,470,404 | 5.75 |
| D2 | IMDB(komarix-imdb) | 685,568 | 186,414 | 2,715,604 | 6.23 |
| D3 | Actor(actor2) | 303,617 | 896,302 | 3,782,463 | 6.30 |
| D4 | Discogs(Discogs) | 1,754,823 | 270,771 | 5,302,276 | 5.24 |
| D5 | Idwiki(edit-idwiki) | 125,481 | 2,183,494 | 6,126,592 | 5.31 |
| D6 | Plwiki(edit-plwiki) | 207,781 | 2,664,432 | 21,219,204 | 14.78 |
| D7 | Nlwiki(edit-nlwiki) | 220,847 | 3,800,349 | 22,142,951 | 11.01 |
Zongyu Xu, Nanjing University of Science and Technology
Yihao Zhang, Nanjing University of Science and Technology
Long Yuan, Nanjing University of Science and Technology
Yuwen Qian, Nanjing University of Science and Technology
Zi Chen, East China Normal University
Mingliang Zhou, Chongqing University
Qin Mao, Qiannan Normal College for Nationalities
Weibin Pan, North Information Control Research Academy Group Co.
Abstract.
Community search over bipartite graphs has attracted significant interest recently. In many applications, such as user-item bipartite graphs in e-commerce and customer-movie bipartite graphs on movie rating websites, nodes tend to have attributes. However, previous community search algorithms on bipartite graphs ignore attributes, so the returned communities may have poor cohesion with respect to their node attributes. In this paper, we study the community search problem on attributed bipartite graphs. Given a query vertex q, we aim to find attributed $(\alpha,\beta)$-communities of q. The structural cohesiveness of such a community is described by an $(\alpha,\beta)$-core model, and the attribute similarity of the two groups of nodes in the subgraph is maximized. To retrieve attributed communities from bipartite graphs, we first propose a basic algorithm composed of two steps: the generation and verification of candidate keyword sets. We then propose two improved query algorithms, Inc and Dec. Inc exploits the anti-monotonicity property of attributed bipartite graphs, while Dec adopts a different generation method and verification order for candidate keyword sets. We evaluate our solutions on eight large graphs, and the experimental results demonstrate that our methods are effective and efficient in querying attributed communities on bipartite graphs.
Community search; Bipartite graphs; Attributed graphs.
1. Introduction
With the proliferation of graph data, research efforts have been devoted to many fundamental problems in managing and analyzing graph data (Yuan et al., 2016, 2017a; Chen et al., 2018, 2021; Huang et al., 2017; Yang et al., 2018; Wu et al., 2019; Chen et al., 2020; Zhang et al., 2021b; Hao et al., 2021; Yang et al., 2021c, a; Yang et al., 2021b). Bipartite graphs are widely used to represent the relationships between two different types of entities in many real-world applications. Examples include user-page networks (Beutel et al., 2013; Qiao et al., 2021), customer-product networks (Wang et al., 2006; Qi et al., 2022), collaboration networks (Ley, 2002; Cai et al., 2022), and gene co-expression networks (Kaytoue et al., 2011; Zhu et al., 2022). Community structure naturally exists in these networks. A number of cohesive subgraph models (e.g., $(\alpha,\beta)$-core (Liu et al., 2019), bitruss (Wang et al., 2020), and biclique (Lyu et al., 2020)) have been proposed to capture communities in bipartite graphs. Building on these models, community search over bipartite graphs aims to find densely connected subgraphs that satisfy specified structural cohesiveness conditions. It has been studied in applications such as anomaly detection (Lyu et al., 2020), personalized recommendation (Kumar et al., 1999), and gene expression analysis (Madeira and Oliveira, 2004).
In the aforementioned real-world applications, the entities modeled by the vertices of bipartite graphs often have properties represented by text strings or keywords. When performing community search over such bipartite graphs, previous studies focus only on the structural cohesiveness of communities and ignore the attributes of the vertices. However, these attributes are important for making sense of communities (Berahmand et al., 2020; Berahmand et al., 2021, 2022). Taking them into consideration also makes the returned results more personalized and easier to interpret (Huang and Lakshmanan, 2017; Fang et al., 2016). Yet there has been little research on community search over attributed bipartite graphs.
Motivated by this, we study the attributed $(\alpha,\beta)$-community search problem on attributed bipartite graphs. Specifically, given an attributed bipartite graph $G$ and a query vertex $q$, we aim to find one or more attributed communities in $G$ that meet two criteria. The first is structure cohesiveness: each vertex in the upper layer has at least $\alpha$ neighbors and each vertex in the lower layer has at least $\beta$ neighbors. The second is keyword cohesiveness: vertices in the same layer share the most keywords.
Applications. Attributed $(\alpha,\beta)$-community search has many real-world applications. For example:
- •
Personalized product recommendation. Attributed $(\alpha,\beta)$-communities can be used to recommend personalized products. Consider a customer-movie subnetwork of IMDB (https://www.imdb.com). The vertices in the upper layer represent customers, and their attributes describe their movie preferences. The vertices in the lower layer represent movies, and their attributes describe their genres. A platform can use the attributed $(\alpha,\beta)$-community model to provide personalized recommendations. For example, as Fig.1 shows, if we regard “u2” as the query customer, we can find a (2,2)-community composed of a group of viewers and movies. In this community, “u2”, who prefers “Drama” and “Romance” movies, may not be interested in “v2”. If we further consider the keyword cohesiveness of this community, we find an attributed (2,2)-community that contains viewers who share a preference for “Drama” movies, together with movies of the genre “Drama”. We can then recommend to the query viewer “u2” the movie “v5”, which “u2” is likely to be interested in.
- •
Team Formation. Consider a bipartite graph composed of developers and projects, where an edge between a developer and a project indicates that the developer participates in the project. The keywords of a developer describe his or her skills, while those of a project indicate the technologies it requires. When a new project needs to be completed, a developer may wish to form a team that is as cohesive as possible, in which every developer has the skills the project requires. This can be supported by an attributed $(\alpha,\beta)$-community search over the bipartite graph that specifies the keywords of the new project.
Although attributed $(\alpha,\beta)$-community search is useful in real applications, it is impractical if the search cannot be finished efficiently. Attributed bipartite graphs can be very large, and the (structure and keyword) cohesiveness criteria can be complex to handle. A simple approach is to first consider all possible attribute combinations and then return the corresponding $(\alpha,\beta)$-communities that have the most shared attributes. However, the number of possible attribute combinations is exponential, which makes this approach infeasible in practice.
To address this problem, we observe that the attributed $(\alpha,\beta)$-community has the anti-monotonicity property. Namely, if a given set of attributes $S'$ appears in every vertex of an attributed $(\alpha,\beta)$-community, then for every subset $S'' \subseteq S'$ there exists an attributed $(\alpha,\beta)$-community in which every vertex contains $S''$. Following this observation, we devise efficient algorithms that significantly reduce the search space when computing the results.
Contributions. In this paper, we make the following contributions.
- •
The first work on attributed $(\alpha,\beta)$-community search over attributed bipartite graphs. In this paper, we propose the attributed $(\alpha,\beta)$-community search problem. To the best of our knowledge, this is the first work on attributed $(\alpha,\beta)$-community search.
- •
Efficient algorithms for attributed $(\alpha,\beta)$-community search. Based on the anti-monotonicity property, we devise efficient algorithms, Inc and Dec, to conduct attributed $(\alpha,\beta)$-community search.
- •
Extensive experiments on real datasets. We conduct extensive experiments to evaluate the performance of the proposed algorithms. The experimental results demonstrate the efficiency of our proposed algorithms.
Outline. The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 describes the proposed problem and definitions. Section 4 describes a basic solution that enumerates all possible keyword sets and searches for $(\alpha,\beta)$-communities with the most shared keywords. Section 5 describes two more efficient algorithms that generate and verify candidate keyword sets in different ways. Section 6 discusses the experimental results of our approaches. Finally, Section 7 concludes the paper.
2. Related Work
2.1. Community search on unipartite graphs.
Community search on unipartite graphs usually relies on cohesiveness models such as k-core (Seidman, 1983), k-truss (Cohen, 2008) and clique (Fang et al., 2019b). For a detailed survey, see (Fang et al., 2020). Based on k-core, two online algorithms and one index-based algorithm have been studied for k-core community search on unipartite graphs. Cui et al. (Cui et al., 2014) propose a local search algorithm, Sozio et al. (Sozio and Gionis, 2010) propose a global search algorithm, and Barbieri et al. (Barbieri et al., 2015) propose a tree-like index structure. Wu et al. (Wu et al., 2021) study maximal personalized influential community search. Using k-core, Fang et al. (Fang et al., 2016, 2017a, 2019a) further integrate vertex attributes to identify communities, and the spatial locations of vertices have also been considered for this purpose (Fang et al., 2017b; Wang et al., 2018; Ji et al., 2021). For truss-based community search, Huang et al. (Huang et al., 2014) propose the triangle-connected k-truss community model and then study the closest truss community model (Huang et al., 2015). Akbas et al. (Akbas and Zhao, 2017) also study the triangle-connected k-truss community model and propose an index-based search algorithm. Acquisti et al. (Acquisti and Gross, 2006) present an efficient k-clique component detection algorithm, and Yuan et al. (Yuan et al., 2017b) study the problem of densest clique percolation community search.
2.2. Community search/detection on bipartite graphs.
On bipartite graphs, several existing works (Ding et al., 2017; He et al., 2021; Liu et al., 2019, 2020) extend the k-core model on unipartite graphs to the $(\alpha,\beta)$-core model. Ding et al. (Ding et al., 2017) extend the linear k-core mining algorithm to compute the $(\alpha,\beta)$-core. He et al. (He et al., 2021) are the first to consider both tie strength and vertex engagement on bipartite graphs, and they propose a novel cohesive subgraph model. Liu et al. (Liu et al., 2019, 2020) present an efficient index-based algorithm that computes the $(\alpha,\beta)$-core in time linear in the result size. Based on the butterfly structure, Sarıyüce et al. (Sarıyüce and Pinar, 2018), Wang et al. (Wang et al., 2019, 2020) and Zou (Zou, 2016) study the bitruss model in bipartite graphs, i.e., the maximal subgraph in which each edge is contained in at least k butterflies. Zhang et al. (Zhang et al., 2014) study the biclique enumeration problem. Zhang et al. (Zhang et al., 2021a) are the first to consider both structure cohesiveness and vertex weights on bipartite graphs, and they propose a novel cohesive subgraph model. Wang et al. (Wang et al., 2021) present a novel index structure and study the significant community search problem on weighted bipartite graphs, which is the first work on community search on bipartite graphs. However, community search on attributed bipartite graphs remains largely unexplored.
3. Problem Definition
Our problem is defined over an undirected attributed bipartite graph $G=(U,V,E)$. Its nodes are divided into two disjoint sets $U$ and $V$, and every edge connects a node in $U$ to a node in $V$. We use $U(G)$ and $V(G)$ to denote the two node sets of $G$ and $E(G)$ to denote its edge set. Each vertex $x$ is associated with a set of keywords denoted by $W(x)$. An edge between two vertices $u \in U$ and $v \in V$ is denoted as $(u,v)$. We denote the numbers of nodes in $U$ and $V$ as $n_U$ and $n_V$, the total number of nodes as $n = n_U + n_V$, and the number of edges as $m$. The set of neighbors of a vertex $x$ in $G$ is denoted as $N(x)$, and the degree of $x$ is denoted as $\deg(x) = |N(x)|$. Table 1 lists the symbols used in the paper.
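Definition 3.1 ($(\alpha,\beta)$-Core).
Given a bipartite graph $G$ and two positive integers $\alpha$ and $\beta$, a subgraph $H$ of $G$ is an $(\alpha,\beta)$-core of $G$ if $\deg_H(u) \geq \alpha$ for each $u \in U(H)$ and $\deg_H(v) \geq \beta$ for each $v \in V(H)$.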
Example 3.2.
In Fig.2(a), is a (2,2)-core. The (1,1)-core has vertices , and is composed of two (1,1)-core components: and . Each $(\alpha,\beta)$-core in Fig.2(a) is listed in Fig.2(b).
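Definition 3.3 ($(\alpha,\beta)$-Connected Component).
Given a bipartite graph $G$ and its $(\alpha,\beta)$-core $C_{\alpha,\beta}$, a subgraph $H$ is an $(\alpha,\beta)$-connected component if (1) $H \subseteq C_{\alpha,\beta}$ and $H$ is connected; and (2) $H$ is maximal.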
Definition 3.4 ($(\alpha,\beta)$-Community).
Given a vertex $q$, we call the $(\alpha,\beta)$-connected component containing $q$ the $(\alpha,\beta)$-community of $q$, denoted as $C_{\alpha,\beta}(q)$.
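The following Python sketch makes Definitions 3.1 to 3.4 concrete. It computes the $(\alpha,\beta)$-community of $q$ with the standard peeling procedure for $(\alpha,\beta)$-cores, followed by a breadth-first search from $q$. It is an illustration under our own assumptions, not the authors' C++ implementation. The graph is held as a dictionary `adj` that maps each vertex to its neighbor set, `upper` is the set of vertices in $U$, and the function name `alpha_beta_community` is hypothetical. Each vertex is removed at most once and each adjacency set is scanned a constant number of times, so the sketch runs in $O(n + m)$ time.

```python
from collections import deque


def alpha_beta_community(adj, upper, q, alpha, beta):
    """Vertex set of the (alpha, beta)-community of q, or an empty set.

    adj   -- dict mapping every vertex to the set of its neighbours
    upper -- set of vertices in the upper layer U (all others are in V)
    """
    if q not in adj:
        return set()
    # Degree threshold: alpha for vertices in U, beta for vertices in V.
    need = {x: alpha if x in upper else beta for x in adj}
    deg = {x: len(adj[x]) for x in adj}
    removed = {x for x in adj if deg[x] < need[x]}
    queue = deque(removed)
    # Peeling: removing a vertex lowers its neighbours' degrees and may cascade.
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in removed:
                deg[y] -= 1
                if deg[y] < need[y]:
                    removed.add(y)
                    queue.append(y)
    if q in removed:
        return set()
    # The survivors form the (alpha, beta)-core; BFS from q inside it
    # yields the (alpha, beta)-connected component that contains q.
    comp, frontier = {q}, deque([q])
    while frontier:
        x = frontier.popleft()
        for y in adj[x]:
            if y not in removed and y not in comp:
                comp.add(y)
                frontier.append(y)
    return comp
```

As a toy usage example, suppose u1 and u2 are both adjacent to v1 and v2, and u1 also has a pendant neighbor v3. With $\alpha = \beta = 2$ and $q$ = u1, the sketch peels v3 and returns {u1, u2, v1, v2}.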
Definition 3.5 (Attributed $(\alpha,\beta)$-Community).
Given an attributed bipartite graph $G$, two positive integers $\alpha$ and $\beta$, a query vertex $q$ and a keyword set $S$ (i.e., $S \subseteq W(q)$), a subgraph $H$ is an attributed $(\alpha,\beta)$-community of $q$ if it satisfies the following constraints:
- (1) Connectivity Constraint. $H$ is a connected subgraph that contains $q$.
- (2) Structure Cohesiveness Constraint. $\forall u \in U(H)$, $\deg_H(u) \geq \alpha$, and $\forall v \in V(H)$, $\deg_H(v) \geq \beta$.
- (3) Keyword Cohesiveness Constraint. $|L_U(H,S)| + |L_V(H)|$ is maximal, where $L_U(H,S)$ is the set of keywords in $S$ shared by all vertices of $U(H)$ and $L_V(H)$ is the set of keywords shared by all vertices of $V(H)$.
- (4) Maximality Constraint. There exists no other subgraph $H' \supsetneq H$ that satisfies the above constraints with $L_U(H',S) = L_U(H,S)$ and $L_V(H') = L_V(H)$.
Example 3.6.
Consider the bipartite graph $G$ in Fig.2(a), and let $q = A$, $\alpha = 2$ and $\beta = 2$. For the query keyword set $S$ of this example, we can find the attributed (2,2)-community illustrated in red in Fig.3, together with its shared keyword sets for the two layers.
Problem Statement. Given an attributed bipartite graph $G$, parameters $\alpha$ and $\beta$, a query vertex $q$ and a keyword set $S$, the attributed $(\alpha,\beta)$-community search problem aims to find the attributed $(\alpha,\beta)$-communities of $q$ in $G$. For ease of presentation, we regard $q$ as a vertex in $U$ throughout this paper. Since the final result must contain $q$, we regard $S$ as the maximum keyword set that can possibly be shared by all vertices in $U(H)$.
4. Basic Solution
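We use $G_{S_U,S_V}$ to denote the largest connected subgraph of $G$ that contains $q$, in which each vertex in $U(G_{S_U,S_V})$ contains $S_U$ and each vertex in $V(G_{S_U,S_V})$ contains $S_V$. We use $G_{\alpha,\beta}(q,G')$ to denote the largest connected subgraph of $G'$ that contains $q$, in which every vertex in the upper layer has degree at least $\alpha$ and every vertex in the lower layer has degree at least $\beta$. We call $(S_U,S_V)$ a qualified keyword set for the query vertex $q$ on the graph $G_{S_U,S_V}$ if $G_{\alpha,\beta}(q,G_{S_U,S_V})$ exists.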
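A minimal sketch of how a keyword pair $(S_U,S_V)$ could be tested for being qualified follows. It applies the keyword constraint first, which yields $G_{S_U,S_V}$ as the connected component of $q$ among the surviving vertices, and then applies the degree constraint by peeling. We assume a hypothetical representation: `adj` maps each vertex to its neighbor set, `upper` is the set of vertices in $U$, and `kw` maps each vertex to its keyword set $W(\cdot)$. The names `qualified_subgraph` and `bfs` are illustrative, not taken from the paper.

```python
from collections import deque


def bfs(adj, source, allowed):
    """Vertices reachable from `source` while visiting only vertices in `allowed`."""
    seen, frontier = {source}, deque([source])
    while frontier:
        x = frontier.popleft()
        for y in adj[x] & allowed:
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen


def qualified_subgraph(adj, upper, kw, q, s_u, s_v, alpha, beta):
    """Vertex set of G_{alpha,beta}(q, G_{S_U,S_V}); empty if (S_U, S_V) is not qualified."""
    s_u, s_v = frozenset(s_u), frozenset(s_v)
    # Keyword constraint: upper vertices must contain S_U, lower vertices S_V.
    keep = {x for x in adj if (s_u if x in upper else s_v) <= kw[x]}
    if q not in keep:
        return set()
    # G_{S_U,S_V}: connected component of q among the vertices that passed the filter.
    g_s = bfs(adj, q, keep)
    # Degree constraint: repeatedly delete vertices below their threshold.
    need = {x: alpha if x in upper else beta for x in g_s}
    deg = {x: len(adj[x] & g_s) for x in g_s}
    alive = {x for x in g_s if deg[x] >= need[x]}
    queue = deque(g_s - alive)
    while queue:
        x = queue.popleft()
        for y in adj[x] & alive:
            deg[y] -= 1
            if deg[y] < need[y]:
                alive.discard(y)
                queue.append(y)
    # Peeling may disconnect the graph, so keep only what q can still reach.
    return bfs(adj, q, alive) if q in alive else set()
```

The keyword filter comes first because it can only shrink the graph, which makes the subsequent peeling cheaper. Either order yields the same vertex set.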
Given a query vertex $q$, a straightforward method to find the attributed $(\alpha,\beta)$-communities in $G$ performs three steps. First, consider the layer of the bipartite graph that contains $q$, which we regard as $U$, and regard the other layer as $V$. All nonempty subsets of $S$, $S_{U1}, S_{U2}, \ldots, S_{U(2^{|S|}-1)}$, are enumerated. In addition, for each $v \in V$ we put all distinct keywords of $W(v)$ into $S_V$, and we enumerate all nonempty subsets of $S_V$ (i.e., $S_{V1}, S_{V2}, \ldots$). Second, for each pair $(S_{Ui}, S_{Vj})$, we verify the existence of $G_{\alpha,\beta}(q, G_{S_{Ui},S_{Vj}})$ and compute it when it exists. Finally, among all these subgraphs we output those having the most shared keywords.
We can summarize the straightforward method as a two-step framework: generation and verification of candidate keyword sets. Fig.4 shows how this two-step framework finds attributed (2,2)-communities for a query vertex in the bipartite graph $G$ of Fig.2(a), with $\alpha = 2$ and $\beta = 2$. The computational complexity of the framework is the same as that of the Basic algorithm analyzed below.
We first give the procedure that, for a given candidate keyword set, verifies the existence of $G_{\alpha,\beta}(q,G')$ in a given subgraph $G'$ of $G$.
Theorem 4.1.
Given a bipartite graph , It takes to compute .
Proof.
There are nodes in , nodes in and we denote the largest degree of these nodes in as . Removing all with degree less than cost , and the while loop in line 4-15 cost . ∎
Based on the straightforward method, we present Algorithm 2, a baseline query algorithm called Basic. The input of Basic is a bipartite graph $G$, a query vertex $q$, two positive integers $\alpha$ and $\beta$, and a keyword set $S$. It first initializes a collection of candidate keyword sets for $U$, each being a nonempty subset of $S$ (line 1). After that, for each vertex $v \in V$, it enumerates all nonempty subsets of $W(v)$ and puts them into a collection of candidate keyword sets for $V$, ensuring that each element appears only once. At the start of the while loop (lines 2–12), it initializes three things (line 3): a variable for the size of the current keyword sets, a variable for the maximal size of all qualified keyword sets found so far, and an empty set for collecting the qualified keyword sets. Then, for each candidate $S_U$ and each candidate $S_V$, it finds $G_{\alpha,\beta}(q, G_{S_U,S_V})$ from $G$ by considering the keyword and degree constraints (lines 4-7). If this subgraph exists, $|S_U| + |S_V|$ is recorded as the current size and compared with the maximal size (lines 10-12). The maximal size is updated accordingly, and the current pair $(S_U, S_V)$ is recorded in the result set. After all candidate keyword sets have been checked, Basic outputs the communities of the keyword sets in the result set, provided it contains at least one qualified keyword set (lines 13-14).
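The exhaustive generate-and-verify loop of Basic can be sketched as follows. To keep the example short, verification is abstracted as a hypothetical callback `verify(s_u, s_v)`. It returns the community (for example, one computed by a peeling routine) or `None`. The function names and the tie handling, which keeps all pairs of maximum total size, are our assumptions rather than details taken from Algorithm 2.

```python
from itertools import chain, combinations


def nonempty_subsets(keywords):
    """All nonempty subsets of a keyword collection, as frozensets."""
    items = sorted(set(keywords))
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(1, len(items) + 1))]


def basic_search(s, lower_keyword_sets, verify):
    """Verify every (S_U, S_V) pair and keep the qualified ones with the most keywords.

    s                  -- query keyword set S (source of candidates for S_U)
    lower_keyword_sets -- keyword sets W(v) of the vertices in V
    verify(s_u, s_v)   -- hypothetical callback: community or None
    """
    cand_u = nonempty_subsets(s)
    # Subsets of every W(v), deduplicated by collecting them in a set.
    cand_v = set()
    for w in lower_keyword_sets:
        cand_v.update(nonempty_subsets(w))
    best_size, best = 0, []
    for s_u in cand_u:
        for s_v in cand_v:
            community = verify(s_u, s_v)
            if community is None:
                continue
            size = len(s_u) + len(s_v)
            if size > best_size:
                best_size, best = size, [(s_u, s_v, community)]
            elif size == best_size:
                best.append((s_u, s_v, community))
    return best
```

The nested loop makes the cost visible. The number of `verify` calls equals the product of the two candidate collection sizes, which grows exponentially with the number of keywords.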
Theorem 4.2.
Given a bipartite graph G, computes in \ G_{\alpha,\beta}$$(q,G^{{}^{\prime}}))).
Proof.
We use to represent the of largest size among all , Initializing and can be completed in and the while loop in line 2-12 costs . ∎
One major drawback of the straightforward method is that it must enumerate an exponential number of keyword subsets and verify the existence of the corresponding subgraph (i.e., $G_{\alpha,\beta}(q, G_{S_U,S_V})$) for each of them. For large keyword sets, the computational overhead makes this method impractical. To alleviate this problem, we study methods to simplify the generation and verification of candidate keyword sets and propose two improved algorithms.
5. Improved Attributed $(\alpha,\beta)$-Community Search Algorithms
In this section, we shrink the range of possible candidate keyword sets and develop two more efficient algorithms. The incremental algorithm (Inc) verifies candidate sets from smaller to larger ones, while the decremental algorithm (Dec) examines larger candidate sets before smaller ones.
5.1. The Incremental Algorithm
Attributed bipartite graphs have the anti-monotonicity property with respect to attributed $(\alpha,\beta)$-community search, as shown in the following lemma:
Lemma 5.1.
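Given a graph $G$, a vertex $q$, and keyword sets $S_U$ and $S_V$, if there exists a subgraph $G_{\alpha,\beta}(q, G_{S_U,S_V})$, then there exists a subgraph $G_{\alpha,\beta}(q, G_{S'_U,S'_V})$ for any subsets $S'_U \subseteq S_U$ and $S'_V \subseteq S_V$.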
Proof.
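By the definition of $G_{S_U,S_V}$, each vertex in $U(G_{S_U,S_V})$ contains $S_U$ and each vertex in $V(G_{S_U,S_V})$ contains $S_V$. Now consider two new keyword sets $S'_U \subseteq S_U$ and $S'_V \subseteq S_V$. Each vertex in $U(G_{S_U,S_V})$ then also contains $S'_U$, and each vertex in $V(G_{S_U,S_V})$ also contains $S'_V$. Also, note that $G_{\alpha,\beta}(q, G_{S_U,S_V}) \subseteq G_{S_U,S_V}$. Together, these two properties show that $G_{\alpha,\beta}(q, G_{S_U,S_V})$ is a connected subgraph of $G$ that contains $q$ and satisfies the following: every vertex in its upper layer has degree at least $\alpha$ and contains $S'_U$, and every vertex in its lower layer has degree at least $\beta$ and contains $S'_V$. It follows that such a subgraph of maximal size, i.e., $G_{\alpha,\beta}(q, G_{S'_U,S'_V})$, exists. ∎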
Lemma 5.2.
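Given two groups of keyword sets $(S_{U1}, S_{V1})$ and $(S_{U2}, S_{V2})$, suppose $G_{\alpha,\beta}(q, G_{S_{U1} \cup S_{U2},\, S_{V1} \cup S_{V2}})$ exists. Then $G_{\alpha,\beta}(q, G_{S_{U1} \cup S_{U2},\, S_{V1} \cup S_{V2}}) \subseteq G_{\alpha,\beta}(q, G_{S_{U1},S_{V1}}) \cap G_{\alpha,\beta}(q, G_{S_{U2},S_{V2}})$.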
Proof.
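By Lemma 5.1, since $G_{\alpha,\beta}(q, G_{S_{U1} \cup S_{U2},\, S_{V1} \cup S_{V2}})$ exists, $G_{\alpha,\beta}(q, G_{S_{U1},S_{V1}})$ exists as well. Moreover, $G_{\alpha,\beta}(q, G_{S_{U1} \cup S_{U2},\, S_{V1} \cup S_{V2}})$ is a connected subgraph containing $q$ in which every upper (lower) vertex contains $S_{U1}$ ($S_{V1}$) and satisfies the degree constraint. By the maximality of $G_{\alpha,\beta}(q, G_{S_{U1},S_{V1}})$, it is therefore contained in $G_{\alpha,\beta}(q, G_{S_{U1},S_{V1}})$. For the same reason, it is contained in $G_{\alpha,\beta}(q, G_{S_{U2},S_{V2}})$. The lemma follows directly. ∎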
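This lemma has a direct consequence for pairs generated by union. If a new candidate pair $(S_{U1} \cup S_{U2}, S_{V1} \cup S_{V2})$ is generated from $(S_{U1},S_{V1})$ and $(S_{U2},S_{V2})$, we can find its community directly from $G_{\alpha,\beta}(q, G_{S_{U1},S_{V1}}) \cap G_{\alpha,\beta}(q, G_{S_{U2},S_{V2}})$. Every upper vertex in this intersection already contains both $S_{U1}$ and $S_{U2}$, and every lower vertex already contains both $S_{V1}$ and $S_{V2}$. Hence, when finding $G_{\alpha,\beta}(q, G_{S_{U1} \cup S_{U2},\, S_{V1} \cup S_{V2}})$, we do not need to check the keyword constraint again. Only the degree and connectivity constraints have to be re-verified.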
In addition, the degree constraint yields a key observation. If $(S_U,S_V)$ is a qualified keyword set, then at least $\alpha$ vertices in $N(q)$ contain $S_V$, because $q \in U$ has at least $\alpha$ neighbors in the community and all of them contain $S_V$. Similarly, at least $\beta$ vertices in $U$ contain $S_U$. This observation implies that we can generate all candidate keyword sets directly from the query vertex and its neighbors, without touching other vertices.
Based on the above lemmas and observation, we introduce the algorithm Inc. Compared with Basic, Inc shrinks the initial candidate keyword sets. It also always verifies the existence of $G_{\alpha,\beta}(q, G_{S_U,S_V})$ within a subgraph of $G$ instead of the entire graph $G$, and this subgraph shrinks as the candidate keyword sets grow. Therefore, a large amount of redundant computation is avoided during verification.
Algorithm 3 presents Inc. First, it initializes a collection of candidate keyword sets for $U$, each consisting of a single keyword of $S$. Then it collects the keywords in $W(v)$ for each neighbor $v \in N(q)$ and initializes a collection of candidate keyword sets for $V$, each consisting of a single collected keyword (line 1). For each candidate keyword set for $V$, it traverses $N(q)$ and records the neighbors containing it (line 2). By the key observation above, at least $\alpha$ neighbors of $q$ must contain $S_V$ and at least $\beta$ vertices must contain $S_U$. Inc therefore removes every candidate keyword set that falls below the corresponding threshold, together with its recorded vertices (line 3). Then it sets the size of the current keyword sets and initializes a set of pairs (line 4). Each pair consists of a keyword set $S_U$ from the candidates for $U$, a keyword set $S_V$ from the candidates for $V$, and an $(\alpha,\beta)$-community of $q$ in which each upper vertex contains $S_U$ and each lower vertex contains $S_V$. For every $S_U$ and $S_V$, it verifies the existence of $G_{\alpha,\beta}(q, G_{S_U,S_V})$ and puts the qualified pairs into the pair set (lines 5-10). In the while loop (lines 11–18), for every two pairs in the set, it finds the community of the merged keyword sets from the subgraph shared by the two pairs' communities, as justified by Lemma 5.2 (lines 12-15). If this community exists, the new pair is put into a new pair set (lines 16-17). The loop stops when the new pair set is empty. Next, Inc looks for the qualified keyword sets that contain the most keywords among all generated pairs. Finally, it outputs the communities of these keyword sets.
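The incremental strategy can be illustrated by a level-wise, Apriori-style sketch in Python. It only approximates Algorithm 3, under these assumptions: each level keeps qualified pairs of equal total size, two pairs are merged only when their union is exactly one keyword larger, and, following Lemma 5.2, the merged pair is verified only inside the intersection of the two parent communities. The representation (`adj`, `upper`, `kw`) and the function names `community` and `inc_search` are hypothetical.

```python
from collections import deque
from itertools import combinations


def community(adj, upper, kw, q, s_u, s_v, alpha, beta, scope):
    """G_{alpha,beta}(q, G_{S_U,S_V}) searched only inside `scope`; empty set if absent."""
    keep = {x for x in scope if (s_u if x in upper else s_v) <= kw[x]}
    need = {x: alpha if x in upper else beta for x in keep}
    deg = {x: len(adj[x] & keep) for x in keep}
    alive = {x for x in keep if deg[x] >= need[x]}
    queue = deque(keep - alive)
    while queue:  # peel vertices that violate the degree constraint
        x = queue.popleft()
        for y in adj[x] & alive:
            deg[y] -= 1
            if deg[y] < need[y]:
                alive.discard(y)
                queue.append(y)
    if q not in alive:
        return set()
    comp, frontier = {q}, deque([q])  # keep only the part connected to q
    while frontier:
        x = frontier.popleft()
        for y in adj[x] & alive:
            if y not in comp:
                comp.add(y)
                frontier.append(y)
    return comp


def inc_search(adj, upper, kw, q, s, alpha, beta):
    """Level-wise sketch of Inc: {(S_U, S_V): community} for the largest qualified pairs."""
    nbrs = adj[q]
    # Observation: a keyword usable in S_V must occur in at least alpha neighbours of q.
    v_words = [w for w in set().union(*(kw[v] for v in nbrs))
               if sum(w in kw[v] for v in nbrs) >= alpha]
    level = {}
    for a in s:
        for b in v_words:
            key = (frozenset([a]), frozenset([b]))
            comp = community(adj, upper, kw, q, key[0], key[1], alpha, beta, set(adj))
            if comp:
                level[key] = comp
    best = level
    while level:
        best, nxt, tried = level, {}, set()
        size = sum(len(part) for part in next(iter(level)))
        for (k1, c1), (k2, c2) in combinations(level.items(), 2):
            key = (k1[0] | k2[0], k1[1] | k2[1])
            if len(key[0]) + len(key[1]) != size + 1 or key in tried:
                continue
            tried.add(key)
            # Lemma 5.2: the merged community lies inside c1 & c2, so search only there.
            comp = community(adj, upper, kw, q, key[0], key[1], alpha, beta, c1 & c2)
            if comp:
                nxt[key] = comp
        level = nxt
    return best
```

Lemma 5.1 guarantees that no qualified pair is missed. Every qualified pair with at least three keywords is the union of two distinct qualified pairs that each have one keyword fewer, so it is generated at the next level.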
Theorem 5.3.
Given a bipartite graph G, computes in .
Proof.
In Algorithm 3, we use to denote the degree of and to represent the of largest size among all , lines 1 can be completed in time. Line 2-3 can be completed in time. Line 5-10 can be completed in time. In while loop, each time it takes time to find qualified communities and put them into a new set , in the worst case, it runs times. ∎
Example 5.4.
Consider the graph $G$ in Fig.2(a) with $\alpha = 2$, $\beta = 2$, and the query vertex and keyword set of this example. Fig.5(a) shows a (2,2)-core of $G$. Following Algorithm 3, we first find the candidate keyword sets and then verify the existence of the corresponding communities, as Fig.5(b) and Fig.5(c) show. In the first iteration of the while loop, we choose two qualified keyword sets and take their union. By Lemma 5.2, we only need to verify the new candidate keyword set on the vertices shared by the two corresponding communities. Fig.5(d) shows the final attributed community.
5.2. The Decremental Algorithm
The decremental algorithm, denoted by Dec, differs from the incremental algorithm in both the generation and the verification of candidate keyword sets.
5.2.1. Generation of candidate keyword sets
Lemma 5.5.
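Let $N(q)$ be the set of neighbors of $q$, and consider the collection of all nonempty subsets of the keyword sets of these neighbors. For each subset $S_V$ in this collection, if fewer than $\alpha$ vertices in $N(q)$ contain $S_V$, then $(S_U, S_V)$ is not a qualified keyword set for any $S_U$, i.e., $G_{\alpha,\beta}(q, G_{S_U,S_V})$ does not exist.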
Proof.
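Assume that $(S_U,S_V)$ is a qualified keyword set. Then $q$ has at least $\alpha$ neighbors in $G_{\alpha,\beta}(q, G_{S_U,S_V})$, and each of them contains $S_V$, so at least $\alpha$ vertices in $N(q)$ contain $S_V$. This contradicts the condition that fewer than $\alpha$ vertices in $N(q)$ contain $S_V$, so Lemma 5.5 is proved. ∎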
We generate the candidate keyword sets for $U$ by enumerating all nonempty subsets of $S$. For each vertex $v \in N(q)$, we enumerate all nonempty subsets of $W(v)$ and put them into a new collection of candidate keyword sets for $V$, whose elements are pairwise distinct. Then, by Lemma 5.5, we update the candidates for $V$ by removing those contained in fewer than $\alpha$ neighbors of $q$.
Example 5.6.
Consider a query vertex Q ($\alpha = 3$) with 5 neighbors in Fig.6(a), where the selected keywords of each vertex are listed in curly braces. For each neighbor of Q, all nonempty subsets of its keyword set are generated, as shown in Fig.6(b). We then keep only the subsets that occur at least three times, which form the candidate keyword sets for $V$.
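The candidate generation of Dec amounts to counting, for each nonempty keyword subset, how many neighbors of $q$ contain it, and then applying the threshold of Lemma 5.5. The sketch below uses a toy input of our own, not the data of Fig.6. The function name `dec_lower_candidates` is hypothetical.

```python
from collections import Counter
from itertools import chain, combinations


def dec_lower_candidates(neighbour_keywords, alpha):
    """Candidate S_V sets: nonempty keyword subsets contained in >= alpha neighbours of q.

    neighbour_keywords -- one keyword set W(v) per neighbour v of q
    alpha              -- degree threshold of q's layer (Lemma 5.5)
    """
    counts = Counter()
    for w in neighbour_keywords:
        items = sorted(set(w))
        # Each neighbour contributes every nonempty subset of its keywords exactly once.
        counts.update(frozenset(c) for c in chain.from_iterable(
            combinations(items, r) for r in range(1, len(items) + 1)))
    return {subset for subset, c in counts.items() if c >= alpha}


# Toy input: five neighbours, alpha = 3.
nbrs = [{"x", "y"}, {"x", "y", "z"}, {"x"}, {"y", "z"}, {"x", "y"}]
print(sorted(sorted(c) for c in dec_lower_candidates(nbrs, 3)))
```

Here {x} and {y} each occur in four neighbors and {x, y} occurs in three, whereas every subset containing z occurs at most twice. The printed candidates are therefore `[['x'], ['x', 'y'], ['y']]`.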
5.2.2. Verification of candidate keyword sets
As the candidates can be obtained directly from $q$ and its neighbors, we can verify them in a decremental manner: larger candidate keyword sets first and smaller ones later. During verification, once attributed $(\alpha,\beta)$-communities are found for candidate keyword sets of a certain size, Dec does not need to verify smaller candidate keyword sets. Therefore, compared with the incremental algorithm, Dec saves the cost of verifying smaller candidate keyword sets and may thus be faster in practice.
Based on the above discussion, we design Dec as shown in Algorithm 4. We first generate the candidate keyword sets for $U$ and $V$ using $q$ and its neighbors, respectively, and record for each candidate the set of vertices containing it (lines 1-2). Next, we update the candidates by removing the vertex sets that violate the structure cohesiveness constraint, together with their keyword sets (line 3). Then we record the maximal size of all candidate keyword sets and initialize an empty result set and a set of candidate pairs, each pair consisting of a keyword set $S_U$ for $U$ and a keyword set $S_V$ for $V$ (line 4). For every $S_U$ and $S_V$, we generate the pair $(S_U, S_V)$ and put it into the candidate pair set (lines 5-8). We sort the candidate pairs in descending order of their number of keywords (line 9). After that, we verify the existence of $G_{\alpha,\beta}(q, G_{S_U,S_V})$ for the pairs in this order. If it exists, we put it into the result set and replace the maximal size with $|S_U| + |S_V|$. As soon as we encounter a remaining pair with fewer keywords than this maximal size, we stop the verification and output the desired communities in the result set.
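The decremental verification order with early termination can be sketched as below. As in the earlier sketches, `verify(s_u, s_v)` is a hypothetical callback that stands in for computing $G_{\alpha,\beta}(q, G_{S_U,S_V})$. The function also returns the number of verifications performed; this is our addition, made only so that the saving is visible.

```python
def dec_search(cand_u, cand_v, verify):
    """Verify (S_U, S_V) pairs from largest to smallest, stopping early.

    cand_u, cand_v   -- candidate keyword sets (frozensets) for U and V
    verify(s_u, s_v) -- hypothetical callback returning the community or None
    Returns (qualified pairs of maximum total size, number of verifications).
    """
    pairs = [(su, sv) for su in cand_u for sv in cand_v]
    # Larger pairs first: the first size with a qualified pair is the answer size.
    pairs.sort(key=lambda p: len(p[0]) + len(p[1]), reverse=True)
    best_size, results, checks = 0, [], 0
    for su, sv in pairs:
        size = len(su) + len(sv)
        if size < best_size:
            break  # every remaining pair is strictly smaller: stop verifying
        checks += 1
        community = verify(su, sv)
        if community is not None:
            best_size = size
            results.append((su, sv, community))
    return results, checks
```

In a toy run with three candidates per layer, nine pairs in total, and an anti-monotone oracle whose largest qualified pair has three keywords, the loop verifies one four-keyword pair and the four three-keyword pairs. It then stops without touching the four two-keyword pairs.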
Theorem 5.7.
Given a bipartite graph G, computes in .
Proof.
In Algorithm 4, we use to represent the degree of , to represent the of largest size among all , we can initialize and in time. Line 2-3 can be completed in time. In line 5-8, set can be generated in time. Then it takes sorting in descending order of the number of elements in . In the worst case, it costs to find all qualified in line 10-18. However, it will be much faster in practice. ∎
6. Experiments
This section presents our experimental results. We evaluate the efficiency of the techniques for retrieving attributed $(\alpha,\beta)$-communities.
6.1. Experimental setting
Algorithms. We implement and compare the following algorithms: 1) Basic, the baseline algorithm proposed in Section 4; 2) an improved algorithm based on Basic; 3) Inc, the incremental attributed $(\alpha,\beta)$-community search algorithm; and 4) Dec, the decremental attributed $(\alpha,\beta)$-community search algorithm. Both Inc and Dec are proposed in Section 5.
Datasets. We evaluate the algorithms on eight real graphs: Enwikibooks, Movie, IMDB, Actor, Discogs, Idwiki, Plwiki and Nlwiki. All the datasets can be found in KONECT (http://konect.cc/networks). For the datasets without attributes, we generate two different kinds of keyword sets for the vertices in the two layers of the bipartite graph. In each experiment, we randomly select 8-13 keywords (10 on average) for each vertex. The summary of the datasets is shown in Table 2, where $|U|$ and $|V|$ are the sizes of the two vertex layers, $|E|$ is the number of edges, and the last column is the average degree of the vertices.
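The last column of the dataset table agrees with the usual definition of the average vertex degree in a bipartite graph, in which every edge contributes to the degree of one vertex in each layer. This formula is our reading of the table rather than something the text states. We checked it against all eight rows. For example, for D0:

```latex
d_{avg} = \frac{2|E|}{|U| + |V|}, \qquad
\text{D0: } \frac{2 \times 766{,}272}{79{,}268 + 249{,}725}
          = \frac{1{,}532{,}544}{328{,}993} \approx 4.66
```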
The algorithms are implemented in C++, and the experiments are run on an Ubuntu machine with two tetradeca-core Intel Xeon E5-2680 v4 processors and 251GB of memory. We set the maximum running time of each test to 3 days. If a test does not finish within this limit, we denote the corresponding processing time as INF. The code is open-sourced at https://github.com/892681347/AttributeBigraph.
6.2. Evaluation of retrieving attributed $(\alpha,\beta)$-communities
Here we evaluate the performance of the algorithms for querying attributed $(\alpha,\beta)$-communities. We set the default values of $\alpha$ and $\beta$ to 3, and the input keyword set $S$ is set to the full set of keywords of the query vertex. For each dataset, we randomly select 300 query vertices whose core numbers are greater than or equal to the core number we set. Each reported value is the average over these 300 queries. For each dataset, we also randomly select four different fractions of its vertices to obtain four induced subgraphs, and four different fractions of its keywords to obtain four keyword sets.
The running time of Basic exceeds 3 days in all experiments, and the running time of the improved algorithm based on Basic is unpredictable on the large graphs (Idwiki, Plwiki and Nlwiki). We therefore record these times as INF and do not describe the results of these two algorithms separately in the corresponding experiments.
Evaluating the effect of query parameters $\alpha$ and $\beta$. We vary $\alpha$ and $\beta$ to assess the performance of these algorithms. In Fig.7(a)-7(h), is fixed and the experimental parameter gradually increases from 2 to 6. We can observe that as keeps increasing, the running time of , and algorithms decreases. When the query parameter is small, only a few vertices and edges are removed from the original graph. When it is large, the resulting $(\alpha,\beta)$-communities are much smaller than the original graph. Thus the size of the subgraph directly impacts the running time of , and algorithms. Clearly, algorithm takes less time than and algorithms in every case. In Fig.8(a)-8(h), we fix and vary to compare the query efficiency. We gradually increase the experimental parameter from 2 to 6, and the results are similar to those obtained when increases. As increases, the running time of , and algorithms decreases. This is also because a higher returns a subgraph with fewer vertices of the original graph, while and algorithms are more sensitive to the number of vertices.
Evaluating the scalability w.r.t. keywords. In this experiment, we evaluate scalability over the fraction of keywords of each vertex. We vary the number of keywords by randomly sampling them from to . As shown in Fig.9(a)-9(h), the running time of , and algorithms increases steadily as the number of keywords grows. This is because more keywords lead to more subgraphs derived from the keywords, as well as more vertices and edges in each subgraph. The running time of and algorithms increases faster than that of algorithm as more keywords are involved, which indicates that performs better and scales well in practice.
Evaluating the scalability w.r.t. vertices. In this experiment, we evaluate scalability over different fractions of vertices. We vary the number of vertices and edges by randomly sampling them from to and use the induced subgraphs as the input graphs. All the keywords of the vertices are considered. Fig.10(a)-10(h) show that, as the number of vertices increases from to , the running time of , and algorithms increases steadily, and the running time of and increases faster than that of . For example, on Imdb, when the number of nodes increases from to , the running time of increases from 0.30s to 0.75s, while that of increases from 3.38s to 29.93s and that of increases from 0.28s to 3.32s. We see that performs better than in most cases, but the opposite may occur in some cases with few vertices. This is because algorithm is more sensitive to the number of vertices than .
Evaluating the effect of . In this experiment, we evaluate the effect of the parameter on the efficiency of the algorithms. For each query vertex, we randomly sample , , , and of its keywords to form the query keyword set . As shown in Fig.11(a)-11(h), as increases, the running time of and grows rapidly, while that of increases only slightly or remains almost unchanged. For example, on Actor, the running time of increases from 1.08s to 1.13s, while that of increases from 2.32s to 14.68s and that of increases from 1.65s to 4.70s. These results show that performs better than and .
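One plausible reason running time can grow quickly with the size of the query keyword set (written S here, a label of ours) is the size of the candidate space. Assume, for illustration, that candidate keyword sets are drawn from the non-empty subsets of S. Then there are 2^|S| − 1 of them, and that number roughly doubles with each added keyword. An algorithm that verifies many candidates is hit hard by this, while one that can stop early is less affected. Both helpers below are hypothetical sketches, not the paper's candidate-generation step.

```python
import random
from itertools import combinations


def sample_query_keywords(vertex_keywords, size, seed=0):
    """Hypothetical: draw `size` keywords of the query vertex to form the query set S."""
    rng = random.Random(seed)
    return set(rng.sample(sorted(vertex_keywords), size))


def candidate_keyword_sets(S):
    """Every non-empty subset of S, largest first (assumed candidate space)."""
    items = sorted(S)
    for k in range(len(items), 0, -1):
        for combo in combinations(items, k):
            yield frozenset(combo)


q_keywords = {"env", "art", "music", "sport", "food", "travel"}
for size in range(1, 6):
    S = sample_query_keywords(q_keywords, size)
    print(size, sum(1 for _ in candidate_keyword_sets(S)))  # 2**size - 1 candidates
```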
Case study. We conduct queries on the real dataset Southern women (small) from KONECT (http://konect.cc/networks/). In this graph, each vertex in represents a woman, each vertex in represents a social activity, and each edge indicates that the woman participates in the social activity.
We use as the query vertex, set and both to 2, and let contain the keyword “environmental”. The query result is the circled part of Fig.12, containing women and activities. The result shows that the returned women and are active participants in environmental activities, and the social activities and are all environmental activities with multiple participants from U. If an environmental social activity needs to recruit team members, and can therefore be given priority: they not only prefer environmental social activities but also have experience working together. If we instead search for a (2,2)-community without considering keywords, the result is the whole set of women and activities in Fig.12, which includes women who rarely participate in environmental activities. These candidates are clearly not the team members an environmental activity would expect, because only the structure cohesiveness constraint is considered and the keyword cohesiveness constraint is ignored.
7. Conclusion
In this paper, we study the attributed -community search problem. To solve it efficiently, we follow a two-step framework that first generates candidate keyword sets and then, for each candidate keyword set, verifies whether an attributed -community exists. We then develop a basic query algorithm and two improved query algorithms, which retrieve the -community by verifying the candidate keyword sets in different orders. We conduct extensive experiments on real-world graphs, and the results demonstrate the effectiveness of the attributed -community model and the proposed techniques.
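A compact sketch of the two-step framework described above, under a simple interpretation we assume for illustration. Candidate keyword sets are the non-empty subsets of the query keyword set S, examined from largest to smallest. A candidate is verified by keeping only the vertices that contain all of its keywords and checking whether the query vertex q (assumed to lie in U) still belongs to a connected (α,β)-core. The first candidate that verifies is returned. The function names, data layout and toy data are hypothetical. The verification order is what the paper's basic and improved algorithms differ in; this sketch fixes a single order.

```python
from itertools import combinations


def core_component(edges, q, alpha, beta):
    """Connected (alpha, beta)-core containing U-vertex q, as a set of tagged
    vertices ("U", id) / ("V", id), or None. Edges are (u, v) with u in U, v in V.
    Assumed model: U-vertices keep >= alpha neighbours, V-vertices >= beta."""
    edges = set(edges)
    while True:  # peel until no vertex violates its threshold
        deg_u, deg_v = {}, {}
        for u, v in edges:
            deg_u[u] = deg_u.get(u, 0) + 1
            deg_v[v] = deg_v.get(v, 0) + 1
        kept = {(u, v) for u, v in edges if deg_u[u] >= alpha and deg_v[v] >= beta}
        if kept == edges:
            break
        edges = kept
    adj = {}
    for u, v in edges:
        adj.setdefault(("U", u), set()).add(("V", v))
        adj.setdefault(("V", v), set()).add(("U", u))
    start = ("U", q)
    if start not in adj:
        return None
    seen, stack = {start}, [start]
    while stack:  # depth-first search from q over the surviving edges
        for y in adj[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def attributed_community(edges, keywords, q, S, alpha, beta):
    """Hypothetical two-step search: enumerate candidate keyword sets, then verify."""
    items = sorted(S)
    for k in range(len(items), 0, -1):  # step 1: candidates, largest first
        for combo in combinations(items, k):
            cand = set(combo)
            # step 2: keep edges whose endpoints both contain every candidate keyword
            sub = [(u, v) for u, v in edges
                   if cand <= keywords[("U", u)] and cand <= keywords[("V", v)]]
            community = core_component(sub, q, alpha, beta)
            if community is not None:
                return cand, community
    return None


# Hypothetical toy data: women w1..w4 (U) and activities a1..a3 (V).
edges = [("w1", "a1"), ("w1", "a2"), ("w2", "a1"), ("w2", "a2"),
         ("w3", "a2"), ("w3", "a3"), ("w4", "a3")]
keywords = {("U", "w1"): {"env", "art"}, ("U", "w2"): {"env"},
            ("U", "w3"): {"env", "art"}, ("U", "w4"): {"art"},
            ("V", "a1"): {"env"}, ("V", "a2"): {"env", "art"}, ("V", "a3"): {"art"}}
print(attributed_community(edges, keywords, "w1", {"env", "art"}, 2, 2))
```

On this toy graph with α = β = 2, the candidates {art, env} and {art} fail verification, and {env} returns the community w1, w2, a1, a2.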
