Efficient Community Detection in Large Networks using Content and Links
Yiye Ruan, David Fuhry, Srinivasan Parthasarathy

TL;DR
This paper presents a simple, efficient method for community detection in large networks by combining content similarity with link analysis, effectively reducing noise and improving clustering performance.
Contribution
The paper introduces a fusion of content and link information, together with a biased sampling technique, to improve community detection in large-scale networks.
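This page does not spell out the sampling rule, so what follows is only a minimal Python sketch of one common form of biased edge sampling, under stated assumptions. Every edge is assumed to already carry a similarity score (for example, a fused content-and-link score), and each node keeps its ceil(d ** e) highest-scoring incident edges, where d is the node's degree. The function `biased_sample`, the `exponent` parameter and the toy graph are hypothetical illustrations, not the authors' code.

```python
import math
from collections import defaultdict

def biased_sample(scored_edges, exponent=0.5):
    """Hypothetical biased edge sampler (an assumption, not the paper's code).

    scored_edges: dict mapping an undirected edge (u, v), listed once, to a
    similarity score. Each node keeps its ceil(degree ** exponent)
    highest-scoring incident edges; an edge survives if either endpoint keeps it.
    Returns the kept edges as (min, max) tuples.
    """
    # Collect each node's incident edges together with their scores.
    incident = defaultdict(list)
    for (u, v), score in scored_edges.items():
        incident[u].append((score, v))
        incident[v].append((score, u))

    kept = set()
    for node, nbrs in incident.items():
        # Edge budget grows sublinearly with degree when exponent < 1.
        k = math.ceil(len(nbrs) ** exponent)
        # Highest score first; ties broken by neighbor id for determinism.
        nbrs.sort(key=lambda pair: (-pair[0], pair[1]))
        for _, other in nbrs[:k]:
            kept.add((min(node, other), max(node, other)))
    return kept

# Toy graph (invented data): two strong triangles joined by three weak edges.
edges = {
    (0, 1): 0.9, (0, 2): 0.9, (1, 2): 0.9,
    (3, 4): 0.9, (3, 5): 0.9, (4, 5): 0.9,
    (2, 3): 0.1, (0, 3): 0.1, (1, 4): 0.1,
}
print(sorted(biased_sample(edges)))
# [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
```

Because every node keeps at least one edge while high-degree nodes give up proportionally more, weak cross-group edges tend to be dropped and the graph handed to a clustering algorithm becomes sparser. With `exponent=1.0` nothing is removed.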
Findings
Effective in noisy real-world networks like Flickr and Wikipedia
Significantly faster than existing methods, with comparable or better accuracy
Consistently improves community detection quality by integrating content and link data
Abstract
In this paper we discuss a very simple approach to combining content and link information in graph structures for the purpose of community discovery, a fundamental task in network analysis. Our approach hinges on the basic intuition that many networks contain noise in the link structure and that content information can help strengthen the community signal. This enables one to eliminate the impact of noise (false positives and false negatives), which is particularly prevalent in online social networks and Web-scale information networks. Specifically, we introduce a measure of signal strength between two nodes in the network by fusing their link strength with content similarity. Link strength is estimated based on whether the link is likely (with high probability) to reside within a community. Content similarity is estimated through cosine similarity or the Jaccard coefficient. We discuss a…
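To make the fused signal strength concrete, here is a minimal Python sketch. Two parts of it are assumptions rather than details given in the abstract: the fused score is taken to be a linear mix `alpha * link + (1 - alpha) * content`, and link strength is approximated by the Jaccard overlap of the two endpoints' closed neighborhoods (an edge whose endpoints share many neighbors is more likely to sit inside a community). Content similarity uses either cosine similarity on sparse term-weight vectors or the Jaccard coefficient on term sets, the two options the abstract names. All function and parameter names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse term-weight vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def signal_strength(u, v, adj, content, alpha=0.5, use_jaccard=False):
    """Hypothetical fused score: alpha * link strength + (1 - alpha) * content similarity.

    adj: node -> set of neighbors; content: node -> {term: weight}.
    ASSUMPTION: link strength is the Jaccard overlap of closed neighborhoods,
    used as a stand-in for "likely to reside within a community".
    """
    link = jaccard(adj[u] | {u}, adj[v] | {v})
    if use_jaccard:
        # Jaccard coefficient on the sets of terms each node mentions.
        text = jaccard(set(content[u]), set(content[v]))
    else:
        text = cosine_similarity(content[u], content[v])
    return alpha * link + (1 - alpha) * text

# Toy data (invented): nodes 0-2 form a tight group; node 3 hangs off node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
content = {
    0: {"graph": 1.0, "cluster": 1.0},
    1: {"graph": 1.0, "cluster": 1.0},
    2: {"graph": 1.0},
    3: {"cooking": 1.0},
}
print(signal_strength(2, 3, adj, content))  # 0.25
```

Here the edge (2, 3) gets a low score because its endpoints share few neighbors and no content terms, which is the kind of noisy link such a fused measure is meant to down-weight.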
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Graph Neural Networks · Peer-to-Peer Network Technologies
