# From Community Detection to Community Profiling

**Authors:** Hongyun Cai, Vincent W. Zheng, Fanwei Zhu, Kevin Chen-Chuan Chang, and Zi Huang

arXiv: 1701.04528 · 2017-01-18

## TL;DR

This paper formalizes community profiling, which characterizes each community by an internal content profile and an external diffusion profile. It addresses three challenges with a joint profiling-and-detection model whose inference scales linearly with data size, and reports significantly better results than state-of-the-art baselines on large-scale real-world data sets.

## Contribution

It formalizes the concept of community profiling, proposes a joint model for profiling and detection, and develops a scalable inference algorithm for large-scale data.

## Key findings

- The joint Community Profiling and Detection (CPD) model significantly outperforms state-of-the-art baselines in various tasks.
- The inference algorithm scales linearly with data size and is parallelizable.
- Community profiles effectively capture both internal content and external diffusion characteristics.

## Abstract

Most existing community-related studies focus on detection, which aims to find the community membership of each user from user friendship links. However, membership alone, without a complete profile of what a community is and how it interacts with other communities, has limited applications. This motivates us to profile communities systematically and thereby develop useful community-level applications. In this paper, we formalize the concept of community profiling for the first time. With rich user information on the network, such as user-published content and user diffusion links, we characterize a community in terms of both its internal content profile and its external diffusion profile. The difficulty of community profiling is often underestimated. We identify three unique challenges and propose a joint Community Profiling and Detection (CPD) model to address them. We also contribute a scalable inference algorithm that scales linearly with the data size and is easily parallelizable. We evaluate CPD on large-scale real-world data sets and show that it is significantly better than the state-of-the-art baselines in various tasks.
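
To make the two profile types concrete, here is a minimal Python sketch that builds a content profile and a diffusion profile for each community by simple counting. It is not the CPD model: it assumes community memberships are already known, whereas the paper infers membership and profiles jointly. The function name `profile_communities`, the input shapes, and the decision to keep within-community diffusion links are all illustrative assumptions rather than details from the paper.

```python
from collections import Counter, defaultdict


def profile_communities(membership, user_topics, diffusion_links):
    """Aggregate per-community profiles by counting (illustrative only).

    membership:      dict mapping user -> community id (assumed given)
    user_topics:     dict mapping user -> list of topic labels for their content
    diffusion_links: iterable of (source_user, target_user) pairs
    """
    # Content profile: how often each topic appears in a community's content.
    content_counts = defaultdict(Counter)
    for user, topics in user_topics.items():
        content_counts[membership[user]].update(topics)

    # Diffusion profile: how often a community's users diffuse to each community.
    # Assumption: within-community links are counted as well.
    diffusion_counts = defaultdict(Counter)
    for source, target in diffusion_links:
        diffusion_counts[membership[source]][membership[target]] += 1

    def normalize(counter):
        # Turn raw counts into a probability distribution.
        total = sum(counter.values())
        return {key: value / total for key, value in counter.items()}

    content_profile = {c: normalize(cnt) for c, cnt in content_counts.items()}
    diffusion_profile = {c: normalize(cnt) for c, cnt in diffusion_counts.items()}
    return content_profile, diffusion_profile
```

Calling `profile_communities(membership, user_topics, diffusion_links)` returns two dictionaries keyed by community id. Each value is a distribution, over topics for the content profile and over target communities for the diffusion profile.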

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.04528/full.md

## Figures

49 figures with captions in the complete paper: https://tomesphere.com/paper/1701.04528/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/1701.04528/full.md

---
Source: https://tomesphere.com/paper/1701.04528