Triangular clustering in document networks
Xue-qi Cheng, Fu-xin Ren, Shi Zhou, Mao-Bin Hu

TL;DR
This paper investigates the high clustering in document networks, revealing a strong link between content similarity and triangle formation, and introduces a model that reproduces these properties.
Contribution
It proposes the DSP model that captures the influence of content similarity on network clustering, advancing understanding of document network structure and evolution.
Findings
Document networks have a large number of triangles and a high clustering coefficient
Strong correlation between triangle formation and content similarity
DSP model effectively reproduces observed network properties
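The two quantities named in these findings can be computed directly from an adjacency structure. The sketch below is illustrative only: it assumes an undirected graph stored as a dictionary of neighbour sets, and it uses the common average-of-local-coefficients definition of the clustering coefficient, which may differ from the exact definition used in the paper. The function names and the toy graph are hypothetical.

```python
from itertools import combinations

def count_triangles(adj):
    """Count triangles in an undirected graph given as {node: set(neighbours)}."""
    seen = 0
    for u, nbrs in adj.items():
        # Every pair of u's neighbours that is itself linked closes a triangle at u.
        for v, w in combinations(nbrs, 2):
            if w in adj[v]:
                seen += 1
    return seen // 3  # each triangle is counted once from each of its three corners

def average_clustering(adj):
    """Mean local clustering coefficient (assumed definition, see lead-in)."""
    coeffs = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # no neighbour pairs, so no possible triangle
            continue
        links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs) if coeffs else 0.0

# Toy graph: one triangle a-b-c plus a pendant node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
triangles = count_triangles(toy)
clustering = average_clustering(toy)
```

On the toy graph, nodes a and b have local coefficient 1, node c has 1/3, and node d has 0, so the average is 7/12.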
Abstract
Document networks are distinctive in that each node, e.g. a webpage or an article, carries meaningful content. Their properties are affected not only by the topological connectivity between nodes but also, strongly, by the semantic relation between the nodes' content. We observe that document networks contain a large number of triangles and have a high clustering coefficient, and that the probability of forming a triangle is strongly correlated with the content similarity among the three nodes involved. We propose the degree-similarity product (DSP) model, which reproduces these properties well. The model achieves this with a preferential attachment mechanism that favours links between nodes that are both popular and similar. This work is a step forward towards a better understanding of the structure and evolution of document…
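The attachment idea described in the abstract can be illustrated with a small growth simulation. This is a minimal sketch under stated assumptions, not the authors' DSP model: the abstract only says that links favour nodes that are both popular and similar, so the weight used here (degree multiplied by cosine similarity), the random content vectors, the seed clique, and the parameters `n`, `m`, and `dim` are all hypothetical choices made for illustration.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def grow_dsp_like(n, m=2, dim=5, seed=0):
    """Grow an undirected graph where each new node links to m existing nodes.

    ASSUMPTION: attachment weight = degree(target) * similarity(new, target).
    The paper's actual functional form is not given in this summary.
    """
    rng = random.Random(seed)
    # Hypothetical stand-in for document content: random non-negative vectors.
    content = [[rng.random() for _ in range(dim)] for _ in range(n)]

    # Start from a fully connected clique of m + 1 nodes.
    adj = {i: set() for i in range(m + 1)}
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)

    for new in range(m + 1, n):
        candidates = list(adj)
        weights = [len(adj[c]) * cosine(content[new], content[c]) for c in candidates]
        # Draw until m distinct targets are chosen (sampling without replacement).
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(candidates, weights=weights, k=1)[0])
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
    return adj, content

graph, vectors = grow_dsp_like(n=50, m=2)
edge_count = sum(len(nbrs) for nbrs in graph.values()) // 2
```

Setting every similarity to 1 would reduce the weight to plain degree-based preferential attachment, which makes it easy to compare the two mechanisms on the same seed.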
