Probing the statistical properties of enriched co-occurrence networks
Diego R. Amancio, Jeaneth Machicao, Laura V. C. Quispe

TL;DR
This paper investigates how adding semantic edges to co-occurrence networks affects their statistical properties and their ability to distinguish meaningful texts, providing insights into metric sensitivity and network enrichment effects.
Contribution
The paper analyzes in detail how virtual edges affect network metrics, showing that the effect varies across statistical properties and applications.
Findings
The informativeness of the average shortest path and closeness centrality improves with virtual edges in short texts.
Clustering coefficient's informativeness decreases as more virtual edges are added.
Including stopwords influences the statistical properties of enriched networks.
Abstract
Recent studies have explored the addition of virtual edges to word co-occurrence networks using word embeddings to enhance graph representations, particularly for short texts. While these enriched networks have demonstrated some success, the impact of incorporating semantic edges into traditional co-occurrence networks remains uncertain. This study investigates two key statistical properties of text-based network models. First, we assess whether network metrics can effectively distinguish between meaningless and meaningful texts. Second, we analyze whether these metrics are more sensitive to syntactic or semantic aspects of the text. Our results show that incorporating virtual edges can have positive and negative effects, depending on the specific network metric. For instance, the informativeness of the average shortest path and closeness centrality improves in short texts, while the…
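As a rough illustration of the enrichment idea described in the abstract, the sketch below builds a word co-occurrence graph, adds virtual edges between words whose embeddings are similar, and compares two of the metrics mentioned. It is not the authors' implementation: the function names, the adjacency window, the cosine threshold, and the two-dimensional toy embeddings are all assumptions made for this example.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def cooccurrence_graph(tokens, window=1):
    """Undirected graph linking words that occur within `window` positions."""
    adj = {t: set() for t in tokens}
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if v != w:
                adj[w].add(v)
                adj[v].add(w)
    return adj


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def add_virtual_edges(adj, embeddings, threshold=0.9):
    """Return a copy of `adj` with extra edges between similar words."""
    enriched = {w: set(nbrs) for w, nbrs in adj.items()}
    for u, v in combinations(sorted(adj), 2):
        if u in embeddings and v in embeddings:
            if cosine(embeddings[u], embeddings[v]) >= threshold:
                enriched[u].add(v)
                enriched[v].add(u)
    return enriched


def average_shortest_path(adj):
    """Mean BFS distance over all ordered pairs of reachable nodes."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def average_clustering(adj):
    """Mean local clustering coefficient (0 for nodes of degree < 2)."""
    coeffs = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs) if coeffs else 0.0


tokens = "the cat chased the mouse and the dog chased the cat".split()
# Toy 2-d vectors invented for illustration; real embeddings would come
# from a trained model.
toy_embeddings = {
    "cat": (1.0, 0.1), "dog": (0.9, 0.2), "mouse": (0.8, 0.3),
    "chased": (0.1, 1.0), "the": (0.5, 0.5), "and": (0.4, 0.6),
}
base = cooccurrence_graph(tokens)
enriched = add_virtual_edges(base, toy_embeddings, threshold=0.98)

for name, graph in (("base", base), ("enriched", enriched)):
    print(name, average_shortest_path(graph), average_clustering(graph))
```

In this toy sentence the virtual edges shorten paths between semantically related words that never appear side by side. Whether a metric becomes more or less informative after enrichment is the empirical question the paper studies, and this sketch makes no claim about that.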
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research · Bayesian Modeling and Causal Inference
