Scalable Models for Computing Hierarchies in Information Networks
Baoxu Shi, Tim Weninger

TL;DR
This paper introduces HDTM, a scalable Bayesian model that automatically infers hierarchies from large information networks by taking their link structure into account, and reports that it outperforms previous methods in both accuracy and scalability.
Contribution
HDTM is a novel distributed Bayesian model that incorporates network linkages to generate hierarchical structures at Web-scale.
Findings
HDTM accurately infers hierarchies on large datasets.
It outperforms existing models in scalability and accuracy.
Effective on Wikipedia and medium-sized datasets.
Abstract
Information hierarchies are organizational structures that are often used to organize and present large and complex information, as well as to provide a mechanism for effective human navigation. Fortunately, many statistical and computational models exist that automatically generate hierarchies; however, the existing approaches do not consider linkages in information networks, which are increasingly common in real-world scenarios. Current approaches also tend to present topics as an abstract probability distribution over words rather than as tangible nodes from the original network. Furthermore, the statistical techniques present in many previous works are not yet capable of processing data at Web-scale. In this paper we present the Hierarchical Document Topic Model (HDTM), which uses a distributed vertex-programming process to calculate a nonparametric Bayesian generative model.…
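The abstract says HDTM is computed with a distributed vertex-programming process but gives no further detail here. As a loose illustration of the vertex-centric pattern only (this is not HDTM's Bayesian inference), the sketch below simulates synchronous supersteps on a single machine: each vertex that received messages reads them, updates its own state, and messages its linked neighbors, and the computation halts when no messages remain. The toy task (giving every reachable node a parent in a tree rooted at a chosen node) and the function name `vertex_program_hierarchy` are hypothetical.

```python
from collections import defaultdict


def vertex_program_hierarchy(links, root):
    """Toy vertex-centric (Pregel-style) computation, NOT the HDTM model.

    Builds a breadth-first tree rooted at `root` by message passing.
    links: dict mapping each node to an iterable of linked (outgoing) nodes.
    Returns a dict node -> parent; the root maps to None and
    unreachable nodes are omitted.
    """
    parent = {root: None}
    depth = {root: 0}
    # Messages for the next superstep: target -> list of (depth, sender).
    inbox = defaultdict(list)
    for nbr in links.get(root, ()):
        inbox[nbr].append((1, root))

    while inbox:  # halt once no vertex has received a message
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            # Each active vertex looks only at its own messages; ties at the
            # same depth are broken by the smallest sender id.
            best_depth, best_sender = min(messages)
            if vertex in depth and depth[vertex] <= best_depth:
                continue  # already placed at an equal or shallower level
            depth[vertex] = best_depth
            parent[vertex] = best_sender
            for nbr in links.get(vertex, ()):
                outbox[nbr].append((best_depth + 1, vertex))
        # Barrier: messages sent now are only visible in the next superstep.
        inbox = outbox
    return parent


links = {"root": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": [], "d": ["a"], "e": ["a"]}
print(vertex_program_hierarchy(links, "root"))
# {'root': None, 'a': 'root', 'b': 'root', 'c': 'a', 'd': 'b'}
```

In a real distributed system the vertices would be partitioned across workers and the barrier would be a cluster-wide synchronization step; the single-process loop here only mimics that control flow.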
Taxonomy
Topics: Complex Network Analysis Techniques · Web Data Mining and Analysis · Text and Document Classification Technologies
