Scalable Models for Computing Hierarchies in Information Networks

Baoxu Shi; Tim Weninger

arXiv:1601.00626 · cs.AI · January 5, 2016

TL;DR

This paper introduces HDTM, a scalable Bayesian model that automatically infers meaningful hierarchies from large information networks by taking their linkages into account, and that outperforms previous methods in accuracy and scalability.

Contribution

HDTM is a novel distributed Bayesian model that incorporates network linkages to generate hierarchical structures at Web-scale.

Findings

1. HDTM accurately infers hierarchies on large datasets.
2. It outperforms existing models in scalability and accuracy.
3. It is effective on both Wikipedia and medium-sized datasets.

Abstract

Information hierarchies are organizational structures that are often used to organize and present large and complex information, as well as to provide a mechanism for effective human navigation. Fortunately, many statistical and computational models exist that automatically generate hierarchies; however, existing approaches do not consider the linkages in information networks, which are increasingly common in real-world scenarios. Current approaches also tend to present topics as an abstract probability distribution over words, etc., rather than as tangible nodes from the original network. Furthermore, the statistical techniques in many previous works are not yet capable of processing data at Web scale. In this paper we present the Hierarchical Document Topic Model (HDTM), which uses a distributed vertex-programming process to calculate a nonparametric Bayesian generative model.…
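The summary only says that HDTM's inference runs as a distributed vertex program; it gives no further detail. As a rough, hypothetical illustration of the vertex-centric (Pregel-style) execution model alone, the sketch below runs synchronous supersteps in which each vertex reads its incoming messages, updates only its own state, and sends messages to its neighbors, halting when no messages remain. It does not implement HDTM's Bayesian model: the function name `run_vertex_program` and the toy computation (a breadth-first tree rooted at one document, used as a stand-in "hierarchy") are assumptions chosen for illustration.

```python
from collections import defaultdict


def run_vertex_program(edges, root, max_supersteps=100):
    """Hypothetical toy vertex program, NOT HDTM's inference.

    Each vertex keeps (depth, parent). In every synchronous superstep a vertex
    reads its inbox, keeps the smallest proposed depth (ties broken by parent
    name), and, if its state improved, sends depth + 1 to its out-neighbors.
    The final states describe a breadth-first tree rooted at `root`.
    """
    out_nbrs = defaultdict(list)
    vertices = {root}
    for u, v in edges:
        out_nbrs[u].append(v)
        vertices.update((u, v))

    # Every vertex starts unreached: infinite depth, no parent.
    state = {v: (float("inf"), None) for v in vertices}
    # Initial message: the root is at depth 0 and has no parent.
    inbox = {root: [(0, None)]}

    for _ in range(max_supersteps):
        if not inbox:  # no messages in flight: all vertices have halted
            break
        outbox = defaultdict(list)
        for v, msgs in inbox.items():
            best_depth, best_parent = min(msgs, key=lambda m: (m[0], str(m[1])))
            if best_depth < state[v][0]:
                state[v] = (best_depth, best_parent)
                for w in out_nbrs[v]:
                    outbox[w].append((best_depth + 1, v))
        inbox = outbox  # messages are delivered at the next superstep
    return state


if __name__ == "__main__":
    links = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
    for vertex, (depth, parent) in sorted(run_vertex_program(links, "A").items()):
        print(vertex, depth, parent)
```

Because each vertex touches only its own state and communicates through messages, the per-superstep loop could in principle be partitioned across machines. That property is what makes vertex programming attractive at Web scale; how HDTM actually exploits it is described in the paper itself.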

Code & Models

Repositories

nddsg/HDTM (official)

Taxonomy

Topics: Complex Network Analysis Techniques · Web Data Mining and Analysis · Text and Document Classification Technologies