Scalable Text and Link Analysis with Mixed-Topic Link Models

Yaojia Zhu; Xiaoran Yan; Lise Getoor; Cristopher Moore

arXiv:1303.7264 · cs.LG · October 30, 2014

TL;DR

This paper introduces a scalable mixed-topic link model that combines topic modeling with community detection, enabling efficient analysis of large text-link datasets for classification and prediction tasks.

Contribution

The paper presents a scalable model that integrates topic and link analysis, fits it with an expectation-maximization (EM) algorithm, and outperforms existing methods on large datasets.

Findings

01

Outperforms state-of-the-art methods in link prediction and topic classification.

02

Achieves higher accuracy than existing methods with significantly less computation.

03

Successfully analyzes a dataset with 1.3 million words and 44,000 links in minutes.

Abstract

Many data sets contain rich information about objects, as well as pairwise relations between them. For instance, in networks of websites, scientific papers, and other documents, each node has content consisting of a collection of words, as well as hyperlinks or citations to other nodes. In order to perform inference on such data sets, and make predictions and recommendations, it is useful to have models that are able to capture the processes which generate the text at each node and the links between them. In this paper, we combine classic ideas in topic modeling with a variant of the mixed-membership block model recently developed in the statistical physics community. The resulting model has the advantage that its parameters, including the mixture of topics of each document and the resulting overlapping communities, can be inferred with a simple and scalable expectation-maximization…
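To make the idea of fitting topics and links jointly more concrete, here is a minimal sketch of an EM-style loop, not the authors' implementation. It assumes a pLSA-style word model, P(w | d) = sum_z theta[d, z] * beta[z, w], and a Poisson link model with rate sum_z eta[z] * theta[d, z] * theta[e, z] between documents d and e. The function name `fit_mixed_topic_links`, the parameter names, and the dense-array layout are all hypothetical. The theta update also drops the Poisson normalization term so that it has a closed form, which is a simplification assumed here for illustration; as a result the likelihood is not guaranteed to increase at every step.

```python
import numpy as np

def fit_mixed_topic_links(C, A, K, n_iter=50, seed=0):
    """Toy joint EM-style fit (illustrative sketch only).

    C: (D, W) array of word counts per document.
    A: (D, D) symmetric adjacency matrix with a zero diagonal.
    K: number of topics / communities.
    Returns theta (D, K), beta (K, W), eta (K,).
    """
    rng = np.random.default_rng(seed)
    D, W = C.shape
    theta = rng.dirichlet(np.ones(K), size=D)  # per-document topic mixtures
    beta = rng.dirichlet(np.ones(W), size=K)   # per-topic word distributions
    eta = np.ones(K)                           # per-topic link densities
    eps = 1e-12

    for _ in range(n_iter):
        # E-step (words): h[d, w, z] is proportional to theta[d, z] * beta[z, w]
        h = theta[:, None, :] * beta.T[None, :, :]
        h /= h.sum(axis=2, keepdims=True) + eps

        # E-step (links): q[d, e, z] is proportional to eta[z] * theta[d, z] * theta[e, z]
        q = eta[None, None, :] * theta[:, None, :] * theta[None, :, :]
        q /= q.sum(axis=2, keepdims=True) + eps

        word_counts = C[:, :, None] * h  # expected word-topic counts, (D, W, K)
        link_counts = A[:, :, None] * q  # expected link-topic counts, (D, D, K)

        # M-step: beta from expected word counts, normalized per topic
        beta = word_counts.sum(axis=0).T + eps
        beta /= beta.sum(axis=1, keepdims=True)

        # M-step (simplified): theta pools word and link evidence per document
        theta = word_counts.sum(axis=1) + link_counts.sum(axis=1) + eps
        theta /= theta.sum(axis=1, keepdims=True)

        # M-step: eta = expected links per topic / pair mass over unordered pairs
        pair_mass = (theta.sum(axis=0) ** 2 - (theta ** 2).sum(axis=0)) / 2.0
        eta = (link_counts.sum(axis=(0, 1)) / 2.0) / (pair_mass + eps)

    return theta, beta, eta
```

The dense (D, W, K) and (D, D, K) arrays keep the sketch short but only suit toy inputs. A version meant for large corpora would presumably loop over nonzero word counts and existing links only, so that each iteration costs time proportional to the size of the data.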
