# Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity

**Authors:** Hosein Azarbonyad, Mostafa Dehghani, Tom Kenter, Maarten Marx, Jaap Kamps, and Maarten de Rijke

arXiv: 1701.04273 · 2017-01-17

## TL;DR

This paper introduces a hierarchical re-estimation method for topic models to improve the measurement of topical diversity in documents, addressing issues of generality and impurity in standard models.

## Contribution

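The paper proposes a hierarchical re-estimation approach for topic models that operates at three levels (words, topics, and documents) to counter generality and impurity, and it measures documents' topical diversity more accurately than the prior state of the art on the PubMed dataset.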

## Key findings

- Outperforms the state of the art on the PubMed dataset
- Reduces impurity and generality in topic models
- Improves interpretability of topics

## Abstract

A high degree of topical diversity is often considered to be an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three elements for assessing diversity: words, topics, and documents as collections of words. Topic models play a central role in this approach. Using standard topic models for measuring diversity of documents is suboptimal due to generality and impurity. General topics only include common information from a background corpus and are assigned to most of the documents in the collection. Impure topics contain words that are not related to the topic; impurity lowers the interpretability of topic models, and impure topics are likely to get assigned to documents erroneously. We propose a hierarchical re-estimation approach for topic models to combat generality and impurity; the proposed approach operates at three levels: words, topics, and documents. Our re-estimation approach for measuring documents' topical diversity outperforms the state of the art on the PubMed dataset, which is commonly used for diversity experiments.
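To make the abstract's setup concrete, here is a minimal sketch of how a document's topical diversity might be scored from a topic model's output, and why general topics distort that score. It rests on assumptions not stated in this summary: diversity is computed as Rao's quadratic entropy over the document's topic distribution, and general topics are pruned with a simple document-frequency cutoff. The function names, `max_doc_fraction`, and `presence_threshold` are hypothetical, and the pruning heuristic is a stand-in for illustration, not the paper's re-estimation procedure.

```python
import numpy as np


def rao_diversity(topic_probs, topic_distance):
    """Rao's quadratic entropy: sum_i sum_j p_i * p_j * d(i, j).

    Assumption: this is the diversity score; the summary does not name it.
    """
    p = np.asarray(topic_probs, dtype=float)
    d = np.asarray(topic_distance, dtype=float)
    if p.ndim != 1 or d.shape != (p.size, p.size):
        raise ValueError("need a length-K vector and a K x K distance matrix")
    total = p.sum()
    if total <= 0:
        raise ValueError("topic probabilities must have positive mass")
    p = p / total  # normalise so the weights form a distribution
    return float(p @ d @ p)


def drop_general_topics(doc_topic, max_doc_fraction=0.5, presence_threshold=0.1):
    """Hypothetical heuristic, not the paper's method.

    A topic counts as "general" when its probability exceeds
    presence_threshold in more than max_doc_fraction of the documents.
    General topics are removed and each document row is renormalised.
    """
    theta = np.asarray(doc_topic, dtype=float)
    doc_fraction = (theta > presence_threshold).mean(axis=0)
    keep = doc_fraction <= max_doc_fraction
    pruned = theta[:, keep]
    row_sums = pruned.sum(axis=1, keepdims=True)
    # Rows left with no mass stay all-zero instead of dividing by zero.
    pruned = np.divide(pruned, row_sums, out=np.zeros_like(pruned),
                       where=row_sums > 0)
    return pruned, keep


# Toy corpus: 3 documents over 3 topics; topic 0 appears in every document.
theta = np.array([[0.6, 0.4, 0.0],
                  [0.5, 0.0, 0.5],
                  [0.7, 0.3, 0.0]])
distance = 1.0 - np.eye(3)  # every pair of distinct topics is at distance 1

before = [rao_diversity(row, distance) for row in theta]
pruned, keep = drop_general_topics(theta, max_doc_fraction=0.8)
after = [rao_diversity(row, distance[np.ix_(keep, keep)]) for row in pruned]
```

In this toy corpus every document looks diverse before pruning only because the ubiquitous topic 0 shares probability mass with one specific topic. Once it is removed, each document concentrates on a single topic and its score drops to zero, which is the kind of distortion the abstract attributes to general topics.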

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.04273/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1701.04273/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/1701.04273/full.md

---
Source: https://tomesphere.com/paper/1701.04273