Handling Collocations in Hierarchical Latent Tree Analysis for Topic Modeling

Leonard K. M. Poon; Nevin L. Zhang; Haoran Xie; Gary Cheng

arXiv:2007.05163 · cs.CL · July 13, 2020

TL;DR

This paper adds a preprocessing step to hierarchical latent tree analysis (HLTA) for topic modeling: collocations are extracted, selected, and replaced with single tokens so that multiword expressions are represented more appropriately. The method improved HLTA performance on three of the four data sets tested.

Contribution

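The paper proposes extracting and selecting collocations, then replacing them with single tokens, as a preprocessing step for HLTA. This addresses a limitation of HLTA's tree-structured models, which cannot appropriately represent the different meanings of multiword expressions that share a word.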

Findings

01 Improved HLTA performance on three of the four data sets tested

02 Effective collocation extraction and replacement method

03 Enhanced representation of multiword expressions in topic models

Abstract

Topic modeling has been one of the most active research areas in machine learning in recent years. Hierarchical latent tree analysis (HLTA) has recently been proposed for hierarchical topic modeling and has shown superior performance over state-of-the-art methods. However, the models used in HLTA have a tree structure and cannot appropriately represent the different meanings of multiword expressions that share a word. Therefore, we propose a method for extracting and selecting collocations as a preprocessing step for HLTA. The selected collocations are replaced with single tokens in the bag-of-words model before running HLTA. Our empirical evaluation shows that the proposed method led to better performance of HLTA on three of the four data sets tested.
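The replacement step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how collocations are scored or selected, so the pointwise mutual information (PMI) criterion used here is an assumption standing in for the paper's method. The function names (`select_collocations`, `replace_collocations`), the thresholds, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def select_collocations(docs, min_count=2, min_pmi=1.0):
    """Pick adjacent word pairs that co-occur more often than chance.

    PMI with these thresholds is an assumed stand-in for the paper's
    (unspecified here) extraction and selection procedure.
    """
    unigrams = Counter(tok for doc in docs for tok in doc)
    bigrams = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    selected = set()
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_indep = (unigrams[a] / n_uni) * (unigrams[b] / n_uni)
        if math.log(p_pair / p_indep) >= min_pmi:
            selected.add((a, b))
    return selected


def replace_collocations(doc, collocations, sep="_"):
    """Greedily merge selected adjacent pairs into single tokens, left to right."""
    out, i = [], 0
    while i < len(doc):
        if i + 1 < len(doc) and (doc[i], doc[i + 1]) in collocations:
            out.append(doc[i] + sep + doc[i + 1])
            i += 2
        else:
            out.append(doc[i])
            i += 1
    return out


# Toy corpus: "network" appears in two expressions with different meanings.
docs = [
    "neural network training".split(),
    "social network analysis".split(),
    "neural network model".split(),
    "social network graph".split(),
]
collocations = select_collocations(docs)

# Bag-of-words counts after replacement; these would be the input to HLTA.
bags = [Counter(replace_collocations(doc, collocations)) for doc in docs]
```

In this toy corpus, "neural network" and "social network" each become one token, so the shared word "network" no longer has to carry both meanings as a single variable in the bag-of-words model.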

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Data Mining Algorithms and Applications