Towards Generalising Neural Topical Representations

Xiaohao Yang; He Zhao; Dinh Phung; Lan Du

arXiv:2307.12564 · cs.CL · June 14, 2024 · 1 citation

TL;DR

This paper proposes a novel framework that enhances neural topic models by minimizing semantic distance between similar documents using TopicalOT, significantly improving their ability to generalize across different corpora.

Contribution

The work introduces a plug-and-play module for NTMs that leverages text data augmentation and TopicalOT to improve cross-corpus generalization of topical representations.
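
To illustrate what "semantic distance" between two topical representations can look like, here is a minimal sketch of an entropic optimal transport (Sinkhorn) distance between two documents' topic distributions. It is not the paper's TopicalOT definition, which this page does not spell out. The topic vectors, the cosine ground cost, the `reg` value, and the function names `cosine_cost` and `sinkhorn_distance` are all assumptions made for illustration.

```python
import numpy as np


def cosine_cost(topic_vecs):
    """Pairwise cosine distance between topic vectors (one topic per row)."""
    unit = topic_vecs / np.linalg.norm(topic_vecs, axis=1, keepdims=True)
    return np.clip(1.0 - unit @ unit.T, 0.0, None)


def sinkhorn_distance(a, b, cost, reg=0.1, n_iters=200):
    """Transport cost of the entropic-regularised OT plan between a and b.

    a, b : topic distributions (non-negative, each summing to 1)
    cost : topic-to-topic ground cost matrix
    reg  : entropic regularisation strength (assumed value)
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    kernel = np.exp(-cost / reg)  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        # Alternate scaling so the plan matches both marginals.
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]
    return float(np.sum(plan * cost))


# Hypothetical topic vectors: topics 0 and 1 are close, topic 2 is far away.
topic_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
cost = cosine_cost(topic_vecs)

theta_doc = np.array([0.7, 0.2, 0.1])    # a training document
theta_aug = np.array([0.6, 0.3, 0.1])    # its augmented (similar) version
theta_other = np.array([0.1, 0.1, 0.8])  # an unrelated document

d_near = sinkhorn_distance(theta_doc, theta_aug, cost)
d_far = sinkhorn_distance(theta_doc, theta_other, cost)
print(d_near, d_far)
```

In this toy setup the distance between the document and its augmented version comes out much smaller than the distance to the unrelated document, because shifting mass between the two nearby topics is cheap. A regulariser in this spirit would penalise `d_near` during training; how the paper actually builds its cost matrix and loss is not covered on this page.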

Findings

01 Significant improvement in cross-corpus generalization of NTMs.

02 Framework is compatible with most existing NTMs.

03 Demonstrated effectiveness through extensive experiments.

Abstract

Topic models have evolved from conventional Bayesian probabilistic models to recent Neural Topic Models (NTMs). Although NTMs have shown promising performance when trained and tested on a specific corpus, their generalisation ability across corpora has yet to be studied. In practice, we often expect that an NTM trained on a source corpus can still produce quality topical representations (i.e., latent distributions over topics) for documents from different target corpora, at least to a certain degree. In this work, we aim to improve NTMs further so that their representation power for documents generalises reliably across corpora and tasks. To do so, we propose to enhance NTMs by narrowing the semantic distance between similar documents, with the underlying assumption that documents from different corpora may share similar semantics. Specifically, we obtain a similar document for each training…

Code & Models

Repositories

xiaohao-yang/topic_model_generalisation
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Advanced Text Analysis Techniques