Multi-view and Multi-source Transfers in Neural Topic Modeling with Pretrained Topic and Word Embeddings
Pankaj Gupta, Yatin Chaudhary, Hinrich Schütze

TL;DR
This paper introduces a transfer learning framework for neural topic modeling that leverages pre-trained latent topics and word embeddings from source corpora to improve topic quality and handle data sparsity in target domains.
Contribution
It proposes a novel method to transfer latent topics and word representations from source to target, enhancing neural topic models especially for sparse or short texts.
Findings
Achieved state-of-the-art results on multiple datasets.
Improved topic coherence and interpretability.
Enhanced generalization in various text domains.
Abstract
Though word embeddings and topics are complementary representations, several past works have only used pre-trained word embeddings in (neural) topic modeling to address the data sparsity problem in short texts or small collections of documents. However, no prior work has employed (pre-trained latent) topics in a transfer learning paradigm. In this paper, we propose an approach to (1) perform knowledge transfer using latent topics obtained from a large source corpus, and (2) jointly transfer knowledge via the two representations (or views) in neural topic modeling to improve topic quality and better deal with polysemy and data sparsity issues in a target corpus. In doing so, we first accumulate topics and word representations from one or many source corpora to build a pool of topics and word vectors. Then, we identify one or multiple relevant source domain(s) and take advantage of corresponding…
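The pooling and source-selection steps the abstract describes might be sketched as follows. This is a minimal illustration and not the authors' implementation: the `SourceDomain` container, the Jaccard vocabulary-overlap relevance score, and the zero-filled vocabulary alignment are all assumptions introduced here, since the abstract does not say how relevant source domains are identified or how the transferred topics enter the target model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SourceDomain:
    """Hypothetical container for one entry in the pool of source knowledge."""
    name: str
    vocab: List[str]                    # source vocabulary
    topics: List[List[float]]           # K x |vocab| topic-word weights (assumed pretrained)
    embeddings: Dict[str, List[float]]  # word -> pretrained word vector


def vocab_overlap(target_vocab: List[str], source: SourceDomain) -> float:
    """Jaccard overlap of vocabularies; an assumed stand-in for domain relevance."""
    t, s = set(target_vocab), set(source.vocab)
    union = t | s
    return len(t & s) / len(union) if union else 0.0


def select_sources(target_vocab: List[str], pool: List[SourceDomain],
                   top_n: int = 1) -> List[SourceDomain]:
    """Rank the pool by the relevance score and keep the top_n source domains."""
    ranked = sorted(pool, key=lambda d: vocab_overlap(target_vocab, d), reverse=True)
    return ranked[:top_n]


def align_topics(target_vocab: List[str], source: SourceDomain) -> List[List[float]]:
    """Re-index source topics onto the target vocabulary; unseen words get 0.0."""
    index = {w: i for i, w in enumerate(source.vocab)}
    return [[row[index[w]] if w in index else 0.0 for w in target_vocab]
            for row in source.topics]


# Toy pool with two source domains, one topic each (values are illustrative only).
pool = [
    SourceDomain("news", ["market", "stock", "game"], [[0.6, 0.4, 0.0]],
                 {"market": [0.1, 0.2]}),
    SourceDomain("sports", ["game", "team", "score"], [[0.5, 0.3, 0.2]],
                 {"game": [0.3, 0.1]}),
]
target_vocab = ["stock", "market", "price"]

chosen = select_sources(target_vocab, pool, top_n=1)
aligned = align_topics(target_vocab, chosen[0])
```

In this toy setup the aligned topic matrix shares the target vocabulary's column order, so it could be compared against or used to regularize the target model's topics. How that guidance is actually applied is left open here because the abstract is truncated before that point.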
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Advanced Text Analysis Techniques
Methods: Interpretability
