Discovering topic structures of a temporally evolving document corpus
Adham Beykikhoshk, Ognjen Arandjelovic, Dinh Phung, Svetha Venkatesh

TL;DR
This paper introduces a framework for discovering and tracking evolving topics in a temporal document corpus. It imposes no prior on the document arrival rate and makes no Markovian assumption. The framework is demonstrated on medical literature related to ASD and MetS.
Contribution
The framework combines epoch-wise topic discovery, based on a hierarchical Dirichlet process, with a temporal similarity graph. This lets it model complex topic dynamics without a prior on the rate at which documents are added to the corpus.
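Epoch-wise discovery presupposes that documents have first been grouped into time epochs. The sketch below shows one simple way to do this. It assumes each document carries a publication year and that epochs are fixed-width windows. `assign_epochs`, `epoch_length`, and the toy corpus are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def assign_epochs(docs, start_year, epoch_length):
    """Group (year, text) pairs into consecutive fixed-width epochs.

    Hypothetical helper: the fixed epoch width is an illustrative
    assumption, not a setting reported in the paper.
    """
    epochs = defaultdict(list)
    for year, text in docs:
        if year < start_year:
            raise ValueError(f"year {year} precedes start_year {start_year}")
        # Epoch k covers [start_year + k*epoch_length, start_year + (k+1)*epoch_length).
        epochs[(year - start_year) // epoch_length].append(text)
    # Return epochs in temporal order; epochs with no documents are absent.
    return dict(sorted(epochs.items()))


# Toy corpus of (publication year, text) pairs, made up for illustration.
corpus = [
    (2001, "doc 1"),
    (2003, "doc 2"),
    (2006, "doc 3"),
    (2007, "doc 4"),
    (2010, "doc 5"),
]
print(assign_epochs(corpus, start_year=2001, epoch_length=3))
# {0: ['doc 1', 'doc 2'], 1: ['doc 3'], 2: ['doc 4'], 3: ['doc 5']}
```

Each epoch's documents would then be passed to a topic model independently. The paper uses a hierarchical Dirichlet process-based model for that step, and this sketch does not attempt to reproduce it.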
Findings
Effectively captures emergence, disappearance, evolution, splitting, and merging of topics.
Demonstrates strong performance on ASD and MetS medical corpora.
Provides detailed empirical analysis and qualitative case studies.
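Once topics in consecutive epochs have been linked, the kinds of change listed above can be read off the link structure. The following minimal sketch uses assumed rules:

- no outgoing link means disappearance
- no incoming link means emergence
- one-to-many means a split
- many-to-one means a merge
- one-to-one means evolution

The function name, these rules, and the toy topic labels are illustrative assumptions rather than the paper's exact procedure.

```python
from collections import defaultdict


def classify_transitions(prev_topics, next_topics, edges):
    """Label topic changes between epoch t and epoch t+1.

    edges: iterable of (i, j) pairs, meaning topic i in epoch t is linked to
    topic j in epoch t+1. The labelling rules are an illustrative assumption.
    """
    out_links = defaultdict(set)  # topic in epoch t -> linked topics in t+1
    in_links = defaultdict(set)   # topic in epoch t+1 -> linked topics in t
    for i, j in edges:
        out_links[i].add(j)
        in_links[j].add(i)

    events = []
    for i in prev_topics:
        if not out_links[i]:
            events.append(("disappearance", i))
        elif len(out_links[i]) > 1:
            events.append(("split", i, tuple(sorted(out_links[i]))))
    for j in next_topics:
        parents = in_links[j]
        if not parents:
            events.append(("emergence", j))
        elif len(parents) > 1:
            events.append(("merge", tuple(sorted(parents)), j))
        else:
            (i,) = parents
            # One parent that links only to j: the topic simply evolved.
            if len(out_links[i]) == 1:
                events.append(("evolution", i, j))
    return events


# Toy example: topic labels are made up for illustration.
previous = ["A", "B", "C", "D", "F"]
following = ["A'", "B1", "B2", "CD", "E"]
links = {("A", "A'"), ("B", "B1"), ("B", "B2"), ("C", "CD"), ("D", "CD")}
for event in classify_transitions(previous, following, links):
    print(event)
```

On these toy links, the output labels B as a split into B1 and B2, and C and D as a merge into CD. It also reports F as a disappearance, E as an emergence, and A to A' as evolution.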
Abstract
In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work, our model does not impose a prior on the rate at which documents are added to the corpus, nor does it adopt the Markovian assumption, which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, splitting, and merging. The power of the proposed framework is demonstrated on two medical literature corpora concerned with the autism spectrum disorder (ASD) and the…
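Step (iii) can be illustrated with a small sketch that links topics in adjacent epochs whose word distributions are similar enough. The sketch assumes each epoch's topics are already available as word-probability dictionaries. In the paper these come from the HDP-based model of step (ii); here they are hand-written toy values. The sketch uses cosine similarity with a fixed threshold as a stand-in measure. The similarity function, the threshold, and the restriction to adjacent epochs are all assumptions, not details confirmed by the abstract.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse topic-word weight dicts."""
    dot = sum(w * q.get(word, 0.0) for word, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def build_temporal_graph(epoch_topics, threshold=0.5):
    """Link topics in consecutive epochs whose similarity reaches threshold.

    epoch_topics: list with one entry per epoch, each a dict mapping a
    topic id to its {word: probability} distribution.
    Returns a list of ((t, i), (t + 1, j), similarity) edges.
    """
    edges = []
    for t in range(len(epoch_topics) - 1):
        for i, p in epoch_topics[t].items():
            for j, q in epoch_topics[t + 1].items():
                sim = cosine(p, q)
                if sim >= threshold:
                    edges.append(((t, i), (t + 1, j), sim))
    return edges


# Toy topic-word distributions for two epochs, made up for illustration.
epochs = [
    {
        "genetics": {"gene": 0.6, "variant": 0.4},
        "diagnosis": {"diagnosis": 0.7, "criteria": 0.3},
    },
    {
        "genetics": {"gene": 0.5, "variant": 0.3, "exome": 0.2},
        "imaging": {"mri": 0.6, "cortex": 0.4},
    },
]
for edge in build_temporal_graph(epochs, threshold=0.5):
    print(edge)
```

With these toy values, only the two "genetics" topics are linked, with a similarity of about 0.94. "diagnosis" has no successor and "imaging" has no predecessor. A labelling step would read these as a disappearance and an emergence, respectively.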
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
