Stick-Breaking Embedded Topic Model with Continuous Optimal Transport for Online Analysis of Document Streams
Federica Granese, Serena Villata, Charles Bouveyron

TL;DR
SB-SETM is an online topic model for evolving document streams: it infers the number of active topics at each timestep and merges the topic embeddings of successive batches using optimal transport.
Contribution
The paper introduces SB-SETM, a novel online topic model that combines a stick-breaking process with optimal transport for real-time document stream analysis.
Findings
Outperforms baseline models on simulated data
Effectively captures evolving topics in news articles
Automatically infers the number of active topics
Abstract
Online topic models are unsupervised algorithms to identify latent topics in data streams that continuously evolve over time. Although these methods naturally align with real-world scenarios, they have received considerably less attention from the community compared to their offline counterparts, due to specific additional challenges. To tackle these issues, we present SB-SETM, an innovative model extending the Embedded Topic Model (ETM) to process data streams by merging models formed on successive partial document batches. To this end, SB-SETM (i) leverages a truncated stick-breaking construction for the topic-per-document distribution, enabling the model to automatically infer from the data the appropriate number of active topics at each timestep; and (ii) introduces a merging strategy for topic embeddings based on a continuous formulation of optimal transport adapted to the high…
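The truncated stick-breaking construction mentioned in point (i) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Beta(1, 2) draws used to produce break fractions, and the 0.01 activity threshold are all assumptions made for illustration (in an ETM-style model the fractions would more plausibly come from an inference network). The sketch only shows how K-1 break fractions become K topic proportions, and how counting the proportions above a threshold gives a number of "active" topics.

```python
import random


def truncated_stick_breaking(fractions):
    """Map K-1 break fractions in (0, 1) to K proportions that sum to 1."""
    remaining = 1.0  # length of the stick not yet assigned to a topic
    weights = []
    for v in fractions:
        weights.append(v * remaining)  # break off a fraction v of what is left
        remaining *= 1.0 - v
    weights.append(remaining)  # truncation: the last topic takes the leftover
    return weights


def active_topics(weights, threshold=0.01):
    """Indices of topics whose proportion exceeds a (hypothetical) threshold."""
    return [k for k, w in enumerate(weights) if w > threshold]


# Illustrative draws only: Beta(1, 2) is an assumed prior, not taken from the paper.
rng = random.Random(0)
fractions = [rng.betavariate(1.0, 2.0) for _ in range(9)]
theta = truncated_stick_breaking(fractions)  # 10 topic proportions for one document
print(len(active_topics(theta)), "active topics out of", len(theta))
```

Because each break consumes part of the remaining stick, later topics tend to receive smaller proportions, which is what lets a model with a fixed truncation level K use fewer than K topics in practice.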
Taxonomy
Topics: Complex Network Analysis Techniques · Topic Modeling · Advanced Graph Neural Networks
