TriTopic: Tri-Modal Graph-Based Topic Modeling with Iterative Refinement and Archetypes
Roman Egger

TL;DR
TriTopic introduces a tri-modal graph-based framework for topic modeling that enhances stability, lexical precision, and multi-perspective analysis, outperforming existing methods across multiple datasets.
Contribution
It presents a novel tri-modal graph approach with iterative refinement and archetype-based representations, addressing key limitations of prior topic modeling techniques.
Findings
Achieves the highest NMI scores across all tested datasets
Ensures 100% corpus coverage with no outliers
Outperforms existing methods like BERTopic, NMF, and LDA
Abstract
Topic modeling extracts latent themes from large text collections, but leading approaches like BERTopic face critical limitations: stochastic instability, loss of lexical precision ("Embedding Blur"), and reliance on a single data perspective. We present TriTopic, a framework that addresses these weaknesses through a tri-modal graph fusing semantic embeddings, TF-IDF, and metadata. Three core innovations drive its performance: hybrid graph construction via Mutual kNN and Shared Nearest Neighbors to eliminate noise and combat the curse of dimensionality; Consensus Leiden Clustering for reproducible, stable partitions; and Iterative Refinement that sharpens embeddings through dynamic centroid-pulling. TriTopic also replaces the "average document" concept with archetype-based topic representations defined by boundary cases rather than centers alone. In benchmarks across 20 Newsgroups,…
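The hybrid graph construction named in the abstract (Mutual kNN combined with Shared Nearest Neighbors) can be illustrated with a small sketch. This is not the paper's implementation: the function name `mutual_knn_snn_graph`, the cosine similarity measure, the default `k`, and the weighting rule (shared-neighbor count divided by `k`) are all assumptions chosen for illustration only.

```python
import numpy as np

def mutual_knn_snn_graph(X, k=3):
    """Hypothetical sketch: keep only mutual kNN edges and weight each
    surviving edge by the fraction of neighbors the two endpoints share."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Cosine similarity between all pairs of rows (assumed metric).
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)  # a point is never its own neighbor

    # Directed kNN adjacency: A[i, j] is True if j is among i's k nearest.
    nn = np.argsort(-sim, axis=1)[:, :k]
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nn.ravel()] = True

    # Mutual kNN: keep an edge only if each endpoint lists the other.
    mutual = A & A.T

    # Shared nearest neighbors: number of common entries in the two kNN lists.
    shared = A.astype(int) @ A.astype(int).T

    # Edge weight in [0, (k - 1) / k]; mutual edges with no shared
    # neighbors receive weight 0 and are effectively dropped.
    return np.where(mutual, shared / k, 0.0)
```

The resulting symmetric weight matrix could then be handed to a community detection routine such as Leiden. Requiring both mutuality and neighborhood overlap is one way to suppress one-sided or hub-driven edges, which is the noise-reduction role the abstract attributes to this step.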
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Sentiment Analysis and Opinion Mining
