The Geometric Structure of Topic Models
Johannes Hirth, Tom Hanika

TL;DR
This paper introduces a geometric method for analyzing and visualizing topic models in more than three dimensions, revealing conceptual relationships and hierarchies among topics without introducing artifacts. The method is demonstrated on a corpus of scientific papers.
Contribution
It proposes an incidence-geometric approach for deriving ordinal structures from flat topic models, enabling higher-dimensional analysis and conceptual hierarchy visualization.
Findings
New geometric visualization paradigm for topic hierarchies
Application to scientific paper corpus from machine learning venues
Effective extraction of conceptual relationships between topics
Abstract
Topic models are a popular tool for clustering and analyzing textual data. They allow texts to be classified on the basis of their affiliation to the previously calculated topics. Despite their widespread use in research and application, an in-depth analysis of topic models is still an open research topic. State-of-the-art methods for interpreting topic models are based on simple visualizations, such as similarity matrices, top-term lists or embeddings, which are limited to a maximum of three dimensions. In this paper, we propose an incidence-geometric method for deriving an ordinal structure from flat topic models, such as non-negative matrix factorization. These enable the analysis of the topic model in a higher (order) dimension and the possibility of extracting conceptual relationships between several topics at once. Due to the use of conceptual scaling, our approach does not…
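To make the idea of deriving an ordinal structure from a flat topic model more concrete, here is a minimal sketch. It is not the authors' procedure. It assumes a small, made-up document-topic weight matrix (such as the rows of W in an NMF X ≈ WH), a single hypothetical threshold as the scale, and a naive enumeration of formal concepts. The names `weights`, `THRESHOLD`, `binarize` and `concepts` are illustrative only.

```python
from itertools import combinations

# Hypothetical document-topic weights (illustrative data, not from the paper).
topics = ["t0", "t1", "t2"]
weights = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.7, 0.6, 0.0],
    "doc3": [0.0, 0.8, 0.5],
    "doc4": [0.6, 0.7, 0.9],
}
THRESHOLD = 0.4  # assumed cut-off; the paper's scaling may differ


def binarize(weights, topics, threshold):
    """Scale real-valued weights to a binary document-topic incidence."""
    return {
        doc: {t for t, w in zip(topics, row) if w >= threshold}
        for doc, row in weights.items()
    }


def extent(incidence, topic_set):
    """All documents that have every topic in topic_set."""
    return frozenset(d for d, ts in incidence.items() if topic_set <= ts)


def intent(incidence, docs, topics):
    """All topics shared by every document in docs."""
    shared = set(topics)
    for d in docs:
        shared &= incidence[d]
    return frozenset(shared)


def concepts(incidence, topics):
    """Naively enumerate formal concepts as (extent, intent) pairs.

    Exponential in the number of topics, so only suitable for toy examples.
    """
    found = set()
    for k in range(len(topics) + 1):
        for combo in combinations(topics, k):
            ext = extent(incidence, set(combo))
            found.add((ext, intent(incidence, ext, topics)))
    return found


incidence = binarize(weights, topics, THRESHOLD)
lattice = concepts(incidence, topics)

# List concepts from the most general (largest extent) to the most specific.
for ext, itt in sorted(lattice, key=lambda c: -len(c[0])):
    print(sorted(ext), sorted(itt))
```

Ordering the resulting concepts by inclusion of their extents yields a hierarchy over topic combinations. In this toy data, every document assigned to t2 is also assigned to t1, so t2 sits below t1 in that order. This is the kind of relationship between several topics that a pairwise similarity matrix would not show directly.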
Taxonomy
Topics: Data Management and Algorithms · Data Visualization and Analytics · Geographic Information Systems Studies
