Large-Scale Evaluation of Topic Models and Dimensionality Reduction Methods for 2D Text Spatialization
Daniel Atzberger, Tim Cech, Willy Scheibel, Matthias Trapp, Rico Richter, Jürgen Döllner, Tobias Schreck

TL;DR
This study conducts a comprehensive large-scale benchmark to evaluate how different combinations of topic models and dimensionality reduction techniques affect the quality of 2D text spatializations, providing guidelines for effective visualization.
Contribution
The paper presents the first extensive benchmark of how combinations of topic models and dimensionality reduction methods affect the quality of text visualizations.
Findings
Interpretable topic models help capture the semantic structure of a text corpus.
t-SNE is recommended as the dimensionality reduction method.
The study derives guidelines for designing high-quality text spatializations.
Abstract
Topic models are a class of unsupervised learning algorithms for detecting the semantic structure within a text corpus. Together with a subsequent dimensionality reduction algorithm, topic models can be used for deriving spatializations for text corpora as two-dimensional scatter plots, reflecting semantic similarity between the documents and supporting corpus analysis. Although the choice of the topic model, the dimensionality reduction, and their underlying hyperparameters significantly impact the resulting layout, it is unknown which particular combinations result in high-quality layouts with respect to accuracy and perception metrics. To investigate the effectiveness of topic models and dimensionality reduction methods for the spatialization of corpora as two-dimensional scatter plots (or basis for landscape-type visualizations), we present a large-scale, benchmark-based…
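The abstract refers to accuracy metrics for judging a layout, but the excerpt above is truncated before it names any. As a rough illustration only, the sketch below computes a k-nearest-neighbor preservation score: the fraction of each document's nearest neighbors in topic space that remain its nearest neighbors in the 2D layout. The function names, the toy topic vectors, and the two layouts are all hypothetical, and this metric is an assumed stand-in for the general idea, not necessarily one used in the paper.

```python
from math import dist


def knn(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance; break ties by index.
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (dist(p, points[j]), j),
        )
        neighbors.append(set(order[:k]))
    return neighbors


def neighborhood_preservation(high_dim, layout, k):
    """Average share of k-nearest neighbors kept when going from the
    high-dimensional vectors to the 2D layout (1.0 = all kept, 0.0 = none)."""
    if len(high_dim) != len(layout):
        raise ValueError("high_dim and layout must have the same length")
    if not 0 < k < len(high_dim):
        raise ValueError("k must satisfy 0 < k < number of documents")
    high = knn(high_dim, k)
    low = knn(layout, k)
    shared = sum(len(h & l) for h, l in zip(high, low))
    return shared / (k * len(high_dim))


# Hypothetical document-topic vectors: documents 0 and 1 share a dominant
# topic, as do documents 2 and 3.
doc_topics = [(0.9, 0.1, 0.0), (0.8, 0.2, 0.0), (0.1, 0.1, 0.8), (0.0, 0.2, 0.8)]

# A layout that keeps similar documents together, and one that separates them.
good_layout = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
bad_layout = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0)]

print(neighborhood_preservation(doc_topics, good_layout, k=1))
print(neighborhood_preservation(doc_topics, bad_layout, k=1))
```

In a benchmark of the kind described, a score like this would be computed for each combination of topic model, dimensionality reduction method, and hyperparameter setting, so that layouts can be compared on a common scale.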
Taxonomy
Topics: Geographic Information Systems Studies · Image Retrieval and Classification Techniques · Data Visualization and Analytics
