A Large-Scale Sensitivity Analysis on Latent Embeddings and Dimensionality Reductions for Text Spatializations
Daniel Atzberger, Tim Cech, Willy Scheibel, Jürgen Döllner, Michael Behrisch, Tobias Schreck

TL;DR
This study systematically evaluates how variations in data, hyperparameters, and randomness affect the stability of text visualization layouts created through dimensionality reduction, providing guidelines for more reliable visual analysis.
Contribution
It offers a comprehensive sensitivity analysis of text layout stability across multiple corpora, embeddings, and hyperparameters, an aspect that was previously underexplored.
Findings
Identified key hyperparameters influencing layout stability
Quantified layout similarity across different settings
Provided practical guidelines for stable text visualization
Abstract
The semantic similarity between documents of a text corpus can be visualized using map-like metaphors based on two-dimensional scatterplot layouts. These layouts result from a dimensionality reduction on the document-term matrix or a representation within a latent embedding, including topic models. The resulting layout thus depends on the input data and the hyperparameters of the dimensionality reduction, and is affected by changes in them. However, such changes to the layout require additional cognitive effort from the user. In this work, we present a sensitivity study that analyzes the stability of these layouts concerning (1) changes in the text corpora, (2) changes in the hyperparameters, and (3) randomness in the initialization. Our approach has…
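To make the notion of layout stability concrete, the following is a minimal sketch of one possible way to compare two 2D layouts of the same documents. It uses a Procrustes disparity, which ignores translation, uniform scaling, rotation, and reflection. This metric is an assumption chosen for illustration; the excerpt above does not state which similarity measures the study uses. The function name `procrustes_disparity` and the synthetic layouts are hypothetical.

```python
import numpy as np


def procrustes_disparity(layout_a, layout_b):
    """Return a value in [0, 1]: 0 means the layouts match up to
    translation, uniform scaling, rotation, and reflection.

    Illustrative metric only; not taken from the paper.
    """
    a = np.asarray(layout_a, dtype=float)
    b = np.asarray(layout_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("layouts must contain the same points in the same order")

    # Remove translation by centering each layout at the origin.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)

    # Remove scale by normalizing to unit Frobenius norm.
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a layout must contain at least two distinct points")
    a /= norm_a
    b /= norm_b

    # The singular values of the cross-covariance give the residual
    # after the best orthogonal alignment: 1 - (sum of singular values)^2.
    singular_values = np.linalg.svd(a.T @ b, compute_uv=False)
    return float(1.0 - singular_values.sum() ** 2)


# Synthetic stand-ins for two runs of a dimensionality reduction.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))

theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
# Same layout, but rotated, scaled, and shifted.
transformed = 3.0 * base @ rotation + np.array([5.0, -2.0])
# A layout with no relation to the first one.
unrelated = rng.normal(size=(200, 2))

same_disparity = procrustes_disparity(base, transformed)
unrelated_disparity = procrustes_disparity(base, unrelated)
print(f"rotated copy: {same_disparity:.3e}")
print(f"unrelated layout: {unrelated_disparity:.3f}")
```

In practice, `base` and `transformed` would be replaced by the 2D coordinates produced by two runs that differ only in the random seed, a hyperparameter value, or the input corpus, with rows referring to the same documents in the same order.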
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
