Explainable Point-Based Document Visualizations
Primož Godec, Nikola Đukić, Ajda Pretnar, Vesna Tanko, Lan Žagar, Blaž Zupan

TL;DR
This paper explores using keyword extraction methods, especially YAKE!, to label clusters in point-based document visualizations produced by techniques such as t-SNE and UMAP, making the resulting data maps easier to interpret.
Contribution
The paper applies keyword extraction to label clusters of documents in point-based visualizations, compares several extraction methods, and reports that YAKE! gave the best results.
Findings
YAKE! outperformed the other keyword extraction methods considered
TF-IDF term ranking was more effective than graph-based and embedding-based techniques
Keyword labeling improves interpretability of data maps
Abstract
Two-dimensional data maps can visually reveal information about the relations between data instances. Popular techniques for constructing data maps are t-SNE and UMAP. The resulting point-based visualizations, though, provide information only through their interpretation. Here we consider a set of abstracts from articles on longevity to argue for using keyword extraction methods to label clusters of documents in the map. Among the considered approaches, the best results were obtained by the recently proposed YAKE!. Surprisingly, a classical TF-IDF term ranking outperformed graph-based and embedding-based techniques.
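To make the idea of labeling clusters with keywords concrete, here is a minimal sketch of the classical TF-IDF term ranking mentioned in the abstract. It is not the authors' implementation and it is not YAKE!. It assumes that each document already has a cluster id (for example, from clustering the points of a t-SNE or UMAP map), and the names `tokenize` and `label_clusters`, the tokenization rule, and the choice to sum per-document scores within a cluster are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens of three or more letters.
    return re.findall(r"[a-z]{3,}", text.lower())


def label_clusters(docs, cluster_ids, top_k=3):
    """Return {cluster_id: [terms]} with the top_k terms per cluster.

    Terms are ranked by the sum of their per-document TF-IDF scores
    over the documents assigned to the cluster (a hypothetical choice).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))

    # Accumulate TF-IDF scores per cluster.
    scores = {}
    for toks, cid in zip(tokenized, cluster_ids):
        if not toks:
            continue  # skip documents with no usable tokens
        cluster_scores = scores.setdefault(cid, Counter())
        for term, count in Counter(toks).items():
            tf = count / len(toks)
            idf = math.log(n_docs / df[term])
            cluster_scores[term] += tf * idf

    # Sort by descending score, breaking ties alphabetically.
    labels = {}
    for cid, term_scores in scores.items():
        ranked = sorted(term_scores.items(), key=lambda kv: (-kv[1], kv[0]))
        labels[cid] = [term for term, _ in ranked[:top_k]]
    return labels
```

Calling `label_clusters(docs, cluster_ids, top_k=3)` returns a short list of terms per cluster that could be drawn next to the corresponding region of the map. A real pipeline would likely add stop-word removal and lemmatization, which this sketch omits.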
Taxonomy
Topics: Advanced Text Analysis Techniques · Data Visualization and Analytics · Time Series Analysis and Forecasting
