Explainable Point-Based Document Visualizations

Primož Godec; Nikola Đukić; Ajda Pretnar; Vesna Tanko; Lan Žagar; Blaž Zupan

arXiv:2110.00462·cs.IR·October 4, 2021·1 cites

Open Access

TL;DR

This paper explores using keyword extraction methods, especially YAKE!, to label clusters in point-based document visualizations like t-SNE and UMAP, enhancing interpretability of data maps.

Contribution

The paper argues for applying keyword extraction to label clusters of documents in point-based visualizations. It compares several extraction methods on a set of abstracts from articles on longevity and finds YAKE! the most effective.

Findings

01. YAKE! gave the best results among the keyword extraction methods considered

02. Classical TF-IDF term ranking outperformed graph-based and embedding-based techniques

03. Keyword labeling improves interpretability of data maps

Abstract

Two-dimensional data maps can visually reveal information about the relations between data instances. Popular techniques to construct data maps are t-SNE and UMAP. The resulting point-based visualizations, though, provide information only through their interpretation. We here consider a set of abstracts from articles on longevity to argue for using keyword extraction methods to label clusters of documents in the map. Among the considered approaches, the best results were obtained by the recently proposed YAKE!. Surprisingly, a classical TF-IDF term ranking outperformed graph-based and embedding-based techniques.
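To make the idea of labeling map clusters with ranked terms concrete, here is a minimal sketch of TF-IDF term ranking per cluster. It is not the authors' implementation: the function names (`tokenize`, `tfidf_cluster_labels`), the smoothed IDF formula, the four toy documents, and the cluster ids are all assumptions made for illustration. The cluster ids are taken as given, as if they had already been read off groups of points in a t-SNE or UMAP map.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens of three or more letters."""
    return re.findall(r"[a-z]{3,}", text.lower())


def tfidf_cluster_labels(docs, clusters, top_k=2):
    """Return the top_k highest-scoring TF-IDF terms for each cluster.

    docs: list of strings; clusters: one cluster id per document
    (assumed to come from grouping points in the 2D map).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency (one common variant, an assumption here).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}

    # Sum per-document TF-IDF scores over the documents of each cluster.
    scores = defaultdict(Counter)
    for toks, cid in zip(tokenized, clusters):
        if not toks:
            continue
        for term, count in Counter(toks).items():
            scores[cid][term] += (count / len(toks)) * idf[term]

    # Rank by score, breaking ties alphabetically so the result is deterministic.
    labels = {}
    for cid, term_scores in scores.items():
        ranked = sorted(term_scores.items(), key=lambda kv: (-kv[1], kv[0]))
        labels[cid] = [term for term, _ in ranked[:top_k]]
    return labels


# Hypothetical toy corpus; real input would be the abstracts behind the map points.
docs = [
    "Caloric restriction extends lifespan in mice",
    "Dietary restriction and lifespan in yeast",
    "Telomere length shortens with age in humans",
    "Telomere attrition and cellular senescence in humans",
]
clusters = [0, 0, 1, 1]

labels = tfidf_cluster_labels(docs, clusters)
print(labels)
```

Each cluster's top terms can then be drawn next to the corresponding group of points. Summing scores over a cluster favors terms that recur within it; other aggregation choices, or a dedicated extractor such as YAKE!, would slot into the same place.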

Taxonomy

Topics: Advanced Text Analysis Techniques · Data Visualization and Analytics · Time Series Analysis and Forecasting