Use of diverse data sources to control which topics emerge in a science map
Juan Pablo Bascur, Rodrigo Costas, Suzan Verberne

TL;DR
This paper investigates how the choice of data source used to build a document network influences which topics emerge in a science map, so that maps can be tailored to different research needs.
Contribution
It demonstrates that the choice of data source can shift the topic bias of a science map, allowing maps to be customized for a specific thematic focus.
Findings
Different data sources favor different topic categories.
Diverse data sources can significantly alter the clustering outcomes.
Geographical entities strongly influence topic emergence in author-based maps.
Abstract
Traditional science maps visualize topics by clustering documents within a network, but they are inherently biased toward clustering certain topics over others. If these topics could be chosen, then the science maps could be tailored for different needs. In this paper, we explore the extent to which the topic bias of a science map can be changed by choosing different data sources to build the document network. We analyze this by evaluating the clustering effectiveness of several topic categories over two sources that are traditionally used for the creation of science maps (citations and text similarity) and six non-traditional data sources, which we found favor different kinds of topics: health issues for Facebook users, biotechnology topics for patent families, government and social issues for policy documents, food topics for Twitter conversations, nursing topics for Twitter users,…
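The comparison described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration rather than the authors' pipeline: the toy documents and entities are invented, connected components stand in for a real clustering algorithm, and the best-matching-cluster F1 score is one plausible reading of "clustering effectiveness", not necessarily the measure used in the paper.

```python
from collections import defaultdict
from itertools import combinations


def build_edges(doc_entities):
    """Link two documents when they share at least one entity
    (e.g. a Twitter user, a patent family, or a cited reference)."""
    by_entity = defaultdict(set)
    for doc, entities in doc_entities.items():
        for entity in entities:
            by_entity[entity].add(doc)
    edges = set()
    for docs in by_entity.values():
        edges.update(combinations(sorted(docs), 2))
    return edges


def connected_components(nodes, edges):
    """Stand-in clustering (an assumption): connected components via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    clusters = defaultdict(set)
    for n in nodes:
        clusters[find(n)].add(n)
    return list(clusters.values())


def best_f1(topic_docs, clusters):
    """Highest F1 between the topic's document set and any single cluster."""
    best = 0.0
    for cluster in clusters:
        overlap = len(topic_docs & cluster)
        if overlap == 0:
            continue
        precision = overlap / len(cluster)
        recall = overlap / len(topic_docs)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


def score(doc_entities, topic_docs):
    """Cluster the network built from one data source and score one topic."""
    clusters = connected_components(doc_entities.keys(), build_edges(doc_entities))
    return best_f1(topic_docs, clusters)


# Hypothetical toy data: six documents, three of which belong to one topic.
topic = {"d1", "d2", "d3"}
source_a = {"d1": {"u1"}, "d2": {"u1", "u2"}, "d3": {"u2"},
            "d4": {"u3"}, "d5": {"u3"}, "d6": {"u4"}}
source_b = {"d1": {"r1"}, "d4": {"r1"}, "d2": {"r2"},
            "d5": {"r2"}, "d3": {"r3"}, "d6": {"r3"}}

score_a = score(source_a, topic)  # topic documents end up in one cluster
score_b = score(source_b, topic)  # topic documents are split across clusters
```

In this toy setup the same topic scores higher under `source_a` than under `source_b`, which is the kind of source-dependent topic bias the abstract describes.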
Taxonomy
Topics: Research Data Management Practices
