Building networks of shared research interests by embedding words into a representation space
Art Poon

TL;DR
This paper presents a novel method that uses word embeddings and manifold learning to create a network of researchers based on shared publication topics, facilitating collaboration and discovery of hidden connections.
Contribution
It introduces a workflow combining NLP, UMAP, and Wasserstein distance to map research interests into a visual network, enabling analysis of academic collaborations.
Findings
Clusters align with academic divisions
Identifies untapped research connections
Provides a reproducible Python and R workflow
Abstract
Departments within a university are not only administrative units, but also an effort to gather investigators around common fields of academic study. A pervasive challenge is connecting members with shared research interests both within and between departments. Here I describe a workflow that adapts methods from natural language processing to generate a network connecting members of a university department, or multiple departments within a faculty, based on common topics in their research publications. After extracting and processing terms from abstracts in the PubMed database, the co-occurrence of terms is encoded in a sparse document-term matrix. Based on the angular distances between the presence-absence vectors for every pair of terms, I use the uniform manifold approximation and projection (UMAP) method to embed the terms into a representational space…
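The first two steps named in the abstract (a presence-absence document-term matrix, then angular distances between term vectors) can be illustrated with a minimal sketch. This is not the author's code: the toy term sets, the variable names, and the helper `angular_distance_matrix` are hypothetical, a dense NumPy array stands in for the sparse matrix, and dividing the angle by pi is one common convention that the paper may not use.

```python
import numpy as np

# Hypothetical toy "abstracts", already reduced to sets of processed terms.
abstracts = [
    {"virus", "phylogeny", "evolution"},
    {"virus", "evolution", "sequence"},
    {"tumor", "biomarker", "sequence"},
]

# Vocabulary: every distinct term, in a fixed (sorted) order.
terms = sorted(set().union(*abstracts))

# Presence-absence document-term matrix: rows are documents, columns are terms.
# Dense here for readability; the abstract describes a sparse matrix.
dtm = np.array([[int(t in doc) for t in terms] for doc in abstracts])


def angular_distance_matrix(dtm):
    """Pairwise angular distance between term columns, scaled by 1/pi.

    Assumes every term occurs in at least one document, so no column
    has zero norm.
    """
    vecs = dtm.T.astype(float)                       # one row per term
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    unit = vecs / norms                              # unit-length term vectors
    cos = np.clip(unit @ unit.T, -1.0, 1.0)          # guard against rounding
    return np.arccos(cos) / np.pi


dist = angular_distance_matrix(dtm)
```

Terms that always occur in the same documents (here "evolution" and "virus") get a distance near 0, while terms that never co-occur (such as "biomarker" and "phylogeny") get 0.5, the maximum for non-negative vectors under this scaling. A matrix like `dist` could then be handed to an embedding method that accepts precomputed distances, for example `umap.UMAP(metric="precomputed")` from the umap-learn package, though the settings used in the paper are not given in this excerpt.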
Taxonomy
Topics: Semantic Web and Ontologies · Biomedical Text Mining and Ontologies
