Building networks of shared research interests by embedding words into a representation space

Art Poon

arXiv:2502.07042 · cs.SI · February 12, 2025

TL;DR

This paper presents a novel method that uses word embeddings and manifold learning to create a network of researchers based on shared publication topics, facilitating collaboration and discovery of hidden connections.
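The TL;DR does not say how edges in the researcher network are defined. Purely as a hypothetical illustration, assume we already have a symmetric matrix of pairwise distances between researchers (for example, distances between their embedded publication terms). Each researcher can then be linked to its k closest colleagues. The helper `knn_edges` and the parameter `k` are assumptions for this sketch, not the paper's edge rule.

```python
import numpy as np


def knn_edges(dist, names, k=2):
    """Undirected edge list linking each researcher to its k closest peers.

    Hypothetical rule for illustration: `dist` is assumed to be a symmetric
    researcher-by-researcher distance matrix with zeros on the diagonal.
    """
    dist = np.asarray(dist, dtype=float)
    edges = set()
    for i in range(len(names)):
        # Nearest neighbours of researcher i, excluding i itself.
        nearest = [j for j in np.argsort(dist[i]) if j != i][:k]
        for j in nearest:
            a, b = sorted((names[i], names[j]))  # canonical order deduplicates i-j / j-i
            edges.add((a, b, float(dist[i, j])))
    return sorted(edges)


# Toy example with four researchers.
dist = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]
print(knn_edges(dist, ["A", "B", "C", "D"], k=1))
```

A k-nearest-neighbour rule keeps every researcher connected to at least one colleague. A global distance threshold would be an equally plausible alternative.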

Contribution

It introduces a workflow combining NLP, UMAP, and Wasserstein distance to map research interests into a visual network, enabling analysis of academic collaborations.
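The summary names Wasserstein distance but does not spell out how it is applied. One plausible reading, stated here as an assumption, is the following. Each researcher is represented by the embedded coordinates of the terms in their abstracts. Two researchers are then compared by the Wasserstein distance between those point clouds, with every point weighted equally. The sketch below solves the discrete optimal-transport linear program with SciPy's `linprog`. The function name `wasserstein` and the uniform weighting are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist


def wasserstein(x, y, p=1):
    """p-Wasserstein distance between two uniformly weighted point clouds."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    cost = cdist(x, y) ** p  # ground cost between every pair of points
    # Transport plan T (n x m) is flattened row-major: T[i, j] -> index i*m + j.
    A_eq = np.zeros((n + m - 1, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0  # point i of x ships out mass 1/n
    for j in range(m - 1):  # the last column constraint is implied by total mass
        A_eq[n + j, j::m] = 1.0  # point j of y receives mass 1/m
    b_eq = np.concatenate([np.full(n, 1.0 / n), np.full(m - 1, 1.0 / m)])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    # Clamp tiny negative round-off before taking the p-th root.
    return max(float(res.fun), 0.0) ** (1.0 / p)


# Two hypothetical researchers, each a small cloud of 2-D term coordinates.
alice = [[0.0, 0.0], [1.0, 0.0]]
bob = [[0.0, 1.0], [1.0, 1.0]]
print(wasserstein(alice, bob))
```

Writing the transport problem as a dense linear program keeps the sketch dependency-light, but it has n·m variables. For researchers with many terms, a dedicated optimal-transport library such as POT would be the more practical choice.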

Findings

01 Clusters align with academic divisions

02 Identifies untapped research connections

03 Provides a reproducible Python and R workflow

Abstract

Departments within a university are not only administrative units, but also an effort to gather investigators around common fields of academic study. A pervasive challenge is connecting members with shared research interests both within and between departments. Here I describe a workflow that adapts methods from natural language processing to generate a network connecting $n = 79$ members of a university department, or multiple departments within a faculty ($n = 278$), based on common topics in their research publications. After extracting and processing terms from $n = 16{,}901$ abstracts in the PubMed database, the co-occurrence of terms is encoded in a sparse document-term matrix. Based on the angular distances between the presence-absence vectors for every pair of terms, I use the uniform manifold approximation and projection (UMAP) method to embed the terms into a representational space…
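The abstract describes two concrete steps: encoding term co-occurrence in a sparse document-term matrix, and measuring angular distances between the presence-absence vectors of every pair of terms. The minimal sketch below illustrates both, under two assumptions. First, abstracts are already tokenized into processed terms; the paper's extraction and processing rules are not reproduced here. Second, angular distance is taken as arccos of the cosine similarity divided by π. The helpers `document_term_matrix` and `angular_distance` are hypothetical names, not functions from poonlab/tragula.

```python
import numpy as np
from scipy.sparse import csr_matrix


def document_term_matrix(docs):
    """Binary (presence-absence) document-term matrix from tokenized abstracts."""
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: j for j, t in enumerate(vocab)}
    rows, cols = [], []
    for i, doc in enumerate(docs):
        for t in set(doc):  # presence/absence: each term counted once per document
            rows.append(i)
            cols.append(index[t])
    data = np.ones(len(rows), dtype=np.float64)
    dtm = csr_matrix((data, (rows, cols)), shape=(len(docs), len(vocab)))
    return dtm, vocab


def angular_distance(dtm):
    """Pairwise angular distance between term (column) vectors, scaled by 1/pi."""
    cols = dtm.tocsc()
    gram = (cols.T @ cols).toarray()  # gram[a, b] = number of documents sharing terms a and b
    norms = np.sqrt(np.diag(gram))
    cos = np.clip(gram / np.outer(norms, norms), -1.0, 1.0)  # guard against round-off
    dist = np.arccos(cos) / np.pi
    np.fill_diagonal(dist, 0.0)  # a term is at distance zero from itself
    return dist


docs = [["hiv", "phylogeny", "hiv"], ["hiv", "drug"], ["phylogeny", "tree"]]
dtm, vocab = document_term_matrix(docs)
D = angular_distance(dtm)
print(vocab)
print(np.round(D, 3))
```

Terms that never share an abstract end up at distance 0.5, because binary vectors have non-negative cosine similarity. The dense term-by-term matrix grows quadratically with vocabulary size, so this sketch only suits modest vocabularies. A precomputed distance matrix like `D` can be passed to the umap-learn package as `umap.UMAP(metric="precomputed").fit_transform(D)` to obtain embedded term coordinates.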

Code & Models

Repositories

poonlab/tragula (official)

Taxonomy

Topics: Semantic Web and Ontologies · Biomedical Text Mining and Ontologies