Mapping scientific communities at scale
Victor Barbier, Eric Jeangirard

TL;DR
This paper presents a scalable framework for mapping and analyzing large scientific communities from bibliometric data, combining data integration, network analysis, and visualization, with applications in science policy and research strategy.
Contribution
It introduces a methodology that combines Elasticsearch aggregation, Graphology layout and community detection, VOSviewer visualization, and LLM-based community labeling to map scientific communities from large bibliometric datasets.
Findings
Effective at national scale for exploring research collaborations.
Enables thematic and community detection with high accuracy.
Tools are open-source and accessible for integration and further research.
Abstract
This study introduces a novel methodology for mapping scientific communities at scale, addressing challenges associated with network analysis in large bibliometric datasets. By leveraging enriched publication metadata from the French research portal scanR and applying advanced filtering techniques to prioritize the strongest interactions between entities, we construct detailed, scalable network maps. These maps are enhanced through systematic disambiguation of authors, affiliations, and topics using persistent identifiers and specialized algorithms. The proposed framework integrates Elasticsearch for efficient data aggregation, Graphology for network spatialization (ForceAtlas2) and community detection (Louvain algorithm), and VOSviewer for network visualization. A Large Language Model (Mistral Nemo) is used to label the communities detected and OpenAlex data helps to enrich the…
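The abstract's idea of prioritizing the strongest interactions between entities can be illustrated with a small sketch. This is not the authors' implementation: the excerpt does not specify the filtering criteria, so the sketch assumes that an interaction is a co-occurrence of two entities on the same publication and that "strongest" means the highest co-occurrence counts. The function names, the `max_edges` parameter, and the toy author identifiers are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_edges(publications):
    """Count how often each pair of entities appears on the same publication."""
    weights = Counter()
    for entities in publications:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        for a, b in combinations(sorted(set(entities)), 2):
            weights[(a, b)] += 1
    return weights


def keep_strongest_edges(weights, max_edges):
    """Keep the max_edges heaviest edges; ties are broken by node names."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:max_edges])


# Toy data (assumption): each inner list holds hypothetical author
# identifiers attached to one publication.
publications = [
    ["author:A", "author:B", "author:C"],
    ["author:A", "author:B"],
    ["author:A", "author:B", "author:D"],
    ["author:C", "author:D"],
]

edges = build_cooccurrence_edges(publications)
strongest = keep_strongest_edges(edges, max_edges=2)

for (source, target), weight in strongest.items():
    print(source, target, weight)
```

The filtered edge list could then be handed to a layout and community detection step. Capping the number of edges before that step is one plausible way to keep the resulting map readable at large scale.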
Taxonomy
Topics: Complex Network Analysis Techniques · Bioinformatics and Genomic Networks · Web visibility and informetrics
