Linking Global Science Funding to Research Publications
Jacob Aarup Dalsgaard, Filipi Nascimento Silva, Jin AI

TL;DR
This paper introduces a large dataset linking global funding organizations to research publications by disambiguating funding acknowledgment strings, enabling detailed analysis of funding flows and research support patterns.
Contribution
It presents a systematic multi-stage pipeline for matching funding acknowledgment strings to standardized organization identifiers, creating a comprehensive and validated dataset.
Findings
Links 1.9 million funding strings to organizations
Achieves high recall and precision in matching
Supports analysis of global research funding patterns
Abstract
Funding acknowledgments in scholarly publications provide large-scale trace data on organizations that support scientific research. We present a dataset for linking global science funding organizations to research publications by systematically disambiguating unique funding acknowledgment strings extracted from publication metadata. Funder names are matched to standardized organizational identifiers using a multi-stage pipeline that combines lexical normalization, similarity-based clustering, rule-based matching, named entity recognition assistance, and manual validation. The resulting dataset links 1.9 million unique funder strings to canonical organization identifiers and records match types and unresolved cases to support transparency. Technical validation includes paper-level comparisons across bibliometric sources and manual verification against full-text acknowledgment sections,…
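To make the first stages of the pipeline described above more concrete, here is a minimal sketch of lexical normalization followed by exact and similarity-based matching. It is an illustration under stated assumptions, not the authors' implementation: the `REGISTRY` entries, the `org:...` identifiers, the `match_funder` helper, and the 0.9 threshold are all hypothetical, and Python's `difflib.SequenceMatcher` stands in for whatever similarity measure the dataset actually uses. Clustering, named entity recognition assistance, and manual validation are not covered.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical registry: canonical identifier -> official name (illustrative only).
REGISTRY = {
    "org:001": "National Science Foundation",
    "org:002": "European Research Council",
}


def normalize(name: str) -> str:
    """Lexical normalization: strip accents, lowercase, drop punctuation, collapse spaces."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


# Lookup table keyed by the normalized form of each official name.
NORMALIZED = {normalize(name): org_id for org_id, name in REGISTRY.items()}


def match_funder(raw: str, threshold: float = 0.9):
    """Return (org_id, match_type); org_id is None when the string stays unresolved."""
    key = normalize(raw)
    # Stage 1: exact match on the normalized string.
    if key in NORMALIZED:
        return NORMALIZED[key], "exact"
    # Stage 2: best similarity score against every registry entry.
    best_id, best_score = None, 0.0
    for candidate, org_id in NORMALIZED.items():
        score = SequenceMatcher(None, key, candidate).ratio()
        if score > best_score:
            best_id, best_score = org_id, score
    if best_score >= threshold:
        return best_id, "fuzzy"
    # Recording unresolved cases mirrors the dataset's transparency goal.
    return None, "unresolved"
```

Returning the match type alongside the identifier reflects the abstract's point that match types and unresolved cases are recorded rather than silently dropped.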
Taxonomy
Topics: Scientometrics and bibliometrics research · Research Data Management Practices · Academic Publishing and Open Access
