Cross-lingual Comparison of Research Funding Projects with Multilingual Sentence-BERT: Evidence from KAKENHI, NIH, NSF, and UKRI

Miki Kimura-Ida

arXiv:2604.27315 · cs.DL · May 1, 2026

TL;DR

This study evaluates how multilingual Sentence-BERT embeddings can facilitate cross-lingual comparison of research funding projects, especially between Japanese and English descriptions, highlighting both capabilities and limitations.

Contribution

It demonstrates the effectiveness of multilingual embeddings in aligning Japanese and English research project descriptions and assesses translation effects on semantic similarity.

Findings

01

The original Japanese and machine-translated English representations of a project lie closer to each other than to native English projects.

02

The nearest neighbors of the Japanese and translated English representations overlap only partially (on average 2.9 out of 10), indicating that translation alters local neighborhood structure.

03

Multilingual embeddings support large-scale cross-national project comparison but remain sensitive to language differences.
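Finding 02 rests on a top-k nearest-neighbor overlap statistic. The sketch below shows one way such a statistic could be computed. It is a minimal illustration, not the paper's code, and it assumes that neighbors are ranked by cosine similarity over a shared pool of project embeddings with k = 10; the excerpt does not specify the pool or the similarity measure. The helper names (`top_k`, `mean_neighbor_overlap`) and the toy vectors are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (nu * nv)


def top_k(query: Vector, pool: Sequence[Vector], k: int) -> set[int]:
    """Indices of the k pool vectors most cosine-similar to the query."""
    ranked = sorted(range(len(pool)), key=lambda j: cosine(query, pool[j]), reverse=True)
    return set(ranked[:k])


def mean_neighbor_overlap(
    ja_vecs: Sequence[Vector],
    en_vecs: Sequence[Vector],
    pool: Sequence[Vector],
    k: int = 10,
) -> float:
    """Average size of top-k(ja_i) ∩ top-k(en_i) over projects i (ranges 0..k)."""
    # Hypothetical helper: ja_vecs[i] and en_vecs[i] embed the same project.
    overlaps = [
        len(top_k(ja, pool, k) & top_k(en, pool, k))
        for ja, en in zip(ja_vecs, en_vecs)
    ]
    return sum(overlaps) / len(overlaps)


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real ones would come from the multilingual model.
    pool = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    ja = [[1.0, 0.1], [1.0, 0.0]]
    en = [[0.1, 1.0], [1.0, 0.0]]
    print(mean_neighbor_overlap(ja, en, pool, k=2))  # 1.5
```

Under this reading, a mean close to k would mean translation leaves neighborhoods intact. A value such as 2.9 out of 10 means that, on average, most of a project's neighbors change when its Japanese text is replaced by the translation. If the pool also contains the query projects themselves, they would probably need to be excluded before ranking. This is likewise an assumption.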

Abstract

Cross-national comparison of research funding projects is increasingly important for science policy and strategic planning, but language differences remain a major obstacle. In particular, KAKENHI project descriptions are written primarily in Japanese, whereas projects from major overseas funding agencies, such as NSF, NIH, and UKRI, are documented in English. This study investigates whether multilingual sentence embeddings can support meaningful cross-lingual comparison of research funding projects, with particular attention to the semantic effects of translating Japanese texts into English. For each KAKENHI project, we construct two representations: the original Japanese text and its machine-translated English version, both embedded in a shared semantic space using a multilingual Sentence-BERT model. We then compare their distances and nearest-neighbor relationships with respect to…
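The distance comparison described in the abstract can be illustrated with a small sketch. For each KAKENHI project, it compares the Japanese/translated-English pair against native English projects. The sketch assumes that the vectors have already been produced by a multilingual Sentence-BERT model, for instance through `SentenceTransformer(...).encode(texts)` in the sentence-transformers library; the paper's exact model and preprocessing are not given in this excerpt. It also assumes cosine similarity as the closeness measure. `translation_gap` is a hypothetical helper name.

```python
import math
from statistics import mean
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (nu * nv)


def translation_gap(
    ja_vecs: Sequence[Vector],
    en_vecs: Sequence[Vector],
    native_vecs: Sequence[Vector],
) -> dict[str, float]:
    """Compare paired ja/en similarity with similarity to native English projects.

    ja_vecs[i] and en_vecs[i] embed the Japanese text of KAKENHI project i and
    its machine translation; native_vecs embed English-language projects
    (e.g. NSF, NIH, UKRI).
    """
    return {
        # Mean cos(ja_i, en_i) over KAKENHI projects.
        "paired": mean(cosine(ja, en) for ja, en in zip(ja_vecs, en_vecs)),
        # Mean similarity of each original Japanese vector to every native English one.
        "ja_to_native": mean(cosine(ja, nat) for ja in ja_vecs for nat in native_vecs),
        # Same comparison for the translated English vectors.
        "en_to_native": mean(cosine(en, nat) for en in en_vecs for nat in native_vecs),
    }


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real sentence embeddings.
    result = translation_gap([[1.0, 0.0]], [[1.0, 0.2]], [[0.0, 1.0], [-1.0, 1.0]])
    print(result)
```

A "paired" value well above both cross-language averages would match the pattern reported in the findings, where a project's two representations sit closer to each other than to native English projects.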
