TL;DR
This paper introduces a new cross-lingual biomedical entity linking benchmark across 10 languages, investigates knowledge transfer from resource-rich to resource-poor languages, and proposes methods that improve performance without in-domain data.
Contribution
It establishes the XL-BEL benchmark, analyzes the limitations of existing monolingual and multilingual models, and proposes transfer methods that use general-domain bitext to carry domain-specific knowledge from resource-rich to resource-poor languages.
Findings
There are large performance gaps between English and the other languages in biomedical entity linking.
Cross-lingual transfer methods improve results by up to 20 Precision@1 points.
Domain-specific transfer methods work effectively without in-domain data.
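To make the "Precision@1 points" unit concrete, here is a minimal sketch of how the metric is commonly computed: the percentage of mentions whose top-ranked candidate concept matches the gold concept. The function name `precision_at_1`, the one-gold-ID-per-mention setup, and the toy concept IDs are assumptions for illustration only; they are not taken from the paper, and the numbers below are not its results.

```python
from typing import Sequence


def precision_at_1(ranked: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Percentage of mentions whose top-ranked concept ID equals the gold ID.

    ranked[i] is the candidate list for mention i, best candidate first.
    gold[i] is the single gold concept ID for mention i (an assumed setup).
    """
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not gold:
        raise ValueError("need at least one mention")
    hits = sum(1 for cands, g in zip(ranked, gold) if cands and cands[0] == g)
    return 100.0 * hits / len(gold)


# Hypothetical toy data: placeholder concept IDs, not real benchmark entries.
gold = ["C1", "C2", "C3", "C4", "C5"]
baseline = [["C1", "C9"], ["C7", "C2"], ["C8"], ["C4"], ["C6", "C5"]]
transfer = [["C1", "C9"], ["C2", "C7"], ["C8"], ["C4"], ["C6", "C5"]]

base_score = precision_at_1(baseline, gold)      # 2 of 5 mentions correct at rank 1
transfer_score = precision_at_1(transfer, gold)  # 3 of 5 mentions correct at rank 1
gain = transfer_score - base_score               # difference in Precision@1 points
print(f"baseline={base_score:.1f} transfer={transfer_score:.1f} gain={gain:.1f}")
```

In this toy case the gain is 20 points because one additional mention out of five is linked correctly at rank 1. A gain measured in points is an absolute difference on the 0 to 100 scale, not a relative improvement.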
Abstract
Injecting external domain-specific knowledge (e.g., UMLS) into pretrained language models (LMs) advances their capability to handle specialised in-domain tasks such as biomedical entity linking (BEL). However, such abundant expert knowledge is available only for a handful of languages (e.g., English). In this work, by proposing a novel cross-lingual biomedical entity linking task (XL-BEL) and establishing a new XL-BEL benchmark spanning 10 typologically diverse languages, we first investigate the ability of standard knowledge-agnostic as well as knowledge-enhanced monolingual and multilingual LMs beyond the standard monolingual English BEL task. The scores indicate large gaps to English performance. We then address the challenge of transferring domain-specific knowledge in resource-rich languages to resource-poor ones. To this end, we propose and evaluate a series of cross-lingual…
