SynCABEL: Synthetic Contextualized Augmentation for Biomedical Entity Linking

Adam Remaki; Christel Gérardin; Eulàlia Farré-Maduell; Martin Krallinger; Xavier Tannier

arXiv:2601.19667 · cs.CL · May 19, 2026

TL;DR

SynCABEL uses large language models to generate synthetic, context-rich training data for biomedical entity linking, achieving state-of-the-art results with less manual annotation.

Contribution

The paper introduces a synthetic data generation framework that reduces the need for expert-annotated data in biomedical entity linking.

Findings

01. Achieves new state-of-the-art results on multilingual biomedical benchmarks (MedMentions, QUAERO, SPACCC).

02. Matches the performance of full human supervision with up to 60% less annotated data.

03. Improves the rate of clinically valid predictions.

Abstract

We present SynCABEL (Synthetic Contextualized Augmentation for Biomedical Entity Linking), a framework that addresses a central bottleneck in supervised biomedical entity linking (BEL): the scarcity of expert-annotated training data. SynCABEL leverages large language models to generate context-rich synthetic training examples for all candidate concepts in a target knowledge base, providing broad supervision without manual annotation. We demonstrate that SynCABEL, when combined with decoder-only models and guided inference, establishes new state-of-the-art results across three widely used multilingual benchmarks: MedMentions for English, QUAERO for French, and SPACCC for Spanish. Evaluating data efficiency, we show that SynCABEL reaches the performance of full human supervision using up to 60% less annotated data, substantially reducing reliance on labor-intensive and costly expert…
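The abstract mentions "guided inference" but the excerpt here does not define it. One common reading in generative entity linking is constrained decoding, where the model may only emit token sequences that spell out a name present in the knowledge base. The sketch below illustrates that general idea under loud assumptions: the trie layout, the `END` marker, the function names, and the toy scorer are all hypothetical and are not taken from the paper or its repositories. A real system would replace `score` with log-probabilities from a decoder-only model.

```python
from typing import Callable, Dict, List, Sequence

END = "<eos>"  # hypothetical end-of-name marker, not from the paper


def build_trie(names: Sequence[Sequence[str]]) -> Dict:
    """Build a nested-dict prefix tree over tokenized concept names."""
    root: Dict = {}
    for tokens in names:
        node = root
        for tok in list(tokens) + [END]:
            node = node.setdefault(tok, {})
    return root


def constrained_greedy_decode(
    trie: Dict,
    score: Callable[[List[str], str], float],
) -> List[str]:
    """Greedily pick the best-scoring token among those the trie allows.

    Because every step is restricted to children of the current trie node,
    the returned sequence is always a complete name from the knowledge base.
    """
    if not trie:
        raise ValueError("the knowledge base has no concept names")
    node, output = trie, []
    while True:
        allowed = list(node.keys())
        best = max(allowed, key=lambda tok: score(output, tok))
        if best == END:
            return output
        output.append(best)
        node = node[best]


def toy_score(context_words: set) -> Callable[[List[str], str], float]:
    """Stand-in for a language model: favor tokens seen in the context."""
    def score(prefix: List[str], tok: str) -> float:
        return 1.0 if tok in context_words else 0.0
    return score


# Toy knowledge base of tokenized concept names (illustrative only).
kb_names = [
    ["myocardial", "infarction"],
    ["myocardial", "ischemia"],
    ["cerebral", "infarction"],
]
trie = build_trie(kb_names)
prediction = constrained_greedy_decode(trie, toy_score({"cerebral", "infarction"}))
```

With a real decoder, the same constraint is usually applied by masking disallowed tokens at each generation step, so the model can never produce a string that fails to map to a concept.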
