LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking
Adam Remaki, Xavier Tannier, Christel Gérardin

TL;DR
LongBEL is a novel document-level biomedical entity linking framework that leverages full-document context and memory of previous predictions to improve consistency and accuracy across multiple languages.
Contribution
LongBEL introduces a generative, memory-augmented approach whose memory of earlier predictions is built during training from cross-validated predictions rather than gold labels, which improves document-level consistency in biomedical entity linking.
Findings
LongBEL outperforms sentence-level baselines on five biomedical benchmarks.
Largest improvements are on datasets with frequent concept recurrence.
Ensemble methods yield the best overall performance.
Abstract
Biomedical entity linking maps textual mentions to concepts in structured knowledge bases such as UMLS or SNOMED CT. Most existing systems link each mention independently, using only the mention or its surrounding sentence. This ignores dependencies between mentions in the same document and can lead to inconsistent predictions, especially when the same concept appears under different surface forms. We introduce LongBEL, a document-level generative framework that combines full-document context with a memory of previous predictions. To make this memory robust, LongBEL is trained with cross-validated predictions rather than gold labels, reducing the mismatch between training and inference and limiting cascading errors. Experiments on five biomedical benchmarks across English, French, and Spanish show that LongBEL improves over sentence-level generative baselines, with the largest gains on…
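The idea of training the memory on cross-validated predictions instead of gold labels can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`cross_validated_predictions`, `memory_examples`, `train_lookup`), the dictionary-lookup "linker", the placeholder concept id `C_MI`, and the toy documents are all assumptions made for illustration. The sketch only shows the data flow: each document is labeled by a model that never saw it, and those possibly wrong labels fill the memory while the training target stays gold.

```python
from typing import Callable, Dict, List, Tuple

Mention = Tuple[str, str]          # (surface form, gold concept id)
Linker = Callable[[str], str]      # surface form -> predicted concept id


def cross_validated_predictions(
    docs: Dict[str, List[Mention]],
    train_fn: Callable[[List[Mention]], Linker],
    k: int = 2,
) -> Dict[str, List[str]]:
    """Predict concepts for every document with a linker that never saw it."""
    doc_ids = sorted(docs)
    folds = [doc_ids[i::k] for i in range(k)]  # round-robin split by document
    preds: Dict[str, List[str]] = {}
    for held_out in folds:
        train_mentions = [
            m for d in doc_ids if d not in held_out for m in docs[d]
        ]
        linker = train_fn(train_mentions)
        for d in held_out:
            preds[d] = [linker(surface) for surface, _ in docs[d]]
    return preds


def memory_examples(mentions: List[Mention], memory_labels: List[str]):
    """Pair each mention with the memory of earlier mentions in its document.

    The memory holds predicted labels (possibly wrong); the target is gold.
    """
    examples = []
    for i, (surface, gold) in enumerate(mentions):
        memory = [(mentions[j][0], memory_labels[j]) for j in range(i)]
        examples.append((memory, surface, gold))
    return examples


def train_lookup(train_mentions: List[Mention]) -> Linker:
    """Hypothetical stand-in for a real linker: exact-match dictionary."""
    table = {surface.lower(): cui for surface, cui in train_mentions}
    return lambda surface: table.get(surface.lower(), "NIL")


# Toy corpus; "C_MI" is a placeholder id, not a real knowledge-base code.
docs = {
    "doc_a": [("heart attack", "C_MI"), ("MI", "C_MI")],
    "doc_b": [("myocardial infarction", "C_MI"), ("heart attack", "C_MI")],
}
preds = cross_validated_predictions(docs, train_lookup, k=2)
examples_b = memory_examples(docs["doc_b"], preds["doc_b"])
```

In this toy run the held-out linker fails on "myocardial infarction" in `doc_b`, so the second training example for that document sees a wrong entry in its memory while still being trained toward the gold concept. That is the kind of train-time noise the abstract describes as reducing the mismatch with inference, where the memory also contains the model's own imperfect predictions.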
