Hybrid X-Linker: Automated Data Generation and Extreme Multi-label Ranking for Biomedical Entity Linking
Pedro Ruas, Fernando Gallego, Francisco J. Veredas, Francisco M. Couto

TL;DR
This paper introduces the hybrid X-Linker pipeline for automated large-scale biomedical entity linking, leveraging automatically generated data and extreme multi-label ranking to improve performance across multiple datasets.
Contribution
It presents a novel automated data generation approach and a hybrid pipeline that enhances biomedical entity linking without relying on manually labeled data.
Findings
Achieved top-1 accuracies of up to 0.9511 across the six evaluation datasets.
Outperformed existing methods on three datasets.
Published source code and data for reproducibility and further research.
Abstract
State-of-the-art deep learning entity linking methods rely on extensive human-labelled data, which is costly to acquire. Current datasets are limited in size, leading to inadequate coverage of biomedical concepts and diminished performance when applied to new data. In this work, we propose to automatically generate data to create large-scale training datasets, which allows the exploration of approaches originally developed for the task of extreme multi-label ranking in the biomedical entity linking task. We propose the hybrid X-Linker pipeline that includes different modules to link disease and chemical entity mentions to concepts in the MEDIC and the CTD-Chemical vocabularies, respectively. X-Linker was evaluated on several biomedical datasets: BC5CDR-Disease, BioRED-Disease, NCBI-Disease, BC5CDR-Chemical, BioRED-Chemical, and NLM-Chem, achieving top-1 accuracies of 0.8307, 0.7969,…
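The top-1 accuracy reported in the abstract can be read as the fraction of entity mentions whose highest-ranked candidate concept matches the gold annotation. The following is a minimal sketch of that metric, assuming each mention has exactly one gold concept identifier and the linker returns an ordered candidate list per mention. The function name, the mention keys, and the concept identifiers are hypothetical placeholders and are not taken from the X-Linker code or from any real vocabulary.

```python
from typing import Dict, List


def top1_accuracy(ranked: Dict[str, List[str]], gold: Dict[str, str]) -> float:
    """Fraction of mentions whose first-ranked candidate equals the gold concept ID.

    `ranked` maps a mention key to candidate concept IDs, best first.
    `gold` maps a mention key to its single gold concept ID (an assumption
    of this sketch; real datasets may allow several gold IDs per mention).
    """
    if not gold:
        raise ValueError("gold must contain at least one mention")
    hits = 0
    for mention, gold_id in gold.items():
        candidates = ranked.get(mention, [])
        # A mention with no candidates counts as a miss.
        if candidates and candidates[0] == gold_id:
            hits += 1
    return hits / len(gold)


# Hypothetical toy data: all keys and concept IDs are placeholders.
ranked = {
    "mention_1": ["CONCEPT_A", "CONCEPT_B"],  # correct at rank 1
    "mention_2": ["CONCEPT_C", "CONCEPT_D"],  # correct at rank 1
    "mention_3": ["CONCEPT_F", "CONCEPT_E"],  # gold only at rank 2
    "mention_4": [],                          # no candidates returned
}
gold = {
    "mention_1": "CONCEPT_A",
    "mention_2": "CONCEPT_C",
    "mention_3": "CONCEPT_E",
    "mention_4": "CONCEPT_G",
}

score = top1_accuracy(ranked, gold)
print(f"top-1 accuracy: {score:.4f}")
```

Only the first candidate is inspected, so a gold concept ranked second (as for `mention_3`) still counts as an error under this metric.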
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling
