Search-Optimized Quantization in Biomedical Ontology Alignment
Oussama Bouaggad, Natalia Grabar

TL;DR
This paper presents a search-optimized quantization method for biomedical ontology alignment with transformer models, reporting a 20x inference speed-up and roughly 70% lower memory usage while maintaining task performance.
Contribution
The paper introduces a systematic approach to biomedical ontology alignment that combines cosine-based semantic similarity, a Microsoft Olive search over ONNX Runtime Execution Providers, and dynamic quantization.
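To make the quantization step concrete, here is a minimal sketch of the arithmetic behind 8-bit weight quantization. It is a generic illustration, not the paper's pipeline: the symmetric per-tensor scheme, the function names `quantize_int8` and `dequantize`, and the toy weights are all assumptions chosen for clarity.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to the signed range [-127, 127].

    Assumed scheme for illustration only; the summary does not state
    which quantization parameters the authors used.
    """
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map 8-bit integers back to approximate floating-point values."""
    return [q * scale for q in quantized]


# Hypothetical float32 weights.
weights = [0.50, -1.27, 0.03, 1.00]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage for the quantized tensor alone: 1 byte per weight instead of 4.
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)
saving = 1 - int8_bytes / fp32_bytes
```

In this sketch the quantized tensor takes a quarter of the float32 storage. The roughly 70% memory reduction listed under Findings is an end-to-end figure reported by the authors and is not derived from this toy calculation.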
Findings
Achieved a 20x inference speed-up.
Reduced memory usage by approximately 70%.
Set new state-of-the-art results on DEFT 2020 tasks.
Abstract
In the fast-moving world of AI, as organizations and researchers develop more advanced models, they face challenges due to their sheer size and computational demands. Deploying such models on edge devices or in resource-constrained environments adds further challenges related to energy consumption, memory usage and latency. To address these challenges, emerging trends are shaping the future of efficient model optimization techniques. From this premise, by employing supervised state-of-the-art transformer-based models, this research introduces a systematic method for ontology alignment, grounded in cosine-based semantic similarity between a biomedical layman vocabulary and the Unified Medical Language System (UMLS) Metathesaurus. It leverages Microsoft Olive to search for target optimizations among different Execution Providers (EPs) using the ONNX Runtime backend, followed by an…
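The cosine-based matching described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions: the three-dimensional vectors stand in for transformer embeddings, the term and concept names are hypothetical examples, and `align` is a made-up helper rather than the authors' code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_u * norm_v)


def align(layman_vecs, concept_vecs):
    """Map each layman term to its most similar concept and the score."""
    alignment = {}
    for term, vec in layman_vecs.items():
        best = max(
            concept_vecs,
            key=lambda name: cosine_similarity(vec, concept_vecs[name]),
        )
        alignment[term] = (best, cosine_similarity(vec, concept_vecs[best]))
    return alignment


# Toy embeddings (hypothetical values, not taken from the paper).
layman_vecs = {
    "heart attack": [0.9, 0.1, 0.0],
    "high blood pressure": [0.1, 0.8, 0.2],
}
concept_vecs = {
    "Myocardial Infarction": [1.0, 0.0, 0.1],
    "Hypertensive disease": [0.0, 1.0, 0.1],
}

result = align(layman_vecs, concept_vecs)
for term, (concept, score) in result.items():
    print(f"{term} -> {concept} ({score:.3f})")
```

In a real system the vectors would come from a sentence-embedding model, and an exhaustive comparison against every concept would likely be replaced by an indexed nearest-neighbour search, given the size of the UMLS Metathesaurus.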
