Ontology Enrichment from Texts: A Biomedical Dataset for Concept Discovery and Placement
Hang Dong, Jiaoyan Chen, Yuan He, Ian Horrocks

TL;DR
This paper introduces a new biomedical benchmark dataset for the automated discovery and placement of concepts, including complex concepts, in ontologies. It addresses limitations of existing datasets by providing the context of each mention and by supporting out-of-KB mention discovery.
Contribution
It adapts the MedMentions dataset with SNOMED CT versions to support out-of-KB mention discovery and complex concept placement, enabling improved biomedical ontology enrichment.
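The idea of pairing an annotated corpus with two KB versions can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that a mention counts as out-of-KB when its annotated concept is absent from the older KB version but present in the newer one. The `Mention` class, the `split_mentions` function, and the concept IDs (`C1`, `C2`, ...) are hypothetical placeholders, not real SNOMED CT identifiers or MedMentions records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str        # surface form as it appears in the abstract
    concept_id: str  # ID of the annotated concept (placeholder IDs below)


def split_mentions(mentions, old_kb_ids, new_kb_ids):
    """Split mentions into in-KB and out-of-KB relative to the older KB version."""
    in_kb, out_of_kb = [], []
    for m in mentions:
        if m.concept_id in old_kb_ids:
            in_kb.append(m)
        elif m.concept_id in new_kb_ids:
            # Only the newer version has this concept, so from the older
            # KB's point of view the mention refers to a new concept.
            out_of_kb.append(m)
        # Mentions whose concept is in neither version are skipped.
    return in_kb, out_of_kb


# Toy data standing in for the two KB versions and the annotated corpus.
old_kb_ids = {"C1", "C2"}
new_kb_ids = {"C1", "C2", "C3"}
mentions = [
    Mention("mention a", "C1"),
    Mention("mention b", "C3"),
    Mention("mention c", "C9"),
]

in_kb, out_of_kb = split_mentions(mentions, old_kb_ids, new_kb_ids)
print([m.text for m in in_kb])
print([m.text for m in out_of_kb])
```

Under this assumption, the out-of-KB mentions are the ones a discovery method should detect, and the newer version's position of each concept can serve as the gold placement.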
Findings
Demonstrates the dataset's utility for out-of-KB mention detection
Provides baseline evaluations with LLM-based methods
Enhances support for complex concept placement in biomedical ontologies
Abstract
Mentions of new concepts appear regularly in texts and require automated approaches to harvest and place them into Knowledge Bases (KB), e.g., ontologies and taxonomies. Existing datasets suffer from three issues: (i) they mostly assume that a new concept is pre-discovered and so cannot support out-of-KB mention discovery; (ii) they use only the concept label as the input along with the KB and thus lack the contexts of a concept label; and (iii) they mostly focus on concept placement with respect to a taxonomy of atomic concepts, rather than complex concepts, i.e., those with logical operators. To address these issues, we propose a new benchmark, adapting the MedMentions dataset (PubMed abstracts) with SNOMED CT versions in 2014 and 2017 under the Diseases sub-category and the broader categories of Clinical finding, Procedure, and Pharmaceutical / biologic product. We provide usage on the evaluation with the…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques
